Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xt/aspell.t - aspell issue #975

Closed
coke opened this issue Oct 19, 2016 · 10 comments
Closed

xt/aspell.t - aspell issue #975

coke opened this issue Oct 19, 2016 · 10 comments
Labels
xt Regarding current or new xt/ tests or the utils/

Comments

@coke
Copy link
Collaborator

coke commented Oct 19, 2016

aspell is convinced that doc/Language/grammars.pod6 has an spelling mistake here:

say +DigitMatcher.subparse: '12७१७९०९', args => \(:full-unicode);

and doc/Language/operators.pod6 with:

                  Z      <EB     PB     TB     GB     MB     KB> -> [\v,\suffix] {

and reports 'ull', and 'ffix' as mispelled words. For now, we're skipping them to silence the spell checker, but we should fix our aspell invocation to avoid the issue and remove them from the skiplist. (currently on spellcheck branch, moving to master shortly)

@coke coke added xt Regarding current or new xt/ tests or the utils/ and removed build labels Jul 5, 2017
@AlexDaniel
Copy link
Member

Sorry, this is confusing. What's the current status?

@coke
Copy link
Collaborator Author

coke commented Mar 3, 2018

No change. We have words whitelisted in xt/ that don’t need to be because of the bug in the test.

@JJ
Copy link
Contributor

JJ commented May 6, 2018

Do we need to use aspell? I have used hunspell, and it works pretty well. Will have to check options, to see if it can be fixed using aspell.

@JJ JJ added the JJ TPF Grant label May 6, 2018
@JJ JJ removed the JJ TPF Grant label May 14, 2018
JJ added a commit that referenced this issue May 26, 2018
And improves docs of aspell.t trying to deal with #975
@JJ
Copy link
Contributor

JJ commented May 26, 2018

I don't think aspell is developed any more. There are lots of issues, including this one, which is clearly a bug. I have checked that particular line with hunspell, and there's no problem with it... However there are lots of problems with lots of other words which are not in the hunspell dictionary, apparently. Maybe it's not using the local dictionary; I'll have to check that.

@JJ
Copy link
Contributor

JJ commented May 26, 2018

Since we are "awk"'ing the text anyway, we could maybe awk \ and : out of the text...

@JJ
Copy link
Contributor

JJ commented May 26, 2018

I see that what is being done now as a workaround is simply add those words to the personal dictionary.

@JJ JJ closed this as completed in 6a69c0b May 26, 2018
cfa added a commit that referenced this issue Feb 7, 2019
This has been broken since 6a69c0b (May 2018); awk's `gsub` returns
the replacement count, not the substituted string, so we were
spellchecking integers rather than doc content.

This commit also drops deletion of `:` to prevent numerous false
positives from type smilies.  Backslash is still stripped per #975.
@cfa
Copy link
Contributor

cfa commented Feb 20, 2019

I've been looking into the issue reported by @coke in #2615. It seems that the current awk invocation still isn't stripping backslashes because we're using single quotes (escaping q) rather than literals (Q). Namely, this line,

my $fixer = Proc::Async.new('awk', 'BEGIN {print "!"} {gsub("\\\\","",$0); print "^" $0}', $input-file);

ought to read

my $fixer = Proc::Async.new('awk', BEGIN {print "!"} {gsub("\\\\","",$0); print "^" $0}, $input-file);

otherwise we're feeding gsub \\, not \\\\ as intended. While this change addresses the ffix case mentioned at the top of this issue, it also yields a whole bunch of false positives. For example:

# & footbar 12 9: foo tbar, foo-tbar, foot bar, foot-bar, fotbar, foobar, fooobar, foooobar, football, footwear, foober, footer
# & footbaz 15 20: foo tbaz, foo-tbaz, foot baz, foot-baz, foobaz, football, fotbar, foots, tbaz, foot's, footballs, footrace, footers, footsie, football's
# & footbar 12 9: foo tbar, foo-tbar, foot bar, foot-bar, fotbar, foobar, fooobar, foooobar, football, footwear, foober, footer
# & footbar 12 71: foo tbar, foo-tbar, foot bar, foot-bar, fotbar, foobar, fooobar, foooobar, football, footwear, foober, footer

Those are from foo\tbar and foo\tbaz. These oughtn't be whitelisted either—we need a better method for handling \. One approach would be to pad escapes: foo\tbarfoo \t bar; unfortunately, this yields FPs for parameters like \exception (flagged because of \e xception).

Thoughts?

@cfa cfa reopened this Feb 20, 2019
@cfa
Copy link
Contributor

cfa commented Feb 20, 2019

Also, I believe \suffix is being caught as ffix because aspell is in nroff-mode:

$ echo '\suffix' | aspell -a -l en_US --ignore-case
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
& ffix 24 3: ff ix, ff-ix, fix, affix, Fox, fox, Dix, faux, fax, Felix, flux, fig, lix, mix, nix, pix, six, xix, suffix, flax, flex, fix's, FBI's, Fri's

$ echo '\suffix' | aspell -a -l en_US --ignore-case --mode=nroff
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
& ffix 24 3: ff ix, ff-ix, fix, affix, Fox, fox, Dix, faux, fax, Felix, flux, fig, lix, mix, nix, pix, six, xix, suffix, flax, flex, fix's, FBI's, Fri's

$ echo '\suffix' | aspell -a -l en_US --ignore-case --mode=none
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
*

We probably want --mode=none, though that yields a number of FPs (words with embedded escapes, no longer ignored by aspell). Alternatively, we can continue ignoring some of these FPs by leveraging nroff mode—ffix would then need to be explicitly ignored.

@cfa
Copy link
Contributor

cfa commented Feb 20, 2019

Gave this some more thought; the above PR is the best I can muster at the moment.

@cfa cfa mentioned this issue Mar 6, 2019
coke added a commit that referenced this issue Mar 11, 2019
Workaround for aspell backslash escaping issues (#975).
@coke coke closed this as completed Mar 11, 2019
@coke coke reopened this Mar 11, 2019
@coke coke closed this as completed Mar 11, 2019
@JJ
Copy link
Contributor

JJ commented Mar 12, 2019

Why all the opening and closing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
xt Regarding current or new xt/ tests or the utils/
Projects
None yet
Development

No branches or pull requests

4 participants