Words missing in US #181

Closed
marcoagpinto opened this issue Jun 8, 2017 · 7 comments

@marcoagpinto

Hello @kevina

This will probably be the last time I suggest words here because in the past I have suggested many hundreds or even thousands and only a few have been added.

During these three or four years as the maintainer of the forked GB speller, I have managed to add ~29K words to it.

The words I am suggesting here aren't from my wordlist, but from LanguageTool's list of missing words.

I still haven't tested them in my GB speller because I decided to test them first in your US speller.

Here are the words:
a posteriori
a priori
ab initio
abaxial
ad hoc
ad infinitum
ad nauseam
adjoint
aggregator
aggregators
airframe
airframes
alimentation
amidst
amine
amines
amongst
annunciate
annunciated
annunciates
annunciating
annunciator
annunciators
ansatz
antihermitian
antilinear
antineutrino
antineutrinos
antiproton
antiprotons
archetypical
assistive
au fait
automata
automorphism
autosave
backlighting
Belarusian
bellcrank
bellcranks
bilinear
bisectional
bisectionally
boomset
boomsets
borate
borates
borescope
borescopes
bosonic
breadthwise
Brexit
bulleted
callout
callouts
cancellable
candela
candelas
cavitation
cavitations
CD-ROM
CDs
Cherkasy
Chernihiv
Chernivtsi
chirality
clinometer
clinometers
codec
codecs
collet
collets
collider
colliders
colorable
combustor
combustors
compensator
compensators
complicatedness
computable
conditionality
construal
contra
contravariant
copywriting
counterstroke
counterstrokes
countlessly
criticality
crosshair
crosshairs
crosshead
crossheads
crossways
crowdsourcing
cuprous
decrement
decrementing
de-energize
de-energized
de-energizes
de-energizing
deficiently
degrease
degreased
degreases
degreasing
déjà vu
deliverability
diagonalization
dichroic
differentiator
differentiators
diffusivity
diluent
diluents
discharger
dischargers
discretionally
discretization
dizzyingly
Dnipro
doubler
doublers
Doxygen
droopily
Drupal
durableness
DVDs
efflux
eigenbasis
eigenspace
eigensystem
enduringly
entrain
entrained
entraining
entrains
et al
etcetera
eukaryote
eukaryotes
eukaryotic
evaluator
evaluators
ex officio
exempli gratia
ex-officio
exoplanetary
extractive
eyewear
failover
failovers
fait accompli
falcate
fermionic
fiancé
fiancée
fiancées
fiancés
filmlike
findability
findable
fittingness
formulae
gappy
glitchier
glitchiest
glitchy
gluons
glycol
good-natured
graphene
grayscale
handgrip
handgrips
haymaker
haymakers
headwear
helicity
heliocentrism
hereunder
Hermitian
heteromerous
higgledy-piggledy
highflier
highfliers
hydrophilic
hypergeometric
iBook
iBooks
igniter
igniters
illiquid
in toto
indistinguishability
inflowing
influencer
influencers
informatics
ingoing
inter alia
interlay
interlayed
interlaying
interlays
involvedness
ipso facto
irrespectively
isometry
Ivano-Frankivsk
Javadoc
jQuery
ketone
ketones
Kharkiv
Khmelnytskyi
Kremenchuk
Kropyvnytskyi
Kyiv
Laplacian
liftable
limitlessly
linctus
livelily
Luhansk
lumen
lumens
luminaire
luminaires
Lutsk
lux
luxes
Lviv
macaron
meddlesomeness
mediagenic
megohm
megohms
metameric
methodize
methodized
methodizes
methodizing
microswitch
microswitches
millennialist
minima
Minkowski
misalign
misaligning
misaligns
misrigged
mitzvah
mitzvahs
modularized
modus operandi
modus vivendi
Molière
muchly
multichannel
multifunction
multipart
mutagenic
mutatis mutandis
naïve
née
nitric
nitrosamine
nitrosamines
nitty-gritty
nota bene
Oculus
Odesa
oilfree
oilpaper
oilpapers
onboard
open-mindedly
operationalize
operationalized
operationalizes
operationalizing
orthonormal
outsized
overarch
overarched
overarches
overinflate
overinflated
overinflates
overinflating
oversized
overtighten
overtightened
overtightening
overtightens
oxidative
pacta sunt servanda
pari passu
passivate
passivated
passivates
passivating
penetrant
penetrants
pentane
pentanes
per annum
per capita
per centum
per diem
perplexingly
phonons
pictogram
pictograms
pitot
pitots
Poincaré
potentiometer
potentiometers
potlife
potlives
ppb
prearrival
predefine
predefines
predefining
preowned
preventability
preventively
prima facie
quadrennially
quinquennial
quinquennially
quod vide
radian
radians
radome
radomes
raison d'être
reasonability
reckonable
reclaimer
reclaimers
regenerator
regenerators
regrease
regreased
regreases
regreasing
reified
reifies
reify
reifying
reinflate
reinflated
reinflates
reinflating
reinstallation
repairability
repeatability
reposition
repositioned
repositioning
repositions
repressurize
repressurized
repressurizes
repressurizing
reseller
resellers
resistivity
reticle
reticles
retractive
retractives
retractor
retractors
retrim
retrimmed
retrimming
retrims
retune
retuned
retunes
retuning
reverser
reversers
ringfence
ringfenced
ringfences
ringfencing
Rivne
robotically
rôle
rôles
safetied
safetying
Schengen
scintillator
scintillators
screencast
screencasts
screenful
screenfuls
screwjack
screwjacks
searchable
Segway
Segways
self-adjoint
semiautomatically
semilinear
seriatim
sesquilinear
shitake
sideslip
sideslipped
sideslipping
sideslips
siemens
sievert
sieverts
single-handedly
slantways
spacetime
spanwise
speedbrake
speedbrakes
spick
spicks
spinless
spinor
spinors
splined
statin
statins
steradian
steradians
strategize
strategized
strategizes
strategizing
subassemblies
subassembly
subcritical
subcriticality
subfolder
subfolders
submenu
submenus
subpart
subparts
subtextual
subvocalization
sui generis
sui juris
summa cum laude
superactivity
supercritical
supercriticality
supermicron
supersensitive
supersymmetric
supersymmetry
suppressive
suppressively
surreally
switchgear
Symbian
tabula rasa
taskbar
taskbars
teensy-weensy
telematics
tensiometer
tensiometers
termbase
termbases
Ternopil
tesla
thereunder
timeous
timeously
tooltip
tooltips
touchpoint
touchpoints
traceability
transitorily
translatability
trichloroethane
trichloroethylene
ullage
ullages
ultracold
ultrafine
ultrasensitive
unachievable
unbefitting
unclip
unclipped
unclipping
unclips
unclutter
uncluttering
unclutters
uncomplicatedly
underheat
underheated
underheating
underheats
underinflate
underinflated
underinflates
underinflating
underprovide
underprovided
underprovides
underproviding
undersize
undersupplied
undersupplies
undersupply
undersupplying
unencrypted
unendorsed
unenlightening
unflawed
unforthcoming
unfussy
unhidden
unideal
unimolecular
uninsulated
uninvolved
unlinked
unlockable
unmethodical
unordered
unpopulated
unproblematic
unpunctual
unquantifiable
unrevealing
unrushed
unserviceable
unsymmetrically
unviable
usualness
utile
veraciousness
verboseness
verifiably
victimologies
victimology
viewport
viewports
Vinnytsia
vis-à-vis
volumetric
walkie-talkie
walkie-talkies
walkthrough
walkthroughs
waterless
waypoint
waypoints
weathertight
weber
webers
wherefrom
whereinto
wherethrough
whereunto
wirelessly
worksite
worksites
workspace
workspaces
workwear
XMetaL
XMLmind
Zaporizhia
Zhytomyr

@biljir
Collaborator

biljir commented Jun 9, 2017

I thought I would jump in and discuss some of these words before Kevin asked me for comment. I will group some words that present the same issues together.

First of all, I am not sure of the status of support in the software for phrasal words. If such support is present, then the following should certainly (IMHO) be accepted:
a posteriori, a priori, ad hoc, ad infinitum, ad nauseam, déjà vu (but without accents), et al, fait accompli, in toto, ipso facto, modus operandi, per annum, per capita, per diem, prima facie, raison d'être (perhaps without accent), sui generis, summa cum laude.

Similarly, if hyphenated words are processed, the following words are worthy:
CD-ROM, good-natured, higgledy-piggledy, nitty-gritty, single-handedly, teensy-weensy, vis-à-vis (without the accent), walkie-talkie.

The list includes a number of words with accented letters, such as fiancé, naïve, née and rôle. In most cases, these represent old-fashioned fussy spellings of words commonly spelled without accents, and I would not recommend adding them. Exceptions are fiancé and fiancée and their plurals - where I think the accented spellings are more common, perhaps because these words are more likely to be misinterpreted without the accents. (fiance looks a lot like a misspelling of finance).

amidst and amongst may have been left out due to being regarded as British forms of amid and among. Both of these words are not uncommon in American writing, especially amidst, and I suggest both should be accepted.

The following names are old and seldom used variants of Polish, Russian and Ukrainian place names. I would not choose to recognize them: Dnipro, Kharkiv, Kyiv, Lviv, Odesa. There may be other such names I don't recognize.

Other words of interest with commentary:

automata - plural of automaton. Given Kevin's habit of accepting computer-sciencey words, I assume he will add this one.
Brexit - kind of a judgment call. This word is used a lot now, but will it continue to have high usage? I think the answer is Yes, and it should be added.
crosshairs - Yes. But I'm not sure how valid the putative singular crosshair is. I would be hesitant about adding it.
gluons - If the singular gluon is accepted, I can't imagine why the plural would not be. And if neither is, I think they should be added.
minima - scientific plural of minimum. If maxima is also not present, it should be added.
shitake - should not be added, variant of shiitake
spacetime - should not be added, variant of space-time
spick - should not be added, variant of spic. Used with a non-pejorative meaning in the phrase "spick and span", however. In no case should spicks be added.

Other words I think it would be good to add, with no commentary:
archetypical, assistive, Belarusian, bulleted, CDs, collider(s), DVDs, formulae, haymaker(s), overinflate(d/ing/s), oversized, perplexingly, pictogram(s), preowned, radian(s), reinflate(d/ing/s), reseller(s), Segway(s), unachievable, unbefitting, unclutter(ed/ing/s), underinflate(d/ing/s), unenlightening, uninvolved, unordered, unpopulated, unrevealing, worksite(s).

@kevina
Member

kevina commented Jun 11, 2017

@marcoagpinto

"This will probably be the last time I suggest words here because in the past I have suggested many hundreds or even thousands and only a few have been added."

Most of your words are simply not common enough to be included in a spell checker dictionary. I consider many of your additions to be at level 80 in SCOWL, which means they are suitable for things like word games but not for a speller dictionary. Please do continue to suggest words. I might not get to the list right away, as I am often busy with other things, but I will eventually get to them with @biljir's help.

@marcoagpinto
Author

@kevina

Thanks, Kevin.

I was kind of stressed when I opened this case.

I will continue posting words then, once a month.

Kind regards,

kevina modified the milestone: 2017-08 Release Jul 27, 2017
@kevina
Member

kevina commented Jul 29, 2017

@biljir A few notes on your suggestions that I am not likely to add unless you can give me a good reason to do so:

The words "formulae" and "oversized" are marked as variants.

As verbs, "underinflate" and "overinflate" are not really common; I will add the more common adjective forms, though (underinflated and overinflated).

I also don't think the following other words are that common: perplexingly, unbefitting, unenlightening.

@biljir
Collaborator

biljir commented Jul 29, 2017

formulae is the standard plural of formula in technical/scientific writing. It really should be accepted (as should minima and automata, to which you offered no objection). (And formulas should also be accepted - they are each preferable in their own domain, like antennas and antennae, both of which I hope you accept.)

I think you're mistaken about perplexingly, but I have no "good reason" beyond my own intuition.

@kevina
Member

kevina commented Jul 29, 2017

@biljir Thanks, I will add "formulae" and "perplexingly".

kevina added a commit that referenced this issue Jul 31, 2017
kevina added a commit that referenced this issue Aug 1, 2017
kevina added a commit that referenced this issue Aug 23, 2017
@kevina
Member

kevina commented Aug 23, 2017

Concerning accented words: they are not included in en_US but are in some other dictionaries. The dictionaries that include them already have fiancé and fiancée.

Hyphenated words and word phrases are currently filtered out. In the future I may include ones where one of the words is not normally considered a word or is very uncommon on its own.

I am closing this issue. If there is anything I missed, feel free to reopen it.

kevina closed this as completed Aug 23, 2017