[en] Update en_GB spellcheck dictionary to 2019.09.01 (v2.76)

yakovru committed Sep 4, 2019
1 parent fe957b2 commit 01a78379741652f562e92f4ee2879718b38448b4
@@ -5,17 +5,18 @@ LGPL licence.

It has been extensively updated by David Bartlett, Brian Kelk,
Andrew Brown and Marco A.G.Pinto:
— Numerous Americanisms have been removed;
— Numerous American spellings have been corrected;
— Numerous Americanisms/spellings have been removed;
— Missing words have been added;
— Many errors have been corrected;
— Compound hyphenated words have been added where appropriate;
— Thousands of proper/place names have been added;
— Thousands of possessives have been added to nouns and proper names.
— Thousands of possessives have been added;
— Thousands of plurals have been added;
— Over two thousand duplicates have been removed.

Valuable inputs to this process were received from many other
people — far too numerous to name. Serious thanks to you all
for your greatly appreciated help.
people — far too numerous to name. Serious thanks to all for
your greatly appreciated help.

This wordlist is intended to be a good representation of
current modern British English and thus it should be a good
@@ -57,22 +58,22 @@ OOo Issue 63541 — remove *dessicated
2010-03-09 (nemeth AT OOo)
— UTF-8 encoded dictionary:
— Fix em-dash problem of OOo 3.2 by BREAK
— Suggesting words with typographical apostrophes
— Recognising words with Unicode f ligatures
— Add phonetic suggestion (© 2000 Björn Jacke)
— Suggesting words with typographical apostrophes
— Recognising words with Unicode f ligatures
— Add phonetic suggestion (© 2000 Björn Jacke)
2013-08-25 — GB forked by Marco A.G.Pinto
2016-06-10 — NOSUGGEST added to this clean version of the GB .AFF (Marco A.G.Pinto)
2016-06-21 — COMPOUNDING added to this clean version of the GB .AFF (Áron Budea)
2016-08-01 — GB changelog is no longer included in the README file
2016-09-11 — .AFF + .DIC now use Linux line endings
2017-10-08 — Mozilla: used <em:maxVersion>*</em:maxVersion> to work with all future versions
except Thunderbird
2016-09-11 — .AFF + .DIC now use UNIX line endings
2017-10-08 — Mozilla: used <em:maxVersion>*</em:maxVersion> to work with all future
versions, except Thunderbird
2017-12-16 — Added to the .AFF:
ICONV 1
ICONV ’ '
Thanks to Jeroen Ooms
ICONV 1
ICONV ’ '
Thanks to Jeroen Ooms
2018-05-01 — Andrew Ziem suggested a list of 328 names of famous people on Kevin's GitHub:
"These 328 name tokens were derived from the top 100 lists in Google Trends via
"These 328 name tokens were derived from the top 100 lists in Google Trends via
this repository (https://github.com/az0/google-trend-names). The geography was
set to US, and it spanned dates from 2004 to 2018."
2018-08-01 — Slightly higher quality icon
@@ -85,7 +86,7 @@ to
My scientist friend, Peter McGavin, told me that in NZ they use British, so I decided
to do something about it. I did the same for the UK. I searched on Wikipedia for "towns",
"counties", "villages", "boroughs", "suburbs", etc. and based my additions on:
 — https://en.wikipedia.org/wiki/List_of_towns_in_England;
— https://en.wikipedia.org/wiki/List_of_towns_in_England;
  — https://en.wikipedia.org/wiki/List_of_towns_in_New_Zealand;
  — https://en.wikipedia.org/wiki/List_of_civil_parishes_in_England;
  — https://en.wikipedia.org/wiki/List_of_civil_parishes_in_Scotland;
@@ -101,21 +102,32 @@ to
© The Clergy of the Church of England Database Project, 2005.
2018-10-01 — Added the cities from Australia by population:
 — https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population
— Added tons of cities from the US with a 10 000+ population.
This list was supplied by Michael Holroyd on Kevin Atkinson's GitHub.
— Added tons of possessives to nouns, thanks to Jörg Knobloch.
— Added tons of cities from the US with a 10 000+ population.
This list was supplied by Michael Holroyd on Kevin Atkinson's GitHub.
— Added tons of possessives to nouns, thanks to Jörg Knobloch.
2018-12-01 — Added the cities from Canada:
 — https://en.wikipedia.org/wiki/List_of_cities_in_Canada
2019-02-01 — Improved flag "5" thanks to the GitHub user Ding-adong:
Some "swomen's" and "women's" entries were missing.
Some "swomen's" and "women's" entries were missing.
— Fixed flag "3": -ists, -ists, -ist's → -ist, -ists, -ist's.
— Improved flag "N".
2019-03-01 — Added the LGPL_V3 License .txt into the Extension.
— Ding-adong added a flag "=" for suffixes: -lessness, -lessnesses, -lessness's.
— Ding-adong added a flag "=" for suffixes: -lessness, -lessnesses, -lessness's.
— Ding-adong changed the prefix flag "O" to "^" since "O" was both prefix and suffix.
— Small fixes and enhancements on flags "z" and "O" by Ding-adong.
2019-04-01 — Improved flag "P" thanks to the GitHub user Ding-adong, giving also -nesses which
increased the wordlist by about 1800 valid words.
2019-04-01 — Improved flag "P" thanks to Ding-adong, giving also -nesses which
increased the wordlist by ~1800 valid words.
2019-07-01 — Major cleanup of the .dic by removing hundreds of duplicates, merging flags, adding
possessives and plurals.
2019-08-01 — Major cleanup of the .dic by removing hundreds of duplicates, merging flags, adding
possessives and plurals.
— Improved flags: "i", "n", "N", "O", "W", "Z" and "2":
— Flag "2" increased the wordlist by ~400 valid words;
— Flag "i" increased the wordlist by ~200 valid words;
— Flag "n" increased the wordlist by ~1000 valid words.
2019-09-01 — Major cleanup of the .dic by removing hundreds of duplicates, merging flags, adding
possessives and plurals.
— Improved flags: "O", "W", "Z" and "3".
-------
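
The 2017-12-16 entry above adds a one-line ICONV table to the .AFF so that a typographic apostrophe (’) in the input is converted to a plain ASCII apostrophe (') before lookup. As a rough illustration only, the Python sketch below parses such a table and applies it to a word. The helper names `parse_iconv` and `apply_iconv` are hypothetical, and the simple replace loop is an assumption made for illustration, not Hunspell's actual matching code.

```python
# Minimal sketch of Hunspell-style ICONV input conversion (not Hunspell's own code).
# AFF_SNIPPET mirrors the two lines quoted in the changelog entry.
AFF_SNIPPET = """\
ICONV 1
ICONV ’ '
"""


def parse_iconv(aff_text):
    """Return a {pattern: replacement} table built from the ICONV lines."""
    table = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # The header line "ICONV 1" has only two fields (the entry count); skip it.
        if len(parts) == 3 and parts[0] == "ICONV":
            table[parts[1]] = parts[2]
    return table


def apply_iconv(word, table):
    """Replace every ICONV pattern in the word before dictionary lookup."""
    # Try longer patterns first so they are not shadowed by shorter ones.
    for pattern in sorted(table, key=len, reverse=True):
        word = word.replace(pattern, table[pattern])
    return word


table = parse_iconv(AFF_SNIPPET)
print(apply_iconv("women’s", table))
```

With this table, a word typed with a curly apostrophe is normalised to the ASCII form that the .DIC entries use, which is presumably why the possessive entries can be stored with a plain apostrophe only.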

MARCO A.G.PINTO:
@@ -131,7 +143,7 @@ The sources used to verify the spelling of the words I included in the dictionar
2) Collins Dictionary;
3) Macmillan Dictionary;
4) Cambridge Dictionary;
5) Merriam-Webster Dictionary (used with caution ⚠);
5) Merriam-Webster Dictionary (used with caution ⚠);
6) Wiktionary (used with caution ⚠);
7) Wikipedia (used with caution ⚠);
8) Physical dictionaries.
Binary file not shown.
@@ -4,14 +4,15 @@
#
# Sources used to verify the spelling of the words which
# Marco Pinto included in the dictionary:
# 1) Oxford Dictionaries; 4) Wiktionary (used with caution);
# 2) Collins Dictionary; 5) Wikipedia (used with caution);
# 3) Macmillan Dictionary; 6) Physical dictionaries.
# 1) Oxford Dictionaries; 5) Merriam-Webster Dictionary (used with caution ⚠);
# 2) Collins Dictionary; 6) Wiktionary (used with caution ⚠);
# 3) Macmillan Dictionary; 7) Wikipedia (used with caution ⚠);
# 4) Cambridge Dictionary; 8) Physical dictionaries.
#
# Main difficulties developing this dictionary:
# 1) Proper names;
# 2) Possessive forms;
# 3) Plurals.
#
# David Bartlett, Andrew Brown, Marco A.G.Pinto.
# V 2.73, 2019-06-01
# V 2.76, 2019-09-01
@@ -3,5 +3,5 @@
# Get en_gb_wordlist.xml from https://github.com/mozilla-b2g/gaia/raw/master/apps/keyboard/js/imes/latin/dictionaries/en_gb_wordlist.xml

./unmunch en-GB.dic en-GB.aff > en_GB1.txt
cat en_GB1.txt | java -cp languagetool.jar:languagetool-dev-4.6-SNAPSHOT.jar org.languagetool.dev.archive.WordTokenizer en | sort -u > en_GB.txt
cat en_GB1.txt | java -cp languagetool.jar:languagetool-dev-4.7-SNAPSHOT.jar org.languagetool.dev.archive.WordTokenizer en | sort -u > en_GB.txt
java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder -i en_GB.txt -info en_GB.info -freq en_gb_wordlist.xml -o en_GB.dict
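
The first two commands of the script above expand the .dic/.aff pair into a flat word list (`unmunch`) and then deduplicate it (`sort -u`). The Python sketch below shows that idea in miniature. The suffix rules, the flag letters "S" and "M", and the sample entries are all hypothetical and are not taken from en-GB.aff. The sketch also assumes single-character flags and does not model the LanguageTool `WordTokenizer` step.

```python
import re

# Hypothetical suffix rules keyed by flag: (strip, add, condition regex on the stem).
# Illustrative only; these are NOT copied from en-GB.aff.
SUFFIX_RULES = {
    "S": [("", "s", r"[^sxy]$")],   # plain plural
    "M": [("", "'s", r".$")],       # possessive
}


def expand(entry):
    """Expand one .dic-style entry such as 'town/SM' into its surface forms."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:  # assumes one character per flag
        for strip, add, condition in SUFFIX_RULES.get(flag, []):
            if strip and not stem.endswith(strip):
                continue
            if re.search(condition, stem):
                base = stem[: len(stem) - len(strip)] if strip else stem
                forms.append(base + add)
    return forms


def build_wordlist(entries):
    """Expand every entry, then deduplicate and sort, similar to `sort -u`."""
    words = set()
    for entry in entries:
        words.update(expand(entry))
    return sorted(words)


print(build_wordlist(["town/SM", "town/S", "parish"]))
```

Note that Python's `sorted` orders by code point, whereas the order produced by `sort -u` depends on the active locale, so the two may order apostrophes and capitals differently.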
