This crawler parses Wiktionary pages (which tend to be a mess, sadly) into the speling format, for use by programs that require wordlists in that format.
$ sudo pip install urlnorm
Depending on your language, you may need to install more dependencies.
Here is the list of language-specific dependencies:
- zh (Chinese, simplified and traditional):
$ sudo pip install mafan BeautifulSoup4
- th (Thai):
$ sudo pip install BeautifulSoup4
- lo (Lao):
$ sudo pip install BeautifulSoup4
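If you are unsure whether the dependencies for your language are installed, a quick check along these lines may help. This is only a sketch and not part of the crawler: the function names and the lookup table are made up for illustration, it assumes Python 3.4 or newer (for importlib.util.find_spec), and it relies on BeautifulSoup4 being imported under the module name bs4.

```python
import importlib.util

# Import names for the pip packages listed above.
# Note: the BeautifulSoup4 package is imported as "bs4".
COMMON_MODULES = ["urlnorm"]
LANGUAGE_MODULES = {
    "zh": ["mafan", "bs4"],
    "th": ["bs4"],
    "lo": ["bs4"],
}


def required_modules(language_code):
    """Return the module names needed for a language (hypothetical helper)."""
    return COMMON_MODULES + LANGUAGE_MODULES.get(language_code, [])


def missing_modules(language_code):
    """Return the required modules that cannot be found on this system."""
    return [
        name
        for name in required_modules(language_code)
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    for code in sorted(LANGUAGE_MODULES):
        missing = missing_modules(code)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(code, status)
```

Any module reported as missing can be installed with the matching pip command from the list above.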
$ python main.py
That's all you have to do. All configuration is done in config.py.
Refer to General config for details on the available settings.
Refer to How it works for details on the crawler itself.