The Language Learning Toolkit (LLTK) performs a variety of tasks useful for (human) language learning.

Language Learning Toolkit

The Language Learning Toolkit combines several approaches such as natural language processing and web scraping to perform a variety of tasks useful for (human) language learning. This includes:

  • Part-of-speech (POS) tagging, supported by Pattern
  • Phonetic transcriptions in accordance with the International Phonetic Alphabet (IPA)
  • Audio samples (Forvo, Google Translate)
  • Text samples, i.e. sample sentences (Tatoeba)
  • Visual representations of a given word using Google Images
  • Conjugation of verbs (present, perfect, past, pluperfect, future), supported by Verbix
  • Pluralization of nouns (accuracy depending on the language)
  • Indefinite and definite articles for nouns (accuracy depending on the language)
  • Comparative and superlative forms of adjectives
  • Basic gender detection for nouns

General information

LLTK is split into separate modules for maximum flexibility and interchangeability; each language is a module of its own. When calling a language-specific function, you can either address the language module directly (e.g. lltk.nl.plural('hond')) or use the generic interface (e.g. lltk.generic.plural('nl', 'hond')). Both calls pass the request down to an appropriate scraper and can be considered equivalent.
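
The relationship between the two call styles can be pictured with a small, self-contained sketch. It does not import LLTK and is not its actual implementation: the LANGUAGES registry, the Dutch and Latin classes, generic_call and the toy pluralization rule are all hypothetical stand-ins, used only to show how a generic function might forward a request to a language module and raise NotImplementedError when the feature is missing.

```python
class Dutch(object):
    """Hypothetical stand-in for a language module such as lltk.nl."""

    @staticmethod
    def plural(word):
        # Toy rule for illustration only; real Dutch pluralization is more involved.
        return [word + 'en']


class Latin(object):
    """Hypothetical language module that offers no plural() function."""


# Hypothetical registry mapping language codes to language modules.
LANGUAGES = {'nl': Dutch, 'la': Latin}


def generic_call(language, method, *args, **kwargs):
    """Forward a request to the language module registered under `language`."""
    module = LANGUAGES.get(language)
    function = getattr(module, method, None)
    if function is None:
        # Unknown language, or the language lacks the requested feature.
        raise NotImplementedError(
            "'%s' is not available for language '%s'" % (method, language))
    return function(*args, **kwargs)


def plural(language, word):
    """Generic counterpart of a language-specific plural() function."""
    return generic_call(language, 'plural', word)
```

In this sketch, plural('nl', 'hond') and Dutch.plural('hond') go through the same code and return the same list, which mirrors the equivalence described above.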

To get a quick overview of LLTK's syntax, launch IPython, import lltk and browse the package using tab completion. To enable debug mode, set lltk.config['debug'] = True.

Examples

The syntax is meant to be straightforward. The following examples show typical calls:

  • IPA: lltk.generic.ipa('de', 'Blume') returns a list of possible IPA transcriptions or None.
  • Pluralization: lltk.generic.plural('nl', 'boom') returns a list of plural forms or None.

Some scrapers can tell that a given word has no plural form. In that case they return [''].

  • Definite/Indefinite articles: lltk.generic.articles('de', 'Katze') returns a list of lists of valid articles (singular and plural). Have a look at lltk.generic.reference as well.

When using the generic interface, LLTK raises NotImplementedError if the requested functionality is not available for your target language.

  • For conjugation of verbs, try the following:
lltk.generic.conjugate('de', 'bauen', 'present')
lltk.generic.conjugate('de', 'bauen', 'past')
lltk.generic.conjugate('de', 'bauen', 'perfect')
  • If you want to listen to audio samples, register at Forvo and get an API key. Then run the following, replacing '---' with your key:
urls = lltk.generic.audiosamples('it', 'mela', key = '---')
lltk.helpers.download(urls[0], '/tmp/audiosample-it-mela.mp3')
lltk.helpers.play('/tmp/audiosample-it-mela.mp3')
  • To see a word used in context, request sample sentences (currently using Tatoeba). Try:
sentences = lltk.generic.textsamples('es', u'jardín')
for sentence in sentences:
    print(sentence)
  • View images related to a given word (currently using Google Images). Try the following:
photos = lltk.generic.images('fr', u'souris')
clipart = lltk.generic.images('fr', u'souris', itype = 'clipart', isize = 'large')
lineart = lltk.generic.images('fr', u'souris', itype = 'lineart', isize = 'small')
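
The return conventions described above (a list of results, None when nothing was found, [''] when a scraper knows that no plural exists, and NotImplementedError for unsupported languages) can be handled in one place. The sketch below is self-contained and makes some assumptions: describe_plural and fake_plural are hypothetical names, and the stub only imitates the documented behaviour with toy data so that the example runs without LLTK or network access. In real use you would pass lltk.generic.plural instead of the stub.

```python
def describe_plural(plural_function, language, word):
    """Return a readable summary of a plural lookup."""
    try:
        forms = plural_function(language, word)
    except NotImplementedError:
        return 'pluralization is not supported for %s' % language
    if forms is None:
        # The lookup produced no result at all.
        return 'no result for %s' % word
    if forms == ['']:
        # The scraper knows that the word has no plural form.
        return '%s has no plural form' % word
    return ', '.join(forms)


def fake_plural(language, word):
    """Hypothetical stub imitating the documented return values (toy data)."""
    if language != 'nl':
        raise NotImplementedError
    known = {'boom': ['bomen'], 'melk': ['']}
    return known.get(word)
```

Passing the lookup function as an argument keeps the helper independent of any particular scraper, so the same checks apply to both the generic interface and a language module.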

Requirements

The Language Learning Toolkit is written for Python 2.7. There is no support for Python 3 yet. It depends on the following Python packages: requests, lxml, Pattern, functools32. You can install them by running:

sudo pip install -r requirements/base.txt

Furthermore, we highly encourage you to install CouchDB for caching. If you are a developer, you should probably install everything from base.txt, extra.txt and development.txt.
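
Caching matters because most lookups go out to web scrapers. This README does not describe how LLTK's CouchDB cache works, so the following is only a minimal in-memory illustration of the idea, not LLTK's mechanism. It uses lru_cache, which functools32 backports to Python 2.7 from the Python 3 standard library; cached_lookup and CALLS are hypothetical names, and the function body merely stands in for a slow scraper request.

```python
try:
    from functools import lru_cache  # Python 3.2 and later
except ImportError:
    from functools32 import lru_cache  # backport for Python 2.7

CALLS = []  # records every uncached lookup, for demonstration only


@lru_cache(maxsize=256)
def cached_lookup(language, word):
    """Hypothetical stand-in for a slow, scraper-backed lookup."""
    CALLS.append((language, word))
    # Return an immutable tuple so that callers cannot alter the cached value.
    return (language, word.upper())
```

Calling cached_lookup twice with the same arguments runs the function body only once; the second call is answered from the cache. Unlike a CouchDB cache, this one is lost when the process exits.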

License

GNU Lesser General Public License (LGPL), see LICENSE.txt for further details.