GSoC 2014 Proposal
Export to Anki
A user can upload their Anki deck, and Tatoeba can use that data to generate a list of cards to add based on i+1 and other requirements.
Name: Jake Probst
IRC: tmjake (freenode)
Timezone: PDT (UTC-7)
Languages: English, learning Japanese
I intend to write a web application that accepts an Anki deck from a user and compares it against the sentence database to find sentences that contain exactly one word the user does not already know. The idea is based on Krashen's Input Hypothesis (i+1). It will use named entity recognition and stemming to find appropriate sentences: named entity recognition keeps names of people and places from counting as unknown words, and stemming lets inflected forms of a word the user already knows count as known. The user will be able to specify tags or particular words they want the sentences to contain. The application will then compile the chosen sentences into an .apkg file (Anki's package format for importing decks) and send it to the user.
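The core i+1 test can be sketched as follows. This is a minimal illustration, not the planned implementation: it assumes each sentence has already been tokenized, lowercased, and stemmed, that names have been filtered out by named entity recognition, and that the user's known vocabulary has been extracted from their deck into a set. The function names `find_new_word` and `select_i_plus_one` are hypothetical, and the sketch is in Python for brevity even though libiplusone is planned in C.

```python
from typing import AbstractSet, Iterable, List, Optional


def find_new_word(tokens: List[str], known: AbstractSet[str]) -> Optional[str]:
    """Return the single unknown word in a sentence, or None when the
    sentence has no unknown words or more than one distinct unknown word."""
    # Repeated occurrences of the same new word count as one new word.
    unknown = {tok for tok in tokens if tok not in known}
    return next(iter(unknown)) if len(unknown) == 1 else None


def select_i_plus_one(sentences: Iterable[List[str]],
                      known: AbstractSet[str],
                      required: AbstractSet[str] = frozenset()) -> List[List[str]]:
    """Keep only i+1 sentences that also contain every user-requested word."""
    selected = []
    for tokens in sentences:
        if find_new_word(tokens, known) is None:
            continue  # zero or several new words: not i+1
        if not required <= set(tokens):
            continue  # missing a word the user asked for
        selected.append(tokens)
    return selected
```

For example, with a known vocabulary of {"i", "eat", "fish"}, the sentence ["i", "eat", "rice"] is kept (its new word is "rice"), while ["we", "eat", "rice"] is rejected because it introduces two new words at once.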
The project will be split into three parts: libiplusone, the iplusone web service, and the web-facing interface.
libiplusone will be written in C.
The iplusone web service will be in either C++ or Python, depending on which language ends up being used in the final website.
As web development is not my strong point, the link between the web service and the actual website is something I will need help with.
Realistically, this will not support every language by the time GSoC ends; I'll keep adding languages as I learn how to handle them. I expect to be able to support the languages written in the Latin alphabet, but I need to read more about natural language processing before I can say exactly how many languages I will be able to support.
libiplusone will use dynamically loaded plugins so that adding support for a new language is easy.
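In C, dynamically loaded plugins would typically be shared libraries opened with `dlopen()` and resolved with `dlsym()`. The Python sketch below only illustrates the shape such a per-language interface might take, under assumed names: each plugin exposes a tokenizer and a stemmer, built-in plugins sit in a registry, and other languages are imported on demand from a module named `iplusone_lang_<code>` (a hypothetical naming convention). The English stemmer is a deliberately naive suffix stripper standing in for a real algorithm such as Porter's.

```python
import importlib
import re
from typing import Dict, List, Protocol


class LanguagePlugin(Protocol):
    """Hypothetical per-language interface that the i+1 logic would call."""
    code: str

    def tokenize(self, sentence: str) -> List[str]: ...

    def stem(self, word: str) -> str: ...


class EnglishPlugin:
    """Toy English plugin: regex tokenizer plus naive suffix stripping."""
    code = "eng"

    def tokenize(self, sentence: str) -> List[str]:
        # Lowercase, then keep runs of letters and apostrophes.
        return re.findall(r"[a-z']+", sentence.lower())

    def stem(self, word: str) -> str:
        # Strip the first matching suffix, keeping a stem of at least 3 letters.
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word


_REGISTRY: Dict[str, LanguagePlugin] = {"eng": EnglishPlugin()}


def load_plugin(code: str) -> LanguagePlugin:
    """Return a registered plugin, or import module iplusone_lang_<code>
    and register its `plugin` attribute (hypothetical convention)."""
    if code not in _REGISTRY:
        module = importlib.import_module(f"iplusone_lang_{code}")
        _REGISTRY[code] = module.plugin
    return _REGISTRY[code]
```

Keeping tokenization and stemming behind one interface is what would let a language written without spaces, such as Japanese, supply its own segmenter without any change to the i+1 comparison itself.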
On the web front, the site will show the user a number of candidate sentences; the user picks which ones to add, and the site creates an .apkg file for them.
Before I start this, I will need to familiarize myself with Django and CppCMS. I will also need to read a lot about natural language processing.
Week 1: figure out structure of libiplusone, code .anki/.apkg format import/export
Week 2: start coding algorithm to handle i+1
Week 3: implement named entity recognition, stemming, and phrase detection
Week 4: continue implementing the above
Week 5: work on optimization
Week 6: figure out structure of iplusone
Week 7: code basic structure
Week 8: implement the various options (clozes, specific tags/words, etc.)
Week 9: finish up iplusone code
Week 10: connect iplusone and Tatoeba
Week 11: Buffer for unplanned problems
Week 12: Buffer for unplanned problems
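Two items in this schedule lend themselves to a quick sketch: reading an uploaded deck (Week 1) and the cloze option (Week 8). The code assumes the Anki 2.0 package layout, in which an .apkg is a zip archive holding an SQLite database named `collection.anki2` whose `notes` table stores each note's fields in the `flds` column, separated by the 0x1f character; newer Anki releases may package collections differently. Clozes use Anki's `{{c1::...}}` cloze syntax. The function names are hypothetical, and the sketch is in Python (a candidate language for the web service) rather than C.

```python
import re
import sqlite3
import tempfile
import zipfile
from typing import List


def read_note_fields(apkg_path: str) -> List[List[str]]:
    """Return the fields of every note in an Anki 2.0-style .apkg file."""
    with tempfile.TemporaryDirectory() as tmp:
        # An .apkg is a zip archive; the notes live in an SQLite database.
        with zipfile.ZipFile(apkg_path) as archive:
            db_path = archive.extract("collection.anki2", tmp)
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("SELECT flds FROM notes").fetchall()
        finally:
            conn.close()  # close before the temporary directory is removed
    # The fields of a note are joined with the 0x1f (unit separator) character.
    return [flds.split("\x1f") for (flds,) in rows]


def make_cloze(sentence: str, new_word: str) -> str:
    """Wrap each whole-word occurrence of new_word in {{c1::...}}."""
    # Word-boundary matching only suits space-separated scripts; a language
    # such as Japanese would need its plugin's tokenizer instead.
    pattern = r"\b" + re.escape(new_word) + r"\b"
    return re.sub(pattern, lambda m: "{{c1::" + m.group(0) + "}}",
                  sentence, flags=re.IGNORECASE)
```

A caller could, for instance, treat the first field of each note as a known word and pass the resulting set to the i+1 comparison, then call `make_cloze` on each selected sentence with its new word. Note that `make_cloze` expects the word as it appears in the sentence, not its stem.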
If I finish early, I will add as many languages to libiplusone as I can.
If I finish late, PANIC.
I will be taking a class or two over the summer, but they shouldn't get in the way of development. Also, I will be away during the last week of May.
I have not used Tatoeba much, only to double-check certain bits of grammar. I decided to work with Tatoeba because when I was first starting to learn Japanese, this was exactly the sort of tool I wished existed. My skills and interests are mainly in flashcards (I even wrote an SRS application for the Nintendo DS).
This is the only GSoC project I am applying for.