This repo currently contains the code for some machine learning tutorials (with an emphasis on language processing).
NB: the introduction to distributional semantics, DSTutorial, is very out of date. Use it at your own risk!
Contents of the repo
- Agreement: No machine learning without quality data. Learn about Cohen's kappa by doing some annotation yourself.
- Authorship: Naive Bayes. Who wrote this? Run some authorship attribution code.
- FruitFly: Dimensionality reduction. Experiment with a fruit fly neural architecture for hashing vectors.
- RLcafe: Reinforcement learning. Learn politeness rules to talk to your local café owner.
- RNNCats: Recurrent Neural Networks. Fed up with training language models on Shakespeare? Try generating these awesome ASCII cats!
- SVMs: Support Vector Machines. Classify web documents using representations from the PeARS search engine.
- Translation: Regression. Linguistically challenged? Learn a bilingual lexicon from an English and a Catalan semantic space.
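
For a taste of what the Agreement tutorial is about, here is a minimal sketch of Cohen's kappa for two annotators. It is not the code from the Agreement directory: the function name `cohens_kappa` and the toy labels are made up for illustration. Kappa compares the observed agreement with the agreement expected by chance, given how often each annotator uses each label.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration, not part of this repo.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the probability that both annotators
    # pick it independently, estimated from their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0

    return (observed - expected) / (1 - expected)


# Toy annotations (invented data).
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

In this toy example the annotators agree on 4 of 6 items, but half of that agreement is expected by chance, so kappa is much lower than the raw agreement rate.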
Instructions for running the code are in the README of each directory. I do not guarantee the code will run under all Python installations and all OSs! If you want to be sure things will run smoothly, open an account with the fantastic PythonAnywhere people and try the code under Python 3.5. It's been tested there!