CMU LTI Low Resource NLP Bootcamp 2020
This is the page for a low-resource natural language and speech processing bootcamp held by the Carnegie Mellon University Language Technologies Institute in May 2020. The bootcamp was held virtually for some visitors to the institute, but we are making the videos and materials available for those interested in learning on their own. It comes in 8 parts, each with lecture videos and example exercises that you can do to expand your knowledge.
1. NLP Tasks
The exercise has participants download spaCy and explore the types of linguistic analyses produced in its tutorial. We also examine the Universal Dependencies treebanks to see which other languages have annotated data similar to spaCy's output.
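As a quick illustration of the kind of output spaCy produces, the sketch below runs a blank English pipeline, which tokenizes text without any model download (the sentence and identifiers here are illustrative, not from the exercise):

```python
import spacy

# Blank English pipeline: tokenization only, no model download needed.
# For POS tags and dependency parses, load a pretrained model instead,
# e.g. spacy.load("en_core_web_sm") after running
# `python -m spacy download en_core_web_sm`.
nlp = spacy.blank("en")
doc = nlp("CMU is in Pittsburgh.")
tokens = [token.text for token in doc]
print(tokens)  # ['CMU', 'is', 'in', 'Pittsburgh', '.']
```

With a pretrained model loaded, each `token` additionally exposes attributes such as `token.pos_` and `token.dep_`, which are the annotations the exercise asks you to inspect.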
2. Linguistics - Phonology and Morphology
3. Machine Translation
4. Linguistics - Syntax and Morphosyntax
The exercise consists of creating an interlinear gloss for the language of your choice.
5. Neural Representation Learning
6. Multilingual NLP
The exercise, by Chan Park, provides two Jupyter notebooks: one explains how to train a Naive Bayes classifier for text classification across languages, and the other introduces multilingual BERT for cross-lingual classification.
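The notebooks themselves are not reproduced here, but the Naive Bayes technique they teach can be sketched from scratch. The function names and toy data below are illustrative assumptions, with whitespace tokenization and add-one smoothing:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Train a multinomial Naive Bayes text classifier with add-one smoothing.

    From-scratch sketch for illustration; the bootcamp notebooks may use a
    library implementation instead. Assumes whitespace-tokenized input.
    """
    class_counts = Counter(labels)          # documents per class
    word_counts = defaultdict(Counter)      # word frequencies per class
    vocab = set()
    for doc, label in zip(docs, labels):
        for word in doc.split():
            word_counts[label][word] += 1
            vocab.add(word)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    totals = {c: sum(word_counts[c].values()) for c in class_counts}

    def predict(doc):
        # Score each class by log prior plus the sum of smoothed
        # log likelihoods of the document's words.
        scores = {}
        for c in class_counts:
            score = log_priors[c]
            for word in doc.split():
                score += math.log(
                    (word_counts[c][word] + 1) / (totals[c] + len(vocab))
                )
            scores[c] = score
        return max(scores, key=scores.get)

    return predict

# Toy sentiment data (illustrative, not from the notebooks)
docs = ["good great film", "great acting good", "bad awful plot", "awful boring bad"]
labels = ["pos", "pos", "neg", "neg"]
predict = train_naive_bayes(docs, labels)
print(predict("good film"))  # pos
```

Because the model only counts word frequencies per class, swapping in training data from another language is all that is needed to apply it cross-lingually, which is the point the first notebook builds on.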