08/16: TF-IDF in Python (large-scale matrix multiplication with PySpark): https://labs.yodas.com/large-scale-matrix-multiplication-with-pyspark-or-how-to-match-two-large-datasets-of-company-1be4b1b2871e#.fjvd7q5zg
NLTK book, ch. 5 (Categorizing and Tagging Words): http://www.nltk.org/book/ch05.html
"Grammatical category disambiguation by statistical optimization": http://delivery.acm.org/10.1145/50000/49087/p31-derose.pdf?ip=108.215.223.98&id=49087&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&CFID=767732212&CFTOKEN=29300192&__acm__=1459835453_c56c0b8a7d28d9f65919cc629d8231aa
Penn Treebank POS tags: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html