All the data for the project (some of it superfluous) is contained in the data folder, including the formatting rules for the raw data. All models and testing scripts are in the current working directory, as is the web-crawling code used to download the scripts from the web source.
Run trainTestLangModel.py for a demo. Python version: 3.5.
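A minimal launcher sketch for the demo follows. It assumes that trainTestLangModel.py takes no command-line arguments and that 3.5 is a minimum rather than an exact version requirement. The helpers `version_ok` and `run_demo` are hypothetical and not part of the project. The code avoids f-strings so that it still runs under Python 3.5.

```python
import os
import subprocess
import sys

# Version named in this README; treated here as a minimum (assumption).
REQUIRED = (3, 5)
DEMO_SCRIPT = "trainTestLangModel.py"


def version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= required


def run_demo(script=DEMO_SCRIPT):
    """Run the demo script from the current working directory (hypothetical helper)."""
    if not version_ok():
        raise RuntimeError("Python {}.{}+ expected".format(*REQUIRED))
    if not os.path.isfile(script):
        raise FileNotFoundError("{} not found in {}".format(script, os.getcwd()))
    # Launch with the same interpreter that is running this launcher,
    # and pass the demo's exit code back to the caller.
    return subprocess.run([sys.executable, script]).returncode


if __name__ == "__main__":
    sys.exit(run_demo())
```

Using `sys.executable` keeps the demo on the interpreter you checked. By contrast, a bare `python` command might resolve to a different installed version.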