Before doing anything, please run pip install -r requirements.txt to install the dependencies.
The .txt files in the corruption annotated files/ folder are the original news articles. The .ann files in the same folder are the human annotations, which serve as the gold standard.
Please run python to produce the pipelined annotation files. After it finishes, .ann.machine files will show up in corruption annotated data. These are the machine annotations.
Please run python evaluation to evaluate the precision and recall scores of the machine annotations against the gold standard.
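As a rough illustration of what such an evaluation computes, here is a minimal sketch. It assumes the .ann and .ann.machine files use brat-style text-bound lines (an ID starting with T, a tab, then a label with start and end offsets), and that a machine annotation only counts as correct when label and offsets match the gold annotation exactly. The function names parse_ann and precision_recall are hypothetical and are not taken from the project's own evaluation script, which may score matches differently.

```python
def parse_ann(text):
    """Parse brat-style text-bound lines into a set of (label, start, end).

    Assumed line shape: "T1<TAB>Label 0 5<TAB>covered text".
    Lines that do not fit this shape are skipped.
    """
    spans = set()
    for line in text.splitlines():
        if not line.startswith("T"):
            continue  # only text-bound annotations are considered here
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        fields = parts[1].split()
        if len(fields) < 3:
            continue
        try:
            spans.add((fields[0], int(fields[1]), int(fields[-1])))
        except ValueError:
            continue  # offsets were not plain integers
    return spans


def precision_recall(gold, machine):
    """Exact-match precision and recall of machine spans against gold spans."""
    true_positives = len(gold & machine)
    precision = true_positives / len(machine) if machine else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Tiny made-up example, not real project data.
    gold = parse_ann("T1\tPerson 0 3\tabc\nT2\tTime 5 9\tdefg")
    machine = parse_ann("T1\tPerson 0 3\tabc\nT2\tTime 10 12\txy\nT3\tOrg 20 22\tzz")
    p, r = precision_recall(gold, machine)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the strictest choice; a partial-overlap criterion would give higher scores on the same annotations.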
To train the CRF model for NER of certain fields, we first need to produce the training files. This can be done by python
Then, to run the CRF training, do python test 400 4, where 400 specifies the number of files used for training. and were used to produce a logistic classifier for the time_tags. is how we connect to BosonNLP
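For readers unfamiliar with CRF training data for NER, the sketch below shows one common way to turn annotated spans into per-character BIO labels. This is only an assumption about what the training files might contain: the actual format produced by our scripts is not described here, and the function name spans_to_bio is hypothetical.

```python
def spans_to_bio(text, spans):
    """Label every character of text with a BIO tag.

    spans is an iterable of (label, start, end) with end exclusive.
    Invalid spans and spans overlapping an earlier one are skipped.
    """
    tags = ["O"] * len(text)
    for label, start, end in sorted(spans, key=lambda s: (s[1], s[2])):
        if start < 0 or end > len(text) or start >= end:
            continue  # offsets fall outside the text
        if any(tag != "O" for tag in tags[start:end]):
            continue  # keep the earlier span when two overlap
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return list(zip(text, tags))


if __name__ == "__main__":
    # Made-up example: one character and its tag per line.
    for char, tag in spans_to_bio("abc de", [("Person", 0, 3)]):
        print(repr(char), tag)
```

Character-level labels avoid the need for a word segmenter, which can be convenient for Chinese text; a token-level scheme would work the same way once the text is segmented.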
Other files are of lesser importance, but we are happy to explain if necessary.
864Final.pptx is our presentation on Thursday.
Best Wishes, Shidan & Tingtao
