Twitter named entity extraction for WNUT 2016 http://noisy-text.github.io/2016/ner-shared-task.html

TwitterNER

Twitter named entity extraction for the WNUT 2016 shared task (http://noisy-text.github.io/2016/ner-shared-task.html), and the code for the corresponding workshop paper at WNUT, COLING 2016: "Semi-supervised Named Entity Recognition in noisy-text" by Shubhanshu Mishra and Jana Diesner.

Model architecture (figure: COLING2016-WNUT-Model-Architechture.png)

Data download

The dataset used in this repository can be downloaded from https://github.com/aritter/twitter_nlp/tree/master/data/annotated/wnut16
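
Once downloaded, the annotated files can be loaded into token/tag sequences. The following is a minimal sketch, not the loader used in this repository. It assumes the files follow the usual two-column CoNLL-style layout (one token and its BIO tag per line, separated by whitespace, with a blank line between tweets). The function name `read_conll` and the sample lines are illustrative only.

```python
from typing import Iterable, List, Tuple

# One tweet is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token tag' lines into sentences separated by blank lines.

    Assumption: the first column is the token and the last column is the tag.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            # A blank line closes the current tweet, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample lines in the assumed format.
sample = ["Obama\tB-person\n", "visits\tO\n", "\n", "NYC\tB-geo-loc\n"]
sentences = read_conll(sample)
print(len(sentences), "tweets loaded")
```

To read a real file, pass an open file handle instead of the sample list, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to one of the downloaded files.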

Submitted Solution [ST]

See Word2Vec.ipynb for details on the original submitted solution for the task.

Improved model

See Run Experiments.ipynb for details on the improved system, and Run Experiment.ipynb for details on the improved system evaluated with the test data.

Using the API

The final system is packaged as an API in the NoisyNLP folder. More updates to the API are planned. See Run Experiment.ipynb for API usage.

Downloading Gazetteers

See Updated Gazetteers.ipynb, Extra Gazetteers.ipynb, and Download Wikidata.ipynb.
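
To illustrate how downloaded gazetteers are commonly turned into tagger features, here is a minimal sketch. It is an assumption-laden example, not the feature extractor implemented in the notebooks or in NoisyNLP: the function name `gazetteer_features`, the gazetteer names, and the single-token, lowercase lookup are all hypothetical simplifications (real systems often also match multi-word entries).

```python
from typing import Dict, List, Set


def gazetteer_features(
    tokens: List[str], gazetteers: Dict[str, Set[str]]
) -> List[List[str]]:
    """For each token, list the gazetteers that contain its lowercased form."""
    features: List[List[str]] = []
    for token in tokens:
        key = token.lower()
        # Sorted so the feature order is deterministic.
        matches = sorted(name for name, entries in gazetteers.items() if key in entries)
        features.append(matches)
    return features


# Hypothetical gazetteers: name -> set of lowercased entries.
gazetteers = {
    "city": {"chicago", "paris"},
    "firstname": {"paris", "jana"},
}
features = gazetteer_features(["Paris", "rocks"], gazetteers)
print(features)
```

Each inner list can be added to the per-token feature set of a sequence tagger, so the model learns how much to trust each gazetteer.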

Generating word clusters

See Gen new clusters.ipynb
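
As a rough sketch of how word clusters can feed a tagger, the example below maps each token to a cluster ID feature. The file layout is an assumption (one word per line followed by its cluster ID, whitespace separated, with any further columns ignored), and the names `load_clusters` and `cluster_feature` are hypothetical. Check the actual cluster files before relying on it.

```python
from typing import Dict, Iterable


def load_clusters(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'word cluster_id [...]' lines into a word -> cluster_id mapping."""
    clusters: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            clusters[parts[0]] = parts[1]
    return clusters


def cluster_feature(token: str, clusters: Dict[str, str]) -> str:
    """Return a string feature for the token's cluster, or an unknown marker."""
    return "CLUSTER=" + clusters.get(token, "UNK")


# Made-up lines in the assumed format.
sample_lines = ["monday 7 0.91\n", "friday 7 0.88\n", "\n", "lol 12 0.95\n"]
clusters = load_clusters(sample_lines)
print(cluster_feature("friday", clusters))
```

Words that share a cluster receive the same feature, which lets the tagger generalize from frequent words to rare or misspelled ones that were clustered with them.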

Data Pre-processing

See Data preprocessing.ipynb

Preliminary comparison with RNN models

See KerasCharRNN.ipynb and KerasWordRNN.ipynb.

Please cite as:

@INPROCEEDINGS {mishra2016_wnut_ner,
    author    = "Shubhanshu Mishra and Jana Diesner",
    title     = "Semi-supervised Named Entity Recognition in noisy-text",
    booktitle = "Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)",
    publisher = "The COLING 2016 Organizing Committee",
    pages     = "203-212",
    url       = "http://aclweb.org/anthology/W16-3927",
    year      = "2016",
    month     = "dec"
}