
Language Identification and Named Entity Recognition in Hinglish Code Mixed Tweets

Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
ACL 2018, SRW
Link to paper

Repository contains
(i) Seq2seq-based transliterator (Roman to Devanagari)
(ii) Language identification tool for Hindi-English code-switched text (English, Hindi, Rest)
(iii) CRF-based Named Entity Recognition tool for Hindi-English code-switched text (Person, Location, Organisation)

Check for the annotated corpus.

  • Install the dependencies listed in requirements.txt inside a virtualenv.

  • Read the README in the transliteration directory and follow its setup instructions.

  • Export the following environment variables before running the demo files:

export TRANSLITERATION_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner/transliteration
export HINGLISH_ROOT_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner
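
A quick way to confirm that both variables are exported before launching a demo file is a small check like the one below. This is only a sketch: the helper name `missing_env_vars` and the sample path are hypothetical and are not part of this repository; only the two variable names come from the instructions above.

```python
import os

# Variable names taken from the export instructions above.
REQUIRED_VARS = ("TRANSLITERATION_DIR", "HINGLISH_ROOT_DIR")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; the repository itself may
    read these variables differently.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment
# (the path is a made-up placeholder).
print(missing_env_vars({"HINGLISH_ROOT_DIR": "/tmp/repo"}))
```

Calling `missing_env_vars()` with no argument inspects the real environment; an empty list means both variables are set.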