Skip to content

Commit

Permalink
added documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
sagorbrur committed Aug 22, 2020
1 parent 1964e8e commit 2729870
Show file tree
Hide file tree
Showing 2 changed files with 102 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ We used [LinCE](https://ritual.uh.edu/lince/home) dataset for training **multili
* Spanish-English(spa-eng)
* Hindi-English(hin-eng)
* Nepali-English(nep-eng)

### Language Code
* `spa-eng` for spanish-english
* `hin-eng` for hindi-english
Expand Down
101 changes: 101 additions & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@

Code Switch
===========

**CodeSwitch** is a NLP tool, can use for language identification, pos tagging, name entity recognition, sentiment analysis of code mixed data.

Supported Code-Mixed Language
-----------------------------

We used `LinCE <https://ritual.uh.edu/lince/home>`_ dataset for training **multilingual BERT** model using huggingface `transformers <https://github.com/huggingface/transformers>`_. ``LinCE`` has four language mixed data. We took three of it ``spanish-english``\ , ``hindi-english`` and ``nepali-english``. Hope we will train and add other language and task too.


* Spanish-English(spa-eng)
* Hindi-English(hin-eng)
* Nepali-English(nep-eng)

Language Code
^^^^^^^^^^^^^


* ``spa-eng`` for spanish-english
* ``hin-eng`` for hindi-english
* ``nep-eng`` for nepali-english

Installation
------------

.. code-block::
pip install codeswitch
Dependency
----------


* pytorch >=1.6.0

Features & Supported Language
-----------------------------


* Language Identification

* spanish-english
* hindi-english
* nepali-english

* POS

* spanish-english
* hindi-english

* NER

* spanish-english
* hindi-english

Language Identification
-----------------------

.. code-block:: py
from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('spa-eng')
# for hindi-english use 'hin-eng',
# for nepali-english use 'nep-eng'
text = "" # your code-mixed sentence
result = lid.identify(text)
print(result)
POS Tagging
-----------

.. code-block:: py
from codeswitch.codeswitch import POS
pos = POS('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence
result = pos.tag(text)
print(result)
NER Tagging
-----------

.. code-block:: py
from codeswitch.codeswitch import NER
ner = NER('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence
result = ner.tag(text)
print(result)
Acknowledgement
---------------


* `LinCE <https://ritual.uh.edu/lince/home>`_
* `BERT <https://arxiv.org/abs/1810.04805>`_
* `huggingface <https://github.com/huggingface>`_

0 comments on commit 2729870

Please sign in to comment.