First commit
Initial commit of the library with the Lemmatizer, Part-of-Speech tagger, Named Entity Annotator and Syllabificator tools
jpereiran committed Apr 14, 2018
1 parent 1af868b commit 01b954c
Showing 18 changed files with 4,939 additions and 2 deletions.
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
.installed.cfg
bin
develop-eggs
dist
downloads
eggs
parts
*.egg-info
lib
lib64
7 changes: 7 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,7 @@
Copyright (c) 2018 Chana PUCP

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
recursive-include chana/files *
69 changes: 67 additions & 2 deletions README.md
@@ -1,2 +1,67 @@
# chana-library
Python 3.x library with an NLP toolkit for Shipibo-Konibo
# Chana: An NLP toolkit for the Shipibo-Konibo language of Peru.

Chana is a Python library of NLP tools for the Shipibo-Konibo language.
Some of these tools can be reused for other Peruvian indigenous languages and/or other highly agglutinative languages.
It is built on top of scikit-learn and distributed under the MIT license.

Chana has various NLP tools such as:

* Lemmatizer.
* Part-of-Speech tagger.
* Named Entity annotation.
* Syllabificator.
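
To give a feel for the kind of data these tools work with, here is a minimal sketch of longest-match suffix stripping using a few entries copied from `chana/files/lemmatizer/shipibo_suffixes.dat`. It is not Chana's lemmatizer and does not use Chana's API: the function name `strip_longest_suffix`, the `min_stem` parameter and the sample token are assumptions made for illustration only.

```python
# A handful of entries copied from chana/files/lemmatizer/shipibo_suffixes.dat.
SUFFIXES = ["bo", "ra", "ki", "kan", "shaman", "yama"]


def strip_longest_suffix(token, suffixes, min_stem=2):
    """Remove the longest listed suffix that leaves at least min_stem characters.

    Hypothetical helper, not part of chana. Returns (stem, suffix); the suffix
    is an empty string when nothing was stripped.
    """
    # Try longer suffixes first so "shaman" wins over any shorter match.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[:-len(suffix)], suffix
    return token, ""


# "jonibo" is only an illustrative input string.
stem, suffix = strip_longest_suffix("jonibo", SUFFIXES)
print(stem, suffix)
```

A rule like this strips at most one suffix and has no notion of context, which is one reason a real lemmatizer for an agglutinative language usually needs more than a suffix list.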


## Installation

### Dependencies

Chana requires:

- Python (>= 3.4)
- NumPy (>= 1.13.1)
- Scikit-learn (>= 0.18.1)
- Python-crfsuite (>= 0.9.5)


### User installation

If you already have a working installation of numpy and scipy,
the easiest way to install chana is with `pip`:

```
pip install chana
```
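
If the installation fails or behaves oddly, it can help to compare the versions you have against the minimums listed under Dependencies. The sketch below is one possible way to do that comparison; the helper names (`parse_version`, `meets_minimum`, `find_outdated`) are hypothetical and not part of Chana, and the installed versions are passed in by hand rather than detected.

```python
import re

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "numpy": "1.13.1",
    "scikit-learn": "0.18.1",
    "python-crfsuite": "0.9.5",
}


def parse_version(text):
    """Turn '1.13.1' into (1, 13, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare versions numerically, so '1.9.3' is older than '1.13.1'."""
    return parse_version(installed) >= parse_version(required)


def find_outdated(installed_versions):
    """Return the sorted names that are missing or older than the minimum."""
    return sorted(
        name
        for name, minimum in MINIMUMS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    )


# Example input; replace with the versions reported by `pip list`.
print(find_outdated({"numpy": "1.13.1", "scikit-learn": "0.17"}))
```

The versions can be read from `pip list`, or on Python 3.8 and later from `importlib.metadata.version`.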


## Help and Support

### Important links

- Project website: http://chana.inf.pucp.edu.pe
- Official source code repo: https://github.com/iapucp/chana-library
- Download releases: https://pypi.python.org/pypi/...
- HTML documentation (stable release): http://chana.inf.pucp.edu.pe

### Communication

- Website: http://chana.inf.pucp.edu.pe
- Authors: **Jose Pereira** - [jpereiran](https://github.com/jpereiran)


### Contact

For questions and feedback, please contact:

- José Pereira Noriega (jpereira@pucp.edu.pe)
- Rodolfo Mercado Gonzales (rmercado@pucp.edu.pe)
- Arturo Oncevay Marcos (arturo.oncevay@pucp.edu.pe)
- Vivian Góngora Patrón (v.gongora@pucp.pe)

### Acknowledgments

- Pontificia Universidad Católica del Perú (PUCP)
- Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica (CONCYTEC)
- NVIDIA
- Amazon Web Services

12 changes: 12 additions & 0 deletions chana/__init__.py
@@ -0,0 +1,12 @@
#coding=UTF-8
"""
Basic toolkit for the Shipibo-Konibo language
Modules that are implemented:
-Lemmatizer
-NER
For more information on these modules check help(chana.module_name)
All the information and code is from the Chana project
"""
Binary file added chana/files/lemmatizer/shipibo_knn_model.pkl
Binary file not shown.
225 changes: 225 additions & 0 deletions chana/files/lemmatizer/shipibo_suffixes.dat
@@ -0,0 +1,225 @@
naan
yama
men
iosma
baon
yora
ki
shaman
ke
kon
boon
kain
kean
yáb
yác
yách
yáh
yáw
yáj
yám
yán
yáp
yáq
yár
yás
yásh
yáx
yát
yáts
yáy
kaya
non
iosi
onmea
yáa
yáe
yái
yáo
wan
xon
bax
cax
chax
hax
wax
jax
max
nax
pax
qax
rax
sax
shax
xax
tax
tsax
yax
bicho
inab
inac
inach
inah
inaw
inaj
inam
inan
inap
inaq
inar
inas
inash
inax
inat
inats
inay
toshia
toshie
toshii
toshio
nan
kin
nonxon
noxon
kan
kas
inaa
inae
inai
inao
ekeet
eeet
ekee
eee
ananan
tani
nin
bekon
a
kiran
i
o
n
patankain
beiran
ibab
ibac
ibach
ibah
ibaw
ibaj
ibam
iban
ibap
ibaq
ibar
ibas
ibash
ibax
ibat
ibats
ibay
ra
bi
bo
ia
tian
tsi
meet
mee
mein
main
bait
isi
oma
nontian
nox
mea
wepanon
eenaan
enaan
kawan
iáma
ta
rabe
ti
pari
kashama
aanaan
anaan
aanan
anan
tan
akea
ekea
ikea
okea
ax
ex
ix
ox
nonx
ma
bires
shin
bira
akaat
akaa
ken
pan
pao
paket
ai
káto
an
aitian
res
iinaan
inaan
toshib
toshic
toshich
toshih
toshiw
toshij
toshim
toshin
toship
toshiq
toshir
toshis
toshish
toshix
toshit
toshits
toshiy
ketian
ni
sa
na
boan
oonaan
onaan
koma
we
anan
pacho
ibaa
ibae
ibai
ibao
cha
pake
yantan
ikiit
iiit
ikii
iii
on
mis
okoot
ooot
okoo
ooo
shoko
Binary file added chana/files/ner/crf_ner.crfsuite
Binary file not shown.