Bilma

Bert In Latin aMericA

Bilma is a BERT implementation in TensorFlow, trained on the Masked Language Model (MLM) task with the datasets described at https://sadit.github.io/regional-spanish-models-talk-2022/.

The regional models can be downloaded from http://geo.ingeotec.mx/~lgruiz/regional-models-bilma/. You will also need to download the vocabulary file, which is common to all models and regions.

The accuracy of the models trained on the MLM task for the different regions is:

We also fine-tuned the models for emoticon prediction; the resulting accuracy is as follows:

Pre-requisites

You will need TensorFlow 2.4 or newer.
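
A quick way to confirm that the installed version meets this requirement is sketched below. The helper name `meets_minimum` is an illustrative assumption and not part of Bilma; only `tf.__version__` comes from TensorFlow itself.

```python
import re


def meets_minimum(version, minimum=(2, 4)):
    """Return True if a 'major.minor...' version string is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "OK" if meets_minimum(tf.__version__) else "too old"
        print(f"TensorFlow {tf.__version__}: {status}")
```

Comparing the parsed numbers as a tuple avoids the pitfall of plain string comparison, where "2.10" would sort before "2.4".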

Quick guide

See the demo notebooks (bilma-demo.ipynb and bilma-cls-demo.ipynb) for a quick guide on how to use the models.

Clone this repository and then run

bash download-emoji15-bilma.sh

to download the MX model. Then to load the model you can use the code:

from bilma import bilma_model

# Vocabulary shared by all models and regions
vocab_file = "vocab_file_All.txt"
# MX model downloaded by download-emoji15-bilma.sh
model_file = "bilma_small_MX_epoch-1_classification_epochs-13.h5"

model = bilma_model.load(model_file)
tokenizer = bilma_model.tokenizer(vocab_file=vocab_file,
                                  max_length=280)
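
Loading cannot succeed if the model or vocabulary file is absent from the working directory, so it can help to check for both first. This is a minimal sketch; `missing_files` is a hypothetical helper and not part of the bilma package.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    required = ["vocab_file_All.txt",
                "bilma_small_MX_epoch-1_classification_epochs-13.h5"]
    missing = missing_files(required)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files found")
```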

Now you will need some text:

texts = ["Tenemos tres dias sin internet ni senal de celular en el pueblo.",
         "Incomunicados en el siglo XXI tampoco hay servicio de telefonia fija",
         "Vamos a comer unos tacos",
         "Los del banco no dejan de llamarme"]
toks = tokenizer.tokenize(texts)

With this, you are ready to use the model:

p = model.predict(toks)
tokenizer.decode_emo(p[1])

which produces the output; each emoji corresponds to one entry in texts.
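
To view the predictions next to their inputs, the texts and decoded labels can be paired up. The sketch below assumes that `tokenizer.decode_emo` returns one label per input text, in order, which the guide implies but does not state explicitly. `pair_predictions` is a hypothetical helper, and the labels in the usage lines are placeholders rather than real model output.

```python
def pair_predictions(texts, labels):
    """Pair each input text with its predicted label, checking the lengths match."""
    labels = list(labels)
    if len(texts) != len(labels):
        raise ValueError(f"{len(texts)} texts but {len(labels)} labels")
    return list(zip(texts, labels))


if __name__ == "__main__":
    # Placeholder labels; in practice use tokenizer.decode_emo(p[1])
    demo_texts = ["Vamos a comer unos tacos",
                  "Los del banco no dejan de llamarme"]
    demo_labels = ["label_0", "label_1"]
    for text, label in pair_predictions(demo_texts, demo_labels):
        print(f"{label}\t{text}")
```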
