NER model for Armenian #1206

ShakeHakobyan · 2023-03-07T14:02:17Z

Hello! I have trained a NER model for the Armenian language using the ArmTDP dataset and the xlm-roberta-base model.

After that, I attempted to test the model using stanza.Pipeline:

import stanza

# Pipeline options: tokenizer plus the locally trained NER model
config = {
    'processors': 'tokenize,ner',
    'lang': 'hy',
    'ner_model_path': '/Lab/Projects/ner/models/hy_armtdp_nertagger_bert_18.pt',
}

nlp = stanza.Pipeline(**config)

nlp("some text in Armenian")

Running this on the same data, I observed that the outputs were different each time I loaded the model.
There was no such problem when testing through the built-in training script, though. Whenever I run the following command, I get the same output:

python3 -m stanza.utils.training.run_ner hy_armtdp --score_test

What could be the cause of this problem?

Additionally, I have added data conversion and BERT code for Armenian in this pull request (trained model can be downloaded from this drive).

If the problem can be resolved, it would be great to integrate an NER model for Armenian into the main package.

Thanks!


AngledLuffa · 2023-03-13T02:09:42Z

Thank you for doing this! Although I should point out that the pull request is currently against your own fork, not our dev branch. If you'll fix that, I can check this out tomorrow and try to diagnose the problem you're seeing.

AngledLuffa · 2023-03-13T02:14:31Z

Actually, if you would give an example of a sentence which causes the inconsistent labels, that would help a lot.
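One way to find such a sentence is to reload the model several times and compare the entities it returns for each candidate sentence. The following is only a rough sketch: `find_unstable_sentences` and `build_stanza_tagger` are hypothetical helper names, not part of stanza, and the model path is simply the one from the original report.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, entity type)
Tagger = Callable[[str], List[Entity]]


def find_unstable_sentences(
    build_tagger: Callable[[], Tagger],
    sentences: Iterable[str],
    loads: int = 3,
) -> Dict[str, List[Tuple[Entity, ...]]]:
    """Rebuild the tagger `loads` times and return the sentences whose
    entity lists were not identical across all loads."""
    sentences = list(sentences)
    seen = {s: Counter() for s in sentences}
    for _ in range(loads):
        tag = build_tagger()  # fresh load on every round
        for s in sentences:
            seen[s][tuple(tag(s))] += 1
    # Keep only sentences that produced more than one distinct output.
    return {s: list(outputs) for s, outputs in seen.items() if len(outputs) > 1}


def build_stanza_tagger() -> Tagger:
    """Hypothetical adapter: load the pipeline from the report and
    return a function mapping text to (text, type) entity pairs."""
    import stanza  # imported lazily so the helper above runs without stanza

    nlp = stanza.Pipeline(
        lang="hy",
        processors="tokenize,ner",
        ner_model_path="/Lab/Projects/ner/models/hy_armtdp_nertagger_bert_18.pt",
    )
    return lambda text: [(ent.text, ent.type) for ent in nlp(text).ents]
```

Calling `find_unstable_sentences(build_stanza_tagger, my_sentences)` with a list of test sentences would return only those whose labels changed between loads, and any one of them would serve as a reproducible example.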

AngledLuffa · 2023-03-13T02:59:12Z

It was pretty easy for me to get the changes you made, so I replicated your pull request locally (with you as the author, of course)

https://github.com/stanfordnlp/stanza/pull/1212/commits

Let me know if that looks good to you.

I like having a non-bert model as the default so the pipeline is less expensive unless people know they want the bert model, so I will check that everything works by retraining the model. If you would find an example that was triggering the non-deterministic behavior, though, I can try to debug that.

Thanks for sending this!

ShakeHakobyan · 2023-03-15T14:51:37Z

Hi there,
Thank you for your response. The pull request replication option looks good. I have included a link to my trained model without bert here. As for the issue with the pipeline that I mentioned earlier, it appears to have been resolved.

AngledLuffa · 2023-03-15T19:16:12Z

It's been merged, and the new models (retrained locally) are included in 1.5.0. Thanks for the help!

ShakeHakobyan added the enhancement label Mar 7, 2023

AngledLuffa closed this as completed Mar 15, 2023
