spanish usage #75

Closed
angelo337 opened this issue Dec 21, 2016 · 12 comments

Comments

@angelo337

Hi there,
Is it possible to use Rasa in Spanish, with the Spanish MITIE model?
If so, could you please point me to some resources on the changes I would need to make?
Thanks,
Angelo

@amn41
Contributor

amn41 commented Dec 21, 2016

The Spanish MITIE models are here. If you unzip them and find the feature extractor file, you should use that as your mitie_file. If you find that the tokenizer isn't working perfectly for Spanish, we can address that.
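
For reference, a minimal sketch of what such a config might look like, built from Python. Only `mitie_file` is named in this thread; the other keys (`backend`, `path`, `data`) and every path and file name shown are assumptions about the rasa_nlu config of that time, so check them against the documentation for your installed version.

```python
import json

# Hypothetical paths: point mitie_file at the feature extractor file
# found inside the unzipped Spanish MITIE models.
config = {
    "backend": "mitie",  # assumed key, not confirmed in this thread
    "mitie_file": "./MITIE-models/spanish/total_word_feature_extractor.dat",
    "path": "./models",  # assumed key: where trained models are written
    "data": "./data/ejemplos_es.json",  # assumed key: training data file
}

# Serialize the config; save this text as config.json.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Training would then be started as shown later in this thread, with `python -m rasa_nlu.train -c config.json`.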

@angelo337
Author

I just downloaded that model and put that information in the config file, but I am getting the error below.
Could you please point out how to fix it?
Thanks

creangel@creangel_hadoop:~/Downloads/mitie/rasa_nlu$ time python -m rasa_nlu.train -c config.json
Training to recognize 4 categories: 'saludo', 'restaurante_busqueda', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 63
C: 200 f-score: 0.709677
C: 400 f-score: 0.709677
C: 300 f-score: 0.709677
C: 100 f-score: 0.709677
C: 0.01 f-score: 0.612903
C: 600 f-score: 0.709677
C: 1400 f-score: 0.709677
C: 3000 f-score: 0.709677
C: 5000 f-score: 0.709677
C: 2550 f-score: 0.709677
C: 1325 f-score: 0.709677
C: 712.5 f-score: 0.709677
C: 406.25 f-score: 0.709677
C: 253.125 f-score: 0.709677
C: 176.562 f-score: 0.709677
C: 138.281 f-score: 0.709677
C: 119.141 f-score: 0.709677
C: 109.57 f-score: 0.709677
C: 104.785 f-score: 0.709677
C: 102.393 f-score: 0.709677
C: 101.196 f-score: 0.709677
C: 100.598 f-score: 0.709677
C: 100.299 f-score: 0.709677
best C: 100.598
test on train:
20 0 0 0
0 8 0 0
0 0 21 0
0 0 0 14

overall accuracy: 1
Training time: 429 seconds.
df.number_of_classes(): 4

Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 65, in <module>
File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 59, in do_train
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 25, in train
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 42, in train_entity_extractor
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 31, in start_and_end
IndexError: list index out of range

@amn41
Contributor

amn41 commented Dec 21, 2016

It looks like there's an error picking up one of your entities. Without seeing your data, I can't tell whether this is a bug or a problem with the data.

Please try training intents only (e.g. removing any entities from your training data), and then add them back one by one until you trigger this error. Then please post here the training example which causes the error.
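
A quicker way to narrow this down without retraining might be to check each entity annotation against its text. The following is a minimal sketch, not part of rasa_nlu: it assumes each training example is a dict with a `text` string and an `entities` list whose items carry 0-based `start`/`end` character offsets and a `value` equal to `text[start:end]`. The function name `find_bad_entities` and the sample data are hypothetical. It is written for Python 3.

```python
def find_bad_entities(examples):
    """Return (text, entity, reason) for every annotation that looks inconsistent."""
    problems = []
    for ex in examples:
        text = ex["text"]
        for ent in ex.get("entities", []):
            start, end = ent["start"], ent["end"]
            if not (0 <= start < end <= len(text)):
                problems.append((text, ent, "offsets outside the text"))
            elif text[start:end] != ent["value"]:
                problems.append((text, ent, "text[start:end] is %r" % text[start:end]))
    return problems


# Hypothetical sample data: the second example counts offsets from 1 instead of 0.
examples = [
    {"text": "busco un restaurante en Madrid",
     "entities": [{"start": 24, "end": 30, "value": "Madrid", "entity": "lugar"}]},
    {"text": "busco un restaurante en Madrid",
     "entities": [{"start": 25, "end": 31, "value": "Madrid", "entity": "lugar"}]},
]

problems = find_bad_entities(examples)
for text, ent, reason in problems:
    print(text, ent, reason)
```

Any example reported here is a good candidate for the one that triggers the error.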

@angelo337
Author

Hi there,
I just tried your suggestion and it works like a charm. I figured out that my mistake was that I started counting sentences from 1 instead of 0.
Now it is fixed.
Thanks

@ghost

ghost commented Jan 12, 2017

I have the same problem @angelo337 had: IndexError: list index out of range
I am using the expressions.json file from wit.ai.

Is there a problem with training on wit data?
expressions.json.zip

@amn41
Contributor

amn41 commented Jan 12, 2017

Thanks for sharing your training data! I'm able to reproduce this error. It's down to the fact that you have entities like 'perth' in the sentence "what is perths weather like next week", where the entity covers only part of the token 'perths'. MITIE can only handle entities made up of whole tokens. I will handle this edge case in rasa, but it will still return "perths" rather than "perth" as your location, so for now you will have to resolve that entity yourself. Coming up with a solution to that is on the roadmap, though.
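
To illustrate the whole-token constraint, here is a minimal sketch that checks whether an entity span starts and ends on token boundaries. It uses a plain whitespace tokenizer as a stand-in for MITIE's tokenizer, which is an assumption, and the helper names `token_spans` and `aligns_with_tokens` are hypothetical.

```python
import re


def token_spans(text):
    """Return (start, end) character offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def aligns_with_tokens(text, start, end):
    """True if the span begins at a token start and finishes at a token end."""
    spans = token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


text = "what is perths weather like next week"
start = text.index("perth")

# 'perth' stops one character short of the end of the token 'perths'.
partial_ok = aligns_with_tokens(text, start, start + len("perth"))
# The full token 'perths' lines up with the token boundaries.
whole_ok = aligns_with_tokens(text, start, start + len("perths"))
print(partial_ok, whole_ok)
```

Running a check like this over the training data before training would flag annotations such as 'perth' inside 'perths'.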

@amn41
Contributor

amn41 commented Jan 12, 2017

Although, thinking about it, we could explicitly insert a whitespace in these cases. I will create a new issue and make a proposal.

@beeva-lisettegarcia

Hello

I would like to use rasa for Spanish texts.
I have already downloaded the Spanish MITIE model and prepared the config file.
During training, I get the following error:

python -m rasa_nlu.train -c config.json
Training to recognize 8 categories: 'greet', 'restaurant_search', 'affirm', 'goodbye', 'saludo', 'busqueda_restaurante', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 44
C: 200 f-score: 0.525
C: 400 f-score: 0.525
C: 300 f-score: 0.525
C: 100 f-score: 0.525
C: 0.01 f-score: 0.575
C: 50.005 f-score: 0.525
C: 25.0075 f-score: 0.525
C: 12.5088 f-score: 0.525
C: 6.25938 f-score: 0.525
C: 3.13469 f-score: 0.525
C: 1.57234 f-score: 0.525
C: 0.791172 f-score: 0.525
C: 0.400586 f-score: 0.525
best C: 0.01
test on train:
5 0 0 0 0 0 0 0
0 8 0 0 0 0 0 0
0 0 6 0 0 0 1 0
0 0 0 5 0 0 0 0
1 0 0 0 1 0 0 0
0 0 0 0 0 8 0 0
0 0 0 0 0 0 5 0
0 0 0 1 0 0 0 3

overall accuracy: 0.931818
Training time: 854 seconds.
df.number_of_classes(): 8

Traceback (most recent call last):
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 65, in <module>
do_train(config)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 59, in do_train
trainer.train(training_data)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 30, in train
self.entity_extractor = self.train_entity_extractor(data.entity_examples)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 53, in train_entity_extractor
start, end = self.find_entity(ent, text)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 35, in find_entity
tokens, offsets = tk.tokenize_with_offsets(text)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/tokenizers/mitie_tokenizer.py", line 24, in tokenize_with_offsets
offset += m.start()
AttributeError: 'NoneType' object has no attribute 'start'

Tracing the error, I found that the problem is in mitie_tokenizer.py, line 22:

m = re.search(re.escape(tok), _text[offset:])

The search returns no match when the text contains words with accents.

Any ideas?

Thanks
busq_restaurante_Data.json.zip
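
One way such a failure can arise is an encoding mismatch between the tokens and the text being searched. The sketch below simulates that in Python 3 (the thread itself uses Python 2.7) and shows a defensive offset lookup that reports a readable error instead of failing on `None`. It is an illustration under assumptions, not the actual rasa_nlu code or its fix; the helper `offsets_for` and the sample text are hypothetical.

```python
import re

text = "búsqueda de restaurante"

# Simulate a token that went through a mismatched encode/decode round trip.
mangled = "búsqueda".encode("utf-8").decode("latin-1")

# The mangled token is no longer a substring of the text, so search finds
# nothing; calling .start() on this result would raise AttributeError.
match = re.search(re.escape(mangled), text)


def offsets_for(text, tokens):
    """Locate each token in order; fail with a readable message if one is missing."""
    offsets, pos = [], 0
    for tok in tokens:
        idx = text.find(tok, pos)
        if idx < 0:
            raise ValueError(
                "token %r not found in %r after position %d" % (tok, text, pos)
            )
        offsets.append(idx)
        pos = idx + len(tok)
    return offsets


clean_offsets = offsets_for(text, ["búsqueda", "de", "restaurante"])
print(match, clean_offsets)
```

If tokens and text are decoded consistently, every token is found; if not, the `ValueError` names the offending token.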

@frankai

frankai commented Feb 21, 2017

I have the same problem as @beeva-lisettegarcia when training with Spanish accents. The problem appears to be in the mitie_tokenizer.py script. Any idea or clue on how to fix it? Thanks!

@tmbo
Member

tmbo commented Feb 21, 2017

@beeva-lisettegarcia @frankai I just pushed a change that should fix the encoding issue (unfortunately, the test that should have ensured this functionality had a bug of its own 😓). It would be great if you could test it to see whether it solves your issue.

For the future: please avoid re-using closed issues, and don't hesitate to create new ones. Just make sure first that the exact problem is not already covered by an existing issue.

@cbonadio

I had the same issue as @beeva-lisettegarcia and @frankai. I have now pulled the changes and it is working.

Thanks

@beeva-lisettegarcia

Thanks, now it is working :-)

vcidst pushed a commit that referenced this issue Jan 23, 2024