Error with pre-trained word embeddings #3

Closed
michellegiang opened this issue Mar 15, 2018 · 9 comments

michellegiang commented Mar 15, 2018

Hi,

When I run the test with your pre-trained word embeddings:
./run.sh "/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src" "/home/michelle/mlc/test" 2 "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed"

I get the error below. Could you please let me know how to solve it? Also, how can I get the M2 score instead of the BLEU score?

(michelle) michelle@k:~/mlc/mlconvgec2018$ ./run.sh "/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src" "/home/michelle/mlc/test" 2 "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed"
++ source paths.sh
+++++ dirname paths.sh
++++ cd .
++++ pwd
+++ BASE_DIR=/home/michelle/mlc/mlconvgec2018
+++ DATA_DIR=/home/michelle/mlc/mlconvgec2018/data
+++ MODEL_DIR=/home/michelle/mlc/mlconvgec2018/models
+++ SCRIPTS_DIR=/home/michelle/mlc/mlconvgec2018/scripts
+++ SOFTWARE_DIR=/home/michelle/mlc/mlconvgec2018/software
++ '[' 4 -ge 4 ']'
++ input_file=/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src
++ output_dir=/home/michelle/mlc/test
++ device=2
++ model_path=/home/michelle/mlc/mlconvgec2018/models/mlconv_embed
++ '[' 4 -eq 6 ']'
++ '[' -d /home/michelle/mlc/mlconvgec2018/models/mlconv_embed ']'
+++ ls /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt
+++ tr '\n' ' '
+++ sed 's| \([^$]\)| --path \1|g'
++ models='/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt '
++ echo /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt
/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt
++ FAIRSEQPY=/home/michelle/mlc/mlconvgec2018/software/fairseq-py
++ NBEST_RERANKER=/home/michelle/mlc/mlconvgec2018/software/nbest-reranker
++ beam=12
++ nbest=12
++ threads=12
++ mkdir -p /home/michelle/mlc/test
++ /home/michelle/mlc/mlconvgec2018/scripts/apply_bpe.py -c /home/michelle/mlc/mlconvgec2018/models/bpe_model/train.bpe.model
++ CUDA_VISIBLE_DEVICES=2
++ python3.6 /home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py --no-progress-bar --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model2.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model3.pt --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt --beam 12 --nbest 12 --interactive --workers 12 /home/michelle/mlc/mlconvgec2018/models/data_bin
Traceback (most recent call last):
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py", line 167, in <module>
    main()
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py", line 41, in main
    models, dataset = utils.load_ensemble_for_inference(args.path, args.data)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/utils.py", line 127, in load_ensemble_for_inference
    model = build_model(args, dataset)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/utils.py", line 31, in build_model
    return getattr(models, args.model).build_model(args, dataset)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py", line 541, in build_model
    dictionary=dataset.src_dict
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py", line 100, in __init__
    self.embed_tokens = load_embeddings(embed_path, dictionary, self.embed_tokens)
  File "/home/michelle/mlc/mlconvgec2018/software/fairseq-py/fairseq/models/fconv.py", line 22, in load_embeddings
    with open(embed_path) as f_embed:
FileNotFoundError: [Errno 2] No such file or directory: '/home.local/shamil/wiki/wiki.bpe.fasttext/model.vec'

@michellegiang
Author

Hi,

I also posted another issue, at the link below, about a problem I hit when training with my own data.

https://github.com/facebookresearch/fairseq-py/issues/129

@shamilcm
Collaborator

shamilcm commented Mar 15, 2018

Hi, the current issue seems to be with our fork of fairseq-py, so you can close the issue you opened here: facebookresearch/fairseq-py#129 and re-post the issue here.

@shamilcm
Collaborator

shamilcm commented Mar 15, 2018

The issue was due to some hardcoded paths in arguments.
It is now fixed here: shamilcm/fairseq-py@ceb2f12
Can you retry with this?
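
As a rough illustration of this kind of failure (not the actual change in shamilcm/fairseq-py@ceb2f12, which is not shown in this thread), the sketch below clears any path argument stored with a checkpoint when the file no longer exists on the current machine. It assumes the embedding weights are already part of the saved model, so the external file is not needed at inference time. The helper name `drop_missing_paths` and the argument name `encoder_embed_path` are hypothetical.

```python
import argparse
import os


def drop_missing_paths(args, names):
    """Set each named path argument to None if its file does not exist.

    `names` lists attribute names on `args` that hold file paths.
    Returns the names that were cleared.
    """
    cleared = []
    for name in names:
        path = getattr(args, name, None)
        if path is not None and not os.path.exists(path):
            setattr(args, name, None)
            cleared.append(name)
    return cleared


# Hypothetical args as they might be stored in a checkpoint trained elsewhere.
saved_args = argparse.Namespace(
    encoder_embed_path="/home.local/shamil/wiki/wiki.bpe.fasttext/model.vec",
    beam=12,
)
cleared = drop_missing_paths(saved_args, ["encoder_embed_path"])
print(cleared)
```

Clearing the argument rather than rewriting it keeps the sketch independent of where the embeddings might live on the new machine.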

Regarding the training issue, can you close it at facebookresearch/fairseq-py#129 and open a new issue here? I will take a look at it.

Also, what data is it trained on? Is it a very small training set with fewer than 30000 words in the vocabulary?

@michellegiang
Author

Hi, the training data is the Lang-8 Learner Corpus of English v1.0 and NUCLE.

@michellegiang
Author

Hi Shamil,

So I need to delete software/fairseq-py, download the new one, and reinstall it, right?

Regards,
Viet Anh

@shamilcm
Collaborator

shamilcm commented Mar 16, 2018

If you installed fairseq-py using setup.py, pull the new changes and run it again. Otherwise, you just need to pull the changes. The change is in only one file: fairseq/utils.py

@michellegiang
Author

michellegiang commented Mar 16, 2018

Thanks, Shamil. If I have already installed fairseq-py, could I just copy your new utils.py over the old one?

The reason is that I am using your version of fairseq-py with a newer version of PyTorch. I had some trouble with setup.py build and needed to apply some manual patches. (The version of PyTorch in your original README has some problems, so I need to use the newest version of PyTorch.)

https://github.com/facebookresearch/fairseq-py/issues/120

@shamilcm
Collaborator

If you installed with python setup.py develop, just getting the updated utils.py should work.
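
One way to check which copy of utils.py Python actually loads is sketched below, assuming the package is importable under the name `fairseq`. With a develop install the reported path typically points into the source checkout, so replacing the file there takes effect. With a regular install it typically points into site-packages instead. The helper name `module_origin` is made up for this example.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for module `name`, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `fairseq`) cannot be imported.
        return None
    return None if spec is None else spec.origin


print(module_origin("fairseq.utils"))
```

A result of None means the module could not be located from the current environment, which usually indicates that the wrong Python environment is active.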

@shamilcm
Collaborator

Closing this issue as it has been resolved.
