Running run_phoBert.sh and run_predict.sh run into missing file ? #1

Closed
DuyquanDuc opened this issue Dec 27, 2021 · 1 comment

Comments


DuyquanDuc commented Dec 27, 2021

Hello!

I cloned this repository and followed your instructions exactly. After changing the file paths I hoped it would work, but I ran into two errors in two separate runs. The first comes from run_predict.sh:

!bash /content/ZaloAI2021_LTR/run_predict.sh

and here is the traceback:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/content/ZaloAI2021_LTR/predict.py", line 29, in load_model
    model = MODEL_CLASSES[args.model_type][1].from_pretrained(args.model_dir, args=args)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/ZaloAI2021_LTR/predict.py", line 170, in <module>
    predict(pred_config)
  File "/content/ZaloAI2021_LTR/predict.py", line 117, in predict
    model = load_model(pred_config, args, device)
  File "/content/ZaloAI2021_LTR/predict.py", line 34, in load_model
    raise Exception("Some model files might be missing...")
Exception: Some model files might be missing...

The second one comes from training from scratch with run_phobert.sh:

!bash /content/ZaloAI2021_LTR/run_phobert.sh

and here is the traceback:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12/27/2021 08:44:27 - INFO - data_loader - Creating features from dataset file at train_tokenize_clean.json
Traceback (most recent call last):
  File "/content/ZaloAI2021_LTR/main.py", line 68, in <module>
    main(args_parse)
  File "/content/ZaloAI2021_LTR/main.py", line 12, in main
    train_dataset = load_and_cache_examples(args, tokenizer)
  File "/content/ZaloAI2021_LTR/data_loader.py", line 136, in load_and_cache_examples
    examples = create_examples(input_file)
  File "/content/ZaloAI2021_LTR/data_loader.py", line 36, in create_examples
    with open(input_file, "r", encoding='utf-8') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'train_tokenize_clean.json'

As far as I know, PhoBERT from VinAI does not ship any file named train_tokenize_clean.json, and nothing exists at https://huggingface.co/api/models/checkpoint either. My hypothesis is that I forgot to install a library or missed some other setup step, but I could not work out what it is even after reading the code, so I'm asking for help here.

*Note: I'm running this on Google Colab Pro with a GPU runtime.

DuyquanDuc changed the title from "Running run_phoBert.sh and run_predict.sh keep missing file ?" to "Running run_phoBert.sh and run_predict.sh run into missing file ?" on Dec 27, 2021
Owner

hieudx149 commented Jan 3, 2022

Did you create the checkpoint folder and put your checkpoint into it before running?
I have updated the run_predict.sh and run_phobert.sh files.
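
For anyone hitting the same two errors, a small pre-flight check along these lines may help narrow things down. It is only a sketch and not part of this repository. It assumes the checkpoint was written with save_pretrained(), so it should contain config.json and a weights file (the file names in WEIGHT_FILES are the usual Hugging Face defaults, assumed here), and the helper names check_model_dir and check_data_file are made up for illustration. The 404 in the first traceback is consistent with from_pretrained() receiving the string checkpoint when no such local directory exists, in which case the name is looked up on the Hugging Face Hub instead. The second error involves a relative path, which Python resolves against the current working directory, so it matters which directory the script is launched from.

```python
from pathlib import Path

# Assumed file names: checkpoints written by save_pretrained() usually hold
# config.json plus one of these weights files.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def check_model_dir(model_dir):
    """Return a list of problems found with a local checkpoint directory."""
    path = Path(model_dir)
    if not path.is_dir():
        # Without a local directory, the name is treated as a Hub model ID.
        return [f"{path} is not an existing directory"]
    problems = []
    if not (path / "config.json").is_file():
        problems.append(f"missing {path / 'config.json'}")
    if not any((path / name).is_file() for name in WEIGHT_FILES):
        problems.append(f"no weights file ({' or '.join(WEIGHT_FILES)}) in {path}")
    return problems


def check_data_file(data_file):
    """Return a list of problems found with a training data file path."""
    path = Path(data_file)
    if path.is_file():
        return []
    # resolve() shows where a relative path actually points right now.
    return [f"missing data file {path.resolve()}"]


if __name__ == "__main__":
    # Paths taken from the tracebacks above; adjust them to your own setup.
    issues = check_model_dir("checkpoint") + check_data_file("train_tokenize_clean.json")
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Checkpoint directory and data file look fine.")
```

Running this from the same working directory as the shell scripts, for example in a Colab cell before `!bash /content/ZaloAI2021_LTR/run_predict.sh`, shows whether the paths the scripts rely on actually resolve to existing files.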
