So I git cloned this version and followed your instructions exactly. After changing the file paths, I hoped it would work, but I ran into two bugs across two separate runs:
!bash /content/ZaloAI2021_LTR/run_predict.sh
and here is the traceback:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/content/ZaloAI2021_LTR/predict.py", line 29, in load_model
    model = MODEL_CLASSES[args.model_type][1].from_pretrained(args.model_dir, args=args)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/ZaloAI2021_LTR/predict.py", line 170, in <module>
    predict(pred_config)
  File "/content/ZaloAI2021_LTR/predict.py", line 117, in predict
    model = load_model(pred_config, args, device)
  File "/content/ZaloAI2021_LTR/predict.py", line 34, in load_model
    raise Exception("Some model files might be missing...")
Exception: Some model files might be missing...
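For what it's worth, here is my guess at why the 404 appears (a sketch, not the repo's actual code: I'm assuming predict.py passes args.model_dir, here the string "checkpoint", to from_pretrained, and the required filenames below are the typical ones and may differ). When the path is not an existing local directory, transformers falls back to treating it as a Hugging Face Hub model ID and queries https://huggingface.co/api/models/checkpoint, which does not exist:

```python
import os

# Typical files from_pretrained expects in a local model directory
# (assumption: the exact set may differ for this repo's models).
REQUIRED_FILES = ["config.json", "pytorch_model.bin"]

def check_model_dir(model_dir):
    """Return the list of missing model files, or raise if the
    directory itself is absent (the case that triggers the Hub 404)."""
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(
            f"{model_dir!r} is not a local directory; from_pretrained "
            "would try to fetch it from the Hugging Face Hub instead"
        )
    return [f for f in REQUIRED_FILES
            if not os.path.exists(os.path.join(model_dir, f))]
```

So before running run_predict.sh, it may be worth checking that the checkpoint directory actually exists and contains the model files, e.g. check_model_dir("/content/ZaloAI2021_LTR/checkpoint").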
The second one is training from scratch with run_phobert.sh:
!bash /content/ZaloAI2021_LTR/run_phobert.sh
and here is the traceback:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12/27/2021 08:44:27 - INFO - data_loader - Creating features from dataset file at train_tokenize_clean.json
Traceback (most recent call last):
  File "/content/ZaloAI2021_LTR/main.py", line 68, in <module>
    main(args_parse)
  File "/content/ZaloAI2021_LTR/main.py", line 12, in main
    train_dataset = load_and_cache_examples(args, tokenizer)
  File "/content/ZaloAI2021_LTR/data_loader.py", line 136, in load_and_cache_examples
    examples = create_examples(input_file)
  File "/content/ZaloAI2021_LTR/data_loader.py", line 36, in create_examples
    with open(input_file, "r", encoding='utf-8') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'train_tokenize_clean.json'
From what I know, VinAI's PhoBERT does not ship any file named train_tokenize_clean.json, and the URL https://huggingface.co/api/models/checkpoint does not exist. My hypothesis is that I forgot to install a library or missed a step, but I cannot tell what it is even after reading the code. So I'm here for help.
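My guess about the second error (hypothetical; the filename comes from the traceback, and the search logic below is mine, not the repo's): data_loader.py opens train_tokenize_clean.json relative to the current working directory, so running the script from Colab's default /content instead of the repo root, or not having generated the tokenized file yet, would both produce exactly this FileNotFoundError. A small locator sketch:

```python
import os

def locate_data_file(filename, search_dirs):
    """Return the first existing path to filename among search_dirs,
    or raise FileNotFoundError explaining where it was looked for."""
    for d in search_dirs:
        candidate = os.path.join(d, filename)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        f"{filename} not found in {search_dirs}; it may need to be "
        "generated by the tokenize/clean preprocessing step first"
    )
```

For example, locate_data_file("train_tokenize_clean.json", [".", "/content/ZaloAI2021_LTR"]) would show whether the file exists anywhere the script might look.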
*Note: I'm running on Google Colab Pro with a GPU runtime.
DuyquanDuc changed the title from "Running run_phoBert.sh and run_predict.sh keep missing file ?" to "Running run_phoBert.sh and run_predict.sh run into missing file ?" on Dec 27, 2021.