
Specify a model from a specific directory for extract_features.py #62

Closed
johann-petrak opened this issue Nov 28, 2018 · 4 comments

@johann-petrak

I have downloaded the model and vocab files into a specific location, using their original file names, so my directory for bert-base-cased contains:

bert-base-cased-vocab.txt
bert_config.json
pytorch_model.bin

But when I point the --bert_model parameter of extract_features.py at the directory that contains these files, I get the following error:

ValueError: Can't find a vocabulary file at path <THEDIRECTORYPATHISPECIFIED> ...

When I instead pass a path to an existing regular file, the error messages indicate that the program tries to untar and uncompress it.

Is there no way to just specify a specific directory that contains the vocab, config, and model files?
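
For reference, the call I am making looks roughly like the following. The input/output flags and values are placeholders from memory and may not match the script exactly; the relevant part is --bert_model pointing at a local directory:

    python extract_features.py \
      --input_file input.txt \
      --output_file output.jsonl \
      --bert_model /path/to/bert-base-cased \
      --layers -1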

@artemisart

The last update broke this, but you can fix it in tokenization.py: add the following right after the line vocab_file = pretrained_model_name:

if os.path.isdir(vocab_file):
    vocab_file = os.path.join(vocab_file, "vocab.txt")
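
With that change, pointing --bert_model at a local directory should work, as long as the vocabulary file inside it is named vocab.txt (so a file like bert-base-cased-vocab.txt would need to be renamed). A minimal sketch of the resulting usage, assuming the pytorch_pretrained_bert package of that era exposes BertTokenizer.from_pretrained as shown (names may differ slightly):

    from pytorch_pretrained_bert import BertTokenizer

    # Directory containing vocab.txt, bert_config.json and pytorch_model.bin
    model_dir = "/path/to/bert-base-cased"

    # With the patch above, a directory argument is resolved to <model_dir>/vocab.txt
    tokenizer = BertTokenizer.from_pretrained(model_dir)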

@johann-petrak
Author

Thank you. Is it fair to assume that this will be accepted as an issue and fixed in a future update/release?

@thomwolf
Member

Yes :-) There is a new release planned for tonight that will fix this (among other things, basically all the other open issues).

@thomwolf
Member

Ok, this is now included in the new release 0.3.0 (by #73).

xloem pushed a commit to xloem/transformers that referenced this issue Apr 9, 2023
* Update trainer and model flows to accommodate sparseml

Disable FP16 on QAT start (huggingface#12)

* Override LRScheduler when using LRModifiers

* Disable FP16 on QAT start

* keep wrapped scaler object for training after disabling

Using QATMatMul in DistilBERT model class (huggingface#41)

Removed double quantization of output of context layer. (huggingface#45)

Fix DataParallel validation forward signatures (huggingface#47)

* Fix: DataParallel validation forward signatures

* Update: generalize forward_fn selection

Best model after epoch (huggingface#46)

fix scaler check for non-FP16 mode in trainer (huggingface#38)

Mobilebert QAT (huggingface#55)

* Remove duplicate quantization of vocabulary.

enable a QATWrapper for non-parameterized matmuls in BERT self attention (huggingface#9)

* Utils and auxiliary changes

update Zoo stub loading for SparseZoo 1.1 refactor (huggingface#54)

add flag to signal NM integration is active (huggingface#32)

Add recipe_name to file names

* Fix errors introduced in manual cherry-pick upgrade

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023