
Specify a model from a specific directory for extract_features.py #62

Closed
johann-petrak opened this issue Nov 28, 2018 · 4 comments

@johann-petrak

I have downloaded the model and vocab files into a specific location, using their original file names, so my directory for bert-base-cased contains:

bert-base-cased-vocab.txt
bert_config.json
pytorch_model.bin

But when I point the --bert_model parameter of extract_features.py at the directory that contains these files, I get the following error:

ValueError: Can't find a vocabulary file at path <THEDIRECTORYPATHISPECIFIED> ...

When I instead pass a path to an existing regular file, the error messages indicate that the program tries to untar and uncompress it.

Is there no way to just specify a specific directory that contains the vocab, config, and model files?
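
For reference, the call I am making looks roughly like the following. The input/output flags and values are placeholders from memory and may not match the script exactly; the relevant part is --bert_model pointing at a local directory:

    python extract_features.py \
      --input_file input.txt \
      --output_file output.jsonl \
      --bert_model /path/to/bert-base-cased \
      --layers -1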

@artemisart

The last update broke this, but you can fix it in tokenization.py: add the following right after the line vocab_file = pretrained_model_name:

if os.path.isdir(vocab_file):
    vocab_file = os.path.join(vocab_file, "vocab.txt")
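
With that change, pointing --bert_model at a local directory should work, as long as the vocabulary file inside it is named vocab.txt (so a file like bert-base-cased-vocab.txt would need to be renamed). A minimal sketch of the resulting usage, assuming the pytorch_pretrained_bert package of that era exposes BertTokenizer.from_pretrained as shown (names may differ slightly):

    from pytorch_pretrained_bert import BertTokenizer

    # Directory containing vocab.txt, bert_config.json and pytorch_model.bin
    model_dir = "/path/to/bert-base-cased"

    # With the patch above, a directory argument is resolved to <model_dir>/vocab.txt
    tokenizer = BertTokenizer.from_pretrained(model_dir)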

@johann-petrak
Author

Thank you. Is it fair to assume that this will be accepted as an issue and fixed in a future update/release?

@thomwolf
Member

Yes :-) There is a new release planned for tonight that will fix this (among other things, basically all the other open issues).

@thomwolf
Member

Ok, this is now included in the new release 0.3.0 (by #73).

xloem pushed a commit to xloem/transformers that referenced this issue Apr 9, 2023
* Update trainer and model flows to accommodate sparseml

Disable FP16 on QAT start (huggingface#12)

* Override LRScheduler when using LRModifiers

* Disable FP16 on QAT start

* keep wrapped scaler object for training after disabling

Using QATMatMul in DistilBERT model class (huggingface#41)

Removed double quantization of output of context layer. (huggingface#45)

Fix DataParallel validation forward signatures (huggingface#47)

* Fix: DataParallel validation forward signatures

* Update: generalize forward_fn selection

Best model after epoch (huggingface#46)

fix scaler check for non-FP16 mode in trainer (huggingface#38)

Mobilebert QAT (huggingface#55)

* Remove duplicate quantization of vocabulary.

enable a QATWrapper for non-parameterized matmuls in BERT self attention (huggingface#9)

* Utils and auxiliary changes

update Zoo stub loading for SparseZoo 1.1 refactor (huggingface#54)

add flag to signal NM integration is active (huggingface#32)

Add recipe_name to file names

* Fix errors introduced in manual cherry-pick upgrade

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023