Error running the configuring parameters cell #14

Closed
bernardmizzi opened this issue Mar 26, 2020 · 35 comments

Comments

@bernardmizzi

Good morning,

I am running the configuring parameters cell and I am getting the below error:


UnpicklingError Traceback (most recent call last)
in
5 pass
6
----> 7 bertmodel = BertForSequenceClassification.from_pretrained(lm_path,cache_dir=None, num_labels=3)
8
9

~/anaconda3/envs/finbert/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
601 if state_dict is None and not from_tf:
602 weights_path = os.path.join(serialization_dir, WEIGHTS_NAME)
--> 603 state_dict = torch.load(weights_path, map_location='cpu')
604 if tempdir:
605 # Clean up temp dir

~/anaconda3/envs/finbert/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
385 f = f.open('rb')
386 try:
--> 387 return _load(f, map_location, pickle_module, **pickle_load_args)
388 finally:
389 if new_fd:

~/anaconda3/envs/finbert/lib/python3.7/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
562 f.seek(0)
563
--> 564 magic_number = pickle_module.load(f, **pickle_load_args)
565 if magic_number != MAGIC_NUMBER:
566 raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, 'v'.

Moreover, can you kindly explain how I can construct the files train.csv, validation.csv, test.csv?

Regards,
Bernard

@cskksdfklpz

Make sure you successfully downloaded the language model (models/language_model/finbertTRC2/pytorch_model.bin should be about 400 MB). Try using git-lfs, or download the model directly from the GitHub web page.
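
The load key 'v' in the traceback is consistent with pytorch_model.bin being a Git LFS pointer (a tiny text file that starts with "version https://git-lfs.github.com/spec/v1") rather than the real weights. Below is a minimal sketch for checking this before calling from_pretrained. The helper names and the 100 MB threshold are illustrative assumptions, not part of finBERT.

```python
import os

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def diagnose_weights(path, min_bytes=100 * 1024 * 1024):
    """Return a short diagnosis for a downloaded weights file.

    min_bytes is an assumed lower bound (the thread says the real
    file is about 400 MB); adjust it as needed.
    """
    if not os.path.isfile(path):
        return "missing"
    if looks_like_lfs_pointer(path):
        return "lfs-pointer"
    if os.path.getsize(path) < min_bytes:
        return "too-small"
    return "ok"
```

Calling diagnose_weights("models/language_model/finbertTRC2/pytorch_model.bin") and getting anything other than "ok" would suggest the download needs to be repeated.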

@bernardmizzi
Author

Thanks for your feedback.

Moreover, how can I construct the files train.csv, validation.csv, test.csv?

@bernardmizzi
Author

Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?
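
The thread does not spell out an official recipe, but one possible approach is to take any labelled set of sentences and split it into three parts. Below is a minimal sketch; the function name, the split fractions and the seed are all assumptions for illustration.

```python
import random


def split_rows(rows, val_frac=0.1, test_frac=0.2, seed=42):
    """Shuffle (text, label) rows and split them into train/validation/test.

    The fractions are illustrative defaults, not values taken from finBERT.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded, so the split is reproducible
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    validation = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, validation, test
```

Each of the three returned lists can then be written to train.csv, validation.csv and test.csv respectively.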

@davidifshk

image
Hi all,
I also hit this problem when I ran the configuring parameters cell. I then tried to download pytorch_model.bin with git-lfs, but I am getting this error. It looks like a service limit. Any help would be appreciated.
Many thanks!

@bernardmizzi
Author

Take a look at issue #8.

@davidifshk

image
Thanks for your help!
But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin
the file I got is also the 134 KB one, not the original 400 MB one.

@l0rem1psum

> image
> Thanks for your help!
> But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin
> the file I got is also the 134 KB one, not the original 400 MB one.

When I did a git lfs pull, it tells me that:

"batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/ProsusAI/finBERT.git/info/lfs'"

This is probably related to this issue.

@bernardmizzi
Author

You could manually download https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin from the browser; that's what I did.

@bernardmizzi
Author

> image
> Thanks for your help!
> But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin
> the file I got is also the 134 KB one, not the original 400 MB one.

Did you try manually downloading the file from the browser from https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin? It worked for me and the downloaded file is approximately 400MB

@l0rem1psum

> You could manually download https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin from the browser; that's what I did.

Can you share your local copy of the model file? This method no longer works due to GitHub bandwidth restrictions. I can download the file, but it's only 134 bytes. Thank you.

@bernardmizzi
Author

Yeah sure

https://drive.google.com/drive/folders/1Y7pS_P4Bui7pZXKp04aPjb1c62CQUZT2?usp=sharing

ok?

@bernardmizzi
Author

@davidifshk you can also use my link if you want ^

@l0rem1psum

@bernardmizzi Thank you. This is going to benefit more people with the same issue.

@bernardmizzi
Author

No problem, glad I could help

@l0rem1psum

@bernardmizzi

Sorry to ask again, but could you please also share the model under classifier_model/finbert-sentiment? I believe that one cannot be downloaded either. I really appreciate your help!

@bernardmizzi
Author

That model is created by training on a particular text dataset, so you'll have to run the notebook finBERT/notebooks/finbert_training.ipynb yourself; mine is trained on one specific dataset. If you want, I'll give you mine, but it is trained on Reddit news headlines and it obviously reported very low accuracy.

@l0rem1psum

That's okay. Thank you very much!

@bernardmizzi
Author

Should you need help with running the notebook, just send me a message, as I got it up and running.

@davidifshk

> @davidifshk you can also use my link if you want ^

It works! Thank you very much! I'm going to run the training with the dataset from FinancialPhraseBank first.

@davidifshk

> Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?

Could someone describe the expected data structure of train.csv? I got an error when I ran the cell 'get_data()'. Here is the data structure of my train.csv. Is there anything wrong?
image

@davidifshk

> > Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?
>
> Could someone describe the expected data structure of train.csv? I got an error when I ran the cell 'get_data()'. Here is the data structure of my train.csv. Is there anything wrong?
> image

Fixed. I had used the wrong separator character ',' when exporting the CSV file.
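
Financial sentences often contain commas, which is one reason a comma separator is fragile here. Below is a minimal sketch of writing and reading a split with an explicit delimiter. The tab delimiter and the "text"/"label" column names are assumptions; check the repository's data-loading code for the separator and header it actually expects.

```python
import csv


def write_split(path, rows, delimiter="\t"):
    """Write (text, label) rows to `path` with an explicit delimiter.

    The header names "text" and "label" are assumed, not confirmed.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(["text", "label"])
        writer.writerows(rows)


def read_split(path, delimiter="\t"):
    """Read a split back as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter=delimiter))
```

Reading the file back with the same delimiter is a quick way to confirm that sentences containing commas stay in a single column.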

@bernardmizzi
Author

@davidifshk I wasn't able to run the model on the PhraseBank dataset, as I was getting encoding errors on both Windows and Ubuntu systems. I therefore opted for another dataset.

@davidifshk

I see. I have already run the model on the PhraseBank dataset; the result is shown below.

image

@bernardmizzi
Author

@davidifshk would it be a problem to share the code you used to open and format the PhraseBank dataset? I was getting encoding errors.
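
The code actually used was not posted in the thread. Below is a minimal sketch of one way to read the file, assuming the PhraseBank text files hold one "sentence@label" entry per line and are encoded as ISO-8859-1 (latin-1) rather than UTF-8, which would explain the encoding errors. Both the function name and these format details are assumptions to verify against the dataset's own README.

```python
def read_phrasebank(path, encoding="latin-1"):
    """Parse lines of the form 'sentence@label' into (text, label) tuples.

    The file layout and the latin-1 encoding are assumptions.
    """
    rows = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the last '@' so an '@' inside the sentence is preserved.
            text, sep, label = line.rpartition("@")
            if not sep:
                continue  # skip lines that have no separator at all
            rows.append((text.strip(), label.strip()))
    return rows
```

The resulting tuples can be passed on to whatever splitting and CSV-writing step is used to build the three data files.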

@saishashank85

I'm trying to use finBERT to classify new articles into several different categories in the banking domain. Which model should I use for classification:
the natural language model or the classification model?
Thanks.

@bernardmizzi
Author

You have to run the notebook FinBERT/notebooks/finbert_training.ipynb, which will train the language model and create a new classification model. If you then continue running the notebook, it will use that model for classification.

@jarasny

jarasny commented Apr 26, 2020

@bernardmizzi Your Google Drive link to the model has expired. Can you re-upload it, please? When I try to download the model from the repository, I get this error:

This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

@bernardmizzi
Author

https://drive.google.com/drive/folders/19j9gFJnDDEH5qebrB-Om5lduqg0zr-Is?usp=sharing

@metodj

metodj commented Apr 27, 2020

Thanks a lot @bernardmizzi! Could you also upload the sentiment model weights?

@bernardmizzi
Author

The model is already pre-trained and can be used. I think the model weights are embedded within the model. To run finBERT, all you need is the PyTorch model .bin file and its config.

@metodj

metodj commented Apr 27, 2020

Indeed, the weights are embedded within the model. It's just that there are two different models in this repo: one is the language model and one is the sentiment model (see picture below). On your drive you uploaded the language model; could you upload the sentiment model too? Thanks!

image

@bernardmizzi
Author

You'll have to run the notebook finbert_training.ipynb, since the model you are asking for is fine-tuned (trained) on a certain dataset, and that depends on which dataset you want.

@clone95

clone95 commented Apr 27, 2020

I actually need it fine-tuned on financial news, so if you can upload the fine-tuned version of the sentiment-analysis one, I'd be glad! Thank you anyway.

@metodj

metodj commented Apr 28, 2020

@bernardmizzi you're right, I didn't go through the README carefully enough to notice that. Thanks for your help!
@clone95 I will fine-tune the model for sentiment analysis in the next few days and can then upload that version.

@akmalsabri

> Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?

Hi, how can this issue be resolved?


9 participants