Error running the configuring parameters cell #14

bernardmizzi · 2020-03-26T10:01:32Z

Good morning,

When I run the configuring parameters cell, I get the following error:

UnpicklingError Traceback (most recent call last)
in
5 pass
6
----> 7 bertmodel = BertForSequenceClassification.from_pretrained(lm_path,cache_dir=None, num_labels=3)
8
9

~/anaconda3/envs/finbert/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
601 if state_dict is None and not from_tf:
602 weights_path = os.path.join(serialization_dir, WEIGHTS_NAME)
--> 603 state_dict = torch.load(weights_path, map_location='cpu')
604 if tempdir:
605 # Clean up temp dir

~/anaconda3/envs/finbert/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
385 f = f.open('rb')
386 try:
--> 387 return _load(f, map_location, pickle_module, **pickle_load_args)
388 finally:
389 if new_fd:

~/anaconda3/envs/finbert/lib/python3.7/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
562 f.seek(0)
563
--> 564 magic_number = pickle_module.load(f, **pickle_load_args)
565 if magic_number != MAGIC_NUMBER:
566 raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, 'v'.

Also, could you explain how to construct the files train.csv, validation.csv and test.csv?

Regards,
Bernard

cskksdfklpz · 2020-03-27T13:26:15Z

Make sure you successfully downloaded the language model (models/language_model/finbertTRC2/pytorch_model.bin should be about 400 MB). Try using git-lfs, or download the model directly from the GitHub web page.
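The 'v' in "invalid load key, 'v'" fits this explanation. If the Git LFS objects were never fetched, pytorch_model.bin is only a small text pointer file whose first line starts with "version https://git-lfs.github.com/spec/v1", so torch.load reads the letter v where it expects a pickle header. Below is a minimal stdlib-only sketch for checking a downloaded file; the helper name check_weights_file and the 100 MB threshold are illustrative assumptions, not part of finBERT.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file (the spec requires this line first).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Heuristic threshold (assumption): the real weights are roughly 400 MB.
MIN_EXPECTED_BYTES = 100 * 1024 * 1024


def check_weights_file(path):
    """Return 'missing', 'lfs-pointer', 'too-small' or 'ok' for a weights file."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        # Only a text pointer was downloaded, not the model weights;
        # unpickling fails on its first byte, 'v'.
        return "lfs-pointer"
    if p.stat().st_size < MIN_EXPECTED_BYTES:
        return "too-small"  # probably a truncated or wrong download
    return "ok"


# Example (path from the comment above):
# print(check_weights_file("models/language_model/finbertTRC2/pytorch_model.bin"))
```

Anything other than "ok" means the file needs to be downloaded again, for example manually through the browser.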

bernardmizzi · 2020-03-27T13:59:43Z

Thanks for your feedback.

Also, how can I construct the files train.csv, validation.csv and test.csv?

bernardmizzi · 2020-03-27T14:34:19Z

Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it into those files?
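There is no single required way to build these files, but a split can be sketched as follows. This is a minimal sketch under assumptions the thread does not confirm: the labeled data is already a list of (text, label) pairs, the files are tab-separated with a text/label header (a later comment in this thread found ',' to be the wrong separator), and an 80/10/10 split is acceptable. The function name split_and_write is hypothetical; compare the output with what the notebook's get_data() expects before training.

```python
import csv
import random
from pathlib import Path


def split_and_write(rows, out_dir, seed=0, train_frac=0.8, valid_frac=0.1):
    """Shuffle (text, label) rows and write train/validation/test files.

    Assumption: tab-separated output with a 'text' and 'label' header.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # reproducible shuffle
    n_train = int(len(rows) * train_frac)
    n_valid = int(len(rows) * valid_frac)
    splits = {
        "train.csv": rows[:n_train],
        "validation.csv": rows[n_train:n_train + n_valid],
        "test.csv": rows[n_train + n_valid:],  # remainder goes to test
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, part in splits.items():
        with open(out / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["text", "label"])
            writer.writerows(part)
    # Number of examples written to each file.
    return {name: len(part) for name, part in splits.items()}
```

Shuffling with a fixed seed keeps the split reproducible between runs, and using the csv module means sentences containing quotes or separators are escaped rather than breaking the columns.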

davidifshk · 2020-03-30T09:01:11Z

Hi all,
I also ran into this problem when running the configuring parameters cell. I then tried to download pytorch_model.bin with git-lfs but got this error, which seems to be a service limit. Any help would be appreciated.
Many thanks!

bernardmizzi · 2020-03-30T09:05:13Z

Take a look at #8.

davidifshk · 2020-03-30T09:19:00Z

Thanks for your help!
But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin,
the file I got is also the small 134-byte one, not the original 400 MB one.

l0rem1psum · 2020-03-30T09:48:03Z

> Thanks for your help!
> But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin,
> the file I got is also the small 134-byte one, not the original 400 MB one.

When I ran git lfs pull, it told me:

"batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/ProsusAI/finBERT.git/info/lfs'"

This is probably related to this issue.

bernardmizzi · 2020-03-30T13:18:23Z

You could manually download https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin from the browser; that's what I did.

bernardmizzi · 2020-03-30T13:20:03Z

> Thanks for your help!
> But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin,
> the file I got is also the small 134-byte one, not the original 400 MB one.

Did you try manually downloading the file in the browser from https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin? It worked for me, and the downloaded file is approximately 400 MB.

l0rem1psum · 2020-03-30T13:21:40Z

> You could manually download https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin from the browser; that's what I did.

Can you share your local copy of the model file? This method no longer works due to GitHub bandwidth restrictions. I can download the file, but it's only 134 bytes. Thank you.

bernardmizzi · 2020-03-30T13:28:29Z

Yeah sure

https://drive.google.com/drive/folders/1Y7pS_P4Bui7pZXKp04aPjb1c62CQUZT2?usp=sharing

ok?

bernardmizzi · 2020-03-30T13:29:28Z

@davidifshk you can also use my link if you want ^

l0rem1psum · 2020-03-30T13:30:26Z

@bernardmizzi Thank you. This is going to benefit more people with the same issue.

bernardmizzi · 2020-03-30T13:31:50Z

No problem, glad I could help

l0rem1psum · 2020-03-30T13:42:07Z

@bernardmizzi

Sorry to ask again, but could you please also share the model under classifier_model/finbert-sentiment? I believe that one can't be downloaded either. I really appreciate your help!

bernardmizzi · 2020-03-30T13:47:01Z

That model is created by training on a specific dataset, so you'll have to run the notebook finBERT/notebooks/finbert_training.ipynb yourself; mine was trained on my own data. If you want, I'll give you mine, but it was trained on Reddit news headlines and, unsurprisingly, reported very low accuracy.

l0rem1psum · 2020-03-30T13:49:29Z

That's okay. Thank you very much!

bernardmizzi · 2020-03-30T13:51:45Z

If you need help running the notebook, just send me a message; I got it up and running.

davidifshk · 2020-03-31T01:37:58Z

> @davidifshk you can also use my link if you want ^

It works! Thank you very much! I'm going to run the training with the dataset from FinancialPhraseBank first.

davidifshk · 2020-03-31T04:25:34Z

> Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it into those files?

Could you clarify the expected data structure of train.csv? I got an error when running the cell 'get_data()'. Here is the data structure of my train.csv. Is there anything wrong?

davidifshk · 2020-03-31T06:41:34Z

> > Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it into those files?
>
> Could you clarify the expected data structure of train.csv? I got an error when running the cell 'get_data()'. Here is the data structure of my train.csv. Is there anything wrong?

Fixed: I had used the wrong separator (',') when exporting the CSV file.
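Since a wrong separator only surfaces as an error inside get_data(), it can help to check each file's header first. A minimal sketch, assuming tab-separated files whose header contains 'text' and 'label' columns; the helper name header_uses_tabs and the column names are assumptions to adapt to what the notebook actually reads.

```python
from pathlib import Path


def header_uses_tabs(path, expected=("text", "label")):
    """Return True if the header line splits on tabs into the expected columns.

    Assumption: the data files are tab-separated with 'text' and 'label'
    columns; change `expected` if the notebook uses different names.
    """
    with Path(path).open(encoding="utf-8") as f:
        header = f.readline().rstrip("\r\n")
    columns = header.split("\t")
    # A comma-separated header stays one field here, e.g. ['text,label'],
    # so the expected names are not found and the check fails.
    return all(name in columns for name in expected)


# Example:
# for name in ("train.csv", "validation.csv", "test.csv"):
#     print(name, header_uses_tabs(name))
```

A leading unnamed index column, as pandas' to_csv writes by default, still passes this check because only the presence of the expected names is tested.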

bernardmizzi · 2020-03-31T14:17:02Z

@davidifshk I wasn't able to run the model on the PhraseBank dataset, as I was getting encoding errors on both Windows and Ubuntu, so I opted for another dataset.

davidifshk · 2020-04-02T07:17:40Z

I see. I have already run the model on the PhraseBank dataset; the result is shown below.

bernardmizzi · 2020-04-02T09:12:31Z

@davidifshk could you share the code you used to open and format the PhraseBank dataset? I was getting encoding errors.
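Regarding the encoding errors, here is a minimal sketch for reading a Financial PhraseBank file. It assumes each line holds one sentence followed by '@' and its label (for example in Sentences_50Agree.txt) and that the files are Latin-1 rather than UTF-8 encoded. Both are assumptions to verify against your copy of the dataset, and read_phrasebank is a hypothetical helper, not part of finBERT. Splitting on the last '@' keeps any '@' inside the sentence intact.

```python
from pathlib import Path

# Assumption: PhraseBank's three sentiment labels (the notebook loads num_labels=3).
LABELS = {"positive", "negative", "neutral"}


def read_phrasebank(path, encoding="latin-1"):
    """Parse 'sentence@label' lines into (sentence, label) tuples.

    Assumptions: one example per line, label after the last '@', and
    Latin-1 encoding. Latin-1 maps every byte to a character, so decoding
    never raises UnicodeDecodeError.
    """
    rows = []
    with Path(path).open(encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # rpartition splits on the LAST '@', keeping any '@' in the text.
            sentence, sep, label = line.rpartition("@")
            if not sep or label not in LABELS:
                raise ValueError(f"line {line_no}: unexpected format: {line!r}")
            rows.append((sentence.strip(), label))
    return rows
```

Because Latin-1 never fails to decode, a wrong guess about the encoding shows up as garbled characters rather than an exception, so it is worth spot-checking a few sentences containing non-ASCII characters.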

saishashank85 · 2020-04-03T15:45:27Z

I'm trying to use FinBERT to classify news articles into several different categories in the banking domain. Which model should I use for classification: the language model or the classification model?
Thanks.

bernardmizzi · 2020-04-03T15:57:27Z

You have to run the notebook finBERT/notebooks/finbert_training.ipynb, which trains on top of the language model and creates a new classification model; as you continue running the notebook, it then uses that model for classification.

jarasny · 2020-04-26T12:37:58Z

@bernardmizzi Your Google Drive link to the model has expired; could you re-upload it, please? When I try to download the model from the repository, I get this error:

This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

bernardmizzi · 2020-04-27T13:12:31Z

https://drive.google.com/drive/folders/19j9gFJnDDEH5qebrB-Om5lduqg0zr-Is?usp=sharing

metodj · 2020-04-27T13:18:41Z

Thanks a lot @bernardmizzi ! Could you upload also the sentiment model weights?

bernardmizzi · 2020-04-27T13:31:37Z

The model is already pre-trained and can be used. I think the weights are contained within the model file. To run FinBERT, all you need is the PyTorch model .bin file and its config.

metodj · 2020-04-27T14:03:48Z

Indeed, the weights are embedded within a model. It's just that there are two different models in this repo: a language model and a sentiment model (see picture below). You uploaded the language model to your Drive; could you upload the sentiment model too? Thanks!

bernardmizzi · 2020-04-27T17:48:32Z

You'll have to run the notebook finbert_training.ipynb, since the model you are asking for is fine-tuned (trained) on a specific dataset, and that depends on which dataset you want.

clone95 · 2020-04-27T18:34:12Z

I actually need it fine-tuned on financial news, so if you can upload the fine-tuned version of the sentiment-analysis one, I'd be glad! Thank you anyway.

metodj · 2020-04-28T06:05:04Z

@bernardmizzi you're right; I didn't go through the README carefully enough to notice that. Thanks for your help!
@clone95 I will fine-tune the model for sentiment analysis in the coming days and can then upload that version.

akmalsabri · 2020-06-30T02:28:42Z

> Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it into those files?

Hi, how can this be resolved?

bernardmizzi closed this as completed Mar 30, 2020

l0rem1psum mentioned this issue on Mar 30, 2020 in "Unable to download models #15" (Closed)
