
FinBERT

FinBERT is a BERT model pre-trained on financial communication text. Its purpose is to enhance financial NLP research and practice. It is trained on the following three financial communication corpora, with a total size of 4.9B tokens:

  • Corporate Reports 10-K & 10-Q: 2.5B tokens
  • Earnings Call Transcripts: 1.3B tokens
  • Analyst Reports: 1.1B tokens

FinBERT achieves state-of-the-art performance on financial sentiment classification, a core financial NLP task. With the release of FinBERT, we hope practitioners and researchers can apply it to a wider range of applications where the prediction target goes beyond sentiment, such as financial outcomes including stock returns, stock volatility, and corporate fraud.

Download FinBERT

We provide four versions of pre-trained weights, one for each vocabulary listed below: FinBERT-FinVocab-Uncased, FinBERT-FinVocab-Cased, FinBERT-BaseVocab-Uncased, and FinBERT-BaseVocab-Cased.

FinVocab is a new WordPiece vocabulary built from our financial corpora using the SentencePiece library. We produce both cased and uncased versions of FinVocab, with sizes of 28,573 and 30,873 tokens respectively. This is very close to the 28,996 and 30,522 token sizes of the original BERT cased and uncased BaseVocab.
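For readers who want to build a similar vocabulary, a minimal sketch with the SentencePiece trainer might look like the following. The corpus path, output prefix, and model type are illustrative assumptions, not the authors' exact settings.

```python
# Minimal sketch: training a subword vocabulary on a financial corpus with SentencePiece.
# "financial_corpus.txt" is a hypothetical file with one sentence per line.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="financial_corpus.txt",
    model_prefix="finvocab-uncased",  # writes finvocab-uncased.model / .vocab
    vocab_size=30873,                 # uncased FinVocab size reported above
    model_type="bpe",                 # assumption: SentencePiece's closest match to WordPiece
)
```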

Using FinBERT for financial sentiment classification

Financial sentiment classification is a core NLP task in finance. FinBERT is shown to outperform the vanilla BERT model on several financial sentiment classification tasks. Since FinBERT uses the same format as BERT, please refer to Google's BERT repo for downstream tasks.

As a demonstration, we provide a script for fine-tuning FinBERT on the Financial Phrase Bank dataset, a financial sentiment classification dataset. We also provide a Jupyter notebook showing how to load a fine-tuned model and use it to predict on novel sentences. The notebook compares two models, FinBERT-FinVocab-Uncased and a Naive Bayes model, both fine-tuned on the 10K HKUST dataset, as mentioned in the paper.
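As a rough illustration of the prediction step, the sketch below loads a fine-tuned checkpoint with the Hugging Face transformers library. The checkpoint path and the label order are assumptions; the notebook in this repo remains the authoritative reference.

```python
# Minimal sketch: scoring a sentence with a fine-tuned FinBERT checkpoint.
# The checkpoint directory and the label ordering are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_dir = "path/to/finbert-finvocab-uncased"  # hypothetical local checkpoint
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=3)
model.eval()

inputs = tokenizer("Profits rose sharply in the third quarter.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
label = ["negative", "neutral", "positive"][logits.argmax(dim=-1).item()]
print(label)
```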

Downloading Financial Phrase Bank Dataset

The dataset was collected by Malo et al. (2014) and can be downloaded from this link. The zip file for the Financial Phrase Bank dataset has been provided for ease of download and use.

Environment:

To set up the environment used to train and test the model, run `pip install -r requirements.txt`.

We would like to give special thanks to the creators of pytorch-pretrained-bert (i.e., pytorch-transformers).

To fine-tune FinBERT on the Financial Phrase Bank dataset, run the script as follows:

```bash
python train_bert.py --cuda_device (cuda:device_id) --output_path (output directory) --vocab (vocab chosen) \
    --vocab_path (path to new vocab txt file) --data_dir (path to downloaded dataset) --weight_path (path to downloaded weights)
```

There are four vocabularies to choose from: FinVocab-Uncased, FinVocab-Cased, and Google's BERT Base-Uncased and Base-Cased.

Note that to run the script, one should first download the model weights and the Financial Phrase Bank dataset.
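For concreteness, an invocation of the script above might look like the following; every path here is hypothetical and should be adjusted to your local setup.

```bash
# Illustrative invocation with hypothetical paths
python train_bert.py \
    --cuda_device cuda:0 \
    --output_path ./output \
    --vocab FinVocab-Uncased \
    --vocab_path ./FinVocab/FinVocab-Uncased.txt \
    --data_dir ./FinancialPhraseBank \
    --weight_path ./weights/FinBERT-FinVocab-Uncased
```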

Citation

```bibtex
@misc{yang2020finbert,
    title={FinBERT: A Pretrained Language Model for Financial Communications},
    author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
    year={2020},
    eprint={2006.08097},
    archivePrefix={arXiv},
}
```

Contact

Please post a GitHub issue or contact imyiyang@ust.hk if you have any questions.
