This repository contains the code and pre-trained company embeddings from the paper "Learning Company Embeddings from Annual Reports for Fine-grained Industry Characterization" (FinNLP 2020).
The instructions below describe how to obtain company embeddings from a given corpus.
Requirements:
- Python 3.8.2
You can download the dataset and models from the following link:
Training and evaluation are provided as separate notebooks for each language:
*-training (English)*: English/learning.ipynb.
*-evaluation (English)*: English/evaluation.ipynb.
*-training (Japanese)*: Japanese/learning.ipynb.
*-evaluation (Japanese)*: Japanese/evaluation.ipynb.
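The notebooks listed above can be opened interactively, but it may be handy to run them from the command line. The following is a minimal sketch, assuming Jupyter and nbconvert are installed (this README does not list them as requirements). The helper names `notebook_path` and `nbconvert_command` are hypothetical and not part of this repository. The sketch only builds and prints the commands; it does not execute them.

```python
from pathlib import Path

# Notebook layout as listed in this README.
LANGUAGES = ("English", "Japanese")
STAGES = ("learning", "evaluation")


def notebook_path(language: str, stage: str) -> Path:
    """Return the relative path of a notebook, e.g. English/learning.ipynb."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return Path(language) / f"{stage}.ipynb"


def nbconvert_command(language: str, stage: str) -> list:
    """Build (but do not run) a command that executes a notebook headlessly.

    Assumption: the `jupyter` executable and nbconvert are available on PATH.
    """
    path = notebook_path(language, stage)
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", path.as_posix()]


if __name__ == "__main__":
    # Print one command per notebook: training first, then evaluation.
    for language in LANGUAGES:
        for stage in STAGES:
            print(" ".join(nbconvert_command(language, stage)))
```

Run the learning notebook for a language before its evaluation notebook, since evaluation presumably depends on the embeddings produced during training.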
If you use any of these resources, please cite the following paper:
@InProceedings{tomokicompany2vec,
author = "Tomoki Ito and Jose Camacho Collados and Hiroki Sakaji and Steven Schockaert",
title = "Learning Company Embeddings from Annual Reports for Fine-grained Industry Characterization",
booktitle = "Proceedings of The 2nd Workshop on Financial Technology and Natural Language Processing",
year = "2020"
}
Code and data in this repository are released as open source.
Copyright (C) 2020, Tomoki Ito.