Semrel Extraction

This repository contains the code used in research on extracting semantic relations (brand-product). The research and its results are described in the paper "Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations", published at the LREC 2020 conference.

The original repository used during development can be found here.

Frameworks

Two frameworks are used in the project: DVC for versioning the datasets and mlflow for tracking experiments. Getting familiar with both will make the project much easier to manage.

Project setup

To set up the project on your machine, run the following commands.

Clone the repository:
$ git clone https://gitlab.clarin-pl.eu/team-semantics/semrel-extraction.git

Enter the main folder:
$ cd semrel-extraction

Download the datasets for the current commit:
$ dvc pull

Then enter the docker folder:
$ cd docker

Copy deps/credentials.template to deps/credentials and fill it in with the correct access keys:
$ cp deps/credentials.template deps/credentials

Start Docker:
$ docker-compose up
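Before running the steps above, it can help to confirm that the required command-line tools are installed. The following is a minimal sketch, assuming the tools are invoked under the names used in the commands (git, dvc, docker-compose); the script and its `missing_tools` function are hypothetical and not part of the repository. Newer Docker installations may provide `docker compose` as a plugin instead of a separate `docker-compose` binary, in which case that entry would be reported as missing.

```python
import shutil

# Command-line tools invoked by the setup steps (assumed names).
REQUIRED_TOOLS = ("git", "dvc", "docker-compose")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```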

Repository packages

The repository also contains code for additional functionality:

docker - Docker configuration and execution environment for the semrel package.
mlflow - configuration and execution environment for the mlflow server used for tracking experiments.
spert - scripts used to prepare the dataset in the format required to train the SpERT model.
worker - scripts and execution environment for running a trained model as a worker.

FAQ

Where is data stored?

Data is versioned with DVC, which works like git but for data. All data is stored on remote storage (https://minio.clarin-pl.eu/minio/semrel/) in the dvc folder. To retrieve the data, run:

$ git checkout [branch_name]
$ dvc pull

DVC will download all data related to the current commit.
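If you switch branches often, the two steps can be wrapped in a small script. This is a sketch, assuming it is run from the repository root; `checkout_with_data` is a hypothetical helper, not part of the repository. It uses `dvc pull`, which fetches the data from the remote and checks it out into the workspace (plain `dvc checkout` only restores files already present in the local cache).

```python
import subprocess


def checkout_with_data(branch, run=subprocess.run):
    """Switch to `branch` and pull the DVC-tracked data for that commit.

    `run` defaults to subprocess.run; it is a parameter only so the
    command sequence can be inspected without a real repository.
    """
    commands = [
        ["git", "checkout", branch],
        ["dvc", "pull"],  # fetch from the remote and update the workspace
    ]
    for cmd in commands:
        run(cmd, check=True)  # stop on the first failing command
    return commands
```

Usage: `checkout_with_data("master")` runs both commands in order and raises `subprocess.CalledProcessError` if either fails.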

How to train and test a model?

Training is started by the script semrel/model/train.sh. Adjust the training parameters in semrel/model/config.yaml, then run the script from the semrel/model directory:
$ cd semrel/model
$ ./train.sh

Training results are uploaded to the mlflow server automatically.

Do I need to set up anything on my machine?

Yes. For mlflow to log artifacts properly, set the following environment variable; otherwise mlflow tries to reach the default Amazon S3 endpoint instead of the project's storage.

$ export MLFLOW_S3_ENDPOINT_URL=https://minio.clarin-pl.eu
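When runs are started from Python code rather than from a shell with the variable exported, it can also be set inside the process. A minimal sketch, assuming the variable is set before mlflow uploads its first artifact; `ensure_s3_endpoint` is a hypothetical helper that leaves an already exported value untouched.

```python
import os

# Artifact storage endpoint taken from the export command above.
MINIO_ENDPOINT = "https://minio.clarin-pl.eu"


def ensure_s3_endpoint(default=MINIO_ENDPOINT):
    """Set MLFLOW_S3_ENDPOINT_URL unless already defined; return the value in effect."""
    return os.environ.setdefault("MLFLOW_S3_ENDPOINT_URL", default)
```

Using `setdefault` means an endpoint exported in the shell (for example a local test server) still takes precedence over the hard-coded default.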

Also add a credentials file filled in with the correct keys (replace access_key and secret_key with your own):

$ mkdir -p ~/.aws
$ echo "[default]" > ~/.aws/credentials
$ echo "aws_access_key_id = access_key" >> ~/.aws/credentials
$ echo "aws_secret_access_key = secret_key" >> ~/.aws/credentials
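The same file can be produced from Python with the standard library, which avoids quoting problems when keys contain shell-special characters. This is a sketch; `write_aws_credentials` is a hypothetical helper. Like the first `echo ... >` command, it overwrites any existing file, and it additionally restricts the file's permissions to the owner.

```python
import configparser
import os
from pathlib import Path


def write_aws_credentials(access_key, secret_key, path=None):
    """Write a [default] profile with the given keys, replacing the file."""
    path = Path(path) if path is not None else Path.home() / ".aws" / "credentials"
    path.parent.mkdir(parents=True, exist_ok=True)  # ~/.aws may not exist yet
    config = configparser.ConfigParser()
    config["default"] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    with open(path, "w") as fh:
        config.write(fh)  # writes "key = value" lines under [default]
    os.chmod(path, 0o600)  # keep the secret readable by the owner only
    return path
```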

lkopocinski/semrel-extraction

Folders and files

Latest commit

History

Repository files navigation

Semrel Extraction

Frameworks

Setup project

Repository packages

FAQ

Where is data stored?

How to train and test a model?

Do I need to setup anything on my machine?

How it works?

About

Topics

Resources

Stars

Watchers

Forks

Languages