This repository contains the codebase used in research on the extraction of semantic relations (brand-product). The research and its results are described in the paper "Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations", published at the LREC 2020 conference.
The original repository used during development can be found here.
Two frameworks were used in the project: DVC for versioning the datasets and mlflow for tracking experiments. To manage the project with ease, consider familiarizing yourself with them.
To set up the project on your machine, run the following commands.
Download the repository:
$ git clone https://gitlab.clarin-pl.eu/team-semantics/semrel-extraction.git
Enter the main folder:
$ cd semrel-extraction
Download the datasets associated with the current commit:
$ dvc pull
Then enter the docker folder:
$ cd docker
Copy credentials.template to a credentials file and fill it in with the correct access keys:
$ cp deps/credentials.template deps/credentials
Start docker:
$ docker-compose up
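Before running the steps above, it may help to confirm that the required command-line tools are installed. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the tool list (git, dvc, docker-compose) is taken from the setup commands.

```shell
#!/bin/sh
# Print the name of every tool from the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
    return 0
}

# Tools used by the setup commands
missing=$(missing_tools git dvc docker-compose)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools are available."
fi
```

The check only reports what is missing; it does not install anything or verify tool versions.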
The repository also contains code for additional functionality:
docker - docker configuration and execution environment for the semrel package.
mlflow - configuration and execution environment for the mlflow server used for tracking experiments.
spert - scripts used to prepare the dataset in the format required to train the SpERT model.
worker - scripts and execution environment for using a trained model as a worker.
Data is versioned with DVC, which works like git but for data. All data is stored on the remote storage (https://minio.clarin-pl.eu/minio/semrel/) in the dvc folder. To retrieve the data, execute:
$ git checkout [branch_name]
$ dvc checkout
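DVC will restore all data associated with the current commit. If the files are not yet in the local DVC cache, dvc pull downloads them from the remote storage.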
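The retrieval steps can be wrapped in a small helper. This is a sketch under stated assumptions: the names fetch_data, run, and DRY_RUN are hypothetical, and the helper uses dvc pull (which fetches from the remote storage and then checks the data out) so that it also works when the local DVC cache is empty.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
# run, fetch_data, and DRY_RUN are hypothetical names, not part of the repository.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed
        echo "$*"
    else
        "$@"
    fi
}

# fetch_data BRANCH: switch to BRANCH and bring the DVC-tracked data in sync.
fetch_data() {
    branch=${1:-}
    if [ -z "$branch" ]; then
        echo "usage: fetch_data BRANCH" >&2
        return 1
    fi
    # Stop if the git checkout fails, so data is never pulled for the wrong commit
    run git checkout "$branch" &&
    run dvc pull
}
```

With DRY_RUN set to 1, calling fetch_data with a branch name prints the two commands instead of executing them, which is a cheap way to check the helper before touching the working tree.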
The script semrel/model/train.sh starts training.
Adjust the training parameters in semrel/model/config.yaml and then execute:
$ ./train.sh
The training results are automatically uploaded to the mlflow server.
To make mlflow log artifacts properly, set the following environment variable; otherwise mlflow tries to reach the original Amazon S3 storage.
$ export MLFLOW_S3_ENDPOINT_URL=https://minio.clarin-pl.eu
Also add a config file filled with the correct credentials:
$ mkdir -p ~/.aws
$ echo "[default]" > ~/.aws/credentials
$ echo "aws_access_key_id = access_key" >> ~/.aws/credentials
$ echo "aws_secret_access_key = secret_key" >> ~/.aws/credentials
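The same credentials file can also be written in a single step. A minimal sketch, assuming the same file layout as the commands above: the function name write_aws_credentials and its optional target-path argument are hypothetical. It also creates the target directory when it is missing and restricts the file permissions to the owner.

```shell
#!/bin/sh
# write_aws_credentials KEY_ID SECRET [FILE]
# Hypothetical helper: writes a [default] profile to FILE
# (defaults to ~/.aws/credentials), overwriting any existing file.
write_aws_credentials() {
    key_id=$1
    secret=$2
    file=${3:-"$HOME/.aws/credentials"}

    # Make sure the target directory exists
    mkdir -p "$(dirname "$file")" || return 1

    # Write the profile in the same format as the echo commands
    cat > "$file" <<EOF
[default]
aws_access_key_id = $key_id
aws_secret_access_key = $secret
EOF

    # Credentials should be readable by the owner only
    chmod 600 "$file"
}
```

Calling write_aws_credentials with the real access key and secret key as its two arguments replaces the three echo commands. Note that it overwrites an existing credentials file, so back up any other profiles first.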