
Research on the extraction of semantic relations (brand-product). Code for paper: "Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations"


lkopocinski/semrel-extraction


Semrel Extraction

This repository contains the codebase used in research on the extraction of semantic relations (brand-product). The research description and results are included in the paper "Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations", published at the LREC 2020 conference.

The original repository used during development can be found here.

Frameworks

Two frameworks are used in the project: DVC for versioning the datasets and mlflow for tracking experiments. To manage the project with ease, consider familiarizing yourself with both.

Setup project

To set up the project on your machine, run the following commands.

Download repository:
$ git clone https://gitlab.clarin-pl.eu/team-semantics/semrel-extraction.git

Enter main folder:
$ cd semrel-extraction

Download the datasets related to the current commit:
$ dvc pull

Then enter the docker folder:
$ cd docker

Copy credentials.template to a credentials file and fill it in with the correct access keys:
$ cp deps/credentials.template deps/credentials

Start docker:
$ docker-compose up
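
The setup steps above rely on git, dvc and docker-compose being installed. As a minimal sketch (the helper name missing_tools is hypothetical and not part of this repository), the following check reports which of these tools cannot be found before you start:

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# Prints the name of every given command that is not available on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The setup steps use git, dvc and docker-compose.
missing=$(missing_tools git dvc docker-compose)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

Install any tool the check reports as missing before running the commands above.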

Repository packages

The repository also contains code for additional functionality:

docker - docker configuration and execution environment for semrel package.
mlflow - configuration and execution environment for mlflow server used for tracking experiments.
spert - scripts used to prepare dataset in format required to train SpERT model.
worker - scripts and execution environment to use trained model as a worker.

FAQ

Where is data stored?

Data is versioned with DVC, which works like git but for data. All data is stored on the remote storage (https://minio.clarin-pl.eu/minio/semrel/) in the dvc folder. To retrieve the data, execute:

$ git checkout [branch_name]
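$ dvc checkout

DVC will restore the data related to the current commit. If the files are not yet in the local cache, run dvc pull to download them from the remote storage.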
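
The branch switch and the data retrieval can be combined into one step. The sketch below is only an illustration: sync_data is a hypothetical helper name, and it uses dvc pull, which downloads missing files from the remote and places them in the workspace, so it also works when the local cache is empty.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# Switches to a branch and fetches the DVC-tracked data that belongs to it.
sync_data() {
    branch="$1"
    if [ -z "$branch" ]; then
        echo "usage: sync_data <branch_name>" >&2
        return 2
    fi
    # Update the code and the DVC pointer files first.
    git checkout "$branch" || return 1
    # Download missing data from the remote and check it out into the workspace.
    dvc pull || return 1
}
```

Usage, from the repository root: `sync_data [branch_name]`. The function stops at the first failing command, so data is never pulled for a branch that could not be checked out.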

How to train and test a model?

The script semrel/model/train.sh starts training. Adjust the training parameters in semrel/model/config.yaml and then execute the script from the semrel/model folder:
$ ./train.sh

Training results are uploaded to the mlflow server automatically.

Do I need to set up anything on my machine?

Yes. For mlflow to log artifacts properly, set the following environment variable; otherwise mlflow will try to reach the original Amazon S3 storage.

$ export MLFLOW_S3_ENDPOINT_URL=https://minio.clarin-pl.eu

Also add a credentials file with your access keys (replace access_key and secret_key with the real values):

$ mkdir -p ~/.aws
$ echo "[default]" > ~/.aws/credentials
$ echo "aws_access_key_id = access_key" >> ~/.aws/credentials
$ echo "aws_secret_access_key = secret_key" >> ~/.aws/credentials
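
Before starting a training run, it can help to verify that both pieces of configuration are in place. The following is a minimal sketch, assuming the variable name and file layout described above; check_mlflow_env is a hypothetical helper, not part of this repository, and it only checks that the settings exist, not that the keys are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of the repository.
# Returns 0 only when the endpoint variable is set and the credentials
# file contains both key entries. An optional argument overrides the file path.
check_mlflow_env() {
    cred_file="${1:-$HOME/.aws/credentials}"
    status=0
    if [ -z "${MLFLOW_S3_ENDPOINT_URL:-}" ]; then
        echo "MLFLOW_S3_ENDPOINT_URL is not set" >&2
        status=1
    fi
    if [ ! -f "$cred_file" ]; then
        echo "credentials file not found: $cred_file" >&2
        return 1
    fi
    for key in aws_access_key_id aws_secret_access_key; do
        # Each key must appear at the start of a line, followed by "=".
        if ! grep -q "^${key}[[:space:]]*=" "$cred_file"; then
            echo "missing $key in $cred_file" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_mlflow_env && ./train.sh` runs the training script only when the check passes.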

How does it work?

Project diagram
