This repo contains the code used to run the experiments for Semi-supervised Ensemble Learning with Weak Supervision for Biomedical Relationship Extraction, presented at the Automated Knowledge Base Construction 2019 conference in Amherst, Massachusetts.
This methodology can be applied as is to any relationship extraction problem to extend training datasets into arbitrarily large weakly supervised datasets. If you use it, please cite our paper.
The code is based on Snorkel v0.6.2, a framework for information extraction using weak supervision.
Snorkel uses Python 2.7 or Python 3 and requires a few Python packages, which can be installed using conda and pip.
Installation is easiest if you download and install conda.
You can create a new conda environment with e.g.:
conda create -n py2Env python=2.7 anaconda
Then activate the environment:
source activate py2Env
First, install Numba, a package for high-performance numeric computing in Python, via conda:
conda install numba
Then install the remaining package requirements:
pip install --requirement python-package-requirement.txt
Finally, enable ipywidgets:
jupyter nbextension enable --py widgetsnbextension --sys-prefix
Note: If you are using conda and experience issues with lxml, try running conda install libxml2.
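One way to tell whether the lxml note applies is to try importing lxml from the active environment. The sketch below is only an illustration: check_lxml is a hypothetical helper name, not part of this repo or of Snorkel, and it assumes the interpreter to test is python unless another one is passed as the first argument.

```shell
# Hypothetical helper (not part of this repo): try to import lxml with the
# given interpreter (default: python) and print the libxml2 version it uses.
check_lxml() {
  "${1:-python}" -c 'from lxml import etree; print(etree.LIBXML_VERSION)' 2>/dev/null \
    || { echo "lxml import failed; consider: conda install libxml2"; return 1; }
}
```

Calling check_lxml with no arguments after source activate py2Env tests the environment's own interpreter. A non-zero exit status means the import failed.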
Note: Currently the Viewer is supported on the following versions:
jupyter: 4.1
jupyter notebook: 4.2
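To compare an installed notebook version against the versions listed above, a small prefix check can help. This is a sketch under stated assumptions: version_has_prefix is a hypothetical helper, not part of this repo, and the check only looks at the notebook package as reported by jupyter notebook --version.

```shell
# Hypothetical helper (not part of this repo): succeeds when version "$1"
# equals "$2" or starts with "$2." (so 4.2.3 matches 4.2, but 4.20.0 does not).
version_has_prefix() {
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the installed notebook version, if any, with the listed 4.2 series.
if command -v jupyter >/dev/null 2>&1; then
  installed="$(jupyter notebook --version 2>/dev/null || true)"
  if version_has_prefix "$installed" "4.2"; then
    echo "jupyter notebook $installed: listed as supported by the Viewer"
  else
    echo "jupyter notebook ${installed:-unknown}: not in the listed 4.2 series"
  fi
else
  echo "jupyter not found on PATH"
fi
```

The match is done on whole version components rather than raw string prefixes, so a release such as 4.20 is not mistaken for the 4.2 series.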
Some tutorials also use Stanford CoreNLP to pre-process text; you will be prompted to install it when you run run.sh.
After installing, just run:
./run_local.sh
The code for the semi-supervised learning experiments (which use ML models as weak sources of supervision) can be found in /my-code/.