
Snorkel Tutorials

A collection of tutorials for Snorkel. For more information, visit the Snorkel website.

Tutorials

We recommend that all users start with the Getting Started tutorial hosted on the Snorkel website for a gentle introduction to the concepts and classes of Snorkel. All other tutorials assume that you have already completed that tutorial and are familiar with its concepts. After that, the spam directory contains three tutorials introducing labeling functions, transformation functions, and slicing functions, respectively. The rest of the tutorials explore other tasks, techniques, and integrations.

The Snorkel tutorials are grouped by application:

  • spam: Is this YouTube comment spam?
  • spouse: Does this sentence imply that the two marked people are spouses?
  • visual_relation: Is object A riding object B in the image, carrying it, or neither?
  • crowdsourcing: Is this tweet about the weather expressing a positive, negative or neutral sentiment?
  • multitask (Multi-Task Learning): A synthetic task demonstrating the native Snorkel multi-task classifier API
  • recsys (Recommender Systems): Will this user read and like this book?
  • drybell: Is a celebrity mentioned in this news article?

The index below lists the available tutorials by task type, technique, and integration.

  • Task
    • Text Classification (Text): spam, crowdsourcing, drybell
    • Relation Extraction (Text): spouse
    • Visual Relationship Detection (Image): visual_relation
    • Recommender Systems: recsys
  • Techniques
    • Labeling with Labeling Functions (LFs): spam, spouse, visual_relation, crowdsourcing
    • Augmentation with Transformation Functions (TFs): spam
    • Monitoring with Slicing Functions (SFs): spam
    • Using Crowdworker Labels: crowdsourcing
    • Multi-Task Learning (MTL): multitask, spam
  • Integrations
    • TensorFlow/Keras: spam, spouse
    • Scikit-learn: spam, crowdsourcing
    • PyTorch: multitask, visual_relation
    • Dask: drybell
    • Spark: drybell
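
If you want to go the other way and see everything a single tutorial covers, the index can be treated as a small lookup table. The sketch below copies the entries listed above into a dictionary; the names `INDEX` and `topics_for` are hypothetical and are not part of this repo.

```python
# Index from this README: category -> entry -> tutorial directories.
INDEX = {
    "Task": {
        "Text Classification": ["spam", "crowdsourcing", "drybell"],
        "Relation Extraction": ["spouse"],
        "Visual Relationship Detection": ["visual_relation"],
        "Recommender Systems": ["recsys"],
    },
    "Techniques": {
        "Labeling Functions": ["spam", "spouse", "visual_relation", "crowdsourcing"],
        "Transformation Functions": ["spam"],
        "Slicing Functions": ["spam"],
        "Crowdworker Labels": ["crowdsourcing"],
        "Multi-Task Learning": ["multitask", "spam"],
    },
    "Integrations": {
        "TensorFlow/Keras": ["spam", "spouse"],
        "Scikit-learn": ["spam", "crowdsourcing"],
        "PyTorch": ["multitask", "visual_relation"],
        "Dask": ["drybell"],
        "Spark": ["drybell"],
    },
}


def topics_for(tutorial):
    """Return the sorted index entries (across all categories) that list `tutorial`."""
    return sorted(
        entry
        for entries in INDEX.values()
        for entry, tutorials in entries.items()
        if tutorial in tutorials
    )


print(topics_for("drybell"))  # ['Dask', 'Spark', 'Text Classification']
```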

Getting Started

First, clone this repo:

git clone https://github.com/snorkel-team/snorkel-tutorials.git
cd snorkel-tutorials

As with Snorkel, our tutorials require Python 3.6+. If you're looking to quickly get started with a tutorial, we recommend using our Docker setup. If you want to install things yourself using pip or conda, you can follow our installation steps below instead.
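
If you are installing things yourself, a short check like the following can confirm that your interpreter meets the 3.6+ requirement before you install anything. This is a minimal sketch; the helper name `meets_minimum_version` is hypothetical and is not part of this repo.

```python
import sys

# Minimum version stated in this README (Python 3.6+).
MIN_VERSION = (3, 6)


def meets_minimum_version(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) of version_info is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum_version() else "too old"
    # str.format is used instead of an f-string so the check also runs on older interpreters.
    print("Python {}: {}".format(current, status))
```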

Snorkel version

This tutorials repo is pinned to a specific version of the Snorkel library, which is specified in the requirements file. Note that this will likely not be up to date with the master branch in the main Snorkel repo. We recommend using virtual environments or Docker containers to run the tutorials, so check out the details below.
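
To see which Snorkel version a requirements file pins, you can scan it for the `snorkel` line. The sketch below assumes the pin uses the common `snorkel==X.Y.Z` form; the helper name `pinned_version` and the sample file contents are hypothetical, so check the actual requirements.txt in your checkout.

```python
import re


def pinned_version(requirements_text, package="snorkel"):
    """Return the version pinned with `==` for `package`, or None if not pinned."""
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"\s*==\s*([^\s#;]+)", re.IGNORECASE
    )
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical file contents, for illustration only.
sample = """
# shared requirements
numpy>=1.16
snorkel==0.9.0
"""
print(pinned_version(sample))
```

Against a real checkout, you would pass in the file's text instead, for example `pinned_version(open("requirements.txt").read())`.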

A quick note for Windows users

If you're using Windows, we highly recommend using the Docker setup or the Windows Subsystem for Linux (WSL). Installation can be tricky in application-specific shells (e.g. the conda shell), and the shell scripts included in this repo (such as those for downloading datasets) use *nix-style commands.

Running with Docker

We've included a Docker setup for our tutorials to make setup easy. First, make sure you have Docker installed on your machine. To build and run a Docker image for a tutorial, use scripts/docker_launch.py with the --build flag. For example, run the following for the spam tutorial:

python3 scripts/docker_launch.py spam --build

Building a Docker image from scratch can take anywhere between 5 and 30 minutes depending on the machine you're using. We're working on making prebuilt images available via DockerHub.

Once the image has been built, a Jupyter notebook server starts on port 8888 (you can change the port with the --port command line option) and prints a link you can follow to access the browser interface. In your browser, open the .ipynb file you would like to run (for example, 01_spam_tutorial.ipynb) and execute the cells in sequence.

Once you've built a tutorial-specific image for the first time, you can run it without the --build flag:

python3 scripts/docker_launch.py spam

Installing yourself

Running a tutorial has three required steps if you're installing yourself:

  1. Installing repo-wide requirements
  2. Installing tutorial-specific requirements
  3. Launching a Jupyter notebook server or executing as a script

We recommend installing requirements in a virtual environment using virtualenv or conda.

The following example commands show you how to install the requirements for the spam tutorial, then launch a notebook server to run the tutorial. To run a different tutorial, simply replace spam with the desired directory.

Installing with pip

These commands assume that your Python version is 3.6+ and that the Python 3 version of pip is available as pip3. It may be available as pip depending on how your system is configured.

# [OPTIONAL] Create and activate a virtual environment
pip3 install --upgrade virtualenv
virtualenv -p python3 .envspam
source .envspam/bin/activate

# Install requirements (both shared and tutorial-specific)
pip3 install -r requirements.txt
pip3 install -r spam/requirements.txt

# Launch the Jupyter notebook interface
jupyter notebook spam
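
The pip commands above follow the same pattern for every tutorial: only the directory name changes. The sketch below spells that substitution out; the helper name `pip_setup_commands` is hypothetical, and it only builds the command strings (it does not run them).

```python
def pip_setup_commands(tutorial, pip="pip3"):
    """Build the pip-based setup commands from this README for one tutorial directory."""
    return [
        f"{pip} install -r requirements.txt",             # shared requirements
        f"{pip} install -r {tutorial}/requirements.txt",  # tutorial-specific requirements
        f"jupyter notebook {tutorial}",                   # launch the notebook interface
    ]


for command in pip_setup_commands("spouse"):
    print(command)
```

Pass `pip="pip"` if that is how the Python 3 version of pip is exposed on your system.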

Installing with conda

These commands assume that your conda installation is Python 3.6+.

# [OPTIONAL] Create and activate a conda environment
conda create --yes -n spam python=3.6
conda activate spam

# Install requirements (both shared and tutorial-specific)
pip install environment_kernels
# We specify PyTorch here to ensure compatibility, but it may not be necessary.
conda install pytorch==1.1.0 -c pytorch
conda install snorkel==0.9.0 -c conda-forge
pip install -r spam/requirements.txt

# Launch the Jupyter notebook interface
jupyter notebook spam

Make sure to select the right kernel (conda_spam) when running the Jupyter notebook.

Then, in the browser tab that opens, navigate to the .ipynb file you would like to run (for example, 01_spam_tutorial.ipynb) and execute the cells in sequence.

Alternatively, you can run the tutorial as a script by calling python3 on the corresponding .py file directly (e.g. python3 spam/01_spam_tutorial.py). The .py source files are written in Jupytext percent format, and contain the same content as the notebooks.
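
For readers unfamiliar with the percent format: each cell starts with a `# %%` marker, and Markdown cells are marked `# %% [markdown]` with their text written as comments. The snippet below is a made-up illustration of that layout, not an excerpt from any tutorial file; it runs as a plain Python script.

```python
# %% [markdown]
# # A hypothetical tutorial section
# Markdown cells are stored as comments, so the file stays valid Python.

# %%
# A code cell: everything up to the next `# %%` marker belongs to this cell.
comments = ["check out my channel", "great song"]

# %%
# A second code cell; variables from earlier cells are still in scope.
flagged = [text for text in comments if "my channel" in text]
print(flagged)
```

Because the markers are ordinary comments, running the file with python3 executes the code cells top to bottom and ignores the Markdown.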

Contributing to Snorkel Tutorials

If you're interested in improving existing tutorials or contributing new tutorials, check out our contributing guidelines.
