Update README.md
ajratner committed Dec 6, 2018
1 parent aa6d175 commit f28b0fc
Showing 1 changed file with 7 additions and 14 deletions.
21 changes: 7 additions & 14 deletions README.md
@@ -17,25 +17,18 @@
* Get set up quickly [below](#quick-start)
* Try the tutorials with [these instructions](#tutorials)
* Documentation [here](http://snorkel.readthedocs.io/en/master/)
* For blog posts, tutorials, papers, and more, check out [snorkel.stanford.edu](snorkel.stanford.edu)!

## Motivation

**Check out a recent one-pager about [Snorkel and the Software 2.0 vision](https://cs.stanford.edu/~chrismre/papers/Chris_Re-KDD.pdf)!**

Snorkel is a system for rapidly **creating, modeling, and managing training data**, currently focused on accelerating the development of _structured or "dark" data extraction applications_ for domains in which large labeled training sets are not available or easy to obtain.

Today's state-of-the-art machine learning models require massive labeled training sets--which usually do not exist for real-world applications. Instead, Snorkel is based around the new [data programming](https://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly) paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data. The resulting labels are noisy, but Snorkel automatically models this process—learning, essentially, which labeling functions are more accurate than others—and then uses this to train an end model (for example, a deep neural network in TensorFlow).
Snorkel is a system for rapidly **creating, modeling, and managing training data.**
Today's state-of-the-art machine learning models require _massive_ labeled training sets--which usually do not exist for real-world applications. Instead, Snorkel is based around the new [data programming](https://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly) paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data.
The resulting labels are noisy, but Snorkel automatically models this process—learning, essentially, which labeling functions are more accurate than others—and then uses this to train an end model (for example, a deep neural network in TensorFlow).

_Surprisingly_, by modeling a noisy training set creation process in this way, we can take potentially low-quality labeling functions from the user, and use these to train high-quality end models. We see Snorkel as providing a general framework for many [_weak supervision_](http://hazyresearch.github.io/snorkel/blog/weak_supervision.html) techniques, and as defining a new programming model for weakly-supervised machine learning systems.
_Surprisingly_, by modeling a noisy training set creation process in this way, we can take potentially low-quality labeling functions from the user, and use these to train high-quality end models.
We see Snorkel as providing a general framework for many [_weak supervision_](http://hazyresearch.github.io/snorkel/blog/weak_supervision.html) techniques, and as defining a new programming model for weakly-supervised machine learning systems.
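
As an illustration of the labeling-function idea described above (not part of the README change itself), here is a minimal sketch in plain Python. It does not use Snorkel's API: the function names, the label convention (`1`, `-1`, `0` for abstain), and the spouse-detection task are all assumptions made for the example. It also combines votes with a simple unweighted majority vote, whereas Snorkel learns how accurate each labeling function is and weights the votes accordingly.

```python
# Hypothetical label convention for this sketch only.
POSITIVE, NEGATIVE, ABSTAIN = 1, -1, 0


def lf_married(text):
    """Vote POSITIVE if the sentence mentions marriage, else abstain."""
    return POSITIVE if "married" in text.lower() else ABSTAIN


def lf_spouse_word(text):
    """Vote POSITIVE if the sentence contains an explicit spouse word."""
    lowered = text.lower()
    return POSITIVE if any(w in lowered for w in ("husband", "wife")) else ABSTAIN


def lf_sibling(text):
    """Vote NEGATIVE if the sentence suggests a sibling relationship."""
    lowered = text.lower()
    return NEGATIVE if any(w in lowered for w in ("brother", "sister")) else ABSTAIN


LABELING_FUNCTIONS = [lf_married, lf_spouse_word, lf_sibling]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine noisy votes by unweighted majority; ties and no votes abstain.

    This is a stand-in for Snorkel's learned label model, not a copy of it.
    """
    total = sum(lf(text) for lf in lfs)  # abstentions contribute 0
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN


if __name__ == "__main__":
    for sentence in (
        "Alice married Bob, her husband.",
        "Alice and her brother Bob went home.",
        "Alice met Bob.",
    ):
        print(sentence, "->", majority_vote(sentence))
```

Each function is cheap to write and individually unreliable; the value comes from combining many of them. Replacing `majority_vote` with a model that estimates per-function accuracy is the step the README attributes to Snorkel.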

### Snorkel @ VLDB 2018
<img src="figs/vldb2018_logo.png" width="200" height="85" align="middle" />

We're excited to be presenting on Snorkel at this year's [VLDB conference in Rio De Janeiro](http://vldb2018.lncc.br/), on **Tuesday 8/28 in the "Database Techniques for Machine Learning" session**. We're also honored that the corresponding paper, **_[Snorkel: Rapid Training Data Creation with Weak Supervision](https://arxiv.org/abs/1711.10160)_**, has been invited to the annual **"Best Of VLDB" Special Issue!**

* [Poster](https://ajratner.github.io/assets/papers/Snorkel_VLDB_2018_poster.pdf)
* Slides [Coming Soon]
* Check out the [Intro Tutorial](https://github.com/HazyResearch/snorkel/tree/master/tutorials/intro)!
**Check out a recent one-pager about [Snorkel and the Software 2.0 vision](https://cs.stanford.edu/~chrismre/papers/Chris_Re-KDD.pdf)!**

## Users \& Sponsors
We're lucky to have some amazing collaborators who are currently using Snorkel!