From 1a78ed6396420f77b1dafeb1d55bf7d6b5acbe80 Mon Sep 17 00:00:00 2001
From: Braden Hancock
Date: Fri, 12 Jul 2019 14:16:04 -0700
Subject: [PATCH] Add informational banner to README about forthcoming v0.9 (#1260)

* Add informational banner to README about forthcoming v0.9
* Update README banner
* Update bullets in README banner
---
 README.md | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 9bab27bd9..330eaf712 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,26 @@
+# ANNOUNCEMENT:
+**Snorkel v0.9 is being released this summer!**
-**_v0.7.0-beta_**
+Why is it 0.9 when the last version was 0.7? Because this is more than just an incremental change. It's a full redesign from the ground up, including:
+* Support for new training data operators: labeling functions (LFs), transformation functions (TFs), and slicing functions (SFs)
+* A new matrix-completion-based approach for learning LF accuracies and correlation structure
+* Native support for multi-task learning (MTL), transfer learning (TL), and complex model flows
+* A Snorkel 101 guide that provides a gentle introduction to the technology and API for first-time users
+* A fresh batch of tutorials demonstrating different use cases and integrations
+* A more modular form factor that makes it easier to integrate with other libraries
+* A commitment to stability, with full coverage unit tests, type checking, and doc strings
+
+The new version will be available via `pip` and `conda` but **will not be backwards compatible**. The current (v0.7) Snorkel code will be moved to another repository to continue to support existing applications that depend on it.
+
+As part of this refactor, we will be bringing under one roof a number of projects in the Snorkel ecosystem that have previously been posted in separate repositories—[Snorkel](https://github.com/HazyResearch/snorkel), [Snorkel MeTaL](https://github.com/HazyResearch/metal), [TANDA](https://github.com/HazyResearch/tanda), etc.—and which have been used to achieve state-of-the-art results on the [GLUE](https://dawn.cs.stanford.edu/2019/03/22/glue/) and [SuperGLUE](https://hazyresearch.github.io/snorkel/blog/superglue.html) benchmarks, automate [cardiac MRI classification](https://www.biorxiv.org/content/10.1101/339630v1) and [genetic research database curation](https://ai.stanford.edu/~kuleshov/papers/gwaskb-manuscript.pdf) (as featured in two forthcoming Nature papers), and extract information from electronic health record (EHR) data for national [medical device surveillance](https://arxiv.org/abs/1904.07640).
+
+If you'd like to stay in the loop on the latest news in the Snorkel ecosystem, join the [Snorkel Google Group](https://groups.google.com/forum/#!forum/snorkel-ml). We'll keep you posted!
+
+---
+
+**_v0.7.0_**
 [![Build Status](https://travis-ci.org/HazyResearch/snorkel.svg?branch=master)](https://travis-ci.org/HazyResearch/snorkel)
 [![Documentation](https://readthedocs.org/projects/snorkel/badge/)](http://snorkel.readthedocs.io/en/master/)
@@ -11,14 +30,14 @@
 * For the latest news, blog posts, tutorials, papers, etc. related to Snorkel, check out **[snorkel.stanford.edu](http://snorkel.stanford.edu)**!
 * Get [set up](#quick-start) quickly
-* Try the [tutorials](#tutorials)
+* Try the [tutorials](#tutorials)
 * Read the [documentation](http://snorkel.readthedocs.io/en/master/)
 ## Motivation
 Snorkel is a system for rapidly **creating, modeling, and managing training data.**
-Today's state-of-the-art machine learning models require _massive_ labeled training sets--which usually do not exist for real-world applications. Instead, Snorkel is based around the new [data programming](https://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly) paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data.
+Today's state-of-the-art machine learning models require _massive_ labeled training sets--which usually do not exist for real-world applications. Instead, Snorkel is based around the new [data programming](https://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly) paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data.
 The resulting labels are noisy, but Snorkel automatically models this process—learning, essentially, which labeling functions are more accurate than others—and then uses this to train an end model (for example, a deep neural network in TensorFlow).
@@ -219,7 +238,7 @@ It's usually most convenient to write most code in an external `.py` file, and l
 ```
 A more convenient option is to add these lines to your IPython config file, in `~/.ipython/profile_default/ipython_config.py`:
 ```
-c.InteractiveShellApp.extensions = ['autoreload']
+c.InteractiveShellApp.extensions = ['autoreload']
 c.InteractiveShellApp.exec_lines = ['%autoreload 2']
 ```