GenoML Workshop for IPDGC (in Notebook Format)
- Authors: Hampton Leonard, Mary Makarious, Juan A. Botia, Faraz Faghri, David Saffo, and Mike Nalls
- Project: GenoML Demo for IPDGC London 2019
- PIs: Mike Nalls, PhD and Andrew Singleton, PhD
- Collaborators: Full list of collaborators can be found here
- Date Last Updated: 12.13.2019
- Last Update: Fixed tune + added seed to limit randomness
We will be running everything in a virtual environment (no downloads necessary!)
PLEASE NOTE: Use Chrome to run this! The Binder environment may take several minutes to start up.
What is GenoML?
GenoML is an automated machine learning (autoML) tool that optimizes basic machine learning pipelines for genomic data. In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been major strides in developing user-friendly machine learning software that non-experts can use. The first steps toward simplifying machine learning involved building simple, unified interfaces to a variety of machine learning algorithms (like Python's scikit-learn). Although languages like R and Python have made it easy to experiment with machine learning, producing high-performing models still requires a fair amount of knowledge and background in data science. Machine learning models for genomic data, in particular, are notoriously difficult for a non-expert to tune properly.
To make machine learning software truly accessible to non-experts, we have designed an easy-to-use interface that automates the process of training a large selection of candidate models. GenoML will automate the most tedious part of machine learning by intelligently exploring many possible models to find the best one for your data. This notebook is a sample of what the Python package will be able to do, and it focuses only on discrete supervised learning. For the sake of the workshop, these scripts run in a notebook, whereas the package will run directly from your command line.
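The "train many candidates, keep the best" idea can be sketched with scikit-learn as follows. This is not GenoML's code. The synthetic data from `make_classification`, the four candidate estimators, 5-fold cross-validated AUC as the selection metric, and the helper name `select_best_model` are all assumptions chosen for illustration. A fixed seed keeps the ranking reproducible between runs.

```python
# A minimal sketch of autoML-style model selection for discrete (binary) outcomes.
# NOT GenoML's implementation: the candidates, metric, and select_best_model
# helper are hypothetical choices for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

SEED = 42  # fixed seed so repeated runs produce the same ranking


def select_best_model(X, y, seed=SEED):
    """Score each candidate with 5-fold cross-validated AUC and return the best one."""
    candidates = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "GradientBoosting": GradientBoostingClassifier(random_state=seed),
        "KNeighbors": KNeighborsClassifier(),
    }
    scores = {}
    for name, model in candidates.items():
        # All estimators share the same fit/predict interface, so one loop covers them all.
        # cross_val_score fits clones, so the original estimators stay unfitted.
        scores[name] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    best_name = max(scores, key=scores.get)
    # Refit the winning candidate on all of the data before returning it.
    best_model = candidates[best_name].fit(X, y)
    return best_name, best_model, scores


if __name__ == "__main__":
    # Synthetic stand-in for a case/control feature matrix.
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=5, random_state=SEED
    )
    best_name, best_model, scores = select_best_model(X, y)
    for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: mean AUC = {auc:.3f}")
    print("Selected:", best_name)
```

Because every scikit-learn estimator exposes the same `fit`/`predict` interface, adding another candidate is a one-line change to the dictionary. For brevity, this sketch skips hyperparameter tuning and uses each estimator's defaults.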
You can read more on our website here.