This is a small workshop put together to demonstrate how GenoML works, presented at the IPDGC conference.


Type Name Latest commit message Commit time
plink Adding PLINK path fix to determine mac vs linux Dec 10, 2019
scripts Merge pull request #1 from dsaffo/master Dec 12, 2019
GenoML_IPDGC_Demo.ipynb Update Dec 17, 2019

GenoML Workshop for IPDGC (in Notebook Format)

  • Authors: Hampton Leonard, Mary Makarious, Juan A. Botia, Faraz Faghri, David Saffo, and Mike Nalls
  • Project: GenoML Demo for IPDGC London 2019
  • PIs: Mike Nalls, PhD and Andrew Singleton, PhD
  • Collaborators: Full list of collaborators can be found here
  • Date Last Updated: 12.13.2019
    • Last Update: Fixed tune + added seed to limit randomness

Getting Started

We will be running everything in a virtual environment (no downloads necessary!)


PLEASE NOTE: Use Chrome to run this! The Binder environment might take several minutes to start up

Link to Google Slides: here

What is GenoML?

GenoML is an automated machine learning (autoML) tool that optimizes basic machine learning pipelines for genomic data. In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. The first steps toward simplifying machine learning involved developing simple, unified interfaces to a variety of machine learning algorithms (like Python's scikit-learn). Although languages like R and Python have made it easy to experiment with machine learning, producing high-performing models still requires a fair amount of data science knowledge and background. Machine learning models for genomic data in particular are notoriously difficult for a non-expert to tune properly.

To make machine learning software truly accessible to non-experts, we have designed an easy-to-use interface that automates the process of training a large selection of candidate models. GenoML automates the most tedious part of machine learning by intelligently exploring many possible models to find the best one for your data. This notebook is a sample of what the Python package will be able to do, and it focuses only on discrete supervised learning. These scripts run in a notebook for the sake of the workshop, whereas the package will run directly from your command line.
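
The idea of training several candidate models and keeping the best one can be illustrated with a minimal sketch in plain scikit-learn. This is not GenoML's code or API: the synthetic dataset, the three candidate models, the helper name `pick_best_model`, and the choice of cross-validated AUC as the selection metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

SEED = 42  # fixed seed to limit run-to-run randomness


def pick_best_model(X, y, candidates, cv=5):
    """Hypothetical helper: score each candidate by mean cross-validated
    AUC and return (name of the best candidate, dict of all scores)."""
    scores = {}
    for name, model in candidates.items():
        auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        scores[name] = auc.mean()
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Synthetic stand-in for a case/control dataset (not real genomic data).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=SEED
)

# A small, arbitrary selection of candidate classifiers.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
    "k_nearest_neighbors": KNeighborsClassifier(),
}

best_name, scores = pick_best_model(X, y, candidates)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean AUC = {score:.3f}")
print("Best candidate:", best_name)
```

Because every scikit-learn estimator exposes the same fit/predict interface, adding another candidate only requires one more entry in the `candidates` dictionary.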

You can read more on our website here
