edit tutorials documentation (#4098)
* edit tutorials landing page
links in sidebar should link directly to tutorials
improve tutorials
remove types and missingness tutorials

* fix bad copypasta

* fix bad deletion

* strip metadata from ipynb files

* add types/missingness stuff to expr tutorial
address comments

fix duplicate link to python api

* edits
maccum authored and danking committed Aug 17, 2018
1 parent e395a65 commit f320c54
Showing 12 changed files with 423 additions and 795 deletions.
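
The "strip metadata from ipynb files" step appears in the notebook hunks of this commit as ``"metadata": {"scrolled": true}`` collapsing to ``"metadata": {}``. The commit does not say how the stripping was done, so the following is only a minimal sketch of one way to get the same result with the standard library; the function name ``strip_cell_metadata`` and the example notebook dict are hypothetical.

```python
import copy
import json


def strip_cell_metadata(notebook):
    """Return a copy of a notebook dict with every cell's metadata emptied.

    Hypothetical helper: not taken from the commit, which only shows the result.
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}
    return cleaned


# A tiny stand-in for a parsed .ipynb file (json.load would produce this shape).
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"scrolled": True},
            "outputs": [],
            "source": ["mt.s.show(5)"],
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["### Quality Control\n"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

cleaned = strip_cell_metadata(example)
print(json.dumps(cleaned["cells"][0], indent=1))
```

To apply this to a file on disk, one would read it with ``json.load``, pass the result through the helper, and write it back with ``json.dump``. Only per-cell metadata is touched here; notebook-level metadata is left alone.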
9 changes: 7 additions & 2 deletions python/hail/docs/api.rst
@@ -9,6 +9,9 @@ on the Python programming interface.

Use ``import hail as hl`` to access this functionality.

Classes
~~~~~~~

.. autosummary::
:nosignatures:
:toctree: ./
@@ -19,7 +22,8 @@ Use ``import hail as hl`` to access this functionality.
hail.MatrixTable
hail.GroupedMatrixTable

.. rubric:: Modules
Modules
~~~~~~~

.. toctree::
:maxdepth: 1
@@ -37,7 +41,8 @@ Use ``import hail as hl`` to access this functionality.
plot <plot>
experimental <experimental>

.. rubric:: Module functions
Top-Level Functions
~~~~~~~~~~~~~~~~~~~

.. autofunction:: hail.init
.. autofunction:: hail.stop
116 changes: 9 additions & 107 deletions python/hail/docs/tutorials-landing.rst
@@ -7,112 +7,14 @@ Hail Tutorials
To take Hail for a test drive, go through our tutorials. These can be viewed here in the documentation,
but we recommend instead that you run them yourself with Jupyter.


Hail Overview
=============

This notebook is designed to provide a broad overview of Hail’s functionality, with emphasis on the
functionality to manipulate and query a genetic dataset. We walk through a genome-wide SNP association
test, and demonstrate the need to control for confounding caused by population stratification.

.. toctree::
:maxdepth: 2

1: Genome-Wide Association Study<tutorials/01-genome-wide-association-study.ipynb>

Types
=====

This notebook is a hands-on introduction to Hail's type system.

.. toctree::
:maxdepth: 2

2: Types<tutorials/02-types.ipynb>

Expressions
===========

This notebook is a hands-on introduction to manipulating and understanding the
:class:`.Expression` interface in Hail.

.. toctree::
:maxdepth: 2

3: Expressions<tutorials/03-expressions.ipynb>

Missingness
===========

This notebook walks through how missing data is handled inside Hail.

.. toctree::
:maxdepth: 2

4: Missingngess<tutorials/04-missingness.ipynb>

Tables
======

This notebook introduces the :class:`.Table` interface.

.. toctree::
:maxdepth: 2

5: Tables<tutorials/05-tables.ipynb>

Aggregation
===========

This notebook explores ways to aggregate data in Hail.

.. toctree::
:maxdepth: 2

6: Aggregation<tutorials/06-aggregation.ipynb>

Filtering and Annotation
========================

This notebook demonstrates the interfaces for filtering tables and
annotating new fields.

.. toctree::
:maxdepth: 2

7: Filtering and Annotation<tutorials/07-filter-annotate.ipynb>

Joins
=====

This notebook walks through how to join tables together in Hail.

.. toctree::
:maxdepth: 2

8: Joins<tutorials/08-joins>


Matrix Tables
=============

This notebook communicates some understanding of the :class:`.MatrixTable`
data representation used in the GWAS tutorial, and some of the operations
that are possible (and easy!) in this interface.

.. toctree::
:maxdepth: 2

9: MatrixTable<tutorials/09-matrixtable.ipynb>


Plot Examples
=============

This notebook contains examples of how to use the plotting functions in Hail's
:mod:`.plot` module.

.. toctree::
:maxdepth: 2
:maxdepth: 1

Plotting<tutorials/plotting.ipynb>
GWAS Tutorial <tutorials/01-genome-wide-association-study.ipynb>
Expression Tutorial <tutorials/03-expressions.ipynb>
Table Tutorial <tutorials/05-tables.ipynb>
Aggregation Tutorial <tutorials/06-aggregation.ipynb>
Filtering and Annotation Tutorial <tutorials/07-filter-annotate.ipynb>
Table Joins Tutorial <tutorials/08-joins>
MatrixTable Tutorial <tutorials/09-matrixtable.ipynb>
Plotting Tutorial<tutorials/plotting.ipynb>
66 changes: 22 additions & 44 deletions python/hail/docs/tutorials/01-genome-wide-association-study.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"## GWAS Tutorial\n",
"\n",
"This notebook is designed to provide a broad overview of Hail's functionality, with emphasis on the functionality to manipulate and query a genetic dataset. We walk through a genome-wide SNP association test, and demonstrate the need to control for confounding caused by population stratification.\n"
]
@@ -46,7 +46,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check for tutorial data or download if necessary\n",
"### Check for tutorial data or download if necessary\n",
"\n",
"This cell downloads the necessary data if it isn't already present."
]
@@ -64,7 +64,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading data from disk\n",
"### Loading data from disk\n",
"\n",
"Hail has its own internal data representation, called a MatrixTable. This is both an on-disk file format and a [Python object](https://hail.is/docs/devel/hail.MatrixTable.html#hail.MatrixTable). Here, we read a MatrixTable from disk.\n",
"\n",
@@ -84,7 +84,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting to know our data\n",
"### Getting to know our data\n",
"\n",
"It's important to have easy ways to slice, dice, query, and summarize a dataset. Some of these methods are demonstrated below."
]
@@ -119,9 +119,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"mt.s.show(5)"
@@ -139,9 +137,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"mt.entry.take(5)"
@@ -151,7 +147,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding column fields\n",
"### Adding column fields\n",
"\n",
"A Hail MatrixTable can have any number of row fields and column fields for storing data associated with each row and column. Annotations are usually a critical part of any genetic study. Column fields are where you'll store information about sample phenotypes, ancestry, sex, and covariates. Row fields can be used to store information like gene membership and functional impact for use in QC or analysis. \n",
"\n",
@@ -193,9 +189,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"table.describe()"
@@ -227,9 +221,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"print(mt.col.dtype)"
@@ -254,9 +246,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"print(mt.col.dtype)"
@@ -275,7 +265,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Query functions and the Hail Expression Language\n",
"### Query functions and the Hail Expression Language\n",
"\n",
"Hail has a number of useful query functions that can be used for gathering statistics on our dataset. These query functions take Hail Expressions as arguments.\n",
"\n",
@@ -358,9 +348,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"pprint(mt.aggregate_cols(agg.stats(mt.CaffeineConsumption)))"
@@ -442,7 +430,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Quality Control\n",
"### Quality Control\n",
"\n",
"QC is where analysts spend most of their time with sequencing datasets. QC is an iterative process, and is different for every project: there is no \"push-button\" solution for QC. Each time the Broad collects a new group of samples, it finds new batch effects. However, by practicing open science and discussing the QC process and decisions with others, we can establish a set of best practices as a community."
]
@@ -491,9 +479,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"p = hl.plot.histogram(mt.sample_qc.call_rate, range=(.88,1), legend='Call Rate')\n",
@@ -537,9 +523,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"p.renderers.extend(\n",
@@ -694,7 +678,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's do a GWAS!\n",
"### Let's do a GWAS!\n",
"\n",
"First, we need to restrict to variants that are : \n",
"\n",
@@ -762,7 +746,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confounded!\n",
"### Confounded!\n",
"\n",
"The observed p-values drift away from the expectation immediately. Either every SNP in our dataset is causally linked to caffeine consumption (unlikely), or there's a confounder.\n",
"\n",
@@ -810,9 +794,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"p = hl.plot.scatter(pca_scores.scores[0], pca_scores.scores[1],\n",
Expand Down Expand Up @@ -844,9 +826,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"p = hl.plot.qq(linear_regression_results.linreg.p_value)\n",
@@ -905,17 +885,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rare variant analysis\n",
"### Rare variant analysis\n",
"\n",
"Here we'll demonstrate how one can use the expression language to group and count by any arbitrary properties in row and column fields. Hail also implements the sequence kernel association test (SKAT).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"entries = mt.entries()\n",
@@ -976,7 +954,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Epilogue\n",
"### Epilogue\n",
"\n",
"Congrats! You've reached the end of the first tutorial. To learn more about Hail's API and functionality, take a look at the other tutorials. You can check out the [Python API](https://hail.is/docs/devel/api.html#python-api) for documentation on additional Hail functions. If you use Hail for your own science, we'd love to hear from you on [Zulip chat](https://hail.zulipchat.com) or the [discussion forum](http://discuss.hail.is).\n",
"\n",