# Single-cell Bioinformatics

Over the course of your project, you'll have many ups and downs with your data, so we'll use the analogy of a romantic relationship to explain the different steps of single-cell bioinformatics analyses.

## 1. Matchmaking: Getting the publicly available [deets](http://www.urbandictionary.com/define.php?term=deets) on your data

"PubMed stalking"... it's just like Facebook stalking!

* [1.0_Introduction_to_bioinformatics.ipynb](notebooks/1.0_Introduction_to_bioinformatics.ipynb)
* [1.1_Overview_of_analysis_steps.ipynb](notebooks/1.1_Overview_of_analysis_steps.ipynb)
* [1.2_Downloading_public_data_Shalek2013.ipynb](notebooks/1.2_Downloading_public_data_Shalek2013.ipynb)
* [1.3_Single-cell_overview_additional_reading.ipynb](notebooks/1.3_Single-cell_overview_additional_reading.ipynb)
* [1.4_Unix.ipynb](notebooks/1.4_Unix.ipynb) - Optional additional exercises with the Unix command line. If you're on Linux or Mac, you can do these on your own computer; if you're on Windows, do them on the Macs.

### Homework
- Spillover from what we didn't finish:
    - Mapping/alignment spillover
    - Downloading public data and filtering on expressed genes spillover
- Find another single-cell paper with a GEO/ArrayExpress accession, download its data, and compare gene expression filtering strategies (you will use this dataset throughout the course)
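
To make "filtering on expressed genes" concrete, here is a minimal sketch of one common strategy: keep genes detected above some value in at least some number of cells. The toy matrix, the function name `filter_expressed_genes`, and the thresholds `min_value` and `min_cells` are all made up for illustration; the course notebooks may filter differently.

```python
import pandas as pd

# Toy expression matrix (hypothetical values): rows are cells, columns are genes
expression = pd.DataFrame(
    {
        "gene_A": [0, 0, 1, 0, 0],
        "gene_B": [12, 30, 0, 25, 18],
        "gene_C": [0, 0, 0, 0, 0],
        "gene_D": [5, 0, 40, 0, 11],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4", "cell_5"],
)


def filter_expressed_genes(df, min_value=1, min_cells=3):
    """Keep genes (columns) with expression > min_value in at least min_cells cells."""
    n_cells_expressing = (df > min_value).sum(axis=0)
    return df.loc[:, n_cells_expressing >= min_cells]


filtered = filter_expressed_genes(expression)
print(filtered.columns.tolist())  # ['gene_B', 'gene_D']
```

Changing `min_value` and `min_cells` and watching which genes survive is one simple way to compare filtering strategies on your own dataset.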

### Optional: Pandas from `.head()` to `.tail()`

The package we'll be using to deal with matrices and dataframes in Python is called `pandas`. Throughout the course, I've tried to show some different applications of `pandas`, but this is definitely not complete. For a full introduction, I recommend the following tutorial from Tom Augspurger.

* [Video](https://www.youtube.com/watch?v=otCriSKVV_8)
* [Notebooks](pandas_tutorial/)

This tutorial is aimed at newbies to Python and `pandas`, so the beginning will be review for intermediate to advanced users, but the last few notebooks should interest non-newbies too.

* [Groupby](pandas_tutorial/notebooks/4. Groupby.ipynb)
    * Life-changing concept that has saved me hours of work. There have been many days where I've said to myself, ***"I LOVE GROUPBY!!!!!!"***
* [Tidy Data](pandas_tutorial/notebooks/5. Tidy Data.ipynb)
    * Another awesome, life-changing concept that helps you think about how to structure your data, even as you're making Excel files. Based on [this](notebooks/papers/tidy-data.pdf) paper by Hadley Wickham, the author of many, many dataframe manipulation packages in R.
* [Pandas applied to Machine Learning and Statistics](pandas_tutorial/notebooks/6. For Stats & ML.ipynb)
    * Categorical variables and transforming them to machine-learning friendly formats
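
As a quick taste of the three ideas above, here is a minimal sketch on a tiny tidy (long-form) table, with one row per cell-and-gene observation. The cell names, cell types, and values are invented for illustration and are not taken from the tutorial notebooks.

```python
import pandas as pd

# Tidy toy data (hypothetical): each row is one observation of one gene in one cell
tidy = pd.DataFrame(
    {
        "cell": ["c1", "c1", "c2", "c2", "c3", "c3"],
        "celltype": ["mature", "mature", "immature", "immature", "mature", "mature"],
        "gene": ["gene_A", "gene_B", "gene_A", "gene_B", "gene_A", "gene_B"],
        "expression": [10.0, 2.0, 4.0, 8.0, 6.0, 4.0],
    }
)

# Groupby: mean expression of each gene within each cell type
mean_by_group = tidy.groupby(["celltype", "gene"])["expression"].mean()
print(mean_by_group)

# Categorical variable -> one indicator column per category (machine-learning friendly)
celltype_dummies = pd.get_dummies(tidy["celltype"])
print(celltype_dummies.columns.tolist())  # ['immature', 'mature']
```

Because the data is tidy, the groupby is a single line; the same summary on a wide spreadsheet-style table would usually need a reshape first.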


## 2. First date: Get your data's life story with dimensionality reduction

* [2.0 Machine Learning Intro](sklearn_tutorial/notebooks/02.1-Machine-Learning-Intro.ipynb) [Jake Vanderplas' tutorial]
* [2.1 Basic Principles in Machine Learning](sklearn_tutorial/notebooks/02.2-Basic-Principles.ipynb) [Jake Vanderplas' tutorial]
* [2.2_Introduction_to_dimensionality_reduction.ipynb](notebooks/2.2_Introduction_to_dimensionality_reduction.ipynb)
* [2.3_PCA](sklearn_tutorial/notebooks/04.1-Dimensionality-PCA.ipynb) [Jake Vanderplas' tutorial]
* [2.4_ICA.ipynb](notebooks/2.4_ICA.ipynb)
* [2.5_Manifold_learning.ipynb](notebooks/2.5_Manifold_learning.ipynb)
* [2.6_Compare_dimensionality_reduction.ipynb](notebooks/2.6_Compare_dimensionality_reduction.ipynb)
* [2.7_Apply_dimensionality_reduction_on_Shalek2013_Macaulay2016.ipynb](notebooks/2.7_Apply_dimensionality_reduction_on_Shalek2013_Macaulay2016.ipynb)
* [2.8_Additional_reading.ipynb](notebooks/2.8_Additional_reading.ipynb)
* [2.9_tSNE_on_subsets_of_digits.ipynb](notebooks/2.9_tSNE_on_subsets_of_digits.ipynb)

### Homework

- Application spillover
- Same single-cell dataset, compare all dimensionality reduction algorithms
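
One possible way to set up the comparison is to put every algorithm in a dictionary and run them all on the same matrix. The sketch below assumes scikit-learn and uses its bundled digits data as a stand-in for your single-cell dataset; only a small subset is used so that t-SNE finishes quickly, and all settings are library defaults rather than recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE

# Stand-in data: 200 digit images with 64 pixel features each
X = load_digits().data[:200]

reducers = {
    "PCA": PCA(n_components=2),
    "ICA": FastICA(n_components=2, random_state=0),
    "t-SNE": TSNE(n_components=2, random_state=0),
}

# Every algorithm maps the same samples down to two components
embeddings = {name: reducer.fit_transform(X) for name, reducer in reducers.items()}

for name, embedding in embeddings.items():
    print(name, embedding.shape)
```

Each entry of `embeddings` can then be drawn as a scatter plot, colored by whatever sample annotation you have, to see which structure each method emphasizes.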

## 3. One-month anniversary: Give your boo some clusters

* [3.0_Introduction_to_clustering.ipynb](notebooks/3.0_Introduction_to_clustering.ipynb)
* [3.1 $K$-means_clustering](sklearn_tutorial/notebooks/04.2-Clustering-KMeans.ipynb) [Jake Vanderplas' tutorial]
* [3.2_Hierarchical_clustering.ipynb](notebooks/3.2_Hierarchical_clustering.ipynb)
* [3.3_Apply_clustering_to_Shalek2013_Macaulay2016.ipynb](notebooks/3.3_Apply_clustering_to_Shalek2013_Macaulay2016.ipynb)
* [3.4_Plotting_colors_and_evaluating_clustering.ipynb](notebooks/3.4_Plotting_colors_and_evaluating_clustering.ipynb)

### Homework

- Application spillover
- Same dataset, compare cluster finding
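
To compare how two algorithms cluster the same data, one option is to score how well their label assignments agree. This is a minimal sketch assuming scikit-learn, with well-separated synthetic blobs standing in for real cells; the adjusted Rand index is just one of several possible agreement scores.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data: three well-separated groups of points
X, true_labels = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# The adjusted Rand index ignores how clusters are numbered: 1.0 means identical groupings
agreement = adjusted_rand_score(kmeans_labels, hier_labels)
print("K-means vs. hierarchical:", agreement)
print("K-means vs. known groups:", adjusted_rand_score(true_labels, kmeans_labels))
```

On real data there are no `true_labels`, so the algorithm-versus-algorithm comparison is the more useful line; low agreement is a hint that the clusters depend heavily on the method.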

## 4. One-year anniversary: Find what makes your data tick using supervised learning

* [4.0_Introduction_to_classifiers.ipynb](notebooks/4.0_Introduction_to_classifiers.ipynb)
* [4.1_Overfitting.ipynb](notebooks/4.1_Overfitting.ipynb)
* [4.2_Support_vector_machines](sklearn_tutorial/notebooks/03.1-Classification-SVMs.ipynb) [Jake Vanderplas' tutorial]
* [4.3_Decision_trees](sklearn_tutorial/notebooks/03.2-Regression-Forests.ipynb) [Jake Vanderplas' tutorial]
* [4.4_Apply_SVM_to_Shalek2013_clustered_heatmap.ipynb](notebooks/4.4_Apply_SVM_to_Shalek2013_clustered_heatmap.ipynb)
* [4.5_Apply_SVM_to_Shalek2013_with_violinplots.ipynb](notebooks/4.5_Apply_SVM_to_Shalek2013_with_violinplots.ipynb)
* [4.6_Assess_clustering_with_gene_ontology.ipynb](notebooks/4.6_Assess_clustering_with_gene_ontology.ipynb)
* [4.7_Apply_tree_classifiers_to_Shalek2013_with_gene_ontology.ipynb](notebooks/4.7_Apply_tree_classifiers_to_Shalek2013_with_gene_ontology.ipynb)
* [4.8_Apply_classifiers_to_Macaulay2016_with_gene_ontology.ipynb](notebooks/4.8_Apply_classifiers_to_Macaulay2016_with_gene_ontology.ipynb)

### Homework

- Application spillover
- Same dataset, compare enriched genes in clusters
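
One way to ask "which genes separate my clusters?" is to train a linear classifier on the cluster labels and look at which genes get the largest weights. The sketch below assumes scikit-learn and uses simulated data in which only the hypothetical `gene_3` differs between two clusters; it illustrates the idea and is not the exact procedure from the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n_cells, n_genes = 60, 10
genes = ["gene_{}".format(i) for i in range(n_genes)]

# Simulated expression: pure noise, except gene_3 is shifted up in the second cluster
expression = pd.DataFrame(rng.normal(size=(n_cells, n_genes)), columns=genes)
cluster = np.repeat([0, 1], n_cells // 2)
expression.loc[cluster == 1, "gene_3"] += 10

# A linear SVM learns one weight per gene
classifier = SVC(kernel="linear").fit(expression.values, cluster)

# Genes with large absolute weights are the ones driving the separation
weights = pd.Series(np.abs(classifier.coef_[0]), index=genes)
top_gene = weights.idxmax()
print(top_gene)
```

Sorting `weights` rather than taking only the maximum gives a ranked gene list per pair of clusters, which is one possible starting point for comparing enriched genes.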

## 5. Ten-year anniversary: Reflect on where you've been together with pseudotime ordering

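Pseudotime ordering is like biologically driven "regression": instead of sorting cells into discrete groups, you place each cell along a continuous axis of progress through a process.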

* [5.0_Pseudotime_introduction.ipynb](notebooks/5.0_Pseudotime_introduction.ipynb)
* [5.1_Pseudotime_ordering_algorithms_overview.ipynb](notebooks/5.1_Pseudotime_ordering_algorithms_overiew.ipynb)
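
To give a feel for the idea, here is a deliberately crude sketch that orders simulated cells by their position along the first principal component. This is a simple stand-in and not one of the published pseudotime algorithms covered in the notebooks; the simulated data assumes every gene changes linearly with a hidden "true time", which real data rarely does.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
n_cells, n_genes = 100, 20

# Hidden progress of each cell through the process (unknown in a real experiment)
true_time = rng.uniform(0, 1, size=n_cells)

# Each simulated gene rises or falls linearly with time, plus measurement noise
slopes = rng.choice([-3.0, 3.0], size=n_genes)
expression = np.outer(true_time, slopes) + rng.normal(scale=0.5, size=(n_cells, n_genes))

# Crude pseudotime: each cell's coordinate on the first principal component
pseudotime = PCA(n_components=1).fit_transform(expression)[:, 0]

# The sign of a principal component is arbitrary, so compare absolute rank correlation
correlation, _ = spearmanr(pseudotime, true_time)
print(abs(correlation))
```

The recovered ordering may run backwards relative to the true one, which is why pseudotime methods generally need some biological anchor (for example, a known starting cell type) to pick a direction.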


## 6. Couples counseling: Dealing with technical noise and batch effects

* [6.1_Dealing_with_technical_noise.ipynb](notebooks/6.1_Dealing_with_technical_noise.ipynb)
* [6.2_Batch_Correction.ipynb](notebooks/6.2_Batch_Correction.ipynb)
* [6.3_Technical_noise_additional_reading.ipynb](notebooks/6.3_Technical_noise_additional_reading.ipynb)

## 7. 50-year anniversary: Advanced topics

If you're already an experienced bioinformatician, you may be interested in working through the analysis steps of the papers assigned for the course. The simpler one is the Shalek2013 paper:

* [7.2_Reproducing_Shalek2013_figures](notebooks/7.2_Reproducing_Shalek2013_figures.ipynb)


More advanced is the Macaulay2016 paper, which includes pseudotime ordering and Bayesian modeling.

* [7.0_Case_Study_Macaulay2016.ipynb](notebooks/7.0_Case_Study_Macaulay2016.ipynb)
    * Links to the original notebooks supplied with the paper
* [7.1_Playing_with_analysis_decisions_in_Macaulay2016.ipynb](notebooks/7.1_Playing_with_analysis_decisions_in_Macaulay2016.ipynb)
    * Interactive widgets playing with PCA vs ICA vs MDS vs t-SNE, linkage methods, and distance metrics at key points of the Macaulay2016 analysis pipeline


## 8. Plotting tips

Tips for plotting in Python, including choosing and working with colors.

* [8.0_Plotting_tips.ipynb](8.0_Plotting_tips.ipynb)