# Quick and clean: Python for biological data processing

- [Course information](day1/info.ipynb)
    - About the course
    - About us
    - About you
    - Venue, credits, pricing, requirements
    - Feedback
    - Bibliography

## Day 1, 8:00-16:00, Python

Easy introduction, light tasks.

- [What is Python?](day1/what.ipynb)
    - How do computers work?
    - Python: Stats, strengths and weaknesses
    - Python: Past, present and future
    - As a new "pythonista", who will be your friends?
    - Make Python work again!
    - Installing libraries
    - Python consoles, interpreters and editors
    - Python distributions, conda


- [Jupyter](day1/jupyter.ipynb)
    - History and JupyterLab
    - General presentation
    - Use in reproducible research
    - Other uses
    - Slide shows
    
    
- [Python tutorial](day1/tutorial.ipynb)
    - Basics: Math, Variables, Functions, Control flow, Modules
    - Data representation: String, Tuple, List, Set, Dictionary, Objects and Classes
    - Standard library modules: script arguments, file operations, timing, processes, forks, multiprocessing
    
    
- [DevOps](day1/devops.ipynb)
    - Reproducible research.
    - What is DevOps?
    - How to source your code.
    - Using source editors. What matters?
    - Distributed version control using git.
    - Development vs production.
    - When do we need containers? Using Docker.
    - Speed: Profiling, IPython, JIT 
    - Robustness: unit testing
    - Documentation: pydoc and Sphinx
    
    
- [Python and the data](day1/web.ipynb)
    - File IO, streaming, serialization
    - Parsing and regular expressions
    - Chunking and HDF5, pytables
    - XML, HTML editing
    - Web frameworks.
    - SQL frameworks.
    - Remote API calls (Entrez, BioBank)
    
    
- [Python and other languages](day1/languages.ipynb)
    - Interacting with other programs through API calls.
    - The whole is greater than its parts: syncretism.
    - Python and C: make your own API.
    - Python and R: direct piping through Jupyter.
    - Python and Julia
    
    
- [Python and the cloud](day1/cloud.ipynb)
    - What is the cloud?
    - Hadoop and Spark: distributed computing intro.
    - HPC: grid computing intro.
    - Spinning instances: Back to containers.
    - Accelerate your code: GPU computing, FPGAs, ASICs

## Day 2, 8:00-16:00, Data science

Intensive in math, slightly harder tasks to accomplish in class.

- [Intro to data science](day2/introds.ipynb)
    - You are a data scientist.
    - How to extract information from data.
    - Dataset, model, prediction.
    
    
- [Visualization](day2/visualization.ipynb):
    - Standard plots with matplotlib and seaborn: line, scatter, chart
    - Web publishing with plotly and bokeh: heatmap example
    - Network display with graphviz
    - GUI programming with wxPython
    - Web interfaces
    
    
- [Statistics](day2/statistics.ipynb):
    - Dataframing with pandas
    - Normalizing a dataset
    - Scipy: ANOVA
    - Statistical enrichment analysis
    
    
- [Scientific computing](day2/scicomp.ipynb):
    - Numpy: advanced array operations
    - Scipy introduction: from linear algebra to image analysis
    - SymPy: symbolic math
    - Networks with networkx: centrality computation
    - Fitting a curve, cleaning a signal.
    
    
- [Machine learning](day2/learning.ipynb):
    - scikit-learn: clustering
    - Handling multivariate data: PCA, PLS regression
    - Deep learning: Theano, TensorFlow and PySpark
    
    
- [Workflow management](day2/workflow.ipynb)
    - Snakemake tutorial
    - Nextflow tutorial




## Day 3, 8:00-16:00, 'Omics

I set up the problems, describe the tasks and give you some helper code to start with. You then work on the task you picked in class. I will tend to guide rather than tell.
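To give an idea of what such starter code could look like, here is a minimal sketch for the Sequencing task listed below. It is an illustration only, not the helper code handed out in class: the class names `DNA`, `RNA` and `Protein`, the function `hamming_distance` and the deliberately tiny `CODON_TABLE` are all assumptions made for this example.

```python
# Hypothetical starter sketch for the Sequencing task (plain Python, no Biopython).
# All names here are illustrative assumptions, not the course's actual helper code.

# Deliberately incomplete codon table: only the codons used in the usage note.
# A real table has 64 entries.
CODON_TABLE = {"AUG": "M", "GCC": "A", "UUU": "F", "UAA": "*"}


class Protein:
    """A protein, stored as a string of one-letter amino acid codes."""

    def __init__(self, seq):
        self.seq = seq.upper()


class RNA:
    """An RNA molecule, stored as a string over A, C, G, U."""

    def __init__(self, seq):
        self.seq = seq.upper()

    def translate(self):
        """Translate codon by codon, stopping at the first stop codon."""
        residues = []
        for i in range(0, len(self.seq) - 2, 3):
            amino_acid = CODON_TABLE[self.seq[i:i + 3]]
            if amino_acid == "*":  # stop codon
                break
            residues.append(amino_acid)
        return Protein("".join(residues))


class DNA:
    """A DNA molecule, stored as the coding strand over A, C, G, T."""

    def __init__(self, seq):
        self.seq = seq.upper()

    def transcribe(self):
        """Return the RNA copy of the coding strand (T replaced by U)."""
        return RNA(self.seq.replace("T", "U"))


def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

With these definitions, `DNA("ATGGCCTTTTAA").transcribe().translate().seq` evaluates to `"MAF"`, and `hamming_distance("GATTACA", "GACTATA")` evaluates to `2`. Extending the codon table and adding regulation or further similarity scores is left to the task itself.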

- [NGS pipelining](day3/ngs_pipelining.ipynb):
    - Open a cloud instance and install required programs
    - Set up the pipeline
    - Read mapping and IGV inspection
    - Normalizing counts and differential expression
    - Galaxy integration
    
    
- [Sequencing](day3/sequencing.ipynb)
    - Make a toy sequencing library in standard Python for processing DNA, RNA and protein data.
    - Implement the DNA, RNA and proteins as Python classes
    - Make methods for transcription, translation, regulation.
    - Compute several sequence similarity scores, such as Hamming distance and mutual information.
    - Add BioPython methods and prefix them with bp
    - Describe your module in a tutorial-like fashion
    
    
- [Gene Expression](day3/expression.ipynb)
    - Download a GEO dataset and prepare it
    - Cluster the genes based on their expression
    - Compute a co-expression network
    - Compute differential gene expression for a set of samples.
    - Compute functional enrichment of the main clusters.
    
    
- [Transcriptomics](day3/transcriptomics.ipynb)
    - Extract the promoter regions using biopython
    - Investigate de-novo motifs on clusters of genes using meme or steme
    - Use the TransFac database to search for motif occurrences on selected genes
    - Save the found motifs and related data
    - Test which of the motif occurrences on your selected genes are significant
    
    
- [Proteomics](day3/proteomics.ipynb)
    - Compute a protein similarity graph, cluster enrichment study
    - Perform structural alignment and plots with PyMol
    
    
- [Metabolomics](day3/metabolomics.ipynb)
    - Metabolic pathway assembly, enrichment and display
    - Flux balance analysis
    
    
- [Dynamic modeling](day3/dynamic_modeling.ipynb)
    - Load a curated SBML model
    - Plot the model
    - Solve the model
    - Peak identification
    - Pathway studies
    
    
- [Population genetics and phylogeny](PopGen.ipynb) - Removed/Not included.
    - Run a small scale coalescent simulation
    - Compute a phylogenetic tree and display it