# Quick and clean: Python for biological data processing

- [Course information](day1/info.ipynb)
    - About the course
    - About us
    - About you
    - Venue, credits, pricing, requirements
    - Feedback
    - Bibliography

**Day 1, 8:00-16:00, Python**

Easy introduction, light tasks.

- [What is Python?](day1/what.ipynb)
    - Stats, strengths and weaknesses
    - Past, present and future of Python
    - As a new "pythonista", who will be your friends?
    - How to make Python work for this course
    - Python distributions, Anaconda
    - Jupyter and interactive notekeeping
    - Installing libraries
    - Python consoles, interpreters and editors
- [Python tutorial](day1/tutorial.ipynb)
    - Basics: Math, Variables, Functions, Control flow, Modules
    - Data representation: String, Tuple, List, Set, Dictionary, Objects and Classes
    - Standard library modules: script arguments, file operations, timing, processes, forks, multiprocessing
- [Text manipulation](day1/text.ipynb):
    - File IO, streaming, serialization
    - Parsing and regular expressions
    - XML, HTML editing
- [Python and the web](day1/web.ipynb)
    - Introduction to Django
    - SQL interrogation
    - Remote API calls (Entrez, BioBank)
- [Python and other languages](day1/languages.ipynb)
    - Python and C: Mutual Information
    - Python and R: microarray processing

**Day 2, 8:00-16:00, Data science**

Math-intensive, with slightly harder tasks to complete in class.

- [Visualization](day2/visualization.ipynb):
    - Standard plots with matplotlib: line, scatter, chart
    - Web publishing with plotly: heatmap example
    - Network display with graphviz
    - GUI programming with wxPython
- [Statistics](day2/statistics.ipynb):
    - Dataframing with pandas
    - SciPy: ANOVA, linear regression, curve fitting
    - Statistical enrichment analysis
- [Scientific computing](day2/scicomp.ipynb):
    - NumPy: advanced array operations
    - SciPy introduction: from linear algebra to image analysis
    - SymPy: symbolic math
- [Machine learning](day2/learning.ipynb):
    - scikit-learn: clustering
    - Handling multivariate data: PCA and PLS regression
- [Networks](day2/networks.ipynb):
    - networkx: centrality computation
    - Network IO
- *Presentation of Omics*
    - Omics tasks of day 3 are presented and discussed.

**Day 3, 8:00-16:00, 'Omics**

I set up the problems, describe the tasks, and give you some helper code to start with; you then solve them in class, in the order of your choosing. I will tend to guide rather than tell.

- [NGS pipelining](day3/ngs_pipelining.ipynb):
    - Open a cloud instance and install required programs
    - Set up the pipeline
    - Mapping and IGV inspection
    - Normalizing counts and differential expression
    - Galaxy integration
- [Sequencing](day3/sequencing.ipynb)
    - Make a toy sequencing library in standard Python for processing DNA, RNA and protein data.
    - Implement the DNA, RNA and proteins as Python classes
    - Make methods for transcription, translation, regulation.
    - Compute several sequence similarity scores, such as Hamming distance and mutual information.
    - Add Biopython methods and prefix them with `bp`
    - Describe your module in a tutorial-like fashion
- [Gene Expression](day3/expression.ipynb)
    - Download a GEO dataset and prepare it
    - Cluster the genes based on their expression
    - Compute a co-expression network
    - Compute differential gene expression for a set of samples.
    - Compute functional enrichment of the main clusters.
- [Transcriptomics](day3/transcriptomics.ipynb)
    - Extract the promoter regions using Biopython
    - Investigate de novo motifs on clusters of genes using MEME or STEME
    - Use the TRANSFAC database to search for motif occurrences on selected genes
    - Save the found motifs and related data
    - Test which of the motif occurrences on your selected genes are significant
- [Proteomics](day3/proteomics.ipynb)
    - Compute a protein similarity graph, cluster enrichment study
    - Perform structural alignment and plots with PyMOL
- [Metabolomics](day3/metabolomics.ipynb)
    - Metabolic pathway assembly and display
    - Flux balance analysis
- [Dynamic modeling](day3/dynamic_modeling.ipynb)
    - Load a curated SBML model
    - Plot the model
    - Solve the model
    - Peak identification
    - Pathway studies
- [Population genetics and phylogeny](PopGen.ipynb) - Removed/Not included.
    - Run a small-scale coalescent simulation
    - Compute a phylogeny tree and display it
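
To give a flavor of the Day 3 Sequencing task, here is a minimal sketch of two of the building blocks it asks for: a Hamming distance and a transcription method, in standard Python only. The names `hamming_distance` and `transcribe` are hypothetical and chosen for illustration; they are not taken from the course notebooks, and the sketch assumes the input DNA is the coding strand.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def transcribe(dna: str) -> str:
    """Return the RNA form of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(hamming_distance("GATTACA", "GACTATA"))
    print(transcribe("ATGC"))
```

In the actual task these would become methods on your own DNA, RNA and protein classes rather than free functions.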