# Introduction to scientific computing with Python


## Computational Thinking

> __Task: Program a bread slicer machine__

<img src="./_img/bread.png" width="700px"> 

* Provide commands to cut 3 normal slices of bread.
* Each person provides one command and writes it on a post-it.
* Any previous command can be undone by an erase command.
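
After the exercise, the post-it program can be modelled in Python as an ordered list of commands. The sketch below is one possible model, not a reference solution. The command wording and blade positions are hypothetical (the bread slicer has no real API), and it assumes the erase command removes either the latest post-it or one at a given position.

```python
# A hypothetical command vocabulary: the bread slicer has no real API.
program = []  # the post-its, in the order they were stuck on the board


def add(command):
    """Append one command (one post-it) to the program."""
    program.append(command)


def erase(step=None):
    """Undo a command: the most recent one by default, or the one at a 1-based step."""
    if not program:
        return
    if step is None:
        program.pop()
    else:
        del program[step - 1]


add("move blade to 1 cm")
add("cut")
add("move blade to 3 cm")  # a mistake ...
erase()                    # ... undone by the erase command
add("move blade to 2 cm")
add("cut")
add("move blade to 3 cm")
add("cut")

for step, command in enumerate(program, start=1):
    print(step, command)
```

Writing the commands down as data, rather than executing each one immediately, is what makes erasing possible: the program only runs once the list is final.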

## Scientific Computing

Scientists spend an increasing amount of time building and using software. However, many researchers are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. Software is central to modern scientific research: the daily operation of science revolves around developing new algorithms, managing and analyzing the large amounts of data generated in single research projects, combining disparate datasets to assess synthetic problems, and other computational tasks.

In that sense, software is just another kind of experimental apparatus and should be built, checked, and used as carefully as any physical apparatus. However, while most scientists are careful to validate their laboratory and field equipment, most do not know how reliable their software is. This can lead to serious errors that affect the central conclusions of published research. In addition, because software is often used for more than a single project and is often reused by other scientists, computing errors can have a disproportionate impact on the scientific process.



### Best Practices for Scientific Computing

[Wilson G, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745.](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745)

* __Write programs for people, not computers.__
    * A program should not require its readers to hold more than a handful of facts in memory at once.
    * Make names consistent, distinctive, and meaningful.
    * Make code style and formatting consistent.
* __Let the computer do the work.__
    * Make the computer repeat tasks.
    * Save recent commands in a file for re-use.
    * Use a build tool to automate workflows.
* __Make incremental changes.__
    * Work in small steps with frequent feedback and course correction.
    * Use a version control system.
    * Put everything that has been created manually in version control.
* __Don't repeat yourself (or others).__
    * Every piece of data must have a single authoritative representation in the system.
    * Modularize code rather than copying and pasting.
    * Re-use code instead of rewriting it.
* __Plan for mistakes.__
    * Add assertions to programs to check their operation.
    * Use an off-the-shelf unit testing library.
    * Turn bugs into test cases.
    * Use a symbolic debugger.
* __Optimize software only after it works correctly.__
    * Use a profiler to identify bottlenecks.
    * Write code in the highest-level language possible.
* __Document design and purpose, not mechanics.__
    * Document interfaces and reasons, not implementations.
    * Refactor code in preference to explaining how it works.
    * Embed the documentation for a piece of software in that software.
* __Collaborate.__
    * Use pre-merge code reviews.
    * Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
    * Use an issue tracking tool.
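
Several of these practices (meaningful names, documented interfaces, assertions, unit tests, bugs turned into test cases) fit in a few lines of Python. The sketch below assumes that the off-the-shelf testing library is pytest; the function `mean` and the test names are hypothetical examples, not code from Wilson et al.

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Assertion: reject input the function was never designed for, with a clear
    # message instead of an unexplained ZeroDivisionError.
    # (Note: assertions are skipped when Python runs with the -O flag.)
    assert len(values) > 0, "mean() requires at least one value"
    return sum(values) / len(values)


# Tests in pytest style: plain functions whose names start with "test_".
def test_mean_of_integers():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_floats():
    # Compare floats with a tolerance, never with ==.
    assert math.isclose(mean([0.1, 0.2, 0.3]), 0.2)


def test_single_value():
    # Hypothetical regression test: a bug once reported for one-element input.
    assert mean([5.0]) == 5.0
```

Saved in a file whose name starts with `test_` (for example `test_stats.py`), the tests are discovered and run with `pytest test_stats.py`.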

***
## Why Python?

### A powerful, all-purpose language with a great syntax (_"executable pseudocode"_)

<img src="./_img/xkcd_learning_curve_wave_python.png" width="700px"> 

Source: [Fabien Maussion](http://fabienmaussion.info/)
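
The nickname refers to how closely Python code can follow a plain-language description of an algorithm. A small illustration with hypothetical data, where the comment is the pseudocode and the lines below it are the runnable Python:

```python
# Pseudocode: "keep the positive measurements, then report their average"
measurements = [3.2, -1.0, 4.8, 0.0, 2.5]  # hypothetical data

positive = [m for m in measurements if m > 0]
average = sum(positive) / len(positive)

print(f"{len(positive)} positive values, average {average:.2f}")
```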

### Interoperability with other languages

[Guido van Rossum](https://gvanrossum.github.io//index.html) (creator of Python, formerly known as its Benevolent Dictator For Life (BDFL))

_... I never intended Python to be the primary language for programmers ..._

_... bridge the gap between the shell and C ..._


<img src="./_img/guido.png" width="500px">

Source: [SD Times](https://sdtimes.com/guido-van-rossum/python-creator-proposes-type-annotations-programming-language/) (2014) & [A Conversation with Guido van Rossum](https://www.artima.com/intv/) (2002)
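
That glue role is still visible today. A minimal sketch, assuming a POSIX system (Linux or macOS) where the C standard library can be loaded and an `echo` program is available: Python calls the C function `strlen` directly through the standard `ctypes` module and runs a shell command through `subprocess`.

```python
import ctypes
import ctypes.util
import subprocess

# Load the C standard library. On POSIX, if find_library() returns None,
# CDLL(None) falls back to the symbols already loaded into the Python process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

n = libc.strlen(b"bread slicer")  # a C function called from Python
print("strlen:", n)

# Run a shell command and capture its output as a Python string.
result = subprocess.run(
    ["echo", "hello from the shell"], capture_output=True, text=True, check=True
)
print("shell:", result.stdout.strip())
```

Declaring `argtypes` and `restype` is optional for simple calls, but it lets `ctypes` convert arguments and return values correctly instead of assuming C `int`.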

### Open and encouraging community

<img src="./_img/growth_major_languages-1-1400x1200.png"  width="650px">

Source: [Blog post by David Robinson](https://stackoverflow.blog/2017/09/06/incredible-growth-python/) (September 6, 2017)




### Batteries included and third-party modules

<img src="https://imgs.xkcd.com/comics/python.png" width="500px;">

Source: [xkcd](https://xkcd.com/353/)
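
"Batteries included" means that Python ships with a large standard library, so many everyday tasks need no installation at all. A small illustration using the standard `statistics` and `json` modules on hypothetical measurements:

```python
import json
import statistics

# Standard-library modules ship with Python: nothing to install.
slice_widths_mm = [12.1, 11.8, 12.4]  # hypothetical measurements

summary = {
    "mean": statistics.mean(slice_widths_mm),
    "stdev": statistics.stdev(slice_widths_mm),  # sample standard deviation
}
print(json.dumps(summary, indent=2))
```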

### Python ecosystem


<img src="./_img/python_ecosystem.png"  style="height: 480px;">

***
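    # in a terminal, with the conda package manager ...
    conda install <YOUR-PACKAGE-OF-CHOICE>
    # ... or with pip, the standard tool for installing Python packages
    pip install <YOUR-PACKAGE-OF-CHOICE>

    # then, in a Python session (the import name can differ from the
    # package name, e.g. `pip install scikit-learn` but `import sklearn`)
    >>> import <YOUR-PACKAGE-OF-CHOICE>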
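
Before importing a third-party package, a script can check whether it is installed and which version it has, using only the standard library (`importlib.util.find_spec` and `importlib.metadata`, the latter available since Python 3.8). `package_status` is a hypothetical helper written for this illustration; it takes the import name and, optionally, the distribution name used by pip or conda, since the two can differ.

```python
import importlib.util
from importlib import metadata


def package_status(import_name, dist_name=None):
    """Hypothetical helper: report whether a package can be imported, and its version.

    Returns None if the module cannot be found, the installed version string if
    distribution metadata exists, or "unknown" otherwise (e.g. standard-library modules).
    """
    if importlib.util.find_spec(import_name) is None:
        return None
    try:
        return metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        return "unknown"


# PyYAML is installed under that name but imported as `yaml`.
for import_name, dist_name in [("json", None), ("numpy", None), ("yaml", "PyYAML")]:
    print(import_name, package_status(import_name, dist_name))
```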

### Python for other domains


> _If you are interested in other domain specific stacks consider [Awesome Python](https://awesome-python.com/), a curated list of Python frameworks, libraries, software and resources._

***
## Reproducible Research



The term [reproducible research](https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research) refers to the idea that the ultimate product of academic research is not only the paper but also the laboratory notebooks and the full computational environment (code, data, etc.) used to produce its results, so that others can reproduce those results and build new work on the research. Typical examples of reproducible research comprise compendia of data, code and text files, often organised around an __[R Markdown source document](https://rmarkdown.rstudio.com/)__ or a __[Jupyter notebook](http://jupyter.org/)__.


<img src="./_img/pipeline.png" width="1200px;">

(after Peng et al. 2006)
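
Two small habits support this at the level of a single script: fixing the seed of random number generators, and recording the computational environment alongside the results. The sketch below illustrates both with the standard library only; it is not a complete reproducibility workflow, and the seed value and record fields are arbitrary choices.

```python
import json
import platform
import random
import sys

SEED = 42  # arbitrary but fixed: rerunning with the same seed regenerates the same numbers

rng = random.Random(SEED)  # dedicated generator, independent of the global random state
sample = [rng.random() for _ in range(3)]

# Record the environment that produced the results, next to the results themselves.
record = {
    "seed": SEED,
    "sample": sample,
    "python_version": platform.python_version(),
    "platform": platform.platform(),
    "executable": sys.executable,
}
print(json.dumps(record, indent=2))
```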

***
## Literate Programming

[Literate programming](https://en.wikipedia.org/wiki/Literate_programming) is a programming paradigm introduced by [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) in which a program is given as an explanation of the program logic in a natural language, interspersed with snippets of macros and traditional source code.

The literate programming paradigm enables programmers to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written much like the text of an essay, in which macros are included to hide abstractions and traditional source code.

> _Write Programs For People, Not Computers._ ([Wilson G, et al. (2014)](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745))
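
Python's standard `doctest` module is a small step in this direction: prose, usage examples, and code live together in a docstring, and the examples are executed as tests. This is not Knuth's full literate programming (there is no tangling or weaving of macros), only an illustration of keeping explanation and code side by side; `slice_bread` is a hypothetical function echoing the bread slicer task.

```python
import doctest


def slice_bread(loaf_length_cm, slice_width_cm):
    """Return the number of full slices of a given width that fit into a loaf.

    Any leftover piece shorter than one slice is not counted:

    >>> slice_bread(30, 1.5)
    20
    >>> slice_bread(10, 3)
    3
    """
    return int(loaf_length_cm // slice_width_cm)


if __name__ == "__main__":
    # Run the examples embedded in the docstrings; silent if they all pass.
    doctest.testmod()
```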

### _The Scientific Paper Is Obsolete_
by James Somers, [The Atlantic, Apr 5, 2018](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/)

    from IPython.display import IFrame

    # Embed the article in the notebook output (requires a Jupyter/IPython frontend)
    IFrame('https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/',
           width="100%", height=450)

***
## Data Science

### What is data science?
   

_Data science [...] is an **interdisciplinary field of scientific methods, processes, and systems** to extract **knowledge or insights from data**[...]_ ([Wikipedia](https://en.wikipedia.org/wiki/Data_science))

_It employs techniques and theories drawn from many fields within the broad areas of_
* _mathematics_, 
* _statistics_, 
* _information science_, and 
* _computer science_, in particular from the subdomains of 
    * _machine learning_, 
    * _classification_, 
    * _cluster analysis_, 
    * _data mining_, 
    * _databases_, and 
    * _visualization._

The [Harvard Business Review](https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century) (2012) called the role of data scientist "**The Sexiest Job of the 21st Century**".

<img src="./_img/data_science.png"  width="800px"> 

Source: [Michael Barber](https://towardsdatascience.com/introduction-to-statistics-e9d72d818745)

***