<center>
<img 
    src='https://raw.githubusercontent.com/cineca-scai/lectures/science-rome/pyscience/images/welcome.jpg'
    width=800>
</center>

> The most complicated part is to start

<table>
    <thead>
        <tr> <th>Day</th> <th>Time</th> <th>Lesson</th> </tr>
    </thead>
    <tr> <td> 1 </td> <td> Morning </td> <td> Introduction to language </td> </tr>
    <tr> <td> 1 </td> <td> Afternoon </td> <td> Modular code and some magic </td> </tr>
    <tr> <td> 2 </td> <td> Morning </td> <td> Numeric optimized arrays </td> </tr>
    <tr> <td> 2 </td> <td> Afternoon </td> <td> Numeric operations </td> </tr>
    <tr> <td> 3 </td> <td> Morning </td> <td> Scientific libraries and dataframes </td> </tr>
    <tr> <td> 3 </td> <td> Afternoon </td> <td> Various </td> </tr>
</table>

# Before starting

**Testing the audience**

- Do you already know Python?
- Which version of Python are you using?
- What is the main reason you decided to use Python?
- Have you ever used `numpy`?
- Do you know what `ipython` is?

## The role of computing in science

> about why you probably are here now

Science has traditionally been divided into 

* **experimental** and 
* **theoretical** disciplines.

During the last several decades ***computing*** has emerged

* Related to both experiments and theory.
* Often viewed as a new third branch of science. 

<center>
    <img src="images/theory-experiment-computation.png" width="600">
</center>

*Nowadays a vast majority of both experimental and theoretical papers involve some **numerical calculations**, simulations or computer modeling*.

## Requirements on scientific computing

With respect to numerical work:

**Replication**

- An author of a scientific paper that involves numerical calculations should be able to rerun the simulations and replicate the results upon request.
- Other scientists should also be able to perform the same calculations and obtain the same results, given the information about the methods used in a publication.

**Reproducibility**

- The results obtained should be reproducible with an independent implementation of the method, or using a different method altogether.
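
A minimal sketch of what replication means in practice for a simulation that uses random numbers: fixing the seed of the random number generator makes a rerun produce exactly the same numbers. The function name `noisy_measurements` and the seed values are illustrative assumptions, not part of the course material.

```python
import random


def noisy_measurements(seed, n=5):
    """Return n pseudo-random 'measurements' drawn with a fixed seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


first_run = noisy_measurements(seed=42)
second_run = noisy_measurements(seed=42)

# Same seed, same code: the rerun replicates the original numbers
print(first_run == second_run)

# A different seed gives a different (but equally valid) sample
print(first_run == noisy_measurements(seed=43))
```

Publishing the seed together with the code is what lets another scientist obtain the same numbers.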

In experimental and theoretical sciences
* The methods used and the results are published
* All experimental data should be available upon request
* It is considered unscientific to withhold crucial details in a theoretical proof

In computational sciences 
* There are not yet any well established guidelines for how **source code** and **generated data** should be handled. 

*A number of editorials in high-profile journals have started 
to demand that authors provide the source code for simulation software*


> Reproducibility is one of the cornerstones of science. 

> Made popular by British scientist Robert Boyle in the 1660s, the idea is that a discovery should be reproducible before being accepted as scientific knowledge.

> Computers wrangle the data, but also obscure it...

> recommend free software that is available for all computer systems (eg. Windows, Mac, and Linux) for analyzing and visualizing data (such as R and Python)

source: http://j.mp/1TjbF7i


### To achieve these goals

* Keep the source code, in the exact version that was used, for the data and figures in published papers
* Record which versions of external software were used
    - Keep access to the environment that was used
* Be ready to give additional information about the methods used
* Ideally, code should be published online

1. Can we do something to keep track of the versions of source code and tools?

2. How do we write (understandable) code that can be published online?

# Tools for managing source code
*this is extremely important for your future work*

Ensuring replicability and reproducibility of scientific simulations is a *complicated problem*

#### Revision Control System (RCS) software
Good choices include
* git - http://git-scm.com
* mercurial - http://mercurial.selenic.com. Also known as `hg`
* subversion - http://subversion.apache.org. Also known as `svn`
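
As a sketch of how a revision control system ties back to reproducibility, a script can ask `git` which commit it is running from and store that identifier next to the data it produces. The helper name `current_commit` is hypothetical. The example assumes `git` is installed, and returns `None` when it is not or when the directory is not a repository.

```python
import subprocess


def current_commit(path="."):
    """Return the commit hash of the git repository at path, or None."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=path,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or path is not inside a repository
        return None
    return result.stdout.strip()


commit = current_commit()
print("Results produced with commit:", commit)
```

Writing this identifier into the header of every output file makes it possible to find the exact code behind a figure later on.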

#### Online repositories for source code
Some good alternatives are
* Github - http://www.github.com
* Bitbucket - http://www.bitbucket.com
* Privately hosted repositories on the university's or department's servers

Repositories are also excellent for version controlling manuscripts, figures, thesis files, data files, lab logs, etc. 

Basically for any digital content that must be preserved and is frequently updated!

They are also excellent **collaboration tools**!

## The cycle of data/science projects

> how computing changed the life of a scientist

<table>
<tr><td>
<h2>Steps:</h2>
<br>
1. Identify the problem
<br>
2. Identify available data sources
<br>
3. Implementation
<br>
4. Statistics and tests
<br>
5. Develop application/service/paper
<br><br>
Additional:
<br>
6. Share results
<br>
7. Maintenance
<br>
</td>
<td>
<img src='http://j.mp/1lMNT8R' width=500>
</td></tr></table>

source *Vincent Granville*:
http://www.datasciencecentral.com/profiles/blogs/life-cycle-of-data-science-projects

# Uff.. boring!

I agree! Here we go...


...with live code from the beginning!

In [None]:
%%bash 

# Download an extension from my repo
repo="https://raw.githubusercontent.com/pdonorio/pdonorio.github.io"
path="$HOME/.ipython/extensions"
extension="whoami.py"

# Make sure the extensions directory exists
mkdir -p "$path"

# Use bash commands! Report success only if the download worked
wget -q "$repo/master/scripts/$extension" -O "$path/$extension" && echo "Downloaded"

* note to self: click on the cell below
* then press key combination: `SHIFT` + `ENTER`

In [None]:
# Load the installed extension
%load_ext whoami

<small> wait, what is that?? </small>

A command about *myself*

In [None]:
# This is a new command added from the extension 'whoami'
%helloworld

### Ok. Let's double check

In [None]:
import time
print("Today is " + time.strftime("%d/%m/%Y"))

## A notebook and a Calculator

Many of the things I used to use a calculator for, I now use Python for:

In [None]:
2+2

In [None]:
(50-5*6)/4

In [None]:
5 ** 2

But much MUCH more powerful

In [None]:
width = 20
height = 5*9
width * height

I can also use python as my base shell

In [None]:
ls -l

Ok. I think I've got your attention...

#  A modern approach to science
*with python*

<!-- logos -->
<table>
<tr>
<td colspan=3> <img src='images/scai-logo.png' width=500> </td>
</tr><tr>
<td> <img src='http://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Python_logo_and_wordmark.svg/2000px-Python_logo_and_wordmark.svg.png' width=400> </td>
<td> <img src='https://raw.githubusercontent.com/jupyter/nature-demo/master/images/jupyter-logo.png' width=400> </td>
<td> <img src='https://pbs.twimg.com/profile_images/562949148932964352/a5XS7EQh.png' width=400> </td>
</tr>
</table>

# The evolution

`Scripting` -> *Python* Interpreter -> *IPython* Shell -> *Jupyter* Notebooks -> **Awesome!**

## About '*scripting*'

## Compilers

* **Computer programs** are "*binary*" instructions (written in an alphabet of only 0 and 1), which is the only language the machine understands.


* A **programming language** (e.g. `C` or `Fortran`) sits halfway between human language (e.g. English) and machine language (0s and 1s).


* A person writes **code** in a language such as `C`, and a ***compiler*** translates it into machine language: the **binaries** or **executables**.  

## Interpreters

* **Scripting languages** are programming languages that *don't require an explicit compilation step*. For example: PHP, JavaScript, Perl, Python, R.


* You have to compile a C program before you can run it. You don't have to compile a JavaScript program before using it.


* Scripting languages use **interpreters** (instead of compilers) to translate source code into machine-executable instructions at run time.
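
Python itself makes this run-time translation visible. As a minimal sketch (the expression is just an example), the built-in `compile` turns source text into a code object, which the standard CPython interpreter then executes as bytecode. The `dis` module prints those low-level instructions, whose exact names vary between Python versions.

```python
import dis

source = "(50 - 5 * 6) / 4"

# Translate the source text into a code object at run time
code = compile(source, "<example>", "eval")

# Show the bytecode instructions the interpreter will execute
dis.dis(code)

# Execute the compiled code object
result = eval(code)
print(result)
```

No separate build step was needed: translation and execution both happened inside the running interpreter.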

## What is Python?

> TL;DR: an interpreter

[Python](http://www.python.org/) is: 

> a modern, general-purpose, object-oriented, high-level programming language.

### General characteristics of Python

* **clean and simple language:** 
    * Easy-to-read and intuitive code
    * easy-to-learn minimalistic syntax
    * maintainability scales well with size of projects

* **expressive language:** 
    * Fewer lines of code
    * fewer bugs
    * easier to maintain

### Some *technical* details

* **dynamically typed:** 
    * No need to define the type of variables, function arguments or return types.
* **automatic memory management:** 
    * No need to explicitly allocate and deallocate memory for variables and data arrays
        * Far fewer memory-leak bugs
* **interpreted:** 
    * No need to compile the code
        * The Python interpreter reads and executes the python code directly
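
A short sketch of what dynamic typing means in practice (the variable and function names are illustrative): a name is not bound to a type, only the object it currently refers to has one.

```python
value = 42
print(type(value).__name__)   # int

value = "forty-two"           # the same name now refers to a string
print(type(value).__name__)   # str


def double(x):
    """Works for any object that supports multiplication by an integer."""
    return x * 2


print(double(21))        # 42
print(double("ab"))      # abab
print(double([1, 2]))    # [1, 2, 1, 2]
```

The function `double` declares no argument or return types, yet it works on numbers, strings and lists alike.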

### Advantages

* The main advantage is ease of programming
    - minimizing the **time required** to develop, debug and maintain the code
* Well designed: encourages many good programming practices
    - Modular, with a good system for packaging and re-use of code
    - This often results in more transparent, maintainable code with fewer bugs
* Self describing (*introspection*)
    - Documentation tightly integrated with the code
* A large standard library, and a large collection of add-on packages
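
Introspection can be tried with nothing more than the standard library. In this sketch the function `rectangle_area` is a made-up example; `__doc__`, `dir` and `inspect.signature` are standard Python.

```python
import inspect


def rectangle_area(width, height=1):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# The documentation string travels with the object
print(rectangle_area.__doc__)

# The signature can be inspected at run time
print(inspect.signature(rectangle_area))

# dir() lists the attributes and methods an object provides
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])
```

Interactive tools build their help and auto-completion features on this kind of information.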

### Disadvantages

* Because Python is interpreted and dynamically typed, its execution **may be slow compared to compiled**, statically typed programming languages
* Very different from functional programming, which you may already know a little

### Popularity

source: http://githut.info/
<img src='http://j.mp/1NwNIdj'>

### What makes python suitable for scientific computing?
<img src="images/optimizing-what.png" width="800">

# Python interpreter

The standard way to use the Python programming language is to use the Python interpreter to run Python code.

* The Python interpreter is a program that reads and executes the Python code in files passed to it as arguments
* At the command prompt, the command ``python`` is used to invoke the Python interpreter

For example, to run a file ``my-program.py`` that contains python code from the command prompt, use:

    $ python my-program.py

We can also start the interpreter by simply typing ``python`` at the command line, and interactively type python code into the interpreter. 

(I may use the notebook terminal)

<img src="https://raw.githubusercontent.com/cineca-scai/lectures/master/pydata/images/python-screenshot.jpg" width="700">


The most notable feature of the Python language syntax is the use of **indentation** to define code blocks, instead of the classic braces (`{` and `}`) used by languages such as `C`.
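
A minimal sketch of indentation-defined blocks (the function `classify` is an illustrative example): each level of indentation opens a block, and returning to a previous level closes it.

```python
def classify(numbers):
    """Split numbers into even and odd lists."""
    evens, odds = [], []
    for n in numbers:          # the for-block is everything indented below
        if n % 2 == 0:         # the if-block is indented one level further
            evens.append(n)
        else:
            odds.append(n)
    return evens, odds         # back at function level: the loop has ended


evens, odds = classify(range(6))
print(evens)   # [0, 2, 4]
print(odds)    # [1, 3, 5]
```

Moving the `return` line one level to the right would place it inside the loop and change the result, so indentation is part of the program's meaning, not just its layout.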

Let's see a quick shell demo.

### IPython

IPython is an interactive shell that addresses the limitations of the standard Python interpreter

...it is a work-horse for scientific use of python! 

It provides an interactive prompt to the python interpreter with a greatly improved user-friendliness.

<img src="https://raw.githubusercontent.com/cineca-scai/lectures/master/pydata/images/ipython-screenshot.jpg" width="800">

Born in 2001 as the work of a student (*Fernando Perez*).

He based it on features he liked in *Mathematica*, with the aim of creating a system for everyday scientific computing.

Some of the many useful features of IPython include:

* Command history, which can be browsed with the up and down arrows on the keyboard.
* Tab auto-completion.
* In-line editing of code.
* Object introspection, and automatic extraction of documentation strings from Python objects like classes and functions.
* Good interaction with the operating system shell.
* Support for multiple parallel back-end processes, which can run on computing clusters or cloud services like Amazon EC2.


Let's see a quick shell demo.

* colors
* output
* magic commands
* history
* bash commands

# IPython notebook

[IPython notebook](http://ipython.org/notebook.html) is an HTML-based notebook environment for Python

* Based on the IPython shell
* Provides a web cell-based interactive environment powered with Javascript
* System profiles to access unix-terminal-like capability
* Comments and notes with HTML and markdown formats
* Integrates embedded plots


<img src="https://raw.githubusercontent.com/cineca-scai/lectures/master/pydata/images/ipython-notebook-screenshot.jpg" width="800">

Although they use a web browser as the graphical interface, IPython notebooks are usually run **locally**, on the same computer that runs the browser.


To start a new IPython notebook session, run the following command:

    $ ipython notebook

from a directory where you want the notebooks to be stored. With recent Jupyter releases the equivalent command is `jupyter notebook`.

<small>(This will open a new browser window with a file explorer for the current path)</small>

# Jupyter project

> In 2014, Fernando Perez announced a spin-off project from IPython called Project Jupyter. 

> IPython will continue to exist as a Python shell and a kernel for Jupyter, 

> while **the notebook and other language-agnostic parts of IPython** will move under the Jupyter name. 

> Jupyter added support for Julia, R, Haskell and Ruby.

source: https://en.wikipedia.org/wiki/IPython#Project_Jupyter

## Kernels

Jupyter notebooks are based on **IPython kernels**.

> A ‘kernel’ is a program that runs and introspects the user’s code. 

IPython includes a kernel for Python code.

People have written kernels for [several other languages](https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages).

<small>Note: 
If you wish to **write a kernel for a missing language**, you can read the IPython development documentation:
http://ipython.readthedocs.org/en/stable/development/kernels.html.</small>

<small>Note: When IPython starts a kernel, it passes it a connection file. 
This specifies how to set up communications with the frontend.</small>
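
To make the connection file less abstract, here is a hedged sketch of what one contains and how a frontend could read it. The field names follow the Jupyter messaging documentation. The port numbers and the key are made-up values for illustration, and the helper `endpoints` is hypothetical.

```python
import json

# Illustrative content of a connection file (all values are made up)
connection_json = """
{
  "transport": "tcp",
  "ip": "127.0.0.1",
  "shell_port": 57503,
  "iopub_port": 40885,
  "stdin_port": 52597,
  "control_port": 50160,
  "hb_port": 42540,
  "signature_scheme": "hmac-sha256",
  "key": "a0436f6c-1916-498b-8eb9-e81ab9368e84"
}
"""


def endpoints(info):
    """Build one address per communication channel from a connection dict."""
    channels = ["shell", "iopub", "stdin", "control", "hb"]
    return {
        name: "{transport}://{ip}:{port}".format(
            transport=info["transport"], ip=info["ip"], port=info[name + "_port"]
        )
        for name in channels
    }


info = json.loads(connection_json)
for channel, address in endpoints(info).items():
    print(channel, address)
```

Each channel carries a different kind of traffic (code to execute, published output, requests for input, control messages and a heartbeat), which is why the file lists several ports.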

# Welcome, 
## new *you* 

a <u>notebooker</u> data scientist

Note: 

we will NOW give you credentials to test notebook 

on our PICO cloud environment

## A notebook is crazy simple and fun

- explorer, create new, rename, remove

- move inside, run cell code, help

- the kernel: start and stop, cell types. autosaving and checkpoints.

- shortcuts: becoming an editor

## A notebook is crazy simple and fun

- markdown and notes <small>(consider [learning the Markdown language](http://markdowntutorial.com/))</small>

- download ipynb, python, html

- install a library and use it

- slideshow

<center>
<h2>The Pydata stack</h2>
<img src="images/scientific-python-stack.jpg" width="750"></center>

# Main resources

- Course official website: http://bit.ly/pfs_2015rome
- Github cineca-scai/lectures https://github.com/cineca-scai/lectures/tree/science-rome/pyscience
- Docker hub and main notebook image https://hub.docker.com/r/cineca/nbscience
- Python weekly newsletter http://www.pythonweekly.com/
- Markdown cheatsheet https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

## How to replicate this environment when you go home

https://hub.docker.com/r/cineca/nbscience

The resources we provided here for you are just simple **Docker containers** running on our PICO cloud.

Docker (a lightweight virtual environment) has the amazing property of running your software in a Linux virtual environment under the same conditions on every machine, including your workstation or laptop (even if it runs Windows or Mac).

Last step before getting serious:

## Python and module versions

For the reproducibility of an IPython notebook:

- We record the versions of all the software packages it uses
- If this is done properly, it will be easy to reproduce the environment
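
The recording step can be sketched with the standard library alone. Python 3.8 or later is assumed for `importlib.metadata`, and the helper name `record_versions` is hypothetical. The versions of the interpreter and of selected packages are collected into a dictionary that can be saved alongside the results.

```python
import platform
import sys
from importlib import metadata


def record_versions(packages):
    """Return a dict mapping component names to version strings."""
    versions = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing
            versions[name] = "not installed"
    return versions


for component, version in record_versions(["numpy", "scipy", "pandas"]).items():
    print(component, version)
```

The printed values depend on the machine, which is exactly why they are worth storing with every published result.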


To encourage the practice of recording versions in notebooks:

> a simple IPython extension that produces a table with version numbers of selected software components

Now, to load the extension and produce the version table:

In [None]:
# Execute me now (requires the version_information extension to be installed)
%reload_ext version_information
%version_information numpy, scipy, pandas, matplotlib, seaborn, bokeh

# End of Chapter

These lectures are also available in *non-interactive mode* on the free **nbviewer** service:
http://bit.ly/pyscience_readonly

*note*:

The idea of this course based on notebooks was initially inspired by the **Scientific Python lectures** written by *Robert Johansson*.

Some of his material has been integrated with our previous courses.

You can find his original work on his GitHub project:
https://github.com/jrjohansson/scientific-python-lectures

You can go to the [next one](%5B01_02%5D%20Learning%20Python.ipynb)