# Data science in Python

- Our course webpage: http://pycam.github.io
- Python website: https://www.python.org/ 

## Session 1.1: Starting with data and Python
- [Jupyter notebook](#Jupyter-notebook)
    - Add text into notebook using Markdown cells
    - Create and execute Code cells
    - Download Python code into a file
- [Shell commands](#Shell-commands)
    - Find the Python script downloaded from Jupyter and execute it
    - Find the first data file
- [Basic Python](#Basic-Python)
    - [Cheat Sheet](cheat_sheet_basic_python.ipynb)
    - [Files](#Files)
    - [Getting help](#Getting-help)
- [Exercise 1.1](#Exercise-1.1)

## Jupyter notebook

<img src="http://jupyter.org/assets/nav_logo.svg">

- The [Jupyter Notebook](http://jupyter.org/) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. 

- Jupyter provides a rich architecture for interactive data science and scientific computing with: 
    - Over 40 programming languages such as Python, R, Julia and Scala.
    - A browser-based notebook with support for code, rich text, math expressions, plots and other rich media.
    - Support for interactive data visualization.

### How to install Jupyter on your own computer?

- After installing [Python 3](https://www.python.org/) on your computer, we recommend working in a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # activate your virtual environment
```
- Install Jupyter:
```bash
pip install jupyter
```
- Start the notebook server from the command line:
```bash
jupyter notebook
```
- You should see the notebook home page open in your web browser.
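To confirm from Python itself that the virtual environment is active, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created with `python3 -m venv` they differ. A minimal sketch (the helper name `in_virtualenv` is ours, not part of any library):

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtual environment.

    Hypothetical helper: inside a venv, sys.prefix points at the environment
    directory, while sys.base_prefix points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Environment location:", sys.prefix)
```

Running this in a notebook cell after `source venv/bin/activate` and `jupyter notebook` should report `True`; if it reports `False`, Jupyter was probably started outside the environment.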

### How to run Python in a Jupyter notebook?

- See [Jupyter Notebook Basics](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb)
- Create a simple Jupyter notebook with one Markdown cell and one Code cell
- Download the Python file associated with the newly created notebook

## Shell commands 

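- Three essential commands for moving around the file system:
```bash
pwd   # print the current (working) directory
ls    # list the files in the current directory
cd    # change directory, e.g. cd data
```
- Run the Python interpreter
```bash
python3
```
- Find the Python script downloaded from Jupyter and execute it
```bash
python3 my-script.py   # replace my-script.py with the name of your downloaded file
```
- Find the first data file (for example, list the contents of the `data` folder with `ls data`)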
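The same navigation can be done from inside Python with the standard `os` and `pathlib` modules, which is handy when working in a notebook. A sketch, assuming the data files live in a `data/` folder next to the notebook (as `data/genes.txt` does below); `list_data_files` is a hypothetical helper name:

```python
import os
from pathlib import Path


def list_data_files(folder="data"):
    """Hypothetical helper: return the sorted names of regular files in `folder`."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())


print(os.getcwd())        # like `pwd`: the current working directory
print(os.listdir("."))    # like `ls`: entries in the current directory
# os.chdir("data")        # like `cd data`: change the working directory

# Assumes a `data` folder in the current directory; skipped if it is missing
if os.path.isdir("data"):
    print(list_data_files("data"))
```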

### How to start the Python interpreter?

On a Mac or Linux machine, open a terminal and type the command `python3`.
<center><img src="img/python_shell.png"></center>

### How to run Python code from a file?

To run Python code from a file, open a terminal window and type the command `python3` (or `python`, if it runs Python 3 on your system) followed by the name of the script:

```bash
python3 scripts/hello.py
```
<center><img src="img/python_run_code.png"></center>
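The contents of `scripts/hello.py` are not shown here; a minimal hypothetical version could look like the sketch below. The `if __name__ == "__main__":` guard calls `main()` only when the file is executed as a script (e.g. `python3 scripts/hello.py`), not when it is imported as a module from other code.

```python
"""A hypothetical scripts/hello.py: prints a greeting when run as a script."""


def greeting(name="World"):
    """Build the greeting message."""
    return f"Hello, {name}!"


def main():
    print(greeting())


# Only runs when the file is executed directly, not when imported
if __name__ == "__main__":
    main()
```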

Make sure that you are running Python 3:
```bash
python --version
```
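The same check can be made from inside a script or notebook with `sys.version_info`, which lets a program stop early with a clear message if it is started with Python 2. A minimal sketch:

```python
import sys

# sys.version is a string such as "3.11.4 (main, ...)"; keep the first word
print("Running Python", sys.version.split()[0])

# sys.version_info compares like a tuple, e.g. (3, 11, 4, 'final', 0) >= (3,)
if sys.version_info < (3,):
    sys.exit("This code requires Python 3; try running it with `python3`.")
```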

## Basic Python

### Cheat Sheet

- [Cheat Sheet](cheat_sheet_basic_python.ipynb)

### Files

To read from a file, your program needs to open the file and then read the contents of the file. You can read the entire contents of the file at once, or read the file line by line. The **`with`** statement makes sure the file is closed properly when the program has finished accessing the file.


Passing the `'w'` argument to `open()` tells Python you want to write to the file. Be careful: this erases the contents of the file if it already exists. Passing the `'a'` argument tells Python you want to append to the end of the file instead (the file is created if it does not exist yet).

```python
# reading from a file
with open("data/genes.txt") as f:
    for line in f:
        print(line.strip())

# writing to a file
with open('programming.txt', 'w') as f:
    f.write("I love programming in Python!\n")
    f.write("I love making scripts.\n")

# appending to a file
with open('programming.txt', 'a') as f:
    f.write("I love working with data.\n")
```
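The code above reads the file line by line. To read the entire contents at once, call `read()` on the file object, or `readlines()` to get a list of lines. The sketch below also shows the difference between the `'w'` and `'a'` modes; it works on a temporary file (our choice, so that neither `programming.txt` nor the course data are touched).

```python
import os
import tempfile

# Work in a throwaway directory so no real file is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    with open(path, "w") as f:   # 'w' creates the file (or empties it)
        f.write("first line\n")
    with open(path, "w") as f:   # opening with 'w' again erases "first line"
        f.write("second line\n")
    with open(path, "a") as f:   # 'a' adds to the end of the file
        f.write("third line\n")

    with open(path) as f:
        content = f.read()       # the whole file as one string
    with open(path) as f:
        lines = f.readlines()    # a list of lines, each ending with '\n'

print(repr(content))  # 'second line\nthird line\n'
print(lines)          # ['second line\n', 'third line\n']
```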

### Getting help

[The Python 3 Standard Library](https://docs.python.org/3/library/index.html) is the reference documentation for all the modules that ship with Python, as well as its built-in functions and data types.

```python
help(len)          # help on a built-in function
help(list.extend)  # help on a list method
```

```python
# help within Jupyter (IPython syntax, not plain Python)
len?
```
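Besides `help()`, two built-in tools are useful for exploring objects interactively: `dir()` lists the attributes and methods an object provides, and the `__doc__` attribute holds the docstring that `help()` displays. For example:

```python
# Public methods of list (names starting with "_" are special or internal)
methods = [name for name in dir(list) if not name.startswith("_")]
print(methods)

# The docstring that help(len) shows
print(len.__doc__)
```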

## Exercise 1.1

We are going to look at a [Gapminder](https://www.gapminder.org/) dataset, made famous by Hans Rosling in his TED talk [‘The best stats you’ve ever seen’](http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen).

- Read the dataset from the file `data/gapminder.txt`
- Programmatically find the earliest and the latest years in the dataset
- Calculate the average life expectancy, as well as the global population increase, between these two years
- Find which country had the lowest life expectancy in 2002
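As a possible starting point (not a full solution), the sketch below reads a delimited text file with a header row into a list of dictionaries using the standard `csv` module, then finds the smallest and largest values of a numeric column. It assumes, without having checked, that `data/gapminder.txt` is tab-separated with a header line containing a `year` column; adjust `delimiter` and the column name to match the actual file. The helper names `read_table` and `column_range` are hypothetical.

```python
import csv


def read_table(path, delimiter="\t"):
    """Hypothetical helper: read a delimited file with a header row into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter=delimiter))


def column_range(rows, column):
    """Hypothetical helper: return (min, max) of a numeric column, as floats."""
    values = [float(row[column]) for row in rows]
    return min(values), max(values)


# Usage, assuming a tab-separated file with a "year" column:
# rows = read_table("data/gapminder.txt")
# print(column_range(rows, "year"))
```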

## Next session

Go to our next notebook: [Session 1.2: Using existing python modules to explore data in files](12_python_data.ipynb)