# Python 

For this course, you will use the programming language Python and JupyterLab (or Jupyter Notebook) to create and run programs.

You will also need the following Python packages:

- NumPy to manipulate homogeneous data arrays
- Pandas to manipulate heterogeneous labelled data (think of an Excel table with data of different types, such as character strings, numbers, etc.)
- Openpyxl to automate data processing from/to Excel files
- Matplotlib and Seaborn to create data visualizations
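
To make the difference between homogeneous and heterogeneous data more concrete, here is a minimal sketch. The height values and the column names are invented for this illustration and are not part of the course data.

```python
import numpy as np
import pandas as pd

# A NumPy array holds values of one single type (here: 64-bit floats).
heights = np.array([175.3, 162.0, 181.5])
print(heights.dtype)  # float64

# A pandas DataFrame is a labelled table whose columns can have different types.
# The values below are made up for this example.
people = pd.DataFrame({
    "name": ["remia", "adrian", "petya"],
    "age": [26, 42, 29],
    "height_cm": [175.3, 162.0, 181.5],
})
print(people.dtypes)  # one data type per column
```

Printing `people.dtypes` lists a separate type for each column, which is what "heterogeneous" means in practice.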

# Book recommendations

### Python for Data Analysis by Wes McKinney

  <img src="https://nico.nexgate.ch/images/mckinney-book.job" width='15%' alt="book1" />

Wes McKinney is an American software developer born in 1985.

  <img src="https://nico.nexgate.ch/images/wes.jpg" width='15%' alt="book1" />

He is the original creator of the Pandas package for Python. He started writing Pandas in 2008 while working at the hedge fund AQR Capital and made it public in 2009. As an anecdote, Nicolas Vu Huu worked at Morgan Stanley in Japan in 2008, where he managed the electronic trading platform. In that function, Nic had AQR Capital as a client.

### Python Data Science Handbook by Jake Vanderplas

 <img src="https://nico.nexgate.ch/images/handbook.jpg" width='15%' alt="book2" />

https://jakevdp.github.io/PythonDataScienceHandbook/

NOTE: all the examples in the book are also available on Colaboratory (Google). 


### Whirlwind Tour of Python

  <img src="https://nico.nexgate.ch/images/whrilwind.gif" width='15%' alt="book1" />


Recommended for students who have absolutely no prior experience in any programming language. This book can be downloaded as a free PDF and also has a companion GitHub repository where you can run all the code examples. It covers fundamental concepts such as what a variable is, what an operator is, what a list is, etc.

Free pdf: http://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf
GitHub repo: https://github.com/jakevdp/WhirlwindTourOfPython


### Thinking, Fast and Slow by Daniel Kahneman

 <img src="https://nico.nexgate.ch/images/thinking.jpeg" width='15%' alt="book3" />

A fascinating book about "judgments and choices", with incredible insights into how humans think and broad coverage of statistical topics such as the law of small numbers.

# Codespace vs. local installation

## GitHub - Codespace

 <img src="https://nico.nexgate.ch/images/github.png" width='20%' alt="github" />

You need a GitHub account to submit your project work, but you do not need to learn all the GitHub commands for this course. We will only use basic file uploads from your laptop.

GitHub is a web platform which provides version control and a wide range of tools to write and deploy software.

Many schools and universities use GitHub. In Switzerland, TBZ (Technische Berufsschule Zürich), ZHAW, ETH and EPFL all use GitHub.

Here are a few prominent GitHub projects:

- TensorFlow is one of the most widely used machine learning libraries: https://github.com/tensorflow/tensorflow

- DALL-E is a text-to-image generator: https://github.com/openai/dall-e

Here is an example of a DALL-E image created from the prompt: "watercolor painting of a penguin programmer":

  <img src="https://nico.nexgate.ch/images/penguin.png" width='20%' alt="DALL-E example" />

⚠️ <font color=red>WARNING</font>

There are over 300 million code repositories hosted on GitHub, and not all of them are safe.

The packages we use in this course are all legitimate and safe to use, but you should not assume that any Python code you find on the internet is safe to use.

NOTE: GitHub was originally launched in 2008 and acquired by Microsoft in 2018 for US$7.5 billion.

## Local installation with Miniconda

Anaconda and Miniconda are among the most widely used distributions of Python.

Anaconda is a software bundle which includes Python, Jupyter Notebook, Spyder, as well as many Python libraries. This distribution uses more than 700 MB of disk space.

A lighter installation, sufficient for this course, is Miniconda, which includes only Python, the conda package manager and a small number of supporting packages. Libraries such as NumPy, Pandas and Matplotlib/Seaborn are then installed separately with conda.

  <img src="https://nico.nexgate.ch/images/MinicondavsAnaconda.jpg" width='80%' alt="conda" />

<sub>source: https://linuxnetmag.com/miniconda-vs-anaconda/</sub>

### Recommended installation

Download and install Miniconda from:

https://docs.conda.io/en/latest/miniconda.html

Then, from a command prompt (or a terminal window on macOS/Linux), run the following command:

<code>conda install numpy pandas matplotlib seaborn jupyterlab</code> 

Once you are done, start JupyterLab, either from the Anaconda desktop or by typing <code>jupyter lab</code> on the command line.
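
One possible way to check that the installation worked is sketched below. It is a suggestion rather than an official step of this course: it only tests whether each package can be found by Python, using the standard-library function `importlib.util.find_spec`. The variable names are illustrative.

```python
from importlib.util import find_spec

# Packages installed by the conda command above.
packages = ["numpy", "pandas", "matplotlib", "seaborn", "jupyterlab"]

# find_spec returns None when a top-level package cannot be found.
status = {name: find_spec(name) is not None for name in packages}

for name, installed in status.items():
    print(f"{name:12s} {'OK' if installed else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the same `conda install` command.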

### Alternative installation


 <img src="https://nico.nexgate.ch/images/vsc.jpeg" width='10%' alt="vsc" />

You can also use Visual Studio Code for this course. You can download the installation file for your laptop at:

https://code.visualstudio.com/download

# First steps with Python & Jupyter Notebook 

Python was invented by <a href='https://en.wikipedia.org/wiki/Guido_van_Rossum' target='new'>Guido van Rossum</a>.

His goals for Python were:

* An easy and intuitive language just as powerful as major competitors
* Open source, so anyone can contribute to its development
* Code that is as understandable as plain English
* Suitability for everyday tasks, allowing for short development times

🍻 TRIVIA: Are you younger or older than Python?

## Jupyter Notebook

- The file you are looking at is a Jupyter Notebook.
- Jupyter notebook files have the file extension .ipynb, for instance, the file you are looking at is:
```
notebook1.1.ipynb
``` 

- A Jupyter Notebook file combines documentation (also known as Markdown or markup cells) and pieces of executable programs (also known as code cells).
- Code cells can also include comments, which start with a hash sign (#).
```
# This is a comment 
``` 



### Code cells

In [None]:
# This is a code cell. To run it, either select the menu "Run / Run Selected Cells" or use the shortcut Shift-Enter
1+2

3

🥦 <font color='red'>TODO</font>: Try various math operations in the cell below. The math operators are:
- for addition: +
- for subtraction: -
- for multiplication: *
- for division: /
- for modulo: %
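
As a starting point for this exercise, the sketch below applies each operator to the same two numbers. The values 7 and 2 and the variable names are arbitrary choices made for this illustration.

```python
a, b = 7, 2

addition = a + b        # 9
subtraction = a - b     # 5
multiplication = a * b  # 14
division = a / b        # 3.5 (in Python 3, / always returns a float)
modulo = a % b          # 1 (the remainder of 7 divided by 2)

print(addition, subtraction, multiplication, division, modulo)
```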

In Python, parentheses are used for grouping, for instance:

```Python
2 * (4 + 4)
```

will calculate (4+4) = 8

then it will multiply 2 * 8 = 16
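
To see why the parentheses matter, the short sketch below compares the same expression with and without them. The variable names are chosen for this illustration only.

```python
# Without parentheses, multiplication is evaluated before addition: 8 + 4
without_parentheses = 2 * 4 + 4

# With parentheses, the addition is evaluated first: 2 * 8
with_parentheses = 2 * (4 + 4)

print(without_parentheses)  # 12
print(with_parentheses)     # 16
```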


### Markup cells

This is a Markdown cell. It contains documentation, such as the equation below.

$ax^2+bx+c=0$

What happens if you type text in a code cell, or code in a markup cell?

In [None]:
This cell should be a markup cell. If you execute it as code, you will get an error.

SyntaxError: ignored

🥦 <font color='red'>TODO</font>: Convert the cell above into a markup (Markdown) cell using the menu (Cell/Cell Type/Markdown), then try using the Esc-M shortcut.

🌶️ <font color='red'>TODO</font>: Calculate delta

$ax^2+bx+c=0$

To solve this quadratic equation, one needs to calculate its discriminant, delta (Δ), defined as

$\Delta = b^2-4ac$

In the code cell below, calculate the delta for

$3x^2-10x+120=0$
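
As a hint, here is a minimal sketch of the same calculation for a different equation, $x^2-5x+6=0$, chosen only for illustration so that the exercise above is left for you to solve. It uses the operator `**`, which raises a number to a power.

```python
# Coefficients of x^2 - 5x + 6 = 0 (an example equation, not the exercise one)
a = 1
b = -5
c = 6

# delta = b^2 - 4ac = 25 - 24
delta = b**2 - 4 * a * c
print(delta)  # 1
```

Here delta is positive, which means the equation has two real solutions (x = 2 and x = 3).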



### 🚑 How to reset your notebook

Data analysis involves a lot of trial and error, especially as you are learning.

A notebook can quickly become messy when cells are executed in a non-sequential order.

If you find yourself in a situation where you want to reset without losing your code, you can use the menu Kernel / Restart Kernel and Clear All Outputs... and then start afresh.

🥦 <font color='red'>TODO</font>: Reset your notebook then re-run all cells up to this point (hint: run all above)

### Variables and data types

🥦 <font color='red'>TODO</font>: Try to execute the content of this notebook from the menu (Cell/Run Cells), then try to execute individual cells using Shift Enter.

In [None]:
# Examples of variables

PI = 3.14 
# notice the upper case "PI", typically used to denote constant variables, 
# i.e. variables that hold values that do not change

height = 175.3
name = "remia"
names = ["remia","adrian","rené","franziska"]
age_dict = {
  "remia": 26,
  "adrian": 42,
  "andreas": 38,
  "petya": 29,
  "joel": 44  
}

<font color='blue'>TIPS</font>: 
- Variables allow you to store data.
- Compared to other programming languages like C, C++ or Java, Python is a lot more lenient with regard to variable types. You don't need to specify the type of a variable, and you can reuse a variable to store a different data type.
- This ease of use puts the onus on you to ensure you use data types correctly.
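
The sketch below illustrates these points. The variable names `value` and `result` are made up for this example.

```python
value = 175.3
print(type(value))  # <class 'float'>

value = "remia"     # the same variable now holds a string
print(type(value))  # <class 'str'>

# Python does not check types in advance, so mistakes only show up at run time:
try:
    result = "age: " + 26  # a string and an integer cannot be added together
except TypeError as error:
    result = None
    print("TypeError:", error)
```

Converting the number first, for instance with `str(26)`, would avoid the error.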

🥦 <font color='red'>TODO</font>: 
* Insert a cell below from the menu, then using a shortcut (Ctrl-Shift '-' OR Esc-B)
* In the new cell, try the function type() on one of the variables above, e.g. type(age_dict)

<font color='blue'>TIPS</font>: 
- If you forget the parameters of a function, you can bring up its help by pressing Shift-Tab.
- If you forget the shortcut for a Jupyter command, you can find the list in the menu Help/Keyboard Shortcuts.

🥦 <font color='red'>TODO</font>: 
- In the cell below, type <i>print(</i> then hit Shift-Tab
- Then try <i>print?</i> then hit Shift-Enter 


🌶️ <font color='red'>TODO</font>: 
- Using 1 line of code, use the print function to display the following lines by Emily Dickinson:

<blockquote>
"Hope" is the thing with feathers - <br>
That perches in the soul - <br>
And sings the tune without the words - <br>
And never stops - at all - <br>
</blockquote>

Try to google "how to print a double quote in a print statement in python".

Which sites give you answers?

<font color='blue'>TIPS</font>: 
- stackoverflow.com is a community site with "validated" answers, as long as you can formulate the right question.
- don't use something you don't understand...



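In [None]:
# The string is delimited by single quotes, so the double quotes inside it need no escaping.
# Adjacent string literals inside the parentheses are joined into a single string.
print('"Hope" is the thing with feathers -\nThat perches in the soul -\n'
      'And sings the tune without the words -\nAnd never stops - at all -')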

# Importing packages in Python

A Python package is a collection of modules. Modules that are related to each other are usually grouped in the same package. When a program needs a module from an external package, that package can be imported and its modules put to use.
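
Python offers several ways to write an import. The sketch below shows three common forms using `statistics`, a module from Python's standard library. The alias `st` and the sample list are arbitrary choices made for this illustration.

```python
import statistics            # import the whole module
from statistics import mean  # import a single function from the module
import statistics as st      # import the module under a shorter alias

data = [2, 4, 6]

print(statistics.mean(data))  # 4
print(mean(data))             # 4
print(st.mean(data))          # 4
```

All three calls run the same function. Only the name used to reach it changes.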

In this course, we will mainly use the NumPy, Pandas, Matplotlib and Seaborn packages.

🍻 <font color='blue'>TRIVIA</font>: 
- There are more than 200,000 packages for Python
- Try searching for "Python packages list": what are the most used packages?

We are going to calculate the average age of the people stored in the age_dict variable.

🥦 <font color='red'>TODO</font>: 
- Insert a code cell below and type <i>age_dict</i> in it, then execute to see the content of the variable.
- Then run the following lines of code using Shift-Enter in each cell

In [None]:
# Importing from statistics module
from statistics import mean

In [None]:
mean?

In [None]:
mean([10,13,20,3,33,82,126,24,3])

In [None]:
# values() returns a view of all the values contained in a dictionary.
ages = age_dict.values() 
# round() returns a number rounded to the chosen number of decimal digits
round(mean(ages),2)
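
To check the result, the same average can be computed by hand with the built-in functions `sum()` and `len()`. This is a minimal sketch that repeats the dictionary from the earlier cell so that it can run on its own. The variable name `average` is chosen for this illustration.

```python
age_dict = {"remia": 26, "adrian": 42, "andreas": 38, "petya": 29, "joel": 44}

# Convert the values to a list so that len() can count them
ages = list(age_dict.values())   # [26, 42, 38, 29, 44]

# The mean is the sum of the values divided by their count: 179 / 5
average = sum(ages) / len(ages)
print(round(average, 2))         # 35.8
```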

### Numpy

NumPy - <i>short for Numerical Python</i> - is used for basic array processing and algebra.

<font color='blue'>TIPS</font>: 
- The standard way to import NumPy is <code>import numpy as np</code>
- This lets you call any function from the NumPy package using "np.", e.g. np.arange(...) in the example below.

🌶️ <font color='red'>TODO</font>: 
- try importing numpy without "as np"
- what do you need to change to make the code work?

In [None]:
# The standard way to import NumPy:
import numpy as np

# Create a 2-D array, set every second element in
# some rows and find max per row:

x = np.arange(15, dtype=np.int64).reshape(3, 5)
x[1:, ::2] = -99
x
# array([[  0,   1,   2,   3,   4],
#        [-99,   6, -99,   8, -99],
#        [-99,  11, -99,  13, -99]])

In [None]:
x.max(axis=1)
# array([ 4,  8, 13])

In [None]:
# Generate normally distributed random numbers:
rng = np.random.default_rng()
samples = rng.normal(size=2500)
samples

### Matplotlib and Seaborn

<font color='blue'>TIPS</font>: 
- Matplotlib and Seaborn are powerful visualization packages for Python.
- The standard way to import Seaborn is <code>import seaborn as sns</code>
- The standard way to import Matplotlib is <code>import matplotlib.pyplot as plt</code> (pyplot is a module in the matplotlib package)
- They cover all the basic chart types (line, bar, distribution, swarmplot, heatmap, histogram, scatter plots, etc.)
- They can be configured to use specific colors, fonts, custom axis and annotations

🥦 <font color='red'>TODO</font>: 
* Run the cell below to see how Python can load a dataset and produce a visualization in a few lines of code.

In [None]:
import seaborn as sns
sns.set_theme(style="whitegrid", palette="muted")
df = sns.load_dataset("penguins") # Load the penguins dataset
sns.swarmplot(data=df, x="body_mass_g", y="sex", hue="species"); # Creates a swarmplot showing the body mass of 3 species of penguins by gender

# Bang shell 



## Bang commands and magic commands

Jupyter notebooks run as a local "web site" on your computer or in a hosted development environment such as Colaboratory (Google) or Codespaces (GitHub).

It is possible to run commands to get some information about the machine you are running on.

🌶️ TODO: Try these 2 commands:

```Python
# Run a Jupyter Notebook from within another Jupyter Notebook
%run ./another_notebook.ipynb
# Show the current directory
!pwd
```
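
Similar information about the machine can also be obtained in plain Python, without bang commands. The sketch below uses only the standard-library modules `os`, `platform` and `sys`. The variable names are illustrative.

```python
import os
import platform
import sys

current_directory = os.getcwd()    # current working directory, similar to !pwd
system_name = platform.system()    # operating system, e.g. Linux, Darwin or Windows
architecture = platform.machine()  # processor architecture, e.g. x86_64 or arm64
python_version = sys.version       # version of the Python interpreter in use

print(current_directory)
print(system_name, architecture)
print(python_version)
```

Unlike `!pwd`, which relies on a shell command, this version also works when the notebook runs on Windows.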
