# Intro

The purpose of this notebook is to demonstrate some basic ideas and concepts necessary for data analysis using `python`. The objective is not to teach general programming in python! Instead, it introduces the strict minimum needed to do data analysis.

A lot of the content is inspired directly by "Learn Python in Y minutes": https://learnxinyminutes.com/docs/python/. That page is aimed at people with some experience in programming, but it contains a very good series of links at the bottom.

This document is licensed under: CC BY-SA 3.0
https://creativecommons.org/licenses/by-sa/3.0/deed.en_US

# Python, Jupyter, etc

`Python` is a general-purpose programming language which emphasizes readability. As a general-purpose language, it does not specifically target data analysis: it can just as well solve general programming problems, run servers, etc. However, its readability (and good design in general) makes it comparatively easy to learn and use, which makes it quite popular. It's used by individual programmers and small scientific teams, as well as large institutes and corporations (like Google).

There are multiple ways to run python. It can run a `.py` file directly from the command line. It can also be used inside other environments. The one we are currently using is called `Jupyter lab`. `Jupyter` lets you run python code interactively, and is displayed in your browser. The code is generally run in *notebooks* (`.ipynb` files), which is what you are currently looking at.

The notebook is divided into cells. Cells can be of 3 types: 
- code, which can be run;
- markdown, for formatted text like this;
- raw, if you're trying to write a book (we won't need this)...  

The current cell is a markdown cell.

If you double-click on the current text, the cell will change to edit mode. You can execute a cell and advance to the next one by pressing `Shift+Enter` or clicking the little "play" arrow at the top. There are other ways to do this; check the "Run" menu at the top.

For each notebook, Jupyter runs a *kernel*. The kernel executes the code sent to it, and communicates back the result. 
That lets us run code interactively. Careful when running the same code multiple times: the kernel remembers variables between runs, so a cell that modifies a variable gives a different result each time it is run! The kernel can be stopped and restarted using the "Kernel" menu or using some of the buttons. Jupyter supports many languages (over 40, as of this writing), but currently we are using python.

Below are a few examples of code cells. You can try them out: click in a cell, and press `Shift+Enter`.

In [109]:
1 + 1

2

By default only the value of the last line gets shown to us (otherwise there would usually be waaay too much), but the whole cell gets executed.

In [56]:
# this is a `comment`, it starts with a single `#`. It is ignored, it's just for humans to read.
a = 10 # comments run until the end of a line
b = a/2  
b+3  

8.0

If we want to display multiple values, we can use the `print` function.

In [58]:
a = 10
print("a is currently", a)
# this updates the value of `a`. The right-hand side is evaluated first, and the result is stored in `a`.
a = a / 2
print("a is now", a)
a+3

a is currently 10
a is now 5.0


8.0
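
The warning about running the same code multiple times can be illustrated with a small sketch. It assumes a hypothetical pair of cells: one that sets `a`, and one that halves it. Because the kernel keeps the value of `a` between runs, running the second cell twice halves the value twice.

```python
a = 10      # first cell, run once
a = a / 2   # second cell, first run: a is now 5.0
a = a / 2   # second cell, run again by accident: a is now 2.5
print("a is now", a)
```

Restarting the kernel and running the cells from the top, in order, gets back to a known state.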

# Variables and basic data types

Data has to be stored in variables. Variables are names by which we can refer to data. A name must be a single word without spaces or special characters: only letters, numbers and the underscore are allowed, and the name must not start with a number. Apart from this, `python` keeps track of the type and value of variables on its own.

## Integers
Integers (or `int`) are... integer numbers, including negative ones. They are most often used when counting things.

In [111]:
1

1

In [112]:
type(1)  # we can get the type of an object using the function called `type`

int

In [113]:
some_useful_name = 1  # we are storing the value `1` in the variable `some_useful_name`
type(some_useful_name)

int

In [59]:
# maths behave as you would expect, most of the time...
print(1+1)
print(5-8)

2
-3


In [60]:
# multiplication is done with `*`, division with `/`
print(10*2)
print(10 / 2) # results in 5.0, which is a float, not integer

20
5.0
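
Since `/` always gives a float, it can be useful to know that python also has an integer division operator `//` and a remainder operator `%`. These are not used elsewhere in this notebook; the short sketch below only shows how they differ from `/`.

```python
print(7 / 2)    # 3.5, a float
print(7 // 2)   # 3, integer division drops the fractional part
print(7 % 2)    # 1, the remainder of the division
print(10 // 2)  # 5, an int (compare with 10 / 2, which gives 5.0)
```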


In [61]:
# exponent using `**`
10 ** 2 

100

In [62]:
# parentheses enforce precedence
(1+3)*2

8

In [63]:
# to improve readability, you can include `_`
10_000

10000

## Sidenote 1: When errors happen

Errors in python are called `Exceptions`. When an error happens, an exception is raised, and an error message is printed. Error messages try to be informative. The complete error message is called a `Traceback`: it displays the type of exception, the line where it happened and complementary information. Usually, there is a human-readable error message. 

When the exception occurs somewhere else, such as inside a function, the traceback contains multiple entries showing us every step along the way. We'll see this later.

It takes a bit of practice to learn to read `Tracebacks`, but they are usually quite helpful. 

In [72]:
# here is a simple division by 0 error.
a = 0
5/a

ZeroDivisionError: division by zero

In [73]:
# the variable below does not exist
yqwunave

NameError: name 'yqwunave' is not defined
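
The last line of a traceback gives the exception type followed by its message. As a hedged sketch, the same two pieces can be picked apart with python's `try`/`except` statement, which this notebook does not otherwise cover. The names `numerator` and `denominator` are made up for the illustration.

```python
# Hypothetical values, chosen only to trigger the error.
numerator = 5
denominator = 0

try:
    result = numerator / denominator
except ZeroDivisionError as error:
    # `error` is the exception object: its type name and its message are
    # the two parts shown on the last line of a traceback.
    error_name = type(error).__name__
    error_message = str(error)
    print(error_name, "->", error_message)
```

Unlike the cells above, this cell does not stop with a traceback, because the exception is caught by the `except` block.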

## Floats
Floats (type `float`) represent real numbers, stored with finite precision. They can also be negative, and maths behave... as you would expect.

In [67]:
type(1.0) # note it's different from type(1), which is int

float

In [68]:
# maths still behave as before
print(1.0 + 2.5)
print(10.3 - 21.5)
print(2*3.1416)
print(25.0 ** (0.5))

3.5
-11.2
6.2832
5.0


In [69]:
# you can use scientific notation
1.6E-6 + 2E-8

1.62e-06

In [70]:
# mixing ints and floats will generally result in what you'd expect
1 + 3.52

4.52
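
Floats are stored with finite precision, so maths occasionally behave slightly differently from what you'd expect. The classic example is sketched below, together with `math.isclose`, which compares two floats up to a small tolerance instead of exactly.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: the two values differ in the last digits
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
```

For this reason, comparing floats with `==` is usually best avoided.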

## Sidenote 2: importing things
Python is a general-purpose programming language. As such, it is used in many places, such as data analysis (for us), web applications, servers, etc. All these tasks require specialized tools, which are collected in `modules`. These modules are not loaded when we start python: we need to load them every time we start a new instance of `python`.

Modules are loaded using the `import` statement. When using `import`, we can give an alias to the `module` (to save some typing), or load specific functions. Normally, the import statements are collected at the start of the file.

As an example (which we will need below), let's use the square-root function `sqrt`. It is available in the module called `math`, which also contains other similar functions such as logarithm, cosine, etc. There is an equivalent module for use with complex numbers, called `cmath`.



In [74]:
# simplest way to import. We can access the content of math as `math.function_name`.
import math 
math.sqrt(25)

5.0

In [112]:
# If we think the module name is too long, and we want to give it a shorter name:
import math as m
print(m.sqrt(25))
print(m.cos(m.pi))  # cos pi

5.0
-1.0


In [113]:
# we can also import specific content, as follows:
from math import sqrt
sqrt(25)

5.0

Many modules come with python. You can write your own. Many more are available online; install them only from trusted sources. Modules in widespread use are generally safe, but a module can do almost anything once it is imported.

## Complex
Python has complex numbers built in. The imaginary unit is written `j`, as in engineering work (`i` is used a lot to designate the index in a sum or vector).

In [85]:
a = 3 - 0.5j
a, type(a)

((3-0.5j), complex)

We now take $\sqrt{-25}$, which should be $5\mathrm{i}$. Note the output has a very small real part. This is a numerical error: it's a consequence of approximating numbers using finite precision.

In [95]:
(-25.0)**(0.5)

(3.061616997868383e-16+5j)
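
For comparison, here is a short sketch using the `cmath` module mentioned in the sidenote on importing. Its `sqrt` function accepts negative numbers, and for this input it returns the root without the small spurious real part. `cmath.isclose` then confirms that the two results agree within a small tolerance.

```python
import cmath

root = cmath.sqrt(-25)
print(root)                          # 5j

approx = (-25.0) ** 0.5              # the result with the tiny real part
print(cmath.isclose(approx, root))   # True: equal within a small tolerance
```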

## Booleans
Booleans (type `bool`) are logical values: `True` or `False` to indicate conditions. They can be combined with logical operators (`and`, `or`, etc.).

In [88]:
True, type(True)

(True, bool)

In [89]:
print(True and False)
print(True or False)

False
True


In [90]:
not True

False

In [116]:
# we can compare numbers
print(1 < 5.0)

True


In [119]:
# `==` means equality (whereas `=` means assignment.)
1 == 1

True

In [118]:
# note the difference between `=` and `==`
a = 4  # set variable a to 4; 4 = a is not allowed!
a == 3  # check if a == 3; 3 == a is the same thing!

False

In [103]:
print("2!=1    ", 2 != 1)  # inequality
print("1 < 10  ", 1 < 10)  # less than
print("11 > 10 ", 11 > 10) # greater than
print("10 > 10 ", 10 > 10)
print("10 >= 10", 10 >= 10)  # greater or equal
print("10 <= 10", 10 <= 10) # less or equal

2!=1     True
1 < 10   True
11 > 10  True
10 > 10  False
10 >= 10 True
10 <= 10 True


In [121]:
# we can chain
print("1 < 2 < 3", 1 < 2 < 3)
a = 1.2
b = 8.5
c = 10.6
print("a < b < c", a < b < c)

1 < 2 < 3 True
a < b < c True
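
A chained comparison is shorthand for combining the individual comparisons with `and`. A minimal sketch, reusing the values from the cell above:

```python
a = 1.2
b = 8.5
c = 10.6

chained = a < b < c
spelled_out = (a < b) and (b < c)
print(chained, spelled_out)   # True True

# the chain is False as soon as one of the comparisons fails
print(c < b < a)              # False
```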


## Strings
Strings are used to store text, for example file names. Storing special characters is possible, but more complicated. They are created by surrounding the contents with `"` or `'`.

In [122]:
"my_very_important_file.txt", type("my_very_important_file.txt")

('my_very_important_file.txt', str)

In [123]:
# can be joined together (concatenated) using `+`
name = "data_file"
extension = ".txt"
name + extension

'data_file.txt'

In [135]:
# note that numbers stored as text count as strings, not numbers.
type("1"), type("1.5"), "1" + "1.5"

(str, str, '11.5')

In [136]:
# we can convert strings to numbers
int("1") + float("1.5")

2.5

In [137]:
# and vice versa
str(1.0), type(str(1.0))

('1.0', str)
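
Combining the two ideas above (concatenation and conversion), here is a sketch of building a file name that contains a number. The variable `run_number` and its value are made up for the example.

```python
name = "data_file"
extension = ".txt"
run_number = 7  # hypothetical value, an int

# the number has to be converted with `str` before it can be joined to text
file_name = name + "_" + str(run_number) + extension
print(file_name)  # data_file_7.txt
```

Leaving out the `str(...)` conversion raises a `TypeError`, because python refuses to add a string and an integer.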

# Collections

Collections are data types which contain multiple things, for example a list of numbers or a list of file names. There are many types of collections. We will introduce three: `strings`, `lists` and `numpy arrays`.

We'll demonstrate the basic ideas using strings first.
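
As a first, hedged sketch of what "a string is a collection" means: a string contains characters, so we can count them with the built-in `len`, test whether some text is contained in it with `in`, and split it into a list of characters with `list`. The file name is just an example value.

```python
file_name = "data_file.txt"

print(len(file_name))        # 13, the number of characters
print(".txt" in file_name)   # True: the string contains ".txt"
print(list("abc"))           # ['a', 'b', 'c']
```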

## Indexing

Collections contain multiple other items. We can obtain the number of times 

In [148]:
# Collections and numpy arrays
## basic imports

## basic arrays

## array creation

## indexing



In [150]:
# functions

In [149]:
# errors

In [151]:
# basic of OO: attribute access