# Introduction to Python and Jupyter notebooks

**Date: 06/02/2020**

**Author: [Javier Sánchez](https://github.com/fjaviersanchez)** 

**Goals: Understand basic Python functionality and introduce Jupyter notebooks**

**Resources**: More python tutorials [here](https://www.w3schools.com/python/default.asp)

## Python

`Python` (named as a tribute to Monty Python) is a high-level, interpreted programming language that first appeared in 1991.

Interpreted: Instructions are executed directly by an interpreter, without a separate compilation step (generally easier to use).

Compiled: Instructions are converted to machine code at compilation time (generally more complex to use but faster in execution, and the resulting binaries are generally platform dependent).

In [1]:
# This is a typical Python program

# Everything after a # symbol is interpreted as a comment, i.e., it is not executed

var_name = 'Hello world!'
print(var_name)

Hello world!


### 0. Jupyter notebooks

[Jupyter](http://jupyter.org) creates an easy-to-read document that you can view in your web browser, combining code (that runs and creates plots inside the document on the fly!) and text (even with math). The name "Jupyter" is a combination of Julia, Python, and R. However, it supports over 40 programming languages. Jupyter grew out of IPython notebooks, and on older installations you can still launch it by typing ```ipython notebook``` in your terminal.

The concept is similar to Mathematica notebooks and it works similarly: to run a **code cell**, press ```shift+enter```.

You can launch a Jupyter notebook by typing ```jupyter notebook``` in your terminal; this will open a new tab or window in your default browser. You can also select a different browser by setting the environment variable ```$BROWSER``` to the path of the browser that you want to use before launching, or by using the ```--browser``` option on the command line. In Windows, under "Search programs and files" in the Start menu, type ```jupyter notebook``` and select "Jupyter notebook."

A Jupyter notebook is internally a [JSON document](http://json.org) but appears as a collection of "cells". Each segment of this document is called a cell. There are several types of cells but we are mainly interested in two:

**Markdown cells**: Used for explanatory text (like this), and written in [GitHub-flavored markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). A markdown cell is usually displayed in output format, but a double click will switch it to input mode.  **Try that now on this cell.**  Use SHIFT+RETURN to toggle back to output format.  Markdown cells can contain LaTeX math, for example:
$$
\frac{d\log G(z)}{d\log a} \simeq \left[ \frac{\Omega_m (1 + z)^3}{\Omega_m (1 + z)^3 + \Omega_\Lambda} \right]^{0.55}
$$

**Code cells**: Contain executable source code in the language of the document’s associated kernel (usually python).  Use SHIFT+RETURN to execute the code in a cell and see its output directly below.  **Try that now for the code cell below**.  Note that the output is not editable and that each code cell has an associated label, e.g. `[3]`, where the number records the order in which cells are executed (which is arbitrary since it depends on you).  **Re-run the code cell below and note that its number increases each time.**

In [64]:
print('Hello')

Hello


### 1. Basic concepts

`variables`: A variable is a container for storing data values. A value is assigned to a variable with the assignment operator `=`.

In [2]:
x = 5
name = 'Jane'
print(x, name)

5 Jane


Variables do not need to be declared and their type is not fixed: the same name can later hold a value of a different type.

In [3]:
x = 5
print(x)
x= 'Jane'
print(x)

5
Jane
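
One way to see that the type belongs to the value rather than to the name is the built-in `type()`. The short sketch below reuses the name `x` from the cell above; the helper names `type_before` and `type_after` are only illustrative.

```python
# The same name can be rebound to values of different types.
x = 5
type_before = type(x)   # the int class
x = 'Jane'
type_after = type(x)    # the str class

print(type_before.__name__, type_after.__name__)
```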


String variables can be assigned using single or double quotes:

In [4]:
x = 'Jane'
print(x)
x = "Jane"
print(x)

Jane
Jane


`lists`: A list is an ordered, changeable collection of values (you can think of it as a vector or an array). Lists are written with square brackets `[]`.

In [5]:
mylist = ['a', 'b', 'c']
print(mylist)

['a', 'b', 'c']


To access an item in a list you index the list with the position of the element that you want to obtain. You can also select a range of elements by "slicing" the list with the notation `start:stop` (the element at `stop` is not included). Slicing returns a new list.

In [6]:
mylist = ['a', 'b', 'c']
print(mylist[0]) # This returns the first element of the list (indices start at 0 in Python)
print(mylist[0:2]) # This returns a list with the first and second elements.

a
['a', 'b']
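
Indexing and slicing have a few more forms that are worth knowing. The sketch below uses an illustrative list called `numbers` (not part of the original notebook) to show negative indices, a step value, and in-place changes.

```python
numbers = [10, 20, 30, 40]   # illustrative list

last = numbers[-1]           # negative indices count from the end
middle = numbers[1:3]        # the stop index is excluded
every_other = numbers[::2]   # an optional third value sets the step

numbers[0] = 0               # lists are changeable: assignment by index works
numbers.append(50)           # and they can grow

print(last, middle, every_other, numbers)
```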


**EXERCISE** Create a list with 5 elements.

`tuple`: A tuple is an ordered collection of values whose elements are **unchangeable**. Elements or parts of a tuple are accessed in the same way as in a list.

In [7]:
x = ("a", "b", "c")

In [8]:
x[1] = "d" # This will raise an error!

TypeError: 'tuple' object does not support item assignment

In [9]:
y = list(x)
y[1] = "d"
x = tuple(y)
print(x)

('a', 'd', 'c')


`sets`: Sets are unordered, unindexed collections of unique values.

In [10]:
x = {'a', 'b', 'c'}
for y in x:
    print(y)

c
b
a


Once created, you cannot change the items in a set in place, but you can add new items using `add()` (one element at a time) or `update()` (several elements at once).

In [11]:
x.add('d')
print(x)
x.update(['e','f','g'])
print(x)

{'c', 'b', 'a', 'd'}
{'f', 'b', 'a', 'd', 'e', 'c', 'g'}
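
Because a set stores each value only once and has no positions, it is mostly used for membership tests and set algebra. A minimal sketch, with an illustrative set called `letters`:

```python
letters = {'a', 'b', 'b', 'c'}       # the duplicate 'b' is stored only once
n_items = len(letters)

has_a = 'a' in letters               # membership test
common = letters & {'b', 'c', 'd'}   # intersection
combined = letters | {'d'}           # union

# sorted() is used only to print in a reproducible order
print(n_items, has_a, sorted(common), sorted(combined))
```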


`dictionary`: A dictionary is a collection of items indexed by `keys`, each of which maps to a `value`. Since Python 3.7, dictionaries preserve the order in which the items were inserted.

In [12]:
x = {'a': 0, 'b': 'Hola', 'c': 'Gato', 'd':[0,1,2,3]}
print(x)

{'a': 0, 'b': 'Hola', 'c': 'Gato', 'd': [0, 1, 2, 3]}


You can get elements in a dictionary either by indexing with the name of the `key` in square brackets:

In [13]:
print(x['a'])
print(x['d'])

0
[0, 1, 2, 3]


Or using the `get` method:

In [14]:
print(x.get('c'))

Gato
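
The two access styles differ when the key does not exist: square brackets raise a `KeyError`, while `get` returns `None` or a default that you provide. A small sketch with a stand-in dictionary called `pets`:

```python
pets = {'c': 'Gato'}                  # small stand-in dictionary

value = pets.get('c')                 # existing key: returns its value
missing = pets.get('zz')              # missing key: returns None, no error
fallback = pets.get('zz', 'unknown')  # missing key with an explicit default

try:
    pets['zz']                        # square brackets raise for missing keys
    raised = False
except KeyError:
    raised = True

print(value, missing, fallback, raised)
```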


You can change the values of a key or add new keys:

In [15]:
x['gg'] = 'Hello'
print(x)

{'a': 0, 'b': 'Hola', 'c': 'Gato', 'd': [0, 1, 2, 3], 'gg': 'Hello'}


In [16]:
x['b'] = 'Hasta luego'
print(x)

{'a': 0, 'b': 'Hasta luego', 'c': 'Gato', 'd': [0, 1, 2, 3], 'gg': 'Hello'}


You can iterate the keys of the dictionary (see `for` loops here: https://www.w3schools.com/python/python_for_loops.asp)

In [17]:
for key in x.keys():
    print(key)

a
b
c
d
gg


Or the values of a dictionary

In [18]:
for val in x.values():
    print(val)

0
Hasta luego
Gato
[0, 1, 2, 3]
Hello


Or both!

In [19]:
for key, val in x.items():
    print(key, val)

a 0
b Hasta luego
c Gato
d [0, 1, 2, 3]
gg Hello


To remove an item from a dictionary you can use `pop`, or `del`

In [20]:
x.pop('a')
print(x)

{'b': 'Hasta luego', 'c': 'Gato', 'd': [0, 1, 2, 3], 'gg': 'Hello'}


In [21]:
del x['gg']

In [22]:
print(x)

{'b': 'Hasta luego', 'c': 'Gato', 'd': [0, 1, 2, 3]}


You can even nest dictionaries:

In [23]:
#Another way to declare a dictionary
y = dict()
y['a'] = x
y['b'] = dict(a=0, b=[0,1,2], c='Hallo')
print(y)

{'a': {'b': 'Hasta luego', 'c': 'Gato', 'd': [0, 1, 2, 3]}, 'b': {'a': 0, 'b': [0, 1, 2], 'c': 'Hallo'}}


In [24]:
print(y['a']['b'])

Hasta luego


**EXERCISE** How would you access the element with the key `c` in the dictionary `b` from `y`?

### 2. Functions

A function is a block of code that only runs when called. You can pass data, known as parameters (or arguments), to the function and operate on them. Functions are declared using `def`.

In [25]:
def function_sum(arg1, arg2):
    return arg1+arg2

In [26]:
print(function_sum(1,2))

3


It is usually good practice to define functions for pieces of code that are used frequently, for example when you repeat the same operation on different datasets. This reduces the risk of introducing bugs by copy/pasting code and renaming variables by hand.

In [27]:
data_a = [1, 2, 3]
data_b = [3, 4, 5]
data_c = [4, 5, 6]
for i in range(len(data_a)):
    print('Element', i)
    print('----')
    print('data_a + data_b:', function_sum(data_a[i], data_b[i]))
    print('data_b + data_c:', function_sum(data_b[i], data_c[i]))
    print('----')

Element 0
----
data_a + data_b: 4
data_b + data_c: 7
----
Element 1
----
data_a + data_b: 6
data_b + data_c: 9
----
Element 2
----
data_a + data_b: 8
data_b + data_c: 11
----


`keyword arguments`: The order of the arguments in a function call matters, but you can pass them in any order by naming them with keywords. E.g.,

In [28]:
def function_sub(arg1, arg2):
    return arg1-arg2

In [29]:
print(function_sub(1, 2))

-1


In [30]:
print(function_sub(arg2=2, arg1=1))

-1


Keyword arguments map names to values, just like a dictionary, so you can unpack a dictionary into the arguments of a function using `**`:

In [31]:
dict_args = {'arg1':1, 'arg2': 2}
print(function_sub(**dict_args))

-1


`**kws`: This lets you define functions that accept an arbitrary number of keyword arguments, which the function receives as a dictionary:

In [32]:
def function_sub2(arg1, arg2, **kws):
    if 'arg3' in kws.keys():
        arg3 = kws['arg3']
        return function_sub(arg1, arg2)-function_sub(arg1, arg3)
    else:
        print(kws.keys())
        return function_sub(arg1, arg2)

In [33]:
dict_args['arg3']=2
print(function_sub2(**dict_args))

0


`*args`: These are arbitrary arguments that will be passed as a tuple to your function:

In [34]:
def my_function(*args):
    print('The best food is:', args[2])

In [35]:
my_function('Burger', 'Palak Paneer', 'Pizza', 'Instant Ramen')

The best food is: Pizza
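
Since `*args` arrives as a tuple and `**kws` as a dictionary, a function can loop over whatever it receives. The functions `total` and `describe` below are hypothetical examples written for this sketch.

```python
def total(*args):
    """Add up any number of positional arguments (received as a tuple)."""
    result = 0
    for value in args:
        result += value
    return result

def describe(*args, **kws):
    """Report how many positional arguments and which keywords were passed."""
    return len(args), sorted(kws)

sum_of_three = total(1, 2, 3)
numbers = [4, 5, 6, 7]
sum_of_list = total(*numbers)   # * unpacks a list into positional arguments
n_positional, keywords = describe('a', 'b', unit='kg', label='mass')

print(sum_of_three, sum_of_list, n_positional, keywords)
```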


**EXERCISE** Define your own function to multiply 4 arguments

Default arguments: These are arguments that take a preset value when they are not passed to the function.

In [36]:
def my_function(arg1, arg2='Pizza'):
    print(arg1, 'with', arg2)

In [37]:
my_function('Chicken wings')

Chicken wings with Pizza


In [38]:
my_function('Chicken wings', 'Ranch')

Chicken wings with Ranch


### 3. Classes/Objects

Python is an object-oriented programming language. A `class` is an object constructor, or the blueprint for an object.

In [39]:
class MyClass():
    myprop = 3

In [40]:
c1 = MyClass() # We get an object of the class

In [41]:
c1.myprop # Classes have properties

3

`__init__`: The `__init__` method works to initialize a class with parameters:

In [42]:
class Car():
    def __init__(self, make, model, year, **kws):
        self.make = make
        self.model= model
        self.year = year
        if 'color' in kws.keys():
            self.color = kws['color']
        if 'engine' in kws.keys():
            self.engine = kws['engine']
        if 'transmission' in kws.keys():
            self.transmission = kws['transmission']

In [43]:
car1 = Car('Ssang Yong', 'Rexton', '2002', **{'color': 'Black Mica'})

In [44]:
car1.color

'Black Mica'

The `__init__` method is called every time we create an object of the class.

You can also define other methods in your class

In [45]:
class Car():
    def __init__(self, make, model, year, **kws):
        self.make = make
        self.model= model
        self.year = year
        if 'color' in kws.keys():
            self.color = kws['color']
        if 'engine' in kws.keys():
            self.engine = kws['engine']
        if 'transmission' in kws.keys():
            self.transmission = kws['transmission']
    def set_color(self, color):
        self.color = color
    def set_transmission(self, transmission):
        self.transmission = transmission
    def set_engine(self, engine):
        self.engine = engine

In [46]:
car2 = Car('Renault', 'Clio', '2002')

In [47]:
car2.color # This will fail because we haven't declared a color

AttributeError: 'Car' object has no attribute 'color'

In [48]:
car2.set_color('Torch Red')

In [49]:
car2.color

'Torch Red'

**EXERCISE** Create a class (tip: you can for example think of what attributes are useful to describe cats). 

### 3.1 Class inheritance

You can create classes that "inherit" all properties and methods from a different class and extend that class. E.g.,

In [50]:
class SUV(Car):
    def __init__(self, Car, seat_number):
        self.seat_number = seat_number
        self.vehicle_type = 'SUV'

In [51]:
suv1 = SUV(car1, 5)

In [52]:
suv1.seat_number

5

In [53]:
class SUV(Car):
    def __init__(self, make, model, year, **kws):
        Car.__init__(self, make, model, year, **kws)

In [54]:
suv1 = SUV(car1.make, car1.model, car1.year)

In [55]:
suv1.make

'Ssang Yong'

We can also call the parent's `__init__` through `super()` to inherit all of the parent's properties, and then add new ones:

In [56]:
class SUV(Car):
    def __init__(self, make, model, year, seat_number, **kws):
        super().__init__(make, model, year, **kws)
        self.seat_number = seat_number
        self.vehicle_type = 'SUV'

In [57]:
suv1 = SUV(car1.make, car1.model, car1.year, 5)

In [58]:
suv1.make

'Ssang Yong'
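
Inheritance also gives the child class the parent's methods, and objects of the child class count as instances of the parent. The classes `Instrument` and `Guitar` below are hypothetical and only serve to illustrate this.

```python
class Instrument:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return 'This is a ' + self.name


class Guitar(Instrument):
    def __init__(self, n_strings):
        super().__init__('guitar')   # run the parent's initializer first
        self.n_strings = n_strings   # then add the child's own property


g = Guitar(6)
description = g.describe()                 # method inherited from Instrument
is_instrument = isinstance(g, Instrument)  # a Guitar is also an Instrument

print(description, g.n_strings, is_instrument)
```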

### 4. Python modules

One of the strengths of Python is the availability of community-developed code that you can reuse. This code is usually distributed as modules. A module can contain any type of variable, class or function. You can import a module after installing it, if it is included in your `PYTHONPATH` environment variable, or if it lives in the same directory as your code.

In [59]:
import mymodule

In [60]:
mymodule.car_dict

{'Make': 'Daewoo', 'Model': 'Matiz', 'Year': '1998'}

In [61]:
mymodule.my_function('Dog', 'Cat')

Perro Gato Dog Cat


`import` brings the module into the Python namespace, so that everything defined in the module can be accessed through the module name. The most typical modules to import in our field are `numpy` and `matplotlib`.

You can rename the imported module using the keyword `as`:

In [63]:
import numpy as np

Now `numpy` (numerical Python) is imported into our namespace with the name `np`.
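
The different import styles can be compared side by side. The sketch below uses the standard-library `math` module so that it runs without any installation; the alias `m` is an arbitrary choice.

```python
import math                 # access names as math.sqrt, math.pi
import math as m            # the same module under a shorter alias
from math import sqrt, pi   # bring selected names directly into the namespace

same_module = m is math     # both names point to the same module object
root = sqrt(16.0)
area = pi * 2.0 ** 2        # area of a circle of radius 2

print(same_module, root, area)
```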

### 5. Boilerplate and Jupyter magic functions

To start a notebook it is a good practice to import all the packages and define the styles that we want to use in our "boilerplate". A good starting point is:

```
import numpy as np
import matplotlib.pyplot as plt
```

With these commands we set up our notebook to use the numpy package and the matplotlib package. If we use them like that, the plots may pop up in a new window instead of being shown in the notebook, depending on the configuration. To see them in the notebook we should use a "magic function".

There are two kinds of magics, line-oriented and cell-oriented. Line magics are prefixed with the % character and work much like OS command-line calls: they get as an argument the rest of the line, where arguments are passed without parentheses or quotes. Cell magics are prefixed with a double %%, and they are functions that get as an argument not only the rest of the line, but also the lines below it in a separate argument. A useful example is:

In [65]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


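The magic ```%pylab``` sets up the interactive namespace from numpy and matplotlib and ```inline``` adds the plots to the notebook. These plots are rendered in PNG format by default. If you only want inline plots without importing all those names, ```%matplotlib inline``` does that.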

More useful magic commands:

* `%time` and `%timeit` measure execution time.
* `%run` runs a Python script and loads all its data on the interactive namespace.
* `%config InlineBackend.figure_formats = {'png', 'retina'}` enables high-resolution PNG rendering. Changing `'png'` to `'svg'` or another supported format changes the format of the plots rendered within the notebook.

The magic `%load` is really useful since it loads the contents of any other Python script into a cell, where we can then modify the code inside the notebook. Its option ```-s``` restricts the load to the named functions or classes of the script.
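
Magics only exist inside IPython/Jupyter. In a plain Python script, a rough equivalent of `%timeit` can be sketched with the standard-library `timeit` module; the function `sum_of_squares` is a hypothetical workload chosen for illustration.

```python
import timeit

def sum_of_squares(n):
    """Hypothetical workload: sum of i*i for i from 0 to n-1."""
    return sum(i * i for i in range(n))

n_calls = 200
# timeit.timeit returns the total time in seconds for `number` calls
elapsed = timeit.timeit(lambda: sum_of_squares(1000), number=n_calls)
per_call = elapsed / n_calls

print(f'{per_call * 1e6:.1f} microseconds per call')
```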

Command line magic: You can run any system shell command using ```!``` before it. Example:

In [66]:
!ls

Intro_to_Python.ipynb corr_func_intro.ipynb
__pycache__           mymodule.py


Advanced magic commands:

 * ```%load_ext Cython```
 * ```%cython``` or ```%%cython```

More on "magics": 
 * https://ipython.org/ipython-doc/3/interactive/magics.html
 * https://ipython.org/ipython-doc/3/interactive/tutorial.html

In [86]:
#? gives you the help of a function, you can also use shift+tab or help(function)
?np.linspace

Signature:
np.linspace(
    start,
    stop,
    num=50,
    endpoint=True,
    retstep=False,
    dtype=None,
    axis=0,
)
Docstring:
Return evenly spaced numbers over a specified interval.

Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].

The endpoint of the interval can optionally be excluded.

.. versionchanged:: 1.16.0
    Non-scalar `start` and `stop` are now supported.

Parameters
----------
start : array_like
    The starting value o

In [85]:
help(np.linspace)

Help on function linspace in module numpy:

linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
    Return evenly spaced numbers over a specified interval.
    
    Returns `num` evenly spaced samples, calculated over the
    interval [`start`, `stop`].
    
    The endpoint of the interval can optionally be excluded.
    
    .. versionchanged:: 1.16.0
        Non-scalar `start` and `stop` are now supported.
    
    Parameters
    ----------
    start : array_like
        The starting value of the sequence.
    stop : array_like
        The end value of the sequence, unless `endpoint` is set to False.
        In that case, the sequence consists of all but the last of ``num + 1``
        evenly spaced samples, so that `stop` is excluded.  Note that the step
        size changes when `endpoint` is False.
    num : int, optional
        Number of samples to generate. Default is 50. Must be non-negative.
    endpoint : bool, optional
        If True, `stop` is

### 6. Numpy

Numpy is a Python package designed for scientific computing that implements N-dimensional array objects. It also implements a multitude of mathematical functions to operate efficiently on these arrays. Using numpy arrays instead of explicit Python loops can significantly boost the performance of your script, in many cases making it comparable to compiled C code. Some useful examples and tutorials can be found [here](http://www.scipy-lectures.org/intro/numpy/index.html) and [here](http://cs231n.github.io/python-numpy-tutorial/).

In [68]:
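# Example of how to compute the sum of all the elements of two lists
def add(x, y):
    total = 0  # running total (named so that it does not shadow the function name)
    for element_x in x:
        total = total + element_x
    for element_y in y:
        total = total + element_y
    return total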

In [74]:
mylist = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 20, 30]

In [75]:
add(mylist, mylist)

298

In [76]:
%timeit -n 100 add(mylist, mylist)

968 ns ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [78]:
mylist = np.array(mylist)

In [80]:
%timeit -n 100 mylist+mylist

673 ns ± 143 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
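
Note that the two timed operations are not identical: `add` returns a single number, while `+` on numpy arrays adds element by element and returns an array. On plain lists `+` means concatenation. The sketch below makes the difference explicit with a small illustrative list called `values`.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

concatenated = values + values   # lists: + joins the two lists
elementwise = arr + arr          # arrays: + adds element by element
grand_total = arr.sum() + arr.sum()   # the scalar that add(values, values) computes

print(concatenated, elementwise, grand_total)
```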


**EXERCISE** Generate a list of 1000 elements and see if that increases the difference in runtime (tip: use `range` or `np.linspace`, `np.arange` to create large lists)

### 6.1 Array slicing

Numpy arrays behave like enhanced lists: you can slice them as regular lists, or index them with an integer or boolean numpy array.

In [87]:
x = np.random.random(1000)

In [91]:
x[0:10], x[11] # You can slice and index as with a regular list

(array([0.32998626, 0.79369355, 0.17060453, 0.02611643, 0.28948341,
        0.79269949, 0.13619907, 0.14703369, 0.15897671, 0.34238208]),
 0.7690386601197027)

In [90]:
x[np.array([1, 2, 4])] # With a numpy array of integers (a plain list of integers, x[[1, 2, 4]], also works)

array([0.79369355, 0.17060453, 0.28948341])

In [97]:
mask = (x > 0.5) & (x < 0.55) # With a numpy array of booleans (or condition mask)
print(mask[::30])

[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False  True False False False False False False]


In [98]:
print(x[mask])

[0.52032381 0.50003334 0.50749048 0.52941119 0.53780466 0.53548056
 0.54437741 0.50127326 0.51193447 0.5316998  0.50109129 0.54599886
 0.51547255 0.54587525 0.50816893 0.54435496 0.51430768 0.52809975
 0.51495283 0.51518938 0.50682578 0.51777326 0.52535138 0.5034982
 0.54104998 0.51467893 0.51326945 0.52777517 0.54404269 0.51787682
 0.51891459 0.52382869 0.50245173 0.52901131 0.52580377 0.53739141
 0.54803414 0.5115168  0.50995149 0.54204919 0.50312754 0.50253765
 0.53969258 0.50689089 0.54668916 0.54204508 0.53251366 0.54241345
 0.50821401 0.54408969 0.50984535 0.54268323 0.52189675 0.53371867
 0.50490789]
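
Because the random array above changes on every run, here is the same masking idea on a small fixed array, as a sketch with made-up values. It also shows that a mask can be counted and inverted.

```python
import numpy as np

x = np.array([0.10, 0.52, 0.70, 0.54, 0.49])   # made-up values

mask = (x > 0.5) & (x < 0.55)   # element-wise condition, one boolean per entry
selected = x[mask]              # keeps only the entries where mask is True
count = mask.sum()              # True counts as 1, so this counts the matches
outside = x[~mask]              # ~ inverts the mask

print(mask, selected, count, outside)
```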


More information [here](https://www.w3schools.com/python/numpy_array_slicing.asp)

### 6.2 Multidimensional arrays and broadcasting

Up until now we have only seen one-dimensional numpy arrays, but they can also be multidimensional (matrices/tensors).

In [101]:
A = np.array([[0,1],[1,0]])
print(A.shape)
print(A)

(2, 2)
[[0 1]
 [1 0]]


Broadcasting allows us to multiply a matrix by a scalar:

In [102]:
print(A*3)

[[0 3]
 [3 0]]


We can also multiply by a vector (note that the multiplication is done "element by element", with the vector applied to every row):

In [103]:
print(A*np.array([1,0]))

[[0 0]
 [1 0]]


In [106]:
print(np.linalg.multi_dot([A,[1,0]])) # This does the proper matrix times vector product

[0 1]
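
For a single product, the `@` operator (or `np.dot`) is a more common way to write the matrix-vector multiplication. The sketch below repeats the matrix `A` from above and contrasts it with the element-wise `*`; the name `v` is introduced here for the vector.

```python
import numpy as np

A = np.array([[0, 1], [1, 0]])
v = np.array([1, 0])

elementwise = A * v    # broadcasting: every row of A is multiplied by v
matvec = A @ v         # matrix-vector product
same = np.dot(A, v)    # equivalent function form

print(elementwise)
print(matvec, same)
```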


### 6.3 Array reshape

Sometimes it is convenient to change the shape of an array, e.g., to make element by element operations or for optimization purposes.

In [109]:
B = A.reshape((-1,4)) # -1 acts as a wildcard: numpy infers that dimension from the array size and the other dimensions

In [110]:
B.shape

(1, 4)

In [111]:
C = A.reshape(4)

In [112]:
C.shape

(4,)
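
The effect of `-1` is easier to see on a slightly larger array. A minimal sketch with an illustrative 12-element array:

```python
import numpy as np

a = np.arange(12)             # 0, 1, ..., 11

grid = a.reshape((3, 4))      # 3 rows, 4 columns
auto = a.reshape((-1, 6))     # -1 is inferred as 2, because 12 / 6 = 2
flat = grid.reshape(-1)       # back to one dimension

print(grid.shape, auto.shape, flat.shape)
```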

### 6.4 Array join

You can join two arrays along a given axis using `np.concatenate`. You can also stack arrays horizontally or vertically with `np.hstack` and `np.vstack`.

In [114]:
a = np.array([[0, 1], [1, 1]])
b = np.array([[2, 1], [1, 1]])

In [116]:
c = np.concatenate([a,b]) # By default it concatenates along the 0th axis (i.e., rows)
print(c)

[[0 1]
 [1 1]
 [2 1]
 [1 1]]


In [119]:
d = np.vstack([a, b])
print(d)

[[0 1]
 [1 1]
 [2 1]
 [1 1]]


In [120]:
c = np.hstack([a, b])
print(c)

[[0 1 2 1]
 [1 1 1 1]]


In [122]:
d = np.concatenate([a, b], axis=1)
print(d)

[[0 1 2 1]
 [1 1 1 1]]


### 6.5 Array split

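`array_split(array, N)` splits the array into `N` parts along the first axis. The parts do not need to have equal size, and some can even be empty, as in the example below.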

In [124]:
print(np.array_split(d, 3))

[array([[0, 1, 2, 1]]), array([[1, 1, 1, 1]]), array([], shape=(0, 4), dtype=int64)]
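
A 1D case makes the uneven split easier to read. The sketch below also contrasts `np.array_split` with `np.split`, which insists on equal parts; the array `data` is illustrative.

```python
import numpy as np

data = np.arange(7)                # 7 elements cannot be split evenly in 3

parts = np.array_split(data, 3)    # uneven parts are allowed
sizes = [len(p) for p in parts]

try:
    np.split(data, 3)              # np.split requires an equal division
    split_failed = False
except ValueError:
    split_failed = True

print(parts, sizes, split_failed)
```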


### 6.6 Array search

You can find which elements of an array fulfill a certain condition with `np.where`, which returns their indices:

In [126]:
print(np.where(d==1))

(array([0, 0, 1, 1, 1, 1]), array([1, 3, 0, 1, 2, 3]))


In [128]:
x = np.array([1, 3, 5, 8, 2])
print(np.where(x%2==0)) # Note that this returns a tuple; for a 1D array you just want its first element

(array([3, 4]),)
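
The indices returned by `np.where` can be used directly to pick the matching values. `np.where` also has a three-argument form that chooses between two values per element. A short sketch using the same array `x`:

```python
import numpy as np

x = np.array([1, 3, 5, 8, 2])

even_idx = np.where(x % 2 == 0)[0]   # take the first element of the tuple
even_values = x[even_idx]            # use the indices to select the values

# Three-argument form: pick 'even' where the condition holds, else 'odd'
labels = np.where(x % 2 == 0, 'even', 'odd')

print(even_idx, even_values, labels)
```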


**EXERCISE** Generate 100 random numbers and count how many there are between \[0, 1/3\], between \[1/3, 2/3\], and between \[2/3, 1\]. What would you expect? What if you generate 10,000?

### 6.7 Other

More materials here: https://www.w3schools.com/python/python_reference.asp