# Introduction to Python

## LxMLS 2020


<div style="float:right">
Luis Pedro Coelho<br />
<a href="http://big-data-biology.org/">Fudan University</a><br />   
luis@luispedro.org<br />
<a href="https://twitter.com/luispedrocoelho">@luispedrocoelho</a>    
</div>


![Fudan University](./fudan.png)

# Python Language History

- Python was started in the late 1980s.
- It was intended to be both _easy to teach_ and _industrial strength_.
- It is (has always been) open-source.
- It has become one of the most widely used languages (top 10).


# Python Example

In [None]:
print("Hello World")

# More complicated example

In [None]:
numbers = [10, 7, 22, 14, 17]

total = 0.0
n = 0.0
for val in numbers:
    total = total + val
    n = n + 1
print(total / n)


"Python is executable pseudo-code."

        —Python lore (often attributed to Bruce Eckel)



# Using the notebook

Cells of code & text.

There is a **single context** and it changes _as you execute cells_.

In [None]:
a = 1

In [None]:
print(a)

In [None]:
a

In [None]:
a = 2

When you really get confused, restart everything.

# Python types

## Basic types

- Numbers (integers and floating point)
- Strings
- Lists and tuples
- Dictionaries and sets

# Python Types
## Numbers I: Integers

In [None]:
a = 1
b = 2
c = 3
a + b * c

# Python Types
## Numbers II: Floats



In [None]:
a = 1.2
b = 2.4
c = 3.6

a + b * c

# Python Types
## Numbers III: Mixing Integers & Floats

When you mix integers and floats, the result is a float:

In [None]:
a = 2
b = 2.5
c = 4.4

a + b*c

# Python Types

## Numbers IV: Operations

- Addition: `a + b`
- Subtraction: `a - b`
- Multiplication: `a * b`
- Exponentiation: `a ** b`
- Unary Minus: `-a`


What about division?

What is 9 divided by 3?

What is 10 divided by 3?

## Two types of division

- integer division: `a // b`
- floating point division: `a / b`

In [None]:
a = 10
b = 3

print(a // b)
print(a / float(b))

In [None]:
a / b

# Python Types

## Strings

In [None]:
first = 'John'
last = "Doe"
full = first + " " + last

full

## String rules

- Short string literals are delimited by (`"`) or (`'`).
- Short string literals are one line only.
- Special characters are input using escape sequences (`\n` for newline,...).


In [None]:
print("Line 1\nLine 2\n")
print('Line 1\nLine 2\n')

# String Formatting

## New style

We use a method called `format` to replace placeholders in a string. Placeholders are denoted by `{}`:

In [None]:
print("Username {0}".format('luispedro'))
print("Username {0} (last login: {1})".format('luispedro', 'yesterday'))

You can also leave the numbers out:

In [None]:
print("Username {}".format('luispedro'))
print("Username {} (last login: {})".format('luispedro', 'yesterday'))

# Old style Formatting

You should not use this in new code, but you may still see it in old code: 

In [None]:
print("Username %s" % 'luispedro')
print("Username %s (last login: %s)" % ('luispedro', 'yesterday'))

# Python types: Long strings

In [None]:
long = '''Tell me, is love
Still a popular suggestion
Or merely an obsolete art?

Forgive me, for asking,
This simple question,
I am unfamiliar with his heart.'''

In [None]:
print(long)

# Lists

A Python list is written with square brackets and commas:

In [None]:
values = [1, 2, 5, 10]

print(values[0]) # <- first element
print(values[1]) # <- second element

In [None]:
print(values[-1])
print(len(values))

# List methods

We call a method with the `<object>.<method>()` syntax:

In [None]:
fruits = ["Pineapple", "Banana", "Apple"]
fruits.sort()
print(fruits)

In [None]:
fruits.append("Orange")
print(fruits)

### Aside: getting help

Add a `?` at the end of an object to get help inside the notebook:

In [None]:
# fruits.sort? # <- RUN THIS CELL

# Python lists can contain mixed types

In [None]:
fruits.append(123)
print(fruits)

Even other lists:

In [None]:
numbers = [1,2,3]
fruits.append(numbers)
print(fruits)

# Dictionaries

In [None]:
emails = {
    'Luis' : 'luis@luispedro.org',
    'Rita' : 'rita@gmail.com',
}

print("Luis' email is {}".format(emails['Luis']))

In [None]:
emails['Petra'] = 'petra@gmail.com' # <- add a new element
emails['Luis'] = 'luispedro@big-data-biology.org' # <- replace an element

print(emails)

In [None]:
len(emails)

# None type

There is a special value called `None`.

In [None]:
None

It's like a null pointer.
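
As a minimal sketch of where `None` tends to show up (the `emails` dictionary below is a small stand-in for the one used earlier, and `say_hello` is a hypothetical helper): looking up a missing key with `dict.get` gives `None`, and so does calling a function that has no `return` statement. The usual test is `is None`.

```python
# Stand-in data, mirroring the earlier dictionary example
emails = {'Luis': 'luis@luispedro.org', 'Rita': 'rita@gmail.com'}

# dict.get returns None (instead of raising KeyError) for a missing key
missing = emails.get('Petra')
print(missing is None)

# A function without an explicit return statement returns None
def say_hello():
    print("Hello")

result = say_hello()
print(result is None)
```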

# Python Control Structures

In [None]:
MIN_GRADE = 0.5
student = "Rita"
grade = 0.7

if grade > MIN_GRADE:
    print("{} passed".format(student))
    print("Congratulations")
else:
    print("{} needs to take the test again".format(student))

# Blocks are defined by indentation

Unlike almost all other modern programming languages, Python uses **indentation** to delimit blocks!

(This is sometimes called the [off-side rule](https://en.wikipedia.org/wiki/Off-side_rule)).

- you can use any number of spaces, but you should use **4 spaces**.
- you can use TABs, but please don't


    if <condition>:
        <statement 1>
        <statement 2>
        <statement 3>
    <statement after the if>
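
A small sketch of the template above, with made-up values: the two indented lines only run when the condition holds, while the line back at the outer indentation level always runs.

```python
grade = 0.3
messages = []

if grade > 0.5:
    # Indented: both lines belong to the if block
    messages.append("passed")
    messages.append("congratulations")
# Back at the outer indentation level: this runs regardless of the condition
messages.append("done")

print(messages)
```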

# Conditions


- `a == b`
- `a != b`
- `a < b`
- `a > b`
- `a <= b`
- `a >= b`
- `a in lst`
- `a not in lst`
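
The comparisons listed above can be tried directly. This is only an illustrative sketch with arbitrary values for `a`, `b`, and `lst`:

```python
a = 2
b = 3
lst = [1, 2, 5]

print(a == b)        # equality
print(a != b)        # inequality
print(a < b)         # strictly less than
print(a >= b)        # greater than or equal
print(a in lst)      # membership: is a one of the elements of lst?
print(a not in lst)  # negated membership
```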



# Else clause and friends


In [None]:
a = 2
if a > 3:
    print("Greater than 3")
elif a > 2:
    print("Greater than 2")
elif a > 1:
    print("Greater than 1")
else:
    print("Kind of small")

# Loops

## for loop

In Python, the `for` loop is over a "sequence":

    for <name> in <sequence>:
        <block>

In [None]:
students = ['Luis', "Ece", "Rita"]

for st in students:
    print(st)

We can also loop over dictionaries:

In [None]:
for n in emails:
    print(n)
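
Looping over a dictionary yields its keys. As a sketch (the `emails` dictionary here is a small stand-in for the earlier one), the `items()` method gives the key and the value together:

```python
emails = {'Luis': 'luis@luispedro.org', 'Rita': 'rita@gmail.com'}

# Plain iteration visits the keys
keys = []
for name in emails:
    keys.append(name)
print(keys)

# items() visits (key, value) pairs
for name, address in emails.items():
    print("{}: {}".format(name, address))
```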

# while loop

By now, you can probably guess the syntax:

    while <condition>:
        <block>

In [None]:
# poor man's division
a = 2325250
b = 23

c = a
i = 0
while c >= b:  # >= so that exact multiples of b are counted correctly
    c -= b
    i += 1
    
print("{} // {} = {}".format(a, b, i))

# More loopy stuff

In many other languages, for loops are over integers (0, 1, 2, ... , N). How can we achieve the same in Python?

In [None]:
range(5)

In [None]:
for i in range(5):
    print(i**2)
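
In Python 3, `range(5)` is a lazy sequence, so displaying it does not list its elements. A short sketch of how to inspect it, plus the optional start and step arguments:

```python
# Convert to a list to see the elements
first_five = list(range(5))
print(first_five)        # [0, 1, 2, 3, 4]

# range(start, stop, step): stop is excluded
evens = list(range(0, 10, 2))
print(evens)             # [0, 2, 4, 6, 8]

# Collect the squares from the loop above into a list
squares = []
for i in range(5):
    squares.append(i**2)
print(squares)           # [0, 1, 4, 9, 16]
```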

# Conditions

We have seen comparisons already (e.g., `a < b`); they return booleans (either `True` or `False`):

In [None]:
a = 1
b = 2
print(a < b)
print(a == b)

We can also use booleans as values:

In [None]:
condition = True
if condition:
    print("Yep")

# Other Conditions

- Many things can be evaluated as conditions
  - lists (the empty list gets evaluated as `False`, otherwise `True`)
  - dicts (same)
  - strings (empty string evaluates to `False`, otherwise `True`)
  - numbers (zero is `False`, else `True`)
- Several other objects can be evaluated in conditions

In [None]:
if fruits:
    print("Fruity")
else:
    print("Nope")
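
To check how a value behaves in a condition, pass it to `bool`. The sketch below runs through the cases listed above with arbitrary example values:

```python
values = [[], [1], {}, {'a': 1}, "", "text", 0, 0.0, 3, None]

truthy = []
for v in values:
    # bool(v) is exactly what `if v:` tests
    print(repr(v), bool(v))
    if v:
        truthy.append(v)

print(truthy)
```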

# Break and continue

As in other languages, `break` exits the loop and `continue` skips immediately to the next iteration:

In [None]:
numbers = [1, 6, -13, -4, 2]

total = 0
n = 0.0

for v in numbers:
    if v < 0:
        continue
    total += v
    n += 1
total/n
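
The cell above only uses `continue`. For contrast, here is a sketch of the same loop with `break`, which stops at the first negative number instead of skipping it:

```python
numbers = [1, 6, -13, -4, 2]

total = 0
for v in numbers:
    if v < 0:
        # Leave the loop entirely: the remaining values are never visited
        break
    total += v

print(total)  # 7 (only 1 and 6 are added)
```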

# Functions

In [None]:
def double(x):
    '''
    Returns the double of its argument
    '''
    return 2 * x

double(3)

Is that a comment?

It's a documentation string.

Try `double?`
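
Outside the notebook, the same text is available through the `__doc__` attribute (or the built-in `help(double)`). A minimal sketch showing the difference between a documentation string and a comment:

```python
def double(x):
    '''
    Returns the double of its argument
    '''
    # This is a comment: Python ignores it and it is not attached to the function
    return 2 * x

# The documentation string is stored on the function object itself
print(double.__doc__)
```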

# Calling a function

In [None]:
a = 4
double(a)

In [None]:
double(2.3)

In [None]:
double(double(a))

# Functions with default arguments

In [None]:
def greet(name, greeting="Hello"):
    print("{} {}".format(greeting, name))
    
greet("Mario")
greet("Mario", "Goodbye")

You can also specify the argument names (and then the order does not matter):

In [None]:
greet(greeting="Howdy", name="Mario")

This is very helpful for functions with >10 arguments!

# Breakpoint

Questions up to this point?

# NUMPY

Unlike R/MATLAB, Python relies on libraries for numerics.

- No builtin types for numeric computation
- However, packages like `numpy` are _quasi-standard_

## Basic array type

`numpy.array` creates a multi-dimensional array of numbers.


In [None]:
import numpy as np # <- import a library, like include/require in other languages

A = np.array([
    [0,1,2],
    [2,3,4],
    [4,5,6],
    [6,7,8]])
print(A[0,0])
print(A[0,1])
print(A[1,0])


# Why do we need numpy?

Couldn't we just use lists?

In [None]:
A = np.array([1,2,3])
B = [1, 2, 3]

1. numpy arrays have extra numeric methods.
2. efficiency
3. expressiveness


In [None]:
A

In [None]:
A.mean()

In [None]:
A.std()

In [None]:
A.max()

You can also use numeric operations with arrays; they work **element-wise**:

In [None]:
A

In [None]:
A + 1

In [None]:
A * 2

Operations with two arrays also work **element-wise**:

In [None]:
B = np.array([1,1,2])
A

In [None]:
A + B

In [None]:
A * B
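
This is where arrays differ most visibly from lists. As a sketch (the names `arr` and `lst` are just for illustration), the same operators mean arithmetic for arrays but concatenation and repetition for lists:

```python
import numpy as np

arr = np.array([1, 2, 3])
lst = [1, 2, 3]

print(arr + arr)  # element-wise addition: [2 4 6]
print(lst + lst)  # concatenation: [1, 2, 3, 1, 2, 3]

print(arr * 2)    # element-wise multiplication: [2 4 6]
print(lst * 2)    # repetition: [1, 2, 3, 1, 2, 3]
```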

# Matrix/vector operations

In [None]:
A = np.array([
            [1,0,1],
            [0,2,0],
            [0,0,1],
        ])
B = np.array([1,2,3])
np.dot(A,B)
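
As a sketch of an alternative spelling, the `@` operator computes the same matrix/vector product as `np.dot`, whereas `*` stays element-wise:

```python
import numpy as np

A = np.array([
    [1, 0, 1],
    [0, 2, 0],
    [0, 0, 1],
])
B = np.array([1, 2, 3])

product = A @ B        # matrix/vector product, same as np.dot(A, B)
print(product)         # [4 4 3]

elementwise = A * B    # NOT a matrix product: each row of A is multiplied by B
print(elementwise)
```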

## Numpy arrays can be very efficient

A list in Python is an array of pointers to objects:


![python list](./python-list.svg)

while a numpy array stores its values directly in a contiguous block of memory:

![numpy array](./numpy-array.svg)

Computers are **really good** at processing contiguous blocks of memory.

These arrays can also be passed to other libraries (including those written in C or FORTRAN).
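
The memory layout can be inspected from Python. A small sketch, using an explicit `np.int64` type so that the sizes are the same on every platform:

```python
import numpy as np

A = np.arange(1024, dtype=np.int64)

print(A.itemsize)               # bytes per element: 8 for int64
print(A.nbytes)                 # total bytes of data: 1024 * 8
print(A.flags['C_CONTIGUOUS'])  # True: the data is one contiguous block
```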

# Timing measurements

One simple example (using the magic command `%timeit`):

In [None]:
a = range(1024)
%timeit sum(a)

In [None]:
b = np.arange(1024)
%timeit b.sum()

Actually, not that big of a difference, but the difference gets larger for larger arrays and more complex operations:

In [None]:
a = range(1024*1024)
%timeit sum(v*v for v in a)

In [None]:
b = np.arange(1024*1024)
%timeit (b**2).sum()

Now, it starts to matter.

# Numpy arrays are *homogeneous*

- All members of an array have the same type
- Either integer or floating point
- Defined **when you first create the array**

In [None]:
A = np.array([0, 1, 2]) # <- IMPLICIT TYPE
A.dtype

In [None]:
B = np.array([0.5, 1.1, 2.1])
B.dtype

In [None]:
C = np.array([0, 1, 2], dtype=np.float64) # <- EXPLICIT TYPE
C.dtype

Besides the speed, it is also more expressive.

# Numpy data types

- `np.int8`, `np.int16`, `np.int32`, `np.int64`
- `np.uint8`, `np.uint16`, `np.uint32`, `np.uint64`
- `np.float32`, `np.float64`, `np.float16`, (and, sometimes, `np.float128`)
- `np.bool_` (spelled `np.bool` in some NumPy versions)

Note that these can over/underflow:

In [None]:
A = np.array([1,2,3], np.uint8)
A - 10
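
A sketch of what wrapping around looks like, using only `uint8` arrays on both sides so the behaviour does not depend on how a particular NumPy version treats plain Python integers. A `uint8` holds values from 0 to 255, so results are taken modulo 256:

```python
import numpy as np

A = np.array([1, 2, 3], dtype=np.uint8)
ten = np.array([10, 10, 10], dtype=np.uint8)

under = A - ten     # 1 - 10 = -9, which wraps around to 247
print(under)        # [247 248 249]

big = np.array([250, 251, 252], dtype=np.uint8)
over = big + ten    # 250 + 10 = 260, which wraps around to 4
print(over)         # [4 5 6]
```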

# Reduce along axis operations

If you have a multidimensional array, you can reduce it along one of its axes:

In [None]:
A = np.array([
    [0,0,1],
    [1,2,3],
    [2,4,2],
    [1,0,1]])

In [None]:
A.max(0)

In [None]:
A.max(1)

In [None]:
A.max()
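
The axis argument names the dimension that disappears. As a sketch on the same array: axis 0 runs down the rows (giving one value per column), and axis 1 runs across the columns (giving one value per row). Other reductions such as `sum` follow the same convention.

```python
import numpy as np

A = np.array([
    [0, 0, 1],
    [1, 2, 3],
    [2, 4, 2],
    [1, 0, 1]])

col_max = A.max(axis=0)   # one value per column: [2 4 3]
row_max = A.max(axis=1)   # one value per row:    [1 3 4 1]
col_sum = A.sum(axis=0)   # [4 6 7]
row_sum = A.sum(axis=1)   # [1 6 8 2]

print(col_max, row_max)
print(col_sum, row_sum)
```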

# Slicing

In [None]:
A = np.array([
    [0,1,2],
    [2,3,4],
    [4,5,6],
    [6,7,8]])

A.shape

In [None]:
A[0]

In [None]:
A[0].shape

In [None]:
A[1]

In [None]:
A[:,2]

# Slices share memory!

A slice is a *view* into another array:

In [None]:
A

In [None]:
B = A[0]
B[0] = -1
A
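
To check whether two arrays share memory, `np.shares_memory` can help. In this sketch (the names `view` and `independent` are just for illustration), writing through the view changes the original array, while writing to a copy does not:

```python
import numpy as np

A = np.array([
    [0, 1, 2],
    [2, 3, 4]])

view = A[0]                # a view: no data is copied
independent = A[0].copy()  # an explicit copy with its own memory

print(np.shares_memory(A, view))         # True
print(np.shares_memory(A, independent))  # False

view[0] = -1           # also changes A
independent[1] = 100   # leaves A untouched
print(A)
```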

# Argument passing is by reference

In [None]:
def double_array(A):
    A *= 2
A = np.arange(20)
double_array(A)

A

You need to be careful, but you can always make a copy:

In [None]:
A = np.arange(20)
B = A.copy()
double_array(B)
print(A)
print(B)
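
Whether the caller sees a change depends on how the function modifies its argument. A sketch with two hypothetical variants: `arr *= 2` changes the array in place, while `arr = arr * 2` builds a new array and only rebinds the local name.

```python
import numpy as np

def double_in_place(arr):
    # Modifies the caller's array
    arr *= 2

def double_rebind(arr):
    # Creates a new array; the caller's array is left alone
    arr = arr * 2
    return arr

A = np.arange(5)
result = double_rebind(A)
unchanged = A.tolist()     # A is still [0, 1, 2, 3, 4]
print(A, result)

double_in_place(A)
print(A)                   # now [0 2 4 6 8]
```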

# Logical Arrays

Arrays of booleans:

In [None]:
A = np.array([-1,0,1,2,-2,3,4,-2])
A > 0

In [None]:
( (A > 0) & (A < 3) ).mean()

# Logical indexing

In [None]:
A[A < 0] = 0
# or

A *= (A > 0)
A
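
A boolean array can also be used to select or count elements. A short sketch on the same data, working on a copy so that the original values stay visible:

```python
import numpy as np

A = np.array([-1, 0, 1, 2, -2, 3, 4, -2])

mask = A > 0
positives = A[mask]        # only the elements where the mask is True
n_positive = mask.sum()    # True counts as 1, so this counts the positives
print(positives, n_positive)

clipped = A.copy()
clipped[clipped < 0] = 0   # assign through a mask: negatives become 0
print(clipped)
```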

# Some helper functions

In [None]:
np.zeros((10,10))

In [None]:
np.ones(10)

In [None]:
A = np.array([1,2,3,4,5])
B = np.zeros_like(A)
B

# Matplotlib

- Matplotlib is a plotting library.
- Very flexible.
- Very active project.
- Plots used to be ugly by default (the default style was overhauled in version 2.0).

In [None]:
import matplotlib.pyplot as plt
from matplotlib import style

style.use('fivethirtyeight')
X = np.linspace(-4, 4, 1000)
plt.plot(X, X**2*np.cos(X**2))


Ok, that was disappointing.

How can we see the plot?

Using a magic command:

In [None]:
%matplotlib inline

Now, we try again:

In [None]:
X = np.linspace(-4, 4, 1000)
plt.plot(X, X**2*np.cos(X**2))

# Interactive exploration

Personally, I think this is the killer feature of notebooks:

In [None]:
from ipywidgets import interact

@interact(power=(1,10))
def plot_cos(power=2):  # explicit default instead of relying on a leftover global `n`
    plt.plot(X, X**power*np.cos(X**power))

# Interactive exploration

- You can do complicated things with complicated code, but 95% of the time, the simple ones work
- bools -> checkbox
- int/floats -> slider
- dict -> selectbox

In [None]:
@interact(power=(1,3.), function={'sin' : np.sin, 'cos': np.cos}, multiply=True)
def plot_cos_more(power, function, multiply):
    Y = function(X ** power)
    if multiply:
        Y *= X**power
    plt.plot(X, Y)

# What if I don't like this notebook thing?

Then, don't use it!

[Well known talk on "I don't like notebooks" by Joel Grus](https://www.youtube.com/watch?v=7jiPeIFXb6U)

Frankly, I like it for exploration and for visualization, but it is not so great for actual computing.

### Alternatives

1. A text editor (vim, emacs, nano...)
2. An IDE (spyder)
3. IPython on the shell

# Documentation:

- Numpy docs: [http://docs.scipy.org/doc/](http://docs.scipy.org/doc/)
- Matplotlib: [http://matplotlib.org](http://matplotlib.org)
- Python docs: [http://docs.python.org](http://docs.python.org)
