# 02- Python, NumPy, Jupyter

In this lesson we will take a whirlwind tour of the Python programming language, efficient numerical Python (NumPy) for scientific/ML applications, and the Jupyter lab/notebook environment.

This presentation uses materials from the more extensive [Official Python Tutorial](https://docs.python.org/3/tutorial/) and [NumPy Learner Documentation](https://numpy.org/learn/) where you can dive much deeper.

## Python

Python is an *interpreted* language: code is run line-by-line instead of being compiled ahead of time. You can write to standard output with `print()`. Note that you do not need to end lines with semicolons.

In [None]:
print("Hello World")

This makes Python flexible and easy for prototyping, but it also means that errors will generally not be detected until runtime -- there is no compiler to help catch type errors.

In [None]:
x = 5
s = "hello"
print(x)
print(x + s)  # raises a TypeError at runtime: cannot add an int and a str

### Basic Types and IO

The most important basic types are `int`, `float`, `bool`, and `str`. Python is dynamically typed: variable types do not have to be declared explicitly.

#### Numeric and Boolean Types

Python will convert `int` to `float` automatically if, for example, you divide.

In [None]:
17 / 3  # classic division returns a float

Use `//` for integer division and `%` for the remainder.

In [None]:
17 // 3  # floor division discards the fractional part

In [None]:
17 % 3  # the % operator returns the remainder of the division

Use `**` for exponentiation.

In [None]:
2 ** 7  # 2 to the power of 7

`bool`s can be `True` or `False` and are technically a subtype of `int`.

In [None]:
p = True
q = False
print(p+q) # interprets True as 1 and False as 0

You can combine `bool`s with `and`, `or`, and `not`.

In [None]:
p = True
q = False
print(p and q)
print(p or q)
print(not p)

#### String Type

Python does not have a basic character type. Strings (`str`) can be written with single or double quotes. You can concatenate strings with `+`.

In [None]:
a = 'a string'
b = "also a string"
print(a+b)

Multiline strings use triple quotes.

In [None]:
c = """a
multiline
string
"""
print(c)

You can get the length of a string (number of characters) with `len()`. You can index individual characters in the string using `string[i]` style notation, running from 0 to the length minus 1. Python also lets you use `-1` to reference the *last* character.

Remember there is no character type, so indexing just gives you a string of size 1. The following table helps to visualize the indexing conventions.

```
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
```

In [None]:
word = 'Python'
print(len(word)) # the number of characters, including spaces
print(word[0]) # the first character 
print(word[-1]) # the last character

In addition to indexing, **slicing** is also supported using `[i:j]` style notation (inclusive of `i`, exclusive of `j`).

In [None]:
word = 'Python'
print(word[0:2])  # characters from position 0 (included) to 2 (excluded)
print(word[2:5])  # characters from position 2 (included) to 5 (excluded)

You can easily get a prefix or suffix by slicing: Omit the start index for a prefix or the end index for a suffix.

In [None]:
word = 'Python'
print(word[:2])   # prefix from the beginning up to position 2 (excluded)
print(word[4:])   # suffix from position 4 (included) to the end

#### Formatted String Literals

Sometimes you want to print some non-string values inside of a line of text. The easiest way to do that in Python is to use a *formatted string literal*. Begin a string with `f` or `F` before the opening quotation mark or triple quotation mark. Inside this string, you can write a Python expression between `{` and `}` characters that can refer to variables or literal values.

In [None]:
year = 2016
event = 'Referendum'
print(f'Results of the {year} {event}')
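
As a minimal sketch of the point about expressions, the braces can hold arithmetic as well as plain variable names, and a format specifier after a colon controls how a value is displayed. The variable names and values here are made up for illustration.

```python
# Hypothetical values, chosen only for illustration
n_correct = 42
n_total = 50

# Any expression can go between the braces, not just a variable name
summary = f'{n_correct} of {n_total} correct, {n_total - n_correct} missed'
print(summary)

# A format specifier after a colon controls how the value is displayed
accuracy = n_correct / n_total
report = f'accuracy: {accuracy:.2f}'  # two digits after the decimal point
print(report)
```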

### Conditionals and Loops

#### If Statements

Python supports `if` statements along with optional `elif` for "else if" and `else`. 

**Important:** Python uses a colon `:` followed by indentation to denote code blocks, instead of curly braces. Indentation is only a style convention in many other languages, but in Python it is required.

In [None]:
x = int(input("Please enter an integer: "))

if x < 0:
    print('Negative')
elif x == 0:
    print('Zero')
else:
    print('Positive')

Remember you can use `and`, `or`, and `not` in conditionals. You might want to use `!=` for not equal.

In [None]:
x = int(input("Please enter an integer: "))

if x < 0 and x % 2 != 0:
    print('Negative and odd')
elif x == 0:
    print('Zero')
else:
    print('Something else')

#### While and For Loops

Python supports a typical `while` loop. Note again that indentation is used to designate the code block corresponding to the loop.

In [None]:
i = 0
while i < 5:
    print(2**i, end=',') # ends the printout with a comma instead of a newline
    i += 1

More often you may use the `for` loop along with `in` (which can also be used to check for membership in a data structure).

In [None]:
word = "Python"
for char in word:
    print(char, end=", ")
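
The other use of `in`, checking for membership, can be sketched as follows. The `vowels` string and the counter are illustrative names, not part of the original example.

```python
word = "Python"
vowels = "aeiou"  # illustrative helper string

print("th" in word)  # substring check on a string
print("z" in word)

# Combine both uses of `in`: loop over the characters, then test membership
n_vowels = 0
for char in word:
    if char in vowels:
        n_vowels += 1

print(n_vowels)
```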

If you want to iterate over a sequence of numbers, as in a C/Java style `for (int i = 0; i < n; i++)` loop, then you likely want to use the `range()` function.

In [None]:
for i in range(5):
    print(i)

To iterate over the indices of a sequence, you can combine `range()` and `len()` as follows.

In [None]:
word = "Python"
for i in range(len(word)):
    print(word[i], end=", ")

### Functions and Parameters

Functions are defined using `def` as shown below. The `return` statement does not need parentheses. 

The *docstring* for the function is the triple-quoted string immediately after the function signature. This is the most common place to document code in Python and is automatically accessible to many development environments.

In [None]:
def avg(x, y):
    """Returns average of x and y."""
    return (x + y) / 2

print(avg(5, 3))

Python functions can return as many values as you like by separating with commas: They are returned as a tuple (see below) and can be assigned to multiple variables also using comma separation.

In [None]:
def int_div(a, b):
    """Returns the quotient and remainder of a divided by b"""
    return a // b, a % b

q, r = int_div(5, 3)
print(f"Quotient: {q}; Remainder: {r}")

Functions can be assigned to variables in Python (common in functional programming but a bit strange if you aren't used to it!).

In [None]:
def do_a_func(f):
    f()

def func():
    print("hello!")

f = func

do_a_func(f)

Python functions are often specified with a variable number of arguments by giving default values for some of the parameters. All parameters without default arguments must come first.

In [None]:
def repeat(s, times=2):
    for i in range(times):
        print(s, end=",")
    
    print() # I'm still part of the function!

repeat("hello") # use default argument
repeat("world", 3) # specify argument by position
repeat("!", times=3) # use the keyword

### Lists, Tuples, Dictionaries

#### Lists and Tuples

Python supports lists, which are implemented using dynamically resized arrays (but arrays themselves are not exposed to the user -- see NumPy). You define and index them using square brackets. The indexing conventions are the same as for strings.

In [None]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

print(fruits[0]) # first element
print(fruits[-1]) # last element

You can loop over a list with or without indices. `enumerate` is handy if you want both.

In [None]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

for fruit in fruits:
    print(fruit, end=",")

print()

for i, fruit in enumerate(fruits):
    print(f'index: {i}, fruit: {fruit}')

Sometimes you want to do the opposite of `enumerate`, that is, you want to iterate over two parallel lists together. Use the `zip` function for this, which returns tuples of corresponding elements from the lists you pass in.

In [None]:
for num, text in zip([1, 2, 3], ['sugar', 'spice', 'everything nice']):
    print(f'number: {num}, message: {text}')

You can also create an empty list with `[]`. Lists are dynamic in size: You can get the current size with `len`, and can add elements with `append`. Use `+` to concatenate two lists.

In [None]:
fruits = []
fruits.append("apple")
fruits.append("banana")

animals = []
animals.append("dog")
animals.append("cat")

combined = fruits + animals

for s in range(len(combined)):
    print(combined[s])

Python supports **list comprehensions** which allow you to make new lists where each element is the result of some operations applied to each member of another sequence or iterable.

In [None]:
squares = [x**2 for x in range(10)]
print(squares)

You can also have nested lists in Python. You can declare these explicitly or, often conveniently, with list comprehensions.

In [None]:
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

transpose = [[row[i] for row in matrix] for i in range(4)]

print(matrix)
print(transpose)

Simple assignment, for lists, creates a second reference to the same list object, rather than a separate list in memory.

In [None]:
my_list = ['a', 'b', 'c']
another_list = my_list
another_list.append('gamma')
print(my_list)
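
If you do want a separate list, one option is to copy it explicitly. The sketch below contrasts an alias with two common ways of copying; the variable names are illustrative. Note that both copies are *shallow*: a nested list inside the copied list would still be shared.

```python
my_list = ['a', 'b', 'c']

alias = my_list              # second name for the same list object
shallow = my_list.copy()     # new list with the same elements
also_shallow = my_list[:]    # slicing the whole list also makes a new list

alias.append('d')

print(my_list)               # changed through the alias
print(shallow)               # unaffected
print(also_shallow)          # unaffected
print(alias is my_list)      # `is` checks whether two names refer to one object
print(shallow is my_list)
```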

Tuples are like lists in that they are ordered and iterable, except they are immutable. They are written with parentheses instead of square brackets.

In [None]:
my_tuple = ('a', 'b', 'c')
for i in range(len(my_tuple)):
    print(f'index: {i}, value: {my_tuple[i]}')

my_tuple[0] = 'alpha' # raises a TypeError because tuples are immutable; this would have been fine for a list

#### Dictionaries

A "dictionary" is Python's built-in associative or map data structure (a lookup table), implemented as a dynamic hash table.

It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). Curly braces create a dictionary: `{}`, and you can use square brackets `[]` to look up the value associated with a key, to add an entry, or to update a value.

In [None]:
tel = {'jack': 4098, 'sape': 4139}

print(tel['jack']) # look up the value associated with the key jack

tel['guido'] = 4127 # add a key value pair (guido, 4127)
print(tel['guido'])

tel['jack'] = 0 # update the value associated with jack to 0
print(tel['jack'])

You can use `in` to check if a given key is in a dictionary. You can also use this to conveniently loop through the elements of a dictionary.

In [None]:
knights = {'gallahad': 'the pure', 'robin': 'the brave'}
for key in knights:
    print(f'name: {key}, trait: {knights[key]}')
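
The membership check itself can be sketched like this. The key `'lancelot'` and the default `'unknown'` are illustrative; `get()` and `items()` are standard `dict` methods.

```python
knights = {'gallahad': 'the pure', 'robin': 'the brave'}

print('robin' in knights)      # membership tests the keys
print('lancelot' in knights)

# Looking up a missing key with [] raises a KeyError, so check first...
if 'lancelot' in knights:
    print(knights['lancelot'])
else:
    print('no entry for lancelot')

# ...or use get(), which returns a default when the key is absent
trait = knights.get('lancelot', 'unknown')
print(trait)

# items() yields (key, value) pairs, so no second lookup is needed in the loop
for name, value in knights.items():
    print(f'name: {name}, trait: {value}')
```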

### Classes, Objects, and Methods

Python supports object-oriented programming. You can define a `class`, inside of which you can define attributes and methods that are accessed with the `.` operator.

`self` is used to refer to an object on which an attribute is referenced, and should be the first parameter for any **instance** method. Instance variables should be declared as `self.var_name`. 

`__init__(self, ...)` is the constructor method run at object creation. Other instance methods can be declared and then called on the object later.

Note again how Python uses indentation for code blocks.

In [None]:
class Pair:
    def __init__(self, x, y):
        self.x = x # this is an instance variable
        self.y = y # another instance variable

    def max_coord(self): # this is an instance method
        return max(self.x, self.y)

p = Pair(0, 1) # Create a Pair object
print(p.max_coord()) # Call max_coord() method

Instance variables are for data unique to each instance and class variables are for attributes and methods shared by all instances of the class. Sometimes this leads to confusions like the following.

In [None]:
class Dog:
    tricks = []             # mistaken use of a class variable
    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)               # unexpectedly shared by all dogs

What we should have done instead is make tricks an instance variable.

In [None]:
class Dog:
    def __init__(self, name):
        self.name = name
        self.tricks = []    # creates a new empty list for each dog

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)

## NumPy

Python is a great language, but it has one glaring problem for large scale machine learning. It is...slow. Veeeeery slow. Here are some timings for a loop that iterates over a list of size `n` in Java versus Python on my laptop.

|               n    |     Java (ms)    |     Python (ms)    |
|-------------------:|-----------------:|-------------------:|
|          10,000    |           < 1    |               1    |
|         100,000    |             4    |              12    |
|       1,000,000    |            17    |              97    |
|      10,000,000    |            51    |             910    |
|     100,000,000    |           181    |           8,820    |

In general you should think of regular Python iteration as **10-100 times slower** than in compiled languages such as Java or C++. 

This doesn't matter for small tasks like iterating over a list of a hundred elements. For intensive computing on big data, we turn to the NumPy library for numerical Python.

### NumPy Arrays

Where we have lists in basic Python, with NumPy we have arrays. In order to use them, we will need to `import` the `numpy` package.

It is a common convention to abbreviate the package as `np` using the `as` keyword in Python -- you don't have to do this but will commonly see it in other code.

In [None]:
import numpy as np

There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. 

Note how we use `.` to access a function within the numpy package.

In [None]:
a = np.array([2, 3, 4])
print(a)

Regular Python lists can hold elements of different types and are dynamic in size. NumPy arrays have a fixed size and can only hold elements of one type, so mixed numeric inputs are converted to a common type, as the `dtype` below shows.

In [None]:
a = np.array([2.5, 3, 4])
print(a.dtype)
print(len(a))

Often we will work with 2-dimensional NumPy arrays (rows and columns, like a matrix in mathematics).

You can initialize a NumPy array prefilled with zeros or ones by passing a `shape` argument: a single integer for a 1-dimensional array or a tuple of two integers (rows, columns) for a 2-dimensional array.

In [None]:
print(np.zeros(3))

print(np.ones((4, 4)))

Or, you might want to initialize an ordered sequence, similar to the Python `range` function. You can do this using the `arange` function.

In [None]:
print(np.arange(10, 30, 5)) # from 10 (inclusive) to 30 (exclusive) by 5s

You can use `.shape` to get a tuple of the number of rows and the number of columns. This is useful, for example, if we want to create another NumPy array of the same shape.

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A) 
print(A.shape) # tuple of number of rows by number of columns
B = np.zeros(A.shape) # array of zeros with the same shape
print(B)

Sometimes we want to change the shape of an array: most commonly to "flatten" a 2-d into a 1-d or vice versa. We can do this with `ravel` and `reshape`.

In [None]:
a_vector = np.arange(4)
a_matrix = a_vector.reshape((2, 2))

print(a_matrix)
print(a_matrix.ravel())

Other common structure changes include transposing (exchanging rows and columns) with `.T` and stacking arrays vertically with `vstack` or horizontally with `hstack`.

In [None]:
a = np.array([[1, 2], [3, 4]])
b = a.T
print(a)
print(b)

In [None]:
print(np.vstack((a, b)))
print(np.hstack((a, b)))

In general, Numpy will try to avoid copying all of the elements of an array. If you really want to create a totally separate copy in memory, use the `copy()` method.

In [None]:
a = np.array([[1, 2], [3, 4]])
b = a.copy()
print(b)
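
To see what avoiding a copy means in practice, here is a small sketch contrasting a *view* with a copy. It relies on `reshape` returning a view whenever it can, which is the case for this array; `np.shares_memory` reports whether two arrays overlap in memory. The names `flat` and `separate` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

flat = a.reshape(4)      # a view: same underlying data, different shape
separate = a.copy()      # an independent copy

flat[0] = 99             # writes through to a
print(a)
print(separate)          # still holds the original values
print(np.shares_memory(a, flat))
print(np.shares_memory(a, separate))
```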

### Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like regular Python lists.

In [None]:
a = np.array([3.5, 9.3, 4.2, 7.6])
print(a)
print(a[2])
print(a[1:3])

for val in a[2:]:
    print(val, end=",")

print()

for i in range(len(a)):
    print(f'index: {i}, value: {a[i]}')

For 2-dimensional arrays, indexing and slicing with a single value/range will always refer to the **rows**.

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A[0]) # first row
print(A[1:]) # all rows starting from second

You can index a particular row, column position using `[row, column]`. You can also slice columns by using a colon `:` in the row position.

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A)
print(A[2, 1]) # third row, second column
print(A[:,0]) # all of the first column

You can assign whole slices at a time, for example, to assign new values to an entire column.

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A)

A[:,0] = 0
print(A)

By default, iterating over a 2-d array will iterate over rows.

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])

for row in A:
    print(row)

If necessary (it rarely is), you can also, of course, write a nested loop over the rows and columns.

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])

for row in A: # using enhanced style loop
    for val in row:
        print(val, end=",")

print() 

n_rows, n_cols = A.shape
for i in range(n_rows): # using traditional index style loop
    for j in range(n_cols):
        print(A[i, j], end=",")

### Operations and Universal Functions

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result. This is a vectorized operation that is substantially faster for large arrays than writing for loops over Python lists.

In [None]:
a = np.array([20, 30, 40, 50])
b = np.array([0, 1, 2, 3])
c = a + b # elementwise sum
print(c)
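
A rough way to check the speed claim on your own machine is sketched below. The array size `n` is an arbitrary choice, and the timings will vary by machine and from run to run, so treat the printed numbers as indicative only.

```python
import time
import numpy as np

n = 1_000_000  # arbitrary size, large enough for the difference to show
a_list = list(range(n))
b_list = list(range(n))
a_arr = np.arange(n, dtype=np.int64)
b_arr = np.arange(n, dtype=np.int64)

start = time.perf_counter()
c_list = [x + y for x, y in zip(a_list, b_list)]  # one Python-level addition per element
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
c_arr = a_arr + b_arr                             # a single vectorized operation
vectorized_seconds = time.perf_counter() - start

print(f'Python lists: {loop_seconds * 1000:.1f} ms')
print(f'NumPy arrays: {vectorized_seconds * 1000:.1f} ms')
print(np.array_equal(c_arr, np.array(c_list)))    # both approaches give the same values
```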

The same principle of elementwise operations holds for 2-dimensional arrays. Multiplication with `*` is also applied elementwise -- matrix multiplication is performed with `@` or `.dot`.

In [None]:
A = np.array([[1, 2], [0, 1]])
B = np.array([[0, 1], [1, 3]])
print(f'A matrix: \n{A}\n')
print(f'B matrix: \n{B}\n')
print(f'elementwise product: \n{A * B}\n') # elementwise product
print(f'matrix product: \n{A @ B}\n') # matrix product
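
As a quick check of the `.dot` alternative, the sketch below confirms that it agrees with `@` for these 2-dimensional arrays, and computes one entry of the matrix product by hand.

```python
import numpy as np

A = np.array([[1, 2], [0, 1]])
B = np.array([[0, 1], [1, 3]])

print(A @ B)
print(A.dot(B))                          # same result as A @ B for 2-d arrays
print(np.array_equal(A @ B, A.dot(B)))

# Entry [0, 0] of the matrix product is row 0 of A times column 0 of B, summed
print((A[0] * B[:, 0]).sum())
```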

You can also perform elementwise boolean comparisons, which will return a boolean array.

In [None]:
a = np.array([ 9.1, -9.8,  7.4 , -2.6])
print(a)
print(a >= 0)

What is returned is called a boolean mask. Interestingly, you can *filter* a numpy array by indexing with such a mask.

In [None]:
a = np.array([ 9.1, -9.8,  7.4 , -2.6])
nonneg = a >= 0
print(a[nonneg])

To combine boolean arrays using elementwise logical operations, you must use `&` for and, `|` for or.

In [None]:
a = np.array([ 9.1, -9.8,  7.4 , -2.6])
nonneg = a >= 0
small = a < 8
print(a[nonneg & small])
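
The `|` operator works the same way, and `~` gives the elementwise "not". A minimal sketch, with illustrative thresholds: note the parentheses around each comparison, which are needed because `&` and `|` bind more tightly than `<` and `>`. Python's ordinary `and`/`or` do not work here, since they raise a `ValueError` on arrays with more than one element.

```python
import numpy as np

a = np.array([9.1, -9.8, 7.4, -2.6])

# Parentheses are required around each comparison
extreme = (a < -5) | (a > 8)
print(a[extreme])

moderate = ~extreme        # ~ is the elementwise "not"
print(a[moderate])
```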

Many helpful unary operations (that take an array as input and return a single value) are implemented for you in Numpy. For example, you may often want to get the `sum`, `max`, or `mean` of elements in an array.

In [None]:
a = np.array([ 9.1, -9.8,  7.4 , -2.6])
print(a.sum())
print(a.max())
print(a.mean())

For a 2-dimensional array, these unary operations will simply treat the array like a flattened 1-dimensional array. However, you can use the `axis=0` argument to get the value of each column separately, or the `axis=1` argument to get the value of each row separately.

In [None]:
b = np.arange(12).reshape(3, 4)
print(b)
print(b.sum(axis=0))     # sum of each column
print(b.min(axis=1))     # min of each row

NumPy provides a number of helpful mathematical functions that can be applied elementwise and efficiently. These are called *universal functions* or **ufuncs**. For example, you can get the square root of every element with `np.sqrt`, or the logarithm with `np.log` (the natural logarithm, base $e$).

In [None]:
B = np.array([1, 2, 3])
print(np.sqrt(B))
print(np.log(B))

Numpy supports easy random number generation with the `np.random` module, including the ability to set a random `seed` for reproducible testing results. You can pass a `size` argument to get a numpy array of random results.

In [None]:
rng = np.random.default_rng(seed=2024)
print(rng.random(size=(3, 3))) # uniform random between 0 and 1
print()
print(rng.normal(size=(3, 3))) # standard normal/gaussian
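
To illustrate what the seed buys you, the sketch below creates two generators with the same seed and one with a different seed (7 is an arbitrary choice). Generators with the same seed produce the same sequence of draws.

```python
import numpy as np

rng_a = np.random.default_rng(seed=2024)
rng_b = np.random.default_rng(seed=2024)   # same seed as rng_a
rng_c = np.random.default_rng(seed=7)      # a different, arbitrary seed

x = rng_a.random(size=3)
y = rng_b.random(size=3)
z = rng_c.random(size=3)

print(np.array_equal(x, y))   # same seed, same draws
print(np.array_equal(x, z))   # different seed, different draws
```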

Numpy has some handy functions for getting the *index* of the max/min element (`argmax`/`argmin`) or to get the indices of the elements in sorted order with `argsort`.

In [None]:
a = np.array([5, 3, 9, 7, 2, 4])
print(f'max value is at index: {a.argmax()}')
print(f'min value is at index: {a.argmin()}')
print(f'index of min, index of second min, ...: {np.argsort(a)}')
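
One common use of `argsort`, sketched here, is to reorder an array by indexing it with the returned indices. Indexing with an array of integers selects the elements at those positions. The name `order` is illustrative.

```python
import numpy as np

a = np.array([5, 3, 9, 7, 2, 4])
order = np.argsort(a)

print(order)
print(a[order])        # indexing with the order gives the sorted values
print(a[order[-2:]])   # the two largest values, in increasing order
```

The same `order` array could be used to reorder a second, parallel array so that both stay aligned.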

## Jupyter

We have been working in a Jupyter notebook file (`.ipynb`), which you can open within Jupyter lab, either through an OIT provided container, or on your own device if you choose to manage your own installation.

There are a number of advantages to a Jupyter environment for prototyping machine learning applications. Cells are a very lightweight and flexible mechanism for writing and executing code. 

Also, it is easy to embed visualizations produced by code directly alongside the code. Note how in the following example, we do not print anything at the end: the value of the last expression in a cell gets rendered in Jupyter.

In [None]:
import seaborn as sns
data = [i**2 for i in range(100)]
sns.displot(data)

We can also embed nicely formatted text using Markdown and mathematics with LaTeX alongside our code in these markdown cells.

Markdown is a simple and lightweight formatting language used in Jupyter notebooks, GitHub/GitLab, and many places on the web. Using markdown cells, you can write text to explain your code and results, present lists and tables, etc. We give some brief examples of core functionality here, but you should check out the [basic syntax guide](https://www.markdownguide.org/basic-syntax/) as well.

### Lists

1. First item
2. Second item
3. Third item

- something
- another thing
- one more

### Tables

| column 1 | column 2 | column 3 |
|----------|----------|----------|
| 1        | 2        | 3        |
| 4        | 5        | 6        |
| 7        | 8        | 9        |



### Math

Markdown cells in Jupyter notebooks support rendering $\LaTeX$, which is the default typesetting language for well-rendered and easy to read mathematics. We give a few examples below. For more info, we recommend [Overleaf](https://www.overleaf.com/learn/latex/Mathematical_expressions), which gives a nice introduction in their documentation. See especially the Mathematics section. LaTeX commands always begin with the backslash character.

You can use single dollar signs to render an inline equation such as $4x^2 + 3 = 12$. Only use inline math for short things like variable names $x$ that do not require too much vertical space. Otherwise they will break up the line spacing in the paragraph and be difficult to read, like $\left( \frac{x^3 + y}{\frac{1}{x+y}}\right)^2 = 0$.

Instead, use double dollar signs to render centered equations. $$ \left( \frac{x^3 + y}{\frac{1}{x+y}}\right)^2 = 0 $$


You can also use the `\begin{equation}` and `\end{equation}` syntax instead of double dollar signs, for example if you want numbered equations.
\begin{equation}
\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)
\end{equation}


$$
\begin{align}
\dot{x} & = \sigma(y-x) \\
\dot{y} & = \rho x - y - xz \\
\dot{z} & = -\beta z + xy
\end{align}
$$

One thing to beware of is out of order cell execution, a common programming pitfall! In general, you should always ensure that your notebook output is correct after a restart and run all before turning anything in.

In [None]:
x = 0
y = 1

In [None]:
y += 1
x = y

In [None]:
x