# Pete's Notes on Python

## By Pete Kyle 



### Why Python

Python is popular because:

1. The language is flexible.
2. The code is easy to read.
3. It is actively supported with thousands of packages to implement various tasks easily.

If you are new to Python, I recommend the Python tutorial https://docs.python.org/3/tutorial/index.html . 

You will find yourself wanting to read documentation, which can be found on the main Python webpage https://www.python.org/ .

You might also want to learn the language using online lecture notes, such as https://lectures.scientific-python.org/# .



### Getting Started with Python

I recommend installing Python using Anaconda as an interface to the conda package manager. This allows you to create custom Python environments with specific collections of packages. To install an environment for BUFN400, open a terminal from Anaconda, then execute

`conda create -c defaults --name environmentname numpy scipy matplotlib pandas numba notebook conda numexpr jupyter nbconvert nodejs`

then

`conda activate environmentname`
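
After activating the environment, one quick way to confirm that the main packages are importable is the sketch below. It uses only the standard library. The helper name `is_installed` and the list `packages` are my own choices for illustration, and only a subset of the packages from the `conda create` command is checked.

```python
import importlib.util

# Top-level import names for some of the packages in the conda command above
packages = ["numpy", "scipy", "matplotlib", "pandas", "numba"]

def is_installed(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None

status = {name: is_installed(name) for name in packages}
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

If a package is reported as missing, the environment is probably not the active one, or the package was not included when the environment was created.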

Python code can be created using Python files (extension .py) with IDEs like Spyder or PyCharm. 

This course will use notebook files (extension .ipynb) created with Jupyter Notebook or JupyterLab; Google Colab can also be used. Notebook files conveniently allow mixing cells with code and cells with Markdown text.




### Important Packages

For data science and finance, this bootcamp will introduce the most important basic Python packages, which are:

* **Numpy**: Has fast algorithms for numerical calculations of data in arrays. 

* **Pandas**: Has reasonably fast, flexible algorithms for manipulating data for finance and other research.

* **Scipy**: Augments Numpy with specialized functions for mathematical calculations.

* **Matplotlib**: Flexible, extensive plotting functionality.

For data science and finance, some other important packages are **scikit-learn** for machine learning, **statsmodels** for traditional statistics, **seaborn** for data visualization, **pytorch** for neural nets on GPUs, and **numba** for speeding up Python loops.


### Documentation and Help

For absolute beginners, I recommend 

* The beginner's guide for numpy, https://numpy.org/doc/stable/user/absolute_beginners.html

* The "10 minutes to pandas" guide, https://pandas.pydata.org/docs/user_guide/10min.html

Scipy and matplotlib do not seem to have documentation for absolute beginners.

The websites numpy.org, pandas.pydata.org, scipy.org, and matplotlib.org have good documentation. I recommend the user guides:

* Numpy: https://numpy.org/doc/stable/user/index.html

* Pandas: https://pandas.pydata.org/docs/user_guide/index.html

* Scipy: https://docs.scipy.org/doc/scipy/tutorial/index.html#user-guide

* Matplotlib: https://matplotlib.org/stable/tutorials/index.html



### Which container to use

When working with data, you need to put the data into the type of container that is easiest to work with for the task at hand.

Python has several built-in container classes, including **list**, **dictionary**, and **tuple**. Numpy has the **ndarray**. Pandas has the **DataFrame**. Different containers are appropriate for different purposes.



#### Python List

A list is an ordered sequence of objects, which may be of different types. It is fast to **append** to a list.

Use a Python list to hold a sequence of objects when:

* You may want to iterate over the list, but speed of iteration is not an issue.
* The objects in the list may be of different types.
* You do not know how many objects will be in the list. 
* You want to keep things simple and obvious.

It is typical to start with an empty list, then add elements one by one.

Sample Exercise: Given a list of numbers which may be of type **int** or **float**, create a list of the squares of those numbers, preserving their type:


In [1]:
ns = [1, 3, 5, 9.9]

n2s = []
for n in ns:
    n2s.append(n**2)
    
print(f"{ns=}, {n2s=}\n")
for i in range(len(ns)):
    print(f"{i}, {ns[i]}, {n2s[i]}, {type(ns[i])}, {type(n2s[i])}")


ns=[1, 3, 5, 9.9], n2s=[1, 9, 25, 98.01]

0, 1, 1, <class 'int'>, <class 'int'>
1, 3, 9, <class 'int'>, <class 'int'>
2, 5, 25, <class 'int'>, <class 'int'>
3, 9.9, 98.01, <class 'float'>, <class 'float'>


##### List Comprehension

A useful shortcut for creating lists is the **list comprehension**.

Sample Exercise: Create the same list of squares as above using a list comprehension:

In [2]:
n2sc = [n**2 for n in ns]  # list comprehension syntax

print(n2sc)

[1, 9, 25, 98.01]
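
A list comprehension can also filter elements with an `if` clause. The sketch below is my own small extension of the exercise above: it squares only the `int` elements and checks the result against the equivalent explicit loop. The names `int_squares` and `int_squares_loop` are made up for illustration.

```python
ns = [1, 3, 5, 9.9]  # same numbers as in the exercise above

# Keep only the int elements, then square them
int_squares = [n**2 for n in ns if isinstance(n, int)]

# Equivalent explicit loop
int_squares_loop = []
for n in ns:
    if isinstance(n, int):
        int_squares_loop.append(n**2)

print(int_squares)                       # [1, 9, 25]
print(int_squares == int_squares_loop)   # True
```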


#### Python Dictionary

A Python dictionary is a hash-table of name-object pairs. The names are typically strings but can be any hashable object. The objects can be of different types.  Looking up objects by name is fast. Adding to the dictionary is fast.

Use Python dictionaries when:

1. You want to manipulate or iterate over the names of objects, for example printing the names as strings.
2. Fast look-up is important.
3. You want to keep similar data together (like in a namespace) to increase transparency and avoid name clashes.

We can put the lists `ns`, `n2s`, and their types into a dictionary:


In [3]:
d = {'nums' : ns, 'nums_squared' : n2s}
d['nums_type'] = [type(n) for n in ns]
d['nums_squared_type'] = [type(n) for n in n2s]

for k in d:
    print(f"{k}: {d[k]}")


nums: [1, 3, 5, 9.9]
nums_squared: [1, 9, 25, 98.01]
nums_type: [<class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>]
nums_squared_type: [<class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>]
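
The points about hashable names and fast look-up can be illustrated with a short sketch. The tickers and prices below are invented for illustration, and the variable names are my own. Tuples are used as keys to show that a name need not be a string.

```python
# Keys may be any hashable object; here tuples of (ticker, year) are used.
# The tickers and numbers are made up for illustration.
prices = {("AAA", 2023): 101.5, ("BBB", 2023): 55.0}

prices[("AAA", 2024)] = 110.25           # adding a new key

key = ("BBB", 2023)
found = key in prices                    # membership test uses the hash table
value = prices.get(("CCC", 2023), None)  # .get avoids a KeyError for missing keys

for k, v in prices.items():              # iterate over name-object pairs
    print(f"{k}: {v}")

try:
    prices[["AAA", 2025]] = 1.0          # a list is not hashable
except TypeError as err:
    print(f"TypeError: {err}")
```

The last step fails because a list cannot be hashed, so it cannot serve as a dictionary key.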



#### Python Tuple

A Python **tuple** is an immutable sequence of objects, which may be of different types. Immutability means that elements in the tuple cannot be replaced by different elements, nor can elements be added or deleted, without creating a new tuple. Examples of immutable types are float, int, and tuple. Examples of mutable types are list and dictionary.

Use tuples when it is important to keep elements in a precise, unchanging order and not to replace their elements with new ones. 

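Here are examples illustrating mutable and immutable types. This example also illustrates Python's `try ... except` syntax for catching exceptions. Note that for the dictionary, `x[1] = 200` does not replace the second entry; it adds a new key `1` with value 200.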


In [4]:
l = [1,2,3] #list (mutable)
d = {'a' : 1, 'b' : 2, 'c' : 3} # dictionary (mutable)
t = (1,2,3)  # tuple (immutable)

for x in [l, d, t]:
    try:
        x[1] = 200
    except TypeError:  # raised when the object does not support item assignment
        print(f"{type(x)} is immutable", end=" ")
    print(x)    

[1, 200, 3]
{'a': 1, 'b': 2, 'c': 3, 1: 200}
<class 'tuple'> is immutable (1, 2, 3)


If the elements of a tuple are themselves mutable, changes which do not create new elements are allowed:


In [5]:
t = (1, [2], (3,)) 
t[1][0] = 999
print(f"{t=}")


t=(1, [999], (3,))


The notation `(5,)` is used to denote a tuple containing the single element 5 because the notation `(5)` means the integer 5.

In [6]:
t = (5,)
print(f"{t=}, {type(t)}")
n = (5)
print(f"{n=}, {type(n)}")
      

t=(5,), <class 'tuple'>
n=5, <class 'int'>
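
Since a tuple cannot be changed in place, "modifying" one really means building a new tuple. The sketch below shows one way to do this with slicing and concatenation; the names `t_replaced` and `t_extended` are my own. It also shows a common consequence of immutability: a tuple whose elements are all hashable can be used as a dictionary key.

```python
t = (1, 2, 3)

# "Changing" a tuple really means building a new one from pieces of the old
t_replaced = t[:1] + (200,) + t[2:]   # replace the element at index 1
t_extended = t + (4,)                 # append an element

print(f"{t=}, {t_replaced=}, {t_extended=}")

# A tuple is hashable as long as every element is itself hashable,
# so it can be used as a dictionary key
d = {t: "original"}
print(d[(1, 2, 3)])
```

The original tuple `t` is unchanged after both operations.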


### Numpy ndarray

Numpy arrays are optimized for fast math calculations by requiring elements to have the same type and be stored contiguously in memory. This also makes numpy arrays use less memory than lists or other Python containers.

Use numpy arrays when you want to do fast, efficient numerical calculations, especially element-by-element evaluation of functions and matrix operations like inner product and matrix product.


Suppose `x` is a numpy array. 

The homogeneous element type is `x.dtype`.

The dimensions of the array are `x.shape`, which is a tuple with `x.ndim` elements. The number of elements is `x.size`.
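
The attributes above, and the memory claim, can be checked with a rough sketch. The memory comparison is only indicative: the list figure depends on the Python implementation, and `nbytes` counts only the array's data buffer, not the small fixed overhead of the array object. The variable names are my own.

```python
import sys
import numpy as np

x = np.zeros((2, 3), dtype=np.float64)
print(f"{x.dtype=}, {x.shape=}, {x.ndim=}, {x.size=}")

# Rough memory comparison for 1000 floats
n = 1000
values = [float(i) for i in range(n)]
arr = np.array(values)

# A list stores references to separate float objects, so count both
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes  # data buffer only: n elements * 8 bytes each

print(f"{list_bytes=}, {array_bytes=}")
```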

#### Creation of numpy arrays

Numpy arrays can be created in many ways. Arrays keep track of their own size, shape, and `dtype` of elements.


1. Create a $2 \times 3$ numpy array (matrix) from a list of lists:


In [7]:
import numpy as np

a = np.array([[1, 3, 5], [7, 9, 11]], dtype=np.float32)

print(f"{a=}\n{a.size=}, {a.shape=}, {a.dtype=}")

2. Create an array of 11 equally spaced points between 0.00 and 1.00, including endpoints:

In [None]:
b = np.linspace(start=0.00, stop=1.00, num=11, endpoint=True, dtype=np.float64)

print(f"{b=}\n{b.size=}, {b.shape=}, {b.dtype=}")

3. Create a $2 \times 5$ array of random numbers drawn from a normal distribution with mean 10 and standard deviation 2:


In [None]:
rng = np.random.default_rng(seed=12345)
x = rng.normal(loc=10.0, scale=2.0, size=(2,5))
print(f"{x=}\n{x.shape=}, {x.ndim=}, {x.dtype=}")

In finance, it is common to illustrate concepts and test models using simulated data. 

Therefore random number generators are used frequently.
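
When working with simulated data, it usually helps to make the simulation reproducible by fixing the seed. The sketch below assumes hypothetical parameters (daily returns with mean 0.05% and standard deviation 1% over 250 days) chosen only for illustration. It shows that two generators created with the same seed produce identical draws.

```python
import numpy as np

# Hypothetical parameters: daily returns with mean 0.05% and std. dev. 1%
mu, sigma, n_days = 0.0005, 0.01, 250

rng1 = np.random.default_rng(seed=12345)
r1 = rng1.normal(loc=mu, scale=sigma, size=n_days)

rng2 = np.random.default_rng(seed=12345)   # same seed, same draws
r2 = rng2.normal(loc=mu, scale=sigma, size=n_days)

print(f"{np.array_equal(r1, r2)=}")
print(f"sample mean = {r1.mean():.5f}, sample std = {r1.std(ddof=1):.5f}")
```

The sample mean and standard deviation will be close to, but not exactly equal to, `mu` and `sigma`.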


#### Operations on Numpy arrays

Numpy arrays are optimized for two kinds of calculations: element-by-element function evaluation and common matrix algebra calculations.


1. Element-by-element function evaluations: Numpy has a concept called **universal functions** or **ufuncs**. Many common operations like addition, subtraction, multiplication, division, powers, exp, log, sin, cos, etc. are implemented as ufuncs and are automatically applied to arrays element by element.

In [None]:
x = np.array(range(5), dtype=np.float64)
res1 = x**2
res2 = np.log(x+1) + 5 * x - np.exp(x / 2.00)

print(f"{x=}\n{res1=}\n{res2=}")
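
To make "element by element" concrete, the sketch below evaluates the same formula twice: once with numpy functions applied to the whole array, and once with an explicit loop using the standard `math` module. The names `res_ufunc` and `res_loop` are my own. The two results should agree up to floating-point rounding.

```python
import math
import numpy as np

x = np.array(range(5), dtype=np.float64)

# Numpy version: each operation is applied to every element of the array
res_ufunc = np.log(x + 1) + 5 * x - np.exp(x / 2.0)

# Same formula written as an explicit element-by-element loop
res_loop = np.array([math.log(v + 1) + 5 * v - math.exp(v / 2.0) for v in x])

print(np.allclose(res_ufunc, res_loop))   # True
print(isinstance(np.exp, np.ufunc))       # True: np.exp is a ufunc object
```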


2. Matrix algebra calculations:  Numpy is optimized for common matrix algebra calculations like inner product, matrix-vector product, matrix-matrix product, and matrix decompositions. To perform matrix multiplication, you may use the "at" symbol `@`, `np.dot`, or `np.matmul`.

Exercise: Multiply a $2 \times 3$ matrix by a vector of length 3:


In [None]:
x = np.arange(6).reshape((2,3))
y = np.array([10, 30, 50])
res = x @ y
print(f"{x=}\n{y=}\n{res=}")
print(f"{x.shape=}, {y.shape=}, {res.shape=}")

In [None]:
v = np.array([1,2,3])
w = np.array([10,20,30]).reshape((3,1))
vw = v @ w
vw2 = np.dot(v, w)
print(f"{v=}\n{w=}\n{vw=}\n{vw2=}")
print(f"{v.shape=}, {w.shape=}, {vw.shape=}, {vw2.shape=}")
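
As a small check on the three ways of multiplying mentioned above, the sketch below multiplies a $2 \times 3$ matrix by a $3 \times 2$ matrix and confirms that `@`, `np.matmul`, and `np.dot` agree for 2-dimensional arrays. For arrays with more than two dimensions, `np.dot` and `np.matmul` follow different rules, which this sketch does not cover. The names `A`, `B`, `p`, and `q` are my own.

```python
import numpy as np

A = np.arange(6).reshape((2, 3))   # 2 x 3
B = np.arange(6).reshape((3, 2))   # 3 x 2

AB_at = A @ B
AB_matmul = np.matmul(A, B)
AB_dot = np.dot(A, B)

print(f"{AB_at=}\n{AB_at.shape=}")
print(np.array_equal(AB_at, AB_matmul), np.array_equal(AB_at, AB_dot))

# The inner product of two 1-dimensional arrays is a scalar
p = np.array([1, 2, 3])
q = np.array([10, 20, 30])
inner = p @ q
print(f"{inner=}")
```

Compare this with the cell above, where `w` has shape `(3, 1)` and the product therefore has shape `(1,)` rather than being a scalar.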

### Python loops are slow!

It is much faster to use pre-optimized numpy functions than your own hand-coded Python loops to do the same thing.

When looping over many elements, best practice is to avoid Python loops wherever possible.

Example: Use the `%timeit` magic, which is built on Python's timeit module, to illustrate the difference in speed for calculating inner products using numpy and hand-coded Python loops over lists or numpy arrays:

In [None]:
import timeit

n = 10**5

va = [ m / n for m in range(n) ]
vb = [ m**2 / n for m in range(n) ]

xa = np.array(va)
xb = np.array(vb)

print(f"{type(va[0])=}, {xa.dtype=}")

def f(a, b):
    res = 0.00
    for i in range(len(a)):
        res += a[i] * b[i]
    return res

print(f"{f(va, vb) = }, {f(xa, xb) = },{np.dot(xa, xb) = }")

%timeit -r 3 -n 5 f(va, vb)
%timeit -r 3 -n 5 f(xa, xb)
%timeit -r 3 -n 15 np.dot(xa, xb)
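
The `%timeit` lines are IPython magics, so they only work in a notebook or IPython session. In a plain .py file, a similar comparison can be sketched with the standard `timeit` module directly. The function name `loop_dot` and the smaller array size are my own choices to keep the sketch quick. Actual timings depend on the machine, so none are shown.

```python
import timeit
import numpy as np

n = 10**4   # smaller than in the cell above so that the sketch runs quickly
xa = np.arange(n) / n
xb = np.arange(n) ** 2 / n

def loop_dot(a, b):
    """Inner product with an explicit Python loop."""
    res = 0.0
    for i in range(len(a)):
        res += a[i] * b[i]
    return res

# timeit.repeat returns a list of total times in seconds, one per repetition;
# each repetition calls the function `number` times
t_loop = min(timeit.repeat(lambda: loop_dot(xa, xb), repeat=3, number=5)) / 5
t_numpy = min(timeit.repeat(lambda: np.dot(xa, xb), repeat=3, number=5)) / 5

print(f"loop: {t_loop:.2e} s per call, np.dot: {t_numpy:.2e} s per call")
```

Taking the minimum over repetitions is a common convention, since the slower runs mostly reflect other activity on the machine.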