<a href="https://colab.research.google.com/github/neworldemancer/DSF5/blob/master/Python_refresher.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework: Python for machine learning and Data Analysis
#### This sheet lists the most common Python & NumPy methods used in the course, together with usage examples. It is not exhaustive; please refer to the documentation for details.


Prepared by Mykhailo Vladymyrov,
Science IT Support, University of Bern, 2023

Modified by Aris Marcolongo for the CAS ADS M3, 2024

This work is licensed under <a href="https://creativecommons.org/share-your-work/public-domain/cc0/">CC0</a>.


# 0. Most common data structures: List, tuple, set, dict

In [None]:
# A `tuple` can hold any number of elements of any type and can't be modified
x_coordinates = (0, 1, 2, 3)

#               ^ ---------^ a tuple is written in parentheses

In [None]:
# To see what any object is, it's a good idea to print it:
print(x_coordinates)

In [None]:
# or just:
x_coordinates

In [None]:
# As well as check its type:
type(x_coordinates)

In [None]:
# and available methods and properties
dir(x_coordinates)

In [None]:
# the `__doc__` property often contains useful info
print(x_coordinates.__doc__)

In [None]:
# The function `len`, called on any collection (an array-like object), returns its length

x_coordinates_length = len(x_coordinates)
print('length of the `x_coordinates` is', x_coordinates_length)

In [None]:
print('So-called f-strings (formatted strings) are also handy: they make it easy to format the output:')
print(f'For example:\n\tlength of the `x_coordinates={x_coordinates}` is {x_coordinates_length}')

#     ^------- f before string marks an f-string


In [None]:
# `list` is similar to `tuple`, but can be modified:

y_coordinates = [1, 1, 4]

#               ^ ---------^ list is written in square brackets

In [None]:
# One can loop through elements of a collection:

for x in x_coordinates:
  print (x)

In [None]:
# Or also obtain the index of the element:

for idx, y in enumerate(y_coordinates):
  print (f'y[{idx}] = {y}')

In [None]:
# Several collections can be iterated together by zipping them:

for x, y in zip(x_coordinates, y_coordinates):
  print (x, y)

In [None]:
# Elements of a list can be modified:
print(y_coordinates[0])
y_coordinates[0] = 0
print(y_coordinates[0])
print(y_coordinates)

In [None]:
# `list` can be created from another collection:

x_coordinates = list(x_coordinates)
print(f'now `x_coordinates` is {type(x_coordinates)}')

In [None]:
# Elements can be appended to a list


y_coordinates.append(9)
print(y_coordinates)

y_coordinates.append(16)
print(y_coordinates)

y_coordinates.append(25)
print(y_coordinates)


In [None]:
# Extended with another list:
x_coordinates.extend([4, 5])

In [None]:
# or added

all_numbers = x_coordinates + y_coordinates
print(all_numbers)

In [None]:
# A `set` is a collection of unique elements:
unique_numbers = set(all_numbers)
print(unique_numbers)
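Sets also support the usual set operations. The following is a minimal sketch with two made-up sets (`evens` and `squares` are illustrative names that are not used elsewhere in this sheet):

```python
evens = {0, 2, 4, 6, 8}
squares = {0, 1, 4, 9}

common = evens & squares      # intersection: elements present in both sets
either = evens | squares      # union: elements present in at least one set
only_evens = evens - squares  # difference: elements of `evens` not in `squares`

print(common)
print(either)
print(only_evens)
print(4 in evens)             # membership test, prints True
```

Note that a set has no guaranteed order, so the printed order of elements may differ from the order in which they were written.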

In [None]:
# A `dict` (dictionary) is a collection in which values are assigned to unique keys and accessed by key:

uptime_hours = {'jupyter': 10, 'chrome': 30}

In [None]:
print(uptime_hours['jupyter'])
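Beyond reading a value by key, a dictionary can be extended, updated and iterated. A small sketch, using a separate dictionary `uptime` with made-up entries so that `uptime_hours` above stays unchanged:

```python
uptime = {'jupyter': 10, 'chrome': 30}

uptime['terminal'] = 5   # assigning to a new key adds an entry
uptime['chrome'] += 1    # existing values can be updated in place

# Iterate over key/value pairs
for program, hours in uptime.items():
    print(f'{program}: {hours} h')

# `get` returns a default instead of raising KeyError for a missing key
missing = uptime.get('firefox', 0)
print(missing)
```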

In [None]:
# `list` comprehensions are a quick way to define a list:

y_coordinates = [x**2 for x in x_coordinates]
print(y_coordinates)

In [None]:
# `set`:
values = {v%7 for v in y_coordinates}
print(values)

In [None]:
# `dictionary`

x_at_y = {y:x for x, y in zip(x_coordinates, y_coordinates)}
print(x_at_y)
print(x_at_y[25])

# 1. Functions, classes and modules

### Functions

In [None]:
# A function in python takes an input and returns an output:

def my_addition(input):
    output = input + 2
    return output

print(my_addition(5))

In [None]:
# The number of parameters can be more than one

def my_addition(input, delta):
    output = input + delta
    return output

print(my_addition(5,7))

In [None]:
# Arguments can also have default values:

def my_addition(input, delta=10):
    output = input + delta
    return output

print(my_addition(5))
print(my_addition(5, 12))

In [None]:
# When calling a function, keyword arguments can be added at the end after positional ones.
res = my_addition(5, delta=9)
print(res)
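With several default parameters, keyword arguments make it possible to set only the ones you need, in any order. A minimal sketch (the function `scale_and_shift` is a hypothetical helper written for this illustration):

```python
def scale_and_shift(value, scale=1, shift=0):
    """Hypothetical helper: multiply `value` by `scale`, then add `shift`."""
    return value * scale + shift

only_positional = scale_and_shift(3)                 # both defaults are used
skip_scale = scale_and_shift(3, shift=2)             # `scale` keeps its default
any_order = scale_and_shift(3, shift=2, scale=10)    # keywords may come in any order

print(only_positional, skip_scale, any_order)
```

This calling style is common in libraries such as `matplotlib` and `numpy`, where functions have many optional parameters.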

### Classes and Modules

#### -- Classes: how to define a class

We will not program many classes in the course, but use them often. In the following please focus on understanding and remembering the terminology!

Main properties:

- Classes combine data-structures and functions together

- Classes contain an `__init__` method and additional ones, with the basic syntax:

```
class class_name():
    def __init__(self, ...):
        ...

    def method_name(self, ...):
        ...
```

where the dots `...` should be replaced by parameters or code.

NB: For practical purposes we consider the `self` parameter as purely syntactic. It appears in the definition of the class (as above) but not when using it.

In [None]:
# Example of a class definition:

class MyChatBot():
    def __init__(self, name):
        print(f'Hello {name}. This is the init method.')



To `use a class`, we instantiate/construct `objects` (also called `instances`) of that class, using the following syntax:

`object_name =  class_name(...)`

where the ... contains parameters of the `__init__` method

NB: The init method is called immediately when the object is created, using the parameters provided during construction.

In [None]:
# Basic class structure:

class MyChatBot():
    def __init__(self, name):
        print(f'Hello {name}. This is the init method.')

# Creating an object belonging to the class. Note that the __init__ function is executed:
chatbot = MyChatBot('Mike')

#### -- Classes: how to use methods and variables

To use the class methods, instantiate an object and call `object_name.method_name`. This is referred to as `dot` notation:

In [None]:
# Basic structure with additional method:
class MyChatBot():
    def __init__(self, name):
        print(f'Hello {name}. This is the init method.')
    def answer(self, question): # This is a method of the class!
        print(f'Hello. I am not sure I understood. Did you just say "{question}" ?')

# Using the method with dot notation:
chatbot = MyChatBot('Mike')
chatbot.answer('What is the answer to life?')
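The remark that `self` appears in the definition but not in the call can be made concrete: with dot notation, Python passes the object itself as the first argument. A minimal sketch (the class `Greeter` is hypothetical and used only for this illustration):

```python
class Greeter:
    """Hypothetical class, used only for this illustration."""
    def greet(self, name):
        return f'Hello {name}'

greeter = Greeter()

# Usual dot notation: `self` is filled in automatically with `greeter`
implicit = greeter.greet('Mike')

# Equivalent call through the class, passing the object explicitly as `self`
explicit = Greeter.greet(greeter, 'Mike')

print(implicit)
print(implicit == explicit)  # True
```

In practice only the first form is used; the second is shown to clarify where the value of `self` comes from.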

In addition to `methods`, a class has `attributes`, in which variables are stored.

Variables/attributes are accessed:
- `during development`: inside the class definition, use the syntax `self.variable_name`;

- `using the class via an object`: use the dot notation `object_name.variable_name`, as for methods.



In [None]:
# Here we define a variable inside the class, called "name" and use it in the answer method:

class MyChatBot():
    def __init__(self, name):
        print(f'Hello {name}. This is the init method.')
        self.name = name # This is an attribute of the object!
    def answer(self, question):
        print(f'Hello {self.name}. I am not sure I understood. Did you just say "{question}" ?')

# The method is used as before:
chatbot = MyChatBot('Mike')
chatbot.answer('What is the answer to life?')

# The variable can now be accessed with the dot notation also outside of the class:
print('Accessing variable value outside of the class: ', chatbot.name)
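Attributes set through `self` belong to each object separately, so two objects of the same class keep independent values. A short sketch, assuming a made-up class `ScoreKeeper` that is not part of the course material:

```python
class ScoreKeeper:
    """Hypothetical class: stores a player name and a running score."""
    def __init__(self, player):
        self.player = player  # attribute set from the constructor argument
        self.score = 0        # attribute with a fixed starting value

    def add(self, points):
        self.score += points  # methods can modify attributes via `self`

alice = ScoreKeeper('Alice')
bob = ScoreKeeper('Bob')

alice.add(3)
alice.add(2)
bob.add(1)

print(alice.player, alice.score)
print(bob.player, bob.score)
```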

#### -- Modules: how to import classes and functions defined in external files or libraries

Modules can be considered as `a collection of methods, variables and classes` that can be imported into a Python program.

Often classes are not explicitly defined in your own code. Instead they are imported from modules and objects are constructed from them, without ever looking at the class definition.

We import modules, or classes contained in modules, with the following syntaxes:

`import module_name`

`import module_name as module_alias`

`from module_name import class_name`
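The three forms above can be sketched with modules from the Python standard library. The choice of `math`, `statistics` and `fractions` here is only for illustration:

```python
import math                     # import module_name
import statistics as st         # import module_name as module_alias
from fractions import Fraction  # from module_name import class_name

root = math.sqrt(9)                      # accessed as module_name.function_name
average = st.mean([1, 2, 3])             # accessed through the alias
half = Fraction(1, 3) + Fraction(1, 6)   # the class is used directly by its name

print(root, average, half)
```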

For demonstration we import the class Counter from the collections library and use it to process a list:  

In [None]:
from collections import Counter

fruits = ['apple', 'banana', 'orange', 'apple', 'orange', 'banana', 'apple', 'banana', 'apple', 'banana', 'kiwi']
fruit_counter = Counter(fruits)

# Get the most common fruits using the most_common() method
most_common_fruits = fruit_counter.most_common()
print(most_common_fruits[0])

Functions from modules are also called with the dot syntax (`module_name.function_name`).

For demonstration we import the `math` module here and use its `sqrt` and `factorial` functions:

In [None]:
import math

# Calculate the square root of a number
num = 16
sqrt_value = math.sqrt(num)
print(f"The square root of {num} is {sqrt_value}")

# Calculate the factorial of a number
factorial_value = math.factorial(5)
print(f"The factorial of 5 is {factorial_value}")

# Use the value of pi from the math module
print(f"The value of pi is {math.pi}")

Other commonly used modules are `matplotlib.pyplot` and `numpy`, which we explore in more detail below.

# 2. Using `matplotlib.pyplot` module

In [None]:
import matplotlib.pyplot as plt

In [None]:
x_coordinates = list(range(6))
y_coordinates = [x**2 for x in x_coordinates]

In [None]:
# Simple plot

plt.plot(x_coordinates, y_coordinates)

In [None]:
# Scatter plot
plt.scatter(x_coordinates, y_coordinates, marker='x', color='b', s=10)

In [None]:
# Scatter plot with axes range
plt.scatter(x_coordinates, y_coordinates, marker='x', color='b', s=10)
plt.xlim(0, 10)
plt.ylim(0, 10)

In [None]:
# Plot with isotropic axes

plt.scatter(x_coordinates, y_coordinates, marker='x', color='b', s=10)

current_axis = plt.gca()
current_axis.set_aspect('equal')

In [None]:
# Big plot, with details

plt.figure(figsize=(8,8))  # plot size

plt.plot(x_coordinates, y_coordinates)

plt.xlabel('x coordinate')
plt.ylabel('y coordinate')
plt.title('y=x^2')


In [None]:
# Multiple plots can be combined with subplots:

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))

ax[0][0].scatter(x_coordinates, y_coordinates, marker='x', color='b', s=30)
ax[0][1].scatter(x_coordinates, y_coordinates, marker='o', c=y_coordinates, s=30)

ax[1][0].plot(x_coordinates, y_coordinates)
ax[1][1].scatter(x_coordinates, y_coordinates, marker='^', c=y_coordinates, s=30, cmap=plt.cm.Accent)


In [None]:
# To make a plot in 3D

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
z_coordinates = y_coordinates

ax.scatter3D(x_coordinates, y_coordinates, z_coordinates, marker='x', s=20)


# 3. Using  `numpy` module

In [None]:
import numpy as np

`numpy` arrays are mutable containers, similar to `list`s, but with much richer functionality.

In [None]:
# Array-like objects can be converted to `numpy` array:
arr = np.array(x_coordinates)
print(arr)

In [None]:
# Attribute `shape` shows the size of an array.
print(arr.shape)

In [None]:
# Shape is a `tuple` because an array can have more than one dimension:

arr2d = np.asarray([[1,2,3], [4, 5, 6]])
print(arr2d)
print(arr2d.shape)  # the last element of the shape is the innermost dimension of the array

In [None]:
# A multidimensional array can be reshaped to a different shape, preserving the total number of elements and their order

arr2d_r1 = arr2d.reshape((3, 2))
print(arr2d_r1)

In [None]:
# or turned into 1-d with flatten method:

arr2d_1d = arr2d.flatten()
print(arr2d_1d, arr2d_1d.shape)
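Two details are worth knowing here: one dimension passed to `reshape` may be given as `-1`, in which case numpy infers it from the total number of elements, and `flatten` returns a copy, so changing the result leaves the original array untouched. A minimal sketch (the name `grid` is illustrative):

```python
import numpy as np

grid = np.asarray([[1, 2, 3], [4, 5, 6]])

# -1 lets numpy infer the size: 6 elements with 1 column give 6 rows
column = grid.reshape((-1, 1))
print(column.shape)

# `flatten` returns a copy: modifying it does not change `grid`
flat = grid.flatten()
flat[0] = 100
print(flat)
print(grid[0, 0])  # still 1
```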

In [None]:
# Slicing of an array:
# start:stop - elements from start (included) till stop (excluded)

print(arr[1:3])

In [None]:
# Slicing of an array:
# start:stop:step - elements from start (included) till stop (excluded) with stride step

print(arr[1:6:2])

In [None]:
# Similarly for a multidimensional array:
#
print(arr2d[0:1, 1:3])

In [None]:
# `:` means take all elements along that axis
#
print(arr2d[:, 1:3])

In [None]:
# To take all elements over several consecutive axes, use an ellipsis (...):
#
print(arr2d[..., 1:3])
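For a 2-d array the ellipsis replaces a single `:`, so its benefit only shows with more dimensions. A short sketch on a made-up 3-d array `arr3d`:

```python
import numpy as np

arr3d = np.arange(24).reshape((2, 3, 4))

# `...` stands for as many `:` as needed to cover the remaining axes
last_axis_first = arr3d[..., 0]   # same as arr3d[:, :, 0]
explicit = arr3d[:, :, 0]
first_block = arr3d[0, ...]       # same as arr3d[0, :, :]

print(last_axis_first.shape)
print(np.array_equal(last_axis_first, explicit))  # True
print(first_block.shape)
```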

In [None]:
# To generate sequential integers (similar to `range`):
numbers = np.arange(0, 20, 2)
print(numbers)

In [None]:
# For evenly spaced floating point values, use `linspace`:

x_coord = np.linspace(start=-1, stop=1, num=5)
print (x_coord)
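The difference between the two generators is what you specify: `arange` takes a step and excludes the stop value, while `linspace` takes the number of points and includes the stop value by default. A minimal comparison (variable names are illustrative):

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)    # step 0.25, stop excluded: 0, 0.25, 0.5, 0.75
by_count = np.linspace(0, 1, 5)    # 5 points, stop included: 0, 0.25, 0.5, 0.75, 1

print(by_step, len(by_step))
print(by_count, len(by_count))
```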

In [None]:
# Arithmetic on numpy arrays is applied elementwise, using the ordinary Python operators:

y_coord = 2*x_coord**2 + 3

plt.plot(x_coord, y_coord)
plt.gca().set_aspect('equal')

In [None]:
# To generate uniformly distributed random numbers:
rnd = np.random.uniform(0, 10, size=10)
print(rnd)
print(rnd.shape)

In [None]:
# or normally distributed, 3D array:
rnd = np.random.normal(loc=0.5, scale=2, size=(3, 4, 5))
print(rnd)
print(rnd.shape)

In [None]:
# Let's check distribution:
plt.hist(rnd.flatten(), 10);

In [None]:
# Array statistics can be obtained using the array methods:
print('array `rnd` mean = ', rnd.mean())
print('array `rnd` standard deviation = ', rnd.std())
print('array `rnd` minimum = ', rnd.min())
print('array `rnd` maximum = ', rnd.max())
print()

# or with numpy functions:
print('array `rnd` mean = ', np.mean(rnd))
print('array `rnd` standard deviation = ', np.std(rnd))
print('array `rnd` minimum = ', np.min(rnd))
print('array `rnd` maximum = ', np.max(rnd))

print()
print('array `rnd` 40th percentile = ', np.percentile(rnd, 40))
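The calls above reduce the whole array to a single number. The same methods accept an `axis` argument to compute the statistic along one dimension only. A small sketch on a deterministic array (`data` is a made-up example, so the results can be checked by hand):

```python
import numpy as np

data = np.arange(12).reshape((3, 4))
print(data)

col_means = data.mean(axis=0)  # average over rows: one value per column, shape (4,)
row_means = data.mean(axis=1)  # average over columns: one value per row, shape (3,)

print(col_means)
print(row_means)
```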

In [None]:
# To get a random element from a 1D array:
print(np.random.choice(arr))

In [None]:
# or several elements:
mtx = np.random.choice(arr, size=(2,2))
print(mtx)

In [None]:
# Matrix multiplication
np.dot([1, 2], mtx)  # can be also written as [1, 2]@ mtx
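Matrix multiplication should not be confused with the elementwise product written with `*`. A minimal sketch with two fixed 2x2 matrices (`a` and `b` are made up so the result can be verified by hand):

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

matrix_product = a @ b   # rows of `a` times columns of `b`
elementwise = a * b      # each element multiplied by its counterpart

print(matrix_product)
print(elementwise)
```

Here `b` swaps the two columns of `a` under matrix multiplication, whereas the elementwise product simply zeroes the diagonal.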

In [None]:
# Sometimes a grid of values is needed, given a set of values along each axis,
# e.g. for a grid search or visualization. `meshgrid` does exactly that:


# define set of values along x, y axes
x_coords = np.linspace(0, 4.5, 10)      # 10 values between 0 and 4.5
y_coords = np.linspace(-10, -5.5, 10)   # 10 values between -10 and -5.5

# create the meshgrid
meshgrid_x, meshgrid_y = np.meshgrid(x_coords, y_coords)
print(meshgrid_x.shape, meshgrid_y.shape)  # all x, y coordinates of the mesh

In [None]:
print('x_coords:', x_coords)
print('y_coords:', y_coords)

In [None]:
print(meshgrid_x)

In [None]:
print(meshgrid_y)
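The structure of the two output arrays is easier to see on a very small grid. In the sketch below (all names are illustrative) the x array repeats the x values in every row and the y array repeats each y value along its row, so that pairing the two arrays elementwise gives every grid point once:

```python
import numpy as np

xs = np.array([0, 1, 2])
ys = np.array([10, 20])

gx, gy = np.meshgrid(xs, ys)  # both have shape (len(ys), len(xs)) = (2, 3)
print(gx)
print(gy)

# Pair the coordinates: one (x, y) row per grid point, shape (6, 2)
points = np.stack((gx.flatten(), gy.flatten()), axis=-1)
print(points)
```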

In [None]:
# To plot the points created by the meshgrid we need to flatten them:

x_coordinates = meshgrid_x.flatten()
y_coordinates = meshgrid_y.flatten()

plt.scatter(x_coordinates, y_coordinates)

In [None]:
# We can use the meshgrid to visualize a 2-dimensional function

def function_2d(x, y):
    return np.sin(x) + np.cos(y)
z = function_2d(meshgrid_x, meshgrid_y)

# Contourplot visualization

plt.figure(figsize=(5,5))
plt.contourf(meshgrid_x, meshgrid_y, z, cmap='viridis')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Contourplots of f(x,y)=sin(x)+cos(y)')

# 2-d surface

plt.figure(figsize=(5, 5))
ax = plt.axes(projection='3d')
surface = ax.plot_surface(meshgrid_x, meshgrid_y, z, cmap='viridis')
plt.title('Visualization of f(x,y)=sin(x)+cos(y) as a 2d surface')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

In [None]:
# Arrays can be concatenated along a specific axis, provided the other dimensions are the same

arr_1 = np.zeros(shape=(5, 3, 6))
arr_2 = np.ones(shape=(5, 4, 6))

print(arr_1.shape, arr_2.shape)
#print(arr_1)
#print(arr_2)

In [None]:
arr_conc = np.concatenate((arr_1, arr_2), axis=1)

print(arr_conc.shape)

In [None]:
# Stack along a new last axis. All shapes must match, so the last slice of `arr_2` along axis 1 is dropped
arr_stack = np.stack((arr_1, arr_2[:, :-1]), axis=-1)

print(arr_stack.shape)

In [None]:
# Similarly to indexing one element, multiple elements of an array can be obtained by indexing with a list of indices
y = np.arange(6)**2
print(y)

print(f'y[2] = {y[2]}')
print(f'y[4] = {y[4]}')
print(f'y[[2,4]] = {y[[2,4]]}')

In [None]:
# This trick can be useful to shuffle several arrays of elements coherently. 
# We first define a permutation and use it to reindex the original array:

shuffled_indexes = np.random.permutation(len(y))
y_shuffled = y[shuffled_indexes]

print(f'y: {y}')
print(f'shuffled_indexes: {shuffled_indexes}')
print(f'y_shuffled: {y_shuffled}')
print()
print(f'y[shuffled_indexes] = y_shuffled : {y}[{shuffled_indexes}] = {y_shuffled}')
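The coherent shuffling of several arrays mentioned above can be sketched as follows: the same permutation is applied to each array, so elements that belonged together before the shuffle still share an index afterwards. The arrays `features` and `labels` are made up for this illustration:

```python
import numpy as np

features = np.arange(5)
labels = features ** 2   # labels[i] belongs to features[i]

# One permutation, applied to both arrays
perm = np.random.permutation(len(features))
features_shuffled = features[perm]
labels_shuffled = labels[perm]

print(features_shuffled)
print(labels_shuffled)

# The pairing is preserved: each label is still the square of its feature
print(np.array_equal(labels_shuffled, features_shuffled ** 2))  # True
```

Shuffling each array with its own permutation would break the pairing between samples and labels.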

In [None]:
# Boolean arrays (or lists) of the same shape can be used as masks
mask = [True, False, False, False, False, True]
y_mask = y[mask]

print(f'y[mask] = y_mask: {y}[{mask}] = {y[mask]}')

In [None]:
# This is useful to select a group of elements:
mask_above_2 = y > 2
print(mask_above_2)

mask_less_17 = y <17
print(mask_less_17)

mask = mask_above_2 & mask_less_17  # elementwise `and` operation on Boolean numpy arrays
print(mask)

print('values between 2 and 17:', y[mask])
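Boolean masks can be combined with the elementwise operators `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. A minimal sketch, using a separate array `vals` with the same values as `y`:

```python
import numpy as np

vals = np.arange(6) ** 2           # 0, 1, 4, 9, 16, 25

inside = (vals > 2) & (vals < 17)  # both conditions hold
outside = ~inside                  # negation of the mask
extreme = (vals < 1) | (vals > 20) # at least one condition holds

print(vals[inside])
print(vals[outside])
print(vals[extreme])
```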

In [None]:
# To get the index of the first smallest or largest element, use `argmin` and `argmax`:

print(y_shuffled)

idx_smallest = y_shuffled.argmin()
idx_largest = y_shuffled.argmax()

print(f'index of smallest element: {idx_smallest}. {y_shuffled}[{idx_smallest}] = {y_shuffled[idx_smallest]}')
print(f'index of largest element: {idx_largest}. {y_shuffled}[{idx_largest}] = {y_shuffled[idx_largest]}')

In [None]:
# For a multidimensional array, or if more than one element has to be found, use `argwhere`:
coords_elements_above_3 = np.argwhere(rnd > 3)
print(coords_elements_above_3)

for i, j, k in coords_elements_above_3:
  print(f'rnd[{i}, {j}, {k}] = {rnd[i, j, k]}')

In [None]:
# Transposing the index groups yields one index array per axis
arr_i, arr_j, arr_k = coords_elements_above_3.T
print(arr_i)
print(arr_j)
print(arr_k)
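The per-axis index arrays can be used together to pick out the matching elements in one step. A small sketch on a made-up 2-d array `data`, which also shows that the result agrees with direct Boolean masking:

```python
import numpy as np

data = np.array([[1, 5, 2],
                 [7, 0, 9]])

coords = np.argwhere(data > 4)   # one (row, column) pair per matching element
rows, cols = coords.T            # one index array per axis

selected = data[rows, cols]      # index with both arrays at once
print(coords)
print(selected)

# Boolean masking returns the same values, but without their positions
print(np.array_equal(selected, data[data > 4]))  # True
```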

# 4. Images

Colored images are often stored as 3-d numpy arrays:

In [None]:
# Load an image from a file or a URL
from skimage import io

image = io.imread('https://github.com/neworldemancer/DSF5/raw/master/figures/unibe.jpg')

print(f'The image is stored as a numpy array {type(image)}')

# Note that the resulting image is of dimension N_pixel_1 x N_pixel_2 x 3, because it is a colored image with 3 color channels (red-green-blue, also called RGB color coding) 

print('Dimensions of a colored image: ', image.shape)

In [None]:
# Display an image with pyplot
plt.imshow(image)
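Because a colored image is an ordinary 3-d array, single channels and regions can be selected with the slicing rules from section 3. A minimal sketch on a synthetic array (`tiny_image` is a made-up 4 x 6 pixel image, used so that the example runs without downloading anything):

```python
import numpy as np

# Hypothetical image: 4 rows, 6 columns, 3 color channels, all black
tiny_image = np.zeros((4, 6, 3), dtype=np.uint8)

tiny_image[..., 0] = 255          # set the red channel to its maximum everywhere

red_channel = tiny_image[..., 0]  # one channel is a 2-d array of shape (4, 6)
crop = tiny_image[:2, :3]         # top-left region, all channels: shape (2, 3, 3)

print(tiny_image.shape, red_channel.shape, crop.shape)
```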

Grayscale images are 2-d arrays:

In [None]:

from skimage import color

# Convert the image to grayscale
gray_image = color.rgb2gray(image)

print('Dimensions of a grayscale image: ', gray_image.shape)

# Display the grayscale image
plt.imshow(gray_image, cmap='gray')



Visualization of a 2-d histogram as an image:

In [None]:
# Any 2D map can be visualized similarly, e.g. we can create a 2-d histogram of a dataset:

# generate a random dataset of shape (1000 x 2)
values = np.random.multivariate_normal([0, 0], [[1, 0.3],[ 0.3, 0.2]], size=1000)

# plot the dataset 
plt.scatter(values[:,0], values[:,1], s=5)
plt.title('Dataset scatterplot')

plt.figure()

# make a 2-d histogram of the dataset
h, bx, by = np.histogram2d(values[:,0], values[:,1], bins = 20)

print(h.shape)

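plt.title('Histogram visualization')
# `histogram2d` puts x along the first axis of `h`; transpose so that x runs horizontally, as in the scatterplot
plt.imshow(h.T, origin='lower')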

