# Prerequisites

**Question**: What do you need to know before starting this course?

```{admonition} Objectives
- Understand basic mathematical concepts required for this course
- Write computer programs in Python
- Compute with NumPy arrays
- Analyse tabular data using `pandas`
- Plot data using `matplotlib`
```

**Expected time to complete**: 3 hours

Intro paragraph to complete.

## Basic linear algebra


## Basic Python programming

**Running**

Following the .gif below, click the rocket symbol (<i class="fas fa-rocket"></i>) to launch this page as an interactive notebook in Google Colab (faster but requiring a Google account) or Binder.


![Alt Text](https://source-separation.github.io/tutorial/_images/run_cloud.gif)
<!-- 
https://youtu.be/seKOq-VMJgY?t=1082 -->

Click the `Run Cell` button or press the keyboard shortcut `Ctrl`+`Enter` to execute the code in a cell.


### Variables and expressions

#### Creating variables

- In Python, variables are names that store values.
- The `=` symbol assigns the value on the right to the name on the left.
- A variable is created when a value is first assigned to it.
- Variable names ([click to learn more about Python naming conventions](https://namingconvention.org/python/)):
  - can only contain letters, digits, and the underscore `_` (typically used to separate words in long variable names)
  - cannot start with a digit
  - are case sensitive (`age`, `Age`, and `AGE` are three different variables)

Try the following code in the cell below to assign `1` to a variable `x` and `"Hello world"` to a variable `string_variable`:

In [None]:
x = 1
string_variable = "Hello world"

```{Note}
Variables must be created before they are used. If a variable doesn't exist yet, or if the name has been mis-spelled, Python reports an error. 
```
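
The note above can be tried out directly. The following is a minimal sketch in which `total_count` is a hypothetical name chosen for illustration; reading it before it has been assigned raises a `NameError`, which is caught here so that the cell keeps running:

```python
# `total_count` is a hypothetical name that has not been assigned yet,
# so reading it raises a NameError.
try:
    print(total_count)
except NameError as error:
    error_message = str(error)
    print("Python reported:", error_message)

# Once a value has been assigned, the name can be used.
total_count = 10
print(total_count)
```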

```{warning}
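Some words, called keywords, are reserved for special use in Python (for example, `if`, `for`, and `None`). Using a keyword as a variable name is a syntax error. Names of built-in functions such as `print` are not reserved, but assigning a value to one of them hides the built-in function for the rest of the session, so avoid using them as variable names.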
```

Run the following code to see the full list of reserved words in Python:

In [None]:
import keyword

keyword.kwlist
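
To check a single name rather than scan the whole list, the same module offers `keyword.iskeyword`. A small sketch; the names tested are arbitrary examples:

```python
import keyword

# Keywords are reserved and cannot be used as variable names.
is_for_reserved = keyword.iskeyword("for")
is_none_reserved = keyword.iskeyword("None")
print(is_for_reserved)    # True
print(is_none_reserved)   # True

# The name of the built-in function `print` is not a keyword.
is_print_reserved = keyword.iskeyword("print")
print(is_print_reserved)  # False
```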

#### Built-in functions

**Print**

- `print` is a Python built-in function that displays the value of an expression.
- Provide values to the function (i.e., the things to print) in parentheses.

In [None]:
print("Value of the string variable is ", string_variable)
print("The first character of the string variable is ", string_variable[0])
print("The first five characters of the string variable are ", string_variable[:5])
print(x, "+ 1 =", x + 1)

**Type**

- The `type` function returns the type of an expression.

In [None]:
type(string_variable)

In [None]:
type(x)

**Length**

- The `len` function returns the length of a string, or the number of elements in other types of containers, such as lists and tuples.

In [None]:
print(len(string_variable))

**Range**

- The `range` function returns a sequence of integers. `range(n)` runs from `0` up to, but not including, `n`.

In [None]:
for i in range(3):
    print("loop: ", i)
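
`range` also accepts a start value and a step. The sketch below converts each range to a list only to make the generated numbers visible; the particular numbers are arbitrary:

```python
# range(stop) counts from 0 up to, but not including, stop.
up_to_three = list(range(3))
print(up_to_three)      # [0, 1, 2]

# range(start, stop) begins at start instead of 0.
two_to_five = list(range(2, 6))
print(two_to_five)      # [2, 3, 4, 5]

# range(start, stop, step) advances by step each time.
every_third = list(range(2, 10, 3))
print(every_third)      # [2, 5, 8]
```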

#### Variable types

- _Numbers_

   - Integers (e.g., `1`, `2`, `3`) and floating point numbers (e.g., `1.0`, `2.5`, `3.14159`) are the two main numeric types in Python.

In [None]:
y = 1
z = 3.14
print(type(y))
print(type(z))

- _Strings_

   - Strings are sequences of characters.
   - Strings are created by enclosing characters in single quotes (`'...'`) or double quotes (`"..."`).
   - Strings can be concatenated (glued together) with the `+` operator, and repeated with `*`.

In [None]:
print(string_variable + " " + "Python is fun!")

In [None]:
print(3 * string_variable)

- _Booleans_

   - Booleans are either `True` or `False`.
   - Booleans are often used in `if` statements to control the flow of a program, or in `while` loops to control how many times the loop body is executed.

In [None]:
create_int_variable = True
if create_int_variable:
    new_int_variable = 123

new_int_variable
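
Booleans usually come from comparisons rather than being typed in directly. As a brief sketch, with `threshold` and `counter` as hypothetical names, a comparison can drive a `while` loop:

```python
threshold = 3   # hypothetical value chosen for illustration
counter = 0

# A comparison evaluates to a Boolean.
is_below = counter < threshold
print(is_below)         # True
print(type(is_below))

# The while loop repeats as long as its condition is True.
while counter < threshold:
    counter += 1

print(counter)          # 3
```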

- _None_

   - `None` is a special value that represents the absence of a value.
   - `None` is the only value of the type `NoneType`.
   - `None` is often used as the default value of a function argument, to signal that no argument was passed.
   - Functions that do not explicitly return anything return `None`.
   - `None` is a singleton: there is only one `None` object.
   - `None` is immutable: it cannot be changed in any way.
   - Test for `None` with the `is` operator (as in `x is None`) rather than with `==`.

In [None]:
none_variable = None
print(none_variable is None)

- _Lists_
   - Lists are ordered sequences of values.
   - Lists are created by enclosing values in square brackets (`[...]`).
   - Lists can contain values of different types.
   - Lists can be indexed, sliced, and nested.
   - Lists are mutable and dynamic.

In [None]:
new_list = [1, 2, 3, 4, 5, None]

The `append()` method adds an element to the end of a list.

In [None]:
new_list.append("Hello world")
print(new_list)
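
The other list properties mentioned above (indexing, slicing, nesting, and mutability) can be sketched in the same way. The names `numbers` and `nested` and their values are made up for illustration:

```python
numbers = [10, 20, 30, 40]   # hypothetical example values

print(numbers[0])      # 10 (indexing)
print(numbers[1:3])    # [20, 30] (slicing)

# Lists are mutable: an element can be replaced in place.
numbers[0] = 99
print(numbers)         # [99, 20, 30, 40]

# Lists can be nested: each element here is itself a list.
nested = [[1, 2], [3, 4]]
print(nested[1][0])    # 3
```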

- _Dictionaries_
   - Dictionaries are collections of `key: value` pairs, created by enclosing the pairs in curly braces (`{...}`).
   - Dictionaries can contain values of different types.
   - Dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys.
   - Dictionaries are mutable and dynamic.
   - Dictionaries are accessed by key rather than by position; since Python 3.7 they preserve the order in which keys were inserted.
   - Dictionaries are sometimes found in other programming languages as “associative memories”, “associative arrays”, “associative lists”, “hashes”, “hash tables”, or “maps”.
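
A short sketch of creating and modifying a dictionary follows. The dictionary `car` and its contents are hypothetical values used only for illustration:

```python
car = {"name": "ford torino", "cylinders": 8}   # hypothetical values

# Look up a value by its key.
print(car["cylinders"])   # 8

# Dictionaries are mutable: add a new pair, then delete an existing one.
car["mpg"] = 17.0
del car["name"]

print(sorted(car))        # ['cylinders', 'mpg']
print("mpg" in car)       # True
```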

- _Tuples_
   - Tuples are ordered sequences of values.
   - Tuples are created by enclosing values in parentheses (`(...)`).
   - Tuples can contain values of different types.
   - Tuples can be indexed and sliced.
   - Tuples are immutable: once created, their elements and length cannot be changed.
<!-- 8. _Sets_
1. _Frozensets_
2.  _Bytes_ -->
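
As a rough illustration of how tuples behave, the sketch below indexes a tuple, shows that assigning to an element fails, and unpacks the tuple into separate variables. The name `point` and its values are hypothetical:

```python
point = (3, 4)   # hypothetical coordinates

print(point[0])     # 3
print(len(point))   # 2

# Assigning to an element of a tuple raises a TypeError.
try:
    point[0] = 10
except TypeError as error:
    print("Tuples cannot be modified:", error)

# Tuples can be unpacked into separate variables.
px, py = point
print(px + py)      # 7
```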

- _Type conversion_

   - Python can convert values from one type to another.
   - This is called type conversion, and is sometimes also called type casting.
   - To convert a value, use the name of the target type as a function: `str()`, `int()`, or `float()`.
   - For example, `int("32")` converts the string `"32"` to an integer, and `float(32)` converts the integer `32` to a floating-point number.
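
The conversions described above can be tried as follows. This is a small sketch with arbitrary values; the last part shows that converting text that is not a number raises a `ValueError`:

```python
as_int = int("32")
as_float = float(32)
as_text = str(3.14)

print(as_int + 1)       # 33
print(as_float)         # 32.0
print(as_text + "!")    # 3.14!

# int() applied to a float drops the fractional part.
print(int(3.99))        # 3

# Text that does not represent a number cannot be converted.
try:
    int("hello")
except ValueError as error:
    print("Conversion failed:", error)
```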

#### Indexing and slicing

Indexing is used to access a single element of a sequence (e.g., a string, a list, or a tuple).

- Each position in the string (first, second, etc.) is given a number. This number is called an index or sometimes a subscript.
- Indices are numbered from 0.
- Use the position’s index in square brackets to get the character at that position.

In [None]:
print(string_variable[0])

An index can also be negative, in which case it counts from the right. For example, the index `-1` refers to the last character in a string, `-2` refers to the second-last character, and so on. The following example gets the last element of a list:

In [None]:
print(new_list[-1])

Slicing is used to access a subsequence of a sequence.
- A part of a string is called a substring. A substring can be as short as a single character.
- An item in a list is called an element. Whenever we treat a string as if it were a list, the string’s elements are its individual characters.
- A slice is a part of a string (or, more generally, a part of any list-like thing).
- We take a slice with the notation `[start:stop]`, where `start` is the integer index of the first element we want and `stop` is the integer index of the element just after the last element we want.
- The difference between `stop` and `start` is the slice’s length.
- Taking a slice does not change the contents of the original string. Instead, taking a slice returns a copy of part of the original string.

In [None]:
# elements from the beginning up to index 5 (not included)
string_variable[:5]

In [None]:
# elements from index 3 to 5 (not included)
string_variable[3:5]

In [None]:
# elements from index 6 to the end
string_variable[6:]
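
Two of the points above, that the length of a slice is `stop - start` and that slicing returns a copy, can be checked with a short sketch. The names `greeting`, `letters`, and `first_two` are introduced here only for illustration:

```python
greeting = "Hello world"   # same text as the string used above

part = greeting[3:5]
print(part)                 # lo
print(len(part) == 5 - 3)   # True
print(greeting)             # Hello world (unchanged)

# Slicing a list also returns a copy, so changing the copy
# leaves the original list untouched.
letters = ["a", "b", "c", "d"]
first_two = letters[:2]
first_two[0] = "z"
print(first_two)            # ['z', 'b']
print(letters)              # ['a', 'b', 'c', 'd']
```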

#### Calculations

Variables can be used in calculations as if they were values.

In [None]:
x + 1


### Exercises

min 3 max 5

### Basic matrices/arrays operations

In Python, we can use the `numpy` package to perform basic matrix operations. `numpy` is a fundamental package for scientific computing with Python. Among other things, it provides:

- a powerful N-dimensional array object, with support for large, multi-dimensional arrays and matrices
- sophisticated (broadcasting) functions
- a collection of routines for linear algebra, Fourier transforms, and random number generation
- key tools for working with numerical data in Python

#### Creating arrays

In [None]:
# import numpy library first
import numpy as np

# create a 1D array
x = np.array([1, 2, 3, 4, 5])
print(x)

In [None]:
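# create matrices
# a nested list is not yet a NumPy array
x = [[1, 2, 3], [4, 5, 6]]
print(x)

# a 1D array with six elements
x = np.array([1, 2, 3, 4, 5, 6])
print(x)

# reshape it into a 2 x 3 matrix
x = np.reshape(x, (2, 3))
print(x)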

Try running the following cell multiple times. Do you get the same output every time?

In [None]:
# generate random numbers/array/matrices

mu, sigma = 0, 1
a = np.random.randint(0, 10, 5)
b = np.random.random((2, 3))

print(a)
print(b)

x = np.random.normal(mu, sigma, 5)
y = x + np.random.normal(20, 0.1, 5)

print(x)
print(y)

Now run the following cell multiple times. Do you get the same output every time?

In [None]:
np.random.seed(123)

print(np.random.normal(mu, sigma, 5))
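
NumPy also provides a generator-based interface, `np.random.default_rng`, which keeps the seed attached to a generator object instead of setting it globally. The sketch below is an alternative to the cell above, not a replacement for it; `rng_a` and `rng_b` are hypothetical names. Two generators created with the same seed produce the same numbers:

```python
import numpy as np

# Two independent generators created with the same seed.
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)

draw_a = rng_a.normal(0, 1, 5)
draw_b = rng_b.normal(0, 1, 5)

print(draw_a)
print(np.array_equal(draw_a, draw_b))   # True
```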

#### Basic operations

In [None]:
# element-wise arithmetic between a matrix and a scalar

a = np.array([[1, 2], [3, 4]])
print(a + 1)
print(a - 1)
print(a * 2)
print(a / 2)

In [None]:
# matrix product of a (2 x 2) and b (2 x 3)
b = np.array([[1, 2, 3], [4, 5, 6]])

print(np.dot(a, b))

In [None]:
c = np.array([1, 2])

print(np.dot(a, c))
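
It is easy to confuse `*`, which multiplies arrays element by element, with the matrix product. A minimal sketch of the difference, using a hypothetical 2 x 2 array `m`; the `@` operator is equivalent to `np.dot` for 2D arrays:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

elementwise = m * m   # multiplies matching entries
matrix_prod = m @ m   # matrix product

print(elementwise)    # [[ 1  4] [ 9 16]]
print(matrix_prod)    # [[ 7 10] [15 22]]
print(np.array_equal(matrix_prod, np.dot(m, m)))   # True
```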

#### Statistics with NumPy

In [None]:
x = np.array([1, 2, 3, 4, 5, 6])

print(np.sqrt(x))

print(x**2)

print(np.square(x))

In [None]:
x = np.random.randint(0, 10, 5)

print("The mean value of the array x is: ", np.mean(x))
print("The median value of the array x is: ", np.median(x))
print("The standard deviation of the array x is: ", np.std(x))
print("The variance of the array x is: ", np.var(x))
print("The max value of the array x is: ", np.max(x))
print("The min value of the array x is: ", np.min(x))
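
For 2D arrays, these functions accept an `axis` argument that controls whether the statistic is computed over the whole array, per column, or per row. A short sketch with a hypothetical array `scores`:

```python
import numpy as np

scores = np.array([[1, 2, 3], [4, 5, 6]])   # hypothetical 2 x 3 data

overall_mean = np.mean(scores)          # over all six elements
column_means = np.mean(scores, axis=0)  # one value per column
row_means = np.mean(scores, axis=1)     # one value per row

print(overall_mean)   # 3.5
print(column_means)   # [2.5 3.5 4.5]
print(row_means)      # [2. 5.]
```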

#### Indexing in NumPy

In [None]:
A = np.arange(1, 17, 1).reshape(4, 4).transpose()
print(A)

In [None]:
# note that in Python, indexing starts from 0, not 1
print(A[2, 3])

In [None]:
# the previous cell used the same index as the book but returned a different number, because R (like MATLAB) starts indexing from 1 while Python starts from 0. To select the same number (10) as the book did, we reduce each index by 1
print(A[1, 2])

In [None]:
# to select a submatrix, the index arrays must broadcast against each other: give the row indices as an n x 1 array and the column indices as a 1 x m array,
# e.g. for an n x m 2D subarray: A[n by 1 array, 1 by m array]
A[[[0], [2]], [1, 3]]

In [None]:
# this is another way to do that
A[0:3:2, 1:4:2]

In [None]:
# select all columns in those two rows
A[0:3:2, :]

In [None]:
# select all rows in those two columns
A[:, 1:4:2]

In [None]:
# the last two examples use ':' alone in place of either the column index or the row index. This tells Python to include all columns or all rows, respectively
A[0, :]

In [None]:
# a negative index counts from the end: -1 means the last element
A[-1, -1]

In [None]:
# there are other ways to keep all rows except those at certain indices. For example, we could use a Boolean mask.
ind = np.ones((4,), bool)
ind[[0, 2]] = False
print(ind)

In [None]:
A[ind, :]

In [None]:
# if we do not specify whether the index refers to rows or columns, it is applied to the rows by default
A[ind]
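
A Boolean mask is more often built from a condition on the data than written by hand. The sketch below rebuilds the same 4 x 4 matrix under the hypothetical name `grid`, so that it does not depend on earlier cells:

```python
import numpy as np

grid = np.arange(1, 17).reshape(4, 4).transpose()

# The first column is [1, 2, 3, 4]; keep the rows where it exceeds 2.
mask = grid[:, 0] > 2
print(mask)              # [False False  True  True]

selected_rows = grid[mask, :]
print(selected_rows)     # rows [3 7 11 15] and [4 8 12 16]

# A mask with the same shape as the array selects individual elements.
large_values = grid[grid > 13]
print(large_values)      # [14 15 16]
```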

In [None]:
# we use .shape to get the shape of the matrix
A.shape

### Graphics

In Python, `matplotlib` is the most widely used plotting library. `matplotlib.pyplot` is a collection of command-style functions that make `matplotlib` work like MATLAB.


In [None]:
from matplotlib import pyplot as plt

%matplotlib inline

x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)

plt.plot(x, y, "bo")  # please use plt.plot? to look at more options
plt.ylabel("this is the y-axis")
plt.xlabel("this is the x-axis")
plt.title("Plot of X vs Y")
plt.savefig("Figure.pdf")  # use plt.savefig function to save images
plt.show()

In [None]:
# note that arange excludes the right end of the range specification
x = np.arange(1, 11)
print(x)

In [None]:
# note: with non-integer steps, np.arange can give unexpected results because of floating-point rounding; compare np.arange(0.2, 0.6, 0.4) with np.arange(0.2, 1.6, 1.4)
print(np.arange(0.2, 0.6, 0.4))
print(np.arange(0.2, 1.6, 1.4))
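
When the step is not an integer, one way to avoid this ambiguity is `np.linspace`, which takes the number of points instead of a step and states explicitly whether the end point is included. A brief sketch with arbitrary end points:

```python
import numpy as np

# Two points from 0.2 to 1.6, end point included (the default).
with_endpoint = np.linspace(0.2, 1.6, num=2)
print(with_endpoint)

# Two points starting at 0.2 with the end point excluded.
without_endpoint = np.linspace(0.2, 1.6, num=2, endpoint=False)
print(without_endpoint)
```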

In [None]:
# to use pi, the math module needs to be loaded first (np.pi is an alternative)
import math

x = np.linspace(-math.pi, math.pi, num=50)
print(x)

In [None]:
# build a 2D grid of (x, y) coordinates for evaluating a function of two variables
y = x
X, Y = np.meshgrid(x, y)

In [None]:
%whos

In [None]:
# same as above, use plt.contour? to explore the options
f = np.cos(Y) / (1 + np.square(X))
CS = plt.contour(X, Y, f)
plt.show()

In [None]:
# I think imshow looks nicer for a heatmap; use 'extent=' to fix the x and y axes
fa = (f - f.T) / 2  # f.T is the transpose, same as np.transpose(f)
plt.imshow(fa, extent=(x[0], x[-1], y[0], y[-1]))
plt.show()

In [None]:
from mpl_toolkits.mplot3d import axes3d

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_wireframe(X, Y, fa)
plt.show()

### Loading Data

In Python, `pandas` is a commonly used module for reading data from a file into a data frame. The Auto.csv file was downloaded from the book website. First, take a look at the CSV file: it has a header row, values are separated by commas, and missing values are marked by `?`. We can use the `read_csv` function to read the CSV file into a data frame. The `read_csv` function has many parameters; use `pd.read_csv?` to view its documentation.

In [None]:
import pandas as pd

data_url = "https://github.com/pykale/transparentML/raw/main/data/Auto.csv"
# header=0 uses the first row as column names; na_values="?" treats "?" as missing
Auto = pd.read_csv(data_url, header=0, na_values="?")
# use .head to see the first few rows (default = 5) of the data
Auto.head()
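
To see what `na_values="?"` does without downloading anything, the same call can be tried on a few lines of CSV text held in memory. This is only a sketch: the three rows in `csv_text` are made up and are not taken from Auto.csv.

```python
import io

import pandas as pd

# Hypothetical three-row sample; "?" marks a missing value.
csv_text = """name,mpg,horsepower
car_a,18.0,130
car_b,25.0,?
car_c,30.5,95
"""

sample = pd.read_csv(io.StringIO(csv_text), header=0, na_values="?")
print(sample)

# The "?" was read as NaN, so horsepower has one missing value.
missing_counts = sample.isnull().sum()
print(missing_counts)

# Dropping rows with missing values leaves two of the three rows.
complete_rows = sample.dropna()
print(complete_rows.shape)   # (2, 3)
```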

In [None]:
# use the same .shape attribute as for an ndarray to find the dimensions of the data frame
Auto.shape

In [None]:
# an alternative way to select the first few rows (here, 4) is slicing
Auto[:4]

In [None]:
# .iloc selects by position: here, the first 4 rows and the first 2 columns
Auto.iloc[:4, :2]

In [None]:
# we can use list to find the column names or use .columns
print(list(Auto))
print(Auto.columns)

In [None]:
# use .isnull and .sum to find out how many NaNs there are in each variable
Auto.isnull().sum()

In [None]:
# there are 397 observations in the data and only 5 with missing values, so we can simply drop the rows with missing values
print(Auto.shape)
Auto = Auto.dropna()
print(Auto.shape)

In [None]:
# refer to a column of a data frame by name using '.'; see the options of plt.plot for more
plt.plot(Auto.cylinders, Auto.mpg, "ro")
plt.show()

In [None]:
# use .hist to plot histograms of selected variables; column= specifies which variables
Auto.hist(column=["cylinders", "mpg"])
plt.show()

In [None]:
# use .describe() to get a summary of the data frame. Use .describe(include='all') for mixed types, .describe(include=[np.number]) for numerical columns, and .describe(include=['O']) for object columns.
Auto.describe()

In [None]:
# we can change the type of certain variable(s). Here we change cylinders into a categorical variable
Auto["cylinders"] = Auto["cylinders"].astype("category")

In [None]:
Auto.describe()

In [None]:
Auto.describe(include="all")
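
The effect of the type change on `.describe()` is easier to see on a very small data frame. In the sketch below, `cars` and its values are hypothetical; once `cylinders` becomes categorical, the default summary covers only the remaining numeric column:

```python
import pandas as pd

# Hypothetical four-row data frame.
cars = pd.DataFrame(
    {"cylinders": [4, 8, 4, 6], "mpg": [30.0, 15.0, 28.0, 20.0]}
)

numeric_before = list(cars.describe().columns)
print(numeric_before)   # ['cylinders', 'mpg']

cars["cylinders"] = cars["cylinders"].astype("category")

numeric_after = list(cars.describe().columns)
print(numeric_after)    # ['mpg']

# A categorical column is summarised by counting its categories.
print(cars["cylinders"].value_counts())
```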


#### Exercises

## Basic probability and statistics


### Probability

- Marginal probability
- Conditional probability
- Joint probability
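
These three quantities are linked: a marginal probability is obtained by summing the joint probability over the other variable, and a conditional probability is the joint probability divided by a marginal, P(Y | X) = P(X, Y) / P(X). A small numerical sketch follows; the joint table is invented purely for illustration:

```python
import numpy as np

# Hypothetical joint distribution P(X, Y).
# Rows correspond to X = 0, 1 and columns to Y = 0, 1.
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

# Marginal probabilities: sum the joint table over the other variable.
marginal_x = joint.sum(axis=1)   # P(X)
marginal_y = joint.sum(axis=0)   # P(Y)

# Conditional probability P(Y | X = 0) = P(X = 0, Y) / P(X = 0).
cond_y_given_x0 = joint[0] / marginal_x[0]

print(marginal_x)
print(marginal_y)
print(cond_y_given_x0)
```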

## Quiz

_Not for now. To finish in the next cycle._ Complete [Quiz 0](https://forms.gle/8Q5Z7Z7Z7Z7Z7Z7Z7) to check your understanding of this topic. You are advised to score at least 50% to proceed to the next topic.

## Summary

In this topic, you learned how to:
- Use ...

## References and further reading

This material is based on the following resources:
- [Python](https://www.python.org/)
- [NumPy](https://numpy.org/)
- [Pandas](https://pandas.pydata.org/)
- [Matplotlib](https://matplotlib.org/)
- Coursera online course [Programming for Everybody (Getting Started with Python)](https://www.coursera.org/learn/python)