# Python
## Week 1:  Python Basics and NumPy

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">Jephian Lin</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

## 1. Basic concepts and where to ask for help

### What is Python?
* Python is a programming language designed for **readability**
* Python has many *convenient* features, which make it **relatively slow**
* Python is popular because of its **wide variety of packages**

### Python? Anaconda?
* Python was created by **Guido van Rossum** in 1991
* Guido likes the British comedy group **Monty Python**, so he named the language Python
* In terms of biology, Python and Anaconda are both snakes  [(One is longer, the other is heavier)](http://www.differencebetween.net/science/nature/difference-between-python-and-anaconda/)
* In terms of computer science, **Anaconda is a Python distribution**, containing many Python packages.
* Both [Python](https://www.python.org/) and [Anaconda](https://www.anaconda.com/) can be downloaded from the web and installed on your own machine.

### Cloud services
* Free cloud services are getting popular:
* [Kaggle](https://www.kaggle.com/), [Colaboratory](https://colab.research.google.com/), [CoCalc](https://cocalc.com), etc.
* Uh... Kaggle started independently but was later bought by Google, and Colaboratory is a Google product...
* CoCalc is aimed at mathematical computation, especially computer algebra systems, and it runs on Google servers...
* For basic computation, you really don't need to install Python on your own machine, provided that you have internet access...

### Script?
* running Python by command lines (default option)
* fast, offline, easy to interact with other applications

### Notebook?
* running Python on a browser (e.g., **Jupyter**)
* cross-platform, used by most Cloud services, rich text format

### Your best friends
* `shift+enter`: evaluate a cell
* `tab`: autocomplete or show the possible completions
* _object_.: press `tab` to see functions under _object_
* _func_?: evaluate to read the documentation of _func_
* _func_??: evaluate to read the source code of _func_
* Google: the answers are likely available online
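
The notebook helpers above have plain-Python counterparts, which can be handy when you work in a script instead of Jupyter. The following is a minimal sketch using only built-ins: `dir` lists the names under an object (similar to pressing `tab` after `a.`), and the `__doc__` attribute holds the text that `?` displays. The variable name `public_names` is only for illustration.

```python
a = 'Hello'

# dir(a) lists every attribute of a; keep only the public ones
public_names = [name for name in dir(a) if not name.startswith('_')]
print(public_names[:5])

# the docstring is the text shown by a.upper?
print(a.upper.__doc__)
```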

Press `shift+enter` to evaluate the cell below.

In [None]:
1+1

Move your text cursor to the end of `ran` and press `tab`.  
Jupyter will autocomplete `ran` to be `range`.


In [None]:
ran

After you tell Python what `a` is, type `a.` and press `tab` to see related functions.

In [None]:
a = 'Hello'
type(a)

In [None]:
a.

In [None]:
a.count('l')

Different objects have different functions associated with them.  
For example, 
```Python
a = 'Hello'
a.upper()
```
will return `HELLO`, but 
```Python
a = 1 
a.upper()
```
will raise an `AttributeError`.

In [None]:
a = 1
type(a)

In [None]:
a.upper()

In [None]:
### With a = 1, press tab to see functions related to an integer

a.

Evaluate 
```Python
a = 'Hello'
a.upper?
```
to read the documentation of the function `upper`.  
(And you may press `Esc` to close the documentation.)

In [None]:
a = 'Hello'
a.upper?

The function `upper` is associated with a string, so `upper?` alone won't work.

In [None]:
upper?

To become an expert, you will read others' code and see how they deal with it.  
Use ``??`` to check the source code if available.

In [None]:
import random

random.randint??

In [None]:
### evaluate this cell several times to get different numbers

random.randint(1,5)

Finally, Google is always ready to help.  
For example, Google "how to swap two variables in python".

In [None]:
a = 1
b = 2
a,b = b,a
print(a,b)

### Assign and print
In Python, a single `=` means to assign a value.  
For example, `a = 'Hello'` means assign the string `'Hello'` to the variable `a`.  
Here we call `a` a **variable** and `'Hello'` the **value** of the variable.

To see the value of a variable, use `print`.

In [None]:
a = 'Hello'
print(a)

In [None]:
a = 123
b = 'Hello'
c = 'Everybody'
print(a,b,c)

In [None]:
print(a,b,c,sep='! ')

#### Exercise
Now you notice the exclamation mark `!` only appears between the variables.  
This is normal: if you read the documentation of `print`, you will see 
> sep:   string inserted between values, default a space.

Read the documentation (by evaluating `print?`) carefully and find a way to output `123! Hello! Everybody!`.

In [None]:
print(a,b,c,sep='! ',???)

#### Exercise

Suppose someone wrote the following:
```Python
a = 'I come from taiwan'
```
This is annoying since it should be `Taiwan` but not `taiwan`.  

Type `a.` and press `tab` to see all related functions.  
Find a function under `a` that allows you to replace `taiwan` by `Taiwan`.

In [None]:
a = 'I come from taiwan'
a.???

#### Exercise 

When you collect data, if you did not carefully tell the participants  
how to fill in the form, then you will get all kinds of answers.  
Suppose you are setting up a time for a meeting and you get the  
following answers from three different people.
```Python
a = 'Monday Wednesday Friday'
b = 'Monday, Tuesday, Thursday'
c = 'Monday;Friday'
```
Extract the days for each one by the `split` function.

In [None]:
a = 'Monday Wednesday Friday'
b = 'Monday, Tuesday, Thursday'
c = 'Monday;Friday'
print(a.split()) ### this is correct
print(b.split()) ### how to remove the comma?
print(c.split()) ### how to remove the semicolon?

#### Exercise
Reading source code builds up your knowledge of programming.

Evaluate the cell below and read the source code.  
The `random` package uses the value of pi.  
Find out how to get the value of pi.

In [None]:
import random
random??

In [None]:
from m??? import ?? as _pi
print(_pi)

**Finding the possible solutions is an essential part of programming,  
and it is a skill that will benefit you in the long run.**

Seriously, I think no one learns programming _only_ from school.

### Online resources for Python
1. [Kaggle Learn](https://www.kaggle.com/learn/overview) allows you to learn and run Python in the cloud.
2. [Python for Everybody](https://www.py4e.com/book) is a free, open-source book with free course videos that cover the details of Python.
3. [Coursera](https://www.coursera.org/) offers lots of (kind of) _free_ courses.

### Jupyter shortcuts
* Press `Esc` to enter the **Command Mode**
* Press `Enter` to enter the **Edit Mode**
* In Command Mode, press `A` (`B`) to insert a cell above (below)
* In Command Mode, press `H` to read all shortcuts

### Python installation
If you are a Linux user, you can do 
```bash
sudo apt install python3
```
in Ubuntu or 
```bash
sudo pacman -S python
```
in Arch Linux to install Python easily.  

If you are using Windows or Mac, then you will have to download the installation package from the [Python website](https://www.python.org/).

### Python package installation

Warning: Installing packages through Jupyter is not recommended.  
This part is only to illustrate the installation process.

Code in this section is unlikely to work due to  
the settings on different machines,  
lack of internet or  
lack of permission.

### Technicalities
The exclamation mark `!` allows you to run a command in your shell.  
`cat` is a program that prints the content of a file.  
`/etc/os-release` stores the OS information of the machine.  

Alternatively, you can do `lsb_release`.

Note: These commands are mainly for Linux.

In [None]:
!cat /etc/os-release

### Install with `pip`
You may find packages on [the Python Package Index](https://pypi.org/), also known as PyPI.  

In general, you have to find the package's official website  
and follow the instructions to install it.  
Take NumPy as an example: find its [installation guide](https://www.scipy.org/install.html) and follow the instructions therein.  

However, `pip` provides you an easy way to install packages available on PyPI.  
For NumPy, you can do 
```bash
pip install numpy
```
and it will download the package and install it.  

Note:  You can do `pip uninstall numpy` to uninstall.

In [None]:
!pip install numpy

In [None]:
!pip install funniesttest

### Virtual environment
Chances are that the cells above won't work well.  
This is actually good!  
Python has way too many packages available and some can conflict with each other.  

If a package is not so fundamental that everyone needs it,  
then don't install it globally.  
Creating a **virtual environment** is a better approach.

In [None]:
!virtualenv my_project ### create a virtual environment called my_project
!source my_project/bin/activate && pip install funniesttest ### go to the virtual environment and install

This avoids the permission issue  
but still needs internet access to download the package.
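
The `virtualenv` command used above is a third-party tool. As a hedged alternative, the standard library ships a `venv` module that can create a similar environment directly from Python. The sketch below builds one in a temporary folder (so nothing is left on your machine) and checks that it exists; the names `env_dir` and `has_cfg` are only for illustration, and `with_pip=False` is chosen here so that no internet or extra setup is needed.

```python
import os
import tempfile
import venv

# create the environment in a temporary folder so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, 'my_project')
    venv.create(env_dir, with_pip=False)   # with_pip=True would also set up pip
    # every virtual environment has a pyvenv.cfg file at its root
    has_cfg = os.path.exists(os.path.join(env_dir, 'pyvenv.cfg'))
    print(has_cfg)
```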

### Offline installation
Suppose you plan to install the package `funniesttest`  
and you already have the package file obtained from [here](https://pypi.org/project/funniesttest/) on PyPI. 

Do the following steps (in the virtual environment if necessary)  
1. Unpack the package by `gzip` or `tar` if necessary.
```bash
gzip -d filename.tar.gz
tar -xvf filename.tar
```
2. Go to the folder.
```bash
cd foldername
```
3. Install with `pip`.  Note that the dot at the end is part of the command (it means the current folder), not a period!
```bash
pip install .
```

If the package has a file `filename.whl`, then ignore the steps above and do
```bash
pip install filename.whl
```

In [None]:
### create a virtual environment called my_project
!virtualenv my_project 

In [None]:
### unpack the package
!cp funniesttest-1.0.tar.gz my_project
!cd my_project && gzip -dk funniesttest-1.0.tar.gz && tar -xvf funniesttest-1.0.tar
### show where we are and list what's in the folder
!pwd
!ls

In [None]:
### activate my_project
### go to the folder
### then install
!source my_project/bin/activate && cd my_project/funniesttest-1.0/ && pip install .

In [None]:
### reset
### run this cell only when 
### you want to wipe out the virtual environment
!rm -rf my_project

### Conclusion
To get some experience with Python, use cloud services.  
For machines that you do not own, ask IT for help.  
For your own machine, it is nice to get your hands dirty and go through the installation by yourself.

## 2. Data types, Boolean tests, and Arithmetic operators

Types are important.  
Different data types carry different properties and functions.  

For example, a string carries the function `upper`,  
but an integer does not.

In [None]:
year_of_birth = 1987 ### integer
height = 159.9 ### float 
name = 'Jephian' ### string
boss_at_office = True ### boolean values; True or False
yyyymmdd = (1987,3,21) ### tuple
friends = ['John','Jim','Jacob'] ### list
name_to_age = {'John': 15, 'Jim': 20, 'Jacob': 100} ### dictionary

**Avoid meaningless names** for variables.  For example,
```Python
a = 1987
b = 159.9
c = 'Jephian'
```
are no good, but we still use them occasionally for convenience.

Use `type` to **check the type** of a variable.

In [None]:
type(height)

### Boolean tests

A boolean test checks if a statement is `True` or `False`.

Check if an element is **in a list or not**.

In [None]:
'Jeffrey' in friends

In [None]:
'John' in friends

In [None]:
1 in [2,3,5]

In [None]:
1 not in [2,3,5]

**Compare numbers**  
`a = 1` means assign the value of `a` as `1`.  
`a == 1` is a boolean test to check if `a` equals `1`.

In [None]:
a = 1
a == 1

In [None]:
a == 2

In [None]:
1 > 2

In [None]:
2 > 2

In [None]:
2 >= 2

In [None]:
1 != 2 ### check if 1 is not equal to 2

Check if a variable **is an instance of a type**

In [None]:
isinstance(1,int)

In [None]:
isinstance(1.5,float)

In [None]:
isinstance(1,float)

In [None]:
isinstance('1',int)

In [None]:
isinstance('1',str)

### Arithmetic operators

In [None]:
print("23 + 4 =", 23 + 4) ### addition
print("23 - 4 =", 23 - 4) ### subtraction
print("23 * 4 =", 23 * 4) ### multiplication
print("23 / 4 =", 23 / 4) ### division
print("23 ** 4 =", 23 ** 4) ### exponent
print("23 % 4 =", 23 % 4) ### remainder
print("23 // 4 =", 23 // 4) ### integer division
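
The operators `//` and `%` belong together: the quotient and the remainder always rebuild the original number. A small sketch to illustrate this with the same numbers (the variable names are only for illustration); the built-in `divmod` returns both results at once.

```python
a, b = 23, 4

quotient = a // b    # integer part of the division
remainder = a % b    # what is left over

# the two results always satisfy a == quotient * b + remainder
print(quotient, remainder)
print(a == quotient * b + remainder)

# divmod returns the pair (quotient, remainder)
print(divmod(a, b))
```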

#### Exercise
Guess the output of the following.  
Then evaluate the cell to check the answer.

In [None]:
type(2/3)

In [None]:
a = 2
a in [2,3,5]

In [None]:
a = 2
a in ["a","b","c"]

In [None]:
a = 2
type(a) == int

In [None]:
a = 2.0
type(a) == int

In [None]:
2 == 2.0 ### Python does not check the type (for numbers)!

In [None]:
(2,3,5) == [2,3,5] ### But tuples and lists are really different things...

#### Exercise
Guess the output.  
Then try to figure out what the **keys** and **values** of a dictionary are.

In [None]:
d={"two":2, "three":3, "five":5}
print(2 in d.keys())
print(2 in d.values())

#### Exercise
Guess the output of the following.  
Then evaluate the cell to check the answer.

Recall that `P and Q` is `True` only when **both** `P` and `Q` are `True`,  
and `P or Q` is `True` if **at least one of** `P` and `Q` is `True`.

In [None]:
50 % 3 == 0

In [None]:
50 % 3 == 0 or 51 % 3 == 0

In [None]:
50%3==0 and 51%3==0
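
To double-check the rules for `and` and `or`, one option is to list all four combinations of `P` and `Q`. This is only a sketch; `itertools.product` is a standard-library helper that is not otherwise used in this lecture, and the name `table` is made up.

```python
from itertools import product

# each row is (P, Q, P and Q, P or Q)
table = [(P, Q, P and Q, P or Q) for P, Q in product([True, False], repeat=2)]
for row in table:
    print(row)
```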

Lists are compared by [lexicographic order](https://en.wikipedia.org/wiki/Lexicographical_order).

In [None]:
[1,2,3,4] > [2,3,4]

Strings are compared character by character by their [ASCII codes](https://en.wikipedia.org/wiki/ASCII), in lexicographic order.
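
The built-in `ord` shows the code of a single character, which may help explain some surprising comparisons. The strings below are made-up examples, not part of the original notes.

```python
# ord gives the code of a single character
print(ord('B'), ord('Z'), ord('a'))

# the first characters differ, so the lengths do not matter
print('d' > 'cat')

# when one string is the beginning of the other, the shorter one is smaller
print('app' < 'apple')

# every uppercase letter comes before every lowercase letter
print('Zebra' < 'apple')
```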

In [None]:
"Z" > "B"

In [None]:
"Z" > "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"

## 3. NumPy

[NumPy](http://www.numpy.org/) is a Python package that  
takes care of high-dimensional data (such as matrices).

With the built-in functions in NumPy, one can easily 
modify the data and compute matrix products, vector inner products, etc.

The official website offers a [quickstart tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html). 

`import numpy` allows you to access all functions  
in NumPy.

In [None]:
import numpy

print(numpy.arange(1,10))
print(numpy.zeros(10))

But you don't want to type `numpy` every time.
```Python
import numpy as np
```
allows you to abbreviate `numpy` as `np`.

In [None]:
import numpy as np
print(np.arange(1,10))
print(np.zeros(10))

A list is a one-dimensional array.  
A list of lists is a two-dimensional array.  (Also called a matrix)

Use `np.array(list)` to create an array.

In [None]:
one_dim = np.array(
[1,2,3,4,5]
)

two_dim = np.array(
[
[1,2,3,4,5],    
[2,3,4,5,6],
[3,4,5,6,7]
]
)

print('one_dim =')
print(one_dim)
print()
print('two_dim =')
print(two_dim)
print()
print('array is a special data type in numpy:')
print(type(one_dim))

### Basic attributes

Suppose `ndarray` is an array.  
`ndarray.ndim` is the dimension (the number of axes).  
`ndarray.shape` is the shape, the size in each dimension.  
`ndarray.size` is the total number of elements.

In [None]:
print(two_dim.ndim)
print(two_dim.shape)
print(two_dim.size)
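
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries of `shape`. A short sketch on a made-up three-dimensional array `arr`:

```python
import numpy as np

arr = np.zeros((2, 3, 4))   # a made-up 3-dimensional example

print(arr.ndim)    # number of axes
print(arr.shape)   # size along each axis
print(arr.size)    # total number of entries

# ndim is the length of shape, and size is the product of its entries
print(arr.ndim == len(arr.shape))
print(arr.size == np.prod(arr.shape))
```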

### Operations
Most operations are applied entrywise.

In [None]:
A = np.array(
[[1,3],
[5,7]])

B = np.array(
[[2,4]
 ,[6,8]])

In [None]:
A + 1

In [None]:
A * 2

In [None]:
A + B

In [None]:
A * B ### entrywise product

In [None]:
A @ B ### matrix product
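
To see the difference between `*` and `@`, it may help to compute one entry by hand. The sketch below uses the same `A` and `B` and checks the top-left entry of each product; `row_times_col` is just an illustrative name.

```python
import numpy as np

A = np.array([[1, 3],
              [5, 7]])
B = np.array([[2, 4],
              [6, 8]])

# entrywise product: the top-left entry only uses A[0][0] and B[0][0]
print((A * B)[0][0] == A[0][0] * B[0][0])

# matrix product: the top-left entry is row 0 of A times column 0 of B
row_times_col = A[0][0] * B[0][0] + A[0][1] * B[1][0]
print((A @ B)[0][0] == row_times_col)
```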

Exponentiation is also applied entrywise.

In [None]:
A ** 2

In [None]:
2 ** A

Functions like `log`, `exp`, `sqrt`, `sign` etc. are called **universal functions**.  
They are also applied entrywise.  
In contrast, `max`, `min`, and `sum` combine all entries into a single number.

In [None]:
np.log(A)

In [None]:
np.exp(A)

In [None]:
np.sqrt(A)

In [None]:
np.sign(A)

In [None]:
np.max(A)

In [None]:
np.min(A)

In [None]:
np.sum(A)

Comparison is entrywise and returns a **boolean array**.

In [None]:
A > 4

#### Exercise
In a class there are 4 students.  
`m1`, `m2`, `f` store the scores of  
the first midterm, the second midterm, and the final exam  
for each of the 4 students.  
The total points for each exam is $100$.

Calculate the total scores by the following formula  
$t = 30\%\ m_1 + 35\%\ m_2 + 35\%\ f.$

In [None]:
m1 = np.array([30,50,20,70])
m2 = np.array([60,30,90,80])
f = np.array([80,40,60,100])

t

#### Exercise
Now you have the total scores.  
Find the average, max, and min of the total scores.

In [None]:
print("average:", t.???)
print("min:", t.???)
print("max:", t.???)

#### Exercise
Looks like students did not do well  
and you have to fail $3$ out of $4$ students.  

Think of a (reasonable) scheme to curve the grades  
so that no one fails.  
(Open answer)

In [None]:
new_t = 

### Create array
`np.array(list)` is the standard way to input an array.  
Some arrays that are frequently used have built-in functions to create them.

Recall that `shape` is a tuple.

`np.zeros(shape)` creates an all-zeros array of the given shape.

In [None]:
np.zeros((3,4)) ### you need to input a shape (a tuple)

`np.ones(shape)` creates an all-ones array of the given shape.

In [None]:
np.ones((3,4))

`np.eye(n)` returns an identity matrix of order $n$.

In [None]:
np.eye(4) 

`np.arange(a,b)` returns an array `[a,...,b-1]`. 

In [None]:
np.arange(1,10)

`np.linspace(a,b,num)` returns an array of `num` numbers  
evenly spaced between `a` and `b` (both endpoints included).

In [None]:
np.linspace(1,10,5)
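
`arange` and `linspace` are easy to mix up. As a rough comparison (the numbers are made up, and `arange` is given its optional third argument, the step): `arange` takes a step and stops before the right endpoint, while `linspace` takes the number of points and includes both endpoints.

```python
import numpy as np

# arange: step 0.25, right endpoint 1 is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step)

# linspace: 5 points, both endpoints 0 and 1 are included
by_count = np.linspace(0, 1, 5)
print(by_count)
```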

`np.random` contains various functions  
to create a **random array**  
of the shape `d0,...,dn`.

`np.random.rand(d0,...,dn)`: uniform distribution on $[0,1)$  
`np.random.randint(a,b,size=(d0,...,dn))`: uniform distribution on `a,...,b-1`  
`np.random.randn(d0,...,dn)`: normal distribution with mean $0$ and variance $1$

In [None]:
np.random.rand(2,3)

In [None]:
np.random.randint(5,size=(2,3))

In [None]:
np.random.randn(2,3)
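
Random arrays change every time you evaluate the cell. If you need the same numbers again (for example, to share an experiment), one common approach is to fix the seed with `np.random.seed`. The seed value `10` below is arbitrary, and the names `first` and `second` are only for illustration.

```python
import numpy as np

np.random.seed(10)            # fix the starting point of the generator
first = np.random.rand(2, 3)

np.random.seed(10)            # use the same seed again
second = np.random.rand(2, 3)

print((first == second).all())    # same seed, same numbers

np.random.seed(None)          # go back to an unpredictable seed
```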

You may **reshape** or **resize** an array.

`reshape` returns a new array object (which usually shares its data with the original), while  
`resize` modifies the original array in place (and returns nothing).

In [None]:
a = np.arange(8)
b = a.reshape(2,4)
print(a)
print(b)

In [None]:
a = np.arange(8)
b = a.resize(2,4)
print(a)
print(b)
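
A detail worth knowing: the array returned by `reshape` often shares its data with the original, so changing one may change the other. The sketch below checks this with `np.shares_memory` and also confirms that `resize` returns `None`; the variable names are only for illustration.

```python
import numpy as np

a = np.arange(8)
b = a.reshape(2, 4)

print(a.shape, b.shape)          # a keeps its shape; b is 2 by 4
print(np.shares_memory(a, b))    # here b looks at the same data as a

b[0][0] = 100                    # so changing b also changes a
print(a[0])

c = np.arange(8)
result = c.resize(2, 4)          # c itself is modified
print(c.shape, result)           # resize returns None
```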

### Index and Axis

A high-dimensional array is  
a list of a list of ... of a list.

In [None]:
a = np.arange(24).reshape(2,3,4)
a

`a` is a list of two two-dimensional arrays `a[0]` and `a[1]`.

In [None]:
a[0]

`a[0]` is a list of three one-dimensional arrays `a[0][0]`, `a[0][1]`, and `a[0][2]`.

In [None]:
a[0][2]

Therefore, you may use 
`a[i][j][k]` to reach every entry.  
That is, `i`, `j`, and `k` control  
the $0$-th, the $1$-st, and the $2$-nd axes.
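
For the array `a = np.arange(24).reshape(2,3,4)`, each axis has a clear meaning: one step along axis 0, 1, 2 adds 12, 4, 1 to the entry, respectively. The sketch below checks this for one made-up choice of `i`, `j`, `k`, and also shows that NumPy accepts all indices inside a single pair of brackets.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# one step along axis 0, 1, 2 adds 12, 4, 1 to the entry
i, j, k = 1, 2, 3
print(a[i][j][k], 12 * i + 4 * j + k)

# all indices can also go inside one pair of brackets
print(a[i, j, k] == a[i][j][k])
```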

### Statistics functions
`sum`, `min`, `max` allow you to summarize the data with basic statistics.

In [None]:
scores = np.array([60,80,70,100])

In [None]:
scores.min()

In [None]:
scores.max()

In [None]:
scores.sum()

In [None]:
### average
scores.sum() / scores.size

These functions can be applied to a high-dimensional array  
**along a given axis**.

In [None]:
a = np.arange(24).reshape(2,3,4)
a

In [None]:
a.sum(axis=0)

In [None]:
a.max(axis=1)

In [None]:
a.min(axis=2)
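
One way to think about `axis`: the chosen axis disappears from the shape of the result. The sketch below checks the shapes for the same array `a`, and confirms that summing along axis 0 is the same as adding `a[0]` and `a[1]` entrywise.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# the chosen axis disappears from the shape of the result
print(a.sum(axis=0).shape)
print(a.sum(axis=1).shape)
print(a.sum(axis=2).shape)

# summing along axis 0 adds the two blocks a[0] and a[1] entrywise
print((a.sum(axis=0) == a[0] + a[1]).all())
```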

The **histogram** groups the data into categories (bins) and counts the number of data points in each.

In [None]:
a = np.random.randn(1000)
np.histogram(a)

`matplotlib` is a plotting library  
and we will talk more about it later.

In [None]:
import matplotlib.pyplot as plt

(n, bins) = np.histogram(a)
plt.plot(.5*(bins[1:]+bins[:-1]), n)
plt.show()
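
The expression `.5*(bins[1:]+bins[:-1])` in the plot above computes the midpoint of each bin. This is needed because `np.histogram` returns the bin *edges*, and there is always one more edge than there are bins. A small sketch on a made-up sample `data`:

```python
import numpy as np

data = np.array([0.1, 0.2, 0.25, 0.5, 0.9, 1.0])   # made-up sample
n, bins = np.histogram(data)    # 10 bins by default

print(len(n), len(bins))        # 10 counts but 11 bin edges
print(n.sum() == data.size)     # every data point lands in some bin

# midpoint of each bin: average of its left and right edges
centers = 0.5 * (bins[1:] + bins[:-1])
print(len(centers) == len(n))
```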

## Homework

**Remember to import numpy as np**

In [None]:
import numpy as np

#### Problem 1
Create an array `a` by `arange(16)`  
then `reshape` it to shape `(2,2,2,2)`.

Take the maximum of `a` along axis 0.

Can you do it in one line?

In [None]:
### Your answer below 


#### Problem 2
When you take the sum of a boolean array,  
`True` is treated as `1` and `False` is treated as `0`,  
so the sum of a boolean array is the number of `True` in the array.

Use this fact to find the number of positive entries in `a`.

In [None]:
### The first few lines give you an array a.
np.random.seed(10)
a = np.random.randn(1000)
np.random.seed(None)

### Your answer below
a

#### Problem 3
Suppose you have a die and you roll it 10000 times.  
Then you store the number you get each time in the array `nums`.  

Do you think it is a fair die?  Why?  
(You can write your answer as a comment in the code or print it.)

If it is not fair, which number occurs much more often than the others?  
(The numbers on the die are 1,...,6.)

In [None]:
### The first few lines give you the array nums.
import random
random.seed(10)
k = random.randint(1,6)
random.seed(None)
nums = np.hstack((np.random.randint(1,7,3000),k*np.ones(4000),np.random.randint(1,7,3000)))

### Your answer below
nums

#### Problem 4
Use the same `nums` in Problem 3.  

Use `np.histogram` to output the frequency of each number.  
(Hint:  Read the documentation of `histogram` and  
set `bins` as 6 and set `range` from 0.5 to 6.5.)

In [None]:
### Your answer below
np.histogram(???)

#### Problem 5
Now try to create a fair die with `np.random.randint`.  
Roll it 10000 times and store the numbers in `fair_nums`.  
Use Problem 4 to see how fair it is.

In [None]:
### Your answer below
fair_nums = np.random.randint(???)
np.histogram(???)