# Python

In this course, we will use the Python programming language for all tutorials, exercises, and assignments. 

Python is a great general-purpose programming language. With the help of several popular libraries (e.g., numpy, scipy, pandas, matplotlib, sklearn), it provides a powerful environment for data analytics and computing.

It helps to have some experience with Python and NumPy. If you do not, this notebook serves as a crash course on the basics of Python programming and its use in scientific computing. 

Python code is often compared to pseudocode because it can express powerful ideas in very few lines while remaining very readable. 

As an example, here is an implementation of the classic quicksort algorithm in Python:

In [None]:
# Define a function named "quicksort" that takes an array "arr" as input
def quicksort(arr): 
    # base case: an array with zero or one element is already sorted
    if len(arr) <= 1:
        return arr
    # pick the element in the middle of the array as the pivot
    pivot = arr[len(arr) // 2]
    # save all elements less than the pivot in a list called "left"
    left = [x for x in arr if x < pivot]
    # save all elements equal to the pivot in a list called "middle"
    middle = [x for x in arr if x == pivot]
    # save all elements greater than the pivot in a list called "right"
    right = [x for x in arr if x > pivot]
    # combine left + middle + right together
    # notice that this quicksort function is recursive: it calls itself.
    return quicksort(left) + middle + quicksort(right)

'''
You can call this function to sort a list of numbers. 
You can also use this function to sort a list of strings. 

BTW, a triple-quoted string like this one is another way of commenting multiple lines in Python code.
'''
print(quicksort([3,8,16,9,14,8,10,1,2,19]))
print(quicksort(['Google','Apple','Microsoft','Amazon']))
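
One way to gain confidence in an implementation like this is to compare it against Python's built-in `sorted()`. The sketch below repeats the function so that it runs on its own; the random test data and the fixed seed are assumptions chosen for illustration only.

```python
import random

def quicksort(arr):
    # Same algorithm as above, repeated so this cell is self-contained
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

random.seed(0)  # fixed seed so the check is repeatable
numbers = [random.randint(0, 100) for _ in range(50)]
companies = ['Google', 'Apple', 'Microsoft', 'Amazon']

# Compare against the built-in sorted() for numbers and for strings
numbers_match = quicksort(numbers) == sorted(numbers)
words_match = quicksort(companies) == sorted(companies)
print(numbers_match, words_match)
```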

## Basic Data Types

Like most languages, Python has a number of basic types including integers, floats, booleans, and strings. 

**Numbers**: Integers and floats work as you would expect from other languages:

In [None]:
x = 3
print(type(x)) # Prints "<class 'int'>"
print(x)       
print(x + 1)   # Addition
print(x - 1)   # Subtraction
print(x * 2)   # Multiplication
print(x ** 2)  # Exponentiation
print(x ^ 3)   # Bitwise XOR, NOT exponentiation; prints "0"

x = x + 1  # Assign the value (x+1) to x
print(x)
x += 1     # This is simpler. 
print(x)
x *= 2     # Assign the value (x*2) to x
print(x) 

y = 2.5
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints multiple items together

**Booleans**: Python implements all of the usual operators for Boolean logic:

In [None]:
t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"

x = x - 9
if x > 3: 
    print(str(x) + ' is greater than 3.')

**Strings**: Python has great support for strings:

In [None]:
hello = 'hello'    # String literals can use single quotes
world = "world"    # or double quotes; it does not matter.
print(hello)       
print(len(hello))  # len() returns the string length
hw = hello + ' ' + world  # String concatenation
print(hw)  
hw12 = '%s %s %d' % (hello, world, 2021)  # string formatting
print(hw12)  
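
The `%` operator is only one of several ways to format strings. As a rough comparison, the sketch below builds the same string with `str.format()` and with an f-string; the variable name `year` is introduced here for illustration.

```python
hello = 'hello'
world = 'world'
year = 2021  # hypothetical variable, used only in this example

old_style = '%s %s %d' % (hello, world, year)          # % formatting
format_method = '{} {} {}'.format(hello, world, year)  # str.format()
f_string = f'{hello} {world} {year}'                   # f-string (Python 3.6+)

print(old_style)
print(format_method)
print(f_string)
```

All three lines print the same text, so the choice is mostly about readability.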

String objects have a bunch of useful methods:

In [None]:
s = "hello"
print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print('Hello'.lower()) # Convert a string to lowercase; prints "hello"
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another; prints "he(ell)(ell)o"
print('  world '.strip())  # Strip leading and trailing whitespace; prints "world"

In [12]:
# More functions/operations for strings
chords = 'C G Am Em F C Dm G'
print(chords.split()) # Split a string into a list of strings; by default it splits on whitespace

print(chords.replace(' ','-').split('-')) # Split a string by '-'

['C', 'G', 'Am', 'Em', 'F', 'C', 'Dm', 'G']
['C', 'G', 'Am', 'Em', 'F', 'C', 'Dm', 'G']


How to extract a substring from a string? You can use the following templates: 
- ``string[start:end]``: Get all characters from index start to end-1
- ``string[:end]``: Get all characters from the beginning of the string to end-1
- ``string[start:]``: Get all characters from index start to the end of the string
- ``string[start:end:step]``: Get every step-th character from index start to end-1

In [None]:
print(len(chords)) # The length of a string

# substrings
print(chords[0]) # The first character
print(chords[0:3]) # The first three characters
print(chords[:3]) # The first three characters
print(chords[5:10]) # The characters from index 5 to 9
print(chords[-4]) # The 4th character from the end
print(chords[-4:]) # The last four characters
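
The cell above does not exercise the `step` form of slicing. A minimal sketch, using a made-up string `letters` so that the result is easy to check by eye:

```python
letters = 'abcdefgh'  # hypothetical example string

every_other = letters[::2]        # every 2nd character, starting at index 0
middle_step = letters[1:7:2]      # indexes 1, 3, 5
reversed_letters = letters[::-1]  # a negative step walks backwards

print(every_other)       # aceg
print(middle_step)       # bdf
print(reversed_letters)  # hgfedcba
```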

Count the occurrence of a character in a string. 

For example, count the occurrence of spaces (' ') in the string. 

In [None]:
# Using a loop
count = 0
for i in range(len(chords)): 
    if chords[i] == ' ': 
        count += 1
count

In [None]:
# You can also iterate over all characters in a string like this: 
count = 0
for ch in chords: 
    if ch == ' ': 
        count += 1
count

In [None]:
# Using count()
chords.count(' ')

# Python List

See https://realpython.com/python-lists-tuples/

A Python list is a collection of arbitrary objects, similar to an array in many other programming languages.

- Lists are ordered.
- Lists can contain any arbitrary objects.
- List elements can be accessed by index.
- Lists can be nested to arbitrary depth.

In [2]:
# one dimensional list/array
a = ['apple', 'orange', 'kiwi', 'grape', 'cherry']
print(a[4])
print(a[1:3])
print(a[:3])
print(a[2:])
print(a[:-2])
print(a[1:-2])

cherry
['orange', 'kiwi']
['apple', 'orange', 'kiwi']
['kiwi', 'grape', 'cherry']
['apple', 'orange', 'kiwi']
['orange', 'kiwi']


In [7]:
# nested list/two dimensional array
b = [['apple', 2], ['orange', 5], ['kiwi', 4], ['grape', 3], ['cherry', 25]]
# show the quantity of kiwi
print(b[2][1])

4


In [4]:
# make sure you know the following list methods
c = a.append('peach')
print(c) # NOTE that append() changes a list in place and returns None, not a new list
print(a)
a.remove('kiwi')
print(a)

None
['apple', 'orange', 'kiwi', 'grape', 'cherry', 'peach']
['apple', 'orange', 'grape', 'cherry', 'peach']
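
The same in-place behavior applies to several other list methods. The sketch below contrasts `sorted()`, which returns a new list, with `sort()`, `insert()`, `pop()`, and `extend()`, which modify the list they are called on. The `fruits` list is an illustrative assumption.

```python
fruits = ['kiwi', 'apple', 'cherry']  # hypothetical example list

ordered = sorted(fruits)   # returns a NEW sorted list; fruits is unchanged
result = fruits.sort()     # sorts in place and returns None
print(ordered, result)

fruits.insert(1, 'banana')        # insert before index 1
last = fruits.pop()               # remove and return the last element
fruits.extend(['fig', 'grape'])   # append every element of another list
print(fruits, last)
```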


### List Comprehension
List comprehension formula:

``new_list = [expression (if conditional for changing the value) for member in iterable (if conditional for filtering the value)]``

In [5]:
d = [5, -2, 7, 3, -4, 10]

# create a list by replacing each number that is smaller than 9 with 'p' if positive, and 'n' if negative
d_new = ['p' if i > 0 else 'n' for i in d if i < 9]
print(d_new)

['p', 'n', 'p', 'p', 'n']
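
It can help to read a comprehension as a compact loop. The following sketch unrolls the comprehension above into an explicit `for` loop; the name `d_loop` is introduced only for this comparison.

```python
d = [5, -2, 7, 3, -4, 10]

# Explicit loop version
d_loop = []
for i in d:
    if i < 9:                                  # trailing "if": filters members
        d_loop.append('p' if i > 0 else 'n')   # leading "if/else": picks the value

# Comprehension version, for comparison
d_new = ['p' if i > 0 else 'n' for i in d if i < 9]

print(d_loop)
print(d_loop == d_new)
```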


In [6]:
# iterate over a list via list comprehension
# (not recommended: print() returns None, so this also builds a list of None values)
[print(i) for i in d]

5
-2
7
3
-4
10


[None, None, None, None, None, None]

In [None]:
# another way 
for i in d:
    print(i)

In [None]:
# check out f strings: https://realpython.com/python-f-strings/
# if you want to get the index

for i in range(len(d)):
    print(f'the number at index of {i} is {d[i]}')
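
When both the index and the value are needed, the built-in `enumerate()` is a common alternative to `range(len(...))`. A minimal sketch using the same list; the name `pairs` is an assumption for illustration:

```python
d = [5, -2, 7, 3, -4, 10]

# enumerate() yields (index, value) pairs
for i, value in enumerate(d):
    print(f'the number at index {i} is {value}')

pairs = list(enumerate(d))  # materialize the pairs to inspect them
print(pairs[0], pairs[-1])
```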

## Python Dictionary

A dictionary stores (key, value) pairs. 

In [13]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']         # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

cute
True
wet
N/A
wet
N/A


It is easy to iterate over the keys in a dictionary:

In [10]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs.' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"

A person has 2 legs.
A cat has 4 legs.
A spider has 8 legs.
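
To access keys and values together, the `items()` method can be used, and a dictionary comprehension can build a filtered dictionary in the same style as a list comprehension. A short sketch; the name `many_legs` is made up for this example:

```python
d = {'person': 2, 'cat': 4, 'spider': 8}

# items() yields (key, value) tuples
for animal, legs in d.items():
    print(f'A {animal} has {legs} legs.')

# Dictionary comprehension: keep only animals with more than two legs
many_legs = {animal: legs for animal, legs in d.items() if legs > 2}
print(many_legs)
```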


## Python Set

Unlike a list, a set is an *unordered* collection of distinct elements. 

In [9]:
animals = {'cat', 'dog'}
print('cat' in animals)   # Check if an element is in a set; prints "True"
print('fish' in animals)  # prints "False"
animals.add('fish')       # Add an element to a set
print('fish' in animals)  # Prints "True"
print(len(animals))       # Number of elements in a set; prints "3"
animals.add('cat')        # Adding an element that is already in the set does nothing
print(len(animals))       # Prints "3"
animals.remove('cat')     # Remove an element from a set
print(len(animals))       # Prints "2"

True
False
True
3
3
2
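
Sets also support the usual mathematical set operations, and converting a list to a set is a quick way to drop duplicates. The sketch below uses two small made-up sets:

```python
pets = {'cat', 'dog', 'fish'}   # hypothetical example sets
wild = {'fish', 'wolf'}

union = pets | wild       # elements in either set
common = pets & wild      # elements in both sets
only_pets = pets - wild   # elements in pets but not in wild

unique = set(['cat', 'dog', 'cat'])  # duplicates collapse to one element
print(len(union), common, only_pets, len(unique))
```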


## Python Tuple
A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. 

In [11]:
t = ('foo', 'bar', 'baz', 'qux', 'quux', 'corge')
print(t[0])
print(t[-1])

foo
corge


Tuples can be used as keys in dictionaries and as elements of sets. 

In [None]:
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
print(d)
t = (5, 6)        # Create a tuple
print(type(t))    # Prints "<class 'tuple'>"
print(d[t])       # Given a key (5, 6), find the corresponding value in dictionary d. 
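
The two differences mentioned above (immutability, and usability as dictionary keys) can be observed directly. This is a small sketch; the variable names are illustrative, and the exact error messages may vary between Python versions.

```python
point = (5, 6)
lookup = {point: 'tuple key'}   # a tuple works as a dictionary key

tuple_error = None
try:
    point[0] = 7                # tuples are immutable
except TypeError as err:
    tuple_error = str(err)

list_error = None
try:
    bad = {[5, 6]: 'list key'}  # lists are mutable, hence unhashable
except TypeError as err:
    list_error = str(err)

print(tuple_error)
print(list_error)
```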

# Numpy Basics

Reference chapter: https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html

- NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
- NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.

In [3]:
import numpy as np
np.__version__

'1.14.0'

Creating Arrays from Python Lists

In [3]:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

All elements in a NumPy array must be of the same type. If types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [4]:
# up-cast
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])

In [5]:
# explicit data type
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [6]:
# nested lists result in multi-dimensional arrays
# np.array([range(i, i + 3) for i in [2, 4, 6]])
np.array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

In [7]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

Creating Arrays from Scratch

In [8]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [9]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [10]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [11]:
# Create an array filled with a linear sequence
# Starting at 0 (inclusive), ending at 20 (exclusive), stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [12]:
# Create an array of five values evenly spaced between 0 and 10
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [13]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.93074935, 0.58165129, 0.41598154],
       [0.03528346, 0.78844325, 0.40981586],
       [0.85925164, 0.48453973, 0.53811313]])

In [14]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[-0.36589896,  0.78556532,  1.45614107],
       [-0.3610184 ,  0.86082952,  0.09295142],
       [-0.40495247, -1.07386006,  0.7747443 ]])

In [15]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[6, 3, 4],
       [2, 5, 1],
       [9, 8, 1]])

In [16]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [17]:
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array: 3 rows x 4 columns
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array: 3 two-dimensional arrays, each 4 rows x 5 columns

print(x1)
print("********")
print(x2)
print("********")
print(x3)

[5 0 3 3 7 9]
********
[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
********
[[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]


In [18]:
print(f'x3 ndim: {x3.ndim}')
print(f'x3 shape: {x3.shape}')
print(f'x3 size: {x3.size}')

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60


In [19]:
print("dtype:", x3.dtype)

dtype: int32


# Numpy Array Indexing and Slicing

To access a slice of an array x, use this:

``x[start:stop:step]``

The slice extends from the ‘start’ index and ends one item before the ‘stop’ index.

If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1.
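
A few one-dimensional cases of this template are sketched below on a fresh array, so the results are easy to verify. The variable names are assumptions for illustration. Note that with a negative step the slice runs backwards from `start`.

```python
import numpy as np

x = np.arange(10)           # [0 1 2 3 4 5 6 7 8 9]

first_five = x[:5]          # start defaults to 0
every_other = x[::2]        # start and stop default to the full range
odd = x[1::2]               # every other element, starting at index 1
reversed_from_5 = x[5::-2]  # backwards from index 5 in steps of 2

print(first_five)       # [0 1 2 3 4]
print(every_other)      # [0 2 4 6 8]
print(odd)              # [1 3 5 7 9]
print(reversed_from_5)  # [5 3 1]
```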

In [20]:
# you need to understand the following 
print(x1[4])
print(x2[2, 0])
print(x2[2, :])
print(x1[::-1]) # all elements reversed

7
1
[1 6 7 7]
[9 7 3 3 0 5]


In [21]:
print(x3[2]) # third element of x3 (index 2), which is a 4x5 array; same as x3[2,:]

[[4 9 8 1 1]
 [7 9 9 3 6]
 [7 2 0 3 5]
 [9 4 4 6 4]]


In [22]:
print(x3[2, 2, 4]) # third 4x5 array (index 2), row 3 and column 5, which gives you 5
print(x3[2, 1:4, 4]) # think what this is trying to do

5
[6 5 4]


# Numpy slices are views of the array - not copies!!

This default behavior of NumPy is very useful: when we handle large datasets, it lets us access and process part of a dataset without copying the underlying data, which could be slow and memory-intensive.

In [23]:
# slices of a list are copies of the list
# changing slices does not change the list
a = [1, 2, 3, 4, 5]
b = a[2:4]
print(b)
b[1] = 9
print(b)
print(a)

[3, 4]
[3, 9]
[1, 2, 3, 4, 5]


In [24]:
# slices of a Numpy array are views of the array
# changing the slices will change the original array!!!
c = np.array([1, 2, 3, 4, 5])
#d = c[2] # note c[2] is not a slice of c, if you want to have one element as a slice use c[2:3]
d = c[2:4]
print(d)
d[1] = 9
print(d)
print(c)

[3 4]
[3 9]
[1 2 3 9 5]


If you want to make a copy of the slice, you have to use the copy() method:

In [25]:
# c won't change
e = c[2:4].copy()
print(e)
e[1] = 8
print(e)
print(c)

[3 9]
[3 8]
[1 2 3 9 5]
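
If it is unclear whether an array is a view or a copy, NumPy can be asked directly. The sketch below uses `np.shares_memory()` and the `base` attribute; the variable names `view` and `independent` are made up for this example.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5])
view = c[2:4]                 # a slice: shares memory with c
independent = c[2:4].copy()   # an explicit copy: owns its own data

print(np.shares_memory(c, view))         # True
print(np.shares_memory(c, independent))  # False
print(view.base is c)                    # True: the view is built on c
print(independent.base is None)          # True: the copy has no base array
```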


# Other Useful Methods

- ``sort()``, ``argsort()``
- ``reshape()``
- ``concatenate()``
- ``split()``, ``vsplit()``, ``hsplit()``: split arrays, vertically or horizontally
- ``vstack()``, ``hstack()``: stacking arrays vertically or horizontally

In [26]:
# sorting 

a = np.array([2, 5, 4, 8, 3])
b = np.sort(a)
print(f'b is {b}')
print(f'a is not changed: {a}')
c = np.argsort(a) # argsort() returns the indexes of the sorted array
print(f'c is the indexes of the sorted a: {c}')
a.sort() # sorting in place, a is changed
print(f'a has been changed: {a}')

b is [2 3 4 5 8]
a is not changed: [2 5 4 8 3]
c is the indexes of the sorted a: [0 4 2 1 3]
a has been changed: [2 3 4 5 8]
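
A typical use of `argsort()` is to reorder a second array in the same order as the first. A minimal sketch, where the `names` array is a hypothetical set of labels that belong to the values in `a`:

```python
import numpy as np

a = np.array([2, 5, 4, 8, 3])
names = np.array(['b', 'e', 'd', 'h', 'c'])  # hypothetical labels for a

order = np.argsort(a)         # indexes that would sort a
sorted_a = a[order]           # same result as np.sort(a)
sorted_names = names[order]   # labels reordered to match
descending = a[order[::-1]]   # reverse the order for a descending sort

print(order)         # [0 4 2 1 3]
print(sorted_names)  # ['b' 'c' 'd' 'e' 'h']
print(descending)    # [8 5 4 3 2]
```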


In [27]:
# reshaping
a = np.arange(1, 10)
print(f'a is a simple one dimensional array: {a}')
grid = a.reshape((3, 3))
print(f'a becomes a 3x3 grid after reshaping:')
print(grid)

a is a simple one dimensional array: [1 2 3 4 5 6 7 8 9]
a becomes a 3x3 grid after reshaping:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
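
Two details of `reshape()` are worth trying out: one dimension may be given as `-1` so that NumPy infers it, and the total number of elements must stay the same. A short sketch under those assumptions:

```python
import numpy as np

a = np.arange(1, 10)          # 9 elements

grid = a.reshape((3, -1))     # -1: NumPy infers 3 columns
flat = grid.reshape(-1)       # back to one dimension

try:
    a.reshape((2, 5))         # 2 * 5 = 10 elements, but a has only 9
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(grid.shape, flat.shape, reshape_failed)
```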


In [28]:
# Concatenation of arrays
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
print(np.concatenate([x, y]))

# + for numpy is sum, not concatenation
z = x + y
print(z)

# list is different, + is concatenation
a = [1, 2]
b = [3, 4]
print(a + b)

[1 2 3 3 2 1]
[4 4 4]
[1, 2, 3, 4]


## axis

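For a two-dimensional array, axis 0 is the row axis (the first dimension, and the default for ``np.concatenate()``), and axis 1 is the column axis (the second dimension).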

In [29]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [30]:
# concatenate along the first axis (row), default
np.concatenate([grid, grid], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [31]:
# concatenate along the second axis (column) (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
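
The same `axis` argument appears in many other NumPy functions. As a sketch, summing along an axis collapses that axis: `axis=0` adds up the rows (one result per column), and `axis=1` adds up the columns (one result per row).

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

col_sums = grid.sum(axis=0)   # collapse the row axis: one sum per column
row_sums = grid.sum(axis=1)   # collapse the column axis: one sum per row
total = grid.sum()            # no axis: sum of all elements

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
print(total)     # 21
```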

In [32]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [33]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

In [34]:
# splitting arrays
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2 = np.split(x, [3])
print(x1, x2)

[1 2 3] [99 99  3  2  1]


In [17]:
# splitting arrays
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 6])
print(x1, x2, x3)

[1 2 3] [99 99  3] [2 1]


In [18]:
# split vertically
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [19]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
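
The horizontal counterpart is `np.hsplit()`, which cuts along the column axis. A minimal sketch on the same 4x4 grid; the names `left` and `right` are chosen for illustration:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split after column index 2: columns 0-1 go left, columns 2-3 go right
left, right = np.hsplit(grid, [2])
print(left)
print(right)
```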


## Computation on Arrays

In [None]:
x = np.arange(5)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

In [None]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

## NumPy functions

These arithmetic operations are convenient wrappers around specific functions built into NumPy. The following table lists the arithmetic operators implemented in NumPy:

- ``+``:	``np.add``	Addition 
- ``-``:	``np.subtract``	Subtraction 
- ``-``:	``np.negative``	Unary negation 
- ``*``:	``np.multiply``	Multiplication 
- ``/``:	``np.divide``	Division
- ``//``:	``np.floor_divide``	Floor division 
- ``**``:	``np.power``	Exponentiation 
- ``%``:	``np.mod``	Modulus/remainder 
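
The correspondence in the list above can be checked directly. The sketch below compares each operator with its NumPy function on a small array; the dictionary `checks` is introduced only to collect the results.

```python
import numpy as np

x = np.arange(5)

# Each entry compares an operator with the NumPy function behind it
checks = {
    '+': np.array_equal(x + 5, np.add(x, 5)),
    '-': np.array_equal(x - 5, np.subtract(x, 5)),
    'unary -': np.array_equal(-x, np.negative(x)),
    '*': np.array_equal(x * 2, np.multiply(x, 2)),
    '/': np.array_equal(x / 2, np.divide(x, 2)),
    '//': np.array_equal(x // 2, np.floor_divide(x, 2)),
    '**': np.array_equal(x ** 2, np.power(x, 2)),
    '%': np.array_equal(x % 2, np.mod(x, 2)),
}
print(checks)
```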

In [36]:
np.add(x,2)

array([2, 3, 4, 5, 6])

In [None]:
# Absolute value
x = np.array([-2, -1, 0, 1, 2])
abs(x)

In [None]:
np.absolute(x)

In [None]:
np.abs(x)

In [None]:
# Exponents  
x = [1, 2, 3, 4, 5]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

In [None]:
# Logarithms
x = [1, 2, 4, 10, 100]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

## Aggregation functions

Aggregates available in NumPy can be extremely useful for summarizing a set of values. 

As a simple example, let's consider the heights (cm) of US presidents. 

In [4]:
heights_cm = [189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173,
              174, 183, 183, 168, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183,
              177, 185, 188, 188, 182, 185, 190, 183]
heights = np.array(heights_cm)
print(heights)

[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
 174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
 177 185 188 188 182 185 190 183]


In [5]:
print("Mean height:       ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height:    ", heights.min())
print("Maximum height:    ", heights.max())

Mean height:        180.04545454545453
Standard deviation: 6.957515705579717
Minimum height:     163
Maximum height:     193


In [None]:
print("25th percentile:   ", np.percentile(heights, 25))
print("Median:            ", np.median(heights))
print("75th percentile:   ", np.percentile(heights, 75))
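
To see what these functions compute, it can help to try them on a tiny array where the answers are obvious. The values below are made up for illustration and are unrelated to the heights data.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])  # hypothetical example data

q25 = np.percentile(values, 25)
median = np.median(values)
q50 = np.percentile(values, 50)     # the median is the 50th percentile
q75 = np.percentile(values, 75)

print(q25, median, q75)  # 2.0 3.0 4.0
```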