In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Introduction to Programming in Python

In addition to the many algorithms you are about to learn, this course introduces two tools that are used extensively in both industry and academia: the Python programming language and Jupyter Notebook. This tutorial is meant to get you up to speed with the language and environment we will use during this course. Python is simple to learn once you know object-oriented programming, and it is relatively simple to install, run and debug.

## Python

Python is an easy-to-learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms. Some benefits of Python are:
* Interpreted language: the language is processed by the interpreter at runtime, so you don’t have to compile the program before execution.
* Interactive: you can directly interact with the interpreter at the Python prompt for writing your program.
* Versatile: supports the development of applications ranging from games to browsers to text processing.


## Jupyter Notebook 

Jupyter Notebook is a powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualisations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks a popular choice in contemporary data science, analysis, and increasingly in science at large. Two terms are worth noticing: **cells** and **kernels**. They are key both to understanding Jupyter and to what makes it more than just a word processor.

A **kernel** is a "computational engine" that executes the code contained in a notebook document.

A **cell** is a container for text to be displayed in the notebook or code to be executed by the notebook's kernel.

There are two types of cells: a *code* cell contains code to be executed in the kernel and displays its output below.
A *Markdown* cell contains text formatted using Markdown and displays its output in-place when it is run.

The first cell in a new notebook is always a code cell. Let's test it out with a classic hello world example. Type `print('Hello World!')` into the cell and press `Ctrl + Enter`.

In [None]:
print('Hello World!')

## Basic Data Types
Much like Java, Python has a number of basic types, including integers, floats, Booleans and strings. Most behave as you would expect.

### Numbers
For more examples and information, use the official [documentation](https://docs.python.org/3.5/library/stdtypes.html#numeric-types-int-float-complex).

In [None]:
x = 3

In [None]:
# The type function works on every possible variable
print(type(x))

In [None]:
print(x)

In [None]:
print(x + 1) # addition

In [None]:
print(x - 1) # subtraction

In [None]:
print(x * 2) # multiplication

In [None]:
print(x ** 2) # exponentiation

In [None]:
x += 1 # same as x = x + 1
print(x)

In [None]:
x *= 2 # same as x = x * 2
print(x)

In [None]:
y = 2.5
print(type(y))
print(y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"

### Print the result of $\frac{x^x - 4\cdot y}{x - y}$

In [None]:
## Your code here ##

### Booleans

Booleans in Python are written with the keywords `True` and `False`. Because `bool` is a subclass of `int`, they also compare equal to the integers `1` and `0`. Basic Boolean operators are English words (`and`, `or`, `not`) rather than symbols.

In [None]:
t = True
f = False

In [None]:
print(t == 1)
print(f == 0) 
print(f == 1)

In [None]:
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
print(t != f)  # Logical XOR; prints "True"
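As a small sketch of how Booleans relate to integers, the cell below checks the type relationship and uses it to count `True` values. The `flags` list is made up for illustration.

```python
# bool is a subclass of int, so True and False behave like 1 and 0 in arithmetic
print(isinstance(True, int))  # True
print(True + True)            # 2

# A common use: counting how many conditions hold
flags = [True, False, True, True]  # illustrative data
num_true = sum(flags)
print(num_true)               # 3
```

Even so, prefer `True` and `False` over `1` and `0` when you mean a truth value; the intent is clearer to the reader.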

## Strings

Strings are declared using single or double quotes (but not both):

In [None]:
str1 = 'hello'
str2 = "hello"
str1 == str2

This way, it is easy to include one kind of quote inside a string delimited by the other kind. Note that if the last line in a cell is an expression, such as a variable name, its value is displayed, so we don't need to call `print`.

In [None]:
str1 = "Y'all"
str1

In [1]:
str2 = 'Y'all'
str2

SyntaxError: unterminated string literal (detected at line 1) (3159203118.py, line 1)
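If a string has to contain the same kind of quote that delimits it, two options are to escape the quote with a backslash or to use a triple-quoted string. A minimal sketch, with variable names chosen only for illustration:

```python
# Escape the inner single quote with a backslash
escaped = 'Y\'all'

# Or delimit the string with the other kind of quote
double_quoted = "Y'all"

# Triple quotes allow both kinds of quote inside the string
both_quotes = '''She said "Y'all"'''

print(escaped == double_quoted)  # True
print(both_quotes)               # She said "Y'all"
```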

Python strings can be manipulated in many ways. Use the [documentation](https://docs.python.org/3.5/library/stdtypes.html#string-methods) for more useful methods.

In [None]:
s1 = 'hello'
s2 = 'world'
s3 = '  world '
print(len(s1))
print(s1 + ' ' + s2)
print(s3)

### Find methods in the documentation for the following:

1. Capitalize the first letter in s1
2. Convert a string to uppercase (use s1)
3. Replace all instances of one substring with another (Replace the letter 'l' with '(ell)' in s1)
4. Remove leading and trailing whitespaces from s3.

A single line should suffice for each part.

In [None]:
## Your code here ##

## Containers
Python includes several built-in container types: lists, dictionaries, sets, and tuples.

### [Lists](https://docs.python.org/3.5/tutorial/datastructures.html#more-on-lists)
A list is the Python equivalent of an array, but is resizable and can contain elements of different types:

In [None]:
xs = [3, 1, 2]    # Create a list

In [None]:
xs = [3, 1, 2]    # Create a list
print(xs, xs[2])  # Prints "[3, 1, 2] 2"
print(xs[-1])     # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo'     # Lists can contain elements of different types
print(xs)         # Prints "[3, 1, 'foo']"
xs.append('bar')  # Add a new element to the end of the list
print(xs)         # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop()      # Remove and return the last element of the list
print(x, xs)      # Prints "bar [3, 1, 'foo']"

**Slicing**: In addition to accessing list elements one at a time, Python provides concise syntax to access sublists. This is known as slicing.

In [None]:
nums = list(range(5))     # range(5) yields the integers 0 to 4; list() collects them into a list
print(nums)               # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])          # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])           # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])           # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])            # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])          # Slice indices can be negative; prints "[0, 1, 2, 3]"
print(nums[::-1])         # A step of -1 reverses the list; prints "[4, 3, 2, 1, 0]"
nums[2:4] = [8, 9]        # Assign a new sublist to a slice
print(nums)               # Prints "[0, 1, 8, 9, 4]"
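Slices also accept an optional third value, the step. The sketch below uses a made-up `letters` list to show a few step patterns, and that a full slice produces a new list rather than an alias.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']  # illustrative data

every_other = letters[::2]         # start at index 0, take every second element
odd_positions = letters[1::2]      # start at index 1, take every second element
middle_reversed = letters[4:1:-1]  # walk backwards from index 4 down to index 2

print(every_other)      # ['a', 'c', 'e']
print(odd_positions)    # ['b', 'd', 'f']
print(middle_reversed)  # ['e', 'd', 'c']

# A full slice creates a (shallow) copy, so editing it leaves the original intact
letters_copy = letters[:]
letters_copy[0] = 'z'
print(letters[0])       # a
```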

**Loops**: You can easily loop over the elements of a list.

In [None]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)

If you want access to the index of each element within the body of a loop, use the built-in `enumerate` function.

In [None]:
animals = ['cat', 'dog', 'monkey']
for i, animal in enumerate(animals):
    print('Number {}: {}'.format(i, animal))

**List comprehensions**: When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [None]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)  

You can make this code simpler using a list comprehension.

In [None]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)

List comprehensions can also contain conditions.

In [None]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)

### [Dictionaries](https://docs.python.org/3.5/library/stdtypes.html#dict)
A dictionary stores (key, value) pairs, similar to a Map in Java.

In [None]:
d = {'cat': 'cute', 'dog': 'furry'}

In [None]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']         # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

**Loops**: It is easy to iterate over the keys in a dictionary.

In [None]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    print(animal)
#     legs = d[animal]
#     print('A %s has %d legs' % (animal, legs))

If you want access to keys and their corresponding values, use the `items` method.

In [None]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():  # Each item is a (key, value) tuple, unpacked here
    print('A {} has {} legs'.format(animal, legs))

**Dictionary comprehensions**: These are similar to list comprehensions, but allow you to easily construct dictionaries.

In [None]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square)  # Prints "{0: 0, 2: 4, 4: 16}"

### [Sets](https://docs.python.org/3.5/library/stdtypes.html#set)
A set is an unordered collection of distinct elements. As a simple example, consider the following:



In [None]:
animals = {'cat', 'dog', 'cat', 'dog'}
print('cat' in animals)   # Check if an element is in a set; prints "True"
print('fish' in animals)  # prints "False"
animals.add('fish')       # Add an element to a set
print('fish' in animals)  # Prints "True"
print(len(animals))       # Number of elements in a set; prints "3"
animals.add('cat')        # Adding an element that is already in the set does nothing
print(len(animals))       # Prints "3"
animals.remove('cat')     # Remove an element from a set
print(len(animals))       # Prints "2"
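Sets also support the usual mathematical set operations. A brief sketch with two made-up sets, `a` and `b`:

```python
a = {1, 2, 3, 4}  # illustrative data
b = {3, 4, 5}

union = a | b         # elements in either set
intersection = a & b  # elements in both sets
difference = a - b    # elements in a but not in b

print(union)          # {1, 2, 3, 4, 5}
print(intersection)   # {3, 4}
print(difference)     # {1, 2}

# Converting a list to a set is a quick way to drop duplicates
unique_sorted = sorted(set([3, 1, 3, 2, 1]))
print(unique_sorted)  # [1, 2, 3]
```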

### [Tuples](https://docs.python.org/3.5/tutorial/datastructures.html#tuples-and-sequences)
A tuple is an immutable ordered list of values. A tuple is in many ways similar to a list, but its elements cannot be reassigned after it is created:

In [None]:
lst = [1, 2, 3]
print(lst)
lst[0] = 11
print(lst)

In [None]:
tup = (1, 2, 3)
print(tup)
tup[0] = 11  # Raises TypeError: 'tuple' object does not support item assignment

One of the most important differences between tuples and lists is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:

In [None]:
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6)        # Create a tuple
print(type(t))    # Prints "<class 'tuple'>"
print(d[t])       # Prints "5"
print(d[(1, 2)])  # Prints "1"
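To see the contrast with lists, the sketch below tries to use a list as a dictionary key and catches the resulting error. The names `point`, `grid` and `error_message` are chosen only for illustration.

```python
point = (2, 3)  # a tuple is hashable, so it can be a key or a set element
grid = {point: 'start'}
visited = {point}

# A list is mutable and therefore unhashable; using it as a key fails
try:
    bad_grid = {[2, 3]: 'start'}
except TypeError as err:
    error_message = str(err)
    print(error_message)  # unhashable type: 'list'

# Tuples can also be unpacked into separate variables
x_coord, y_coord = point
print(x_coord, y_coord)   # 2 3
```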

## Functions
Python functions are defined using the `def` keyword. Notice that indentation in Python is **mandatory**.

In [None]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))

We will often define functions to take optional keyword arguments.

In [None]:
def hello(name, caps=True):
    if caps:  # Truthiness test; equivalent here to caps == True
        print('HELLO, {}!'.format(name.upper()))
    else:
        print('Hello, {}'.format(name))

In [None]:
hello('Bob', caps=False)

In [None]:
hello('Bob', caps=True) # Prints "HELLO, BOB!"

### Write a function that calculates the first n numbers of the Fibonacci sequence as a list and prints it.

In [None]:
def fib(n):
    pass

fib(10) # should print [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

## Classes
The syntax for defining classes in Python is straightforward:

In [None]:
class Greeter():
    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

# g = Greeter('Fred')  # Construct an instance of the Greeter class
# g.greet()            # Call an instance method; prints "Hello, Fred"
# g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"

In [None]:
g = Greeter('Fred')

In [None]:
g.name

In [None]:
g.greet(loud=True)

# Numpy
numpy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The `import` statement makes numpy available in our current session.

In [None]:
import numpy as np

Note that above we imported the numpy package "as np". This is for convenience; it allows us to use `np` as a prefix instead of `numpy`. numpy is in very widespread use, and the convention is to use the `np` abbreviation.


### Numpy Arrays

The core of numpy is the `ndarray` (n-dimensional array) data structure. Just as we can convert between lists, tuples, and the other data types we've looked at, we can convert a list to a NumPy array using `np.array()`. A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [None]:
# Create a NumPy array from a list
arr = np.array([1, 2, 3, 4])
print(arr)
print(type(arr))
print(arr.shape)
print("------------------")
mat = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(mat)
print(type(mat))
print(mat.shape)
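The "all of the same type" property can be inspected through the array's `dtype`, and the rank through `ndim`. A short sketch, assuming numpy is installed; the array names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers, so an integer dtype is chosen
mixed = np.array([1, 2.5, 3])     # one float forces every element to float64
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])      # a list of lists becomes a 2-dimensional array

print(ints.dtype)                 # an integer dtype; the exact width depends on the platform
print(mixed.dtype)                # float64
print(grid.ndim, grid.shape, grid.size)  # 2 (2, 3) 6
```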

Accessing values in an ndarray is done by using square brackets.

In [None]:
arr = np.array([1, 2, 3, 4])

In [None]:
print(arr[0])
arr[0] = 10
print(arr)

In [None]:
print(mat[1])
print(mat[1,1])
print(mat[1][1])
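Square brackets also accept slices along each dimension, much like list slicing. The sketch below assumes numpy is installed and uses a made-up 3x3 array called `table`. Note that basic slices are views onto the original data, not copies.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])  # illustrative data

first_row = table[0, :]    # every column of row 0
second_col = table[:, 1]   # every row of column 1
top_left = table[:2, :2]   # the upper-left 2x2 block

print(second_col)          # [2 5 8]
print(top_left)            # [[1 2]
                           #  [4 5]]

# Writing through a slice changes the original array as well
top_left[0, 0] = 100
print(table[0, 0])         # 100
```

If an independent copy is needed, call `.copy()` on the slice.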

Numpy arrays support a wide range of methods and operations. 

In [None]:
print(arr)

In [None]:
print(arr.max())
print(arr.min())
print(arr.sum())
print(arr.mean())
print(arr.std())

There are other ways to make ndarrays:

In [None]:
n = 10
np.zeros(n)

In [None]:
np.ones(n)

In [None]:
np.zeros_like(arr) # create an ndarray of zeros with the same shape and dtype as arr

In [None]:
np.eye(4) # creates a 4x4 identity matrix

In [None]:
np.random.random(size=(5,5)) # creates a 5x5 array of random values drawn uniformly from [0, 1)

### Consider the vector [1, 2, 3, 4, 5]. How would you build a new vector with 2 consecutive zeros between each value? This can be done in 2-3 lines without loops.

In [None]:
arr = np.array([1,2,3,4,5]) # result should be [1. 0. 0. 2. 0. 0. 3. 0. 0. 4. 0. 0. 5.]

### Broadcasting
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [None]:
x = np.eye(4)

In [None]:
x

In [None]:
x + 55

In [None]:
np.eye(4)

In [None]:
mat = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [None]:
mat = np.vstack((mat, [10,11,12]))
print(mat)
vec = np.array([-1,0,1])
print(vec)

In [None]:
result = np.zeros_like(mat)

for row in range(4):
    result[row] = mat[row] + vec
    
print(result)

While this works, explicit loops in Python are often slow and should generally be avoided. Notice, however, that adding `vec` to each row of `mat` is the same as adding a matrix built by stacking 4 copies of `vec` vertically.

In [None]:
mat_vec = np.tile(vec, (4,1))
print(mat_vec)

In [None]:
result = mat + mat_vec
print(result)

Broadcasting allows us to perform this type of computation without making any copies.

In [None]:
mat + vec

In [None]:
vec.shape

This line works even though `mat.shape = (4,3)` and `vec.shape = (3,)` due to broadcasting.
Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.

The [documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) contains additional examples and explanations. 
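The broadcasting rules listed above can be traced on a small example. This is a sketch with made-up arrays `col` and `row`, assuming numpy is installed. It shows one compatible pair of shapes and one incompatible pair.

```python
import numpy as np

col = np.array([[0], [10], [20], [30]])  # shape (4, 1)
row = np.array([1, 2, 3])                # shape (3,)

# Rule 1: row's shape (3,) is treated as (1, 3).
# Rules 2-3: (4, 1) and (1, 3) are compatible, since each dimension has a 1.
# Rule 4: the result has the elementwise maximum shape, (4, 3).
total = col + row
print(total.shape)  # (4, 3)
print(total)        # [[ 1  2  3]
                    #  [11 12 13]
                    #  [21 22 23]
                    #  [31 32 33]]

# Shapes (4, 3) and (4,) are not compatible: (4,) becomes (1, 4), and 3 != 4
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    broadcast_error = str(err)
    print('Broadcast failed:', broadcast_error)
```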

Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the documentation.

### Subtract the mean of each row of a matrix. This can be done in a single line.

In [None]:
X = np.random.rand(5, 10)

Y = None

## Numpy arrays vs Python lists

You should use numpy implementations whenever possible. numpy stores an array in a homogeneous, contiguous block of memory, unlike regular Python lists, which hold references to objects that may be scattered across memory. Spatial locality in memory access yields performance gains, notably from the CPU cache, and allows numpy to take advantage of the vectorized instructions of modern CPUs. In addition, a large part of numpy is written in C, so the performance boost from using numpy is significant and well worth your while. For example, run the following blocks of code:

In [None]:
arr1 = np.random.choice(10, size=10_000_000)
arr2 = np.random.choice(10, size=10_000_000)

In [None]:
%%time
naive_dot = 0
for i in range(10_000_000):
    naive_dot += arr1[i] * arr2[i]

In [None]:
%%time
numpy_dot = arr1.dot(arr2)

In [None]:
numpy_dot == naive_dot

## Matplotlib
Matplotlib is a plotting library. In this section we give a brief introduction to the `matplotlib.pyplot` module, which provides a plotting system similar to that of MATLAB.

### [Plots](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot)
The most important function in matplotlib is plot, which allows you to plot 2D data.

In [None]:
import matplotlib.pyplot as plt

In [None]:
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()  # Displays the figure; required in scripts, while notebooks usually render it automatically

Multiple lines, a title, a legend and axis labels can be added with further `pyplot` functions:

In [None]:
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()

### [Subplots](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot)

In [None]:
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)

# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')

# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')

# Show the figure.
plt.show()

# Jupyter Notebook Tricks

Pressing `Esc` or clicking the area to the left of a cell will take you into command mode, where you can navigate the notebook using the arrow keys and use additional commands:
* `A` to insert a new cell above the current cell, `B` to insert a new cell below.
* `M` to change the current cell to Markdown mode, `Y` to change it back to code.
* Press `D` twice to delete the current cell.

Once in a cell, `Shift + Tab` will show you the docstring (documentation) for the object you have just typed in a code cell. You can keep pressing this shortcut to cycle through a few modes of documentation.

`Shift + M` will merge the selected cells.

Adding a semicolon `;` at the end of a line will suppress the output (useful when plotting).

## Magic Commands

`%%time` reports the time taken by a single run of the code in your cell.
`%%timeit` uses the Python `timeit` module to run the cell many times (the number of loops is chosen automatically unless you specify it) and reports timing statistics over several repeats.
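Outside a notebook, the same `timeit` module can be called directly. The sketch below is one way to do that; `sum_squares` is a hypothetical function used only as something to time, and the loop counts are arbitrary.

```python
import timeit

def sum_squares(n=1000):
    """Toy workload: sum of the squares of 0..n-1."""
    return sum(i * i for i in range(n))

# Total time, in seconds, for 200 calls of the function
elapsed = timeit.timeit(sum_squares, number=200)

# Repeat the measurement 5 times and keep the fastest run,
# which is least affected by other activity on the machine
best = min(timeit.repeat(sum_squares, number=200, repeat=5))

print('total for 200 calls: {:.6f} s'.format(elapsed))
print('best of 5 repeats:   {:.6f} s'.format(best))
```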