# A Minimal Introduction to Python for ML

These notes introduce the minimal Python skills recommended for machine learning.

These notes do not attempt a comprehensive introduction to Python. For that, see [The Python Tutorial](https://docs.python.org/3/tutorial/index.html) by the Python Software Foundation. The sections most relevant to the content below are:

> 3) An Informal Introduction to Python
>
> 4) More Control Flow Tools
>
> 5) Data Structures (5.1, 5.5-6)
>
> 6) Modules (beginning and 6.4)
>
> 7) Input and Output (7.2)
>
> 9) Classes (9.1, 9.3, 9.9)
>
> 10) Brief Tour of the Standard Library (10.6)
>
> 12) Virtual Environments and Packages

Beyond base Python, we also include some content on powerful libraries that help with machine learning:

* `NumPy` for math and multidimensional arrays
* `matplotlib` for plotting

We do not cover Python installation: it is straightforward with [Anaconda](https://www.anaconda.com) and unnecessary with [Google Colab](http://colab.research.google.com). There are many other options for writing and running Python; for example, PyCharm, Visual Studio Code, and Eclipse + PyDev.

The notes below are aimed at readers who already have experience with another programming language; for example, C++, MATLAB, or Java.

## Variables

In [None]:
x1 = 'string'
print('The type of x1 is', type(x1))

x2 = 5
print('The type of x2 is', type(x2))

x3 = 5.5
print('The type of x3 is', type(x3))

x4 = int(x3)
print('x4 is', x4)
print('The type of x4 is', type(x4))

x5 = str(x2)
print(x5)
print('The type of x5 is', type(x5))

x6 = [3, 4, 1]
print('The type of x6 is', type(x6))

x7 = (3, 4, 1)
print('The type of x7 is', type(x7))

The type of x1 is <class 'str'>
The type of x2 is <class 'int'>
The type of x3 is <class 'float'>
x4 is 5
The type of x4 is <class 'int'>
5
The type of x5 is <class 'str'>
The type of x6 is <class 'list'>
The type of x7 is <class 'tuple'>
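
Two details of the cell above are easy to miss, so here is a small sketch of them: `int()` drops the fractional part rather than rounding, and a tuple, unlike a list, cannot be modified after it is created. The variable names below are chosen only for this illustration.

```python
# int() truncates toward zero; round() rounds to the nearest integer
truncated = int(5.9)
negative = int(-5.9)
rounded = round(5.9)

# lists are mutable, so an item can be reassigned
numbers_list = [3, 4, 1]
numbers_list[0] = 10

# tuples are immutable, so the same assignment raises a TypeError
numbers_tuple = (3, 4, 1)
try:
    numbers_tuple[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(truncated, negative, rounded)
print(numbers_list, numbers_tuple, tuple_changed)
```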


## Lists

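Lists are ordered, mutable sequences: items keep their position, and a list can be changed in place after it is created.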

In [None]:
my_list = [1, 2, 3]

# create a list made up of 4 copies of the original list
four_lists = 4 * my_list

# print the new list
print(four_lists)

# append 4 to the end of the list
my_list.append(4)

# print the updated list
print(my_list)

# append 5 to the end of the list
my_list.append(5)

# print the updated list
print(my_list)

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
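
Besides repetition and `append`, reading items back out of a list comes up constantly. The following is a minimal sketch of indexing and slicing; the variable names are illustrative only.

```python
my_list = [1, 2, 3, 4, 5]

first = my_list[0]       # indexing starts at 0
last = my_list[-1]       # negative indices count from the end
middle = my_list[1:4]    # slice: indices 1, 2, 3 (the end index is excluded)
length = len(my_list)    # number of items in the list

print(first, last, middle, length)
```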


## Flow Control

### `if` Statements

In [None]:
my_string = 'yellow'

if my_string[0] == 'h':
    print('my_string starts with the letter h')
else:
    print('my_string does not start with the letter h')
    print('The first letter of my_string is', my_string[0])

my_string does not start with the letter h
The first letter of my_string is y


### Loops

In [None]:
for x in range(5):
    print(x)

0
1
2
3
4


In [None]:
my_list = [6, 2, 8]

# print half of each number in my_list (// is integer division)
for x in my_list:
    print(x // 2)

# create a list consisting of half of each number in my_list
print([x // 2 for x in my_list])

3
1
4
[3, 1, 4]
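
The list comprehension in the last line is shorthand for a loop that builds a list one item at a time. A small sketch of that equivalence follows, with `enumerate` added for the case where the position of each item is also needed. The names `halves_loop`, `halves_comprehension`, and `indexed` are made up for this example.

```python
my_list = [6, 2, 8]

# explicit loop: start with an empty list and append one result per item
halves_loop = []
for x in my_list:
    halves_loop.append(x // 2)

# the same result written as a list comprehension
halves_comprehension = [x // 2 for x in my_list]

# enumerate yields (index, item) pairs
indexed = [(i, x) for i, x in enumerate(my_list)]

print(halves_loop == halves_comprehension)
print(indexed)
```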


### Exercises

1. Write a loop that extracts the first $n$ digits of $\pi$ as integers.

2. Write a loop to estimate $e^x$ as the sum of the first $n$ terms of its Taylor series expansion, $e^x\approx\sum\limits_{i=0}^{n-1} \frac{x^i}{i!}$.

## Functions

Functions are reusable blocks of code that run whenever they are called. They can take input parameters and return outputs, but both are optional.

### `print` function

The `print` function in Python is one of the simplest built-in functions. It takes input parameters and displays them, separated by spaces by default.

To run it, we type `print(inputs)` where `inputs` is the thing or things we want to print. Some examples:

In [None]:
# print a string <- when a line starts with #, the text is treated as a comment.
#                   It is best practice to use comments to describe your code.
print('Hello, World!')

# print a number
print(4)

# print a list
print([4, 2, 5])

# print a variable
a = 4
print(a)

# print a sum of two variables
b = 3
print(a + b)

# print a mixture of numbers, their sum, and strings
print('The sum of', a, 'and', b, 'is', a + b)

Hello, World!
4
[4, 2, 5]
4
7
The sum of 4 and 3 is 7
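
The default separator mentioned above can be changed. As a brief sketch, the built-in `print` accepts the keyword arguments `sep` (placed between items) and `end` (placed after the last item); an f-string is another common way to build the same kind of message. The variable `message` is introduced only for this example.

```python
a = 4
b = 3

# sep replaces the default single space between printed items
print(a, b, a + b, sep=', ')

# end replaces the default trailing newline
print('no newline here', end='')
print(' ... continued on the same line')

# an f-string builds the sentence as a single string
message = f'The sum of {a} and {b} is {a + b}'
print(message)
```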


### Custom Functions

Python has many useful built-in functions, but we will frequently want to write our own custom functions, usually whenever we have some code we want to reuse.

Suppose we want to make a function called `add` that adds two inputs `a` and `b` and returns the result.

To create a function, we write `def` and then the function's name `add`, followed by parentheses containing the two input parameters it requires, and a colon `:`. All lines below this, as long as they are indented one level, form the body of the function and run each time it is called:

In [None]:
def add(a, b):
    # this code runs when the function is called
    print('Adding', a, 'and', b, '...')
    return a + b

# this code does NOT run when the function is called
print('hi')

hi


Note that running the cell above defines the function and prints `hi`, but it does not call the function. We can call our `add` function with any two inputs for which addition is defined:

In [None]:
# add two numbers (named result, since sum would shadow Python's built-in sum function)
result = add(3, 4)

# print the result and skip a line
print(result, '\n')

# add two strings
result = add('Machine', 'Learning')

# print the result and skip a line
print(result, '\n')

# add two lists (this will CONCATENATE the lists into one long list)
result = add([1, 2, 3], [4, 5, 6])

# print the result
print(result)

Adding 3 and 4 ...
7 

Adding Machine and Learning ...
MachineLearning 

Adding [1, 2, 3] and [4, 5, 6] ...
[1, 2, 3, 4, 5, 6]
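
Addition is not defined for every pair of types. The sketch below uses a simplified `add` (without the `print` call) to show what happens when an integer and a string are mixed, and how converting one input first makes the call valid. The flag `mixed_ok` exists only for this illustration.

```python
def add(a, b):
    return a + b

# int + str is not defined, so Python raises a TypeError
try:
    add(3, 'four')
    mixed_ok = True
except TypeError as error:
    mixed_ok = False
    print('Cannot add:', error)

# converting one input first makes the operation well defined
print(add(3, int('4')))       # numeric addition
print(add(str(3), 'four'))    # string concatenation
```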


## Classes

Classes bundle variables and functions together. Creating a new class creates a new *type* of object, allowing new *instances* of that type to be made.

For example, suppose we want to make a linear regression class that can read $n$ datapoints, each with $d$ dimensions, as an $n\times d$ matrix and fit a linear regression model to the points by the method of ordinary least squares.

In [None]:
# import the numpy library
import numpy as np

# create a class
class LinearRegression:

    def __init__(self, X, y):
        # prepend a column of ones to X (for the intercept) and save as an instance attribute
        self.data = np.hstack((np.ones([X.shape[0], 1]), X))

        # save the training labels as an instance attribute
        self.outputs = y

    # fit the model to the (training) data stored by __init__
    def fit(self):
        # get the data and outputs
        X = self.data
        y = self.outputs

        # compute optimal values for theta and save as an instance attribute
        self.theta = np.linalg.inv(X.T @ X) @ X.T @ y

    # predict the output from input (testing) data
    def predict(self, X):
        # prepend a column of ones to X
        X = np.hstack((np.ones([X.shape[0], 1]), X))

        # return the outputs
        return X @ self.theta
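
To show how an instance of such a class might be created and used, here is a hedged sketch. It repeats a condensed version of the class under the hypothetical name `OLSRegression` so that the cell runs on its own, and it assumes `fit` takes no arguments because the training data were already passed to the constructor. The data are made up and lie exactly on the line $y = 1 + 2x$.

```python
import numpy as np

class OLSRegression:
    """Condensed restatement of the class above (hypothetical name)."""

    def __init__(self, X, y):
        # prepend a column of ones for the intercept term
        self.data = np.hstack((np.ones([X.shape[0], 1]), X))
        self.outputs = y

    def fit(self):
        X, y = self.data, self.outputs
        # ordinary least squares via the normal equations
        self.theta = np.linalg.inv(X.T @ X) @ X.T @ y

    def predict(self, X):
        X = np.hstack((np.ones([X.shape[0], 1]), X))
        return X @ self.theta

# made-up training data: n = 4 points, d = 1 dimension, on y = 1 + 2x
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# create an instance, fit it, and predict on new inputs
model = OLSRegression(X_train, y_train)
model.fit()

X_test = np.array([[4.0], [5.0]])
predictions = model.predict(X_test)

print('theta:', model.theta)
print('predictions:', predictions)
```

Because the data are noise-free, the fitted `theta` should be close to an intercept of 1 and a slope of 2, up to floating-point error.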

## Importing and Using Libraries

### NumPy Basics

In [None]:
import numpy as np
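
As a minimal sketch of what NumPy offers for math and multidimensional arrays, the cell below creates a small array and applies a few common operations. The array `A` and its values are chosen only for illustration.

```python
import numpy as np

# a 2 x 3 array (2 rows, 3 columns)
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)          # (rows, columns)
print(A * 2)            # arithmetic is applied element by element
print(A.T)              # transpose: a 3 x 2 array
print(A @ A.T)          # matrix product: a 2 x 2 array
print(A.sum(axis=0))    # sum down each column
```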

### pandas Basics

In [None]:
import pandas as pd

### scikit-learn Basics

In [None]:
import sklearn