# P-ai AI/ML Workshop: Session 1

Welcome to P-ai's first session of the AI/ML workshop series! Today we'll learn about:
- What *is* machine learning? How is it different from "AI" or "Deep Learning"?
- An overview of a typical machine learning problem and solution
- The coding environment
- Python crash course

<img src="https://images.squarespace-cdn.com/content/5d5aca05ce74150001a5af3e/1580018583262-NKE94RECI46GRULKS152/Screen+Shot+2019-12-05+at+11.18.53+AM.png?content-type=image%2Fpng" width="200px">

## 1. AI, GI, ML, DL: what do they mean?

- AI (Artificial Intelligence): Something artificially created that appears to behave intelligently
- GI (General Intelligence): Refers to an AI that can learn anything*, like a human
- ML (Machine Learning) and DL (Deep Learning)... we'll see!

\* *Well, not anything. Humans aren't very good at reading QR codes, for example*

According to the Oxford Dictionary, machine learning is:

>"The use and development of computer systems that are able to **learn** and adapt **without following explicit instructions**, by using **algorithms and statistical models** to analyze and **draw inferences from patterns in data**."

A machine learning *algorithm* is an algorithm (designed by a human) that can discover patterns within data *on its own*. Let's put ourselves in the shoes of a machine learning model for a moment:

| x  | y  |
|----|----|
| 2  | 5  |
| 21 | 43 |
| 36 | 73 |
| 7  | 15 |
| 16 | ?  |

Even though you weren't told what the relationship is between $x$ and $y$, you were able to analyze the data and find a mapping $f$ from $x$ to $y$, which allowed you to predict $f(16)$. For certain tasks, this comes somewhat naturally to humans, but computers need to be programmed to know *how* to look for patterns.  
Let's briefly cover arguably the simplest algorithm that falls under the umbrella of machine learning: linear regression, or, as you might know it better, the line of best fit.

### Linear Regression

<img src="https://imgs.xkcd.com/comics/linear_regression.png" width="400px">

*Quick side note: an **algorithm** is a set of instructions for doing something, like a baking recipe or a flow chart. We humans use algorithms sometimes (e.g. to calculate our GPA, or to follow a strict morning routine), but most of the time we just "intuit" things, like noticing that a friend looks sad or recognizing a red light. Computers can only follow algorithms, so finding algorithms to solve the "wishy-washy" problems that humans are so good at is at the heart of machine learning.*

**Linear regression** tries to find a **linear relationship** between $x$ and $y$. Going back to our previous example, this means the model will try to find a line $\hat{y} = mx + b$ such that $\hat{y} \approx y$, i.e. the line that **fits** the data best.

$$ \hat{y} = mx + b $$

Some questions this raises:

> Which **parameters** does the model need to **learn**?

In this case, the slope $m$ and y-intercept $b$.

> How will the model find those parameters?

We'll answer this in a later session!

> What does it mean for the model to **fit the data best**?

We'll hold off on answering this question too, but it's worth thinking about. First, let's consider which parameters the model is trying to find in this case.

Here's our training data again:

| x  | y  |
|----|----|
| 2  | 5  |
| 21 | 43 |
| 36 | 73 |
| 7  | 15 |

> What values of $m$ and $b$ is the model trying to find?

$m = 2$ and $b = 1$. This would make our model $\hat{y} = 2x + 1$, which would fit our data perfectly. Then, using this model, we can make predictions, or **inferences**:

$$ f(16) = 2(16) + 1 = 33 $$
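If you'd like to check this yourself, here is a minimal sketch using NumPy's `np.polyfit`, which fits a polynomial of a chosen degree by least squares (degree 1 is a line). It deliberately hides *how* the parameters are found, so treat it as a black box until that later session.

```python
import numpy as np

# Training data from the table above
x = np.array([2, 21, 36, 7])
y = np.array([5, 43, 73, 15])

# Fit a degree-1 polynomial (a line); polyfit returns [slope, intercept]
m, b = np.polyfit(x, y, 1)
print(f"m = {m:.2f}, b = {b:.2f}")  # m = 2.00, b = 1.00

# Use the fitted line to make an inference for x = 16
y_hat = m * 16 + b
print(f"prediction for x = 16: {y_hat:.2f}")  # 33.00
```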

We glossed over **how** the model found those values, but we did fit a linear regression model to our data and use it to make an inference. The key takeaways are these:

- Machine learning models learn relationships in data.
- As the ML engineer, you choose the right model for your data.
    - In this case, there was a perfectly linear relationship between $x$ and $y$, so linear regression is a good choice. If the relationship were exponential, say, a linear regression model would fit the data poorly and make poor inferences.
- The ML model has **trainable parameters**; that is, parameters that the model **learns on its own from the data**.
    - In this case, there were two trainable parameters: $m$ and $b$.
- The ML model follows an algorithm to find the parameters that **best fit** the data it's trained on.
    - We'll learn exactly what that means in a later session!
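To see the "choose the right model" point in action, here is a small sketch using synthetic data made up for illustration. It fits a line to perfectly linear data and to exponential data, then compares the worst prediction error on the training points; `worst_line_error` is a helper name invented for this example.

```python
import numpy as np

x = np.arange(10)
y_linear = 2 * x + 1       # perfectly linear relationship
y_exponential = 2.0 ** x   # exponential relationship: 1, 2, 4, ..., 512

def worst_line_error(x, y):
    """Fit a line to (x, y) and return the largest absolute error on those same points."""
    m, b = np.polyfit(x, y, 1)
    y_hat = m * x + b
    return np.max(np.abs(y_hat - y))

linear_error = worst_line_error(x, y_linear)
exponential_error = worst_line_error(x, y_exponential)
print("linear data:     ", linear_error)       # essentially 0
print("exponential data:", exponential_error)  # large: a line can't bend to follow the curve
```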

<img src="https://www.musictech.net/wp-content/uploads/2020/05/Radius-2-.jpg" width="500px">
<div style="width: 100%; text-align: center; font-style: italic;">Think of trainable parameters like knobs on a machine: some combination of values will be optimal, and the model follows an algorithm to find them</div>

Finally, to clear up two similar-seeming terms:
- A machine learning *algorithm* is an algorithm for learning patterns from data
    - e.g. Linear Regression
- A machine learning *model* is trained on data *using* an ML algorithm
    - e.g. Our Linear Regression model $\hat{y} = 2x + 1$
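One way to see the algorithm/model distinction in code: below, `fit_linear_regression` plays the role of the *algorithm*, and the function it returns plays the role of the *model*. Both names are made up for illustration. The fitting step uses the closed-form ordinary least squares formulas for a line, which is one common way to fit linear regression; we'll look at how fitting works in a later session.

```python
def fit_linear_regression(xs, ys):
    """The ALGORITHM: learns m and b from data using ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum((x - x_mean) ** 2 for x in xs)
    b = y_mean - m * x_mean

    def model(x):
        """The MODEL: one specific line, with the learned parameters m and b baked in."""
        return m * x + b

    return model

model = fit_linear_regression([2, 21, 36, 7], [5, 43, 73, 15])
print(model(16))  # 33.0
```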
    
### The start of your ML demystification journey!

<img src="https://miro.medium.com/max/1050/1*-bhLYOP-EahyDXPJvCPrhg.jpeg" width="400px">

### So... what about Deep Learning?

Deep Learning is a subset of Machine Learning, and is nearly synonymous with Neural Networks. We'll cover them in more detail in a later session, but suffice it to say that deep learning is just another class of machine learning algorithms, one that has become very popular in recent years due to its ability to learn very complex patterns from (lots of) data. That being said, neural networks aren't sentient robots; like most machine learning algorithms, they're just very good function approximators.

<img src="https://assets.website-files.com/5fb24a974499e90dae242d98/5fb24a974499e96f7b2431db_AI%20venn%20diagram.png" width="500px">

## 2. A typical machine learning problem and solution

Now's probably a good time to talk about which problems call for machine learning and which don't. Based on what you've learned so far, try to classify the following problems as `ML` or `not ML` problems.

I want a computer to...
1. calculate BMI from patients' weight and height
2. predict forest fires from locational data (e.g. humidity, precipitation, season, etc.)
3. translate between English and Spanish
4. translate between English and Morse Code
5. identify the genre of songs directly from the audio
6. play tic-tac-toe
7. estimate the price of a house given a picture of it

```
[!!!] ANSWERS BELOW [!!!]

1. Not ML; BMI = mass / height^2
2. ML; no precise algorithm to determine this but information in data might be able to predict forest fires
3. ML; word-level translation is algorithmic, but not sentence-level translation
4. Not ML; each letter can be directly converted to Morse and vice versa
5. ML; complex relationships in audio features might suggest a genre, but there is no definite algorithm
6. Bit of a trick question: not (necessarily) ML; there is a known optimal strategy for X and O
7. ML; image features might suggest a price range but will likely require ML to extract
```
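For the "not ML" answers, the whole solution is a fixed rule you can write down directly, with no data or training involved. Here is a minimal sketch for problems 1 and 4; the Morse table is deliberately partial (only four letters) for brevity, and a real converter would need the full alphabet.

```python
def bmi(mass_kg, height_m):
    """Problem 1: BMI is a fixed formula, mass / height^2."""
    return mass_kg / height_m ** 2

# Problem 4: a lookup table (partial, for illustration only)
MORSE = {"S": "...", "O": "---", "E": ".", "T": "-"}

def to_morse(text):
    """Convert each letter to Morse code, separating letters with spaces."""
    return " ".join(MORSE[letter] for letter in text.upper())

print(round(bmi(70, 1.75), 1))  # 22.9
print(to_morse("sos"))          # ... --- ...
```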

Note that it's not always clear-cut. For example, if you want to find an optimal path from point A to point B (think Google Maps), there is an algorithm called **Dijkstra's Algorithm** that finds the least expensive path between two points, given a set of "roads" and their respective (non-negative) "costs". That being said, in real life there are often dozens of other variables, like the time of day, day of the week, amount of traffic, weather, and construction. In these cases, a classical algorithm might be supplemented with ML algorithms.
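Here is a compact sketch of Dijkstra's algorithm using Python's built-in `heapq` priority queue. The road network is a made-up example, stored as a dictionary mapping each place to a list of `(neighbor, cost)` pairs; all costs must be non-negative for the algorithm to be correct.

```python
import heapq

def dijkstra(graph, start, goal):
    """Return (total_cost, path) for the cheapest path from start to goal, or (inf, []) if unreachable."""
    # Priority queue of (cost so far, node, path taken); heappop always returns the smallest cost
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # already reached this node more cheaply
        visited.add(node)
        for neighbor, road_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + road_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical road network: place -> [(neighbor, cost), ...]
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A", "D"))  # (4, ['A', 'C', 'B', 'D'])
```

Note that the direct road A to B (cost 4) loses to the detour through C (cost 1 + 2 = 3): the algorithm compares total costs, not the number of roads taken.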

<img src="https://imgs.xkcd.com/comics/self_driving.png" width="400px">

Okay, back to the original topic: a typical machine learning problem and solution. Here goes:

**HIGH LEVEL**
1. Have a problem
2. Acquire data
3. Train a machine learning model on said data
4. Evaluate model and iterate on solution

**MEDIUM LEVEL**
1. Have a problem
    1. Determine whether the problem even needs machine learning
    2. If it's not a new problem, read up on what other people have tried
2. Acquire data
    1. Get as much data as you can pertaining to your problem
    2. Explore and visualize your data to become most familiar with what your model will try to learn
    3. Clean data (remove bad or useless data)
    4. Process / feature engineer data (make as "learnable" as possible)
3. Train a machine learning model on said data
    1. Choose a domain of ML to try (e.g. classification, regression, reinforcement learning)
    2. Choose a machine learning algorithm to use (or design your own!)
    3. Train baseline model on data
4. Evaluate model and iterate on solution
    1. Check metrics e.g. accuracy, precision & recall, sensitivity & specificity
    2. Tune hyperparameters
    3. Improve data quality
    4. Try other models / change model architecture
    
There are many details missing from this schematic, but it gives a sense of how these problems are typically approached. As we move forward, we'll refer back to this workflow!
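To make the workflow concrete, here is a toy end-to-end sketch with NumPy. It generates synthetic data (standing in for "acquire data"), holds some of it out, trains a linear regression with `np.polyfit`, and evaluates it with mean squared error on the held-out part. Everything here, including the data, the noise level, and the 80/20 split, is an illustrative assumption rather than a recipe.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 2. Acquire data (synthetic here): y is roughly 2x + 1 plus some noise
x = rng.uniform(0, 50, size=100)
y = 2 * x + 1 + rng.normal(0, 1, size=100)

# Hold out the last 20% of examples for evaluation
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# 3. Train a model: fit a line to the training data only
m, b = np.polyfit(x_train, y_train, 1)

# 4. Evaluate on data the model has never seen (mean squared error)
mse = np.mean((m * x_test + b - y_test) ** 2)
print(f"m = {m:.2f}, b = {b:.2f}, test MSE = {mse:.2f}")
```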

## 3. The coding environment

<img src="https://external-preview.redd.it/x3_gxnFSyB9nhAfWu0k6OdiOXmRdyfsBnaODGM94NRw.jpg?auto=webp&s=f7d788cbb356acad0f118567c23fbf27d7867bd4" width="500px">

One of the strongest aspects of Python is its rich library support, especially when it comes to machine learning. Some of the most popular libraries for ML include:

- `numpy`: pretty much the go-to library for anything linear algebra related (e.g. vectors and matrices)
- `scikit-learn`: a great library for machine learning algorithms outside of deep learning, like decision trees, logistic regression, etc.
- `tensorflow`: a very powerful library developed by Google for deep learning
- `keras`: an API built on top of `tensorflow` that provides more user-friendly ways to build deep learning models

When using Python, you first need to install these packages on your computer. There are a few ways to do this: you can install them directly with `pip` (which usually comes with Python), or use a package manager like `conda`. We'll use `pip` because it's simpler and requires less overhead.

In [None]:
# The exclamation point is just to run shell commands in Jupyter Notebook
# If running from your terminal, you don't need the !
!pip install numpy

Once a dependency is installed, you can import it directly into Python:

In [None]:
# Python lets you rename packages when you import them
# Data scientists will make fun of you if you do not rename numpy to np
import numpy as np

np.zeros((3, 4), dtype=int)
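NumPy's real strength is applying an operation to a whole array at once. As a small sketch tying back to Section 1, here is our linear regression model $\hat{y} = 2x + 1$ evaluated on every $x$ from the table in a single expression, with no loop:

```python
import numpy as np

x = np.array([2, 21, 36, 7, 16])
# Arithmetic on an array applies element by element
y_hat = 2 * x + 1
print(y_hat)        # [ 5 43 73 15 33]
print(y_hat.shape)  # (5,)
```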

We can really leverage the power of Python libraries by using the Keras API to build a deep learning model with just a few lines of code. Don't worry about understanding it yet; it's just to show how powerful Keras and TensorFlow are (but you will understand it by the end of the workshops!)

In [None]:
!pip install tensorflow

In [None]:
# Machine learning engineers will make fun of you if you do not rename tensorflow to tf
import tensorflow as tf
from tensorflow.keras import models, layers

In [None]:
# Define model architecture
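model = models.Sequential()
# Input: each example is a vector of 10 numbers
model.add(layers.Input(shape=(10,)))
# Hidden layers; without a nonlinear activation, stacked Dense layers collapse into one linear map
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(32, activation="relu"))
model.add(layers.Dropout(0.3))
# Output: a single probability between 0 and 1 (binary classification)
model.add(layers.Dense(1, activation="sigmoid"))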
# Print model summary
model.summary()

In [None]:
# Compile model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

And that's it!

You can install all these libraries directly onto your computer, but you might find that it becomes a bit messy after a while. For future reference, you can create a **virtual environment** to keep all the dependencies for one project in the same place. It's not necessary, but I personally found it very convenient, even though it meant saving some of the same dependencies in multiple places. To learn more, check out the documentation for `venv` here: https://docs.python.org/3/tutorial/venv.html

Now, for some FAQs about the Python development environment...

> Which IDE (editor) should I use?

If you're looking for a no-nonsense IDE for programming in general, VSCode is a great place to start. Other common options include PyCharm and Sublime Text. (Atom was also popular, but it has since been discontinued.)

> I downloaded Python 3, but my computer is still running my Python files with Python 2 and causing errors. What's the deal?

When you run a command like `python script.py`, your computer may default to Python 2.x (this is common on older systems). You can explicitly use Python 3.x by running `python3 script.py` instead. The same goes for `pip`: you may need to use `pip3` instead of plain `pip` if you're not in a Python 3.x environment.
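If you're not sure which interpreter is actually running your code (or whether you're inside a virtual environment), you can ask Python itself. This is a small standard-library sketch meant to be run with Python 3; the `sys.prefix != sys.base_prefix` check is the one the `venv` documentation describes for detecting a virtual environment.

```python
import sys

# Full path of the interpreter running this code
print("Interpreter:", sys.executable)
# Version number, e.g. 3.x.y
print("Version:", sys.version.split()[0])

# Inside a virtual environment, sys.prefix points to the environment, not the base install
in_virtual_env = sys.prefix != sys.base_prefix
print("Inside a virtual environment:", in_virtual_env)
```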

> I'm getting an error when installing packages with pip / `import`ing libraries in Python. Why could that be?

This is usually a mismatch between the version of Python you're using and the supported versions of the package you're installing. For example, the newest versions of most libraries don't support Python 2.x. To install a specific version of a library, you can run `pip install [library name]==[version]`.
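Before pinning a version, it can help to see which version you currently have installed. A minimal sketch using the standard library's `importlib.metadata` (available in Python 3.8+); `installed_version` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of a package, or None if it isn't installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

print(installed_version("numpy"))               # depends on your install
print(installed_version("not-a-real-package"))  # None
```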

> What's up with the different import statements?

I'll show you:

In [None]:
# Import the entire library
import numpy as np

# To use the function array(), we need to preface it with np.
np.array([[1, 2], [3, 4]])

In [None]:
# This doesn't work: we only imported the name np, so array() on its own isn't defined
array([[1, 2], [3, 4]])

In [None]:
# Import a specific function or functions
from numpy import array, dot

# Since we imported array() and dot() into the namespace, we can call them directly
arr1 = array([[1, 2], [3, 4]])
arr2 = array([[5, 6], [7, 8]])
dot(arr1, arr2)

In [None]:
# Import everything from a library
from numpy import *

# Wildcard imports can be confusing: imported functions look like user-defined functions, so it's unclear where they came from
ones(5)
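Here's a concrete example of how wildcard imports can bite: if you've already defined a function with the same name as something in the library, `import *` silently replaces it. The `ones` function below is our own, written just for this demonstration.

```python
def ones(n):
    """Our own function: a plain Python list of n ones."""
    return [1] * n

print(ones(3))  # [1, 1, 1]

# This silently overwrites our ones() with numpy's ones()
from numpy import *

print(ones(3))  # [1. 1. 1.]
```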

Side note: you can save all your dependencies to a file by running this command:

`pip freeze > requirements.txt`

This saves all your `pip` dependencies to a file called `requirements.txt`. Then, other people can install all your dependencies *and their exact versions* with:

`pip install -r requirements.txt`

Nice!

## 4. Python crash course


Code speaks louder than words:

### Variables, numbers, and strings

In [None]:
# x is a variable. I set its value to the number 12
x = 12
print(x)

In [None]:
y = 5
# Addition
z = x + y
print(z)

In [None]:
# Subtraction
print(x - y)
# Multiplication
print(x * y)
# Division
print(x / y)
# Integer (floor) division: divides, then rounds down
print(x // y)
# Modulus operator (remainder function)
print(x % y)
# Exponentiation
print(x ** y)
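`//` and `%` are closely related: for integers `x` and `y` (with `y` not zero), Python guarantees that `(x // y) * y + x % y == x`. Note that `//` rounds *down* (toward negative infinity), which matters for negative numbers:

```python
x, y = 12, 5
print(x // y, x % y)              # 2 2
print((x // y) * y + x % y == x)  # True

# With a negative number, // rounds down rather than toward zero
print(-7 // 2)      # -4
print(-7 % 2)       # 1
print(int(-7 / 2))  # -3 (int() truncates toward zero)
```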

In [None]:
# You can also update a variable's value using shorthand operators like +=, -=, *=, /=, etc.
x = 10
print(x)
# Equivalent to x = x + 1
x += 1
print(x)

In [None]:
# Strings are sequences of characters, i.e. text
# They are denoted with double quotes ("") or single quotes ('')
string = "I am a string"
print(string)

In [None]:
# There are plenty of operations you can do on strings, too
# For example, this is concatenation
string1 = "Machine"
string2 = "Learning"
print(string1 + " " + string2)
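A few other string operations are worth knowing early on, since later cells use them: methods like `.lower()` and `.split()`, and *f-strings* (strings prefixed with `f`), which insert the values of expressions written inside `{}`:

```python
string1 = "Machine"
string2 = "Learning"

# f-string: expressions inside {} are evaluated and inserted
phrase = f"{string1} {string2}"
print(phrase)          # Machine Learning

print(phrase.lower())  # machine learning
print(phrase.upper())  # MACHINE LEARNING
print(phrase.split())  # ['Machine', 'Learning']
print(len(phrase))     # 16
```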

### Lists and indexing

In [None]:
# If you want to store a list of things, you can use... a list!
# They're denoted with square brackets []

list1 = [1, 2, 3]
list2 = ["one", "two", "three"]

# List concatenation
print(list1 + list2)

In [None]:
# To get the length of a list, use len()
lst = ["there", "are", "four", "items"]
len(lst)

In [None]:
# You can also start with a list and add items with append()

# Empty list
lst = []

lst.append(1)
lst.append(2)
lst.append(3)
print(lst)

There are other list functions that come in handy. For example, find the first index of a value in a list with `.index()`, or remove the first instance of a value with `.remove()`. Check out https://www.w3schools.com/python/python_ref_list.asp to see a complete list.
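A quick sketch of the two methods mentioned above. Note that both `.index()` and `.remove()` only act on the *first* match, and both raise a `ValueError` if the value isn't in the list:

```python
lst = [3, 1, 4, 1, 5]

# Index of the FIRST occurrence of 1
print(lst.index(1))  # 1

# Removes only the FIRST occurrence of 1
lst.remove(1)
print(lst)           # [3, 4, 1, 5]

# Looking for a missing value raises an error
try:
    lst.index(9)
except ValueError:
    print("9 is not in the list")
```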

In [None]:
# Items in lists can be accessed by their index
# Like most languages, Python is 0-indexed. That means that the index of the first item is 0, the second is 1, etc.

fibonacci = [1, 1, 2, 3, 5, 8, 13, 21]
# Indexes:   0  1  2  3  4  5  6   7

print(fibonacci[0])
print(fibonacci[2])
print(fibonacci[5])

In [None]:
# Python also lets you get the last items in a list by using negative numbers as indexes

# Last item
print(fibonacci[-1])
# Second to last item
print(fibonacci[-2])

In [None]:
# You can also "slice" parts of a list using a colon (:)
# The syntax is [a:b], where index a is included in the interval and index b is EXCLUDED

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Indexes: 0      1      2      3      4      5      6      7      8      9      10     11

# Month 2 up until but not including month 4
print(months[2:4])

In [None]:
# Month 5 up until the end
print(months[5:])

In [None]:
# Month 0 up until but not including month 8
print(months[:8])

In [None]:
# Month 3 up until but not including the last month
print(months[3:-1])

In [None]:
# You can also include a step size with another colon
# Every other month from month 1 up to but not including month 9
print(months[1:9:2])
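Two more slicing tricks: a negative step walks the list backwards (so `[::-1]` reverses it), and slice bounds past the end of the list are clipped instead of raising an error:

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Reverse the whole list (Dec first, Jan last)
print(months[::-1])
# Negative indexes work in slices too: the last three months
print(months[-3:])     # ['Oct', 'Nov', 'Dec']
# Out-of-range slice bounds don't raise an error
print(months[10:100])  # ['Nov', 'Dec']
```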

### Booleans and control flow

In [None]:
# Boolean values can be either True or False

darkModeIsGood = True
machineLearningIsMagic = False

print(machineLearningIsMagic)

In [None]:
# You can work with booleans using boolean operators like not, and, and or

print(not True)
print(True or False)
print(False and True)

In [None]:
# You can use if statements to control the flow of logic

condition = True

if condition: 
    print("The condition was true (1)")
    
condition = False

if condition:
    print("The condition was true (2)")

Notice how in Python, a colon at the end of a conditional opens up a "code block", which is represented by an indentation. Everything in that indentation level will be run if the condition is true.

In [None]:
condition_1 = True
condition_2 = False

if condition_1:
    print(" Run")
    print(" everything")
    print(" in")
    print(" here")
    if not condition_2:
        print("  Nested conditional!")
        
print("This will run anyway")

We can get more creative with our conditionals. Any operation or function that returns a boolean (or something that can be interpreted as a boolean) can be used as a conditional. Some common operations include:
- `==`
    - Check if two values are equal. Note the difference; `=` is for assignment, `==` is for comparing and returning a boolean
- `!=`
    - Determine if two values are not equal to each other
- `<`, `>`, `<=`, `>=`
    - Less than, greater than, less than or equal to, greater than or equal to
- `in`
    - Determine if a value is in a collection of some kind

In [None]:
print(5 == 7)
print(5 == "5")
print(5 == 5)
print(5 != 7)
print(7 > 4)

In [None]:
# The parentheses around `not 1 == 1` aren't necessary, just added for clarity
print(1 == 2 or (not 1 == 1))

In [None]:
print(5 in [1, 2, 3])
print(2 in [1, 2, 3])
print("lions" in ["lions", "tigers", "bears"])

In [None]:
x = 10

if x % 2 == 0:
    print(x, "is even!")

We can use `elif` (short for "else if") and `else` to get even more elaborate with our control flow:

In [None]:
classStartTime = 9.5

print("Probability of showing up on time:", end=" ")
if classStartTime < 9:
    print("No chance")
elif classStartTime < 10:
    print("Unlikely, but possible")
elif classStartTime < 11:
    print("I'll have to set my alarm, but I'll probably be there")
else:
    print("Perfect")
    
print("Computation complete")

Using `elif` and `else` ensures that the code only ever enters **one** of the blocks. For example, even though `classStartTime` is less than both 10 and 11, only the first block whose condition is true (`classStartTime < 10`) runs; the rest are skipped.
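To see why `elif` matters, compare it with a chain of separate `if` statements. With separate `if`s, every condition is checked, so more than one block can run. The `messages_if` and `messages_elif` lists are made up here just to record which blocks ran.

```python
classStartTime = 9.5

# Separate if statements: every condition is checked independently
messages_if = []
if classStartTime < 10:
    messages_if.append("Unlikely, but possible")
if classStartTime < 11:
    messages_if.append("I'll have to set my alarm, but I'll probably be there")

# if/elif: stops at the first condition that is True
messages_elif = []
if classStartTime < 10:
    messages_elif.append("Unlikely, but possible")
elif classStartTime < 11:
    messages_elif.append("I'll have to set my alarm, but I'll probably be there")

print(len(messages_if))    # 2
print(len(messages_elif))  # 1
```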

To repeat the same code multiple times, we can use `while` and `for` loops.

In [None]:
x = 5
# Continues running code block while conditional is true
while x >= 0:
    print(x)
    x -= 1

Be careful to avoid infinite loops; for example, this code will run forever (until something stops it), because `x` never changes, so `x >= 0` is always true:

```
x = 5
while x >= 0:
    print(x)
```

`for` loops are one of the most useful things you can know in Python. This is how they usually look:

```
for <item> in <iterable>:
    <do stuff with item>
```

Here, `<iterable>` is anything that can be *iterated* over, like a list.

In [None]:
print(months)
for m in months:
    # On every iteration, Python sets the variable m to the next item in months
    print(m)

In [None]:
for m in months:
    print(f"Is the letter 'a' in the month {m}?", end=" ")
    if 'a' in m.lower():
        print("Yes!")
    else:
        print("No!")

In [None]:
# A common way to iterate through numbers is to use the range() function
# It returns an iterable of integers from a start value up to, but not including, a stop value you choose

# Default start index is 0
# Ranges from 0 to 4
for i in range(5):
    print(i)
    
print("")

# Ranges from 5 to 9
for i in range(5, 10):
    print(i)
    
print("")

# From 0 to 9 with step length 2
for i in range(0, 10, 2):
    print(i)
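Since `range()` produces its numbers lazily, wrapping it in `list()` is a handy way to see exactly which numbers you'll get. A negative step counts down; as always, the stop value is excluded:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(5, 10)))      # [5, 6, 7, 8, 9]
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
```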

### Functions

We've already encountered a few functions:
- `print(...)`
- `string.lower()`
- `range(...)`
- `list.append(...)`
- `len(...)`

As well as a few NumPy functions:
- `numpy.zeros(...)`
- `numpy.ones(...)`
- `numpy.array(...)`
- `numpy.dot(...)`

Functions take in input(s), and return output(s). Here's how you define one:

In [None]:
def add(a, b):
    summation = a + b
    return summation

# `def` is the keyword to define a new function
# The function name comes next, then the parameters / arguments it takes and a colon to open up a code block
# The keyword `return` returns a value

print(add(3, 5))

In [None]:
# You can also return multiple values
def addAndSubtract(a, b):
    return a + b, a - b

# Returns a `tuple`
print(addAndSubtract(5, 10))

# "Unpacking" multiple values from a tuple
addResult, subtractResult = addAndSubtract(5, 10)
print(addResult)
print(subtractResult)

In [None]:
# You can also set default values for arguments
def power(base, power=2):
    return base ** power

# Just give base, power defaults to 2
print(power(4))

# Give base and power
print(power(4, power=3))

# Explicitly list arguments
print(power(base=4, power=3))

# When being explicit, order doesn't matter
print(power(power=3, base=4))

Let's write a function together! It'll take in a list and a value. It'll search for the value in the list; if it finds the value in the list, it'll return the index of the value's first appearance. If the value isn't in the list, it will return `-1`.

e.g.
```
search([1, 2, 3, 2], 1) => 0
search([1, 2, 3, 2], 2) => 1
search([1, 2, 3, 4], 5) => -1
```

In [None]:
# Write code here!





Here are some solutions I came up with:

In [None]:
def search_1(lst, val):
    for i in range(len(lst)):
        if lst[i] == val:
            return i
    return -1

def search_2(lst, val):
    i = 0
    while i < len(lst):
        if lst[i] == val:
            return i
        i += 1
    return -1

def search_3(lst, val):
    for i, item in enumerate(lst):
        if item == val:
            return i
    return -1

def search_4(lst, val):
    try:
        return lst.index(val)
    except ValueError:
        # list.index() raises ValueError when val isn't in lst
        return -1

### List comprehensions

List comprehensions are basically shorthand for `for` loops. They're not necessary to know, just helpful. This is what they look like:

```
[<expression> for <item> in <iterable>]
```
Here are some examples:

In [None]:
[x*2 for x in range(5)]

In [None]:
[month.upper() for month in months]

In [None]:
['a' in month.lower() for month in months]

You can also add an additional `if` clause, which filters elements from the `iterable` based on whether they meet the conditional.

In [None]:
[month for month in months if 'a' in month.lower()]

In [None]:
# // is integer division; e.g. 4 // 2 = 2, 5 // 2 = 2, 1 // 2 = 0
[x // 2 for x in range(10) if x % 2 == 0]
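For comparison, here is the filtered comprehension from the previous cell written out as a regular `for` loop; both build the same list:

```python
# List comprehension version
halves = [x // 2 for x in range(10) if x % 2 == 0]

# Equivalent for loop
halves_loop = []
for x in range(10):
    if x % 2 == 0:                  # the `if` clause filters items
        halves_loop.append(x // 2)  # the expression builds each new item

print(halves)                 # [0, 1, 2, 3, 4]
print(halves == halves_loop)  # True
```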

### Tying it all together

Write the following functions (arguments written in `monospace`):
- A function that returns the numbers in list `lst` that are divisible by `k`
    - Remember that a % b is the remainder function
- A function that returns the elements in list `lst` that have a vowel in them (excluding y)
    - A string is almost like a list of characters. So, you can iterate over each char in a string (e.g. `for char in string:`) or check if a string is in another string (e.g. `'a' in 'can' ==> True`)
- A function that determines if `n` is a prime number
    - If you want to use the square root function (not necessary), you can get it with `import math` and then call `math.sqrt()`

In [None]:
'''Write function 1 here!'''
def itemsDivisibleByK(lst, k):
    # Replace this; pass just means "do nothing"
    pass



In [None]:
'''Test function 1'''
# Should be [0, 3, 6, 9]
itemsDivisibleByK([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)

In [None]:
'''Write function 2 here!'''
def elementsWithVowels(lst):
    pass



In [None]:
'''Test function 2'''
# Should be ["machine", "learning", "def"]
elementsWithVowels(["machine", "learning", "bcd", "def", "xyz"])

In [None]:
'''Write function 3 here!'''
def isPrime(n):
    pass



In [None]:
'''Test function 3'''
for i in range(2, 16):
    print(i, isPrime(i))

Here are my solutions, but they might look different from yours:

In [None]:
def itemsDivisibleByK(lst, k):
    return [item for item in lst if item % k == 0]

In [None]:
def containsVowel(string):
    for char in string:
        if char in 'aeiou':
            return True
    return False

def elementsWithVowels(lst):
    return [string for string in lst if containsVowel(string)]

In [None]:
import math
def isPrime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True