<a href="https://colab.research.google.com/github/james-weichert/python-teaching-demo/blob/main/uw_teaching_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Iteration Applications: An Introduction to Machine Learning Using For Loops**
### Teaching Demo
### _University of Washington Paul G. Allen School of Computer Science & Engineering_
James Weichert | January 2025



---



**A Note On Using This Notebook**

<blockquote>

This code demo is designed to fit into a CS1-level introductory programming course taught in the Python programming language. While I assume you have some basic familiarity with Python, you may not be familiar with all of the Python libraries or functions used in this notebook — and that's totally OK!

In order to keep this demo easy to use, a lot of the complex code is already written. All you have to do is run these code cells (by pressing the play button next to the cell or pressing `shift` + `return` on your keyboard with the cell selected). Additionally, here are a few tips that will be helpful:

* We're using `pandas` 'DataFrames' (tables) to store data. To extract the values of a column in the table, use the syntax `table_name['column_name']` — this will return the values in the `column_name` column as a list-like Python object
* This notebook is organized into big sections with smaller subsections. To keep things organized and help with navigating the notebook, I recommend doing the following at the very beginning of working with the notebook:
  
  * Press `command`/`control` + `shift` + `A` to highlight all cells
  * With the cells highlighted, press `command`/`control` + `]` to collapse all cells into their folders. As you work through the notebook, you can expand each section one at a time!
</blockquote>

Now you're ready to get started!
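
Before diving in, here is a minimal sketch of the column syntax from the note above. The table `fruits` and its columns are made up for this illustration and are not part of the demo data.

```python
import pandas as pd

# Hypothetical example table (not part of the demo data)
fruits = pd.DataFrame({'Fruit': ['apple', 'banana', 'cherry'],
                       'Price': [3, 1, 5]})

# Square brackets with a column name extract the values in that column
prices = fruits['Price']

# The result is list-like, so a for loop can visit each value in turn
for price in prices:
  print(price)
```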

In [24]:
# Just run this cell to import the necessary Python libraries for this demo

import numpy as np
import pandas as pd
import matplotlib.pyplot



---



## **1. Iteration Refresher**

**Iteration** involves performing the same task over and over again, usually _for each_ item in a collection of items (like a list or array).

**For Loops** are a powerful tool for repeating a block of code, either a specified number of times or once for each item in a collection. In Python, for loops use the following syntax:


```
# This is a for loop:

for x in collection:
  # do something using x
```



One way to use for loops is as a counter or index:

In [25]:
colors = ['red', 'orange', 'green', 'blue', 'violet']

for i in range(5):
  print(i)
  print(colors[i])

0
red
1
orange
2
green
3
blue
4
violet


This is very similar to for loops in other programming languages like Java, where you use an iterator variable (like `i`) as an index to access items in a list or array (e.g. `colors[i]`).

**That's great...** but for loops in Python are even _more_ useful! Perhaps a better name for a Python for loop is a **for each loop**, since the loop can perform the same task _for each_ element in a list.

In [26]:
for color in colors:
  print(color)

red
orange
green
blue
violet


Also note that `range(5)` is a collection itself...

In [27]:
list(range(5))

[0, 1, 2, 3, 4]

...so there is no difference in the mechanics of



```
for i in range(5):
```
and


```
for color in colors:
```
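
To see that the two loop styles do the same work, the sketch below collects the colors once by index and once item by item, then compares the results. The list names `by_index` and `by_item` are made up for this illustration.

```python
colors = ['red', 'orange', 'green', 'blue', 'violet']

# Style 1: loop over the numbers 0 to 4 and use each one as an index
by_index = []
for i in range(len(colors)):
  by_index.append(colors[i])

# Style 2: loop over the items of the list directly
by_item = []
for color in colors:
  by_item.append(color)

# Both loops visit the same items in the same order
print(by_index == by_item)  # prints True
```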




_Below are two more for loop example exercises to try out._

### Ex. 1: Dog Years

A popular convention to find the 'true' age of a dog in 'human years' is to multiply the dog's age by 7.

However, a 2019 genetic study of Labrador retrievers by [Wang et al.](https://www.biorxiv.org/content/10.1101/829192v1?ct=) cited by the [American Kennel Club](https://www.akc.org/expert-advice/health/how-to-calculate-dog-years-to-human-years/) finds that a more accurate formula is:

$$\text{human_age} = 16 \cdot \ln (\text{dog_age}) + 31$$

Using the function `human_age`, defined below, we can find the human age equivalent for dogs!

In [28]:
def human_age(dog_age):
  """Takes a dog's age in years and returns the equivalent human age"""
  return 16 * np.log(dog_age) + 31

For example, James' dog Lego is 13 years old...

In [29]:
human_age(13)

72.03918971938458

...so he is 72 years old in human years!
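
For comparison, here is a small sketch that puts the 'multiply by 7' convention next to the logarithmic formula for a 13-year-old dog. It uses `math.log` from the standard library in place of `np.log`, and the names `times_seven_age` and `log_formula_age` are hypothetical helpers introduced only for this example.

```python
import math

def times_seven_age(dog_age):
  """The popular convention: multiply the dog's age by 7"""
  return 7 * dog_age

def log_formula_age(dog_age):
  """The logarithmic formula: 16 * ln(dog_age) + 31"""
  return 16 * math.log(dog_age) + 31

print(times_seven_age(13))            # 91
print(round(log_formula_age(13), 2))  # 72.04
```

For a dog of this age, the two estimates differ by almost 19 years.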

**Ex. 1** Now let's find the human ages of a few dogs. For each dog age in the `dog_ages` list, apply the `human_age` function to find its human age (and print it).

_Hint: Use a for loop!_

In [30]:
dog_ages = [4, 11, 9, 7, 2]

In [31]:
# ex. 1

# write your code here

### Ex. 2: Birthday Reminder

**Ex. 2** James is bad at remembering his friends' birthdays. To help him, write a function `birth_month_reminder` to remind James of which of his friends have a birthday this month. The function should take in the `birthdays` table created below and a `month` (a string, e.g. `"October"`) and print out the names of each friend in the table with a birthday in that month.

_Hints:_
1. You can use `len(table)` to return the number of rows in a table.
2. `table_name['column_name']` returns the `column_name` column in the table as an array-like Python object. You can index into this object just like you would a Python list.
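
The two hints can be combined into a loop over row numbers. The sketch below shows the pattern on a small made-up table; `pets` and its columns are hypothetical, so you will still need to adapt the idea to `birthdays`.

```python
import pandas as pd

# Hypothetical example table (not the `birthdays` table)
pets = pd.DataFrame({'Name': ['Biscuit', 'Mochi', 'Pepper'],
                     'Species': ['dog', 'cat', 'dog']})

names = pets['Name']
species = pets['Species']

# len(pets) is the number of rows, so range(len(pets)) gives every row index
for i in range(len(pets)):
  print(names[i], 'is a', species[i])
```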

In [32]:
birthdays = pd.DataFrame({'Name': ['Pablo', 'Rishi', 'Shawn', 'Eunice', 'Sophia', 'Prasann', 'Will', 'Emilia', 'Rebecca', 'Sunny'],
                          'Birth Month': ['October', 'February', 'June', 'January', 'December', 'March', 'January', 'November', 'August', 'April']})
birthdays

| | Name | Birth Month |
|---|---|---|
| 0 | Pablo | October |
| 1 | Rishi | February |
| 2 | Shawn | June |
| 3 | Eunice | January |
| 4 | Sophia | December |
| 5 | Prasann | March |
| 6 | Will | January |
| 7 | Emilia | November |
| 8 | Rebecca | August |
| 9 | Sunny | April |


In [33]:
def birth_month_reminder(birthdays, month):
  """
  Given the table `birthdays` containing a collection of James' friends' names and birth months,
  print the names of the friends who have birthdays in the specified `month`
  """

  # write your function here

## **2. Seattle Restaurant Recommender &trade;**

James is a **big** foodie and he loves trying local restaurants when he visits a new city. For lunch today, James wants to go to a restaurant in the UDistrict.

The table `udistrict_restaurants` has 20+ nearby restaurants, their average price, and average Google Maps rating.

In [34]:
udistrict_restaurants = pd.read_csv('https://github.com/james-weichert/python-teaching-demo/blob/main/UW/udistrict_restaurants.csv?raw=true')
udistrict_restaurants.head()

| | Restaurant | Price ($) | Rating (out of 5) |
|---|---|---|---|
| 0 | Panda Noodle Bar | 15 | 4.45 |
| 1 | Taste of Xi'an | 17 | 4.46 |
| 2 | U:Don | 13 | 4.38 |
| 3 | Shawarma King | 12 | 4.36 |
| 4 | Samir's Mediterranean Grill | 14 | 4.53 |


**Your task is to help James by building a _Restaurant Recommender_ &trade;** to suggest an ideal lunch spot that suits the user's desired restaurant price and rating.

Design and implement your recommendation algorithm in the function `best_match`, which takes a `restaurants` table (like `udistrict_restaurants`), an `ideal_price` (integer), and an `ideal_rating` (decimal) as inputs. The function should return the name of the restaurant in `restaurants` that best matches the user's preferences.

Feel free to use the `distance` function in your implementation. The function takes in the 'x' and 'y' coordinates for two points and returns the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between them.
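
To get a feel for what the Euclidean distance measures, here is a minimal sketch. It redefines `distance` with `math.sqrt` so that the block runs on its own, and the preference and restaurant coordinates are made-up numbers rather than rows from `udistrict_restaurants`.

```python
import math

def distance(x1, y1, x2, y2):
  """Returns the Euclidean distance between two points (x1, y1) and (x2, y2)"""
  return math.sqrt((x1 - x2)**2 + (y1 - y2)**2)

# A classic check: the points (0, 0) and (3, 4) are exactly 5 units apart
print(distance(0, 0, 3, 4))  # 5.0

# Hypothetical preference: $15 and a 4.2 rating, treated as the point (15, 4.2)
# Restaurant A costs $2 more but has the same rating
print(distance(15, 4.2, 17, 4.2))  # 2.0
# Restaurant B has the same price but is rated 0.4 lower
print(round(distance(15, 4.2, 15, 3.8), 2))  # 0.4
```

Because prices are measured in dollars and ratings on a 5-point scale, the $2 price gap counts for more than the 0.4-point rating gap in this distance.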

### Try It Yourself: Build a Restaurant Recommender &trade;

In [35]:
def distance(x1, y1, x2, y2):
  """Returns the Euclidean distance between two points (x1, y1) and (x2, y2)"""

  return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

In [36]:
def best_match(restaurants, ideal_price, ideal_rating):
  """
  Given a table `restaurants`, an `ideal_price` (int) and an `ideal_rating` (float),
  return the name of the restaurant in the table that is the best match
  """

  # write your algorithm here

  return # return the name of the recommended restaurant

#### Helper Functions


**No need to understand what's happening in these two cells.** They help with the interactive graph below!

In [37]:
# Just run this cell to import the necessary libraries to make the interactive visualization work

import ipywidgets as widgets

import plotly.express as px
import plotly.graph_objects as go

In [38]:
# This cell takes care of setting up interactive sliders to change the `price` and `rating`
# and visualizing the restaurants as points on a scatter plot

# Just run me!

price = 15
rating = 4.2

def plot_restaurants(restaurants, price, rating, square_aspect = False):

  recommended_restaurant = best_match(restaurants, price, rating)
  colors = restaurants['Restaurant'] == recommended_restaurant

  fig = px.scatter(restaurants, x = 'Price ($)', y = 'Rating (out of 5)', hover_name = 'Restaurant', color = colors)
  fig.add_trace(go.Scatter(x = [price], y = [rating], name='My Preference'))

  fig.update_traces(marker = {'size': 10})
  fig.update_layout(showlegend = False, title = 'Restaurant Recommender')

  if square_aspect:
    fig.update_layout(yaxis_scaleanchor="x")

  fig.show()


def update(new_price, new_rating):
    global price, rating
    price = new_price
    rating = new_rating

#### Interactive Graph

Use the two cells below to visualize how the **_Restaurant Recommender_ &trade;** changes based on your desired price and rating, which you can input using the two sliders in the first cell. After selecting your price and rating, run the second cell to generate a scatter plot.

>If your `best_match` function returns the _name_ of a restaurant in the `udistrict_restaurants` table everything should work!

In [39]:
widgets.interact(update,
                 new_price = widgets.IntSlider(min = 10, max = 25, value = price, description = 'Price ($)'),
                 new_rating = widgets.FloatSlider(min = 3.8, max = 5, value = rating, description = 'Rating'));


In [40]:
plot_restaurants(udistrict_restaurants, price, rating)

## **3. Just For Fun...**

>**Note!** This section is just for fun since we won't have time to cover it fully in the lecture. Feel free to browse the code cell in the _k-Means Implementation_ section and see if you can figure out what each of the _for loops_ in the algorithm is doing.

Another common machine learning task is to **cluster** data points into distinct groups. For example, the `seattle_restaurants` table contains restaurant information (similar to `udistrict_restaurants`) from three Seattle neighborhoods: the UDistrict, University Village, and Pike Place.

We want a machine learning algorithm that can cluster each restaurant according to its neighborhood **without** actually knowing that information.

___Is this even possible?___ Let's see...

In [41]:
seattle_restaurants = pd.read_csv('https://github.com/james-weichert/python-teaching-demo/blob/main/UW/seattle_restaurants.csv?raw=true')
seattle_restaurants.head()

| | Neighborhood | Restaurant | Price ($) | Rating (out of 5) |
|---|---|---|---|---|
| 0 | UDistrict | Panda Noodle Bar | 15 | 4.45 |
| 1 | UDistrict | Taste of Xi'an | 17 | 4.46 |
| 2 | UDistrict | U:Don | 13 | 4.38 |
| 3 | UDistrict | Shawarma King | 12 | 4.36 |
| 4 | UDistrict | Samir's Mediterranean Grill | 14 | 4.53 |


### k-Means Clustering

The algorithm we will use to accomplish our task is called **k-Means Clustering**, which has a fixed parameter `k` that specifies how many clusters we want. Since we know there are three neighborhoods represented in our data, we will set `k = 3`.

k-Means works by trying to position the cluster centers (called "centroids") so that every data point is closer to its own cluster's centroid than to the centroid of any other cluster. To accomplish this, the algorithm randomly initializes each centroid and then repeats the following two steps a fixed number of times:

1. Assign each data point to the cluster with the closest centroid
2. Re-calculate the cluster centroids to be the average of all of the points in the cluster

After running for multiple iterations, the algorithm should begin to converge to stable clusters that capture the big 'groupings' present in the data.
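
The two steps can be traced by hand on a tiny example. The sketch below runs a single iteration on four made-up one-dimensional points with `k = 2`; the starting centroids are chosen by hand instead of randomly so that the result is easy to check.

```python
points = [1.0, 2.0, 9.0, 10.0]   # hypothetical data points
centroids = [0.0, 5.0]           # hand-picked starting centroids (k = 2)

# STEP 1: assign each point to the cluster with the closest centroid
assignments = []
for point in points:
  closest = 0
  for k in range(len(centroids)):
    if abs(point - centroids[k]) < abs(point - centroids[closest]):
      closest = k
  assignments.append(closest)

print(assignments)  # [0, 0, 1, 1]

# STEP 2: move each centroid to the average of the points assigned to it
for k in range(len(centroids)):
  members = []
  for j in range(len(points)):
    if assignments[j] == k:
      members.append(points[j])
  if len(members) > 0:
    centroids[k] = sum(members) / len(members)

print(centroids)  # [1.5, 9.5]
```

After just one iteration, each centroid has moved to the middle of the group of points nearest to it.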

#### k-Means Implementation

In [42]:
def k_means(restaurants, clusters = 3, iter = 10):
  """
  Given a table of restaurants and a number of clusters,
  use the k-means clustering algorithm (`iter` number of iterations)
  to assign each data point to one of `clusters` clusters
  """

  n = len(restaurants)

  # We normalize the price and ratings variables to make the clusters fit better

  normalized_prices = (restaurants['Price ($)'] - np.mean(restaurants['Price ($)'])) / np.std(restaurants['Price ($)'])
  normalized_ratings = (restaurants['Rating (out of 5)'] - np.mean(restaurants['Rating (out of 5)'])) / np.std(restaurants['Rating (out of 5)'])

  # Randomly initialize cluster centroids

  centroids = [np.random.random(size=2) for _ in range(clusters)]

  # Repeat STEP 1 and STEP 2 a specified number of times
  for i in range(iter):

    # STEP 1: Assign points to closest centroid

    cluster_assignments = np.zeros(shape = n)

    # Loop over all data points j
    for j in range(n):

      min_dist = np.inf
      min_cluster = 0

      # For each cluster k, check if data point j is closer to that cluster centroid
      for k in range(clusters):

        restaurant_price = normalized_prices[j]
        restaurant_rating = normalized_ratings[j]

        dist = distance(centroids[k][0], centroids[k][1], restaurant_price, restaurant_rating)

        if dist < min_dist:
          min_cluster = k
          min_dist = dist

      # Assign data point j to the cluster with the smallest distance to the centroid
      cluster_assignments[j] = min_cluster

    # STEP 2: Update centroids to be the mean of the points in the cluster

    # Loop over each cluster k
    for k in range(clusters):

      price_sum = 0
      rating_sum = 0

      count = 0

      # For each data point j in cluster k add its price and rating to running totals
      for j in range(n):

        if cluster_assignments[j] == k:
          price_sum += normalized_prices[j]
          rating_sum += normalized_ratings[j]

          count += 1

      # Divide the running totals by the number of data points in the cluster
      if count != 0:
        centroids[k][0] = price_sum / count
        centroids[k][1] = rating_sum / count

  return cluster_assignments
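
The normalization step at the top of `k_means` rescales each column so that its mean is 0 and its standard deviation is 1, which puts price and rating on a comparable scale before distances are measured. Here is a minimal sketch of that idea on made-up prices, using the standard library's `statistics` module in place of NumPy; the helper name `normalize` is hypothetical.

```python
import statistics

def normalize(values):
  """Subtract the mean from each value, then divide by the standard deviation"""
  mean = statistics.mean(values)
  std = statistics.pstdev(values)  # population standard deviation
  normalized = []
  for value in values:
    normalized.append((value - mean) / std)
  return normalized

prices = [10, 15, 20]  # hypothetical prices in dollars
print(normalize(prices))
```

The middle price equals the mean, so it maps to 0.0, while the other two prices land at equal distances on either side of it.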


#### Helper Function

In [43]:
# Just run this cell to help with generating a scatter plot showing each restaurant

def plot_clusters(restaurants, show_neighborhoods = False):

  colors = k_means(restaurants)

  if show_neighborhoods:
    colors = 'Neighborhood'

  fig = px.scatter(restaurants, x = 'Price ($)', y = 'Rating (out of 5)', hover_name = 'Restaurant', hover_data = ['Neighborhood'], color = colors, color_continuous_scale='portland')

  fig.update_traces(marker = {'size': 10})
  fig.update_layout(title = 'Seattle Restaurants by Neighborhood')

  if not show_neighborhoods:
    fig.update_layout(showlegend = False)

  fig.show()

#### Interactive Graphs

Let's see how good k-Means Clustering is at grouping our restaurants by neighborhood.

**Run this cell to visualize the three k-Means clusters**

In [44]:
plot_clusters(seattle_restaurants)

Now compare the k-Means clusters to the actual restaurant neighborhoods.

* How well did the algorithm do?

* Is there a limit to how good an algorithm _can_ be with this data?
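
One way to put a number on 'how well' is to count how many restaurants from each neighborhood land in each cluster. The sketch below does this with a for loop over made-up labels; the two lists are hypothetical stand-ins for the `Neighborhood` column and the output of `k_means`, not real results. Cluster numbers are arbitrary, so cluster 0 need not correspond to any particular neighborhood.

```python
# Hypothetical labels for six restaurants (not real results)
neighborhoods = ['UDistrict', 'UDistrict', 'Pike Place',
                 'Pike Place', 'University Village', 'University Village']
clusters = [0, 0, 2, 2, 1, 0]

# Count how often each (neighborhood, cluster) pair occurs
counts = {}
for i in range(len(neighborhoods)):
  pair = (neighborhoods[i], clusters[i])
  if pair in counts:
    counts[pair] += 1
  else:
    counts[pair] = 1

for pair in counts:
  print(pair, counts[pair])
```

A neighborhood whose restaurants are spread across several clusters, like University Village in this made-up example, is one the algorithm could not separate cleanly.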

In [45]:
plot_clusters(seattle_restaurants, show_neighborhoods = True)

## **4. The End**

**That's all I have for my demo!**

I hope you enjoyed coding along today. Maybe you found the application of a basic programming tool like for loops to build machine learning algorithms interesting and exciting. If so, I would encourage you to look into taking a machine learning/AI class offered at UW, or to do some experimenting with AI on your own.


**Feedback**

This is the first iteration of this lecture, so the content has room for improvement. I am always appreciative of constructive feedback — feel free to come talk to me or send me an email (jamesweichert@vt.edu) if you have any thoughts!

#### Open for a Surprise

Here is a photo of my dog, Lego.

<img src="https://github.com/james-weichert/python-teaching-demo/blob/main/UW/lego.JPEG?raw=true" width="400px"/>