### Motivation

<img src="Screenshots/amz_rats.png" width="350" height="275">

This is the first installment of a sequence on [recommendation systems](https://en.wikipedia.org/wiki/Recommender_system).

#### How do we compare a movie with a single 5-star rating to one with many 4 and 5 ratings? 
Say we want to take user ratings of movies and rank them. A simple approach would be to just look at the average rating of each movie, but that doesn't feel right: different movies have different numbers of ratings, and we don't want small samples to skew our estimate of the "true" rating that would arise if the whole population rated that movie. Smoothed (a.k.a. dampened) average ratings are one way to handle this, and they are an example of a non-personalized recommendation system. Let's start by looking at the highest rated movies from the [MovieLens dataset on Kaggle](https://www.kaggle.com/datasets/jneupane12/movielens).

In [None]:
#| echo: false
# import relevant libraries
import pandas as pd
import numpy as np
import scipy.stats as stats
from tqdm.notebook import tqdm
from scipy.stats import norm

import ipywidgets as widgets
import IPython.display 
from IPython.display import display, clear_output, HTML

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

# np.random.seed(123)

In [None]:
# import the data
# ratings doesn't include movie names so merge with ids to get names
ratings = pd.read_csv("archive/rating.csv", parse_dates=['timestamp'])
ids = pd.read_csv("archive/movie.csv")
ratings = pd.merge(ratings, ids, on='movieId', how='left')

# Find each movie's mean rating
avg_ratings = ratings.groupby(['movieId', 'title'])['rating'].agg(
    avg_rating='mean',
    rating_count='count'
)

avg_ratings.sort_values(by = 'avg_rating', ascending=False, inplace = True)
avg_ratings.head()

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>movieId</th>
      <th>Title</th>
      <th>Mean Rating</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>117314</td>
      <td>Neurons to Nirvana (2013)</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>117418</td>
      <td>Victor and the Secret of Crocodile Mansion (2012)</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>117061</td>
      <td>The Green (2011)</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>109571</td>
      <td>Into the Middle of Nowhere (2010)</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>109715</td>
      <td>Inquire Within (2012)</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
  </tbody>
</table>

Not quite the blockbusters or classics we were expecting. We'll soon see that there are a bunch of movies with just one rating and an average of 5 stars. Let's start with a little bit of exploratory data analysis.

In [None]:
# Plot histogram
plt.figure(figsize=(10, 5))
plt.hist(avg_ratings['avg_rating'], bins=30, color='darkorange', edgecolor='black')
plt.xlabel('Mean Movie Rating')
plt.ylabel('Number of Movies')
plt.title('Histogram of Mean Movie Rating')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

![](Screenshots/Mean_movie_histogram.png)

Now let's check out what the users are like. We see right away that there are users with dozens of ratings who gave 5 stars to every movie they rated. The distribution of mean user ratings gives us a sense of how people tend to rate movies.

In [None]:
# Find each user's mean rating
user_avg_ratings = ratings.groupby('userId')['rating'].agg(
    user_avg_rating='mean',
    rating_count='count'
)

user_avg_ratings.sort_values(by = 'user_avg_rating', ascending=False, inplace = True)
user_avg_ratings.head()

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>user_avg_rating</th>
      <th>rating_count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>5.000</td>
      <td>20</td>
    </tr>
    <tr>
      <td>5.000</td>
      <td>35</td>
    </tr>
    <tr>
      <td>4.949</td>
      <td>39</td>
    </tr>
    <tr>
      <td>4.893</td>
      <td>56</td>
    </tr>
    <tr>
      <td>4.827</td>
      <td>52</td>
    </tr>
  </tbody>
</table>

In [None]:
#| echo: false
# Plot histogram
plt.figure(figsize=(10, 5))
plt.hist(user_avg_ratings['user_avg_rating'], bins=30, color='darkorange', edgecolor='black')
plt.xlabel('Mean User Rating')
plt.ylabel('Number of Users')
plt.title('Histogram of Mean User Rating')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

In [None]:
# Scatterplot
plt.scatter(avg_ratings['avg_rating'], avg_ratings['rating_count'], s=5)
plt.xlabel('Mean Movie Rating')
plt.ylabel('Number of Ratings')
plt.title('Scatterplot of Mean Movie Rating vs. Number of Ratings')
plt.grid(True)
plt.show()

<iframe src="Screenshots/interactive_movie_plot.html" width="100%" height="500"></iframe>

The previous plot shows that there are a lot of movies with high means (even some 5.0s) that have few ratings. We may not necessarily want to recommend those movies to everyone, but the simplest recommender (one that looks solely at mean movie rating) would recommend them.

In [None]:
#| echo: false
# Scatterplot
plt.scatter(user_avg_ratings['user_avg_rating'], user_avg_ratings['rating_count'], s=5)
plt.xlabel('Mean User Rating')
plt.ylabel('Number of Ratings')
plt.title('Scatterplot of Mean User Rating vs. Number of Ratings')
plt.grid(True)
plt.show()

#### Average Rating
This system simply takes the movies with the highest mean ratings and recommends them to everyone.

We've computed each movie's mean rating and sorted by that mean rating, so the top movies according to this (admittedly poor) recommendation system would be those with the highest means, regardless of how many ratings there are.

In [None]:
#| echo: false
sum(avg_ratings['avg_rating'] == 5.0)

In [None]:
#| echo: false
max(avg_ratings[avg_ratings['avg_rating'] == 5]['rating_count'])

There would be a 113-way tie for first place, and the most ratings that any of those perfectly rated films has is 2.

#### Smoothed (dampened) average rating
The idea here is that instead of taking the rating of movie $j$ to be the plain mean of its $N_j$ ratings $X_1, \dots, X_{N_j}$ (i.e. $r_j = \sum_i X_i/N_j$), we can look at something like

$$r_j = \frac{\sum_i X_i + \mu_0 \lambda}{N_j + \lambda}$$

Here, $\mu_0$ and $\lambda$ are hyperparameters. I'll just take $\mu_0$ to be the mean of all the ratings and $\lambda = 1$ to start. This system gives unrated movies the mean rating $\mu_0$ and as a movie gets more ratings, this rating converges towards its "true mean rating", i.e. its rating across the whole population. 
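To get a feel for the formula before running it on the real data, here is a minimal sketch on made-up numbers. The function name `dampened_mean`, the value `mu_0 = 3.5`, and the two rating lists are assumptions chosen purely for illustration; they are not taken from the dataset.

```python
def dampened_mean(ratings, mu_0, damp_factor=1.0):
    """Smoothed mean: (sum of ratings + damp_factor * mu_0) / (count + damp_factor)."""
    return (sum(ratings) + damp_factor * mu_0) / (len(ratings) + damp_factor)

mu_0 = 3.5  # hypothetical global mean, for illustration only

one_rating = [5.0]                      # a single perfect rating, plain mean 5.0
many_ratings = [5.0] * 60 + [4.0] * 40  # 100 ratings, plain mean 4.6

print(dampened_mean([], mu_0))            # no ratings: (0 + 3.5) / 1 = 3.5
print(dampened_mean(one_rating, mu_0))    # (5 + 3.5) / 2 = 4.25
print(dampened_mean(many_ratings, mu_0))  # (460 + 3.5) / 101, roughly 4.589
```

With $\lambda = 1$, the single 5-star rating is pulled halfway to $\mu_0$, while the movie with 100 ratings barely moves and now ranks above it.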

In [None]:
# Step 1: Compute the global mean
mu_0 = ratings['rating'].mean()
damp_factor = 1

# Step 2: Group and compute sum and count
dampened_avg_ratings = ratings.groupby(['movieId', 'title'])['rating'].agg(
    sum_rating='sum',
    rating_count='count'
).reset_index()

# Step 3: Compute dampened mean
dampened_avg_ratings['dampened_mean'] = (dampened_avg_ratings['sum_rating'] + damp_factor*mu_0) / (dampened_avg_ratings['rating_count'] + damp_factor)

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>movieId</th>
      <th>Title</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
      <th>Dampened Mean</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>108527</td>
      <td>Catastroika (2012)</td>
      <td>10.0</td>
      <td>2</td>
      <td>4.509</td>
    </tr>
    <tr>
      <td>103871</td>
      <td>Consuming Kids: The Commercialization of Childhood (2008)</td>
      <td>10.0</td>
      <td>2</td>
      <td>4.509</td>
    </tr>
    <tr>
      <td>98275</td>
      <td>Octopus, The (Le poulpe) (1998)</td>
      <td>14.5</td>
      <td>3</td>
      <td>4.506</td>
    </tr>
    <tr>
      <td>318</td>
      <td>Shawshank Redemption, The (1994)</td>
      <td>281788.0</td>
      <td>63366</td>
      <td>4.447</td>
    </tr>
    <tr>
      <td>113315</td>
      <td>Zero Motivation (Efes beyahasei enosh) (2014)</td>
      <td>49.5</td>
      <td>11</td>
      <td>4.419</td>
    </tr>
  </tbody>
</table>

Okay, that fourth spot being held by a critically-acclaimed movie is encouraging. We see the other spots are still held by movies with few ratings, and we can tweak the hyperparameter $\lambda$ to further dampen the means. Here's what the top 10 lists look like for various $\lambda$.

In [None]:
#| echo: false
# Our top movies
dampened_avg_ratings.sort_values(by = 'dampened_mean', ascending=False, inplace = True)
dampened_avg_ratings.head()

In [2]:
#| echo: false
# That set of parameters led to some of the highest rated movies being those with a few high ratings. We can play with the hyperparameters to dampen things more, so if $\lambda = 2$, then it's pulling everything even closer to the global mean $\mu_0$.

In [None]:
#| echo: false
# Step 1: Compute the global mean
mu_0 = ratings['rating'].mean()
damp_factor = 2

# Step 2: Group and compute sum and count
dampened_avg_ratings = ratings.groupby(['movieId', 'title'])['rating'].agg(
    sum_rating='sum',
    rating_count='count'
).reset_index()

# Step 3: Compute dampened mean
dampened_avg_ratings['dampened_mean'] = (dampened_avg_ratings['sum_rating'] + damp_factor*mu_0) / (dampened_avg_ratings['rating_count'] + damp_factor)

In [None]:
#| echo: false
# Our top movies
dampened_avg_ratings.sort_values(by = 'dampened_mean', ascending=False, inplace = True)
dampened_avg_ratings.head(10)
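The cells above repeat the same three steps for each value of $\lambda$. One way to avoid the repetition is to wrap the steps in a function and loop over dampening factors. The sketch below is an assumption about how that could look, not the code used to build the tables in this post: the helper name `top_dampened` and the toy data are made up, and it groups by `title` only (rather than `movieId` and `title`) to keep the example small.

```python
import pandas as pd

def top_dampened(ratings, damp_factor, n=10):
    """Return the n movies with the highest dampened mean rating."""
    mu_0 = ratings['rating'].mean()  # global mean over all ratings
    stats = ratings.groupby('title')['rating'].agg(
        sum_rating='sum',
        rating_count='count'
    ).reset_index()
    stats['dampened_mean'] = (
        (stats['sum_rating'] + damp_factor * mu_0)
        / (stats['rating_count'] + damp_factor)
    )
    return stats.sort_values('dampened_mean', ascending=False).head(n)

# Toy data (made up): one obscure film with a single 5-star rating,
# one popular film with eight high ratings, one mediocre film.
toy = pd.DataFrame({
    'title': ['Obscure'] + ['Popular'] * 8 + ['Mediocre'] * 4,
    'rating': [5.0]
              + [5.0, 4.5, 4.5, 5.0, 4.0, 4.5, 5.0, 4.5]
              + [2.0, 3.0, 2.5, 2.5],
})

for damp_factor in [0, 1, 5]:
    print(f"lambda = {damp_factor}")
    print(top_dampened(toy, damp_factor, n=3)[['title', 'rating_count', 'dampened_mean']])
```

On this toy data the single-rating film is on top for $\lambda = 0$ and drops below the popular film as soon as $\lambda = 1$, which mirrors what happens to the real top 10 lists.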

In [10]:
#| echo: false
#| eval: true

table0 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Neurons to Nirvana (2013)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Victor and the Secret of Crocodile Mansion (2012)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>The Green (2011)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Into the Middle of Nowhere (2010)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Inquire Within (2012)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Freeheld (2007)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Who Killed Vincent Chin? (1987)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Marihuana (1936)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>The Encounter (2010)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Foster Brothers, The (Süt kardesler) (1976)</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1</td>
    </tr>
  </tbody>
</table>"""

In [11]:
#| echo: false
#| eval: true

table1 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Catastroika (2012)</td>
      <td>4.509</td>
      <td>10.0</td>
      <td>2</td>
    </tr>
    <tr>
      <td>Consuming Kids: The Commercialization of Childhood (2008)</td>
      <td>4.509</td>
      <td>10.0</td>
      <td>2</td>
    </tr>
    <tr>
      <td>Octopus, The (Le poulpe) (1998)</td>
      <td>4.506</td>
      <td>14.5</td>
      <td>3</td>
    </tr>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Zero Motivation (Efes beyahasei enosh) (2014)</td>
      <td>4.419</td>
      <td>49.5</td>
      <td>11</td>
    </tr>
    <tr>
      <td>Echoes of the Rainbow (Sui yuet san tau) (2010)</td>
      <td>4.381</td>
      <td>14.0</td>
      <td>3</td>
    </tr>
    <tr>
      <td>Plastic Bag (2009)</td>
      <td>4.381</td>
      <td>14.0</td>
      <td>3</td>
    </tr>
    <tr>
      <td>Hellhounds on My Trail (1999)</td>
      <td>4.381</td>
      <td>14.0</td>
      <td>3</td>
    </tr>
    <tr>
      <td>Deewaar (1975)</td>
      <td>4.381</td>
      <td>14.0</td>
      <td>3</td>
    </tr>
    <tr>
      <td>All Passion Spent (1986)</td>
      <td>4.381</td>
      <td>14.0</td>
      <td>3</td>
    </tr>
  </tbody>
</table>"""

In [12]:
#| echo: false
#| eval: true

table2 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Godfather, The (1972)</td>
      <td>4.365</td>
      <td>180503.5</td>
      <td>41355</td>
    </tr>
    <tr>
      <td>Zero Motivation (Efes beyahasei enosh) (2014)</td>
      <td>4.350</td>
      <td>49.5</td>
      <td>11</td>
    </tr>
    <tr>
      <td>Usual Suspects, The (1995)</td>
      <td>4.334</td>
      <td>203741.5</td>
      <td>47006</td>
    </tr>
    <tr>
      <td>Octopus, The (Le poulpe) (1998)</td>
      <td>4.310</td>
      <td>14.5</td>
      <td>3</td>
    </tr>
    <tr>
      <td>Schindler's List (1993)</td>
      <td>4.310</td>
      <td>215741.5</td>
      <td>50054</td>
    </tr>
    <tr>
      <td>Godfather: Part II, The (1974)</td>
      <td>4.276</td>
      <td>117144.0</td>
      <td>27398</td>
    </tr>
    <tr>
      <td>Seven Samurai (Shichinin no samurai) (1954)</td>
      <td>4.274</td>
      <td>49627.5</td>
      <td>11611</td>
    </tr>
    <tr>
      <td>Rear Window (1954)</td>
      <td>4.271</td>
      <td>74530.5</td>
      <td>17449</td>
    </tr>
    <tr>
      <td>Band of Brothers (2001)</td>
      <td>4.263</td>
      <td>18353.0</td>
      <td>4305</td>
    </tr>
  </tbody>
</table>"""

In [13]:
#| echo: false
#| eval: true

table3 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Godfather, The (1972)</td>
      <td>4.365</td>
      <td>180503.5</td>
      <td>41355</td>
    </tr>
    <tr>
      <td>Usual Suspects, The (1995)</td>
      <td>4.334</td>
      <td>203741.5</td>
      <td>47006</td>
    </tr>
    <tr>
      <td>Schindler's List (1993)</td>
      <td>4.310</td>
      <td>215741.5</td>
      <td>50054</td>
    </tr>
    <tr>
      <td>Zero Motivation (Efes beyahasei enosh) (2014)</td>
      <td>4.291</td>
      <td>49.5</td>
      <td>11</td>
    </tr>
    <tr>
      <td>Godfather: Part II, The (1974)</td>
      <td>4.276</td>
      <td>117144.0</td>
      <td>27398</td>
    </tr>
    <tr>
      <td>Seven Samurai (Shichinin no samurai) (1954)</td>
      <td>4.274</td>
      <td>49627.5</td>
      <td>11611</td>
    </tr>
    <tr>
      <td>Rear Window (1954)</td>
      <td>4.271</td>
      <td>74530.5</td>
      <td>17449</td>
    </tr>
    <tr>
      <td>Band of Brothers (2001)</td>
      <td>4.263</td>
      <td>18353.0</td>
      <td>4305</td>
    </tr>
    <tr>
      <td>Casablanca (1942)</td>
      <td>4.258</td>
      <td>103686.0</td>
      <td>24349</td>
    </tr>
  </tbody>
</table>"""

In [14]:
#| echo: false
#| eval: true

table4 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Godfather, The (1972)</td>
      <td>4.365</td>
      <td>180503.5</td>
      <td>41355</td>
    </tr>
    <tr>
      <td>Usual Suspects, The (1995)</td>
      <td>4.334</td>
      <td>203741.5</td>
      <td>47006</td>
    </tr>
    <tr>
      <td>Schindler's List (1993)</td>
      <td>4.310</td>
      <td>215741.5</td>
      <td>50054</td>
    </tr>
    <tr>
      <td>Godfather: Part II, The (1974)</td>
      <td>4.276</td>
      <td>117144.0</td>
      <td>27398</td>
    </tr>
    <tr>
      <td>Seven Samurai (Shichinin no samurai) (1954)</td>
      <td>4.274</td>
      <td>49627.5</td>
      <td>11611</td>
    </tr>
    <tr>
      <td>Rear Window (1954)</td>
      <td>4.271</td>
      <td>74530.5</td>
      <td>17449</td>
    </tr>
    <tr>
      <td>Band of Brothers (2001)</td>
      <td>4.262</td>
      <td>18353.0</td>
      <td>4305</td>
    </tr>
    <tr>
      <td>Casablanca (1942)</td>
      <td>4.258</td>
      <td>103686.0</td>
      <td>24349</td>
    </tr>
    <tr>
      <td>Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)</td>
      <td>4.256</td>
      <td>27776.5</td>
      <td>6525</td>
    </tr>
  </tbody>
</table>"""

In [24]:
#| echo: false
#| eval: true

table5 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Godfather, The (1972)</td>
      <td>4.365</td>
      <td>180503.5</td>
      <td>41355</td>
    </tr>
    <tr>
      <td>Usual Suspects, The (1995)</td>
      <td>4.334</td>
      <td>203741.5</td>
      <td>47006</td>
    </tr>
    <tr>
      <td>Schindler's List (1993)</td>
      <td>4.310</td>
      <td>215741.5</td>
      <td>50054</td>
    </tr>
    <tr>
      <td>Godfather: Part II, The (1974)</td>
      <td>4.276</td>
      <td>117144.0</td>
      <td>27398</td>
    </tr>
    <tr>
      <td>Seven Samurai (Shichinin no samurai) (1954)</td>
      <td>4.274</td>
      <td>49627.5</td>
      <td>11611</td>
    </tr>
    <tr>
      <td>Rear Window (1954)</td>
      <td>4.271</td>
      <td>74530.5</td>
      <td>17449</td>
    </tr>
    <tr>
      <td>Band of Brothers (2001)</td>
      <td>4.262</td>
      <td>18353.0</td>
      <td>4305</td>
    </tr>
    <tr>
      <td>Casablanca (1942)</td>
      <td>4.258</td>
      <td>103686.0</td>
      <td>24349</td>
    </tr>
    <tr>
      <td>Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)</td>
      <td>4.256</td>
      <td>27776.5</td>
      <td>6525</td>
    </tr>
  </tbody>
</table>"""

In [28]:
#| echo: false
#| eval: true
table6 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Godfather, The (1972)</td>
      <td>4.365</td>
      <td>180503.5</td>
      <td>41355</td>
    </tr>
    <tr>
      <td>Usual Suspects, The (1995)</td>
      <td>4.334</td>
      <td>203741.5</td>
      <td>47006</td>
    </tr>
    <tr>
      <td>Schindler's List (1993)</td>
      <td>4.310</td>
      <td>215741.5</td>
      <td>50054</td>
    </tr>
    <tr>
      <td>Godfather: Part II, The (1974)</td>
      <td>4.275</td>
      <td>117144.0</td>
      <td>27398</td>
    </tr>
    <tr>
      <td>Seven Samurai (Shichinin no samurai) (1954)</td>
      <td>4.274</td>
      <td>49627.5</td>
      <td>11611</td>
    </tr>
    <tr>
      <td>Rear Window (1954)</td>
      <td>4.271</td>
      <td>74530.5</td>
      <td>17449</td>
    </tr>
    <tr>
      <td>Band of Brothers (2001)</td>
      <td>4.262</td>
      <td>18353.0</td>
      <td>4305</td>
    </tr>
    <tr>
      <td>Casablanca (1942)</td>
      <td>4.258</td>
      <td>103686.0</td>
      <td>24349</td>
    </tr>
    <tr>
      <td>Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)</td>
      <td>4.256</td>
      <td>27776.5</td>
      <td>6525</td>
    </tr>
  </tbody>
</table>"""

In [29]:
#| echo: false
#| eval: true
table7 = """<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: left;">
      <th>Title</th>
      <th>Dampened Mean</th>
      <th>Ratings Sum</th>
      <th>Number of Ratings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shawshank Redemption, The (1994)</td>
      <td>4.447</td>
      <td>281788.0</td>
      <td>63366</td>
    </tr>
    <tr>
      <td>Godfather, The (1972)</td>
      <td>4.365</td>
      <td>180503.5</td>
      <td>41355</td>
    </tr>
    <tr>
      <td>Usual Suspects, The (1995)</td>
      <td>4.334</td>
      <td>203741.5</td>
      <td>47006</td>
    </tr>
    <tr>
      <td>Schindler's List (1993)</td>
      <td>4.310</td>
      <td>215741.5</td>
      <td>50054</td>
    </tr>
    <tr>
      <td>Godfather: Part II, The (1974)</td>
      <td>4.275</td>
      <td>117144.0</td>
      <td>27398</td>
    </tr>
    <tr>
      <td>Seven Samurai (Shichinin no samurai) (1954)</td>
      <td>4.274</td>
      <td>49627.5</td>
      <td>11611</td>
    </tr>
    <tr>
      <td>Rear Window (1954)</td>
      <td>4.271</td>
      <td>74530.5</td>
      <td>17449</td>
    </tr>
    <tr>
      <td>Band of Brothers (2001)</td>
      <td>4.262</td>
      <td>18353.0</td>
      <td>4305</td>
    </tr>
    <tr>
      <td>Casablanca (1942)</td>
      <td>4.258</td>
      <td>103686.0</td>
      <td>24349</td>
    </tr>
    <tr>
      <td>Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)</td>
      <td>4.256</td>
      <td>27776.5</td>
      <td>6525</td>
    </tr>
  </tbody>
</table>"""

In [None]:
#| echo: false
# table0 is an HTML string, so wrap it in HTML() to render it as a table
display(HTML(table0))

In [30]:
#| echo: false
#| eval: true

# One pre-rendered HTML table per dampening factor (0 through 5)
tables = [table0, table1, table2, table3, table4, table5]

In [35]:
#| echo: false

from IPython.display import HTML

# `tables` (defined above) holds one pre-rendered HTML table string
# per dampening factor, in order from 0 to 5.

html = """
<style>
#tableSelect {
    padding: 8px 12px;
    font-size: 14px;
    border: 1px solid #aaa;
    border-radius: 6px;
    background-color: #fafafa;
    margin-left: 6px;
}
#tableSelect:hover {
    background-color: #f0f0f0;
}

.table-container {
    display: none;
    margin-top: 15px;
    padding: 12px;
    background: #ffffff;
    border: 1px solid #ddd;
    border-radius: 8px;
    box-shadow: 0 2px 6px rgba(0,0,0,0.05);
}
</style>

<h3>Select a Dampening Factor (0–5):</h3>

<select id="tableSelect" onchange="showTable10()">
    <option value="t0">0</option>
    <option value="t1">1</option>
    <option value="t2">2</option>
    <option value="t3">3</option>
    <option value="t4">4</option>
    <option value="t5">5</option>
</select>
"""

# Add table containers
for i in range(6):
    html += f'<div id="t{i}" class="table-container">TABLE{i}_HTML</div>\n'

# JavaScript
html += """
<script>
function showTable10() {
    var selected = document.getElementById("tableSelect").value;
    document.querySelectorAll(".table-container").forEach(x => {
        x.style.display = "none";
    });
    document.getElementById(selected).style.display = "block";
}
showTable10();
</script>
"""

# Now replace table HTML placeholders
for i in range(6):
    html = html.replace(f"TABLE{i}_HTML", tables[i])

HTML(html)


Title,Dampened Mean,Ratings Sum,Number of Ratings
Neurons to Nirvana (2013),5.0,5.0,1
Victor and the Secret of Crocodile Mansion (2012),5.0,5.0,1
The Green (2011),5.0,5.0,1
Into the Middle of Nowhere (2010),5.0,5.0,1
Inquire Within (2012),5.0,5.0,1
Freeheld (2007),5.0,5.0,1
Who Killed Vincent Chin? (1987),5.0,5.0,1
Marihuana (1936),5.0,5.0,1
The Encounter (2010),5.0,5.0,1
"Foster Brothers, The (Süt kardesler) (1976)",5.0,5.0,1

Title,Dampened Mean,Ratings Sum,Number of Ratings
Catastroika (2012),4.509,10.0,2
Consuming Kids: The Commercialization of Childhood (2008),4.509,10.0,2
"Octopus, The (Le poulpe) (1998)",4.506,14.5,3
"Shawshank Redemption, The (1994)",4.447,281788.0,63366
Zero Motivation (Efes beyahasei enosh) (2014),4.419,49.5,11
Echoes of the Rainbow (Sui yuet san tau) (2010),4.381,14.0,3
Plastic Bag (2009),4.381,14.0,3
Hellhounds on My Trail (1999),4.381,14.0,3
Deewaar (1975),4.381,14.0,3
All Passion Spent (1986),4.381,14.0,3

Title,Dampened Mean,Ratings Sum,Number of Ratings
"Shawshank Redemption, The (1994)",4.447,281788.0,63366
"Godfather, The (1972)",4.365,180503.5,41355
Zero Motivation (Efes beyahasei enosh) (2014),4.35,49.5,11
"Usual Suspects, The (1995)",4.334,203741.5,47006
"Octopus, The (Le poulpe) (1998)",4.31,14.5,3
Schindler's List (1993),4.31,215741.5,50054
"Godfather: Part II, The (1974)",4.276,117144.0,27398
Seven Samurai (Shichinin no samurai) (1954),4.274,49627.5,11611
Rear Window (1954),4.271,74530.5,17449
Band of Brothers (2001),4.263,18353.0,4305

Title,Dampened Mean,Ratings Sum,Number of Ratings
"Shawshank Redemption, The (1994)",4.447,281788.0,63366
"Godfather, The (1972)",4.365,180503.5,41355
"Usual Suspects, The (1995)",4.334,203741.5,47006
Schindler's List (1993),4.31,215741.5,50054
Zero Motivation (Efes beyahasei enosh) (2014),4.291,49.5,11
"Godfather: Part II, The (1974)",4.276,117144.0,27398
Seven Samurai (Shichinin no samurai) (1954),4.274,49627.5,11611
Rear Window (1954),4.271,74530.5,17449
Band of Brothers (2001),4.263,18353.0,4305
Casablanca (1942),4.258,103686.0,24349

Title,Dampened Mean,Ratings Sum,Number of Ratings
"Shawshank Redemption, The (1994)",4.447,281788.0,63366
"Godfather, The (1972)",4.365,180503.5,41355
"Usual Suspects, The (1995)",4.334,203741.5,47006
Schindler's List (1993),4.31,215741.5,50054
"Godfather: Part II, The (1974)",4.276,117144.0,27398
Seven Samurai (Shichinin no samurai) (1954),4.274,49627.5,11611
Rear Window (1954),4.271,74530.5,17449
Band of Brothers (2001),4.262,18353.0,4305
Casablanca (1942),4.258,103686.0,24349
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950),4.256,27776.5,6525

Title,Dampened Mean,Ratings Sum,Number of Ratings
"Shawshank Redemption, The (1994)",4.447,281788.0,63366
"Godfather, The (1972)",4.365,180503.5,41355
"Usual Suspects, The (1995)",4.334,203741.5,47006
Schindler's List (1993),4.31,215741.5,50054
"Godfather: Part II, The (1974)",4.276,117144.0,27398
Seven Samurai (Shichinin no samurai) (1954),4.274,49627.5,11611
Rear Window (1954),4.271,74530.5,17449
Band of Brothers (2001),4.262,18353.0,4305
Casablanca (1942),4.258,103686.0,24349
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950),4.256,27776.5,6525



It turns out that the top 10 list doesn't change as we increase $\lambda$ from 5 to 10. 

Finally, here's a visualization that shows both the original (undampened) rating (blue) and the dampened rating (orange) with $\lambda = 5$, as well as the difference between the two (gray line). The visualization only covers a sample of 50 movies so that it doesn't get too cluttered. You can see that the movies with fewer ratings are pulled further towards the overall mean, while those with more ratings have budged less. There's probably a good physics interpretation of what's going on here: perhaps computing the center of mass of a plank with a weight on it, where the plank's mass depends on $\lambda$ and the weight on top of it has a mass that depends on the number of ratings and is placed according to the movie's average rating. I don't know enough physics to truly formalize this connection.

<iframe src="Screenshots/interactive_movie_plot_2.html" width="100%" height="500"></iframe>
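One way to make the pull towards the overall mean precise is to rewrite the dampened rating as a weighted average. This is only a sketch of the algebra, and the shorthand $\bar{x}_j = \sum_i X_i / N_j$ for movie $j$'s plain mean is notation introduced here for convenience:

$$r_j = \frac{N_j \bar{x}_j + \lambda \mu_0}{N_j + \lambda} = \frac{N_j}{N_j + \lambda}\,\bar{x}_j + \frac{\lambda}{N_j + \lambda}\,\mu_0$$

Read this way, $r_j$ is the center of mass of two point masses on the rating axis: a mass $N_j$ sitting at $\bar{x}_j$ and a mass $\lambda$ sitting at $\mu_0$. The length of the gray line for movie $j$ is then

$$r_j - \bar{x}_j = \frac{\lambda}{N_j + \lambda}\,(\mu_0 - \bar{x}_j)$$

so the shift shrinks as $N_j$ grows and grows as $\lambda$ increases, which matches what the plot shows.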

#### Further approaches and questions
Stay tuned for future posts on other recommender systems, where I'll answer questions such as:

- How does one select the hyperparameter $\lambda$?
- How can we measure the performance of the model?
- What about personalized recommender systems?