<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #a6cee3, #e0f7fa);
">

---


# Analysis of IMDb TV Show Ratings and Viewer Trends, 1990-2018

---

## Authors: Noah and Will
### Date: Today




<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #d0ebf7, #f0fbfe);
">

# Abstract
---

This project explores **IMDb ratings and viewer trends** to better understand how audience preferences and genre popularity have evolved over time. By analyzing TV ratings from 1990 to 2018 across multiple genres, we aim to identify the patterns that most strongly shape TV reception. Through visualizations and trend analysis, the study highlights how cultural, temporal, and genre-based factors contribute to changes in show popularity and audience trends.

</div>


In [3]:
%matplotlib inline
import pandas as pd

df = pd.read_csv("IMDb_Economist_tv_ratings.csv")

# Transform dates to datetime objects
df["date"] = pd.to_datetime(df["date"])

# Extract the year from the datetime object
df["year"] = df["date"].dt.year

# Split the comma-separated genres so that each row holds a single genre
df["genres"] = df["genres"].str.split(",")
df = df.explode("genres")
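
To make the effect of the split-and-explode step concrete, here is a minimal sketch on a made-up two-row frame. The titles, dates, and ratings are hypothetical and only mimic the column names used above. The extra `str.strip()` call is an assumption, added in case the genre strings contain a space after the comma.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration
sample = pd.DataFrame({
    "title": ["Show A", "Show B"],
    "date": ["1999-09-20", "2011-04-17"],
    "av_rating": [8.0, 9.0],
    "genres": ["Crime,Drama", "Action, Drama"],
})

sample["date"] = pd.to_datetime(sample["date"])
sample["year"] = sample["date"].dt.year

# One row per (show, genre); ignore_index gives a clean 0..n-1 index
sample["genres"] = sample["genres"].str.split(",")
exploded = sample.explode("genres", ignore_index=True)

# Assumed cleanup: remove stray spaces so " Drama" and "Drama" match
exploded["genres"] = exploded["genres"].str.strip()

print(exploded[["title", "year", "genres", "av_rating"]])
```

Each show now contributes one row per genre, so a two-genre show counts toward both genre averages.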

<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #d0ebf7, #f0fbfe);
">

### Top 5 Shows From 1990-2018
    
<div style="display: flex; justify-content: center; gap: 10px;">
  <div style="text-align: center;">
    <img src="angel.jpg" alt="Touched by an Angel" width="200"><br>
    Touched by an Angel
  </div>
  <div style="text-align: center;">
    <img src="barabra.jpg" alt="Santa Barbara" width="200"><br>
    Santa Barbara
  </div>
  <div style="text-align: center;">
    <img src="law.jpg" alt="L.A. Law" width="200"><br>
    L.A. Law
  </div>
  <div style="text-align: center;">
    <img src="got.jpg" alt="Game of Thrones" width="200"><br>
    Game of Thrones
  </div>
  <div style="text-align: center;">
    <img src="fugitive.jpg" alt="The Fugitive Chronicles" width="200"><br>
    The Fugitive Chronicles
  </div>
</div>


In [4]:
# Group by show title and compute the average rating over all years
overall_perf = (
    df.groupby("title", as_index=False)
    .agg({
        "av_rating": "mean",   # average rating across all years
        "genres": "first",     # keep one genre (for labeling)
        "year": "min"          # or 'first' — earliest appearance year
    })
)

# Select the top 5 shows overall
top5_overall = overall_perf.nlargest(5, "av_rating")

print(top5_overall)

                       title  av_rating  genres  year
800      Touched by an Angel   9.600000   Drama  1998
576            Santa Barbara   9.400000   Drama  1990
379                 L.A. Law   9.350000   Drama  1990
270          Game of Thrones   9.265114  Action  2011
696  The Fugitive Chronicles   9.200000   Crime  2010
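
A mean taken over only one or two rows can outrank a long-running show, so it may help to report how many rows back each average. The sketch below is one possible check, not part of the original analysis. The data are hypothetical, and `MIN_ROWS` is an assumed threshold. On the real frame the count should be taken before the genre explode, since exploding repeats each row once per genre.

```python
import pandas as pd

# Hypothetical season-level rows (not taken from the real dataset)
seasons = pd.DataFrame({
    "title": ["Show A", "Show A", "Show A", "Show B"],
    "av_rating": [8.0, 8.5, 9.0, 9.4],
})

# Mean rating plus the number of rows behind each mean
summary = seasons.groupby("title", as_index=False).agg(
    mean_rating=("av_rating", "mean"),
    n_rows=("av_rating", "count"),
)

MIN_ROWS = 2  # assumed threshold; adjust to taste
ranked = summary[summary["n_rows"] >= MIN_ROWS].nlargest(5, "mean_rating")

print(summary)
print(ranked)
```

Here "Show B" has the highest mean but rests on a single row, so it drops out once the threshold is applied.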


<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #d0ebf7, #f0fbfe);
">

### Best Performing Genres From 1990-2018
    
The following analysis identifies the **best-rated TV genre for each year** based on average IMDb ratings.  
To achieve this, the dataset was first expanded so that shows belonging to multiple genres were counted under each individual genre.  
The average rating (`av_rating`) was then calculated for each combination of year and genre.  
Finally, for every year, the single genre with the **highest average rating** was selected, resulting in a clean timeline of the top-performing genres across all years.

The resulting figure (see *Figure 1*) visualizes how the top genre changes over time, highlighting shifting audience preferences and trends in television content.  
For example, in the first decade shown below, Fantasy is the top-rated genre from 1991 through 1993, while Family (1994, 1998) and Thriller (1997, 1999) lead later in the 1990s.  
This view provides a concise overview of how viewer tastes evolved according to IMDb data.


In [5]:
# TOP GENRE FROM EACH YEAR
# `genres` was already split and exploded when the data was loaded,
# so the frame can be grouped directly.
genre_year_perf = (
    df.groupby(["year", "genres"])["av_rating"]
    .mean()
    .reset_index()
)

# Get the SINGLE top genre for each year
top_genre_per_year = (
    genre_year_perf
    .sort_values(["year", "av_rating"], ascending=[True, False])
    .groupby("year")
    .head(1)  # Only get the top 1 genre per year
    .reset_index(drop=True)
)
top_genre_per_year.head(38)

   year     genres  av_rating
0  1990    Romance       8.65
1  1991    Fantasy     8.7038
2  1992    Fantasy     8.6019
3  1993    Fantasy     9.2043
4  1994     Family   8.687825
5  1995    Mystery   8.505175
6  1996  Animation    8.33135
7  1997   Thriller     8.6195
8  1998     Family   8.783133
9  1999   Thriller     8.5393


<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #d0ebf7, #f0fbfe);
">
    
###  Mathematical Representation of Top Genre by Year

Let the following definitions hold:

$$
G = \{ g_1, g_2, \dots, g_m \} \quad \text{(set of all TV genres)}
$$

$$
Y = \{ y_1, y_2, \dots, y_n \} \quad \text{(set of all years)}
$$

$$
R_{i,j,k} : \text{IMDb rating of the } k\text{-th show entry in year } y_i \text{ belonging to genre } g_j
$$

$$
N_{i,j} : \text{number of show entries of genre } g_j \text{ in year } y_i
$$

The **average rating** for genre $g_j$ in year $y_i$ (defined when $N_{i,j} > 0$) is:

$$
\bar{R}(y_i, g_j)
= \frac{1}{N_{i,j}} \sum_{k=1}^{N_{i,j}} R_{i,j,k}
$$

The **top genre** for each year is the one with the highest average rating:

$$
g^{*}(y_i)
= \operatorname*{arg\,max}_{g_j \in G,\; N_{i,j} > 0} \bar{R}(y_i, g_j)
$$

Hence, the final result can be written as:

$$
T = \{ (y_i, g^{*}(y_i), \bar{R}(y_i, g^{*}(y_i))) \mid y_i \in Y \}
$$
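
The definitions above map fairly directly onto pandas. The following is a minimal sketch on hypothetical rows, using `idxmax` as an alternative to the sort-and-`head(1)` approach in the notebook cell. The comments tie each step to the corresponding symbol.

```python
import pandas as pd

# Hypothetical exploded rows: one row per (show entry, genre)
rows = pd.DataFrame({
    "year":      [1990, 1990, 1990, 1991, 1991],
    "genres":    ["Drama", "Drama", "Comedy", "Drama", "Fantasy"],
    "av_rating": [8.0, 9.0, 8.2, 7.5, 8.7],
})

# R-bar(y_i, g_j): mean rating for every (year, genre) pair that occurs
mean_rating = rows.groupby(["year", "genres"], as_index=False)["av_rating"].mean()

# g*(y_i): row label of the highest mean within each year
best_rows = mean_rating.groupby("year")["av_rating"].idxmax()

# T: one (year, top genre, mean rating) triple per year
top = mean_rating.loc[best_rows].reset_index(drop=True)
print(top)
```

Only pairs with at least one row appear after the `groupby`, so the maximum is taken over genres that actually occur in a given year. When two genres tie, `idxmax` keeps the first one it encounters.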


<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #d0ebf7, #f0fbfe);
">

### Top 5 Highest Rated Shows Per Year
    
To better understand yearly trends in television ratings, we extracted the **top five highest-rated shows for each year** based on their average IMDb ratings.  
The dataset was first grouped by year, title, and genre to calculate the mean rating (`av_rating`) for each show.  
If a show appeared multiple times within the same year, its average rating was taken to ensure consistency.  
From there, only the **top five shows per year** were selected, giving a clear picture of which series performed best annually.

The results are visualized in *Figure 2*, which presents a bar chart of the top five shows for every year.  
Each bar represents one show, with colors corresponding to different years.  
The chart provides insight into how audience preferences evolved over time and which titles stood out in their respective years.  
For example, acclaimed dramas such as *Breaking Bad* (2013) and *Game of Thrones* (2016) rank among the top titles of their years.  
This visualization makes it easy to identify standout titles and compare shifts in popularity across the years.


In [8]:
#| label: fig-top5-by-year
#| fig-cap: "Bar chart showing the five highest-rated television shows for each year. Ratings are based on IMDb averages, illustrating which series defined audience preferences annually."
import plotly.express as px

# Average rating for each (year, title, genre) combination
year_perf = df.groupby(["year", "title", "genres"])["av_rating"].mean().reset_index()

# Collapse to one row per (year, title)
year_perf = (
    year_perf.groupby(['year', 'title'], as_index=False)
    .agg({
        'av_rating': 'mean',        # average rating if multiple entries per year
        'genres': 'first'           # keep one genre label
    })
)

# Select the top 5 shows per year based on average rating
top5 = year_perf.groupby('year', group_keys=False).apply(
    lambda g: g.nlargest(5, 'av_rating')
).reset_index(drop=True)

# Axis label that combines year and title
top5['title_year'] = top5['year'].astype(str) + " - " + top5['title']

# Year as a string so that Plotly assigns discrete colors
top5['year_str'] = top5['year'].astype(str)

fig = px.bar(
    top5,
    x='title_year',
    y='av_rating',
    title='Top 5 over the years',
    color='year_str',
)

fig.update_layout(
    xaxis_tickangle=-45,
    width=2500,
    height=500,
)

fig.show()

# Note: .size counts cells (rows x columns), not rows
print(top5.size)


870
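
The printed value 870 is a cell count, not a row count. It is consistent with 145 rows (29 years times 5 shows) across 6 columns. The sketch below uses hypothetical data to show the difference between `len` and `.size`. It also shows an alternative top-N selection based on `sort_values` and `groupby(...).head(N)`, which avoids `groupby.apply`. This is a suggested variant, not the method used in the cell above.

```python
import pandas as pd

# Hypothetical per-year show averages
year_perf = pd.DataFrame({
    "year":      [2000, 2000, 2000, 2001, 2001],
    "title":     ["A", "B", "C", "D", "E"],
    "av_rating": [8.1, 9.0, 7.5, 8.8, 8.9],
})

N = 2  # the notebook uses 5

# Sort best-first within each year, then keep the first N rows per year
top_n = (
    year_perf.sort_values(["year", "av_rating"], ascending=[True, False])
    .groupby("year")
    .head(N)
    .reset_index(drop=True)
)

print(top_n)
print(len(top_n))   # number of rows
print(top_n.size)   # rows x columns
```

With ties, the two approaches may keep different rows, so results should be compared before swapping one for the other.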


<div style="
    min-height: 90vh;
    padding: 20px;
    border-radius: 15px;
    background: linear-gradient(135deg, #d0ebf7, #f0fbfe);
">

### Conclusion

---

By looking at the popularity of shows and genres over time, we can see how society's taste in television changes. Analysis of television ratings from 1990 to 2018 reveals a clear evolution in audience preferences and genre popularity. Early 1990s audiences favored family-centered dramas and comedies, exemplified by *Parenthood* and *Are You Afraid of the Dark?*, both of which achieved exceptionally high ratings. Over time, viewer tastes shifted toward complex, morally ambiguous narratives, with critically acclaimed titles such as *Breaking Bad* (2013) and *Game of Thrones* (2016) dominating later years.

When evaluated across the entire dataset, the highest-rated genres were War, Sport, History, Fantasy, and Music, suggesting consistent appreciation for storytelling grounded in realism, human struggle, and historical or imaginative depth. The steadily rising ratings of Fantasy and Drama genres reflect growing cultural interest in escapism and intricate world-building, paralleling the rise of premium cable and streaming platforms that enabled higher production quality and serialized storytelling.

Overall, the data suggest that societal taste has evolved from valuing light-hearted or family-oriented narratives toward favoring intense, cinematic, and thematically rich genres. This trend may mirror broader social shifts, such as increased political awareness, technological optimism, and the desire for emotionally and intellectually engaging entertainment.