# Project 1: Analyzing Average Music Album Ratings from Album of the Year (AOTY)

**Dataset:** Aggregated Music Albums Review (Kaggle)  
**Link:** https://www.kaggle.com/datasets/kauvinlucas/30000-albums-aggregated-review-ratings

This project uses Album of the Year (AOTY) data to calculate average music album ratings. AOTY aggregates professional reviews from various media outlets and produces a single AOTY Score, which represents the weighted average rating from multiple reputable sources. The dataset consists of approximately 30,000 music albums released between 1940 and October 2020 and was obtained from Kaggle.


# 1. Import Pandas and Dataset
The first step is to load the dataset into a pandas DataFrame. Storing the CSV file in a designated project folder makes it easy to locate and reference later.

In [6]:
# Import pandas and load the dataset
import pandas as pd

df = pd.read_csv("album_ratings.csv")
df.head(10)


Unnamed: 0,Artist,Title,Release Month,Release Day,Release Year,Format,Label,Genre,Metacritic Critic Score,Metacritic Reviews,Metacritic User Score,Metacritic User Reviews,AOTY Critic Score,AOTY Critic Reviews,AOTY User Score,AOTY User Reviews
0,Neko Case,Middle Cyclone,March,3,2009,LP,ANTI-,Alt-Country,79.0,31.0,8.7,31.0,79,25,78,55
1,Jason Isbell & The 400 Unit,Jason Isbell & The 400 Unit,February,17,2009,LP,Thirty Tigers,Country Rock,70.0,14.0,8.4,7.0,73,11,73,8
2,Animal Collective,Merriweather Post Pavilion,January,20,2009,LP,Domino,Psychedelic Pop,89.0,36.0,8.5,619.0,92,30,87,1335
3,Bruce Springsteen,Working on a Dream,January,27,2009,LP,Columbia Records,Rock,72.0,29.0,7.9,101.0,70,23,66,38
4,Andrew Bird,Noble Beast,January,20,2009,LP,Fat Possum,Singer-Songwriter,79.0,29.0,8.7,47.0,74,24,78,44
5,Bon Iver,Blood Bank EP,January,20,2009,EP,Jagjaguwar,Indie Folk,72.0,15.0,8.4,71.0,74,14,80,241
6,M. Ward,Hold Time,February,17,2009,LP,Merge Records,Folk,79.0,31.0,8.8,23.0,76,24,76,22
7,Lily Allen,"It's Not Me, It's You",February,10,2009,LP,EMI/Parlophone,Pop,71.0,32.0,8.1,155.0,68,29,79,209
8,Morrissey,Years of Refusal,February,17,2009,LP,Attack/Lost Highway,Alternative Rock,79.0,32.0,8.0,37.0,76,26,74,45
9,The Decemberists,The Hazards of Love,March,24,2009,LP,Capitol Records,Indie Rock,73.0,31.0,8.5,98.0,73,29,76,66
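
Before computing any statistics, a few preliminary checks on the score columns can be useful. The following is a minimal sketch, not part of the original notebook: the helper name `preliminary_checks` and the small `sample` frame (three rows copied from the output above) are assumptions made for illustration.

```python
import pandas as pd

# Illustrative sample: three rows copied from the df.head(10) output above
sample = pd.DataFrame({
    "Artist": ["Neko Case", "Animal Collective", "Bon Iver"],
    "Title": ["Middle Cyclone", "Merriweather Post Pavilion", "Blood Bank EP"],
    "AOTY Critic Score": [79, 92, 74],
    "AOTY Critic Reviews": [25, 30, 14],
})

def preliminary_checks(frame, columns):
    """Hypothetical helper: dtype, missing count, min and max per column."""
    # Coerce to numeric so that non-numeric entries show up as missing values
    numeric = frame[columns].apply(pd.to_numeric, errors="coerce")
    return pd.DataFrame({
        "dtype": frame[columns].dtypes.astype(str),
        "missing": numeric.isna().sum(),
        "min": numeric.min(),
        "max": numeric.max(),
    })

report = preliminary_checks(sample, ["AOTY Critic Score", "AOTY Critic Reviews"])
print(report)
```

On the full dataset, the same helper could be called on df instead of the sample to confirm that the critic columns are numeric and to see how many values are missing.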


# 2. Calculating the Mean, Median, and Mode

After importing the data and running preliminary checks, the next step is to compute the mean, median, and mode of the Album of the Year (AOTY) critic scores.

In [136]:
# Calculate mean, median, and mode using pandas
mean_score   = df["AOTY Critic Score"].mean()
median_score = df["AOTY Critic Score"].median()
mode_score   = df["AOTY Critic Score"].mode().iloc[0]

# Display the result
print("\033[1mMean, Median, and Mode of Album of the Year Critic Scores:\033[0m")
print(f"Mean:   {mean_score:.2f}")
print(f"Median: {median_score:.2f}")
print(f"Mode:   {mode_score:.2f}")


Mean, Median, and Mode of Album of the Year Critic Scores:
Mean:   72.81
Median: 74.00
Mode:   80.00
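
One detail of the cell above is worth illustrating: the pandas mode() method returns every value that ties for the highest frequency, in sorted order, so .iloc[0] selects the smallest of them. The sketch below uses the ten AOTY critic scores from the first rows shown in Section 1 as an illustrative sample; the variable names are assumptions.

```python
import pandas as pd

# AOTY critic scores of the ten albums shown in the df.head(10) output
scores = pd.Series([79, 73, 92, 70, 74, 74, 76, 68, 76, 73], name="AOTY Critic Score")

all_modes = scores.mode()        # every value tied for the highest count, sorted
first_mode = all_modes.iloc[0]   # what the cell above would report

print(all_modes.tolist())
print(first_mode)
```

In this sample three scores appear twice each, so reporting only the first mode would hide the other two. On the full dataset the mode is reported as 80.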


# 3. Data Visualization


A large number of albums in the dataset received an AOTY critic score of 100. However, many of them were reviewed by only a few critics or media outlets (sometimes just one), so each score rests on very few opinions and is highly subjective. To reduce this effect, albums with a score of 100 are ranked by their number of AOTY critic reviews, where each review represents one credible critic or media outlet.

In [9]:
print("Top 10 Highest Rated Albums with the Most AOTY Critic Reviews")
print("=" * 101)

# Filter: only albums with both score and reviews
filtered = df.dropna(subset=["AOTY Critic Score", "AOTY Critic Reviews"])

# Sort by score first, then by number of reviews
top_albums = (
    filtered.sort_values(["AOTY Critic Score", "AOTY Critic Reviews"], ascending=[False, False])
            .head(10)[["Title", "Artist", "AOTY Critic Score", "AOTY Critic Reviews"]]
)

# Visualization
for _, r in top_albums.iterrows():
    score = round(r["AOTY Critic Score"], 1)
    reviews = int(r["AOTY Critic Reviews"])
    bar = "🔵" * reviews
    print(f'{r["Title"][:25]:25} by {r["Artist"][:18]:18} {bar:<25} (Score: {score}, {reviews} reviews)')


Top 10 Highest Rated Albums with the Most AOTY Critic Reviews
Revolver                  by The Beatles        🔵🔵🔵🔵🔵🔵🔵                   (Score: 100, 7 reviews)
Sign "O" the Times        by Prince             🔵🔵🔵🔵🔵🔵🔵                   (Score: 100, 7 reviews)
Rubber Soul               by The Beatles        🔵🔵🔵🔵🔵🔵                    (Score: 100, 6 reviews)
The Dark Side of the Moon by Pink Floyd         🔵🔵🔵🔵🔵🔵                    (Score: 100, 6 reviews)
Automatic for the People  by R.E.M.             🔵🔵🔵🔵🔵🔵                    (Score: 100, 6 reviews)
Blood on the Tracks       by Bob Dylan          🔵🔵🔵🔵🔵🔵                    (Score: 100, 6 reviews)
Abbey Road                by The Beatles        🔵🔵🔵🔵🔵                     (Score: 100, 5 reviews)
Doolittle                 by Pixies             🔵🔵🔵🔵🔵                     (Score: 100, 5 reviews)
Spiderland
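
The ranking of perfect-score albums by review count can also be written with an explicit filter on the score. This is a sketch of an alternative, not the notebook's own method: the small `albums` frame reuses titles and counts that appear in this notebook's outputs, and the variable names are assumptions.

```python
import pandas as pd

# Illustrative rows taken from outputs shown in this notebook
albums = pd.DataFrame({
    "Title": ["Abbey Road", "Reflektor", "Revolver", "Rubber Soul"],
    "Artist": ["The Beatles", "Arcade Fire", "The Beatles", "The Beatles"],
    "AOTY Critic Score": [100, 78, 100, 100],
    "AOTY Critic Reviews": [5, 48, 7, 6],
})

# Keep only perfect scores, then rank by the number of critic reviews
perfect = (
    albums[albums["AOTY Critic Score"] == 100]
    .sort_values("AOTY Critic Reviews", ascending=False)
    .reset_index(drop=True)
)

print(perfect[["Title", "AOTY Critic Reviews"]])
```

Filtering first guarantees that only albums with a score of 100 appear, even when fewer than ten such albums exist. Sorting on both columns and taking the first ten rows gives the same result only when at least ten albums have a perfect score.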

Music reviews are highly subjective, and the fewer reviewers an album has, the more its score reflects individual opinion. When examining albums with low AOTY scores, restricting the list to albums with at least three reviews helps reduce this effect, although there is no fixed standard for how many critics make a score reliable. The filter is applied because several albums received very low scores from a single reviewer.

In [10]:
print("Top 10 Lowest Rated Albums with at Least 3 AOTY Critic Reviews")
print("=" * 101)

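# Convert to numeric so that comparisons and sorting behave correctly
df["AOTY Critic Score"] = pd.to_numeric(df["AOTY Critic Score"], errors="coerce")
df["AOTY Critic Reviews"] = pd.to_numeric(df["AOTY Critic Reviews"], errors="coerce")

# Rebuild the filtered frame from the converted columns,
# then keep only albums that have at least 3 reviews
filtered = df.dropna(subset=["AOTY Critic Score", "AOTY Critic Reviews"])
filtered = filtered[filtered["AOTY Critic Reviews"] >= 3]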

# Sort by lowest score, then by highest number of reviews
low_albums = (
    filtered.sort_values(["AOTY Critic Score", "AOTY Critic Reviews"], ascending=[True, False])
            .head(10)[["Title", "Artist", "AOTY Critic Score", "AOTY Critic Reviews"]]
)

# Visualization (1 circle per review)
for _, r in low_albums.iterrows():
    score = round(r["AOTY Critic Score"], 1)
    reviews = int(r["AOTY Critic Reviews"])
    bar = "🔴" * reviews
    print(f'{r["Title"][:25]:25} by {r["Artist"][:18]:18} {bar:<25} (Score: {score}, {reviews} reviews)')


Top 10 Lowest Rated Albums with at Least 3 AOTY Critic Reviews
Playing with Fire         by Kevin Federline    🔴🔴🔴🔴🔴                     (Score: 18, 5 reviews)
Peter Criss               by Peter Criss        🔴🔴🔴                       (Score: 20, 3 reviews)
Dylan                     by Bob Dylan          🔴🔴🔴🔴                      (Score: 22, 4 reviews)
L.A. (Light Album)        by The Beach Boys     🔴🔴🔴                       (Score: 28, 3 reviews)
The Beach Boys            by The Beach Boys     🔴🔴🔴                       (Score: 28, 3 reviews)
Streets in the Sky        by The Enemy          🔴🔴🔴🔴🔴🔴                    (Score: 29, 6 reviews)
The Female Boss           by Tulisa             🔴🔴🔴🔴🔴                     (Score: 30, 5 reviews)
Transistor                by 311                🔴🔴🔴🔴                      (Score: 30, 4 reviews)
Keepin' the Summer Alive  by The Beach Boys     🔴🔴🔴
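
Because the three-review threshold is a judgment call, it can help to make it a parameter and see how the ranking changes. The sketch below is one way to do that. The function name `lowest_rated` is an assumption, and the row titled "Hypothetical Album" is invented purely to show the effect of a single-review album; the other rows come from the output above.

```python
import pandas as pd

def lowest_rated(frame, min_reviews=3, n=10):
    """Hypothetical helper: the n lowest-scored albums with at least min_reviews reviews."""
    cols = ["AOTY Critic Score", "AOTY Critic Reviews"]
    numeric = frame.copy()
    # Coerce both columns to numbers; unparseable entries become NaN and are dropped
    numeric[cols] = numeric[cols].apply(pd.to_numeric, errors="coerce")
    kept = numeric.dropna(subset=cols)
    kept = kept[kept["AOTY Critic Reviews"] >= min_reviews]
    # Lowest score first; ties are broken by the higher review count
    return kept.sort_values(cols, ascending=[True, False]).head(n)

sample = pd.DataFrame({
    "Title": ["Dylan", "Hypothetical Album", "Playing with Fire", "Peter Criss"],
    "AOTY Critic Score": [22, 10, 18, 20],   # the 10 is an invented value
    "AOTY Critic Reviews": [4, 1, 5, 3],     # the 1 is an invented value
})

print(lowest_rated(sample)["Title"].tolist())
print(lowest_rated(sample, min_reviews=1)["Title"].tolist())
```

With the default threshold the single-review album is excluded. Lowering min_reviews to 1 puts it at the top of the list, which is the situation the filter is meant to avoid.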

The number of reviews is informative in its own right. Albums with many reviews have less subjective scores, and a high review count also indicates that an album attracted considerable attention from critics and the media.

In [12]:
print("Top 10 Albums with the Most AOTY Critic Reviews")
print("=" * 105)

# Convert to numeric and select top 10 by number of reviews
top = (
    df.assign(**{
        "AOTY Critic Reviews": pd.to_numeric(df["AOTY Critic Reviews"], errors="coerce"),
        "AOTY Critic Score": pd.to_numeric(df["AOTY Critic Score"], errors="coerce")
    })
    .dropna(subset=["AOTY Critic Reviews", "AOTY Critic Score"])
    .nlargest(10, "AOTY Critic Reviews")[["Title", "Artist", "AOTY Critic Reviews", "AOTY Critic Score"]]
)

# Visualization (1 circle per 4 reviews)
for _, r in top.iterrows():
    reviews = int(r["AOTY Critic Reviews"])
    score = round(r["AOTY Critic Score"], 1)
    bar = "⚪" * (reviews // 4)
    print(f'{r["Title"][:25]:25} by {r["Artist"][:18]:18} {bar:<25} ({reviews} reviews, Score: {score})')


Top 10 Albums with the Most AOTY Critic Reviews
Reflektor                 by Arcade Fire        ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪              (48 reviews, Score: 78)
Trouble Will Find Me      by The National       ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪               (44 reviews, Score: 84)
Random Access Memories    by Daft Punk          ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪               (44 reviews, Score: 83)
A Moon Shaped Pool        by Radiohead          ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪               (44 reviews, Score: 87)
LP1                       by FKA twigs          ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪                (43 reviews, Score: 86)
Everything Now            by Arcade Fire        ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪                (43 reviews, Score: 66)
American Dream            by LCD Soundsystem    ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪                (43 reviews, Score: 85)
Coexist                   by The xx             ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪                (42 reviews, Score: 78
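
The text bars in this section all follow the same pattern: one symbol per fixed number of reviews, padded to a constant width. A small helper makes that scale explicit. This is a sketch only; the name `text_bar` and its parameters are assumptions, and an asterisk is used as the default symbol to keep the example plain ASCII.

```python
def text_bar(count, per_symbol=1, symbol="*", width=25):
    """Hypothetical helper: one symbol per `per_symbol` reviews, left-aligned to `width`."""
    if per_symbol < 1:
        raise ValueError("per_symbol must be at least 1")
    # Integer division drops any remainder, so 43 reviews at 4 per symbol gives 10 symbols
    return (symbol * (count // per_symbol)).ljust(width)

print(repr(text_bar(7)))                  # one symbol per review
print(repr(text_bar(48, per_symbol=4)))   # one symbol per four reviews
```

Because of the integer division, albums with 40 to 43 reviews all get the same bar length at a scale of four, which matches the output above where 42 and 43 reviews both produce ten circles.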

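# 4. The Hard Way

This section recomputes the mean, median, and mode without pandas, using only the built-in csv module, to cross-check the results from Section 2.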

In [140]:
# Load the data
import csv

data = []

# Open the data and read critic scores manually
with open("album_ratings.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            score = float(row["AOTY Critic Score"])
            data.append(score)
        except (ValueError, KeyError):
            continue

# Calculate mean
mean_value = sum(data) / len(data)
mean_value = round(mean_value, 2)

# Calculate median
data.sort()
n = len(data)
if n % 2 == 0:
    median_value = (data[n // 2 - 1] + data[n // 2]) / 2
else:
    median_value = data[n // 2]
median_value = round(median_value, 2)

# Calculate mode
counts = {}
for num in data:
    counts[num] = counts.get(num, 0) + 1
max_count = max(counts.values())
modes = [k for k, v in counts.items() if v == max_count]
modes = [round(mode, 2) for mode in modes]

# Print results
print("Mean AOTY Critic Score:", mean_value)
print("Median AOTY Critic Score:", median_value)
if len(modes) == 1:
    print(f"Mode AOTY Critic Score: {modes[0]}")
else:
    print(f"Modes AOTY Critic Score: {', '.join(map(str, modes))}")


Mean AOTY Critic Score: 72.81
Median AOTY Critic Score: 74.0
Mode AOTY Critic Score: 80.0
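
The hand-written median and mode logic can be checked against Python's standard `statistics` module. The sketch below wraps the same logic in two functions and compares the results on the ten scores from the first rows of the dataset. The function names `manual_median` and `manual_modes` are assumptions made for illustration.

```python
import math
import statistics

def manual_median(values):
    """Median by sorting and picking the middle element(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

def manual_modes(values):
    """All values that share the highest frequency."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    top = max(counts.values())
    return [k for k, c in counts.items() if c == top]

# AOTY critic scores of the ten albums shown in Section 1
scores = [79.0, 73.0, 92.0, 70.0, 74.0, 74.0, 76.0, 68.0, 76.0, 73.0]

manual_mean = sum(scores) / len(scores)

# Compare each hand-written result with the standard library
print(math.isclose(manual_mean, statistics.mean(scores)))
print(manual_median(scores) == statistics.median(scores))
print(sorted(manual_modes(scores)) == sorted(statistics.multimode(scores)))
```

If all three comparisons print True, the manual implementation agrees with the standard library on this sample. Running the same comparison on the full list of scores read from the CSV file would be a natural next step.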
