### (a) Discussion
A normal distribution has support on the entire real line, so any fitted normal assigns some positive probability to negative durations, which are impossible. Song durations also tend to be right-skewed, whereas the normal distribution is symmetric about its mean. Both points suggest that the normal model is at best a rough approximation for this feature.
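
The two concerns above can be illustrated with a minimal sketch. It does not use the course dataset: the sample below is synthetic (a lognormal draw chosen only because it is strictly positive and right-skewed), and the names `sample`, `sample_skew`, and `p_negative` are hypothetical. It shows how sample skewness and the fitted normal's mass below zero could be checked.

```python
import numpy as np
from scipy.stats import norm, skew

# ASSUMPTION: synthetic stand-in for song durations (seconds), not the real data.
# A lognormal sample is strictly positive and right-skewed by construction.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=5.4, sigma=0.35, size=5000)

# Positive sample skewness indicates a longer right tail.
sample_skew = skew(sample)

# Fit a normal by plugging in the sample mean and standard deviation,
# then ask how much probability it places on impossible negative values.
mu, sigma = sample.mean(), sample.std(ddof=1)
p_negative = norm.cdf(0, loc=mu, scale=sigma)

print(f"Smallest observed value:      {sample.min():.2f}")
print(f"Sample skewness:              {sample_skew:.3f}")
print(f"P(X < 0) under fitted normal: {p_negative:.6f}")
```

Running the same two checks on `df["duration_s"]` would show whether the actual durations behave similarly.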

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import norm



In [2]:
# Load the dataset and extract the song durations (in seconds)
df = pd.read_csv("ds4420_spotify.csv")
durations = df["duration_s"]

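# Parameter estimates for Normal(µ, σ).
# The sample mean is the MLE of µ. Note that ddof=1 gives the sample
# standard deviation (n - 1 denominator); the strict MLE of σ uses ddof=0.
# The two are nearly identical for large n.
mu_hat = durations.mean()
sigma_hat = durations.std(ddof=1)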

print(f"Estimated mu (seconds):    {mu_hat:.2f}")
print(f"Estimated sigma (seconds): {sigma_hat:.2f}")

# Calculate probabilities
# (a) Probability duration > 4 minutes (240 seconds)
# norm.sf is the survival function, i.e. 1 - cdf
p_longer_than_4 = norm.sf(240, loc=mu_hat, scale=sigma_hat)

# (b) Probability duration is between 2 (120s) and 5 (300s) minutes
p_between_2_5 = norm.cdf(300, loc=mu_hat, scale=sigma_hat) - norm.cdf(120, loc=mu_hat, scale=sigma_hat)

# (c) Probability duration < 1 minute (60 seconds)
p_less_than_1 = norm.cdf(60, loc=mu_hat, scale=sigma_hat)

print(f"P(X > 4 min)         = {p_longer_than_4:.4f}")
print(f"P(2 min < X < 5 min) = {p_between_2_5:.4f}")
print(f"P(X < 1 min)         = {p_less_than_1:.4f}")

Estimated mu (seconds):    233.83
Estimated sigma (seconds): 87.81
P(X > 4 min)         = 0.4720
P(2 min < X < 5 min) = 0.6770
P(X < 1 min)         = 0.0239
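
As a rough follow-up to the discussion, the probability that the fitted model assigns to negative durations can be computed directly. This sketch is not part of the original analysis; it hard-codes the rounded estimates printed above rather than reloading the data, and `p_negative` is a name introduced here for illustration.

```python
from scipy.stats import norm

# Rounded estimates copied from the printed output above (seconds).
mu_hat, sigma_hat = 233.83, 87.81

# Probability mass the fitted normal places on impossible negative durations.
p_negative = norm.cdf(0, loc=mu_hat, scale=sigma_hat)

print(f"P(X < 0) = {p_negative:.4f}")
```

A small but nonzero value here is consistent with the concern raised in the discussion: the normal model leaks some probability below zero, although with the mean more than two standard deviations above zero the leakage is limited.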
