### (a) Discussion
The exponential distribution is memoryless: the probability that a song continues for another t seconds does not depend on how long it has already played. Its density is also highest at zero and decreases monotonically, so the model treats very short durations as the most likely ones. Neither property matches how songs are structured. In practice, song durations cluster around three to four minutes, with a practical minimum (a song needs at least an intro and a refrain) and a typical maximum, so an exponential model is unlikely to capture this distribution accurately.
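
The memoryless property can be checked numerically. The sketch below is illustrative only: `theta`, `s`, and `t` are hypothetical values chosen for the example, not estimates from the dataset. It compares the conditional probability P(X > s + t | X > s) with the unconditional P(X > t), which should agree for any exponential distribution.

```python
import numpy as np
from scipy.stats import expon

# Hypothetical values for illustration only (not taken from the dataset)
theta = 200.0        # mean duration in seconds
s, t = 120.0, 60.0   # time already elapsed, and additional time

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = expon.sf(s + t, scale=theta) / expon.sf(s, scale=theta)

# P(X > t), ignoring how long the song has already played
unconditional = expon.sf(t, scale=theta)

print(f"P(X > s+t | X > s): {conditional:.4f}")
print(f"P(X > t):           {unconditional:.4f}")
```

Under this model, a song that has already played for two minutes is exactly as likely to last another minute as a song that has just started, which is the behavior the discussion argues is unrealistic.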

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import expon



In [2]:
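df = pd.read_csv("ds4420_spotify.csv")
durations = df["duration_s"]  # song durations in seconds

# MLE of θ: for an exponential distribution with mean θ, the MLE is the sample mean
theta_hat = durations.mean()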
print(f"Estimated θ (seconds): {theta_hat:.2f}")

# Compute probabilities under the fitted exponential model
# (a) Probability the duration is longer than 4 minutes (240 seconds)
p_longer_4min = expon.sf(240, scale=theta_hat)  # survival function, equals 1 - CDF

# (b) Probability the duration is between 2 minutes (120s) and 5 minutes (300s)
p_between_2_5min = expon.cdf(300, scale=theta_hat) - expon.cdf(120, scale=theta_hat)

# (c) Probability the duration is less than 1 minute (60 seconds)
p_less_1min = expon.cdf(60, scale=theta_hat)

print(f"P(X > 4 min):          {p_longer_4min:.4f}")
print(f"P(2 min < X < 5 min):  {p_between_2_5min:.4f}")
print(f"P(X < 1 min):          {p_less_1min:.4f}")

Estimated θ (seconds): 233.83
P(X > 4 min):          0.3583
P(2 min < X < 5 min):  0.3214
P(X < 1 min):          0.2263
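
One way to test the discussion's claim is to compare each model probability with the observed fraction of songs in the same range. The sketch below is a suggestion rather than part of the original analysis: `model_vs_empirical` is a hypothetical helper, and the `synthetic` array is made-up stand-in data (roughly bell-shaped around 3.5 minutes) used only so the block runs without the CSV.

```python
import numpy as np
from scipy.stats import expon

def model_vs_empirical(durations):
    """Return {label: (exponential-model probability, observed fraction)}."""
    durations = np.asarray(durations, dtype=float)
    theta_hat = durations.mean()  # MLE of the exponential mean
    return {
        "P(X > 4 min)": (
            expon.sf(240, scale=theta_hat),
            np.mean(durations > 240),
        ),
        "P(2 min < X < 5 min)": (
            expon.cdf(300, scale=theta_hat) - expon.cdf(120, scale=theta_hat),
            np.mean((durations > 120) & (durations < 300)),
        ),
        "P(X < 1 min)": (
            expon.cdf(60, scale=theta_hat),
            np.mean(durations < 60),
        ),
    }

# Synthetic stand-in data, NOT the Spotify dataset: durations clustered near 210 s
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=210, scale=45, size=5000).clip(min=30)

results = model_vs_empirical(synthetic)
for label, (model_p, empirical_p) in results.items():
    print(f"{label:<22} model={model_p:.4f}  empirical={empirical_p:.4f}")
```

To run the comparison on the real data, pass `df["duration_s"]` instead of `synthetic`. A large gap between the two columns, especially for songs under one minute, would support the conclusion that the exponential model fits poorly.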
