## Confidence Interval for Goodreads Ratings

```python
# Confidence Interval for Goodreads Ratings
# Objective: Estimate the true average Goodreads rating for bestseller books, using our sample of 80 titles.

# Import required libraries
import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset of 80 bestsellers
df = pd.read_csv("data/book_covers_goodreads_after_eda.csv")

# Calculate the mean and standard error of the Goodreads ratings
mean_rating = np.mean(df['rating'])
sem_rating = stats.sem(df['rating'])

# Construct the 95% confidence interval for the average rating
confidence_interval = stats.t.interval(
    confidence=0.95,               # 95% confidence level
    df=len(df['rating']) - 1,      # Degrees of freedom
    loc=mean_rating,               # Sample mean
    scale=sem_rating               # Standard error
)

# Print the results
print("Mean Goodreads Rating (bestsellers):", round(mean_rating, 2))
print("95% Confidence Interval for Bestseller Ratings:", confidence_interval)

# Check if a rating of 4.0 is within the interval
benchmark = 4.0
is_in_interval = confidence_interval[0] <= benchmark <= confidence_interval[1]
print(f"Is {benchmark} within the confidence interval?", is_in_interval)
```

```
Mean Goodreads Rating (bestsellers): 4.01
95% Confidence Interval for Bestseller Ratings: (np.float64(3.94231367833146), np.float64(4.08293632166854))
Is 4.0 within the confidence interval? True
```


Confidence Interval Interpretation
The 95% confidence interval for the average Goodreads rating of our 80 curated bestsellers is approximately [3.94, 4.08]. Since the benchmark value of 4.0 falls within this range, the data are consistent with a true average rating of about 4.0 for similar bestsellers.
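Checking whether 4.0 falls inside the 95% interval is equivalent to a two-sided one-sample t-test of H0: mean = 4.0 at the 5% level: the benchmark lies inside the interval exactly when that test's p-value is at least 0.05. The sketch below makes this duality explicit. It assumes a 1-D array or Series of ratings, and `benchmark_check` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
from scipy import stats

def benchmark_check(ratings, benchmark=4.0, confidence=0.95):
    """Hypothetical helper: t-interval, one-sample t-test p-value, and whether they agree."""
    ratings = np.asarray(ratings, dtype=float)
    n = ratings.size
    # Same t-interval as in the notebook cell (first positional arg is the confidence level)
    ci = stats.t.interval(confidence, n - 1,
                          loc=ratings.mean(), scale=stats.sem(ratings))
    # Two-sided one-sample t-test of H0: population mean == benchmark
    p_value = stats.ttest_1samp(ratings, popmean=benchmark).pvalue
    inside = bool(ci[0] <= benchmark <= ci[1])
    # Duality: benchmark inside the (1 - alpha) interval <=> p_value >= alpha
    agrees = inside == (p_value >= 1 - confidence)
    return ci, p_value, inside, agrees

# Usage with the notebook's data (assumes df is already loaded):
# ci, p, inside, agrees = benchmark_check(df['rating'], benchmark=4.0)
```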

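This does not mean that 95% of other samples would have an average inside this particular interval. The 95% describes the procedure: if we drew many samples of bestsellers and built an interval from each one in the same way, about 95% of those intervals would contain the true average rating. Generalizing beyond our 80 books also assumes they are representative of bestsellers more broadly, which a curated sample may not be.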
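The repeated-sampling interpretation can be checked by simulation. This sketch draws many samples from a hypothetical normal population (mean 4.0, standard deviation 0.3, chosen only for illustration and not estimated from our data), builds a 95% t-interval from each sample, and reports the fraction of intervals that contain the true mean.

```python
import numpy as np
from scipy import stats

def t_interval_coverage(true_mean=4.0, sd=0.3, n=80, trials=2000, seed=0):
    """Fraction of simulated 95% t-intervals that contain the true mean.

    The population is hypothetical: normal with the given mean and sd.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        sample = rng.normal(true_mean, sd, size=n)
        low, high = stats.t.interval(0.95, n - 1,
                                     loc=sample.mean(), scale=stats.sem(sample))
        hits += int(low <= true_mean <= high)
    return hits / trials

print(t_interval_coverage())  # should be close to 0.95
```

The intervals move from sample to sample while the true mean stays fixed; the 95% is a property of how often the moving intervals catch it.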

Conclusion: In our sample, bestselling books have an average rating close to 4.0, estimated within about ±0.07, which makes 4.0 a reasonable benchmark for perceived quality in this type of title.

## Confidence Intervals by Main Genre

```python
# Confidence Intervals by Main Genre
# Objective: Compare average Goodreads ratings for the top 4 most common genres
# We'll calculate 95% confidence intervals for each genre separately

# List of genres to analyze
top_genres = ['Romance', 'Fantasy', 'Contemporary', 'Nonfiction']

# Loop through each genre and compute confidence interval
for genre in top_genres:

    # Filter dataset to include only books of the current genre
    genre_df = df[df['main_genre'] == genre]

    # Calculate mean and standard error of the rating for this genre
    mean_rating = np.mean(genre_df['rating'])
    sem_rating = stats.sem(genre_df['rating'])  # Standard error of the mean

    # Compute 95% confidence interval using t-distribution
    ci = stats.t.interval(
        confidence=0.95,                   # Confidence level
        df=len(genre_df['rating']) - 1,    # Degrees of freedom (n - 1)
        loc=mean_rating,                   # Sample mean
        scale=sem_rating                   # Standard error
    )

    # Print the result
    print(f"\n{genre} — Mean Rating: {round(mean_rating, 2)}")
    print(f"95% Confidence Interval: {ci}")
```

```
Romance — Mean Rating: 4.03
95% Confidence Interval: (np.float64(3.8778134652992824), np.float64(4.1885023241744))

Fantasy — Mean Rating: 4.05
95% Confidence Interval: (np.float64(3.816550291928412), np.float64(4.275214413953941))

Contemporary — Mean Rating: 3.85
95% Confidence Interval: (np.float64(3.7354894178840508), np.float64(3.965760582115949))

Nonfiction — Mean Rating: 4.12
95% Confidence Interval: (np.float64(4.0152468049515555), np.float64(4.227419861715111))
```


We computed 95% confidence intervals for the average Goodreads rating of the top 4 genres in our dataset: Romance, Fantasy, Contemporary, and Nonfiction.

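Nonfiction has the highest mean rating (4.12), followed by Fantasy (4.05) and Romance (4.03), while Contemporary is lowest (3.85).

However, the intervals for Romance, Fantasy, and Nonfiction overlap substantially, so the data cannot distinguish between these three genres.

Contemporary is the only genre whose interval (about 3.74 to 3.97) fails to overlap another genre's: it sits entirely below Nonfiction's (about 4.02 to 4.23), which suggests Contemporary titles are rated lower than Nonfiction ones. Its interval does overlap those of Romance and Fantasy, and each genre has only 15 to 19 books, so more data would be needed to confirm the gap.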
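Overlap between two 95% intervals is only an informal check: non-overlapping intervals generally indicate a significant difference, but overlapping ones do not rule one out. A more direct approach is a pairwise test. The sketch below, a hypothetical helper not used in the original analysis, runs Welch's t-test (which does not assume equal variances) for every pair of genres and applies a Bonferroni adjustment, which for four genres covers six comparisons.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

def pairwise_welch(frame, groups, group_col="main_genre", value_col="rating"):
    """Hypothetical helper: Welch t-test for every pair of groups, Bonferroni-adjusted."""
    pairs = list(combinations(groups, 2))
    rows = []
    for a, b in pairs:
        x = frame.loc[frame[group_col] == a, value_col]
        y = frame.loc[frame[group_col] == b, value_col]
        # equal_var=False selects Welch's t-test instead of Student's
        res = stats.ttest_ind(x, y, equal_var=False)
        rows.append({
            "pair": f"{a} vs {b}",
            "mean_diff": x.mean() - y.mean(),
            "p_value": res.pvalue,
            # Bonferroni: multiply by the number of comparisons, capped at 1
            "p_bonferroni": min(res.pvalue * len(pairs), 1.0),
        })
    return pd.DataFrame(rows)

# Usage with the notebook's data (assumes df is already loaded):
# print(pairwise_welch(df, ['Romance', 'Fantasy', 'Contemporary', 'Nonfiction']))
```

Bonferroni is the simplest correction and tends to be conservative; with small groups like these, a non-significant adjusted p-value mainly reflects limited data.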

## Hypothesis 1: Books with a face-out display (Promoted = 1) tend to be more recently published than those shelved spine-out

```python
# H0 (Null Hypothesis): There is no difference in publication year between promoted and non-promoted books
# H1 (Alternative Hypothesis): Promoted books tend to be more recently published
# Note: H1 is directional, but the test below is two-sided (the more conservative choice);
# alternative='greater' would match H1 exactly.

# Convert publication date to numeric year
df["published_year"] = pd.to_datetime(df["first_published_date"]).dt.year

# Create two groups: promoted books vs non-promoted
promoted_years = df[df["promoted"] == 1]["published_year"]
non_promoted_years = df[df["promoted"] == 0]["published_year"]

# Calculate the median publication year for each group
median_promoted = promoted_years.median()
median_non_promoted = non_promoted_years.median()

# Display the medians
print("Median publication year (Promoted books):", int(median_promoted))
print("Median publication year (Non-promoted books):", int(median_non_promoted))

# Mann-Whitney U test: rank-based, non-parametric comparison of the two groups
u_stat, p_value = stats.mannwhitneyu(promoted_years, non_promoted_years, alternative='two-sided')

# Print the test results
print("U-statistic:", round(u_stat, 2))
print("P-value:", round(p_value, 4))

# Interpret the result
alpha = 0.05  # 5% significance level
if p_value < alpha:
    print("We reject the null hypothesis: publication years differ between promoted and non-promoted books.")
else:
    print("We fail to reject the null hypothesis: no significant difference in median publication year.")
```

```
Median publication year (Promoted books): 2023
Median publication year (Non-promoted books): 2023
U-statistic: 768.0
P-value: 0.8175
We fail to reject the null hypothesis: no significant difference in median publication year.
```


Although we expected face-out books to be more recent, the test showed no significant difference in publication year between promoted and non-promoted books: both groups have a median year of 2023.
The p-value was 0.8175, far above the 0.05 threshold, so we fail to reject the null hypothesis. A p-value is the probability of seeing a difference at least as large as the one observed if the null hypothesis were true. A value this high means our data are entirely consistent with no difference, although failing to reject H0 does not prove that the two groups are equally recent. Because H1 is directional, a one-sided test would match it more closely, but its p-value could be no smaller than half the two-sided one (about 0.41), so the conclusion would not change.
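The definition of a p-value can be made concrete with a permutation test: if H0 is true, the promoted/non-promoted labels are interchangeable, so we shuffle them many times and count how often the shuffled difference is at least as extreme as the observed one. This sketch uses the difference in medians as its statistic, so it is a plain permutation test rather than the Mann-Whitney U test used above; `permutation_p_value` is a hypothetical helper.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=5000, alternative="two-sided", seed=0):
    """Hypothetical helper: permutation p-value for a difference in medians.

    Under H0 the group labels are exchangeable, so shuffling the pooled values
    shows how large a difference arises by chance alone.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = np.median(x) - np.median(y)
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = np.median(shuffled[:x.size]) - np.median(shuffled[x.size:])
        if alternative == "greater":      # H1: x tends to be larger
            extreme += int(diff >= observed)
        elif alternative == "less":       # H1: x tends to be smaller
            extreme += int(diff <= observed)
        else:                             # two-sided: either direction
            extreme += int(abs(diff) >= abs(observed))
    # +1 in numerator and denominator counts the observed arrangement itself
    return (extreme + 1) / (n_perm + 1)

# Usage with the notebook's data:
# permutation_p_value(promoted_years, non_promoted_years, alternative="greater")
```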

Why did we use the Mann-Whitney U test?
We used this test instead of a standard t-test because publication year is not normally distributed: a few books are much older than the rest, so a rank-based test is more robust than one built on means. Strictly speaking, Mann-Whitney tests whether values in one group tend to be larger than in the other; it can be read as a comparison of medians only when the two distributions have a similar shape.
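The U statistic has a direct reading: it counts, over all (promoted, non-promoted) pairs, how often the promoted book is more recent, with ties counting one half. Dividing by n1 × n2 gives the estimated probability that a randomly chosen promoted book is more recent than a randomly chosen non-promoted one (the notebook does not print the group sizes, so this cannot be computed from U = 768 here). The sketch assumes SciPy 1.7 or later, where `mannwhitneyu` reports U for the first sample; the helper names and toy years are hypothetical.

```python
import numpy as np
from scipy import stats

def u_by_pair_counting(x, y):
    """Mann-Whitney U for sample x: pairs where x is larger, plus half of the ties."""
    x = np.asarray(x, dtype=float)[:, None]   # column vector
    y = np.asarray(y, dtype=float)[None, :]   # row vector; broadcasting forms all pairs
    return float((x > y).sum() + 0.5 * (x == y).sum())

def prob_superiority(x, y):
    """Estimated P(X > Y) + 0.5 * P(X == Y), i.e. U / (n_x * n_y)."""
    return u_by_pair_counting(x, y) / (len(x) * len(y))

# Cross-check on toy years (hypothetical values, not the notebook's data)
toy_promoted = [2018, 2020, 2021, 2023, 2023, 2024]
toy_other = [2015, 2019, 2022, 2023, 2024]
scipy_u = stats.mannwhitneyu(toy_promoted, toy_other, alternative="two-sided").statistic
print(u_by_pair_counting(toy_promoted, toy_other), scipy_u)  # both should be 16.5
```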

## Hypothesis 2: Do books with award / recognition tend to have shorter titles?

```python
# H0 (Null Hypothesis): There is no difference in title length between books with and without award / recognition
# H1 (Alternative Hypothesis): Books with award / recognition have shorter titles (fewer words)

# Rationale: short, catchy titles may suit books already boosted by an award or bestseller badge,
# since the title doesn't need to work as hard to attract readers.

# Create two groups based on award recognition
award_titles = df[df["award_recognition"] == 1]["title_word_count"]
no_award_titles = df[df["award_recognition"] == 0]["title_word_count"]

# Calculate medians
median_award = award_titles.median()
median_no_award = no_award_titles.median()

# Display medians
print("Median title word count (Award books):", median_award)
print("Median title word count (No award):", median_no_award)

# One-sided Mann-Whitney U test: do award titles tend to be shorter? (award group passed first)
u_stat, p_value = stats.mannwhitneyu(award_titles, no_award_titles, alternative='less')

# Show results
print("U-statistic:", round(u_stat, 2))
print("P-value:", round(p_value, 4))

# Interpret result
alpha = 0.05
if p_value < alpha:
    print("We reject the null hypothesis: books with award recognition tend to have shorter titles.")
else:
    print("We fail to reject the null hypothesis: no significant difference in title length.")
```

```
Median title word count (Award books): 3.0
Median title word count (No award): 2.5
U-statistic: 723.5
P-value: 0.396
We fail to reject the null hypothesis: no significant difference in title length.
```


We tested whether books with award / recognition tend to have shorter titles. Although the hypothesis made intuitive sense from a design and marketing perspective, the data did not support it. The median title length for award-recognized books (3.0 words) was actually slightly higher than for books without recognition (2.5 words), and the one-sided Mann-Whitney U test gave a p-value of 0.396, well above 0.05.
We fail to reject the null hypothesis: we found no evidence that books with awards have shorter titles than books without them.

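We used the Mann-Whitney U test again because we were comparing two independent groups and title word count is a small, discrete count that is not normally distributed. The test was one-sided (alternative='less', with the award group passed first), so it only looks for evidence that award titles are shorter.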
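The direction of a one-sided test depends on argument order: `alternative='less'` asks whether the first sample tends to take smaller values than the second. The toy word counts below are hypothetical and chosen so that the first group is clearly shorter; `one_sided_pvalues` is an illustrative helper, not part of the notebook.

```python
from scipy import stats

def one_sided_pvalues(x, y):
    """p-values for H1: x tends to be smaller ('less') and larger ('greater') than y."""
    p_less = stats.mannwhitneyu(x, y, alternative="less").pvalue
    p_greater = stats.mannwhitneyu(x, y, alternative="greater").pvalue
    return p_less, p_greater

# Hypothetical title word counts: the first group is clearly shorter
short_titles = [1, 1, 2, 2, 2, 3]
long_titles = [3, 4, 4, 5, 5, 6]
p_less, p_greater = one_sided_pvalues(short_titles, long_titles)
print(round(p_less, 4), round(p_greater, 4))
```

Here the 'less' p-value is small while the 'greater' one is close to 1; swapping the argument order would swap the two.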

## Hypothesis 3: Does visual style depend on book genre?

```python
# H0 (Null Hypothesis): Visual style is independent of main genre
# H1 (Alternative Hypothesis): Visual style and genre are associated, certain genres use certain styles more often

# Create a contingency table: rows = genres, columns = visual styles
contingency_table = pd.crosstab(df['main_genre'], df['visual_style'])

# Display the table to check
print("Contingency Table:")
print(contingency_table)

# Perform Chi-squared test of independence
# (`expected` holds the expected counts under H0; the chi-squared approximation
#  is usually trusted only when most of them are at least 5)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(contingency_table)

# Show results
print("\nChi-squared statistic:", round(chi2_stat, 2))
print("Degrees of freedom:", dof)
print("P-value:", round(p_value, 4))

# Interpret the result
alpha = 0.05
if p_value < alpha:
    print("We reject the null hypothesis: visual style depends on the book's genre.")
else:
    print("We fail to reject the null hypothesis: no significant association between visual style and genre.")
```

```
Contingency Table:
visual_style     illustration  photo  symbolic  typographic
main_genre                                                 
Classics                    2      1         0            0
Contemporary                9      3         4            0
Fantasy                    16      0         1            0
Nonfiction                  4      1         2            8
Other                       1      0         0            0
Romance                    16      1         2            0
Science Fiction             1      0         1            1
Thriller                    5      0         1            0

Chi-squared statistic: 50.71
Degrees of freedom: 21
P-value: 0.0003
We reject the null hypothesis: visual style depends on the book's genre.
```


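We used a Chi-squared test of independence because both variables, main_genre and visual_style, are categorical. This test checks whether the distribution of visual styles differs across genres. Its p-value relies on a large-sample approximation that is usually trusted only when most expected cell counts are at least 5; in this table 28 of the 32 cells fall below 5, largely because several genres (Classics, Other, Science Fiction) have only a handful of books.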
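To see how well the chi-squared approximation fits this table, we can re-run `chi2_contingency` on the counts from the contingency table printed above and inspect the expected frequencies it returns. This sketch hard-codes those observed counts, so it should reproduce the reported statistic (50.71) and degrees of freedom (21).

```python
import numpy as np
from scipy import stats

# Observed counts copied from the contingency table printed above
# (columns: illustration, photo, symbolic, typographic)
observed = np.array([
    [2, 1, 0, 0],    # Classics
    [9, 3, 4, 0],    # Contemporary
    [16, 0, 1, 0],   # Fantasy
    [4, 1, 2, 8],    # Nonfiction
    [1, 0, 0, 0],    # Other
    [16, 1, 2, 0],   # Romance
    [1, 0, 1, 1],    # Science Fiction
    [5, 0, 1, 0],    # Thriller
])

# No Yates correction is applied here because dof > 1
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Rule of thumb for the approximation: expected counts of at least 5
small_cells = int((expected < 5).sum())
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}")
print(f"{small_cells} of {expected.size} cells have expected count < 5")
```

Only the illustration cells for Contemporary, Fantasy, Nonfiction, and Romance reach an expected count of 5, so the reported p-value is best read as approximate.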

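With a p-value of 0.0003 (below our 0.05 threshold), we reject the null hypothesis.
This means that visual style is not distributed the same way across genres: the two variables are associated, although with so many small expected counts the exact p-value should be read as approximate.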

For example, Fantasy (16 of 17 books) and Romance (16 of 19) rely heavily on illustrated covers, while Nonfiction accounts for 8 of the 9 typographic covers in the sample. This confirms patterns already seen in our EDA, now supported by a formal test.
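Because many expected counts in this table are small, a Monte Carlo permutation test is a safer way to obtain a p-value: shuffle the visual-style labels across books (which keeps both sets of totals fixed), recompute the chi-squared statistic each time, and count how often it reaches the observed value. The sketch also computes Cramér's V, an effect size on a 0 to 1 scale. Both helpers are hypothetical and were not part of the original analysis; they assume two equal-length sequences of labels such as `df['main_genre']` and `df['visual_style']`.

```python
import numpy as np
from scipy import stats

def _contingency(row_codes, col_codes, shape):
    """Count table built from integer-coded labels."""
    table = np.zeros(shape, dtype=int)
    np.add.at(table, (row_codes, col_codes), 1)
    return table

def chi2_permutation_test(rows, cols, n_perm=2000, seed=0):
    """Hypothetical helper: Monte Carlo p-value for the chi-squared statistic."""
    rng = np.random.default_rng(seed)
    row_levels, row_codes = np.unique(np.asarray(rows), return_inverse=True)
    col_levels, col_codes = np.unique(np.asarray(cols), return_inverse=True)
    shape = (row_levels.size, col_levels.size)
    observed = stats.chi2_contingency(_contingency(row_codes, col_codes, shape),
                                      correction=False)[0]
    extreme = 0
    for _ in range(n_perm):
        # Shuffling one label column preserves both row and column totals
        shuffled = rng.permutation(col_codes)
        stat = stats.chi2_contingency(_contingency(row_codes, shuffled, shape),
                                      correction=False)[0]
        extreme += int(stat >= observed)
    return observed, (extreme + 1) / (n_perm + 1)

def cramers_v(rows, cols):
    """Hypothetical helper: Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    row_levels, row_codes = np.unique(np.asarray(rows), return_inverse=True)
    col_levels, col_codes = np.unique(np.asarray(cols), return_inverse=True)
    table = _contingency(row_codes, col_codes, (row_levels.size, col_levels.size))
    chi2_stat = stats.chi2_contingency(table, correction=False)[0]
    k = min(table.shape) - 1
    return float(np.sqrt(chi2_stat / (table.sum() * k)))

# Usage with the notebook's data (assumes df is already loaded):
# stat, p = chi2_permutation_test(df['main_genre'], df['visual_style'])
# v = cramers_v(df['main_genre'], df['visual_style'])
```

Plugging the reported statistic into the Cramér's V formula gives V = sqrt(50.71 / (80 × 3)) ≈ 0.46, where 3 is the smaller table dimension (4 styles) minus one.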