# Visualisations - Romance Market Analysis

**Author**: Lucie Dou  
**Date**: February 2025  
**Goal**: Create professional visualizations that illustrate the insights from the analysis.

---

## Contents
1. Loading data
2. Style configuration
3. Chart 1: Distribution of subgenres
4. Chart 2: Average rating by subgenre
5. Chart 3: Engagement by subgenre
6. Chart 4: Quality vs Popularity
7. Chart 5: Rating distribution (boxplot)

## 1. Loading data

We load the romance dataset, already enriched with boolean subgenre indicators (the `is_*` columns).

In [15]:
import pandas as pd
import plotly.express as px

DATA_PATH = "../data/processed/romance_with_subgenres.csv"

df = pd.read_csv(
    DATA_PATH,
    sep=";",
    encoding="latin-1"
)

print(f"Dataset loaded: {len(df):,} books")
df.head()

Dataset loaded: 1,566 books


| Unnamed: 0 | Book Id | Title | Author | average_rating | isbn | isbn13 | language_code | num_pages | ratings_count | text_reviews_count | publication_date | publisher | genres | is_contemporary | is_historical | is_paranormal | is_erotic | is_suspense | is_fantasy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 57 | A Changeling for All Seasons (Changeling Seaso... | Angela Knight/Sahara Kelly/Judy Mays/Marteeka ... | 3.76 | 1595962808 | 9,7816E+12 | eng | 304 | 167 | 4 | 11/1/2005 | Changeling Press | Romance;Fantasy,Paranormal;Anthologies;Adult F... | False | False | True | True | False | False |
| 1 | 59 | The Changeling Sea | Patricia A, McKillip | 4.06 | 141312629 | 9,78014E+12 | eng | 137 | 4454 | 302 | 4/14/2003 | Firebird | Fantasy;Young Adult;Romance;Fiction;Fantasy,Ma... | False | False | False | False | False | False |
| 2 | 66 | The Changeling (Daughters of England #15) | Philippa Carr | 3.98 | 449146979 | 9,78045E+12 | eng | 369 | 345 | 12 | 8/28/1990 | Ivy Books | Historical,Historical Fiction;Romance;Fiction;... | False | True | False | False | False | False |
| 3 | 151 | Anna Karenina | Leo Tolstoy/Richard Pevear/Larissa Volokhonsky | 4.05 | 143035002 | 9,78014E+12 | eng | 838 | 16643 | 1851 | 5/31/2004 | Penguin Classics | Classics;Fiction;Romance;Cultural,Russia;Histo... | False | False | False | False | False | False |
| 4 | 152 | Anna Karenina | Leo Tolstoy/David Magarshack/Priscilla Meyer | 4.05 | 451528611 | 9,78045E+12 | eng | 960 | 109420 | 5696 | 11/5/2002 | Signet | Classics;Fiction;Romance;Cultural,Russia;Histo... | False | False | False | False | False | False |
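
Every chart below uses the `is_*` columns as boolean masks (`df.loc[df[col], ...]`), which only works if pandas parsed them as real booleans. A minimal sanity check could look like the sketch below. It runs on a hypothetical two-row extract (`csv_text`, "Book A" and "Book B" are made up for illustration) that assumes the same `;` separator as the real file.

```python
import io

import pandas as pd

# Hypothetical two-row extract using the same ';' separator as the notebook.
csv_text = (
    "Title;average_rating;ratings_count;is_paranormal;is_fantasy\n"
    "Book A;3.76;167;True;False\n"
    "Book B;4.06;4454;False;False\n"
)
sample = pd.read_csv(io.StringIO(csv_text), sep=";")

flag_cols = [c for c in sample.columns if c.startswith("is_")]

# Flags must be real booleans to be usable as masks in df.loc[df[col], ...].
non_bool = [c for c in flag_cols if not pd.api.types.is_bool_dtype(sample[c])]

# Count missing values in the two metrics plotted later.
missing = sample[["average_rating", "ratings_count"]].isna().sum()

print("Non-boolean flag columns:", non_bool)
print(missing)
```

Running the same two checks on `df` right after loading would reveal a flag column read as text or a metric with gaps before any chart is drawn.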


## 2. Style configuration

In [16]:
px.defaults.template = "plotly_white"
px.defaults.width = 800
px.defaults.height = 450

## 3. Chart 1: Distribution of subgenres

**Goal**: Visualize the weight of each subgenre.

*Which subgenres dominate the market?*

In [17]:
subgenre_cols = [col for col in df.columns if col.startswith("is_")]

subgenre_counts = {
    col.replace("is_", "").capitalize(): df[col].sum()
    for col in subgenre_cols
}

df_subgenre_dist = (
    pd.DataFrame.from_dict(
        subgenre_counts, orient="index", columns=["Books"]
    )
    .reset_index()
    .rename(columns={"index": "Subgenre"})
    .sort_values("Books", ascending=False)
)

fig = px.bar(
    df_subgenre_dist,
    x="Subgenre",
    y="Books",
    title="Distribution of Romance Subgenres",
    text="Books"
)

fig.update_traces(textposition="outside")
fig.show()

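Contemporary Romance is clearly the largest subgenre in the dataset, consistent with its position as the mass-market backbone of the romance genre. The other subgenres appear in significantly smaller proportions, suggesting more specialized or niche readerships. Because the `is_*` flags are not mutually exclusive (the first book in the preview above is tagged both Paranormal and Erotic), a single book can be counted in several bars.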
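
Raw counts are easier to compare once they are expressed as a share of the catalogue, and it helps to know how many books carry no flag or several flags. The sketch below illustrates one way to compute both. It uses a toy frame (`sample`, with made-up values) that only assumes the same `is_*` column layout as `df`.

```python
import pandas as pd

# Toy stand-in for the notebook's df (hypothetical values, same is_* layout).
sample = pd.DataFrame({
    "is_contemporary": [True, True, False, True, False],
    "is_paranormal": [False, True, True, False, False],
    "is_fantasy": [False, False, True, False, False],
})

flag_cols = [c for c in sample.columns if c.startswith("is_")]

# The mean of a boolean column is the fraction of rows flagged True.
share_pct = (sample[flag_cols].mean() * 100).round(1)
share_pct.index = (
    share_pct.index.str.replace("is_", "", regex=False).str.capitalize()
)

# Books with no subgenre flag at all, and books with more than one.
flags_per_book = sample[flag_cols].sum(axis=1)
n_unlabelled = int((flags_per_book == 0).sum())
n_multi = int((flags_per_book > 1).sum())

print(share_pct)
print(f"Unlabelled: {n_unlabelled}, multi-labelled: {n_multi}")
```

Since a book can belong to several subgenres, these shares are not expected to sum to 100%.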

## 4. Chart 2: Average rating by subgenre


In [18]:
avg_rating = []

for col in subgenre_cols:
    sub = col.replace("is_", "").capitalize()
    avg_rating.append({
        "Subgenre": sub,
        "Average rating": df.loc[df[col], "average_rating"].mean()
    })

df_avg_rating = (
    pd.DataFrame(avg_rating)
    .sort_values("Average rating", ascending=False)
)

fig = px.bar(
    df_avg_rating,
    x="Subgenre",
    y="Average rating",
    title="Average Rating by Subgenre",
    text=df_avg_rating["Average rating"].round(2)
)

fig.update_traces(textposition="outside")
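# Zoom the y-axis so small differences are visible.
# Note: the bars no longer start at 0, which visually exaggerates the gaps.
fig.update_yaxes(range=[3.5, 4.1])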
fig.show()

Average ratings remain relatively high across all subgenres, with Fantasy and Paranormal Romance standing out slightly. This indicates generally strong reader satisfaction. It also hints that niche subgenres are rated somewhat more favourably, although a mean alone does not show how consistent those ratings are (the boxplot in section 7 addresses this).
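
A mean is easier to interpret next to the number of books behind it and the spread around it. As a rough sketch, the loop used for the chart can be extended to collect all three. The `sample` frame and its values are hypothetical and only mirror the `average_rating` and `is_*` columns of `df`.

```python
import pandas as pd

# Hypothetical ratings; only the column layout matches the notebook's df.
sample = pd.DataFrame({
    "average_rating": [3.8, 4.0, 4.2, 3.6, 4.4],
    "is_contemporary": [True, True, False, True, False],
    "is_fantasy": [False, False, True, False, True],
})

flag_cols = [c for c in sample.columns if c.startswith("is_")]

rows = []
for col in flag_cols:
    ratings = sample.loc[sample[col], "average_rating"]
    rows.append({
        "Subgenre": col.replace("is_", "").capitalize(),
        "Books": int(ratings.count()),  # sample size behind the mean
        "Mean": ratings.mean(),
        "Std": ratings.std(),           # sample standard deviation
    })

rating_summary = pd.DataFrame(rows).set_index("Subgenre").round(2)
print(rating_summary)
```

A small `Books` value next to a high `Mean` is a signal to read that bar with caution.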

## 5. Chart 3: Engagement by subgenre


In [19]:
avg_engagement = []

for col in subgenre_cols:
    sub = col.replace("is_", "").capitalize()
    avg_engagement.append({
        "Subgenre": sub,
        "Average engagement": df.loc[df[col], "ratings_count"].mean()
    })

df_avg_engagement = (
    pd.DataFrame(avg_engagement)
    .sort_values("Average engagement", ascending=False)
)

fig = px.bar(
    df_avg_engagement,
    x="Subgenre",
    y="Average engagement",
    title="Average Reader Engagement by Subgenre",
    text=df_avg_engagement["Average engagement"].round(0)
)

fig.update_traces(textposition="outside")
fig.show()

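Paranormal Romance has by far the highest average number of ratings per book, despite not being the most published subgenre. This points to a particularly active and invested audience, whereas more marginal subgenres show lower but still meaningful engagement levels. Engagement is measured here as the mean of `ratings_count`, so a handful of very popular titles can pull a subgenre's average up.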
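
In the data preview, `ratings_count` ranges from 167 to 109,420, so the mean can be dominated by a few bestsellers. One way to check how much this matters is to compare the mean with the median for each subgenre. The sketch below does this on a toy frame (`sample`, values invented for illustration).

```python
import pandas as pd

# Hypothetical counts with one bestseller-sized outlier in Paranormal.
sample = pd.DataFrame({
    "ratings_count": [150, 300, 250, 100_000, 400],
    "is_paranormal": [True, True, False, True, False],
    "is_historical": [False, False, True, False, True],
})

flag_cols = [c for c in sample.columns if c.startswith("is_")]

rows = []
for col in flag_cols:
    counts = sample.loc[sample[col], "ratings_count"]
    rows.append({
        "Subgenre": col.replace("is_", "").capitalize(),
        "Mean engagement": counts.mean(),
        "Median engagement": counts.median(),
    })

engagement_summary = pd.DataFrame(rows).set_index("Subgenre")
print(engagement_summary.round(0))
```

A large gap between the two columns suggests that the average is driven by a few titles rather than by the typical book in that subgenre.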

## 6. Chart 4: Quality vs Popularity


In [20]:
df_quality_pop = pd.merge(
    df_avg_rating,
    df_avg_engagement,
    on="Subgenre"
)

fig = px.scatter(
    df_quality_pop,
    x="Average engagement",
    y="Average rating",
    text="Subgenre",
    size="Average engagement",
    title="Quality vs Popularity by Subgenre"
)

fig.update_traces(textposition="top center")
fig.update_yaxes(range=[3.5, 4.1])
fig.show()

This comparison reveals a clear distinction between mass-market and niche subgenres. Paranormal Romance combines both high engagement and strong ratings, while Fantasy Romance appears as a high-quality niche with limited reach. Contemporary Romance, although highly popular, shows slightly lower average ratings.
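
The mass-market versus niche reading depends on how many books each subgenre contains, which the scatter plot does not encode: marker size repeats the x-axis value. A possible variation is to merge the book counts into the same table so they can drive the marker size. The three small frames below are placeholders with invented numbers, standing in for `df_subgenre_dist`, `df_avg_rating` and `df_avg_engagement`.

```python
import pandas as pd

# Placeholder numbers for illustration only; not results from the dataset.
counts = pd.DataFrame({
    "Subgenre": ["Contemporary", "Paranormal", "Fantasy"],
    "Books": [50, 12, 6],
})
ratings = pd.DataFrame({
    "Subgenre": ["Fantasy", "Paranormal", "Contemporary"],
    "Average rating": [4.0, 3.9, 3.8],
})
engagement = pd.DataFrame({
    "Subgenre": ["Paranormal", "Contemporary", "Fantasy"],
    "Average engagement": [9000.0, 4000.0, 1500.0],
})

# validate="one_to_one" raises if a subgenre appears twice in either table.
quality_pop = (
    counts
    .merge(ratings, on="Subgenre", validate="one_to_one")
    .merge(engagement, on="Subgenre", validate="one_to_one")
)
print(quality_pop)
```

With such a table, passing `size="Books"` to `px.scatter` would let the marker area show catalogue size while the axes keep showing rating and engagement.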

## 7. Chart 5: Rating distribution (boxplot)


In [21]:
df_box = []

for col in subgenre_cols:
    sub = col.replace("is_", "").capitalize()
    temp = df.loc[df[col], ["average_rating"]].copy()
    temp["Subgenre"] = sub
    df_box.append(temp)

df_boxplot = pd.concat(df_box)

fig = px.box(
    df_boxplot,
    x="Subgenre",
    y="average_rating",
    title="Rating Distribution by Subgenre"
)

fig.show()

Rating distributions show varying degrees of dispersion between subgenres. Some, such as Contemporary Romance, display wider variability in reader ratings, while others exhibit more concentrated distributions, suggesting more homogeneous reader expectations and reception.
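
The dispersion read off the boxplot can also be put into numbers. A minimal sketch, assuming the same long-format construction as the cell above, computes the median and the interquartile range (IQR, the height of each box) per subgenre. The `sample` values are hypothetical.

```python
import pandas as pd

# Hypothetical ratings: a wide Contemporary group and a tight Fantasy group.
sample = pd.DataFrame({
    "average_rating": [3.0, 3.5, 4.0, 4.5, 5.0, 3.9, 4.0, 4.1],
    "is_contemporary": [True] * 5 + [False] * 3,
    "is_fantasy": [False] * 5 + [True] * 3,
})

flag_cols = [c for c in sample.columns if c.startswith("is_")]

# Build the long format: one row per (book, subgenre) pair.
parts = []
for col in flag_cols:
    part = sample.loc[sample[col], ["average_rating"]].copy()
    part["Subgenre"] = col.replace("is_", "").capitalize()
    parts.append(part)
long_df = pd.concat(parts, ignore_index=True)

grouped = long_df.groupby("Subgenre")["average_rating"]
spread = pd.DataFrame({
    "Median": grouped.median(),
    "IQR": grouped.quantile(0.75) - grouped.quantile(0.25),
})
print(spread.round(2))
```

A larger IQR corresponds to a taller box, so the table gives a direct way to rank subgenres by how varied their ratings are.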