# Museum KPI: Income and Visitor Trends Visualisation

This notebook explores the key performance indicators (KPIs) for UK government-sponsored museums, focusing on the relationship between annual income and total visitor numbers.

The aim is to produce a combined visualisation that shows:

- **Annual income for all museums combined**, displayed as a **stacked bar chart** split by income type  
  (Admissions, Fundraising, and Trading Income)
- **Total annual visitors**, displayed as a **line graph** overlaid on the same chart

The datasets used here are the cleaned KPI files prepared earlier:

- `kpis/data/kpi_income_clean.csv`
- `kpis/data/kpi_visitors_clean.csv`

These files contain income totals and visitor counts aggregated by financial year.  
The visualisation will help reveal long-term trends, the impact of external events (such as Covid-19 closures), and the relationship between income patterns and changes in visitor behaviour.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker


income_df = pd.read_csv("../data/kpi_income_clean.csv")
visitors_df = pd.read_csv("../data/kpi_visitors_clean.csv")

# Convert numeric fields back to integers for cleaner display
income_df["Income"] = income_df["Income"].astype("Int64")
visitors_df["VisitorCount"] = visitors_df["VisitorCount"].astype("Int64")


### Step 1: Inspecting the Clean Datasets

Before creating any visualisations, the first step is to inspect the cleaned datasets to confirm they have loaded correctly and that all expected columns and datatypes are present. This quick check helps catch missing values, formatting issues, or inconsistencies before aggregation and plotting.
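
As a complement to eyeballing `head()` and `info()`, the same checks can be turned into explicit counts. The sketch below is only illustrative: `summarise_quality` is a hypothetical helper, the toy frame and its values are invented, and the `2018/19` year format is an assumption about how the cleaned files label financial years.

```python
import pandas as pd


def summarise_quality(df: pd.DataFrame) -> dict:
    """Return simple data-quality counts for a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "missing_by_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy stand-in for income_df; the values are invented for illustration only
toy_income = pd.DataFrame({
    "Year": ["2018/19", "2018/19", "2019/20", "2019/20"],
    "IncomeType": ["Admissions", "Trading", "Admissions", "Trading"],
    "Income": pd.array([100, 250, None, 300], dtype="Int64"),
})

quality = summarise_quality(toy_income)
print(quality)
```

Calling `summarise_quality(income_df)` and `summarise_quality(visitors_df)` in the notebook would give a compact summary to compare against expectations before aggregating.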


In [None]:
print("Income dataset preview:")
display(income_df.head())
income_df.info()  # info() prints its summary directly and returns None
print()

print("Visitors dataset preview:")
display(visitors_df.head())
visitors_df.info()


### Step 2: Preparing Data for Visualisation

Before creating the charts, the datasets need to be aggregated to show total values per year across all museums.

For the **income data**, the total income will be grouped by `Year` and `IncomeType` so that each income category (Admissions, Trading, Fundraising) can be displayed in a stacked bar chart.
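
To make the income grouping concrete, here is a minimal sketch on invented data. The `Museum` column and all values are assumptions for illustration; only `Year`, `IncomeType`, and `Income` are column names taken from this notebook.

```python
import pandas as pd

# Toy stand-in for income_df; museums and amounts are invented
toy_income = pd.DataFrame({
    "Museum": ["A", "B", "A", "B", "A"],
    "Year": ["2018/19", "2018/19", "2018/19", "2019/20", "2019/20"],
    "IncomeType": ["Admissions", "Admissions", "Trading", "Admissions", "Fundraising"],
    "Income": [10, 20, 5, 30, 7],
})

# Sum across museums for each (Year, IncomeType) pair
grouped = toy_income.groupby(["Year", "IncomeType"], as_index=False)["Income"].sum()

# Reshape so each income type becomes a column; absent combinations become 0
wide = grouped.pivot(index="Year", columns="IncomeType", values="Income").fillna(0)
print(wide)
```

In the wide table each row is one bar and each column is one segment of the stack, which is the shape a stacked bar chart needs.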

For the **visitor data**, only the `Total` visitor type will be used, grouped by `Year` to calculate the total visitor numbers across all museums. This will later be plotted as a line chart on a secondary axis.
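
The visitor aggregation can be sketched the same way. The example below uses invented rows (including a hypothetical `Museum` column and a `Child` visitor type) to show why footnote markers in year labels need removing first: without cleaning, `2019/20 [b]` would form its own group.

```python
import re

import pandas as pd


def clean_year(label) -> str:
    """Strip footnote markers such as ' [b]' from a year label."""
    return re.sub(r"\s*\[.*?\]", "", str(label)).strip()


# Toy stand-in for visitors_df; all values are invented
toy_visitors = pd.DataFrame({
    "Museum": ["A", "A", "B", "B", "A"],
    "Year": ["2019/20", "2019/20", "2019/20 [b]", "2018/19", "2018/19"],
    "VisitorType": ["Total", "Child", " total ", "Total", "Total"],
    "VisitorCount": [1000, 200, 500, 400, 900],
})

toy_visitors["Year"] = toy_visitors["Year"].apply(clean_year)

# Keep only the per-museum totals, tolerating stray whitespace and case
is_total = toy_visitors["VisitorType"].str.strip().str.lower() == "total"

totals = (
    toy_visitors[is_total]
    .groupby("Year", as_index=False)["VisitorCount"]
    .sum()
    .sort_values("Year", key=lambda s: s.str[:4].astype(int))
    .reset_index(drop=True)
)
print(totals)
```

Sorting on the first four characters assumes every label starts with a four-digit year, as the `2018/19` style used in this sketch does.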


In [None]:
# --- Clean Year labels so all years match ---
import re

def clean_year(y):
    # Remove footnotes like " [b]" or " [c]"
    return re.sub(r"\s*\[.*?\]", "", str(y)).strip()

income_df["Year"] = income_df["Year"].apply(clean_year)
visitors_df["Year"] = visitors_df["Year"].apply(clean_year)


In [None]:
# --- Extract national visitor totals ---

# Keep only rows whose VisitorType is "Total" (ignoring case and stray whitespace)
visitors_totals_only = visitors_df[
    visitors_df["VisitorType"].astype(str).str.strip().str.lower() == "total"
].copy()

# Group by Year and sum across museums
visitors_year_totals = (
    visitors_totals_only
    .groupby("Year", as_index=False)["VisitorCount"]
    .sum()
)

# Sort years properly
visitors_year_totals = visitors_year_totals.sort_values(
    by="Year", key=lambda x: x.str[:4].astype(int)
).reset_index(drop=True)

display(visitors_year_totals)



### Step 3: Creating the Combined KPI Visualisation

This step produces a combined chart that visualises both total income and total visitor numbers over time.  

- The **stacked bar chart** shows the total annual income across all museums, broken down by income type:  
  *Admissions*, *Trading*, and *Fundraising*.  
- The **line graph** (on a secondary axis) shows the total number of visitors per year.  

A consistent colour palette is applied across all visuals to ensure clarity and alignment with the project’s visual style.
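
The mechanics of the combined chart can be sketched with plain matplotlib on invented numbers. The figures below are made up and are not taken from the KPI files; the point is only to show how each income type is stacked on a running `bottoms` total and how the visitor line shares the same x positions through `twinx()`.

```python
import matplotlib.pyplot as plt

# Invented illustrative values, not real KPI data
years = ["2018/19", "2019/20", "2020/21"]
income_by_type = {
    "Admissions": [40e6, 42e6, 5e6],
    "Fundraising": [60e6, 65e6, 30e6],
    "Trading": [80e6, 85e6, 10e6],
}
visitors = [45e6, 47e6, 8e6]

positions = list(range(len(years)))
fig, ax_income = plt.subplots(figsize=(8, 4))
ax_visitors = ax_income.twinx()  # second y-axis sharing the same x-axis

# Stack each income type on top of the running total for that year
bottoms = [0.0] * len(years)
for name, values in income_by_type.items():
    ax_income.bar(positions, values, bottom=bottoms, label=name)
    bottoms = [b + v for b, v in zip(bottoms, values)]

# Draw the visitor line at the same integer positions as the bars
ax_visitors.plot(positions, visitors, marker="o", color="black", label="Total Visitors")

ax_income.set_xticks(positions)
ax_income.set_xticklabels(years, rotation=45)
ax_income.set_ylabel("Total Income (£)")
ax_visitors.set_ylabel("Total Visitors")
```

Plotting both series against the same integer positions keeps the line markers centred on the bars regardless of how the year labels are formatted.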


In [None]:
# --- Prepare income data for stacked bar chart and align with visitor years ---

# 1. Sum income across all museums by Year and IncomeType
income_grouped = (
    income_df
    .groupby(["Year", "IncomeType"], as_index=False)["Income"]
    .sum()
)

# 2. Pivot into stacked format: rows = Year, columns = IncomeType
income_pivot = (
    income_grouped
    .pivot(index="Year", columns="IncomeType", values="Income")
    .fillna(0)
)

# 3. Use the years detected in visitors_year_totals as the master list
all_years = visitors_year_totals["Year"].tolist()

# 4. Reindex income_pivot to match the visitor year order
income_pivot = income_pivot.reindex(all_years).fillna(0)

# 5. Display previews for verification
print("Income Pivot Table:")
display(income_pivot.head())

print("\nAll Years in Use:", all_years)


In [None]:
# --- Combined KPI Visualisation: Total Income and Visitor Numbers ---

# Define chart colours
income_colours = ["#2f4b7c", "#665191", "#a05195"]  # bar colours for income types
visitor_line_colour = "#ff7c43"                     # line colour for visitor totals

# Create figure with twin axes
fig, ax1 = plt.subplots(figsize=(12, 6))
ax2 = ax1.twinx()

# Plot stacked bar chart for income
income_pivot.plot(
    kind="bar",
    stacked=True,
    ax=ax1,
    color=income_colours,
    width=0.8,
    legend=False
)

# Plot line chart for total visitors at the bar positions (0..n-1) used by pandas
ax2.plot(
    range(len(all_years)),
    visitors_year_totals["VisitorCount"],
    color=visitor_line_colour,
    marker="o",
    linewidth=2.5,
    label="Total Visitors"
)

# Axis labels and title
ax1.set_xlabel("Year")
ax1.set_ylabel("Total Income (£)", color="black")
ax2.set_ylabel("Total Visitors", color="black")

ax1.set_title(
    "Total Income and Visitor Numbers Across All Museums",
    fontsize=14,
    weight="bold",
    pad=20
)

# Format both y-axes to show values in millions
ax1.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"{x/1e6:.0f}M"))
ax2.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"{x/1e6:.0f}M"))

# Add legend to the right of the chart
income_handles, income_labels = ax1.get_legend_handles_labels()
visitor_handles, visitor_labels = ax2.get_legend_handles_labels()

ax2.legend(
    handles=income_handles + visitor_handles,
    labels=income_labels + visitor_labels,
    title="KPI Type",
    loc="upper left",
    bbox_to_anchor=(1.15, 1),
    frameon=False
)

# Final formatting
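# Rotate the year labels on ax1 explicitly; plt.xticks would act on the current axes (ax2)
ax1.tick_params(axis="x", labelrotation=45)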
plt.tight_layout(rect=[0, 0, 0.8, 1])
plt.show()


In [None]:
import os

# Correct relative path from the notebook folder to the visualisations folder
output_dir = "../visualisations"

# Create folder if needed
os.makedirs(output_dir, exist_ok=True)

# Build full file path
output_path = os.path.join(output_dir, "total_income_and_visitors.png")

# Save the figure to the path built above
fig.savefig(output_path, dpi=300, bbox_inches="tight")

# print("Graph exported to:", os.path.abspath(output_path))
