# Epic Games Store Giveaway Strategy  
### A Data-Driven Analysis of Platform Growth Through Free Games

This project examines how Epic uses free games as a loss leader to build a long-term platform ecosystem rather than to maximise per-title revenue.

## Executive Summary

Since 2018, the Epic Games Store has given away over $14,500 worth of games.

Analysis of 600+ promotions reveals a library dominated by high-quality indie and mid-tier titles, with older AAA blockbusters appearing far less frequently and primarily around major promotional windows.

Rather than maximising revenue per title, Epic appears to optimise for platform lock-in, habitual engagement, and long-term ecosystem growth.


## Methodology

To analyse historical game giveaways, I built a data pipeline that transforms raw records into a structured, analysis-ready dataset. Because the data came from multiple sources, the work fell into three phases:

- Data Collection & Cleaning: I gathered giveaway records from several sources and standardised them for consistency and accuracy.
- Data Enrichment: Rather than collecting titles alone, I linked each entry to broader context such as its publisher, franchise, and original release date.
- Final Dataset: The result is a "semantic" dataset that records not only facts about each giveaway but also the relationships between games, franchises, and the companies that made them.

### Data Acquisition & Multi-API Enrichment

- Epic Games Store API — Giveaway dates and titles  
- IGDB (Twitch API) — Critic ratings and release dates  
- Steam API — Publisher normalization  
- CheapShark API — Historical retail pricing  
- Wikidata (SPARQL) — Franchise and sequel relationships  

### Data Cleaning & Entity Resolution

Two major challenges in unifying these records were title drift and inconsistent use of special characters such as trademark symbols. For example, a single title might appear as:

- Star Wars™ Battlefront II

- Star Wars Battlefront II

To ensure these were treated as the same entity, I applied fuzzy string matching with a confidence threshold of 85 (on a 0–100 similarity scale). This grouped variants together while remaining strict enough to avoid false matches. Titles whose best match scored below the threshold were reviewed and linked manually, so that missing or incorrect links did not skew the final analysis.
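Fuzzy matching works best when trivial differences are removed first. The sketch below is one possible pre-processing step, not necessarily the one used in this pipeline: it assumes titles differ mainly in trademark symbols, accents, punctuation, case, and spacing, and `normalize_title` is a hypothetical helper.

```python
import re
import unicodedata

# Symbols that commonly drift between store listings (assumed list): ™, ®, ©
_TRADEMARK_SYMBOLS = ("\u2122", "\u00ae", "\u00a9")

def normalize_title(title: str) -> str:
    """Return a canonical matching key for a game title (hypothetical helper)."""
    # Strip trademark symbols first: NFKD would otherwise expand "™" into "TM"
    for symbol in _TRADEMARK_SYMBOLS:
        title = title.replace(symbol, "")
    # Decompose accented characters and drop the combining marks (é -> e)
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Lowercase, turn remaining punctuation into spaces, collapse whitespace
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())
```

Applying the same function to both the query and the candidate titles leaves the fuzzy threshold to absorb only genuine wording differences, such as "II" versus "2".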

In [None]:
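from thefuzz import process

def resolve_title(query, candidates, threshold=85):
    """
    Generic fuzzy resolver used across Steam, IGDB, and CheapShark.
    Returns the best-matching candidate if its score (0-100) meets the
    threshold, otherwise None so the title can be queued for manual review.
    """
    # score_cutoff makes extractOne return None for weak matches,
    # and it also returns None when `candidates` is empty
    match = process.extractOne(query, candidates, score_cutoff=threshold)
    return match[0] if match else None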

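Routing every source through the same resolver kept joins consistent across Steam, IGDB, and CheapShark, while the threshold kept low-confidence matches out of the dataset until they had been reviewed.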
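The manual-review step can be sketched by sending every title whose best score misses the threshold to a review queue. To keep the example dependency-free it swaps thefuzz for the standard library's `difflib.SequenceMatcher`, rescaled to 0–100, so its scores will not match thefuzz's exactly. `similarity` and `split_for_review` are hypothetical names.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in for thefuzz)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def split_for_review(queries, candidates, threshold=85):
    """Auto-accept confident matches; queue everything else for manual review."""
    accepted, review = {}, []
    for query in queries:
        if not candidates:
            review.append((query, None, 0.0))
            continue
        best = max(candidates, key=lambda c: similarity(query, c))
        score = similarity(query, best)
        if score >= threshold:
            accepted[query] = best
        else:
            # Keep the best guess and its score to speed up manual linking
            review.append((query, best, round(score, 1)))
    return accepted, review
```

In a toy run with candidates `["Star Wars Battlefront II", "Borderlands 2"]`, the query "Star Wars Battlefront 2" scores about 94 against "Star Wars Battlefront II" and is accepted, while "Control" has no close candidate and lands in the review queue.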

### Feature Engineering

To move beyond basic metadata, I engineered features that capture value, timing, and potential marketing intent.

---

**Real Value (Inflation-Adjusted Price)**  
To compare giveaways fairly across years, nominal retail prices are converted to 2026 purchasing power using annual CPI multipliers defined in `constants.py`.

\[
RealValue = Price \times InflationMultiplier_{Year}
\]

Missing or non-numeric prices are coerced to 0.0 to avoid overstating total value.
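A minimal sketch of the conversion, including the coercion rule. The multipliers below are invented placeholders, not the values in `constants.py`, and `to_price` and `real_value` are hypothetical helpers.

```python
import math

# PLACEHOLDER multipliers (year -> factor to 2026 dollars); NOT real CPI data
INFLATION_MULTIPLIER = {2019: 1.25, 2022: 1.08, 2025: 1.02}

def to_price(raw) -> float:
    """Coerce missing, non-numeric, or NaN prices to 0.0."""
    try:
        price = float(raw)
    except (TypeError, ValueError):  # None, "N/A", "", ...
        return 0.0
    return price if math.isfinite(price) else 0.0

def real_value(raw_price, year: int) -> float:
    """RealValue = Price * InflationMultiplier[Year]."""
    return to_price(raw_price) * INFLATION_MULTIPLIER[year]
```

A year missing from the multiplier table raises `KeyError` rather than silently defaulting, which makes gaps in the CPI table visible.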

---

**Giveaway Maturity (Game Age at Giveaway)**  
The age of a game at the time of its giveaway, measured as the number of days between its original release date and the giveaway start date.  
Used to distinguish back-catalog monetization from near-launch promotions.

\[
MaturityDays = GiveawayStartDate - OriginalReleaseDate
\]
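In code, maturity is a plain date difference. The near-launch cutoff below is a hypothetical value chosen only to show how the feature might be bucketed; the analysis above does not state one.

```python
from datetime import date

NEAR_LAUNCH_CUTOFF_DAYS = 180  # hypothetical cutoff, not from the original analysis

def maturity_days(release_date: date, giveaway_start: date) -> int:
    """MaturityDays = GiveawayStartDate - OriginalReleaseDate, in whole days."""
    return (giveaway_start - release_date).days

def maturity_bucket(days: int) -> str:
    """Label a giveaway as near-launch or back-catalog (hypothetical rule)."""
    return "near-launch" if days <= NEAR_LAUNCH_CUTOFF_DAYS else "back-catalog"
```

For example, a game released on 2020-01-01 and given away on 2021-01-01 has a maturity of 366 days (2020 was a leap year).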

---

**Strategic Hype Flag**  
To detect marketing-timed giveaways, I compute lead time between the giveaway start date and the next related sequel release:

\[
LeadTimeDays = SequelReleaseDate - GiveawayStartDate
\]

Titles whose lead time falls within **0–90 days** (the sequel launches no earlier than the giveaway start and no more than 90 days after it) are flagged as **Strategic Hype** candidates.  
Missing sequel data (standalone titles) is treated as non-hype.
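A sketch of the flag, assuming both ends of the 0–90 day window are inclusive and that a missing sequel date is passed as `None`; `is_strategic_hype` is a hypothetical name.

```python
from datetime import date
from typing import Optional

HYPE_WINDOW_DAYS = 90

def is_strategic_hype(giveaway_start: date, sequel_release: Optional[date]) -> bool:
    """Flag giveaways that start 0-90 days (inclusive) before a related sequel."""
    if sequel_release is None:  # standalone title: treated as non-hype
        return False
    lead_time_days = (sequel_release - giveaway_start).days
    # Negative lead time means the sequel was already out: not a hype candidate
    return 0 <= lead_time_days <= HYPE_WINDOW_DAYS
```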

---

**Franchise Membership**  
Binary flag indicating whether a title belongs to a known franchise or shared universe, inferred using a tiered approach (manual overrides, keyword heuristics, API metadata).
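The tiered inference can be sketched as a chain of fallbacks, assuming the tiers are checked in the order listed (manual overrides first, API metadata last). Everything here is hypothetical: the override table, the keyword list, and the `api_franchise` argument, which stands in for whatever franchise field an API lookup returns (or `None`).

```python
from typing import Optional

# Hypothetical tier 1: hand-curated decisions that beat every other signal
MANUAL_OVERRIDES = {"hypothetical standalone game": False, "hypothetical spin-off": True}
# Hypothetical tier 2: substrings that strongly suggest a franchise
FRANCHISE_KEYWORDS = ("star wars", "borderlands")

def is_franchise(title: str, api_franchise: Optional[str] = None) -> bool:
    """Tiered franchise flag: overrides, then keywords, then API metadata."""
    key = title.lower()
    if key in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[key]
    if any(keyword in key for keyword in FRANCHISE_KEYWORDS):
        return True
    # Tier 3: fall back to API metadata (any franchise name counts)
    return api_franchise is not None
```

Putting manual overrides first lets a reviewer correct both keyword false positives and gaps in API metadata without touching the heuristics.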

---

**Monthly Value Density**  
Total retail value of giveaways grouped by calendar month to capture seasonal deployment patterns.
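A minimal sketch of the aggregation, assuming "calendar month" means pooling the same month across all years (every December together, for example) and that each giveaway is a `(start_date, retail_value)` pair; `monthly_value_density` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

def monthly_value_density(giveaways):
    """Sum retail value by calendar month (1-12), pooling all years."""
    totals = defaultdict(float)
    for start_date, value in giveaways:
        totals[start_date.month] += value
    return dict(sorted(totals.items()))  # ordered January -> December
```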

---

**Inflation Bonus**  
Difference between inflation-adjusted total value and nominal total value. It measures how much of the headline total comes from the inflation adjustment, and it grows with the share of early giveaways, whose prices receive the largest multipliers.

\[
InflationBonus = RealTotalValue - NominalTotalValue
\]
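Because each title's real value is its price times its year's multiplier, the bonus is simply the gap between the two sums. A sketch with placeholder multipliers (not real CPI data); `inflation_bonus` is a hypothetical helper taking `(nominal_price, multiplier)` pairs.

```python
def inflation_bonus(records):
    """InflationBonus = RealTotalValue - NominalTotalValue."""
    nominal_total = sum(price for price, _ in records)
    real_total = sum(price * multiplier for price, multiplier in records)
    return real_total - nominal_total
```

For example, a $20 game with a placeholder multiplier of 1.25 plus a $10 game with a multiplier of 1.0 gives a real total of $35 against a nominal $30, so the bonus is $5.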

---

### Publisher Generosity Index

To rank publishers by generosity, I constructed a weighted index balancing:

- **Total Value (70%)** — cumulative retail value given away  
- **Cost Intensity (30%)** — average retail price per giveaway  

\[
Score = (0.7 \times \frac{TotalValue}{MaxTotalValue}) +
        (0.3 \times \frac{AvgUnitCost}{MaxAvgUnitCost})
\]

The cost-intensity term limits how far a publisher can climb simply by giving away many low-cost titles, while also acknowledging that giving away a $60 title represents a larger economic concession than giving away a $10 title.
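The index can be sketched as follows, assuming each publisher's giveaways are available as a list of retail prices and that both ratios are normalised by the maximum across all publishers, as in the formula. `generosity_scores` is a hypothetical helper, and the publisher names and prices in the usage note are made up.

```python
def generosity_scores(publishers):
    """
    publishers: {name: [retail price of each giveaway]}
    Score = 0.7 * TotalValue/MaxTotalValue + 0.3 * AvgUnitCost/MaxAvgUnitCost
    """
    # Skip publishers with no priced giveaways to avoid dividing by zero
    totals = {name: sum(prices) for name, prices in publishers.items() if prices}
    averages = {name: totals[name] / len(publishers[name]) for name in totals}
    # Fall back to 1.0 so an all-zero dataset yields scores of 0, not an error
    max_total = max(totals.values(), default=0.0) or 1.0
    max_avg = max(averages.values(), default=0.0) or 1.0
    scores = {
        name: 0.7 * totals[name] / max_total + 0.3 * averages[name] / max_avg
        for name in totals
    }
    # Highest score first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

With prices `{"A": [60.0, 60.0], "B": [10.0] * 10, "C": [30.0]}`, A scores 1.0, B about 0.63, and C 0.325: B's volume of cheap titles keeps it high on the total-value term, while the cost-intensity term pulls it well below A.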
