{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finding the Ultimate Survivor Winner\n",
    "\n",
    "This analysis examines data from the reality TV show Survivor to identify patterns and characteristics that distinguish winners from other contestants. Using gameplay statistics, demographic data, and strategic metrics, we aim to uncover what makes the \"ultimate\" Survivor winner.\n",
    "\n",
    "## Research Questions:\n",
    "- What gameplay strategies are most effective for winning?\n",
    "- How do winner characteristics vary across different eras of the show?\n",
    "- What demographic factors correlate with success?\n",
    "- Is there a definitive \"winning formula\" or is success highly variable?\n",
    "\n",
    "## Methodology:\n",
    "The analysis uses contestant data including voting records, challenge performance, advantage usage, and demographic information. Each player is assigned one play style (Physical, Advantage-Heavy, Strategic, or Social) by rule-based thresholds checked in that priority order, and game statistics are normalized per tribal council attended to account for varying game lengths.\n",
    "\n",
    "## Key Metrics:\n",
    "- **Votes Received per Tribal:** Measures how much of a target a player was\n",
    "- **Correct Vote Rate:** Strategic awareness and information gathering\n",
    "- **Individual Immunity Rate:** Physical capability and clutch performance\n",
    "- **Advantage Usage:** Modern game adaptation and resource management\n",
    "- **Tribal Attendance:** Game longevity and deep run capability\n",
    "\n",
    "The analysis finds some consistent patterns among winners, alongside substantial variance that reflects how unpredictable Survivor is."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Import and Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import statsmodels.api as sm\n",
    "from scipy.stats import ttest_ind\n",
    "from job_categories import job_categories\n",
    "\n",
    "print(\"FINDING THE ULTIMATE SURVIVOR WINNER\")\n",
    "print(\"=\" * 50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Loading and Initial Exploration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath = r\"C:\\Users\\becca\\OneDrive\\Desktop\\Portfolio\\Survivor\\Voting Stats Plus.csv\"\n",
    "print(\"\\nInitial Data Exploration\\n\")\n",
    "\n",
    "# Read in CSV\n",
    "survivors = pd.read_csv(filepath, encoding='latin-1')\n",
    "\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('display.width', None)\n",
    "pd.set_option('display.max_rows', None)\n",
    "\n",
    "# Drop unnamed index column if it exists\n",
    "if 'Unnamed: 0' in survivors.columns:\n",
    "    survivors = survivors.drop('Unnamed: 0', axis=1)\n",
    "\n",
    "print(f\"Dimensions: {survivors.shape}\")\n",
    "print(f\"\\nFirst 5 rows:\")\n",
    "print(survivors.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"\\nColumn names: {survivors.columns.tolist()}\")\n",
    "print(f\"\\nData types:\\n{survivors.dtypes}\")\n",
    "print(f\"\\nNull values:\\n{survivors.isnull().sum()}\")\n",
    "print(f\"\\nUnique values:\\n{survivors.nunique()}\")\n",
    "print(f\"\\nNumerical description:\\n{survivors.describe()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Preprocessing and Feature Engineering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add job type category\n",
    "job_to_category = {}\n",
    "for category, jobs in job_categories.items():\n",
    "    for job in jobs:\n",
    "        job_to_category[job] = category\n",
    "\n",
    "survivors['jobcategory'] = survivors['occupation'].map(job_to_category)\n",
    "\n",
    "# Add grouped ages to df\n",
    "def group_ages(row):\n",
    "    if row['age'] < 20:\n",
    "        return 'Teens'\n",
    "    elif row['age'] < 30:\n",
    "        return '20s'\n",
    "    elif row['age'] < 40:\n",
    "        return '30s'\n",
    "    elif row['age'] < 50:\n",
    "        return '40s'\n",
    "    else:\n",
    "        return '50+'\n",
    "\n",
    "survivors['agegroup'] = survivors.apply(group_ages, axis=1)\n",
    "\n",
    "print(\"Added job categories and age groups\")"
   ]
  },
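  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The row-wise `group_ages` function can also be written with `pd.cut`, which bins a whole column in one vectorized call. This is a sketch on a hypothetical `ages` Series rather than the real data, assuming the same boundaries (under 20, 20s, 30s, 40s, 50+). One behavioral difference: because every comparison with `NaN` is false, `group_ages` labels a missing age `'50+'`, whereas `pd.cut` leaves it missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical ages for illustration only (not taken from the dataset)\n",
    "ages = pd.Series([19, 22, 35, 41, 57, np.nan])\n",
    "\n",
    "# right=False makes each bin include its left edge: [0, 20), [20, 30), ...\n",
    "age_groups = pd.cut(\n",
    "    ages,\n",
    "    bins=[0, 20, 30, 40, 50, np.inf],\n",
    "    labels=['Teens', '20s', '30s', '40s', '50+'],\n",
    "    right=False,\n",
    ")\n",
    "print(age_groups.tolist())"
   ]
  },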
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add column for play style\n",
    "def style_of_play(row):\n",
    "    if row['individualimmunites'] >= 3 or row['tribeimmunities'] >= 4:\n",
    "        return 'Physical'\n",
    "    elif row['votesnegated'] >= 1 or row['advantagesplayed'] >= 1:\n",
    "        return 'Advantage-Heavy'\n",
    "    elif row['votescast'] >= 6 or (row['correctlyvoted'] / (row['votescast'] + 1)) > 0.65:\n",
    "        return 'Strategic'\n",
    "    else:\n",
    "        return 'Social'\n",
    "\n",
    "survivors['style'] = survivors.apply(style_of_play, axis=1)\n",
    "\n",
    "print(\"Added play style classifications\")\n",
    "print(survivors['style'].value_counts())"
   ]
  },
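  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `style_of_play` returns at the first matching branch, the checks act as a priority order: a player with three individual immunities who also played an advantage is labeled Physical, never Advantage-Heavy. The sketch below reproduces the same thresholds with `np.select`, which likewise takes the first condition that is true, on a small hypothetical frame. Column names follow the dataset's spelling (for example `individualimmunites`); the values are illustrative only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical players; column names follow the dataset's spelling\n",
    "toy = pd.DataFrame({\n",
    "    'individualimmunites': [3, 0, 0, 0],\n",
    "    'tribeimmunities': [1, 2, 1, 0],\n",
    "    'votesnegated': [2, 1, 0, 0],\n",
    "    'advantagesplayed': [1, 0, 0, 0],\n",
    "    'votescast': [5, 4, 7, 2],\n",
    "    'correctlyvoted': [4, 3, 2, 1],\n",
    "})\n",
    "\n",
    "# Same thresholds as style_of_play, listed in the same priority order\n",
    "conditions = [\n",
    "    (toy['individualimmunites'] >= 3) | (toy['tribeimmunities'] >= 4),\n",
    "    (toy['votesnegated'] >= 1) | (toy['advantagesplayed'] >= 1),\n",
    "    (toy['votescast'] >= 6) | (toy['correctlyvoted'] / (toy['votescast'] + 1) > 0.65),\n",
    "]\n",
    "choices = ['Physical', 'Advantage-Heavy', 'Strategic']\n",
    "\n",
    "# np.select picks the choice for the first True condition, like the if/elif chain\n",
    "toy['style'] = np.select(conditions, choices, default='Social')\n",
    "print(toy['style'].tolist())"
   ]
  },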
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Map season names to numbers\n",
    "season_map = {'Borneo': 1, 'Australian Outback': 2, 'Africa': 3, 'Marquesas': 4, 'Thailand': 5,\n",
    "    'The Amazon': 6, 'Pearl Islands': 7, 'All-Stars': 8, 'Vanuatu': 9, 'Palau': 10, 'Guatemala': 11,\n",
    "    'Panama': 12, 'Cook Islands': 13, 'Fiji': 14, 'China': 15, 'Micronesia': 16, 'Gabon': 17,\n",
    "    'Tocantins': 18, 'Samoa': 19, 'Heroes vs. Villains': 20, 'Nicaragua': 21, 'Redemption Island': 22,\n",
    "    'South Pacific': 23, 'One World': 24, 'Philippines': 25, 'Caramoan': 26, 'Blood vs. Water': 27,\n",
    "    'Cagayan': 28, 'San Juan del Sur': 29, 'Worlds Apart': 30, 'Cambodia': 31, 'Kaôh R?ng': 32, 'Millenials vs. Gen X': 33,\n",
    "    'Game Changers': 34, 'Heroes vs. Healers vs. Hustlers': 35, 'Ghost Island': 36, 'David vs. Goliath': 37,\n",
    "    'Edge of Extinction': 38, 'Island of the Idols': 39, 'Winners at War': 40, 'Survivor 41': 41,\n",
    "    'Survivor 42': 42, 'Survivor 43': 43, 'Survivor 44': 44, 'Survivor 45': 45, 'Survivor 46': 46,\n",
    "    'Survivor 47': 47, 'Survivor 48': 48\n",
    "}\n",
    "\n",
    "survivors['seasonnum'] = survivors['seasonplayed'].map(season_map)\n",
    "\n",
    "# Classify seasons by eras\n",
    "def classify_era(season):\n",
    "    if season <= 11:\n",
    "        return 'Old School'\n",
    "    elif season <= 26:\n",
    "        return 'Dark'\n",
    "    elif season <= 40:\n",
    "        return 'Advantage'\n",
    "    else:\n",
    "        return 'New'\n",
    "\n",
    "survivors['era'] = survivors['seasonnum'].apply(classify_era)\n",
    "\n",
    "print(\"Added season numbers and era classifications\")\n",
    "print(survivors['era'].value_counts())"
   ]
  },
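  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A caveat on `season_map`: any `seasonplayed` value that does not exactly match a key (a spelling or encoding difference in the CSV, for instance) maps to `NaN`, and `classify_era` then returns `'New'` because `NaN <= 11` and the other comparisons are all false. A hedged alternative sketch bins season numbers with `pd.cut` at the same era boundaries (1-11, 12-26, 27-40, 41+), so unmatched seasons stay missing and can be counted. The season numbers below are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical season numbers; None stands in for a title that failed to map\n",
    "season_nums = pd.Series([1, 11, 12, 26, 27, 40, 41, 48, None], dtype='float')\n",
    "\n",
    "# Default right=True gives bins (0, 11], (11, 26], (26, 40], (40, inf)\n",
    "eras = pd.cut(\n",
    "    season_nums,\n",
    "    bins=[0, 11, 26, 40, np.inf],\n",
    "    labels=['Old School', 'Dark', 'Advantage', 'New'],\n",
    ")\n",
    "print(eras.tolist())\n",
    "print('Unmapped seasons:', int(eras.isna().sum()))"
   ]
  },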
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Normalize data based on how long player is in game\n",
    "# Zero tribals become NaN so the ratios are missing rather than inf\n",
    "tribals = survivors['tribalsattended'].replace(0, np.nan)\n",
    "survivors['votesreceived_pertribal'] = survivors['votesrecieved'] / tribals\n",
    "survivors['votescast_pertribal'] = survivors['votescast'] / tribals\n",
    "# +1 in the denominator keeps the rate defined for players who cast no votes\n",
    "survivors['correctvote_rate'] = survivors['correctlyvoted'] / (survivors['votescast'] + 1)\n",
    "survivors['advantagesplayed_pertribal'] = survivors['advantagesplayed'] / tribals\n",
    "survivors['immunities_pertribal'] = survivors['individualimmunites'] / tribals\n",
    "\n",
    "print(f\"Dimensions after feature engineering: {survivors.shape}\\n\")\n",
    "print(\"Normalized metrics created successfully\")"
   ]
  },
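  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `+ 1` in the `correctvote_rate` denominator keeps the rate defined for players who never cast a vote, but it also pulls every rate below 1 and penalizes short voting records more than long ones. That interacts with the elite filter later on (`correctvote_rate > 0.8`): a perfect 4-for-4 record scores exactly 4/5 = 0.8 and does not pass. A sketch comparing the smoothed rate with a plain ratio, on hypothetical vote records:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical vote records (illustrative only); every player voted correctly each time\n",
    "toy = pd.DataFrame({\n",
    "    'votescast': [5, 10, 4, 0],\n",
    "    'correctlyvoted': [5, 10, 4, 0],\n",
    "})\n",
    "\n",
    "# Smoothed rate, as in the notebook: defined even when votescast is 0\n",
    "toy['smoothed_rate'] = toy['correctlyvoted'] / (toy['votescast'] + 1)\n",
    "\n",
    "# Plain rate: NaN when no votes were cast\n",
    "toy['plain_rate'] = toy['correctlyvoted'] / toy['votescast'].replace(0, np.nan)\n",
    "\n",
    "# The elite filter uses > 0.8, so a perfect 4-for-4 record (exactly 0.8) fails it\n",
    "toy['passes_elite_cut'] = toy['smoothed_rate'] > 0.8\n",
    "print(toy.round(3))"
   ]
  },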
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Winners vs Non-Winners Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Separate winners vs. non-winners\n",
    "survivors['results'] = np.where(survivors['finalplacement'] == 1, 'winner', 'nonwinner')\n",
    "\n",
    "# Create separate data set for winners vs. nonwinners\n",
    "winners = survivors[survivors['results'] == 'winner']\n",
    "nonwinners = survivors[survivors['results'] == 'nonwinner']\n",
    "\n",
    "print(f\"Total winners: {len(winners)}\")\n",
    "print(f\"Total non-winners: {len(nonwinners)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Do winners perform differently vs. nonwinners?\n",
    "grouped = (survivors.groupby('results')[[\n",
    "    'votesreceived_pertribal',\n",
    "    'votescast_pertribal',\n",
    "    'correctvote_rate',\n",
    "    'advantagesplayed_pertribal',\n",
    "    'immunities_pertribal',\n",
    "    'timesswapped',\n",
    "    'tribalsattended',\n",
    "    'playersonseason']]\n",
    "           .mean().round(3))\n",
    "\n",
    "print(\"\\nComparison of Winners vs. Nonwinners Actions:\\n\")\n",
    "print(grouped.T)"
   ]
  },
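  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The group means above do not indicate whether a gap is larger than chance variation. The notebook imports `scipy.stats.ttest_ind` without using it; one way it could be applied is a Welch t-test (`equal_var=False`, since the two groups differ greatly in size and spread) on a single metric. This is a sketch on hypothetical values, and dropping `NaN` first is an assumption about how missing ratios should be treated. With real data, testing eight metrics at once would likely also warrant a multiple-comparison correction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import ttest_ind\n",
    "\n",
    "# Hypothetical votes-received-per-tribal values, not real results\n",
    "winner_vals = np.array([0.10, 0.25, 0.00, 0.30, 0.15])\n",
    "nonwinner_vals = np.array([0.60, 0.45, 0.80, 0.20, 0.55, 0.70, 0.35, np.nan])\n",
    "\n",
    "# Drop missing ratios (e.g. players with zero tribals) before testing\n",
    "nonwinner_vals = nonwinner_vals[~np.isnan(nonwinner_vals)]\n",
    "\n",
    "# Welch's t-test does not assume equal variances between the groups\n",
    "result = ttest_ind(winner_vals, nonwinner_vals, equal_var=False)\n",
    "print(f't = {result.statistic:.2f}, p = {result.pvalue:.4f}')"
   ]
  },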
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_df = survivors.melt(\n",
    "    id_vars='results',\n",
    "    value_vars=[\n",
    "        'votesreceived_pertribal',\n",
    "        'votescast_pertribal',\n",
    "        'correctvote_rate',\n",
    "        'advantagesplayed_pertribal',\n",
    "        'immunities_pertribal',\n",
    "        'timesswapped',\n",
    "        'tribalsattended',\n",
    "        'playersonseason'\n",
    "    ],\n",
    "    var_name='metric',\n",
    "    value_name='value'\n",
    ")\n",
    "\n",
    "# Plot of winners vs. nonwinners actions\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.barplot(data=plot_df, x='metric', y='value', hue='results', errorbar='sd')\n",
    "plt.xticks(rotation=45)\n",
    "plt.title(\"Winners vs Non-Winners: Survivor Gameplay Metrics\")\n",
    "plt.ylabel(\"Mean value (metrics on different scales)\")\n",
    "plt.xlabel(\"Metric\")\n",
    "plt.legend(title='Player Type')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Winner Profile Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Snapshot of a typical winner's behavior\n",
    "win_stats = winners[['votesreceived_pertribal',\n",
    "    'votescast_pertribal',\n",
    "    'correctvote_rate',\n",
    "    'advantagesplayed_pertribal',\n",
    "    'immunities_pertribal',\n",
    "    'timesswapped',\n",
    "    'tribalsattended',\n",
    "    'playersonseason']].describe()\n",
    "\n",
    "print(f\"\\nWinner action stats:\\n{win_stats}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Elite Winners Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find the elite winners\n",
    "elite_winners = winners[\n",
    "    (winners['correctvote_rate'] > 0.8) &\n",
    "    (winners['votesreceived_pertribal'] < 1) &\n",
    "    (winners['immunities_pertribal'] > 0.2)\n",
    "]\n",
    "\n",
    "elite_winners = elite_winners.reset_index(drop=True)\n",
    "print(f\"\\n Elite Survivor winners:\\n{elite_winners}\")"
   ]
  },
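  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Elite status requires all three thresholds at once, so it can help to see how many winners clear each one on its own; a single strict cut-off can shrink the elite group sharply. A sketch on hypothetical winner rows, reusing the elite filter's thresholds:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical winner metrics (illustrative only)\n",
    "toy = pd.DataFrame({\n",
    "    'correctvote_rate': [0.90, 0.85, 0.70, 0.95],\n",
    "    'votesreceived_pertribal': [0.20, 1.50, 0.40, 0.10],\n",
    "    'immunities_pertribal': [0.25, 0.30, 0.10, 0.15],\n",
    "})\n",
    "\n",
    "# One boolean column per criterion, using the elite filter's thresholds\n",
    "criteria = pd.DataFrame({\n",
    "    'correct_votes': toy['correctvote_rate'] > 0.8,\n",
    "    'low_votes_received': toy['votesreceived_pertribal'] < 1,\n",
    "    'immunity_rate': toy['immunities_pertribal'] > 0.2,\n",
    "})\n",
    "\n",
    "print(criteria.sum())  # winners passing each criterion on its own\n",
    "print('Pass all three:', int(criteria.all(axis=1).sum()))"
   ]
  },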
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Snapshot of an elite winner's behavior\n",
    "if len(elite_winners) > 0:\n",
    "    elite_stats = elite_winners[['votesreceived_pertribal',\n",
    "        'votescast_pertribal',\n",
    "        'correctvote_rate',\n",
    "        'advantagesplayed_pertribal',\n",
    "        'immunities_pertribal',\n",
    "        'timesswapped',\n",
    "        'tribalsattended',\n",
    "        'playersonseason']].describe()\n",
    "\n",
    "    print(f\"\\nElite winner action stats:\\n{elite_stats}\")\n",
    "\n",
    "    avg_winner = win_stats.loc['mean']\n",
    "    avg_elite = elite_stats.loc['mean']\n",
    "\n",
    "    comparison = pd.DataFrame({\n",
    "        'avg_winner': avg_winner,\n",
    "        'avg_elite_winner': avg_elite,\n",
    "        'difference': (avg_elite - avg_winner).round(3)\n",
    "    })\n",
    "\n",
    "    print(\"\\nElite vs. Average Winner Comparison:\")\n",
    "    print(comparison)\n",
    "else:\n",
    "    print(\"No elite winners found with the current criteria.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Play Style Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Actions by style of play\n",
    "style_stats = survivors.groupby('style')[[\n",
    "    'votesreceived_pertribal',\n",
    "    'correctvote_rate',\n",
    "    'advantagesplayed_pertribal',\n",
    "    'immunities_pertribal'\n",
    "]].mean().round(3)\n",
    "print(f\"\\nAction averages by style of play:\\n{style_stats}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# How does chance of winning change based on play-style?\n",
    "players_style = survivors['style'].value_counts()\n",
    "win_style = winners['style'].value_counts()\n",
    "\n",
    "style_win_rate = pd.DataFrame({\n",
    "    'total_players': players_style,\n",
    "    'winners': win_style\n",
    "})\n",
    "\n",
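    "# Styles with no winners are missing from win_style; count them as 0\n",
    "style_win_rate['winners'] = style_win_rate['winners'].fillna(0).astype(int)\n",
    "style_win_rate['win_rate'] = (style_win_rate['winners'] / style_win_rate['total_players']).round(3)\n",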
    "style_win_rate = style_win_rate.sort_values('win_rate', ascending=False)\n",
    "print(f\"\\nWinners by style of play:\\n{style_win_rate}\")"
   ]
  },
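  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two different ratios appear in this notebook: the win rate computed here (winners in a style divided by all players with that style) and, in the later summary loops, a style's share of all winners (`count / total_winners`). When styles differ in size, the two can rank styles differently. A hypothetical example with made-up counts:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical counts, chosen so the two ratios disagree\n",
    "counts = pd.DataFrame(\n",
    "    {'total_players': [300, 40], 'winners': [12, 4]},\n",
    "    index=['Social', 'Physical'],\n",
    ")\n",
    "\n",
    "# Win rate: fraction of players with this style who won\n",
    "counts['win_rate'] = counts['winners'] / counts['total_players']\n",
    "# Share of winners: fraction of all winners who played this style\n",
    "counts['share_of_winners'] = counts['winners'] / counts['winners'].sum()\n",
    "print(counts.round(3))"
   ]
  },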
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Era-Based Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Do the qualities that make a good winner change based on era? (raw season totals, not per-tribal rates)\n",
    "era_wins = (\n",
    "    winners\n",
    "    .groupby('era')[['correctlyvoted', 'votesrecieved', 'individualimmunites', 'tribeimmunities', 'tribalsattended']]\n",
    "    .mean()\n",
    ")\n",
    "print(f\"\\nWin averages by era:\\n{era_wins}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Demographic Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Any similar jobs amongst winners?\n",
    "job_wins = winners.groupby(['occupation']).count().sort_values(by='id', ascending=False)\n",
    "print(\"Top occupations among winners:\")\n",
    "print(job_wins.head())\n",
    "\n",
    "cat_wins = winners.groupby(['jobcategory']).count().sort_values(by='id', ascending=False)\n",
    "print(\"\\nTop job categories among winners:\")\n",
    "print(cat_wins.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# What age are winners?\n",
    "age_wins = winners.groupby(['agegroup']).count().sort_values(by='id', ascending=False)\n",
    "print(\"Winners by age group:\")\n",
    "print(age_wins.head())"
   ]
  },
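  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`groupby(...).count()` counts non-null values in every column, which is why the tables above are read through the `'id'` column. `value_counts()` returns the same per-category counts directly, sorted in descending order. Both skip missing keys by default, so occupations not listed in `job_categories` (and therefore without a `jobcategory`) are silently left out unless `dropna=False` is passed. A sketch on hypothetical category labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical job categories for winners; None marks an unmapped occupation\n",
    "cats = pd.Series(['Sales', 'Education', 'Sales', None, 'Law', 'Sales', 'Education'])\n",
    "\n",
    "counts = cats.value_counts()  # missing values dropped, like groupby().count()\n",
    "counts_all = cats.value_counts(dropna=False)  # unmapped occupations included\n",
    "print(counts)\n",
    "print(counts_all)"
   ]
  },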
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Winner Profile Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nWinner Profile Summary\")\n",
    "profile_summary = pd.DataFrame({\n",
    "    'Characteristic': [\n",
    "        'Age Group (Most Common)',\n",
    "        'Job Category (Most Common)',\n",
    "        'Play Style (Highest Win Rate)',\n",
    "        'Votes Received per Tribal',\n",
    "        'Correct Vote Rate',\n",
    "        'Individual Immunities per Tribal',\n",
    "        'Votes Cast per Tribal',\n",
    "        'Advantages Played per Tribal',\n",
    "        'Average Tribals Attended'\n",
    "    ],\n",
    "    'Winner Profile': [\n",
    "        f\"{age_wins.index[0]} ({age_wins.iloc[0]['id']} winners)\",\n",
    "        f\"{cat_wins.index[0]} ({cat_wins.iloc[0]['id']} winners)\",\n",
    "        f\"{style_win_rate.index[0]} ({style_win_rate.iloc[0]['win_rate']:.1%} win rate)\",\n",
    "        f\"{win_stats.loc['mean', 'votesreceived_pertribal']:.3f}\",\n",
    "        f\"{win_stats.loc['mean', 'correctvote_rate']:.3f}\",\n",
    "        f\"{win_stats.loc['mean', 'immunities_pertribal']:.3f}\",\n",
    "        f\"{win_stats.loc['mean', 'votescast_pertribal']:.3f}\",\n",
    "        f\"{win_stats.loc['mean', 'advantagesplayed_pertribal']:.3f}\",\n",
    "        f\"{win_stats.loc['mean', 'tribalsattended']:.1f}\"\n",
    "    ]\n",
    "})\n",
    "\n",
    "total_winners = len(winners)\n",
    "print(profile_summary.to_string(index=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Elite Winners Summary\n",
    "print(f\"\\nElite Winners Summary\")\n",
    "print(f\"Total Elite Winners: {len(elite_winners)}\")\n",
    "print(\"Criteria: correct vote rate > 0.8, < 1 vote received per tribal, > 0.2 individual immunities per tribal\")\n",
    "\n",
    "if len(elite_winners) > 0:\n",
    "    print(f\"\\nElite Winners:\")\n",
    "    for idx, winner in elite_winners.iterrows():\n",
    "        print(f\"  • {winner['playername']} ({winner['seasonplayed']}, Season {winner['seasonnum']})\")\n",
    "\n",
    "print(f\"\\nPlay Style Success Summary\")\n",
    "for style in style_win_rate.index:\n",
    "    count = style_win_rate.loc[style, 'winners']\n",
    "    if pd.isna(count):\n",
    "        count = 0\n",
    "    else:\n",
    "        count = int(count)\n",
    "    percentage = (count / total_winners) * 100\n",
    "    print(f\"{style:15} {count:2d} winners ({percentage:4.1f}%)\")\n",
    "\n",
    "print(f\"\\nWinner Evolution by Era\")\n",
    "for era in era_wins.index:\n",
    "    correct_votes = era_wins.loc[era, 'correctlyvoted']\n",
    "    votes_received = era_wins.loc[era, 'votesrecieved']\n",
    "    immunities = era_wins.loc[era, 'individualimmunites']\n",
    "    print(f\"{era:12} Correct Votes: {correct_votes:4.1f} | Votes Received: {votes_received:4.1f} | Immunities: {immunities:4.1f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualization: Play Style by Era"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Share of each era's winners by style of play (percent of winners, not win rate)\n",
    "era_style = winners.groupby(['era', 'style']).size().unstack(fill_value=0)\n",
    "era_style_pct = era_style.div(era_style.sum(axis=1), axis=0) * 100\n",
    "\n",
    "era_style_pct.plot(kind='bar', stacked=True, colormap='tab20', figsize=(10, 6))\n",
    "plt.title(\"Style of Play Among Winners by Era\")\n",
    "plt.ylabel(\"Percentage of Winners\")\n",
    "plt.xlabel(\"Era\")\n",
    "plt.legend(title=\"Play Style\", bbox_to_anchor=(1.05, 1), loc='upper left')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
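  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The percentage table above divides each era's counts by that era's total. `pd.crosstab` can do the same in one call with `normalize='index'`. The sketch below uses hypothetical era and style labels and also reorders the rows chronologically, since grouping sorts era names alphabetically (Advantage, Dark, New, Old School)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical winners: one era label and one style label per winner\n",
    "toy = pd.DataFrame({\n",
    "    'era': ['Old School', 'Old School', 'Dark', 'Dark', 'Dark', 'New'],\n",
    "    'style': ['Social', 'Physical', 'Strategic', 'Social', 'Strategic', 'Advantage-Heavy'],\n",
    "})\n",
    "\n",
    "# normalize='index' divides each row by its row total; * 100 converts to percent\n",
    "pct = pd.crosstab(toy['era'], toy['style'], normalize='index') * 100\n",
    "\n",
    "# Put eras in chronological order, skipping any era with no winners\n",
    "era_order = ['Old School', 'Dark', 'Advantage', 'New']\n",
    "pct = pct.reindex([e for e in era_order if e in pct.index])\n",
    "print(pct.round(1))"
   ]
  },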
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Ultimate Winner Formula"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"\\n*****THE ULTIMATE WINNER FORMULA*****\")\n",
    "\n",
    "# Key insights from existing analysis\n",
    "top_age = age_wins.index[0]\n",
    "top_job = cat_wins.index[0]\n",
    "best_style = style_win_rate.index[0]\n",
    "best_style_rate = style_win_rate.iloc[0]['win_rate']\n",
    "\n",
    "winner_analysis = f\"\"\"\n",
    "DEMOGRAPHIC PROFILE:\n",
    "Winners are most often in their {top_age.lower()} and most often work in \n",
    "{top_job.lower()}. \n",
    "\n",
    "GAMEPLAY STRATEGY:\n",
    "The \"{best_style.lower()}\" play style has the highest win rate: {best_style_rate:.1%} \n",
    "of players with that style won. On average, winners:\n",
    "• Stay Under the Radar: Receive only {win_stats.loc['mean', 'votesreceived_pertribal']:.2f} votes per tribal council\n",
    "• Vote Strategically: Have a {win_stats.loc['mean', 'correctvote_rate']:.1%} correct voting rate\n",
    "• Show Physical Strength: Win {win_stats.loc['mean', 'immunities_pertribal']:.2f} individual immunities per tribal council attended\n",
    "• Play Actively: Cast {win_stats.loc['mean', 'votescast_pertribal']:.2f} votes per tribal council\n",
    "• Use Advantages Sparingly: Play {win_stats.loc['mean', 'advantagesplayed_pertribal']:.3f} advantages per tribal council\n",
    "\n",
    "ELITE WINNER CHARACTERISTICS:\n",
    "Elite winners, as defined in this notebook, meet all three criteria:\n",
    "- Voting correctly over 80% of the time\n",
    "- Receiving less than 1 vote per tribal council\n",
    "- Winning more than 0.2 individual immunities per tribal council attended\n",
    "\"\"\"\n",
    "\n",
    "print(winner_analysis)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Variance Analysis: The Unpredictability Factor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
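    "print(f\"\\nVariance in winning play\")\n",
    "\n",
    "print(\"\\nWinner metric variability:\")\n",
    "variance_metrics = ['votesreceived_pertribal', 'correctvote_rate', 'immunities_pertribal', 'votescast_pertribal']\n",
    "for metric in variance_metrics:\n",
    "    mean_val = win_stats.loc['mean', metric]\n",
    "    std_val = win_stats.loc['std', metric]\n",
    "    min_val = win_stats.loc['min', metric]\n",
    "    max_val = win_stats.loc['max', metric]\n",
    "    cv = (std_val / mean_val) * 100  # coefficient of variation\n",
    "    print(f\"  {metric:25} mean {mean_val:.3f} | std {std_val:.3f} | range {min_val:.3f} to {max_val:.3f} | CV {cv:.0f}%\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [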
    "print(f\"\\nOutlier winners\")\n",
    "\n",
    "# Winners who don't fit the typical profile\n",
    "outlier_winners = []\n",
    "\n",
    "# High votes received but still won\n",
    "high_votes = winners[winners['votesreceived_pertribal'] > win_stats.loc['75%', 'votesreceived_pertribal']]\n",
    "if len(high_votes) > 0:\n",
    "    outlier_winners.append((\"High Target\", high_votes.iloc[0]['playername'], high_votes.iloc[0]['seasonplayed'],\n",
    "                           f\"{high_votes.iloc[0]['votesreceived_pertribal']:.2f} votes/tribal\"))\n",
    "\n",
    "# Low immunity but still won\n",
    "low_immunity = winners[winners['immunities_pertribal'] < win_stats.loc['25%', 'immunities_pertribal']]\n",
    "if len(low_immunity) > 0:\n",
    "    outlier_winners.append((\"Low Physical\", low_immunity.iloc[0]['playername'], low_immunity.iloc[0]['seasonplayed'],\n",
    "                           f\"{low_immunity.iloc[0]['immunities_pertribal']:.2f} immunity rate\"))\n",
    "\n",
    "# Poor voting record but still won\n",
    "poor_voting = winners[winners['correctvote_rate'] < win_stats.loc['25%', 'correctvote_rate']]\n",
    "if len(poor_voting) > 0:\n",
    "    outlier_winners.append((\"Poor Voting\", poor_voting.iloc[0]['playername'], poor_voting.iloc[0]['seasonplayed'],\n",
    "                           f\"{poor_voting.iloc[0]['correctvote_rate']:.1%} correct votes\"))\n",
    "\n",
    "for category, name, season, stat in outlier_winners:\n",
    "    print(f\"  {category:12} {name} ({season}) - {stat}\")"
   ]
  },
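  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The outlier cell reports only the first winner (`iloc[0]`) in each category. Because the cut-offs are quartiles of the winners' own values, roughly a quarter of winners lie beyond each one by construction, so listing all of them gives a fuller picture. A sketch using `Series.quantile` on hypothetical values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical winners and votes received per tribal (illustrative only)\n",
    "toy = pd.DataFrame({\n",
    "    'playername': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],\n",
    "    'votesreceived_pertribal': [0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.6, 0.9],\n",
    "})\n",
    "\n",
    "# Same rule as the outlier cell: above the 75th percentile counts as a high target\n",
    "q75 = toy['votesreceived_pertribal'].quantile(0.75)\n",
    "high_target = toy[toy['votesreceived_pertribal'] > q75]\n",
    "\n",
    "print(f'75th percentile: {q75:.3f}')\n",
    "print(high_target['playername'].tolist())"
   ]
  },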
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Play style diversity among winners\n",
    "print(f\"\\nDiversity by play style\")\n",
    "for style in style_win_rate.index:\n",
    "    count = style_win_rate.loc[style, 'winners']\n",
    "    if pd.isna(count):  # Handle NaN values\n",
    "        count = 0\n",
    "    else:\n",
    "        count = int(count)\n",
    "    percentage = (count / total_winners) * 100\n",
    "    print(f\"{style:15} {count:2d} winners ({percentage:4.1f}%)\")\n",
    "\n",
    "# Age and job diversity\n",
    "print(f\"\\nDemographic diversity\")\n",
    "age_diversity = len(age_wins[age_wins['id'] > 0])  # Number of different age groups with winners\n",
    "job_diversity = len(cat_wins[cat_wins['id'] > 0])  # Number of different job categories with winners\n",
    "\n",
    "print(f\"Age Groups with Winners: {age_diversity}\")\n",
    "print(f\"Job Categories with Winners: {job_diversity}\")\n",
    "\n",
    "# Show era differences\n",
    "print(f\"\\nEra variance\")\n",
    "era_variance = era_wins.std().round(2)\n",
    "print(\"Standard Deviation Across Eras:\")\n",
    "for metric in era_variance.index:\n",
    "    print(f\"  {metric:20} {era_variance[metric]:.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "variance_message = f\"\"\"\n",
    "While patterns exist, the data reveals HIGH VARIANCE in winner characteristics:\n",
    "\n",
    "METRIC VARIABILITY:\n",
    "• Coefficient of variation ranges from {((win_stats.loc['std'] / win_stats.loc['mean']) * 100).min():.0f}% to {((win_stats.loc['std'] / win_stats.loc['mean']) * 100).max():.0f}%\n",
    "• Winners come from {age_diversity} age groups, {job_diversity} job categories, and {int((style_win_rate['winners'] > 0).sum())} of {len(style_win_rate)} play styles\n",
    "• Even \"elite\" winners represent only {len(elite_winners)} of {len(winners)} total winners\n",
    "\n",
    "OUTLIER SUCCESS:\n",
    "• Some winners were heavily targeted and still won\n",
    "• Some champions had weak voting records relative to other winners\n",
    "• Some winners won little or no individual immunity\n",
    "\n",
    "BOTTOM LINE:\n",
    "The \"ultimate winner profile\" represents tendencies, not requirements. \n",
    "Survivor's beauty lies in its unpredictability: anyone can win with the \n",
    "right combination of strategy, luck, and timing. The variance in the data \n",
    "suggests that while some approaches are more successful on average, there is \n",
    "no single path to victory.\n",
    "\"\"\"\n",
    "\n",
    "print(variance_message)"
   ]
  },
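  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The coefficient of variation (CV) used above is the standard deviation divided by the mean, as a percentage. It is undefined when the mean is 0 and inflates when the mean is close to 0, which can affect sparse metrics such as advantages played per tribal. A sketch that computes CV for each column of a `describe()` table and marks zero means as missing, on hypothetical winner rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical winner metrics; advantages are all zero to show the guard\n",
    "toy = pd.DataFrame({\n",
    "    'correctvote_rate': [0.8, 0.6, 0.7, 0.9],\n",
    "    'advantagesplayed_pertribal': [0.0, 0.0, 0.0, 0.0],\n",
    "})\n",
    "\n",
    "summary = toy.describe()\n",
    "mean = summary.loc['mean']\n",
    "std = summary.loc['std']\n",
    "\n",
    "# CV = std / mean * 100; a zero mean would divide by zero, so mark it missing\n",
    "cv = (std / mean.replace(0, np.nan)) * 100\n",
    "print(cv.round(1))"
   ]
  },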
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Visualization: Winner Metric Distributions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variance of winner metrics\n",
    "key_metrics = ['votesreceived_pertribal', 'correctvote_rate', 'immunities_pertribal',\n",
    "               'votescast_pertribal', 'advantagesplayed_pertribal']\n",
    "\n",
    "melted = winners[key_metrics].melt(var_name='Metric', value_name='Value')\n",
    "\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.violinplot(data=melted, x='Metric', y='Value', inner='box', cut=0)\n",
    "plt.title(\"Density & Distribution of Winner Metrics\")\n",
    "plt.xticks(rotation=45)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
