<h1 style='text-align: center;'>Survivor:</h1>
<h2 style='text-align: center;'>Exploring Contestant Trends Through Data</h2>

This project looks at the long-running CBS show Survivor over the years and asks which contestant demographics are more likely to win.

We will also look for demographics that have a lower chance of winning.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3
# Data pulled and saved to CSV on 2/16/2025

## Importing Data

In [2]:
# Reading the saved CSVs directly; a function that creates them when they don't exist seemed like more trouble than it is worth
seasons = pd.read_csv('seasons.csv')
contestants = pd.read_csv('contestants.csv')
stats = pd.read_csv('stats.csv')
idols = pd.read_csv('idols.csv')
advantages = pd.read_csv('advantages.csv')
immunities = pd.read_csv('immunities.csv')
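
If one of the CSV files is missing, `pd.read_csv` fails on the first absent file only. A minimal sketch of a loader that checks for every file up front is shown below. The helper name `load_tables` and the `data_dir` argument are hypothetical; only the table names come from the cell above.

```python
from pathlib import Path

import pandas as pd

# Table names taken from the import cell; everything else here is an assumption.
TABLES = ["seasons", "contestants", "stats", "idols", "advantages", "immunities"]


def load_tables(data_dir=".", names=TABLES):
    """Read each <name>.csv into a dict of DataFrames, failing early if any file is missing."""
    data_dir = Path(data_dir)
    missing = [n for n in names if not (data_dir / f"{n}.csv").exists()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files in {data_dir}: {missing}")
    return {n: pd.read_csv(data_dir / f"{n}.csv") for n in names}
```

With this in place, `tables = load_tables()` followed by `tables["stats"]` would stand in for the individual `read_csv` calls.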

## Cleaning Data

In [3]:
def clean_and_merge(advantages, idols, immunities, stats):
    """
    Cleans tables for advantages, idols and immunities, then merges them to the stats table.
    """
    # drop columns
    advantages = advantages.drop(columns=['Rank', 'Contestant', 'VV', 'VFB', 'Tie broken?'])
    idols = idols.drop(columns=['Rank', 'Contestant'])
    immunities = immunities.drop(columns=['Rank', 'Contestant'])
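    # change column names: strip whitespace and remove the literal '.1' suffix
    # (regex=False so '.' is not treated as a wildcard on older pandas versions)
    advantages.columns = advantages.columns.str.strip().str.replace('.1', '', regex=False)
    idols.columns = idols.columns.str.strip().str.replace('.1', '', regex=False)
    immunities.columns = immunities.columns.str.strip().str.replace('.1', '', regex=False)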
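    # strip and replace values (the 'S' prefix in the Season column; special characters (*, †, +, #) in the idols table)
    # note: this removes every 'S', so 'Survivor 42' becomes 'urvivor 42' (handled in the mapping below)
    advantages['Season'] = advantages['Season'].str.replace('S', '')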
    advantages['Season'] = advantages['Season'].replace({
        'Game Changers': 34,
        'David vs. Goliath': 37,
        'Winners at War': 40,
        'Cambodia': 31,
        'Island of the Idols': 39,
        'HvHvH': 35,
        'Worlds Apart': 30,
        'Kaoh Rong': 32,
        'Ghost Island': 36,
        'urvivor 42': 42,
        'Edge of Extinction': 38,
        'MvGX': 33   
    })
    advantages['Season'] = advantages['Season'].astype(int) # hopefully the other two go more smoothly

    idols['Season'] = idols['Season'].str.replace('S', '') 
    idols['Contestant'] = idols['Contestant'].str.rstrip('*').str.rstrip('#').str.rstrip('+')
    idols['IH'] = idols['IH'].str.rstrip('*').str.rstrip('#').str.rstrip('+')
    idols['IP'] = idols['IP'].str.rstrip('*').str.rstrip('#').str.rstrip('+')
    idols['VV'] = idols['VV'].str.rstrip('†').str.rstrip('#')
    idols = idols.drop(idols[idols['Season'] == '--'].index)
    idols['IH'] = idols['IH'].astype(int)
    idols['IP'] = idols['IP'].astype(int)
    idols['VV'] = idols['VV'].astype(int)
    idols['Season'] = idols['Season'].astype(int)

    immunities['Season'] = immunities['Season'].str.split(':').str[0]
    # str.strip removes any of the listed characters from both ends, not the literal word 'Survivor'
    immunities['Season'] = immunities['Season'].str.strip('Survivor').str.strip('S')
    immunities['Season'] = immunities['Season'].astype(int)
    # merge idols, advantages, and immunities together, then merge the result into stats
    merged_idols = pd.merge(idols, advantages, on=['Contestant', 'Season'], how='outer')
    merged_all = pd.merge(merged_idols, immunities, on=['Contestant', 'Season'], how='outer')
    stats = pd.merge(stats, merged_all, on=['Contestant', 'Season'], how='left')
    # reorder columns to be more readable (copy so the edits below act on an independent frame)
    stats = stats[['Season', 'Contestant', 'SurvSc', 'SurvAv', 'ChW', 'ChA', 'ChW%',
                   'SO', 'VFB', 'VAP', 'TotV', 'TCA', 'TC%', 'wTCR', 'JVF', 'TotJ',
                   'JV%', 'IF', 'IH', 'IP', 'VV', 'ICW', 'ICA', 'AF', 'AP', 'Notes']].copy()

    # strip footnote asterisks from the vote and tribal council columns
    for col in ['VAP', 'TotV', 'TCA', 'TC%', 'wTCR']:
        stats[col] = stats[col].str.rstrip('*')
    # drop rows where SurvSc is null (Season 48 - no values for stats yet)
    stats = stats.dropna(subset='SurvSc')
    # fill null values (notes with NA, everything else with 0)
    stats['Notes'] = stats['Notes'].fillna('NA')
    stats = stats.fillna(0)
    stats = stats.replace('-', '0')
    # to float- survsc, survav, chw, cha, chw%, tc%, wtcr, jv%
    # to int- so, vfb, vap, totv, tca, jvf, totj
    # astype returns a new DataFrame, so the result has to be assigned back
    stats = stats.astype({
        'SurvSc': 'float64', 'SurvAv': 'float64', 'ChW': 'float64', 'ChA': 'float64',
        'ChW%': 'float64', 'TC%': 'float64', 'wTCR': 'float64', 'JV%': 'float64',
        'SO': 'int64', 'VFB': 'int64', 'VAP': 'int64', 'TotV': 'int64', 'TCA': 'int64',
        'JVF': 'int64', 'TotJ': 'int64'
    })
    return stats  # stats with the cleaned idol, advantage, and immunity columns merged in

stats = clean_and_merge(advantages, idols, immunities, stats)

stats.head()


Unnamed: 0,Season,Contestant,SurvSc,SurvAv,ChW,ChA,ChW%,SO,VFB,VAP,...,JV%,IF,IH,IP,VV,ICW,ICA,AF,AP,Notes
0,1,Kelly Wiglesworth,1.34,12.26,5.87,16.1,0.36,2,6,0,...,0.43,0.0,0.0,0.0,0.0,4.0,8.0,0.0,0.0,
1,1,Richard Hatch,1.58,7.82,1.87,16.1,0.12,0,9,6,...,0.57,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,1,Rudy Boesch,1.09,3.95,1.62,15.1,0.11,3,10,8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,1,Gretchen Cordy,1.12,3.85,1.23,3.07,0.4,0,3,4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,1,Susan Hawk,0.95,3.67,0.87,15.1,0.06,0,9,5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
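
The chained `.str.rstrip('*').str.rstrip('#').str.rstrip('+')` calls in the cleaning cell strip markers in a fixed order, so a value such as `'1*#'` would keep its `*`. A minimal sketch of an order-independent alternative follows: `rstrip` takes a set of characters, and `pd.to_numeric` handles the conversion. The helper name `to_clean_number` and the sample values are illustrative assumptions, not taken from the real tables.

```python
import pandas as pd


def to_clean_number(col, marks="*#+†"):
    """Strip trailing footnote marks from a string column and convert it to numbers."""
    # rstrip with a character set removes any mix of the marks, in any order
    stripped = col.astype(str).str.strip().str.rstrip(marks)
    # errors="coerce" turns anything still non-numeric (such as '-') into NaN
    return pd.to_numeric(stripped, errors="coerce")


# Made-up sample values covering the marker combinations
raw = pd.Series(["3*", "2#", "1*#", "4", "-"])
clean = to_clean_number(raw)
```

Leftover `NaN` values can then be filled explicitly, which keeps the "missing means 0" decision visible in one place.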


In [4]:
def rename_columns(stats):
    """
    Rename columns in stats table to be more readable
    """
    stats = stats.rename(columns={
        'SurvSc': 'Survival Score',
        'SurvAv': 'Survival Average',
        'ChW': 'Challenge Wins',
        'ChA': 'Challenge Appearances',
        'ChW%': 'Challenge Win %',
        'SO': 'Sit Outs',
        'VFB': 'Votes For Bootee',
        'VAP': 'Votes Against (Total)',
        'TotV': 'Total Votes Cast',
        'TCA': 'Tribal Council Appearances',
        'TC%': 'Tribal Council %',
        'wTCR': 'Tribal Council Ratio (Weighted)',
        'JVF': 'Jury Votes For',
        'TotJ': 'Total Number Of Jurors',
        'JV%': 'Jury Votes %',
        'IF': 'Idols Found',
        'IH': 'Idols Held',
        'IP': 'Idols Played',
        'VV': 'Votes Voided',
        'ICW': 'Immunity Challenge Wins',
        'ICA': 'Immunity Challenge Appearances',
        'AF': 'Advantages Found',
        'AP': 'Advantages Played'
    })
    return stats

stats = rename_columns(stats)
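
Because `clean_and_merge` joins on `Contestant` and `Season` with a left merge and then fills nulls with 0, a name that differs slightly between tables would silently end up with zeros. A small sketch of one way to spot such rows uses the `indicator` option of `merge`. The toy frames below are made-up stand-ins for the real tables.

```python
import pandas as pd

# Toy frames standing in for stats and the merged idol/advantage/immunity table
stats_toy = pd.DataFrame({"Contestant": ["Ann", "Bob", "Cy"], "Season": [1, 1, 2]})
extras_toy = pd.DataFrame({"Contestant": ["Ann", "Cy*"], "Season": [1, 2], "IF": [1, 2]})

# indicator=True adds a _merge column saying where each row found a match
checked = stats_toy.merge(extras_toy, on=["Contestant", "Season"], how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", ["Contestant", "Season"]]
```

In the real data many contestants will legitimately have no idol or advantage rows, so `unmatched` is a list to skim for names that should have matched rather than an error report.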

In [15]:
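def clean_seasons(seasons):
    """
    Cleans the seasons table: renames the runner-up columns and tidies column names and locations.
    """
    # Rename columns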
    seasons = seasons.rename(columns={
        'Runner(s)-up': 'Runner-up 1',
        'Runner(s)-up.1': 'Runner-up 2'
    })
    # Strip and format column names
    seasons.columns = seasons.columns.str.strip().str.title()
    # Clean values in Location column
    seasons['Location'] = seasons['Location'].str.split('[').str[0]
    # Replace Runner-Up 2 with 'NA' where it duplicates Runner-Up 1
    seasons.loc[seasons['Runner-Up 2'] == seasons['Runner-Up 1'], 'Runner-Up 2'] = 'NA'
    return seasons

seasons = clean_seasons(seasons)
seasons.head()

Unnamed: 0,Season,Subtitle,Location,Original Tribes,Winner,Runner-Up 1,Runner-Up 2,Final Vote
0,1,Borneo,"Pulau Tiga, Sabah, Malaysia",Two tribes of eight new players,Richard Hatch,Kelly Wiglesworth,,4–3
1,2,The Australian Outback,"Herbert River at Goshen Station, Queensland, A...",Two tribes of eight new players,Tina Wesson,Colby Donaldson,,4–3
2,3,Africa,"Shaba National Reserve, Kenya",Two tribes of eight new players,Ethan Zohn,Kim Johnson,,5–2
3,4,Marquesas,"Nuku Hiva, Marquesas Islands, French Polynesia",Two tribes of eight new players,Vecepia Towery,Neleh Dennis,,4–3
4,5,Thailand,"Ko Tarutao, Satun Province, Thailand",Two tribes of eight new players; picked by the...,Brian Heidik,Clay Jordan,,4–3


In [None]:
contestants.info()
# TODO: fill null values (only Finish has any)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 875 entries, 0 to 874
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Season          875 non-null    int64 
 1   Name            875 non-null    object
 2   Age             875 non-null    int64 
 3   Hometown        875 non-null    object
 4   Profession      875 non-null    object
 5   Finish          857 non-null    object
 6   Ethnicity       875 non-null    object
 7   Gender          875 non-null    object
 8   Lgbt            875 non-null    bool  
 9   Has Disability  875 non-null    bool  
dtypes: bool(2), int64(2), object(6)
memory usage: 56.5+ KB
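
The `info()` output shows 857 non-null `Finish` values out of 875 rows, so 18 rows are missing a finish. Before deciding how to fill them, it may help to see which seasons they belong to (possibly a season that had not finished airing when the data was pulled, though that is an assumption). A minimal sketch on a toy frame:

```python
import pandas as pd

# Toy stand-in for the contestants table; only the columns used here, with made-up rows
contestants_toy = pd.DataFrame({
    "Season": [47, 47, 48, 48],
    "Name": ["A", "B", "C", "D"],
    "Finish": ["1st", "2nd", None, None],
})

# Count rows with a missing Finish per season
missing_by_season = (
    contestants_toy[contestants_toy["Finish"].isna()]
    .groupby("Season")
    .size()
)
```

Running the same expression on `contestants` would show whether the gaps are confined to one season or spread across several.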


## Exploratory Data Analysis