# Survivor vote-off reactions, by season
> How did contestants in different seasons respond after their torches were snuffed? This notebook calculates each season's average score measuring how often voted-off castaways acknowledged their tribemates by looking, smiling, gesturing or speaking.
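The notebook reads a precomputed `ack_score` from the vote-off log and does not show how it is derived. One plausible reading, assumed here and not confirmed by the source, is a count of how many of the four behaviors each castaway showed (0 to 4). The minimal sketch below uses hypothetical 0/1 columns `looked`, `smiled`, `gestured` and `spoke` and toy castaways.

```python
import pandas as pd

# ASSUMPTION: one hypothetical 0/1 flag per behavior; these are not
# fields in the published vote-off log.
BEHAVIORS = ["looked", "smiled", "gestured", "spoke"]

votes = pd.DataFrame(
    {
        "castaway": ["A", "B", "C"],
        "looked": [1, 1, 0],
        "smiled": [1, 0, 0],
        "gestured": [0, 0, 0],
        "spoke": [1, 1, 0],
    }
)

# Score = number of distinct ways the castaway acknowledged the tribe (0-4)
votes["ack_score"] = votes[BEHAVIORS].sum(axis=1)

# Average of the per-castaway scores, rounded like the notebook's means
mean_score = round(float(votes["ack_score"].mean()), 2)
```

Under this reading, a season mean near 1.7 would correspond to castaways showing roughly one or two of the four behaviors on their way out.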

#### Load Python tools and Jupyter config

In [21]:
import os
import json
import boto3
import pandas as pd
import jupyter_black

In [22]:
jupyter_black.load()
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000

---

## Fetch

#### Read season summary from [survivoR2py](https://github.com/stiles/survivoR2py/tree/main) repo

In [78]:
season_summary_src = pd.read_csv(
    "https://raw.githubusercontent.com/stiles/survivoR2py/main/data/raw/csv/season_summary.csv"
)

In [79]:
us_summary_df = season_summary_src.query('version == "US"')[
    ["season", "season_name", "location", "tribe_setup", "n_cast", "winner_id"]
].copy()

In [80]:
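# Strip the float suffix literally (regex=False), so "." isn't treated as a wildcard
us_summary_df["season"] = (
    us_summary_df["season"].astype(str).str.replace(".0", "", regex=False)
)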
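Before pandas 2.0, `Series.str.replace` defaulted to `regex=True`, and in a regular expression `.` matches any character. A quick check with made-up float season numbers shows why the literal match (or an integer round-trip) matters:

```python
import pandas as pd

seasons = pd.Series([1.0, 10.0, 40.0]).astype(str)  # "1.0", "10.0", "40.0"

# As a regex, ".0" also matches "10" and "40", wiping out the season number
as_regex = seasons.str.replace(".0", "", regex=True)

# As a literal, only the trailing ".0" is removed
as_literal = seasons.str.replace(".0", "", regex=False)

# Alternative: go through int and skip string surgery entirely
as_int = pd.Series([1.0, 10.0, 40.0]).astype(int).astype(str)
```

`as_regex` ends up as `["1", "", ""]`, which would silently break the later merge on `season`. The integer round-trip assumes the column has no missing values.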

#### Read castaway details from [survivoR2py](https://github.com/stiles/survivoR2py/tree/main) repo

In [81]:
castaway_details_src = pd.read_csv(
    "https://raw.githubusercontent.com/stiles/survivoR2py/main/data/raw/csv/castaway_details.csv"
)

In [82]:
castaway_details_src["version"] = castaway_details_src.castaway_id.str[:2]

In [None]:
castaway_details_df = castaway_details_src.query('version == "US"').copy()

In [95]:
castaway_details_df.columns

Index(['castaway_id', 'full_name', 'full_name_detailed', 'castaway',
       'date_of_birth', 'date_of_death', 'gender', 'african', 'asian',
       'latin_american', 'native_american', 'bipoc', 'lgbt',
       'personality_type', 'occupation', 'three_words', 'hobbies',
       'pet_peeves', 'race', 'ethnicity', 'version'],
      dtype='object')

#### Read vote-off log

In [None]:
voteoff_df = pd.read_json(
    "https://stilesdata.com/survivor/survivor_vote_off_reactions.json",
    dtype={"season": str, "vote": str, "episode": str},
)
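Reading `season` as a string keeps it compatible with `us_summary_df["season"]`, which is also converted to strings. pandas raises rather than guess when one merge key holds strings and the other integers. A small illustration with toy rows:

```python
import pandas as pd

votes = pd.DataFrame({"season": ["1", "2"], "ack_score": [2, 1]})
summary_int = pd.DataFrame(
    {"season": [1, 2], "season_name": ["Borneo", "The Australian Outback"]}
)

# Mixed key dtypes (str vs int64): pandas refuses the merge with a ValueError
try:
    pd.merge(votes, summary_int, on="season")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Casting the key to str on both sides lets the rows line up
summary_str = summary_int.assign(season=summary_int["season"].astype(str))
merged = pd.merge(votes, summary_str, on="season")
```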

In [100]:
voteoff_merged = pd.merge(voteoff_df, us_summary_df, on="season")
voteoff_merged_df = pd.merge(
    voteoff_merged,
    castaway_details_df[
        [
            "castaway_id",
            "full_name",
            "date_of_birth",
            "gender",
            "personality_type",
            "occupation",
        ]
    ],
    on="castaway_id",
)
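`pd.merge` defaults to an inner join, so any vote-off whose `castaway_id` is missing from `castaway_details_df` drops out without warning. A minimal check, sketched with hypothetical IDs, uses `how="left"` with `indicator=True` to surface unmatched rows and `validate="many_to_one"` to catch duplicate IDs on the details side that would double-count vote-offs:

```python
import pandas as pd

# Hypothetical IDs; "US9999" has no entry in the details table
votes = pd.DataFrame(
    {"castaway_id": ["US0001", "US0002", "US9999"], "ack_score": [2, 3, 1]}
)
details = pd.DataFrame(
    {"castaway_id": ["US0001", "US0002"], "gender": ["Female", "Male"]}
)

# Inner join (the default): the unmatched vote-off disappears
inner = pd.merge(votes, details, on="castaway_id")

# Left join with an indicator column labelling where each row matched;
# validate raises MergeError if castaway_id repeats in `details`
check = pd.merge(
    votes,
    details,
    on="castaway_id",
    how="left",
    indicator=True,
    validate="many_to_one",
)
unmatched = check.loc[check["_merge"] == "left_only", "castaway_id"].tolist()
```

Comparing `len(inner)` with `len(votes)` is a cheap way to confirm nothing was lost.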

---

## Aggregate

In [117]:
gender_scores = (
    (
        voteoff_merged_df.groupby(["gender"])
        .agg({"castaway_id": "count", "ack_score": "mean"})
        .round(2)
    )
    .reset_index()
    .rename(columns={"castaway_id": "count", "ack_score": "mean_score"})
)

In [118]:
gender_scores

|   | gender | count | mean_score |
|---|---|---|---|
| 0 | Female | 353 | 1.70 |
| 1 | Male | 339 | 1.68 |
| 2 | Non-binary | 1 | 3.00 |
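The dict-plus-`rename` aggregation above can also be written with pandas named aggregation, which names the output columns in the same step. A sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "castaway_id": ["a", "b", "c", "d"],
        "gender": ["Female", "Female", "Male", "Male"],
        "ack_score": [2, 1, 3, 1],
    }
)

# Each keyword becomes an output column: (source column, aggregation)
toy_gender_scores = (
    toy.groupby("gender")
    .agg(count=("castaway_id", "count"), mean_score=("ack_score", "mean"))
    .round(2)
    .reset_index()
)
```

Note that the single non-binary castaway's 3.0 is one observation, not an average in any meaningful sense.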


#### Mean score over the life of the series

In [37]:
series_score = round(float(voteoff_merged_df["ack_score"].mean()), 2)
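`series_score` is the mean over every vote-off, so seasons with more vote-offs carry more weight. The mean of the per-season means weights every season equally instead. The toy example below shows how far apart the two can land when season sizes differ:

```python
import pandas as pd

# Toy data: season "1" has three vote-offs, season "2" has one
toy = pd.DataFrame({"season": ["1", "1", "1", "2"], "ack_score": [1, 1, 1, 3]})

# Pooled mean over all vote-offs: (1 + 1 + 1 + 3) / 4
pooled = float(toy["ack_score"].mean())

# Mean of season means: (1.0 + 3.0) / 2
per_season = float(toy.groupby("season")["ack_score"].mean().mean())
```

Which baseline fits depends on whether the comparison is "this season vs. a typical vote-off" (pooled) or "this season vs. a typical season" (per-season).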

#### Mean score by season

In [None]:
season_scores = (
    voteoff_merged_df.groupby(["season", "season_name", "tribe_setup"])["ack_score"]
    .mean()
    .round(2)
    .reset_index(name="mean_ack_score")
)

In [None]:
season_scores["series_score"] = series_score

#### Difference from the series mean (negative = less acknowledgement than average)

In [None]:
season_scores["season_score_diff"] = (
    season_scores["mean_ack_score"] - season_scores["series_score"]
)
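Written out, with $a_i$ the acknowledgement score for vote-off $i$, $n_s$ the number of vote-offs in season $s$ and $N$ the number of vote-offs across all seasons, the difference computed above is:

$$
\Delta_s = \bar{a}_s - \bar{a}, \qquad
\bar{a}_s = \frac{1}{n_s} \sum_{i \in s} a_i, \qquad
\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i
$$

Both means are rounded to two decimals before subtracting, so for Micronesia $\Delta_{16} = 2.93 - 1.69 = 1.24$.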

#### Highest acknowledgement

In [None]:
season_scores.sort_values("season_score_diff", ascending=False).head()

|   | season | season_name | tribe_setup | mean_ack_score | series_score | season_score_diff |
|---|---|---|---|---|---|---|
| 7 | 16 | Survivor: Micronesia | Two tribes of ten: new players against past co... | 2.93 | 1.69 | 1.24 |
| 31 | 38 | Survivor: Edge of Extinction | Two tribes of nine, including four returning p... | 2.88 | 1.69 | 1.19 |
| 36 | 42 | Survivor: 42 | Three tribes of 6 new players. This season was... | 2.71 | 1.69 | 1.02 |
| 41 | 5 | Survivor: Thailand | Two tribes of eight new players; picked by the... | 2.71 | 1.69 | 1.02 |
| 38 | 44 | Survivor: 44 | Three tribes of 18 new castaways | 2.69 | 1.69 | 1.00 |


#### Lowest acknowledgement

In [None]:
season_scores.sort_values("season_score_diff").head()
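`DataFrame.nlargest` and `DataFrame.nsmallest` return the top or bottom rows by a column directly, without a full sort. A sketch on a toy frame with made-up season labels and differences:

```python
import pandas as pd

# Toy values only; not real season results
toy = pd.DataFrame(
    {"season": ["A", "B", "C", "D"], "season_score_diff": [0.4, -0.3, 1.1, -0.8]}
)

highest = toy.nlargest(2, "season_score_diff")   # most acknowledgement first
lowest = toy.nsmallest(2, "season_score_diff")   # least acknowledgement first
```

On the real data this would be `season_scores.nlargest(5, "season_score_diff")` and `season_scores.nsmallest(5, "season_score_diff")`.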

In [None]:
voteoff_merged_df.head()

---

## Export

In [12]:
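# Output paths
csv_output_path = "../data/processed/survivor_voteoff_ack_scores_seasons.csv"
json_output_path = "../data/processed/survivor_voteoff_ack_scores_seasons.json"

# Create the output directory if it doesn't exist yet
os.makedirs(os.path.dirname(csv_output_path), exist_ok=True)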

In [16]:
# Save season scores to CSV
season_scores.to_csv(csv_output_path, index=False)

In [17]:
# Save season scores to JSON
season_scores.to_json(json_output_path, orient="records", indent=4)
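With `orient="records"`, the JSON file is a list with one object per season, keyed by column name. Calling `to_json` without a path returns the string instead of writing it, which makes the shape easy to check on a toy frame:

```python
import json

import pandas as pd

toy = pd.DataFrame({"season": ["A", "B"], "mean_ack_score": [1.5, 2.0]})

# No path argument: to_json returns the JSON text
payload = toy.to_json(orient="records")
records = json.loads(payload)
```

`records` comes back as `[{"season": "A", "mean_ack_score": 1.5}, {"season": "B", "mean_ack_score": 2.0}]`, the same shape a front end would read from the exported file.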

In [18]:
# Upload CSV and JSON to S3
s3_bucket = "stilesdata.com"
s3_csv_key = "survivor/survivor_voteoff_ack_scores_seasons.csv"
s3_json_key = "survivor/survivor_voteoff_ack_scores_seasons.json"

# Initialize boto3 client with environment variables
s3_client = boto3.client(
    "s3",
    aws_access_key_id=os.getenv("MY_AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("MY_AWS_SECRET_ACCESS_KEY"),
    aws_session_token=os.getenv("MY_AWS_SESSION_TOKEN"),
)
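`os.getenv` returns `None` for unset variables, and boto3 treats `None` the same as leaving the argument out, falling back to its default credential chain. A variable that is exported but empty arrives as `""`, though, and would be passed through as a credential. A small helper, hypothetical and not part of boto3, that only forwards non-empty values:

```python
import os


def s3_client_kwargs(prefix="MY_AWS_"):
    """Collect credentials from prefixed env vars, skipping unset or empty ones.

    Hypothetical helper; the returned dict is meant for boto3.client("s3", **kwargs).
    """
    names = {
        "aws_access_key_id": "ACCESS_KEY_ID",
        "aws_secret_access_key": "SECRET_ACCESS_KEY",
        "aws_session_token": "SESSION_TOKEN",
    }
    kwargs = {}
    for arg, suffix in names.items():
        value = os.getenv(prefix + suffix)
        if value:  # drops both None and ""
            kwargs[arg] = value
    return kwargs


# Usage (requires boto3): s3_client = boto3.client("s3", **s3_client_kwargs())
```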

In [19]:
# Upload the CSV file
s3_client.upload_file(str(csv_output_path), s3_bucket, s3_csv_key)
print(f"CSV file uploaded to s3://{s3_bucket}/{s3_csv_key}")

CSV file uploaded to s3://stilesdata.com/survivor/survivor_voteoff_ack_scores_seasons.csv


In [20]:
# Upload the JSON file
s3_client.upload_file(str(json_output_path), s3_bucket, s3_json_key)
print(f"JSON file uploaded to s3://{s3_bucket}/{s3_json_key}")

JSON file uploaded to s3://stilesdata.com/survivor/survivor_voteoff_ack_scores_seasons.json
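The upload cells don't set a `Content-Type`, so S3 likely stores these objects under a generic default type, which can matter if browsers or front-end code fetch them from the bucket. boto3's `upload_file` accepts an `ExtraArgs` dict that can include `ContentType`. A sketch of a hypothetical helper that picks the type from the file extension, assuming only the two formats this notebook writes:

```python
from pathlib import Path

# Assumed mapping for the two export formats in this notebook
CONTENT_TYPES = {".csv": "text/csv", ".json": "application/json"}


def upload_extra_args(path):
    """Build ExtraArgs for s3_client.upload_file from the file extension."""
    content_type = CONTENT_TYPES.get(Path(path).suffix.lower())
    return {"ContentType": content_type} if content_type else {}


# Usage (requires boto3 and credentials):
# s3_client.upload_file(
#     json_output_path, s3_bucket, s3_json_key,
#     ExtraArgs=upload_extra_args(json_output_path),
# )
```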
