# Question 2

*Which Actor Traits Correspond to Specific Archetypes?\
Which actor traits — such as age, gender, ethnicity, and other physical attributes — are typically associated with specific archetypes? For instance, are certain traits more frequently linked to roles like heroes, villains, or mentors? Investigating these correlations can reveal patterns in casting decisions.*

In [1]:
# ignore
%load_ext autoreload
%autoreload 2

In [2]:
# ignore
%config InlineBackend.figure_formats = ['svg']
%matplotlib inline

In [3]:
# ignore
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
from IPython import display

pio.renderers.default='notebook'
plotly.offline.init_notebook_mode(connected=True)

import os
import sys
sys.path.append(os.path.join(os.getcwd(), "../visualization"))
from plot_3d_like import plot_2d_heatmap

In [4]:
# ignore
sys.path.insert(0, '../..')
from src.scripts.load_data import data

data.shape

(87210, 21)

This question asks how actor features relate to specific archetypes, i.e. which feature values are most probable for each archetype.

In [5]:
# ignore

actor_traits_numerical_columns = ["actor_height", "weight", "years_in_film", "actor_bmi"]
actor_traits_categorial_columns = ["actor_gender", "race", "education", "religion", "nationality", "place_of_birth"]

In [6]:
# ignore

# In this cell we compute the most probable features for each archetype
# (assuming numerical features are roughly Gaussian, so the mean is also the most probable value).

# Categorical features: take the most frequent value within each archetype
data_categorial = data[actor_traits_categorial_columns + ["archetype"]].copy()
data_categorial = data_categorial.groupby("archetype").agg(lambda x: x.value_counts().index[0]).reset_index()

# Numerical features: take the per-archetype mean (NaN values are skipped by default)
data_numerical = data[actor_traits_numerical_columns + ["archetype"]].copy()
data_numerical = data_numerical.groupby("archetype").mean().reset_index()
archetype_features_most_probable = pd.merge(data_categorial, data_numerical, on="archetype")
archetype_features_most_probable

| | archetype | actor_gender | race | education | religion | nationality | place_of_birth | actor_height | weight | years_in_film | actor_bmi |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Caregiver / Healer | F | European | 1 | Catholicism | United States of America | New York City | 1.700686 | 65.397276 | 42.071826 | 20.97419 |
| 1 | Innocent / Vulnerable | F | European | 0 | Catholicism | United States of America | Los Angeles | 1.680676 | 57.836413 | 24.451378 | 20.295291 |
| 2 | Intellectual / Creative (Scholar/Artist/Inventor) | M | European | 1 | Catholicism | United States of America | New York City | 1.758763 | 67.782252 | 39.353574 | 21.365027 |
| 3 | Love Interest / Romantic Partner | F | European | 1 | Catholicism | United States of America | Mumbai | 1.706257 | 61.322289 | 31.001326 | 20.377126 |
| 4 | Mentor / Wise Guide | M | European | 1 | Catholicism | United States of America | New York City | 1.776903 | 77.673471 | 46.709551 | 23.4882 |
| 5 | Mystic / Seer | M | European | 1 | Catholicism | United States of America | New York City | 1.74267 | 70.441211 | 42.123938 | 22.175049 |
| 6 | Other | M | European | 0 | Catholicism | United States of America | New York City | 1.732071 | 70.536072 | 39.338174 | 21.608705 |
| 7 | Outsider / Loner | M | European | 1 | Catholicism | United States of America | New York City | 1.754119 | 73.186161 | 35.076492 | 22.653687 |
| 8 | Rogue / Trickster / Con Artist | M | European | 1 | Catholicism | United States of America | New York City | 1.768333 | 75.793107 | 39.673057 | 23.195485 |
| 9 | Ruler / Politician | M | European | 1 | Catholicism | United States of America | New York City | 1.794513 | 79.449172 | 49.752175 | 23.54565 |
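The aggregation above treats the mean as the most probable value, which is reasonable only when a feature is roughly symmetric. As a minimal sketch of how this could be checked, the snippet below compares the mean and median per group on synthetic data. The `toy` frame, its values, and the archetype labels `A` and `B` are made up for illustration; only the column names mirror the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption: values are invented,
# only the column names follow the notebook).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "archetype": ["A"] * 500 + ["B"] * 500,
    "actor_height": np.concatenate([
        rng.normal(1.75, 0.07, 500),   # roughly symmetric
        rng.normal(1.68, 0.07, 500),
    ]),
    "years_in_film": np.concatenate([
        rng.exponential(20.0, 500),    # right-skewed
        rng.exponential(35.0, 500),
    ]),
})

features = ["actor_height", "years_in_film"]
summary = toy.groupby("archetype")[features].agg(["mean", "median"])

# Relative gap between mean and median per archetype:
# close to 0 suggests symmetry, a large positive gap suggests right skew.
rel_gap = pd.DataFrame({
    col: (summary[(col, "mean")] - summary[(col, "median")]) / summary[(col, "median")]
    for col in features
})
print(rel_gap.round(3))
```

If a feature shows a large gap, the median (or a histogram peak) may describe the typical actor for an archetype better than the mean.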


Several traits have the same most frequent value across nearly all archetypes:

- The most frequent `Race` is European, `Religion` is Catholicism, `Nationality` is United States of America, and `Place of Birth` is New York City. The exceptions are `Love Interest/Romantic Partner`, where the most frequent `Place of Birth` is Mumbai, and `Innocent/Vulnerable`, where it is Los Angeles. This pattern likely reflects the fact that most films are produced in the United States.
- The most frequent `Gender` is Female for only three archetypes: `Innocent/Vulnerable`, `Caregiver/Healer`, and `Love Interest/Romantic Partner`. For all others it is Male.
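The most frequent value alone does not show how dominant it is within an archetype. One way to inspect this, sketched here on a small made-up frame (the `toy` data and its archetype labels are illustrative, not taken from the dataset), is a row-normalized contingency table:

```python
import pandas as pd

# Toy data for illustration only; the notebook itself works on `data`.
toy = pd.DataFrame({
    "archetype": ["Mentor"] * 4 + ["Caregiver"] * 4,
    "actor_gender": ["M", "M", "M", "F", "F", "F", "M", "F"],
})

# Row-normalized contingency table: each archetype's row sums to 1.
shares = pd.crosstab(toy["archetype"], toy["actor_gender"], normalize="index")

# Most frequent gender per archetype and the share of actors it covers.
modal = shares.idxmax(axis=1)
modal_share = shares.max(axis=1)
print(pd.DataFrame({"modal_gender": modal, "share": modal_share}))
```

A share close to 0.5 would mean the "typical" gender is only marginally more common, while a share near 1 indicates a strong association.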

The figure below plots, on a single chart, the per-archetype mean of each numerical feature, rescaled by the 5th to 95th percentile range of that feature over the whole dataset. Notably, `BMI` and `Weight` follow similar trends.
The oldest archetype, measured by years in film, is `Ruler/Politician`, while the tallest, heaviest, and highest in BMI is `Warrior/Vigilante`.
In contrast, a single archetype, `Innocent/Vulnerable`, is the shortest, youngest (fewest years in film), lightest, and lowest in BMI.
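The plotting cell below rescales each feature so that the dataset's 5th percentile maps to 0 and its 95th percentile maps to 1. The standalone sketch here illustrates that transformation; the helper name `quantile_scale` and the sample numbers are hypothetical and not part of the notebook.

```python
import numpy as np

def quantile_scale(values, reference, lo=0.05, hi=0.95):
    """Rescale `values` by the [lo, hi] quantile range of `reference`.

    Hypothetical helper: the `lo` quantile of `reference` maps to 0 and
    the `hi` quantile maps to 1. Values outside that range fall outside [0, 1].
    """
    reference = np.asarray(reference, dtype=float)
    reference = reference[~np.isnan(reference)]  # ignore missing values
    q_lo, q_hi = np.quantile(reference, [lo, hi])
    return (np.asarray(values, dtype=float) - q_lo) / (q_hi - q_lo)

# Reference sample 0..100, so the 5th percentile is 5 and the 95th is 95.
reference = np.arange(0, 101, dtype=float)
scaled = quantile_scale([5.0, 50.0, 95.0, 100.0], reference)
print(scaled)
```

Using quantiles instead of the minimum and maximum keeps a few extreme actors from compressing the rest of the scale, at the cost of allowing values slightly below 0 or above 1.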

In [7]:
# hidecode

fig = go.Figure()

for s, title in zip(
    ["actor_height", "weight", "years_in_film", "actor_bmi"],
    ["Height", "Weight", "Years in film", "BMI"],
):  
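    # scale by the 5th-95th percentile range of the full dataset
    # (5th percentile -> 0, 95th percentile -> 1; values outside it fall outside [0, 1])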
    curr_values = data[s].dropna().values.reshape(-1, 1)
    mmax = np.quantile(curr_values, 0.95)
    mmin = np.quantile(curr_values, 0.05)
    scaled = (archetype_features_most_probable[s].values.reshape(-1, 1) - mmin) / (mmax - mmin)

    # plot them
    fig.add_trace(
        go.Scatter(
            x=archetype_features_most_probable["archetype"],
            y=scaled.flatten().round(2),
            mode="markers+lines",
            name=title,
            hovertemplate=(
                f"<b>{title}</b><br>"
                "Archetype: %{x}<br>"
                "Original Value: %{customdata[0]}<br>"
                "Scaled Value: %{y}<extra></extra>"
            ),
            customdata=archetype_features_most_probable[s].values.reshape(-1, 1).round(2),
        )
    )

fig.update_layout(
    title="Features of Archetypes",
    xaxis_title="Archetype",
    yaxis_title="Scaled Values",
    legend_title="Features",
    height=800,
    width=800,
)

# fig.show()

display.display_html(fig.to_html(full_html=False, include_plotlyjs='cdn'), raw=True)

In [8]:
# ignore
# python pages/merger.py src/story/Question_2.ipynb src/story/Question_3.ipynb > pages/merged.ipynb
# python pages/render.py pages/merged.ipynb > pages/index.markdown