# The History of Karen, Visualized

Karen has been used as a character name in a wide range of shows and movies, including The Office, Mean Girls, The Walking Dead, and Will and Grace. The name means “pure”; however, it has declined rapidly in popularity over the past couple of years as it has become associated with white privilege. The exact origins of this association are unknown, but according to Robin Queen, the chairwoman of the linguistics department at the University of Michigan, the first mention of Karen being used negatively was in Dane Cook’s 2005 comedy album (Goldblatt, 2020), in which Cook describes a woman named Karen as an example of “a friend nobody likes”. The meme is then speculated to have appeared on Reddit between 2014 and 2015, and it truly exploded in popularity in 2018, when “Karen” was used “to reference a hair style, white women who ask to speak to the manager, and people being racist in public” (Greenspan, 2020). In this notebook, we explore the prevalence of the name Karen from its height in the 1960s to its 21st-century fall.

Goldblatt, H. (2020). A brief history of ‘Karen’. https://www.nytimes.com/2020/07/31/style/karen-name-meme-history.html

Greenspan, R. E. (2020). How the name ‘Karen’ became a stand-in for problematic white women and a hugely popular meme. https://www.insider.com/karen-meme-origin-the-history-of-calling-women-karen-white-2020-5

# Essential Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py
import seaborn as sns
# pandas-profiling has since been renamed ydata-profiling; on newer installs use
# `from ydata_profiling import ProfileReport` instead
from pandas_profiling import ProfileReport

# Render Plotly figures inline in the notebook
py.init_notebook_mode(connected=True)

In [None]:
df = pd.read_csv("../input/baby-names/baby_names.csv").drop("Unnamed: 0", axis=1)

# Exploratory Data Analysis

<h2> Pandas Profile Report </h2>

In [None]:
report = ProfileReport(df)
report

To analyze the popularity of the name Karen, we must first understand our dataset, which was pulled from the Social Security Administration. According to the Pandas Profile Report, the dataset has 5 features, 890,627 unique rows, and no missing values. Those 5 features are State, Sex, Year, Name, and Count.
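The profile's headline numbers (feature count, row count, missing values, duplicate rows) can also be checked directly with plain pandas, which is much faster than building a full report. Below is a minimal sketch. `toy_df` is a made-up five-row stand-in with the same columns, not the real data. In the notebook, the same calls would be run on `df`.

```python
import pandas as pd

# Made-up stand-in with the same five columns as baby_names.csv (not real data)
toy_df = pd.DataFrame({
    "State": ["NC", "NC", "GA", "GA", "VA"],
    "Sex":   ["F", "M", "F", "F", "F"],
    "Year":  [1965, 1965, 1965, 1966, 1966],
    "Name":  ["Karen", "James", "Karen", "Mary", "Karen"],
    "Count": [120, 300, 95, 210, 80],
})

n_rows, n_cols = toy_df.shape                       # number of rows and features
missing_per_column = toy_df.isna().sum()            # missing values in each column
n_duplicate_rows = int(toy_df.duplicated().sum())   # fully repeated rows

print(f"{n_rows} rows, {n_cols} features: {list(toy_df.columns)}")
print(f"missing values: {int(missing_per_column.sum())}, duplicate rows: {n_duplicate_rows}")
```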

<h2> Interactive Graphs </h2>

In [None]:
# Total number of recorded births per state, largest first (sum only the Count column)
state_totals = df.groupby('State')['Count'].sum().sort_values(ascending=False)
fig = px.bar(state_totals)
fig.update_layout(title={'text': "Number of Baby Names From Each State", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Count", showlegend=False)
fig.show()

In [None]:
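# Total number of recorded births per year; selecting Count keeps the text columns out of the sum
fig = px.line(df.groupby('Year')[['Count']].sum())
fig.update_layout(title={'text': "Number of Baby Names Over Time", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Count", showlegend=False)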
fig.show()

<h2> Static Graphs </h2>

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
# Aggregate only the Count column, then turn each grouped Series back into a DataFrame
state_counts = df.groupby('State')['Count'].sum().sort_values(ascending=False).reset_index()
year_counts = df.groupby('Year')['Count'].sum().reset_index()
sns.barplot(data=state_counts, x="State", y="Count", ax=ax[0])
ax[0].set_title("Number of Baby Names From Each State", fontsize=16)
sns.lineplot(data=year_counts, x="Year", y="Count", ax=ax[1])
ax[1].set_title("Number of Baby Names Over Time", fontsize=16)
fig.tight_layout()
plt.show()

Two of these count aggregations are important to understand. First, there are only 6 states in this dataset, all of which are in the Southeast. Second, the number of recorded births actually peaks around the 1960s. This is particularly surprising, since the U.S. population is much larger today than it was half a century ago. We should keep both aggregations in mind as we draw conclusions from the data. Because our dataset covers only a small portion of the American population, results may differ if other states are included, since they may change the distribution of baby names over time.
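Both aggregations reduce to summing `Count` within a group, and the peak year can be read off with `idxmax`. Below is a minimal sketch on a made-up `toy_df`. Selecting `['Count']` before `.sum()` matters: since pandas 2.0, `groupby(...).sum()` no longer drops text columns by default, so the remaining string columns would otherwise be concatenated rather than ignored.

```python
import pandas as pd

# Made-up rows for illustration; the notebook would use the real df
toy_df = pd.DataFrame({
    "State": ["NC", "NC", "GA", "GA", "VA", "VA"],
    "Sex":   ["F", "M", "F", "M", "F", "F"],
    "Year":  [1960, 1960, 1961, 1961, 1990, 1990],
    "Name":  ["Karen", "John", "Mary", "David", "Ashley", "Karen"],
    "Count": [150, 400, 220, 380, 90, 10],
})

# Select Count before summing so the text columns are never aggregated
per_state = toy_df.groupby("State")["Count"].sum().sort_values(ascending=False)
per_year = toy_df.groupby("Year")["Count"].sum()

# Year with the most recorded births
peak_year = per_year.idxmax()
print(per_state)
print(f"peak year: {peak_year} ({per_year.max()} births)")
```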

# Exploration of the Name Karen

<h2> Are There Any Males Named Karen? </h2>

Now that we understand the underlying distribution of our data, we turn to the name Karen. One of the first questions we had was: “Are there any males named Karen?”

In [None]:
karen_df = df.loc[df['Name'] == 'Karen'].reset_index(drop=True)
karen_df

In [None]:
male_karen_df = karen_df.loc[karen_df['Sex'] == 'M']
male_karen_df

We found only 15 male Karens in this dataset, compared to thousands of female Karens.
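Each row stores a count for one state, year, and sex. "How many male Karens" can therefore mean either the number of rows or the number of babies those rows add up to. Below is a minimal sketch that reports both, using made-up `toy_karen` rows rather than the real data. Applied to the notebook's `karen_df`, the `records` column corresponds to the rows listed in `male_karen_df`.

```python
import pandas as pd

# Made-up Karen rows (not real values); each row is one state/year/sex record
toy_karen = pd.DataFrame({
    "State": ["NC", "NC", "GA", "TN"],
    "Sex":   ["F", "M", "F", "M"],
    "Year":  [1965, 1965, 1966, 1970],
    "Name":  ["Karen"] * 4,
    "Count": [800, 5, 650, 7],
})

# Number of records vs. number of babies, split by sex
by_sex = toy_karen.groupby("Sex")["Count"].agg(records="count", babies="sum")
print(by_sex)
```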

<h2> Yearly Trend of the Name Karen </h2>

Next, we looked at the trend of the name Karen with a specific focus on our target years (21st century).

In [None]:
# Yearly Karen totals from 2000 onward; dashed lines mark key events in the meme's history
fig = px.line(karen_df.groupby('Year')[['Count']].sum().loc[2000:])
fig.add_vline(x=2005, line_width=3, line_dash="dash", line_color="green")
fig.add_vline(x=2014, line_width=3, line_dash="dash", line_color="yellow")
fig.add_vline(x=2018, line_width=3, line_dash="dash", line_color="red")
fig.update_layout(title={'text': "Trend of Karen Over Time, Beginning at Year 2000", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Count", showlegend=False)
fig.show()

> Fig 3. Karen trends since 2000. Legend: 2005 (Green) - Release of Dane Cook’s comedy album; 2014 (Yellow) - First mention of the Karen meme on Reddit; 2018 (Red) - The Karen meme becomes widespread.

As expected, the popularity of the name Karen declines throughout the 21st century; however, this graph does not paint the full picture of the trend.
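To put a number on the 21st-century decline, you can read off the year-over-year percent change at each marked year. Below is a minimal sketch on hypothetical totals. `karen_by_year` stands in for `karen_df.groupby('Year')['Count'].sum()`. A drop at an event year shows only that the two coincide, not that the event caused it.

```python
import pandas as pd

# Hypothetical yearly Karen totals (illustrative, not from the dataset)
karen_by_year = pd.Series(
    [100, 90, 80, 70, 60, 45, 30],
    index=pd.Index(range(2013, 2020), name="Year"),
    name="Count",
)

# Percent change relative to the previous year (the first year has no predecessor -> NaN)
yoy = karen_by_year.pct_change() * 100

# Event years marked with dashed lines in Fig 3; years outside the data are skipped
events = {2005: "Dane Cook album", 2014: "Reddit meme", 2018: "meme becomes widespread"}
for year, label in events.items():
    if year in yoy.index and pd.notna(yoy[year]):
        print(f"{year} ({label}): {yoy[year]:+.1f}% vs. previous year")
```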


<h2> Full Picture Trends </h2>

In [None]:
# Yearly Karen totals across the whole dataset, with the same event markers
fig = px.line(karen_df.groupby('Year')[['Count']].sum())
fig.add_vline(x=2005, line_width=3, line_dash="dash", line_color="green")
fig.add_vline(x=2014, line_width=3, line_dash="dash", line_color="yellow")
fig.add_vline(x=2018, line_width=3, line_dash="dash", line_color="red")
fig.update_layout(title={'text': "Trend of Karen Over Time", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Count", showlegend=False)
fig.show()

Interestingly, the name Karen was already declining rapidly in popularity long before it was used as a pejorative. Thus, although the popularity of Karen continued to fall once it became a pejorative and meme, the name had already been on a declining trend since the 1970s.
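You can make the claim that the decline predates the meme concrete by splitting the total drop from the peak at 2005. Below is a minimal sketch on made-up totals. The 2005 cutoff comes from the Dane Cook album date in the introduction. In the notebook, `karen_by_year` would be `karen_df.groupby('Year')['Count'].sum()`.

```python
import pandas as pd

# Hypothetical yearly Karen totals (illustrative numbers only)
karen_by_year = pd.Series(
    {1955: 900, 1960: 1500, 1965: 1800, 1970: 1400, 1980: 600, 2000: 120, 2018: 40},
    name="Count",
)

peak_year = karen_by_year.idxmax()
peak = karen_by_year.max()

# Last recorded total before 2005 (label slicing works because the index is sorted)
before_2005 = karen_by_year.loc[:2004].iloc[-1]
latest = karen_by_year.iloc[-1]

# Both parts of the drop are expressed as fractions of the peak, so they add up
lost_before = (peak - before_2005) / peak
lost_after = (before_2005 - latest) / peak
print(f"peak {peak_year}: {peak}")
print(f"share of peak lost before 2005: {lost_before:.1%}, from 2005 on: {lost_after:.1%}")
```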

<h2> Karen Versus Other Female Names </h2>

In [None]:
# Female births per (Year, Name), summing only the Count column
female_df = df.loc[df['Sex'] == 'F'].groupby(['Year', 'Name'])[['Count']].sum()
female_df

In [None]:
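# For each year, compute Karen's share of female births and her rank (1 = most common)
all_years = sorted(df['Year'].unique())
years_used = []
rank_list = []
percentage_list = []
for year in all_years:
    # Female name counts for this year, most common first
    year_f = female_df.loc[year]['Count'].sort_values(ascending=False)
    if 'Karen' not in year_f.index:
        continue  # skip years with no recorded female Karens
    years_used.append(year)
    percentage_list.append(year_f['Karen'] / year_f.sum() * 100)
    # get_loc is 0-based, so add 1 (a rank of 0 would also vanish on the log-scale plot)
    rank_list.append(year_f.index.get_loc('Karen') + 1)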
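You can compute the same yearly rank and percentage without a Python loop by ranking within each year group. Below is a minimal sketch on made-up data shaped like `female_df`. One design difference from the loop is `method="min"`, which gives tied names the same rank instead of letting their order depend on the sort.

```python
import pandas as pd

# Made-up female counts indexed by (Year, Name), shaped like female_df
toy_female = pd.DataFrame(
    {"Count": [500, 300, 300, 100, 400, 350, 50]},
    index=pd.MultiIndex.from_tuples(
        [(1965, "Mary"), (1965, "Karen"), (1965, "Linda"), (1965, "Susan"),
         (2000, "Emily"), (2000, "Hannah"), (2000, "Karen")],
        names=["Year", "Name"],
    ),
)

counts = toy_female["Count"]
# Rank within each year, 1 = most common; tied names share the better rank
ranks = counts.groupby(level="Year").rank(ascending=False, method="min")
# Each name's percentage of all female births recorded that year
shares = counts / counts.groupby(level="Year").transform("sum") * 100

# Karen's rows only; years without a Karen row simply do not appear
karen_rank = ranks.xs("Karen", level="Name").astype(int)
karen_share = shares.xs("Karen", level="Name")
print(pd.DataFrame({"rank": karen_rank, "percent": karen_share.round(2)}))
```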

In [None]:
fig = px.line(x=years_used, y=percentage_list)
fig.update_layout(title={'text': f"Percentage of Female Names Called Karen", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Year", yaxis_title="Percentage", showlegend=False)
fig.show()

In [None]:
fig = px.bar(x=years_used, y=rank_list, log_y=True)
fig.update_layout(title={'text': f"Karen Name Rank", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Year", yaxis_title="Rank (log scale)", showlegend=False)
fig.show()

In [None]:
fig = go.Figure()
# Yearly Karen totals (summing only the Count column), plotted alongside her yearly rank
karen_year_df = karen_df.groupby('Year')['Count'].sum()
fig.add_trace(go.Scatter(x=karen_year_df.index, y=karen_year_df, name='Total Number'))
fig.add_trace(go.Bar(x=years_used, y=rank_list, name='Rank'))

fig.update_layout(title={'text': f"Trend of Karen Over Time with Number + Rank", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Year", yaxis_title="Value (log scale)")
fig.update_yaxes(type="log")
fig.show()

From this graph, we can see that the real decline of the name Karen began in the 1970s and appears to have only been exacerbated by the growing use of Karen as a mockery of white privilege.

<h2> Which State Has the Largest Karen Dropoff? </h2>

Each state's mean Karen count is plotted as a bar, with its maximum count shown as an error bar.

In [None]:
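unique_states = df['State'].unique()
# Yearly Karen totals per state, then summary statistics (mean, max, ...) across years
state_karen_df = karen_df.groupby(['State', 'Year'])['Count'].sum().groupby(level='State').describe()
fig = go.Figure()
# error_y lengths are measured from the top of the bar, so use max - mean to end the whisker at the max;
# taking both from state_karen_df keeps the error bars in the same state order as x
fig.add_trace(go.Bar(name='Mean Value', x=state_karen_df.index, y=state_karen_df['mean'],
                     error_y=dict(type='data', symmetric=False,
                                  array=state_karen_df['max'] - state_karen_df['mean'],
                                  arrayminus=[0] * len(state_karen_df))))
fig.update_layout(title={'text': "States with Largest Karen Dropoff", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Mean Value")
fig.show()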
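The mean and maximum summarize each state's level but do not directly measure a drop-off. One possible metric, our own choice rather than anything defined by the dataset, is the share of a state's peak yearly count lost by its most recent recorded year. Below is a minimal sketch on made-up numbers shaped like `state_year_df['Count']`. If Karen has no record in a state's later years, `latest` falls back to the last year in which she appears there.

```python
import pandas as pd

# Made-up yearly Karen totals per state (not real values)
toy_state_year = pd.DataFrame({
    "State": ["NC", "NC", "NC", "GA", "GA", "GA"],
    "Year":  [1965, 1990, 2019, 1965, 1990, 2019],
    "Count": [900, 300, 20, 500, 250, 25],
}).set_index(["State", "Year"])["Count"].sort_index()

by_state = toy_state_year.groupby(level="State")
summary = pd.DataFrame({
    "peak": by_state.max(),
    # Index is sorted by (State, Year), so last() is the most recent recorded year
    "latest": by_state.last(),
})
# Percentage of the peak yearly count lost by the latest recorded year
summary["dropoff_pct"] = (summary["peak"] - summary["latest"]) / summary["peak"] * 100
print(summary.sort_values("dropoff_pct", ascending=False))
```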

In [None]:
# Karen totals per (State, Year), summing only the Count column
state_year_df = karen_df.groupby(['State', 'Year'])[['Count']].sum()
state_year_df

In [None]:
fig, ax = plt.subplots(3, 2, figsize=(16, 12))
# One panel per state; the dataset's six states fill the 3x2 grid
for axis, state in zip(ax.flat, unique_states):
    tmp_df = state_year_df.loc[state]
    axis.plot(tmp_df.index, tmp_df['Count'])
    axis.set_title(f"Karen's Trend in {state}", fontsize=16)
    axis.set_xlabel("Year")
    axis.set_ylabel("Count")

fig.tight_layout()
plt.show()

The name Karen, which means “pure”, has increasingly been used with a negative connotation in recent years. Our data show that usage of Karen continued to decrease alongside these recent events, at least in the six southeastern states in the dataset: NC, GA, VA, TN, KY, and SC. However, the full history of the name’s popularity makes clear that Karen has been declining since the 1970s, and recent events appear to have only exacerbated that decline.