# People on Banknotes

Whose faces appear on banknotes?

The file `people-on-banknotes.csv` contains data about individuals featured on banknotes from 38 countries. This dataset spans all 22 subregions and sub-subregions of the world, as defined by the United Nations Statistics Division's geoscheme.

It profiles 241 people, detailing their occupations and the year they first appeared on a banknote. It also records each person's year of death, or `NaN` if they were still alive when the dataset was compiled.

Most banknotes were issued after the featured individual’s death. The column `first_death_diff` holds the year of first appearance on a banknote minus the year of death, so a positive value means the person had already died and a negative value means they were still alive when the note was issued. It is `NaN` if the person was still living at the time of curation.
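
As a rough check of how these columns relate, the sketch below recomputes the gap on a tiny frame. The first two rows reuse values shown in this notebook; the third row is an invented living person, and `recomputed_diff` is an illustrative name rather than a column in the CSV.

```python
import numpy as np
import pandas as pd

# Miniature stand-in for the dataset; the last row is hypothetical.
sample = pd.DataFrame({
    "name": ["Eva Perón", "Manuel Belgrano", "Living Person (hypothetical)"],
    "first_appearance": [2012, 1970, 2000],
    "death": [1952, 1820, np.nan],
})

# Year of first appearance minus year of death.
# A missing death year propagates as NaN.
sample["recomputed_diff"] = sample["first_appearance"] - sample["death"]
print(sample)
```

On the real data, comparing such a recomputed column against `first_death_diff` would be one way to confirm the sign convention.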




In [1]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'people-on-banknotes.csv'.

# from google.colab import files
# uploaded = files.upload()

In [1]:
import pandas as pd

df = pd.read_csv('people-on-banknotes.csv')
df

Unnamed: 0,country,currency,name,gender,occupation,value,first_appearance,death,first_death_diff,currency_code
0,Argentina,Argentine Peso,Eva Perón,F,Activist,100,2012,1952,60.0,ARS
1,Argentina,Argentine Peso,Julio Argentino Roca,M,Head of Gov't,100,1988,1914,74.0,ARS
2,Argentina,Argentine Peso,Domingo Faustino Sarmiento,M,Head of Gov't,50,1999,1888,111.0,ARS
3,Argentina,Argentine Peso,Juan Manuel de Rosas,M,Politician,20,1992,1877,115.0,ARS
4,Argentina,Argentine Peso,Manuel Belgrano,M,Founder,10,1970,1820,150.0,ARS
...,...,...,...,...,...,...,...,...,...,...
274,Venezuela,Venezuelan Bolivar,Francisco de Miranda,M,Military,200,1968,1816,152.0,VES
275,Venezuela,Venezuelan Bolivar,Simón Rodrigues,M,Educator,20,2007,1854,153.0,VES
276,Venezuela,Venezuelan Bolivar,Ezequiel Zamora,M,Military,100,2018,1860,158.0,VES
277,Venezuela,Venezuelan Bolivar,Rafael Urdaneta,M,Head of Gov't,10,2018,1845,173.0,VES


### Quick cleaning

The same person can appear on multiple banknotes, so the raw table has more rows than people. Below we drop the `value` column and keep one row per person.

In [2]:
df = df.drop(columns=['value'])
df = df.drop_duplicates(subset="name")
df

Unnamed: 0,country,currency,name,gender,occupation,first_appearance,death,first_death_diff,currency_code
0,Argentina,Argentine Peso,Eva Perón,F,Activist,2012,1952,60.0,ARS
1,Argentina,Argentine Peso,Julio Argentino Roca,M,Head of Gov't,1988,1914,74.0,ARS
2,Argentina,Argentine Peso,Domingo Faustino Sarmiento,M,Head of Gov't,1999,1888,111.0,ARS
3,Argentina,Argentine Peso,Juan Manuel de Rosas,M,Politician,1992,1877,115.0,ARS
4,Argentina,Argentine Peso,Manuel Belgrano,M,Founder,1970,1820,150.0,ARS
...,...,...,...,...,...,...,...,...,...
274,Venezuela,Venezuelan Bolivar,Francisco de Miranda,M,Military,1968,1816,152.0,VES
275,Venezuela,Venezuelan Bolivar,Simón Rodrigues,M,Educator,2007,1854,153.0,VES
276,Venezuela,Venezuelan Bolivar,Ezequiel Zamora,M,Military,2018,1860,158.0,VES
277,Venezuela,Venezuelan Bolivar,Rafael Urdaneta,M,Head of Gov't,2018,1845,173.0,VES
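
Before deduplicating, it can be useful to see how many notes each person appears on. The sketch below uses invented rows (`Person A`, `Person B`) rather than the real file, and shows that `drop_duplicates(subset="name")` keeps the first row it encounters for each name.

```python
import pandas as pd

# Hypothetical rows: one person printed on two denominations.
notes = pd.DataFrame({
    "name": ["Person A", "Person A", "Person B"],
    "value": [100, 20, 50],
    "first_appearance": [1990, 1990, 2005],
})

# Number of notes per person, counted before any rows are dropped.
notes_per_person = notes["name"].value_counts()

# One row per person; the default keep="first" retains the earliest row in file order.
people = notes.drop(columns=["value"]).drop_duplicates(subset="name")

print(notes_per_person)
print(people)
```

If rows for the same person ever differed in other columns (not checked here), sorting before deduplicating would control which row survives.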


### Project Ideas

- What proportion of individuals featured are male versus female?
	- Hint: Use `value_counts(normalize=True)` to calculate percentages.

- Are writers or politicians more commonly depicted?

- What percentage of featured individuals are musicians?

- What percentage of banknotes were issued before the person’s death?
	- Hint: Look for negative values or NaN in `first_death_diff`.

- Who is the oldest historical figure in the dataset?

- Which countries feature the oldest historical figures on their banknotes?
	- Hint: Group by country and aggregate the year of death using the median. Sort the results.

- What percentage of individuals died at least 100 years before appearing on a banknote?

- Which individuals appeared on a banknote just one year after their death?


In [None]:
# YOUR CODE HERE (add additional cells as needed)
# What proportion of individuals featured are male versus female?
# Hint: Use `value_counts(normalize=True)` to calculate percentages.

gender_proportion = df["gender"].value_counts(normalize=True) * 100
print(gender_proportion)

gender
M    78.008299
F    21.991701
Name: proportion, dtype: float64


In [5]:
# Are writers or politicians more commonly depicted?
writer_politician_counts = df["occupation"].value_counts()[["Writer", "Politician"]]
print(writer_politician_counts)

occupation
Writer        45
Politician    27
Name: count, dtype: int64


In [6]:
# What percentage of featured individuals are musicians?
musician_pct = (df["occupation"] == "Musician").mean() * 100
print(f"{musician_pct:.2f}% are musicians")

4.98% are musicians


In [7]:
# What percentage of banknotes were issued before the person’s death?
# Hint: Look for negative values or NaN in `first_death_diff`.
issued_before_death = df["first_death_diff"] < 0
nan_deaths = df["first_death_diff"].isna()
percentage = ((issued_before_death | nan_deaths).mean()) * 100
print(f"{percentage:.2f}% were issued before the person's death")

4.56% were issued before the person's death
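
The combined figure mixes two cases: people who appeared on a note and later died (negative gap) and people still living at curation (`NaN`). A minimal sketch on made-up values shows how the two could be counted separately; the series below is illustrative, not taken from the CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical gaps: two posthumous, one issued before death, two still living.
diff = pd.Series([60.0, -5.0, np.nan, 150.0, np.nan], name="first_death_diff")

died_after_issue = (diff < 0).sum()   # appeared while alive, has since died
still_living = diff.isna().sum()      # no death year recorded
share_alive_at_issue = ((diff < 0) | diff.isna()).mean() * 100

print(died_after_issue, still_living, f"{share_alive_at_issue:.2f}%")
```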


In [16]:
# Who is the oldest historical figure in the dataset?
# Proxy used here: the largest gap between death and first banknote appearance.
# Drop rows where 'first_death_diff' is missing
df_cleaned = df.dropna(subset=["first_death_diff"])

# Find the individual with the maximum time between death and banknote appearance
longest_after_death = df_cleaned.loc[df_cleaned["first_death_diff"].idxmax()]
print(longest_after_death[["name", "death", "first_appearance", "first_death_diff"]])

name                Hannibal
death                   -183
first_appearance        2013
first_death_diff      2196.0
Name: 234, dtype: object
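
The largest gap and the earliest death year do not have to point to the same person. The sketch below uses two invented figures to show the difference, and reads "oldest" as "earliest year of death" via `idxmin`; that reading is an assumption about what the question intends.

```python
import pandas as pd

# Hypothetical figures chosen so the two criteria disagree.
figures = pd.DataFrame({
    "name": ["Figure A (hypothetical)", "Figure B (hypothetical)"],
    "death": [1600, 1800],
    "first_appearance": [1700, 2010],
})
figures["gap"] = figures["first_appearance"] - figures["death"]

# Earliest death year: the historically oldest figure.
earliest_death = figures.loc[figures["death"].idxmin()]

# Largest gap: the longest wait between death and first appearance.
largest_gap = figures.loc[figures["gap"].idxmax()]

print(earliest_death["name"], largest_gap["name"])
```

Both `idxmin` and `idxmax` skip `NaN` by default, so living people would not affect either lookup.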


In [None]:
# Which countries feature the oldest historical figures on their banknotes?
# Hint: Group by country and aggregate the year of death using the median. Sort the results.

# Convert 'death' to numeric (force errors to NaN)
df["death"] = pd.to_numeric(df["death"], errors="coerce")

# Drop rows with NaN in 'death' or 'country' before grouping
df_clean = df.dropna(subset=["death", "country"])

# Group by country and compute median death year
median_death = df_clean.groupby("country")["death"].median().sort_values()

# Display result
print(median_death.head(10))  # or .tail(10) for the most recent median death years

country
South Korea              1560.5
São Tomé and Príncipe    1779.5
Ukraine                  1817.5
United States            1826.0
Bolivia                  1839.5
Iceland                  1845.0
Venezuela                1849.5
Czech Republic           1869.0
Chile                    1879.0
Argentina                1888.0
Name: death, dtype: float64
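
A median over only a few people can be swayed by a single early figure, so it may help to show the group size next to it. The sketch below does this on invented data; the country labels `X` and `Y` and all values are placeholders.

```python
import pandas as pd

# Hypothetical death years for two placeholder countries.
toy = pd.DataFrame({
    "country": ["X", "X", "Y", "Y", "Y"],
    "death": [1500, 1621, 1800, 1850, 1900],
})

# Median death year and number of people per country, oldest median first.
summary = (
    toy.groupby("country")["death"]
    .agg(["median", "count"])
    .sort_values("median")
)
print(summary)
```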


In [None]:
# What percentage of individuals died at least 100 years before appearing on a banknote?
over_100 = df["first_death_diff"] >= 100
pct_over_100 = over_100.mean() * 100
print(f"{pct_over_100:.2f}% died at least 100 years before appearing on a banknote")

33.20% died at least 100 years before appearing on a banknote


In [None]:
# Which individuals appeared on a banknote just one year after their death?

# Ensure numeric types
df["first_appearance"] = pd.to_numeric(df["first_appearance"], errors="coerce")
df["death"] = pd.to_numeric(df["death"], errors="coerce")

# Filter for individuals who appeared exactly 1 year after death
one_year_after = df[df["first_appearance"] - df["death"] == 1]

# Show relevant columns
print(one_year_after[["name", "death", "first_appearance"]])

                         name   death  first_appearance
63     Gabriel García Márquez  2014.0              2015
173  General Murtala Mohammed  1976.0              1977
190         Corazon C. Aquino  2009.0              2010
191           Manuel A. Roxas  1948.0              1949
