The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the `nobel.csv` file in the `data` folder.

In this project, you'll explore and answer several questions about this prizewinning data. We then encourage you to explore any further questions that interest you!

In [81]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Load the dataset from the data folder
nobel = pd.read_csv("data/nobel.csv")

# Get the most commonly awarded gender
top_gender = nobel['sex'].value_counts().idxmax()

# Get the most common birth country
top_country = nobel['birth_country'].value_counts().idxmax()

# Output results
print("Top gender:", top_gender)
print("Top country:", top_country)

Top gender: Male
Top country: United States of America
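
The cell above relies on two pandas behaviors: `value_counts()` skips missing values by default, and `idxmax()` returns the label with the largest count. A minimal sketch on made-up data (the values below are hypothetical, not taken from `nobel.csv`):

```python
import pandas as pd

# Toy data (hypothetical): None stands in for a laureate with no recorded value
sex = pd.Series(["Male", "Female", "Male", None, "Male"])

# Missing values are excluded because dropna=True is the default
counts = sex.value_counts()

# idxmax() returns the index label of the largest count
top = counts.idxmax()

print(counts.to_dict())
print("Top value:", top)
```

Because missing entries are dropped before counting, rows without a recorded `sex` do not influence which label comes out on top.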


In [82]:
# Step 2: Flag USA-born winners and compute per-decade ratio

# Add USA-born flag
nobel["usa_born_winner"] = nobel["birth_country"] == "United States of America"

# Create a decade column
nobel["decade"] = (nobel["year"] // 10) * 10

# Group by decade and calculate ratios
decade_grouped = nobel.groupby("decade")["usa_born_winner"]
us_ratio_by_decade = decade_grouped.mean()  # This gives the ratio: mean of True/False values

# Find the decade with the highest USA-born winner ratio
max_decade_usa = us_ratio_by_decade.idxmax()

# Output
print("Decade with highest ratio of USA-born Nobel winners:", max_decade_usa)


Decade with highest ratio of USA-born Nobel winners: 2000
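
To make the per-decade ratio easier to follow, here is a small sketch on invented rows (the years and countries are illustrative assumptions, not real laureates). Floor division maps each year to the start of its decade, and the mean of a boolean column is the share of `True` values:

```python
import pandas as pd

# Toy data (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "year": [1901, 1909, 1994, 1995, 1999, 2003],
    "birth_country": [
        "Germany", "United States of America",
        "United States of America", "United States of America", "United Kingdom",
        "Japan",
    ],
})

# 1994 // 10 == 199, and 199 * 10 == 1990
toy["decade"] = (toy["year"] // 10) * 10

# True counts as 1 and False as 0, so the mean is the proportion of True
toy["usa_born"] = toy["birth_country"] == "United States of America"
ratio = toy.groupby("decade")["usa_born"].mean()

print(ratio)
print("Decade with highest ratio:", ratio.idxmax())
```

In this toy frame the 1990s come out highest, with two of three rows flagged.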


In [83]:
# Step 3: Find the decade/category with highest proportion of female laureates

# Add a female_winner flag
nobel["female_winner"] = nobel["sex"] == "Female"

# Group by decade and category, then calculate proportion of female winners
female_ratio = nobel.groupby(["decade", "category"])["female_winner"].mean()

# Find the index (decade, category) with the highest proportion
max_index = female_ratio.idxmax()

# Store result in the required dictionary format
max_female_dict = {max_index[0]: max_index[1]}

# Output
print("Max female laureate proportion:", max_female_dict)


Max female laureate proportion: {2020: 'Literature'}
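
Grouping by two columns produces a result indexed by (decade, category) pairs, so `idxmax()` returns a tuple. A proportion computed from only a handful of laureates can easily top the ranking, so it may help to look at the group size alongside it. The sketch below uses invented rows, and the `summary` and `best` names are introduced here for illustration only:

```python
import pandas as pd

# Toy data (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "decade":   [1900, 1900, 1900, 2020, 2020, 2020],
    "category": ["Physics", "Physics", "Peace", "Literature", "Literature", "Physics"],
    "sex":      ["Male", "Female", "Male", "Female", "Female", "Male"],
})
toy["female_winner"] = toy["sex"] == "Female"

# Compute the proportion and the number of rows for each (decade, category) group
summary = toy.groupby(["decade", "category"])["female_winner"].agg(["mean", "size"])

# idxmax() on a two-level index returns a (decade, category) tuple
best = summary["mean"].idxmax()
max_female_dict = {best[0]: best[1]}

print(summary)
print("Best group:", best, "with", summary.loc[best, "size"], "rows")
```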


In [84]:
# Filter the DataFrame for female winners
female_winners = nobel[nobel["sex"] == "Female"]

# Find the earliest female Nobel Prize winner
first_female_winner = female_winners.loc[female_winners["year"].idxmin()]

# Assign required variables
first_woman_name = first_female_winner["full_name"]
first_woman_category = first_female_winner["category"]

# Optional: print results
print("First woman's name:", first_woman_name)
print("First woman's category:", first_woman_category)


First woman's name: Marie Curie, née Sklodowska
First woman's category: Physics
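
Note that `idxmin()` returns only the first row holding the minimum, so if several rows shared the earliest year, the others would be silently ignored. A minimal sketch with placeholder names (all rows are hypothetical) shows one way to check for such ties:

```python
import pandas as pd

# Toy data (hypothetical laureates, for illustration only)
toy = pd.DataFrame({
    "year": [1905, 1903, 1903, 1911],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "category": ["Peace", "Physics", "Chemistry", "Chemistry"],
})

# idxmin() gives the label of the first occurrence of the smallest year
first_row = toy.loc[toy["year"].idxmin()]

# Comparing against min() keeps every row tied for the earliest year
all_earliest = toy[toy["year"] == toy["year"].min()]

print("First row by idxmin:", first_row["full_name"])
print("All rows in the earliest year:", all_earliest["full_name"].tolist())
```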


In [85]:
# Count the number of times each winner has won
winner_counts = nobel['full_name'].value_counts()

# Keep the names that appear two or more times
repeat_list = winner_counts[winner_counts >= 2].index.tolist()

# Output the list of repeat winners
print("Repeat winners:", repeat_list)

Repeat winners: ['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Linus Carl Pauling', 'John Bardeen', 'Frederick Sanger', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
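
The repeat-winner logic amounts to counting names and keeping those that occur at least twice. A small sketch on placeholder names (hypothetical, not from the dataset) also keeps the counts, which the list alone does not show:

```python
import pandas as pd

# Toy data (hypothetical names, for illustration only)
names = pd.Series([
    "A. Example", "B. Example", "A. Example",
    "C. Example", "B. Example", "A. Example",
])

# value_counts() sorts by frequency, most common first
counts = names.value_counts()

# A boolean mask keeps only names that appear two or more times
repeat_counts = counts[counts >= 2]
repeat_list = repeat_counts.index.tolist()

print(repeat_counts.to_dict())
print("Repeat winners:", repeat_list)
```

This approach matches on the exact name string, so it assumes each laureate's name is spelled identically on every row and that no two different laureates share a name.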
