The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll explore this prizewinning data and answer several questions about it. We then encourage you to explore any further questions that interest you!

## Understand the Raw Dataset

In [6]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Read nobel.csv into a pandas DataFrame
nobel_df = pd.read_csv('nobel.csv')

# Show the first few rows of the DataFrame
nobel_df.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France
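
Several columns in the preview above are empty for some rows (for example `motivation` and the `organization_*` fields), so a quick count of missing values is a useful next step before analysis. The sketch below uses a small hypothetical sample shaped like `nobel.csv` rather than the real file; `sample_df` and its rows are illustrative assumptions.

```python
import pandas as pd

# Hypothetical three-row sample shaped like nobel.csv (illustrative values only)
sample_df = pd.DataFrame({
    "year": [1901, 1901, 1917],
    "category": ["Chemistry", "Peace", "Peace"],
    "laureate_type": ["Individual", "Individual", "Organization"],
    "sex": ["Male", "Male", None],
    "birth_country": ["Netherlands", "Switzerland", None],
})

# Count missing values per column, most incomplete columns first
missing_counts = sample_df.isna().sum().sort_values(ascending=False)
print(missing_counts)

# Count rows per laureate type to see how many organizations are present
type_counts = sample_df["laureate_type"].value_counts()
print(type_counts)
```

Running the same two calls on `nobel_df` would show which columns need care before filtering or grouping.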


## What is the most commonly awarded gender and birth country?

In [22]:
# Drop rows with a missing 'sex' or 'birth_country' value (this filtered frame is reused by all later cells)
nobel_df = nobel_df.dropna(subset=['sex', 'birth_country'])

# Find the most commonly awarded gender
top_gender = nobel_df['sex'].value_counts(sort=True).idxmax()
print(f"The most commonly awarded gender is: {top_gender}")

# Find the most commonly awarded birth country
top_country = nobel_df['birth_country'].value_counts(sort=True).idxmax()
print(f"The most commonly awarded birth country is: {top_country}")

The most commonly awarded gender is: Male
The most commonly awarded birth country is: United States of America
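
As a side note, `value_counts()` skips missing values by default, so the most common value can be found without overwriting the DataFrame. The following is a minimal sketch on a hypothetical sample (`sample_df` is an assumption, not data from `nobel.csv`); it also shows `normalize=True`, which returns shares instead of raw counts.

```python
import pandas as pd

# Hypothetical sample; None marks a missing value
sample_df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", None],
    "birth_country": ["France", "United States of America",
                      "United States of America", None],
})

# value_counts() ignores missing values by default (dropna=True)
gender_counts = sample_df["sex"].value_counts()
top_gender = gender_counts.idxmax()

# normalize=True returns each value's share of the non-missing rows
country_share = sample_df["birth_country"].value_counts(normalize=True)
top_country = country_share.idxmax()

print(top_gender, top_country)
print(len(sample_df))  # the original frame still has all four rows
```

Leaving the full frame intact keeps rows with missing fields available for later questions that may need them.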


## Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?

In [21]:
# Add new column called 'decade' based on the year
nobel_df['decade'] = (nobel_df['year'] // 10) * 10

# Filter US born winners
us_born = nobel_df[nobel_df['birth_country'] == 'United States of America']

# Count all winners per decade
total_winners_per_decade = nobel_df.groupby('decade').size()

# Count US-born winners per decade
us_winners_per_decade = us_born.groupby('decade').size()

# Calculate the ratio of US-born Nobel Prize winners to total winners in all categories
ratio_per_decade = us_winners_per_decade / total_winners_per_decade
print(ratio_per_decade)

# Find the decade with the highest ratio
max_decade_usa = ratio_per_decade.idxmax()
highest_ratio = ratio_per_decade.max()
print(f"The decade with the highest ratio of US-born Nobel Prize winners is the {max_decade_usa}s, with a ratio of {highest_ratio * 100:.2f}%")

decade
1900    0.017857
1910    0.078947
1920    0.074074
1930    0.254545
1940    0.325000
1950    0.295775
1960    0.280000
1970    0.320388
1980    0.329787
1990    0.415842
2000    0.436975
2010    0.324786
2020    0.391304
dtype: float64
The decade with the highest ratio of US-born Nobel Prize winners is the 2000s, with a ratio of 43.70%
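
The same ratio can also be computed in one step by averaging a boolean flag per decade, since the mean of a boolean column is the fraction of `True` values. This is a sketch on hypothetical sample data (`sample_df` and the `us_born` column are assumptions). One difference from dividing two grouped counts: a decade with no US-born winners yields `0.0` here rather than `NaN`.

```python
import pandas as pd

# Hypothetical sample data, not taken from nobel.csv
sample_df = pd.DataFrame({
    "year": [1901, 1905, 1912, 1913, 1918, 1925],
    "birth_country": ["France", "Germany", "United States of America",
                      "France", "United States of America", "Sweden"],
})

# Floor each year to the start of its decade
sample_df["decade"] = (sample_df["year"] // 10) * 10

# Boolean flag: True for US-born laureates
sample_df["us_born"] = sample_df["birth_country"] == "United States of America"

# Mean of the flag per decade = share of US-born laureates in that decade
us_ratio = sample_df.groupby("decade")["us_born"].mean()
best_decade = us_ratio.idxmax()

print(us_ratio)
print(best_decade)
```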


## Which decade and Nobel Prize category combination had the highest proportion of female laureates?

In [88]:
# Find the female laureates
female_laureates = nobel_df[nobel_df['sex'] == 'Female']

# Count female laureates per decade and category
female_counts = female_laureates.groupby(['decade', 'category']).size().reset_index(name='female_count')

# Count all laureates per decade and category
total_counts = nobel_df.groupby(['decade', 'category']).size().reset_index(name='total_count')

# Merge the female counts with the total counts (inner join on decade and category)
merged_counts = pd.merge(female_counts, total_counts, on=['decade', 'category'])

# Calculate the proportion of female laureates
merged_counts['female_proportion'] = merged_counts['female_count'] / merged_counts['total_count']

# Find the highest female proportion
max_female_proportion = merged_counts['female_proportion'].max()

# Find the first row with that proportion
max_female_row = merged_counts[merged_counts['female_proportion'] == max_female_proportion].iloc[0]

# Store the decade and category with the highest female proportion as a dictionary
max_female_dict = {max_female_row['decade']: max_female_row['category']}
print(max_female_dict)

{2020: 'Literature'}
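
A more compact way to get the same kind of answer is to group by both keys and average a boolean flag, which avoids the separate counts and the merge. The sketch below runs on hypothetical sample data (`sample_df` and `is_female` are assumptions); with a two-level group, `idxmax()` returns a `(decade, category)` tuple.

```python
import pandas as pd

# Hypothetical sample data, not taken from nobel.csv
sample_df = pd.DataFrame({
    "decade": [1900, 1900, 1900, 1940, 1940, 1940],
    "category": ["Physics", "Physics", "Peace", "Peace", "Peace", "Physics"],
    "sex": ["Female", "Male", "Male", "Female", "Female", "Male"],
})

# Boolean flag: True for female laureates
sample_df["is_female"] = sample_df["sex"] == "Female"

# Share of female laureates for every decade/category combination
female_share = sample_df.groupby(["decade", "category"])["is_female"].mean()

# idxmax() on a two-level index returns a (decade, category) tuple
best_decade, best_category = female_share.idxmax()
max_female_dict = {best_decade: best_category}

print(female_share)
print(best_decade, best_category)
```

Combinations with no female laureates appear with a share of `0.0` instead of being dropped by the inner merge.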


## Who was the first woman to receive a Nobel Prize, and in what category?

In [56]:
# Find the first female laureate (relies on the rows being in chronological order)
first_woman = nobel_df[nobel_df['sex'] == 'Female'].head(1)
first_woman_name = first_woman['full_name'].values[0]
first_woman_category = first_woman['category'].values[0]
print(f"The first woman to receive a Nobel Prize was {first_woman_name}, in the {first_woman_category} category")

The first woman to receive a Nobel Prize was Marie Curie, née Sklodowska, in the Physics category
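
Taking the first matching row works only while the data stays sorted by year. A variant that does not depend on row order is to look up the row with the smallest `year` directly. This is a minimal sketch on hypothetical, deliberately unsorted data (`sample_df` and the placeholder names are assumptions).

```python
import pandas as pd

# Hypothetical sample with rows deliberately out of chronological order
sample_df = pd.DataFrame({
    "year": [1911, 1903, 1905, 1901],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "category": ["Chemistry", "Physics", "Peace", "Literature"],
    "sex": ["Female", "Female", "Female", "Male"],
})

# Keep only female laureates
women = sample_df[sample_df["sex"] == "Female"]

# idxmin() returns the index label of the earliest year; .loc fetches that row
first_row = women.loc[women["year"].idxmin()]
first_woman_name = first_row["full_name"]
first_woman_category = first_row["category"]

print(first_woman_name, first_woman_category)
```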


## Which individuals or organizations have won more than one Nobel Prize throughout the years?

In [70]:
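# Count prizes per laureate by full name
# Note: nobel_df was filtered earlier to rows with both 'sex' and 'birth_country',
# so laureates lacking those fields (most likely organizations) are not counted here
prize_counts = nobel_df['full_name'].value_counts()

# Find laureates with more than one prize
multiple_prizes = prize_counts[prize_counts > 1]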

# Create a list of full names of those with multiple prizes
repeat_list = multiple_prizes.index.tolist()
print(repeat_list)

['John Bardeen', 'Marie Curie, née Sklodowska', 'Linus Carl Pauling', 'Frederick Sanger']
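
Because the cell above counts by `full_name` on a frame that has already had rows with missing `sex` or `birth_country` removed, repeat winners without those fields cannot show up in the list. One possible alternative, sketched here on hypothetical data, is to count on the unfiltered frame by `laureate_id`. This assumes `laureate_id` identifies a laureate consistently across prizes, which the source does not state; `sample_df` and its names are placeholders.

```python
import pandas as pd

# Hypothetical unfiltered sample; the organization has no 'sex' value
sample_df = pd.DataFrame({
    "laureate_id": [1, 1, 2, 3, 3, 3, 4],
    "full_name": ["Person A", "Person A (alt. spelling)", "Person B",
                  "Organization X", "Organization X", "Organization X",
                  "Person C"],
    "sex": ["Female", "Female", "Male", None, None, None, "Male"],
})

# Count prizes per laureate ID rather than per name string
prize_counts = sample_df.groupby("laureate_id").size()
repeat_ids = prize_counts[prize_counts > 1].index

# Report one name per repeat laureate, using the first spelling that appears
repeat_list = (
    sample_df[sample_df["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")["full_name"]
    .tolist()
)
print(repeat_list)
```

Counting by ID also groups prizes recorded under slightly different spellings of the same name, which a `full_name` count would treat as separate laureates.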
