## Nobel Prize Winner Analysis

Welcome to the Nobel Prize Winner Analysis project! The Nobel Prize has been one of the most prestigious international awards since it was first presented in 1901. Each year, laureates are recognized for outstanding contributions in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor and substantial prize money, recipients receive a gold medal bearing the image of Alfred Nobel (1833–1896), the founder of the prize.

For this project, we'll be exploring a dataset containing information about all Nobel Prize winners from 1901 to 2023. The dataset, sourced from the Nobel Prize API, is available in the `nobel.csv` file located in the `data` folder.

Throughout this analysis, we'll delve into several questions related to the Nobel Prize winners' data. We encourage you to explore additional questions that pique your interest!


In [3]:
# URL of the CSV file
csv_url = "https://raw.githubusercontent.com/hasperjiga/Data_Analyst_Portfolio/main/nobel.csv"


# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Loading in data from the URL defined above and previewing metadata
# (info() prints its summary directly and returns None, so it is not wrapped in print)
nobel = pd.read_csv(csv_url)
nobel.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   year                  1000 non-null   int64 
 1   category              1000 non-null   object
 2   prize                 1000 non-null   object
 3   motivation            912 non-null    object
 4   prize_share           1000 non-null   object
 5   laureate_id           1000 non-null   int64 
 6   laureate_type         1000 non-null   object
 7   full_name             1000 non-null   object
 8   birth_date            968 non-null    object
 9   birth_city            964 non-null    object
 10  birth_country         969 non-null    object
 11  sex                   970 non-null    object
 12  organization_name     736 non-null    object
 13  organization_city     735 non-null    object
 14  organization_country  735 non-null    object
 15  death_date            596 non-null    o

In [9]:
# Which gender and birth country are most common among Nobel laureates?

# Counting the number of Nobel laureates by gender
top_gender = nobel['sex'].value_counts().index[0]
print('The gender with the most Nobel laureates is:', top_gender)

# Counting the number of Nobel laureates by birth country
top_country = nobel['birth_country'].value_counts().index[0]
print('The country with the most Nobel laureates is:', top_country)



The gender with the most Nobel laureates is: Male
The country with the most Nobel laureates is: United States of America
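The `info()` summary above shows that `sex` and `birth_country` both contain missing values (organizations, for example, have no recorded sex). `value_counts()` silently drops those rows by default. The following is a minimal sketch on a hypothetical toy frame, not the real `nobel.csv`, showing how the counts and the shares differ once missing values are kept:

```python
import pandas as pd

# Hypothetical toy data standing in for nobel.csv; the values are illustrative only
toy = pd.DataFrame({"sex": ["Male", "Male", "Female", None, "Male"]})

# Default behaviour: rows with a missing value are excluded from the counts
counts = toy["sex"].value_counts()

# Shares instead of raw counts, keeping missing values as their own group
shares = toy["sex"].value_counts(normalize=True, dropna=False)

print(counts)
print(shares)
```

Reporting the share alongside the top label makes it clearer how dominant the leading group is, rather than only naming it.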


In [5]:
# Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?

# Creating a new column 'decade' to represent the decade of each Nobel Prize win
nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)

# Creating a new column 'usborn' to identify whether a laureate was born in the United States
nobel['usborn'] = nobel['birth_country'] == 'United States of America'

# Grouping by decade and calculating the proportion of US-born winners for each decade
prop_usa_winners = nobel.groupby('decade', as_index=False)['usborn'].mean()

# Finding the decade with the highest proportion of US-born winners
max_decade_usa = prop_usa_winners.loc[prop_usa_winners['usborn'].idxmax(), 'decade']

print('The decade with the highest ratio of US-born Nobel Prize winners was:', max_decade_usa)


The decade with the highest ratio of US-born Nobel Prize winners was: 2000
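The decade calculation relies on the fact that the mean of a boolean column equals the proportion of `True` values. Here is a small sketch of the same steps on hypothetical toy data (the years and countries are made up for illustration), using integer division for the decade and `idxmax()` to pick the winner:

```python
import pandas as pd

# Hypothetical toy data; not taken from nobel.csv
toy = pd.DataFrame({
    "year": [1901, 1905, 1912, 1918, 1923, 1927],
    "birth_country": [
        "Germany", "United States of America",
        "France", "United States of America",
        "United States of America", "United States of America",
    ],
})

# Integer division drops the last digit of the year: 1918 -> 1910
toy["decade"] = toy["year"] // 10 * 10

# Boolean flag; its mean per group is the share of US-born laureates
toy["usborn"] = toy["birth_country"] == "United States of America"
prop = toy.groupby("decade")["usborn"].mean()

# idxmax returns the index label (the decade) of the largest proportion
best_decade = prop.idxmax()

print(prop)
print(best_decade)
```

If two decades tie, `idxmax()` returns the first one, which matches the behaviour of taking `.values[0]` after filtering on the maximum.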


In [12]:
# Which decade and Nobel Prize category combination had the highest proportion of female laureates?

# Creating a new column 'female' to identify whether a laureate is female
nobel['female'] = nobel['sex'] == 'Female'

# Grouping by decade and category and calculating the proportion of female laureates for each combination
femaleratio = nobel.groupby(['decade', 'category'], as_index=False)['female'].mean()

# Finding the decade and category with the highest proportion of female laureates
max_female = femaleratio[femaleratio['female'] == femaleratio['female'].max()]

# Creating a dictionary to store the decade and category combination with the highest proportion of female laureates
max_female_dict = {int(max_female['decade'].values[0]): max_female['category'].values[0]}
print('Decade and category with the highest proportion of female laureates:', max_female_dict)
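When grouping by two keys, the result can also be kept as a Series with a two-level index, in which case `idxmax()` returns the `(decade, category)` pair directly. The sketch below assumes hypothetical toy data and is only one way to express the lookup:

```python
import pandas as pd

# Hypothetical toy data; not taken from nobel.csv
toy = pd.DataFrame({
    "decade": [1900, 1900, 1900, 2000, 2000, 2000],
    "category": ["Physics", "Physics", "Peace", "Peace", "Literature", "Literature"],
    "sex": ["Male", "Female", "Male", "Female", "Female", "Male"],
})
toy["female"] = toy["sex"] == "Female"

# Series indexed by (decade, category); values are the female share per group
ratio = toy.groupby(["decade", "category"])["female"].mean()

# With a two-level index, idxmax returns a (decade, category) tuple
top_decade, top_category = ratio.idxmax()

print(ratio)
print(top_decade, top_category)
```

Groups with very few laureates can reach a high proportion easily, so it may be worth inspecting the group sizes (for example with `.size()`) before reading much into the top combination.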


In [10]:
# Who was the first woman to receive a Nobel Prize, and in what category?

# Filtering the Nobel Prize winners to include only females
fnobel = nobel[nobel['sex'] == 'Female']

# Finding the first woman to receive a Nobel Prize and her category
# (the earliest-year row is selected once and reused for both fields)
first_woman = fnobel[fnobel['year'] == fnobel['year'].min()].iloc[0]

first_woman_name = first_woman['full_name']
print('The first woman to receive a Nobel Prize is:', first_woman_name)

first_woman_category = first_woman['category']
print('The category for the first woman to receive a Nobel Prize is:', first_woman_category)


The first woman to receive a Nobel Prize is: Marie Curie, née Sklodowska
The category for the first woman to receive a Nobel Prize is: Physics
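An alternative way to find the earliest row is `idxmin()`, which returns the index label of the smallest year so that the whole row can be fetched with `.loc`. The names below are placeholders in a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy data; laureate names are placeholders
toy = pd.DataFrame({
    "year": [1901, 1903, 1905, 1909],
    "category": ["Physics", "Physics", "Peace", "Literature"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "sex": ["Male", "Female", "Female", "Female"],
})

# Keep only the female laureates
women = toy[toy["sex"] == "Female"]

# idxmin gives the index label of the earliest year; .loc fetches that row
first = women.loc[women["year"].idxmin()]

print(first["full_name"], first["category"])
```

Like the filter-then-`iloc[0]` approach, this returns only the first match if several women share the earliest year.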


In [13]:
# Which individuals or organizations have won more than one Nobel Prize throughout the years?

# Counting the number of Nobel Prizes won by each individual or organization
count = nobel['full_name'].value_counts()

# Creating a list of individuals or organizations that have won more than one Nobel Prize
repeat_list = list(count[count > 1].index)

# Printing the list of repeat Nobel Prize winners
print('Individuals or organizations that have won more than one Nobel Prize:', repeat_list)


Individuals or organizations that have won more than one Nobel Prize: ['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Linus Carl Pauling', 'John Bardeen', 'Frederick Sanger', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
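Counting by `full_name` works as long as each laureate's name is spelled identically in every row. Since the dataset also has a `laureate_id` column, a possible variation is to group on the ID and carry the name along. This is a sketch on hypothetical toy data, assuming one row per laureate per prize:

```python
import pandas as pd

# Hypothetical toy data; IDs and names are placeholders
toy = pd.DataFrame({
    "laureate_id": [1, 2, 2, 3, 4, 4],
    "full_name": ["A", "B", "B", "C", "D", "D"],
    "year": [1901, 1903, 1911, 1920, 1954, 1962],
})

# Number of rows (prizes) per laureate, keyed by ID and name
wins = toy.groupby(["laureate_id", "full_name"]).size()

# Keep laureates with more than one prize and flatten to a regular frame
repeat = wins[wins > 1].reset_index(name="n_prizes")

print(repeat)
```

The resulting frame also records how many prizes each repeat winner received, which the plain list of names does not.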
