# Nobel Prize analysis in SQL

## Import of data, libraries and making it ready to work with SQL

In [1]:
# import libraries
import pandas as pd
import sqlite3

In [2]:
# install the ipython-sql library
!pip install ipython-sql 

Defaulting to user installation because normal site-packages is not writeable


In [3]:
# read the csv file to pandas dataframe
df = pd.read_csv('nobel.csv')

In [4]:
# Use the sqlite3 library to create a connection and write the dataframe to the table 'nobel'
cnn = sqlite3.connect('jupyter_sql_nobel.db')
df.to_sql('nobel', cnn, if_exists='replace')
%load_ext sql
%sql sqlite:///jupyter_sql_nobel.db

## Check of the data

In [5]:
%%sql
/* check general information about the data*/
PRAGMA table_info(nobel);

 * sqlite:///jupyter_sql_nobel.db
Done.


cid,name,type,notnull,dflt_value,pk
0,index,INTEGER,0,,0
1,year,INTEGER,0,,0
2,category,TEXT,0,,0
3,prize,TEXT,0,,0
4,motivation,TEXT,0,,0
5,prize_share,TEXT,0,,0
6,laureate_id,INTEGER,0,,0
7,laureate_type,TEXT,0,,0
8,full_name,TEXT,0,,0
9,birth_date,TEXT,0,,0


In [6]:
%%sql
/* check the first row to get an idea of how the data look */
SELECT *
FROM nobel
LIMIT 1;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions""",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany


In [7]:
%%sql
/* We are going to use birth_date and death_date in many queries; 
therefore, it is important to check that values are valid 
and we can do calculations with them using date and time functions. 
SQLite does not have a storage class set aside for storing dates and/or times. 
Instead, the built-in Date And Time Functions of SQLite are capable of storing
dates and times as TEXT, REAL, or INTEGER values.
See here: https://www.sqlite.org/datatype3.html */

        
SELECT
laureate_id,
birth_date,
strftime('%Y',birth_date) AS birth_strf
FROM nobel
WHERE birth_date IS NOT NULL AND birth_strf IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


laureate_id,birth_date,birth_strf
967,1993-00-00,
969,1955-00-00,
986,1949-00-00,
998,1967-00-00,
1004,1948-00-00,
1006,1961-00-00,
1007,1956-00-00,
1016,1954-00-00,
1029,1961-00-00,
1030,1943-00-00,


In [8]:
%%sql
/* Same check as in the previous cell, but for the death date */

        
SELECT
laureate_id,
death_date,
strftime('%Y',death_date) AS death_strf
FROM nobel
WHERE death_date IS NOT NULL AND death_strf IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


laureate_id,death_date,death_strf


### Birth_date issue and its solution
We can see that death dates cause no problems, but some birth dates have the format YYYY-00-00. The date and time functions (e.g. strftime, julianday) do not recognise such a value as a date and return NULL, so strftime('%Y', birth_date) yields NULL instead of the year. Therefore, these birth dates cannot be used to calculate age.

To clean this, the problematic birth_date values have to be either removed or changed to valid dates. I have decided on the latter. If we change YYYY-00-00 to YYYY-07-01, the birth_date will fall almost exactly in the middle of the year, so the error for each laureate with an edited birth_date will be at most about half a year. However, this update of the nobel table will be done later, after we check whether there are dates on which the most laureates were born.
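A minimal sketch of both points above, using Python's built-in sqlite3 module against a throwaway in-memory database (not the notebook's jupyter_sql_nobel.db): it confirms that strftime() returns NULL for a YYYY-00-00 value, and it measures the worst-case error of imputing 1 July with julianday(). The helper name imputation_error_days is hypothetical.

```python
import sqlite3

# Throwaway in-memory database; only SQLite's built-in date functions are used
conn = sqlite3.connect(":memory:")

# Month and day '00' are invalid, so strftime() returns NULL (None in Python)
bad_year = conn.execute("SELECT strftime('%Y', '1993-00-00')").fetchone()[0]


def imputation_error_days(year: int) -> tuple[float, float]:
    """Hypothetical helper: distance in days from the imputed YYYY-07-01
    to the earliest (1 January) and latest (31 December) possible true date."""
    mid = f"{year:04d}-07-01"
    to_start, to_end = conn.execute(
        "SELECT julianday(?) - julianday(?), julianday(?) - julianday(?)",
        (mid, f"{year:04d}-01-01", f"{year:04d}-12-31", mid),
    ).fetchone()
    return to_start, to_end


print(bad_year)                     # None
print(imputation_error_days(1943))  # (181.0, 183.0)
print(imputation_error_days(1948))  # (182.0, 183.0), 1948 is a leap year
```

The worst case is therefore 183 days, i.e. about half a year.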

### Checking of missing values

In [9]:
%%sql
/* count the total number of rows and the number of NOT NULL values 
in each column to get an overview of how complete the data are */

SELECT
    COUNT(*) AS total_nr,
    COUNT(year) AS year,
    COUNT(category) AS category,
    COUNT(prize) AS prize,
    COUNT(motivation) AS motivation,
    COUNT(prize_share) AS share,
    COUNT(laureate_id) AS laureate_id,
    COUNT(laureate_type) AS laureate_type,
    COUNT(full_name) AS full_name,
    COUNT(birth_date) AS birth_date,
    COUNT(birth_city) AS birth_city,
    COUNT(birth_country) AS birth_country,
    COUNT(sex) AS sex,
    COUNT(organization_name) AS organization_name,
    COUNT(organization_city) AS organization_city,
    COUNT(organization_country) AS organization_country,
    COUNT(death_date) AS death_date,
    COUNT(death_city) AS death_city,
    COUNT(death_country) AS death_country
FROM nobel;

 * sqlite:///jupyter_sql_nobel.db
Done.


total_nr,year,category,prize,motivation,share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
1000,1000,1000,1000,912,1000,1000,1000,1000,968,964,969,970,736,735,735,596,579,585


### Overview of missing values
We can see that, as of February 2024, the table contains 1000 records, one per laureate per prize (a shared prize appears once for each of its laureates). The first column with a significant number of missing values is 'motivation' (88 missing). Further, 31 to 36 values are missing in each of the columns related to birth. It would not be surprising if the birth date or place of birth were unknown for a few people. Most likely, the rows missing birth_country are also missing birth_city, but it should be checked whether the same rows also lack birth_date. Next, sex is missing in 30 cases, which may be because some prizes were awarded to institutions. Then, a significant amount of data (about 265 values per column) is missing in the columns related to organization, and it is worth finding out why. Finally, many values are missing in the death-related columns (404 missing death dates): probably a large portion of the laureates are still alive, or the data are not up to date. However, checking whether any death_date is missing incorrectly is not the focus of this work, and we will simply assume that a missing death_date means the person is still alive.
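The completeness check above lists every column by hand. As a sketch of an alternative, assuming the same SQLite table, the column names can be read from PRAGMA table_info and the NULL-count query generated in Python. The function name missing_counts and the three-row toy table are hypothetical, and the table name is interpolated directly into the SQL, so this is only suitable for trusted names.

```python
import sqlite3


def missing_counts(conn: sqlite3.Connection, table: str) -> dict[str, int]:
    """Hypothetical helper: return {column: number of NULLs} for every column of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # `col IS NULL` evaluates to 1 or 0 in SQLite, so SUM() counts the NULLs;
    # double quotes protect column names such as "index" that are SQL keywords
    select_list = ", ".join(f'SUM("{col}" IS NULL)' for col in columns)
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE nobel ("index" INTEGER, motivation TEXT, sex TEXT)')
# Hypothetical toy rows, not the real dataset
conn.executemany(
    "INSERT INTO nobel VALUES (?, ?, ?)",
    [(0, "for his work", "Male"), (1, None, "Female"), (2, None, None)],
)
print(missing_counts(conn, "nobel"))  # {'index': 0, 'motivation': 2, 'sex': 1}
```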

In [10]:
%%sql
/* Check the columns with missing data to find out whether there is a reason for it
or whether the rows have something in common. Let's check the motivation column first. */

SELECT *
FROM nobel
WHERE motivation IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France
9,1902,Peace,The Nobel Peace Prize 1902,,1/2,464,Individual,Élie Ducommun,1833-02-19,Geneva,Switzerland,Male,,,,1906-12-07,Bern,Switzerland
10,1902,Peace,The Nobel Peace Prize 1902,,1/2,465,Individual,Charles Albert Gobat,1843-05-21,Tramelan,Switzerland,Male,,,,1914-03-16,Bern,Switzerland
16,1903,Peace,The Nobel Peace Prize 1903,,1/1,466,Individual,William Randal Cremer,1828-03-18,Fareham,United Kingdom,Male,,,,1908-07-22,London,United Kingdom
24,1904,Peace,The Nobel Peace Prize 1904,,1/1,467,Organization,Institut de droit international (Institute of International Law),,,,,,,,,,
29,1905,Peace,The Nobel Peace Prize 1905,,1/1,468,Individual,"Baroness Bertha Sophie Felicita von Suttner, née Countess Kinsky von Chinic und Tettau",1843-06-09,Prague,Austrian Empire (Czech Republic),Female,,,,1914-06-21,Vienna,Austria
35,1906,Peace,The Nobel Peace Prize 1906,,1/1,470,Individual,Theodore Roosevelt,1858-10-27,"New York, NY",United States of America,Male,,,,1919-01-06,"Oyster Bay, NY",United States of America
40,1907,Peace,The Nobel Peace Prize 1907,,1/2,471,Individual,Ernesto Teodoro Moneta,1833-09-20,Milan,Austrian Empire (Italy),Male,,,,1918-02-10,Milan,Italy
41,1907,Peace,The Nobel Peace Prize 1907,,1/2,472,Individual,Louis Renault,1843-05-21,Autun,France,Male,Sorbonne University,Paris,France,1918-02-08,Barbizon,France


In [11]:
%%sql
/* It seems that motivation was not recorded for the Nobel Peace Prizes until around 1990.
We can double-check this by counting missing motivations within the Peace category.
Indeed, 88 + 912 = 1000, so every missing motivation belongs to a Peace Prize. */
SELECT COUNT(*)
FROM nobel
WHERE motivation IS NULL AND category = 'Peace';

 * sqlite:///jupyter_sql_nobel.db
Done.


COUNT(*)
88


In [12]:
%%sql
/* Was there any Nobel Peace Prize with motivation before 1990? */
SELECT *
FROM nobel
WHERE category = 'Peace' AND motivation IS NOT NULL AND year<1990;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
577,1987,Peace,The Nobel Peace Prize 1987,"""for his work for peace in Central America, efforts which led to the accord signed in Guatemala on August 7 this year""",1/1,549,Individual,Oscar Arias Sánchez,1941-09-13,Heredia,Costa Rica,Male,,,,,,


In [13]:
%%sql
/* Check the columns with missing data to find out whether there is a reason for it
or whether the rows have something in common. Let's check the birth- and sex-related columns now. */

SELECT COUNT(*)
FROM nobel
WHERE birth_date IS NULL
OR birth_city IS NULL
OR birth_country IS NULL
OR sex IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


COUNT(*)
38


In [14]:
%%sql
/* From earlier, we know that there were 30 to 36 missing values 
in each of the columns related to birth or sex. 
From the result of the previous query (38 rows), we can conclude that most of these 
missing values occur on the same rows.
Let's look at the full records to see whether they have anything in common. */

SELECT *
FROM nobel
WHERE birth_date IS NULL
OR birth_city IS NULL
OR birth_country IS NULL
OR sex IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
24,1904,Peace,The Nobel Peace Prize 1904,,1/1,467,Organization,Institut de droit international (Institute of International Law),,,,,,,,,,
60,1910,Peace,The Nobel Peace Prize 1910,,1/1,477,Organization,Bureau international permanent de la Paix (Permanent International Peace Bureau),,,,,,,,,,
89,1917,Peace,The Nobel Peace Prize 1917,,1/1,482,Organization,Comité international de la Croix Rouge (International Committee of the Red Cross),,,,,,,,,,
200,1938,Peace,The Nobel Peace Prize 1938,,1/1,503,Organization,Office international Nansen pour les Réfugiés (Nansen International Office for Refugees),,,,,,,,,,
215,1944,Peace,The Nobel Peace Prize 1944,,1/1,482,Organization,Comité international de la Croix Rouge (International Committee of the Red Cross),,,,,,,,,,
237,1947,Peace,The Nobel Peace Prize 1947,,1/2,508,Organization,Friends Service Council (The Quakers),,,,,,,,,,
238,1947,Peace,The Nobel Peace Prize 1947,,1/2,509,Organization,American Friends Service Committee (The Quakers),,,,,,,,,,
283,1954,Peace,The Nobel Peace Prize 1954,,1/1,515,Organization,Office of the United Nations High Commissioner for Refugees (UNHCR),,,,,,,,,,
348,1963,Peace,The Nobel Peace Prize 1963,,1/2,482,Organization,Comité international de la Croix Rouge (International Committee of the Red Cross),,,,,,,,,,
349,1963,Peace,The Nobel Peace Prize 1963,,1/2,523,Organization,Ligue des Sociétés de la Croix-Rouge (League of Red Cross Societies),,,,,,,,,,


In [15]:
%%sql
/* Mostly, the laureate_type is Organization. 
In addition, organizations seem to be missing values 
in the organization_name, organization_city, and organization_country columns.
Let's check whether any organization has a birth_date. */

SELECT *
FROM nobel
WHERE laureate_type LIKE 'Org%' AND birth_date IS NOT NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
435,1973,Peace,The Nobel Peace Prize 1973,,1/2,531,Organization,Le Duc Tho,1911-10-14,Nam Ha province,Vietnam,Male,,,,1990-10-13,Hanoi,Vietnam
501,1979,Peace,The Nobel Peace Prize 1979,,1/1,540,Organization,Mother Teresa,1910-08-26,Uskup (Skopje),Ottoman Empire (Republic of Macedonia),Female,,,,1997-09-05,Calcutta,India
598,1989,Peace,The Nobel Peace Prize 1989,,1/1,551,Organization,The 14th Dalai Lama (Tenzin Gyatso),1935-07-06,Taktser,Tibet (People's Republic of China),Male,,,,,,
618,1991,Peace,The Nobel Peace Prize 1991,"""for her non-violent struggle for democracy and human rights""",1/1,553,Organization,Aung San Suu Kyi,1945-06-19,Rangoon (Yangon),Burma (Myanmar),Female,,,,,,


In [16]:
%%sql
/* Now, let's look at which individuals are missing a birth_date */

SELECT *
FROM nobel
WHERE laureate_type LIKE 'Ind%' AND birth_date IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
816,2009,Chemistry,The Nobel Prize in Chemistry 2009,"""for studies of the structure and function of the ribosome""",1/3,841,Individual,Venkatraman Ramakrishnan,,"Chidambaram, Tamil Nadu",India,Male,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,
850,2011,Physics,The Nobel Prize in Physics 2011,"""for the discovery of the accelerating expansion of the Universe through observations of distant supernovae""",1/2,864,Individual,Saul Perlmutter,,"Champaign-Urbana, IL",United States of America,Male,Lawrence Berkeley National Laboratory,"Berkeley, CA",United States of America,,,


In [17]:
%%sql
/* Check the columns with missing data to find out whether there is a reason for it
or whether the rows have something in common. Finally, let's check the organization_name,
_city, and _country columns, which have only 735 or 736 non-NULL values out of 1000. */

SELECT *
FROM nobel
WHERE organization_name IS NULL
OR organization_city IS NULL
OR organization_country IS NULL;

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect""",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France
7,1902,Literature,The Nobel Prize in Literature 1902,"""the greatest living master of the art of historical writing, with special reference to his monumental work, <I>A history of Rome</I>""",1/1,571,Individual,Christian Matthias Theodor Mommsen,1817-11-30,Garding,Schleswig (Germany),Male,,,,1903-11-01,Charlottenburg,Germany
9,1902,Peace,The Nobel Peace Prize 1902,,1/2,464,Individual,Élie Ducommun,1833-02-19,Geneva,Switzerland,Male,,,,1906-12-07,Bern,Switzerland
10,1902,Peace,The Nobel Peace Prize 1902,,1/2,465,Individual,Charles Albert Gobat,1843-05-21,Tramelan,Switzerland,Male,,,,1914-03-16,Bern,Switzerland
14,1903,Literature,The Nobel Prize in Literature 1903,"""as a tribute to his noble, magnificent and versatile poetry, which has always been distinguished by both the freshness of its inspiration and the rare purity of its spirit""",1/1,572,Individual,Bjørnstjerne Martinus Bjørnson,1832-12-08,Kvikne,Norway,Male,,,,1910-04-26,Paris,France
16,1903,Peace,The Nobel Peace Prize 1903,,1/1,466,Individual,William Randal Cremer,1828-03-18,Fareham,United Kingdom,Male,,,,1908-07-22,London,United Kingdom
19,1903,Physics,The Nobel Prize in Physics 1903,"""in recognition of the extraordinary services they have rendered by their joint researches on the radiation phenomena discovered by Professor Henri Becquerel""",1/4,6,Individual,"Marie Curie, née Sklodowska",1867-11-07,Warsaw,Russian Empire (Poland),Female,,,,1934-07-04,Sallanches,France
21,1904,Literature,The Nobel Prize in Literature 1904,"""in recognition of the fresh originality and true inspiration of his poetic production, which faithfully reflects the natural scenery and native spirit of his people, and, in addition, his significant work as a Proven&ccedil;al philologist""",1/2,573,Individual,Frédéric Mistral,1830-09-08,Maillane,France,Male,,,,1914-03-25,Maillane,France


In [18]:
%%sql
/* As noticed earlier, organizations are, surprisingly, missing information
in the organization_name, _city, and _country columns. Otherwise, it seems that most missing
values are related to Nobel Prizes in the Peace and Literature categories.
Let's check which laureates in these two categories have values in the organization columns */

SELECT *
FROM nobel
WHERE (category = 'Peace' OR
category = 'Literature')
AND (organization_name IS NOT NULL
OR organization_city IS NOT NULL
OR organization_country IS NOT NULL);

 * sqlite:///jupyter_sql_nobel.db
Done.


index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
41,1907,Peace,The Nobel Peace Prize 1907,,1/2,472,Individual,Louis Renault,1843-05-21,Autun,France,Male,Sorbonne University,Paris,France,1918-02-08,Barbizon,France
161,1931,Peace,The Nobel Peace Prize 1931,,1/2,497,Individual,Nicholas Murray Butler,1862-04-02,"Elizabeth, NJ",United States of America,Male,Columbia University,"New York, NY",United States of America,1947-12-07,"New York, NY",United States of America
256,1950,Peace,The Nobel Peace Prize 1950,,1/1,511,Individual,Ralph Bunche,1904-08-07,"Detroit, MI",United States of America,Male,Harvard University,"Cambridge, MA",United States of America,1971-12-09,"New York, NY",United States of America
340,1962,Peace,The Nobel Peace Prize 1962,,1/1,217,Individual,Linus Carl Pauling,1901-02-28,"Portland, OR",United States of America,Male,California Institute of Technology (Caltech),"Pasadena, CA",United States of America,1994-08-19,"Big Sur, CA",United States of America


In [19]:
%%sql
/* It seems that almost all prizes in the Peace category and all prizes in Literature 
were given to individuals or organizations not affiliated with any organization. 
Let's briefly check how many prizes were given in these two categories combined, 
to see whether it is close to the number of missing values (ca. 265) in the organization columns */

SELECT COUNT(category) AS nr_prizes_in_peace_or_literature
FROM nobel
WHERE (category = 'Peace' OR
category = 'Literature');

 * sqlite:///jupyter_sql_nobel.db
Done.


nr_prizes_in_peace_or_literature
261


### Summary on missing values
We found that a motivation was not recorded for Nobel Peace Prizes during almost the first 90 years of the prize's existence; the 1987 prize to Oscar Arias Sánchez is the only earlier exception.

Further, most of the missing values related to birth and sex belong to awarded organizations and a few individuals. While only two individuals are missing a birth date, organizations almost always lack one. However, four records are labelled as organizations but clearly refer to well-known individuals, such as the Dalai Lama or Mother Teresa. Note that missing birth dates for organizations will be beneficial in the following analysis when we calculate length of life, because it cannot be calculated without a birth date; therefore, we can be sure that those results relate to people and not to organizations.

Finally, the values missing in the organization columns (_name, _city, _country) belong almost entirely to records where laureate_type is Organization or where the category is Peace or Literature; the few exceptions include Marie Curie's 1903 Physics prize. This makes sense and indicates that the organization columns hold the scientific institutions where the awarded individuals worked. Nobel Prizes in Peace and Literature are not tied to scientific research; therefore, there is usually no organization to add.
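The claim that missing organization values concentrate in certain categories can be checked with a single grouped query. A minimal sketch, run here on a hypothetical six-row toy table rather than the real data; in SQLite, `organization_country IS NULL` evaluates to 1 or 0, so SUM() counts the missing values per category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nobel (category TEXT, organization_country TEXT)")
# Hypothetical toy rows, not taken from the real dataset
conn.executemany(
    "INSERT INTO nobel VALUES (?, ?)",
    [
        ("Physics", "Germany"),
        ("Physics", None),
        ("Peace", None),
        ("Peace", None),
        ("Literature", None),
        ("Chemistry", "France"),
    ],
)

rows = conn.execute(
    """
    SELECT category,
           COUNT(*) AS nr_records,
           SUM(organization_country IS NULL) AS nr_missing_org
    FROM nobel
    GROUP BY category
    ORDER BY nr_missing_org DESC, category
    """
).fetchall()
print(rows)  # [('Peace', 2, 2), ('Literature', 1, 1), ('Physics', 2, 1), ('Chemistry', 1, 0)]
```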

## Let's dive into interesting stuff
From now on, the cells focus on finding interesting information.

### Who won the most Nobel Prizes?
We will approach this question from different points of view, looking at individuals as well as organizations and countries.

In [50]:
%%sql
/* Who won more than one Nobel Prize? */

SELECT full_name,
    COUNT(laureate_id) AS nr_of_prizes,
    sex,
    birth_date,
    death_date,
    birth_country,
    death_country
FROM nobel
GROUP BY laureate_id
HAVING COUNT(laureate_id)>1
ORDER BY nr_of_prizes DESC, birth_date DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


full_name,nr_of_prizes,sex,birth_date,death_date,birth_country,death_country
Comité international de la Croix Rouge (International Committee of the Red Cross),3,,,,,
K. Barry Sharpless,2,Male,1941-04-28,,United States of America,
Frederick Sanger,2,Male,1918-08-13,2013-11-19,United Kingdom,United Kingdom
John Bardeen,2,Male,1908-05-23,1991-01-30,United States of America,United States of America
Linus Carl Pauling,2,Male,1901-02-28,1994-08-19,United States of America,United States of America
"Marie Curie, née Sklodowska",2,Female,1867-11-07,1934-07-04,Russian Empire (Poland),France
Office of the United Nations High Commissioner for Refugees (UNHCR),2,,,,,


In [21]:
%%sql
/* The Red Cross holds the record; let's check in which years it was awarded */

SELECT 
    full_name,
    year,
    category,
    prize_share    
FROM nobel
WHERE full_name LIKE '%Red Cross%';

 * sqlite:///jupyter_sql_nobel.db
Done.


full_name,year,category,prize_share
Comité international de la Croix Rouge (International Committee of the Red Cross),1917,Peace,1/1
Comité international de la Croix Rouge (International Committee of the Red Cross),1944,Peace,1/1
Comité international de la Croix Rouge (International Committee of the Red Cross),1963,Peace,1/2
Ligue des Sociétés de la Croix-Rouge (League of Red Cross Societies),1963,Peace,1/2


In [53]:
%%sql
/* Which birth countries do most laureates come from? */

SELECT birth_country, COUNT(*) AS nr_prizes_birth
FROM nobel
WHERE birth_country IS NOT NULL
GROUP BY birth_country
ORDER BY nr_prizes_birth DESC, birth_country ASC
LIMIT 15;

 * sqlite:///jupyter_sql_nobel.db
Done.


birth_country,nr_prizes_birth
United States of America,291
United Kingdom,91
Germany,67
France,58
Sweden,30
Japan,28
Canada,21
Netherlands,19
Switzerland,19
Italy,18


In [23]:
%%sql
/* How many laureates worked in a country other than their country of birth? */

WITH country_difference AS(SELECT 
    CASE WHEN organization_country = birth_country THEN 'Birth and organization in the same country'
    WHEN organization_country <> birth_country THEN 'Birth and organization in different countries'
    WHEN organization_country IS NULL OR birth_country IS NULL THEN 'Birth or organization country unknown' 
    ELSE 'unexpected' END AS status 
    FROM nobel)
SELECT status,COUNT(*) AS number_of_prizes
FROM country_difference
GROUP BY status
ORDER BY number_of_prizes DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


status,number_of_prizes
Birth and organization in the same country,464
Birth and organization in different countries,271
Birth or organization country unknown,265


In [24]:
%%sql
/* What is the difference between the number of awards by organization_country and by birth_country? */

WITH organization AS (
    SELECT organization_country, COUNT(*) AS nr_prizes_org
    FROM nobel
    GROUP BY organization_country),
birth_nobel AS (
    SELECT birth_country, COUNT(*) AS nr_prizes_birth
    FROM nobel
    GROUP BY birth_country)
SELECT organization_country, nr_prizes_org, nr_prizes_birth, (nr_prizes_org-nr_prizes_birth) AS org_over_birth_nr
FROM organization AS org
JOIN birth_nobel AS birth
ON org.organization_country = birth.birth_country
ORDER BY org_over_birth_nr DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


organization_country,nr_prizes_org,nr_prizes_birth,org_over_birth_nr
United States of America,385,291,94
Switzerland,24,19,5
United Kingdom,93,91,2
Finland,1,2,-1
Portugal,1,2,-1
Argentina,2,4,-2
Denmark,9,12,-3
Belgium,5,9,-4
Ireland,1,5,-4
Australia,5,10,-5
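One caveat for this comparison: the join matches birth_country and organization_country as exact strings, but birth_country often gives a historical name with the modern country in parentheses, e.g. 'Russian Empire (Poland)' or 'Schleswig (Germany)', so such births are not counted under the modern country. A sketch of a normalising CASE expression, assuming the 'Historical (Modern)' convention holds whenever a value ends with a closing parenthesis; the helper modern_country is hypothetical. Using the same expression with birth_country in place of `:c` inside the birth_nobel CTE would likely change the counts shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed convention: 'Historical name (Modern name)'. Values without a trailing
# ')' are returned unchanged; NULL stays NULL because instr(NULL, ...) is NULL.
MODERN_COUNTRY_SQL = """
    CASE
      WHEN instr(:c, '(') > 0 AND substr(:c, -1) = ')'
        THEN substr(:c, instr(:c, '(') + 1, length(:c) - instr(:c, '(') - 1)
      ELSE :c
    END
"""


def modern_country(name):
    """Hypothetical helper that evaluates the CASE expression for one value."""
    return conn.execute(f"SELECT {MODERN_COUNTRY_SQL}", {"c": name}).fetchone()[0]


for name in ["Russian Empire (Poland)", "Schleswig (Germany)", "United States of America"]:
    print(name, "->", modern_country(name))
```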


In [56]:
%%sql
/* There are many famous universities and research institutes in the world. 
How do they rank by the number of Nobel Prizes awarded to their scientists? */

SELECT
    organization_name, 
    organization_country,
    COUNT(*) AS prizes_nr
FROM nobel
WHERE organization_country IS NOT NULL
GROUP BY organization_name
HAVING COUNT(*) > 9
ORDER BY prizes_nr DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


organization_name,organization_country,prizes_nr
University of California,United States of America,36
Harvard University,United States of America,28
Stanford University,United States of America,22
Massachusetts Institute of Technology (MIT),United States of America,22
University of Chicago,United States of America,19
University of Cambridge,United Kingdom,17
Princeton University,United States of America,17
Columbia University,United States of America,17
California Institute of Technology (Caltech),United States of America,17
Rockefeller University,United States of America,13


In [58]:
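%%sql
/* How many organizations have 10 or more laureates, and how many prizes do they account for together? */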
WITH prizes_org AS (
    SELECT COUNT(*) AS prizes_nr
    FROM nobel
    WHERE organization_country IS NOT NULL
    GROUP BY organization_name
    HAVING COUNT(*) > 9)
SELECT 
    COUNT(prizes_nr) AS nr_organizations_with_10_or_more_laureates,
    SUM(prizes_nr) AS total_nr_prizes_for_top_organizations
FROM prizes_org;

 * sqlite:///jupyter_sql_nobel.db
Done.


nr_organizations_with_10_or_more_laureates,total_nr_prizes_for_top_organizations
12,228


In [26]:
%%sql
/* As I am from the Czech Republic, let's look up which laureates were born in Czechia or worked at a Czech organization */

SELECT 
    full_name,
    year AS year_awarded,
    strftime('%Y', death_date) AS year_of_death,
    ROUND((julianday(death_date) - julianday(birth_date))/365.2422,1) AS life_length,
    category,
    motivation,
    birth_country,
    organization_country
FROM nobel
WHERE birth_country LIKE '%Cz%'
OR organization_country LIKE '%Cz%'
ORDER BY year_awarded ASC;

 * sqlite:///jupyter_sql_nobel.db
Done.


full_name,year_awarded,year_of_death,life_length,category,motivation,birth_country,organization_country
"Baroness Bertha Sophie Felicita von Suttner, née Countess Kinsky von Chinic und Tettau",1905,1914.0,71.0,Peace,,Austrian Empire (Czech Republic),
Carl Ferdinand Cori,1947,1984.0,87.9,Medicine,"""for their discovery of the course of the catalytic conversion of glycogen""",Austria-Hungary (Czech Republic),United States of America
"Gerty Theresa Cori, née Radnitz",1947,1957.0,61.2,Medicine,"""for their discovery of the course of the catalytic conversion of glycogen""",Austria-Hungary (Czech Republic),United States of America
Jaroslav Heyrovsky,1959,1967.0,76.3,Chemistry,"""for his discovery and development of the polarographic methods of analysis""",Austria-Hungary (Czech Republic),Czechoslovakia
Jaroslav Seifert,1984,1986.0,84.3,Literature,"""for his poetry which endowed with freshness, sensuality and rich inventiveness provides a liberating image of the indomitable spirit and versatility of man""",Austria-Hungary (Czech Republic),
Peter Grünberg,2007,,,Physics,"""for the discovery of Giant Magnetoresistance""",Czechoslovakia (Czech Republic),Germany


### Is there a day or month of birth that makes you more likely to win a Nobel Prize?

In [27]:
%%sql
/* First, let's look at birthdays. */

SELECT
strftime('%m/%d',birth_date) AS birthday,
COUNT(*) AS nr_people
FROM nobel
WHERE birthday IS NOT NULL
GROUP BY birthday
ORDER BY nr_people DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


birthday,nr_people
02/28,8
10/10,7
06/28,7
05/21,7
12/11,6
11/30,6
11/07,6
10/30,6
10/02,6
09/30,6


In [28]:
%%sql
/* No single date stands out as an outlier. Let's look at the statistics by month */

SELECT
strftime('%m',birth_date) AS birthday_month,
COUNT(*) AS nr_people
FROM nobel
WHERE birthday_month IS NOT NULL
GROUP BY birthday_month
ORDER BY nr_people DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


birthday_month,nr_people
9,91
6,90
10,88
5,87
8,83
7,79
12,78
4,77
3,77
1,74


In [29]:
%%sql
/* What are the statistics by day of the month? */

SELECT
strftime('%d',birth_date) AS birthday_day,
COUNT(*) AS nr_people
FROM nobel
WHERE birthday_day IS NOT NULL
GROUP BY birthday_day
ORDER BY nr_people DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


birthday_day,nr_people
23,42
15,41
30,39
28,39
19,37
1,36
22,35
7,34
27,33
21,33


In [30]:
%%sql
/* Finally, what are the statistics by day of the week? */

SELECT
  case cast (strftime('%w', birth_date) as integer)
  WHEN 0 then 'Sunday'
  WHEN 1 then 'Monday'
  WHEN 2 then 'Tuesday'
  WHEN 3 then 'Wednesday'
  WHEN 4 then 'Thursday'
  WHEN 5 then 'Friday'
  ELSE 'Saturday' END AS birth_weekday,
COUNT(*) AS nr_people
FROM nobel
WHERE birth_date IS NOT NULL
GROUP BY birth_weekday
ORDER BY nr_people DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


birth_weekday,nr_people
Saturday,165
Tuesday,145
Wednesday,139
Monday,135
Thursday,133
Friday,130
Sunday,121

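A more defensive version of this weekday mapping, sketched on a hypothetical three-row toy table: every weekday gets its own WHEN branch, and the ELSE branch is reserved for values strftime() cannot parse, so an invalid date such as '1993-00-00' (for which strftime returns NULL, and the CASE would otherwise fall through to ELSE) can no longer be silently counted as a Saturday. The label 'unparseable date' is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nobel (birth_date TEXT)")
# Hypothetical toy rows: 1852-08-30 (van 't Hoff) was a Monday, '1993-00-00' is
# the unparseable format discussed earlier, and NULL is filtered out by WHERE
conn.executemany(
    "INSERT INTO nobel VALUES (?)",
    [("1852-08-30",), ("1993-00-00",), (None,)],
)

rows = conn.execute(
    """
    SELECT
      CASE CAST(strftime('%w', birth_date) AS INTEGER)
        WHEN 0 THEN 'Sunday'
        WHEN 1 THEN 'Monday'
        WHEN 2 THEN 'Tuesday'
        WHEN 3 THEN 'Wednesday'
        WHEN 4 THEN 'Thursday'
        WHEN 5 THEN 'Friday'
        WHEN 6 THEN 'Saturday'
        ELSE 'unparseable date'  -- strftime() returned NULL
      END AS birth_weekday,
      COUNT(*) AS nr_people
    FROM nobel
    WHERE birth_date IS NOT NULL
    GROUP BY birth_weekday
    ORDER BY nr_people DESC, birth_weekday
    """
).fetchall()
print(rows)  # [('Monday', 1), ('unparseable date', 1)]
```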

### Summary for birthdays
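We have found that 28 February is the most common birthday among all laureates, with exactly eight records. Note that these counts are per award, not per person: Linus Pauling, born on 1901-02-28, won two prizes and is therefore counted twice, so only seven distinct people share that birthday. Many other dates are shared by six or seven laureates.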

The monthly statistics are more interesting. It is no surprise that the fewest laureates were born in the shortest month, February. However, the gap to the other months is too large to be explained by February having two or three fewer days. Clearly, we would have to take birth-rate statistics into account to find out whether statistically more Nobel laureates are born in any particular month. But let's keep it simple and take the numbers as they are. Most laureates were born in September (91) and June (90), both of which have only 30 days.

Further, there are quite large differences between days of the month. The highest number of laureates were born on the 23rd (42 laureates). One would expect the lowest number on the 31st (24 laureates), since only seven months have a 31st, but the lowest number were actually born on the 17th (17 laureates). There is no reason to think that any date is better than another, and the differences look like random fluctuation. We can assume that the numbers for each date would become more similar if there were many times more laureates than there are today.
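To make "one would expect" concrete, a back-of-the-envelope sketch assuming birthdays are spread uniformly over the calendar: it counts how often each day of the month occurs in one 400-year Gregorian cycle (the calendar repeats every 400 years) and scales by the number of valid birth dates. Both function names are hypothetical, and 950 is only a rough stand-in for the number of valid birth dates (a little under the 968 non-NULL values, since the YYYY-00-00 rows are excluded).

```python
import calendar


def day_of_month_occurrences() -> dict[int, int]:
    """How often each day of the month (1-31) occurs in one 400-year Gregorian cycle."""
    occurrences = {day: 0 for day in range(1, 32)}
    for year in range(2000, 2400):
        for month in range(1, 13):
            days_in_month = calendar.monthrange(year, month)[1]
            for day in range(1, days_in_month + 1):
                occurrences[day] += 1
    return occurrences


def expected_births_per_day(total_births: int) -> dict[int, float]:
    """Expected laureates per day of the month if birthdays were uniform over the calendar."""
    occurrences = day_of_month_occurrences()
    total_days = sum(occurrences.values())  # 146097 days per 400-year cycle
    return {day: total_births * n / total_days for day, n in occurrences.items()}


expected = expected_births_per_day(950)  # rough number of valid birth dates
print(round(expected[17], 1), round(expected[31], 1))  # 31.2 18.2
```

Under this assumption, the 17th (17 laureates) is well below the roughly 31 expected, while the 31st (24 laureates) is actually above its expected value of about 18.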

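Finally, for days of the week there are only seven options, and with 968 non-NULL birth dates we would expect the numbers for each day to be very similar. Nevertheless, many more laureates appear to be born on Saturdays: 165, which is 20 more than on the second most common day, Tuesday. Part of this excess is an artefact of the query: it ran before the YYYY-00-00 birth dates were fixed, and for those rows strftime('%w', birth_date) returns NULL, no WHEN branch matches, and the CASE falls through to ELSE 'Saturday'. At least 12 such records appear in the outputs above, so the genuine Saturday count is at most 153 and its lead over Tuesday is at most 8. It also seems unlucky to be born on a Sunday if you want to win a Nobel Prize. On the other hand, as discussed above, the number of laureates is not large, and the differences between days of the week may well be random.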
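Whether differences like these could plausibly be random can be checked with a chi-square goodness-of-fit test against a uniform distribution. A minimal sketch using the weekday counts exactly as printed above (so still counting per award rather than per person, and still including the YYYY-00-00 rows that fell through to the ELSE 'Saturday' branch); the 5 % critical value for 6 degrees of freedom, about 12.592, is taken from standard chi-square tables.

```python
# Weekday counts exactly as printed in the output of the weekday query above
observed = {
    "Saturday": 165, "Tuesday": 145, "Wednesday": 139, "Monday": 135,
    "Thursday": 133, "Friday": 130, "Sunday": 121,
}

# Critical value of the chi-square distribution, 6 degrees of freedom, alpha = 0.05
CRITICAL_VALUE_DF6_5PCT = 12.592


def chi_square_uniform(counts: list[int]) -> float:
    """Pearson chi-square statistic for H0: all categories are equally likely."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


stat = chi_square_uniform(list(observed.values()))
degrees_of_freedom = len(observed) - 1  # 7 weekdays -> 6
print(round(stat, 2), degrees_of_freedom, stat < CRITICAL_VALUE_DF6_5PCT)  # 8.43 6 True
```

The statistic is about 8.43, below the critical value, so even the printed counts are consistent with random variation between weekdays.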

In [31]:
%%sql
/* Now that we have analysed the statistics related to exact birth dates, we can
apply the update discussed above to birth dates with the format YYYY-00-00. 
We will replace the month-day part with 1 July, which is almost exactly the middle of the year.
Therefore, the new, arbitrary dates will cause at most a six-month error in further calculations.
First, let's check which laureates will be edited. */

SELECT 
birth_date,
laureate_id,
strftime('%Y',birth_date) AS year_born,
full_name
FROM nobel
WHERE birth_date IS NOT NULL AND year_born IS NULL
ORDER BY birth_date;

 * sqlite:///jupyter_sql_nobel.db
Done.


birth_date,laureate_id,year_born,full_name
1943-00-00,1030,,Louis Brus
1945-00-00,1031,,Aleksey Yekimov
1946-00-00,1034,,Claudia Goldin
1948-00-00,1004,,Abdulrazak Gurnah
1949-00-00,986,,Michael Houghton
1954-00-00,1016,,Morten Meldal
1955-00-00,969,,Paul M. Romer
1956-00-00,1007,,David Card
1961-00-00,1006,,Dmitry Muratov
1961-00-00,1029,,Moungi Bawendi


In [32]:
%%sql
/* Update the invalid birth dates: YYYY-00-00 becomes YYYY-07-01. */

UPDATE nobel
SET birth_date = substr(birth_date, 1, 4) || '-07-01'
WHERE laureate_id IN (SELECT laureate_id FROM nobel WHERE birth_date IS NOT NULL AND strftime('%Y',birth_date) IS NULL);

 * sqlite:///jupyter_sql_nobel.db
Done.


[]
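Why does the filter `strftime('%Y', birth_date) IS NULL` catch exactly these rows? SQLite's date functions reject a month or day of `00`, so `strftime()` returns NULL for such strings. The sketch below reproduces the detection and the substitution with Python's `sqlite3` module on a hypothetical two-row table. It also puts the condition directly in the UPDATE's WHERE clause, a simpler alternative to the `laureate_id` subquery that should be equivalent here as long as each laureate's rows share the same birth date.

```python
import sqlite3

# Hypothetical in-memory table: one invalid date, one valid date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nobel (laureate_id INTEGER, birth_date TEXT)")
conn.executemany(
    "INSERT INTO nobel VALUES (?, ?)",
    [(1, "1943-00-00"), (2, "1879-07-01")],
)


def invalid_dates(connection):
    # Rows that have a birth_date string but no parseable year.
    return connection.execute(
        "SELECT laureate_id, birth_date FROM nobel "
        "WHERE birth_date IS NOT NULL AND strftime('%Y', birth_date) IS NULL"
    ).fetchall()


print(invalid_dates(conn))  # [(1, '1943-00-00')]

# Keep the year, replace month and day with 07-01.
conn.execute(
    "UPDATE nobel SET birth_date = substr(birth_date, 1, 4) || '-07-01' "
    "WHERE birth_date IS NOT NULL AND strftime('%Y', birth_date) IS NULL"
)

print(invalid_dates(conn))  # []
print(conn.execute("SELECT birth_date FROM nobel ORDER BY laureate_id").fetchall())
# [('1943-07-01',), ('1879-07-01',)]
```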

In [33]:
%%sql
/* Check the result of the update. No records should match these conditions now. */

SELECT 
birth_date,
strftime('%Y',birth_date) AS year_born,
full_name
FROM nobel
WHERE birth_date IS NOT NULL AND year_born IS NULL
ORDER BY birth_date;

 * sqlite:///jupyter_sql_nobel.db
Done.


birth_date,year_born,full_name


In [34]:
%%sql
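/* Now we can list laureates born on 1 July. The list includes both laureates
genuinely born on that date and those whose birth dates were just substituted. */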

SELECT 
birth_date,
strftime('%m/%d',birth_date) AS date_born,
full_name
FROM nobel
WHERE date_born = '07/01'
ORDER BY birth_date;

 * sqlite:///jupyter_sql_nobel.db
Done.


birth_date,date_born,full_name
1879-07-01,07/01,Léon Jouhaux
1927-07-01,07/01,Robert W. Fogel
1929-07-01,07/01,Gerald M. Edelman
1941-07-01,07/01,Alfred G. Gilman
1941-07-01,07/01,Myron S. Scholes
1943-07-01,07/01,Louis Brus
1945-07-01,07/01,Aleksey Yekimov
1946-07-01,07/01,Claudia Goldin
1948-07-01,07/01,Abdulrazak Gurnah
1949-07-01,07/01,Michael Houghton


### What is the average age of Nobel laureates and other age-related statistics
Now that the birth_date column has been cleaned, we can use it to calculate age-related statistics.

In [35]:
%%sql
/* Who are the earliest-born and the latest-born laureates so far? */
SELECT
    full_name,
    sex,
    category,
    birth_country,
    birth_date,
    year AS year_awarded,
    motivation
FROM nobel
WHERE birth_date = (SELECT MIN(birth_date) FROM nobel) 
OR birth_date = (SELECT MAX(birth_date) FROM nobel);

 * sqlite:///jupyter_sql_nobel.db
Done.


full_name,sex,category,birth_country,birth_date,year_awarded,motivation
Christian Matthias Theodor Mommsen,Male,Literature,Schleswig (Germany),1817-11-30,1902,"""the greatest living master of the art of historical writing, with special reference to his monumental work, <I>A history of Rome</I>"""
Malala Yousafzai,Female,Peace,Pakistan,1997-07-12,2014,"""for their struggle against the suppression of children and young people and for the right of all children to education"""


In [36]:
%%sql
/* What is the average age of laureates at the time of the award, by category, since 1901? */

SELECT 
    category,
    ROUND(AVG(year - strftime('%Y', birth_date)),1) AS age_when_awarded
FROM nobel
GROUP BY category
ORDER BY age_when_awarded DESC;

 * sqlite:///jupyter_sql_nobel.db
Done.


category,age_when_awarded
Economics,66.9
Literature,65.0
Peace,60.8
Chemistry,59.1
Medicine,58.7
Physics,57.3
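The SQL above approximates age as the difference between the award year and the birth year. The sketch below compares that with age in completed years on an assumed award date. The choice of 10 December (the date of the Nobel Prize ceremony) is an assumption, since the dataset only stores the award year; the second laureate in the example is hypothetical.

```python
from datetime import date

# Assumption: the award date is taken to be 10 December of the award year
# (the Nobel Prize ceremony). The dataset itself only stores the year.
CEREMONY_MONTH, CEREMONY_DAY = 12, 10


def age_by_year_difference(birth: date, award_year: int) -> int:
    """Age as computed in the SQL above: year - strftime('%Y', birth_date)."""
    return award_year - birth.year


def age_in_completed_years(birth: date, award_year: int) -> int:
    """Age in completed years on the assumed award date."""
    award_date = date(award_year, CEREMONY_MONTH, CEREMONY_DAY)
    had_birthday = (award_date.month, award_date.day) >= (birth.month, birth.day)
    return award_date.year - birth.year - (0 if had_birthday else 1)


# Malala Yousafzai (born 1997-07-12, awarded 2014): both methods agree.
print(age_by_year_difference(date(1997, 7, 12), 2014))  # 17
print(age_in_completed_years(date(1997, 7, 12), 2014))  # 17

# Hypothetical laureate born 20 December 1920 and awarded in 1990.
print(age_by_year_difference(date(1920, 12, 20), 1990))  # 70
print(age_in_completed_years(date(1920, 12, 20), 1990))  # 69
```

Under this assumption, only laureates born between 11 and 31 December get an age one year too high from the year difference, so the category averages above should shift only slightly.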


In [37]:
%%sql
/* What was the average age of laureates when they were awarded, and how has it changed over time (by decade)? */

SELECT
    CAST((year / 10) * 10 AS TEXT) || 's' AS decade,  -- integer division: 1901 / 10 = 190
    ROUND(AVG(year - strftime('%Y', birth_date)),1) AS age_when_awarded
FROM nobel
GROUP BY decade
ORDER BY decade;

 * sqlite:///jupyter_sql_nobel.db
Done.


decade,age_when_awarded
1900s,57.8
1910s,52.8
1920s,55.0
1930s,51.6
1940s,58.0
1950s,54.3
1960s,56.5
1970s,59.4
1980s,60.5
1990s,62.1


In [38]:
%%sql
/* Let's check how the average age in each category changed across the decades.
The Economics prize was first awarded in 1969, so its column is empty for earlier decades. */

SELECT
    CAST((year / 10) * 10 AS TEXT) || 's' AS decade,  -- integer division: 1901 / 10 = 190
    ROUND(AVG(CASE WHEN category='Physics' THEN year-strftime('%Y', birth_date) ELSE NULL END),1) AS Physics,
    ROUND(AVG(CASE WHEN category='Chemistry' THEN year-strftime('%Y', birth_date) ELSE NULL END),1) AS Chemistry,
    ROUND(AVG(CASE WHEN category='Medicine' THEN year-strftime('%Y', birth_date) ELSE NULL END),1) AS Medicine,
    ROUND(AVG(CASE WHEN category='Literature' THEN year-strftime('%Y', birth_date) ELSE NULL END),1) AS Literature,
    ROUND(AVG(CASE WHEN category='Peace' THEN year-strftime('%Y', birth_date) ELSE NULL END),1) AS Peace,
    ROUND(AVG(CASE WHEN category='Economics' THEN year-strftime('%Y', birth_date) ELSE NULL END),1) AS Economics
FROM nobel
GROUP BY decade
ORDER BY decade;

 * sqlite:///jupyter_sql_nobel.db
Done.


decade,Physics,Chemistry,Medicine,Literature,Peace,Economics
1900s,49.2,51.0,56.0,64.9,67.3,
1910s,48.1,49.0,49.2,59.0,61.8,
1920s,45.6,52.3,53.9,60.1,64.1,
1930s,41.3,46.0,54.7,56.4,64.1,
1940s,51.1,54.4,56.0,64.3,75.8,
1950s,49.8,53.0,51.8,63.7,63.7,
1960s,50.2,55.8,55.0,67.0,59.0,70.0
1970s,53.7,61.9,56.7,67.3,56.8,67.0
1980s,59.4,56.2,60.7,67.6,56.4,67.9
1990s,60.0,63.1,60.5,67.5,58.7,65.5
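Conditional aggregation, with one CASE expression per category, is one way to pivot in SQL. A possible alternative, sketched below with Python's `sqlite3` module, groups by both decade and category in SQL and pivots the long result in Python. The in-memory table and its rows are synthetic, not real laureates. Because SQLite performs integer division on two integers, `(year / 10) * 10` yields the decade directly.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nobel (year INTEGER, category TEXT, birth_date TEXT)")
conn.executemany(
    "INSERT INTO nobel VALUES (?, ?, ?)",
    [  # synthetic rows for illustration only
        (1901, "Physics", "1850-01-01"),
        (1905, "Physics", "1860-01-01"),
        (1903, "Peace", "1840-01-01"),
        (1969, "Economics", "1900-01-01"),
    ],
)

# Long format: one row per (decade, category) pair that actually has data.
rows = conn.execute(
    """
    SELECT (year / 10) * 10 AS decade,
           category,
           ROUND(AVG(year - CAST(strftime('%Y', birth_date) AS INTEGER)), 1) AS avg_age
    FROM nobel
    WHERE birth_date IS NOT NULL
    GROUP BY decade, category
    ORDER BY decade, category
    """
).fetchall()

# Pivot into {decade_label: {category: avg_age}}; missing pairs stay absent.
table = defaultdict(dict)
for decade, category, avg_age in rows:
    table[f"{decade}s"][category] = avg_age

print(dict(table))
# {'1900s': {'Peace': 63.0, 'Physics': 48.0}, '1960s': {'Economics': 69.0}}
```

Categories with no laureates in a decade are simply missing from the inner dictionaries, which corresponds to the empty Economics cells in the table above. Adding a category then needs no change to the SQL.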


In [60]:
%%sql
/* Who was the oldest when awarded? */

SELECT 
    year - strftime('%Y', birth_date) AS age_when_awarded,
    full_name,
    year,
    category,
    motivation,
    birth_country,
    organization_country
FROM nobel
WHERE age_when_awarded IS NOT NULL
ORDER BY age_when_awarded DESC
LIMIT 5;

 * sqlite:///jupyter_sql_nobel.db
Done.


age_when_awarded,full_name,year,category,motivation,birth_country,organization_country
97,John Goodenough,2019,Chemistry,"""for the development of lithium-ion batteries""",Germany,United States of America
96,Arthur Ashkin,2018,Physics,"""for the optical tweezers and their application to biological systems""",USA,United States of America
90,Leonid Hurwicz,2007,Economics,"""for having laid the foundations of mechanism design theory""",Russia,United States of America
90,Syukuro Manabe,2021,Physics,"""for the physical modelling of Earth’s climate quantifying variability and reliably predicting global warming""",Japan,United States of America
90,Klaus Hasselmann,2021,Physics,"""for the physical modelling of Earth’s climate quantifying variability and reliably predicting global warming""",Germany,Germany


In [61]:
%%sql
/* Who was the youngest when awarded? */

SELECT 
    year - strftime('%Y', birth_date) AS age_when_awarded,
    full_name,
    year,
    category,
    motivation,
    birth_country,
    organization_country
FROM nobel
WHERE age_when_awarded IS NOT NULL
ORDER BY age_when_awarded ASC
LIMIT 5;

 * sqlite:///jupyter_sql_nobel.db
Done.


age_when_awarded,full_name,year,category,motivation,birth_country,organization_country
17,Malala Yousafzai,2014,Peace,"""for their struggle against the suppression of children and young people and for the right of all children to education""",Pakistan,
25,William Lawrence Bragg,1915,Physics,"""for their services in the analysis of crystal structure by means of X-rays""",Australia,United Kingdom
25,Nadia Murad,2018,Peace,"""for their efforts to end the use of sexual violence as a weapon of war and armed conflict""",Iraq,
31,Werner Karl Heisenberg,1932,Physics,"""for the creation of quantum mechanics, the application of which has, inter alia, led to the discovery of the allotropic forms of hydrogen""",Germany,Germany
31,Paul Adrien Maurice Dirac,1933,Physics,"""for the discovery of new productive forms of atomic theory""",United Kingdom,United Kingdom


In [41]:
%%sql
/* Which Nobel laureates had the shortest lives? */

SELECT 
    ROUND((julianday(death_date) - julianday(birth_date))/365.2422,1) AS life_length,
    full_name,
    year AS year_awarded,
    strftime('%Y', death_date) AS year_of_death,
    category,
    motivation,
    birth_country,
    organization_country
FROM nobel
WHERE life_length IS NOT NULL
ORDER BY life_length ASC
LIMIT 5;

 * sqlite:///jupyter_sql_nobel.db
Done.


life_length,full_name,year_awarded,year_of_death,category,motivation,birth_country,organization_country
39.2,Martin Luther King Jr.,1964,1968,Peace,,United States of America,
43.8,Niels Ryberg Finsen,1903,1904,Medicine,"""in recognition of his contribution to the treatment of diseases, especially lupus vulgaris, with concentrated light radiation, whereby he has opened a new avenue for medical science""",Faroe Islands (Denmark),Denmark
46.2,Albert Camus,1957,1960,Literature,"""for his important literary production, which with clear-sighted earnestness illuminates the problems of the human conscience in our times""",French Algeria (Algeria),
46.9,Pierre Curie,1903,1906,Physics,"""in recognition of the extraordinary services they have rendered by their joint researches on the radiation phenomena discovered by Professor Henri Becquerel""",France,France
48.6,Carl von Ossietzky,1935,1938,Peace,,Germany,


In [42]:
%%sql
/* Which deceased Nobel laureates had the longest lives so far? */

SELECT 
    ROUND((julianday(death_date) - julianday(birth_date))/365.2422,1) AS life_length,
    full_name,
    year AS year_awarded,
    strftime('%Y', death_date) AS year_of_death,
    category,
    motivation,
    birth_country,
    organization_country
FROM nobel
WHERE life_length IS NOT NULL
ORDER BY life_length DESC
LIMIT 5;

 * sqlite:///jupyter_sql_nobel.db
Done.


life_length,full_name,year_awarded,year_of_death,category,motivation,birth_country,organization_country
103.7,Rita Levi-Montalcini,1986,2012,Medicine,"""for their discoveries of growth factors""",Italy,Italy
102.7,Ronald H. Coase,1991,2013,Economics,"""for his discovery and clarification of the significance of transaction costs and property rights for the institutional structure and functioning of the economy""",United Kingdom,United States of America
100.9,John Goodenough,2019,2023,Chemistry,"""for the development of lithium-ion batteries""",Germany,United States of America
99.5,Charles Hard Townes,1964,2015,Physics,"""for fundamental work in the field of quantum electronics, which has led to the construction of oscillators and amplifiers based on the maser-laser principle""",United States of America,United States of America
99.4,Maurice Allais,1988,2010,Economics,"""for his pioneering contributions to the theory of markets and efficient utilization of resources""",France,France


In [43]:
%%sql
/* Who are the oldest currently living laureates and how old are they? */

SELECT 
    full_name,
    birth_date,
    year AS year_awarded,
    ROUND(year - strftime('%Y', birth_date),0) AS age_when_awarded,
    ROUND((julianday('now') - julianday(birth_date))/365.2422,1) AS current_age,
    category,
    motivation,
    birth_country,
    organization_country
FROM nobel
WHERE death_date IS NULL AND birth_date IS NOT NULL
ORDER BY current_age DESC
LIMIT 5;

 * sqlite:///jupyter_sql_nobel.db
Done.


full_name,birth_date,year_awarded,age_when_awarded,current_age,category,motivation,birth_country,organization_country
Paul D. Boyer,1918-07-31,1997,79.0,105.6,Chemistry,"""for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP)""",United States of America,United States of America
Jens C. Skou,1918-10-08,1997,79.0,105.4,Chemistry,"""for the first discovery of an ion-transporting enzyme, Na+, K+ -ATPase""",Denmark,Denmark
Nicolaas Bloembergen,1920-03-11,1981,61.0,104.0,Physics,"""for their contribution to the development of laser spectroscopy""",Netherlands,United States of America
Edmond H. Fischer,1920-04-06,1992,72.0,103.9,Medicine,"""for their discoveries concerning reversible protein phosphorylation as a biological regulatory mechanism""",China,United States of America
Jack Steinberger,1921-05-25,1988,67.0,102.8,Physics,"""for the neutrino beam method and the demonstration of the doublet structure of the leptons through the discovery of the muon neutrino""",Germany,Switzerland


In [44]:
%%sql
/* Unfortunately, a quick Google search of the names suggests that all the supposedly
living laureates in the previous table are actually deceased.
This means that death dates in the Nobel dataset are not updated frequently or reliably,
and we cannot assume that a missing death date means that the laureate is still alive.

However, let's assume for a moment that the Nobel dataset is up to date and count
how many living laureates are over the age of 90. */

SELECT
    COUNT(*) AS living_winners_over_age_90
FROM nobel
WHERE death_date IS NULL 
AND birth_date IS NOT NULL 
AND ((julianday('now') - julianday(birth_date))/365.2422)>90;

 * sqlite:///jupyter_sql_nobel.db
Done.


living_winners_over_age_90
90


In [66]:
%%sql
/* As shown earlier, most people become laureates at an age when most of us are finishing our careers.
Let's look at who enjoyed being a Nobel laureate the longest,
in other words, who lived (or still lives) longest after being awarded.
Note: M. Eigen and M. Gell-Mann have no death date in the Nobel dataset.
They were excluded because, after checking their actual death dates, their real
years_lived_since_awarded was too low to place them in this top 10. */

SELECT 
    full_name,
    birth_date,
    year AS year_awarded,
    ROUND(year - strftime('%Y', birth_date),0) AS age_when_awarded,
    CASE WHEN death_date IS NULL THEN 'alive'
    WHEN death_date IS NOT NULL THEN 'deceased' END AS dead_or_alive,
    CASE WHEN death_date IS NULL THEN ROUND((julianday('now') - julianday(birth_date))/365.2422,1)
    WHEN death_date IS NOT NULL THEN ROUND((julianday(death_date) - julianday(birth_date))/365.2422,1) END AS life_length,
    CASE WHEN death_date IS NULL THEN (strftime('%Y', 'now') - year)
    WHEN death_date IS NOT NULL THEN (strftime('%Y', death_date) - year) END AS years_lived_since_awarded,
    category
FROM nobel
WHERE birth_date IS NOT NULL AND full_name NOT IN ('Manfred Eigen', 'Murray Gell-Mann')
ORDER BY years_lived_since_awarded DESC
LIMIT 10;

 * sqlite:///jupyter_sql_nobel.db
Done.


full_name,birth_date,year_awarded,age_when_awarded,dead_or_alive,life_length,years_lived_since_awarded,category
Chen Ning Yang,1922-09-22,1957,35.0,alive,101.5,67,Physics
Tsung-Dao (T.D.) Lee,1926-11-24,1957,31.0,alive,97.3,67,Physics
James Dewey Watson,1928-04-06,1962,34.0,alive,95.9,62,Medicine
Prince Louis-Victor Pierre Raymond de Broglie,1892-08-15,1929,37.0,deceased,94.6,58,Physics
William Lawrence Bragg,1890-03-31,1915,25.0,deceased,81.3,56,Physics
Adolf Friedrich Johann Butenandt,1903-03-24,1939,36.0,deceased,91.8,56,Chemistry
Archibald Vivian Hill,1886-09-26,1922,36.0,deceased,90.7,55,Medicine
Carl David Anderson,1905-09-03,1936,31.0,deceased,85.4,55,Physics
Frederick Sanger,1918-08-13,1958,40.0,deceased,95.3,55,Chemistry
Karl Manne Georg Siegbahn,1886-12-03,1924,38.0,deceased,91.8,54,Physics
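In the query above, each CASE expression has one branch for living and one for deceased laureates, and the two branches differ only in the end date. A more compact variant substitutes a reference date for a missing death date with `COALESCE`. It is sketched here with Python's `sqlite3` module on a hypothetical two-row table. The fixed `REFERENCE_DATE` is an assumption that stands in for `'now'` so the output is reproducible; in the notebook, `'now'` could be passed instead. As noted above, a missing death date does not reliably mean that the laureate is alive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nobel (full_name TEXT, year INTEGER, birth_date TEXT, death_date TEXT)"
)
conn.executemany(
    "INSERT INTO nobel VALUES (?, ?, ?, ?)",
    [  # hypothetical rows
        ("Laureate A", 1950, "1900-03-01", "1980-03-01"),
        ("Laureate B", 2000, "1940-06-15", None),
    ],
)

REFERENCE_DATE = "2024-01-01"  # stands in for 'now' so results are reproducible

rows = conn.execute(
    """
    SELECT full_name,
           CASE WHEN death_date IS NULL THEN 'alive' ELSE 'deceased' END AS dead_or_alive,
           -- use the reference date when there is no death date
           ROUND((julianday(COALESCE(death_date, :ref)) - julianday(birth_date)) / 365.2422, 1)
               AS life_length,
           CAST(strftime('%Y', COALESCE(death_date, :ref)) AS INTEGER) - year
               AS years_lived_since_awarded
    FROM nobel
    ORDER BY years_lived_since_awarded DESC
    """,
    {"ref": REFERENCE_DATE},
).fetchall()

for row in rows:
    print(row)
# ('Laureate A', 'deceased', 80.0, 30)
# ('Laureate B', 'alive', 83.5, 24)
```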


## Conclusions
In this notebook, we used SQLite to analyse the Nobel Prize dataset. The data were imported from a CSV file into a newly created database. We then checked whether any cleaning was necessary, focusing on birth and death dates and counting the missing values in each column. Most missing values had simple, clear explanations, but some birth dates were invalid, and we replaced them with 1 July of the same year. The analysis focused first on the number of laureates and awarded organizations in different countries and then on various age-related statistics. The attempt to find the oldest living laureate revealed that death dates in the Nobel dataset are often missing.

In summary, the USA is the most successful country in terms of laureates born or working there. Laureates also tend to be awarded at a rather advanced age (on average between about 57 and 67 years, depending on the category), and quite a number of them lived very long lives.