# Nobel API - check Wikidata and Nobel API
Version 0.42: added a video and view statistics for Nobel week, 5-12 October 2020

The purpose of this notebook is to keep the Nobel Prize laureates in Wikidata in sync with the laureates listed at Nobelprize.org. A lesson learned along the way is that the comparison also surfaces vandalism in Wikidata.


* This [Jupyter Notebook](https://tinyurl.com/SynchWSNobel)  


  * Wikidata property [Property:P8024](https://www.wikidata.org/wiki/Property:P8024)
  * [Developer zone nobelprize.org](https://www.nobelprize.org/about/developer-zone-2/)
      * maybe better API is [api.nobelprize.org/2.0/laureates](https://api.nobelprize.org/2.0/laureates)
  * [video](https://youtu.be/Iu1JtefueM8) explaining what we do
      
#### Other sources we sync
* Famous people in Uppsala Old Cemetery [Kulturpersoner Uppsalakyrkogård](https://github.com/salgo60/open-data-examples/blob/master/Check%20WD%20kulturpersoner%20uppsalakyrkogardar.ipynb)
* Swedish Literature Bank - [Litteraturbanken](https://github.com/salgo60/open-data-examples/blob/master/Litteraturbanken%20Author.ipynb) 
  * WD property [P5101](https://www.wikidata.org/wiki/Property_talk:P5101) [P5123](https://www.wikidata.org/wiki/Property_talk:P5123)
* [Nobelprize.org](https://github.com/salgo60/open-data-examples/blob/master/Nobel%20API.ipynb)
  * WD [property 8024](https://www.wikidata.org/wiki/Property:P8024)
* The Swedish National Archive [SBL](https://github.com/salgo60/open-data-examples/blob/master/SBL.ipynb) 
  * WD [property 3217](https://www.wikidata.org/wiki/Property:P3217) 
* The Biographical Dictionary of Swedish Women [SKBL](https://github.com/salgo60/open-data-examples/blob/master/Svenskt%20Kvinnobiografiskt%20lexikon%20part%203.ipynb)
  * WD [property 4963](https://www.wikidata.org/wiki/Property:P4963)
* Swedish Academy - [Svenska Akademien](https://github.com/salgo60/open-data-examples/blob/master/Svenska%20Akademien.ipynb) 
  * WD [property 5325](https://www.wikidata.org/wiki/Property:P5325) 


In [1]:
from datetime import datetime
start_time = datetime.now()
print("Last run: ", datetime.now())

Last run:  2020-10-12 12:13:18.575725


In [2]:
import urllib3, json
import pandas as pd 
http = urllib3.PoolManager() 

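# Fetch all laureates in one request; limit=1500 is set well above the number of laureates.
url = "https://api.nobelprize.org/2.0/laureates?offset=0&limit=1500"
r = http.request('GET', url)
data = json.loads(r.data)
# DataFrame.append was removed in pandas 2.0, so build the frame directly.
dftot = pd.DataFrame(data["laureates"])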
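
The cell above relies on a single request with a large `limit`. A minimal sketch of a paged alternative follows, assuming the endpoint honours `offset` and `limit` as in the URL above and returns the records under the `laureates` key. The helper names `fetch_all_laureates` and `urllib3_get_json` are hypothetical and not part of the original notebook.

```python
import json
from typing import Callable, Dict, List

API_URL = "https://api.nobelprize.org/2.0/laureates"


def fetch_all_laureates(get_json: Callable[[str], Dict], page_size: int = 100) -> List[Dict]:
    """Collect laureates page by page until a short (or empty) page is returned."""
    laureates: List[Dict] = []
    offset = 0
    while True:
        url = f"{API_URL}?offset={offset}&limit={page_size}"
        page = get_json(url).get("laureates", [])
        laureates.extend(page)
        if len(page) < page_size:
            # Fewer records than requested: this was the last page.
            break
        offset += page_size
    return laureates


def urllib3_get_json(url: str) -> Dict:
    """Fetcher based on urllib3, the HTTP library used in the cell above."""
    import urllib3  # imported lazily so the paging logic can be tested offline
    http = urllib3.PoolManager()
    return json.loads(http.request("GET", url).data)
```

With network access, `pd.DataFrame(fetch_all_laureates(urllib3_get_json))` should give the same frame as the single request. Passing the fetcher as an argument keeps the paging logic testable without calling the API.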


In [3]:
dftot.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 955 entries, 0 to 954
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id               955 non-null    object
 1   knownName        930 non-null    object
 2   givenName        930 non-null    object
 3   familyName       928 non-null    object
 4   fullName         930 non-null    object
 5   gender           930 non-null    object
 6   birth            929 non-null    object
 7   links            955 non-null    object
 8   nobelPrizes      955 non-null    object
 9   death            629 non-null    object
 10  orgName          25 non-null     object
 11  nativeName       24 non-null     object
 12  acronym          11 non-null     object
 13  founded          24 non-null     object
 14  penName          11 non-null     object
 15  birthCountry     1 non-null      object
 16  birthCountryNow  1 non-null      object
 17  birthContinent   1 non-null      object

In [16]:
dftot["id"] = dftot["id"].astype("int")

dftot.sort_values(by=['id'], ascending=[True], inplace=True)
#dftot.tail(10)


In [5]:
#dftot.sample(10)

In [6]:
listNobel = []
for index, row in dftot.iterrows():
    new_item = dict()
    new_item['id'] = row['id']
    # Organisations have no knownName (the cell is NaN), which raises TypeError.
    try:
        new_item['name_en'] = row['knownName']['en']
    except (KeyError, TypeError):
        pass
    try:
        new_item['name_se'] = row['knownName']['se']
    except (KeyError, TypeError):
        pass
    new_item['link'] = row['links']['href']
    # Only the first prize is recorded, also for laureates awarded more than once.
    try:
        new_item['awardYear'] = row['nobelPrizes'][0]['awardYear']
        new_item['category'] = row['nobelPrizes'][0]['category']['en']
    except (KeyError, IndexError, TypeError):
        pass
    try:
        new_item['gender'] = row['gender']
    except KeyError:
        pass
    listNobel.append(new_item)
NobelTot = pd.DataFrame(listNobel,
                  columns=['id','name_en','name_se','link','gender','awardYear','category'])

NobelTot.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 955 entries, 0 to 954
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         955 non-null    int64 
 1   name_en    930 non-null    object
 2   name_se    930 non-null    object
 3   link       955 non-null    object
 4   gender     930 non-null    object
 5   awardYear  955 non-null    object
 6   category   955 non-null    object
dtypes: int64(1), object(6)
memory usage: 52.4+ KB
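
The flattening loop above can also be expressed as a small function that works on one laureate record at a time. This is a sketch only: `flatten_laureate` is a hypothetical helper, and it assumes the record layout that the loop relies on (`knownName` and `links` as dictionaries, `nobelPrizes` as a list).

```python
from typing import Any, Dict


def flatten_laureate(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields used in NobelTot from one laureate record."""
    known = rec.get("knownName")
    known = known if isinstance(known, dict) else {}  # NaN or missing for organisations
    links = rec.get("links")
    prizes = rec.get("nobelPrizes")
    first = prizes[0] if isinstance(prizes, list) and prizes else {}
    return {
        "id": rec.get("id"),
        "name_en": known.get("en"),
        "name_se": known.get("se"),
        "link": links.get("href") if isinstance(links, dict) else None,
        "gender": rec.get("gender"),
        # Only the first prize is kept, matching the loop above.
        "awardYear": first.get("awardYear"),
        "category": (first.get("category") or {}).get("en"),
    }
```

It could be applied as `pd.DataFrame([flatten_laureate(r) for r in data["laureates"]])`. Note that `id` is still a string at that point, so the integer conversion done earlier would need to be repeated.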


## Check Wikidata matches

In [7]:
# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/

import sys
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """SELECT ?item ?itemLabel ?NobelAPI WHERE {
  ?item wdt:P8024 ?NobelAPI.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} order by xsd:integer(?NobelAPI)"""


def get_sparql_dataframe(endpoint_url, query):
    """
    Helper function to convert SPARQL results into a Pandas data frame.
    """
    user_agent = "salgo60/%s.%s" % (sys.version_info[0], sys.version_info[1])
 
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    result = sparql.query()

    processed_results = json.load(result.response)
    cols = processed_results['head']['vars']
    out = []
    for row in processed_results['results']['bindings']:
        item = []
        for c in cols:
            item.append(row.get(c, {}).get('value'))
        out.append(item)

    return pd.DataFrame(out, columns=cols)

WDNobel = get_sparql_dataframe(endpoint_url, query)

In [8]:
WDNobel["NobelAPI"] = WDNobel["NobelAPI"].astype("int")
WDNobel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 955 entries, 0 to 954
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   item       955 non-null    object
 1   itemLabel  955 non-null    object
 2   NobelAPI   955 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 22.5+ KB
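
Because the comparison below joins on the Nobel id, it can be worth checking first that no id is attached to more than one Wikidata item, which is one way bad edits could show up. A minimal sketch, using made-up toy data rather than real query results; `find_duplicate_ids` is a hypothetical helper.

```python
import pandas as pd


def find_duplicate_ids(df: pd.DataFrame, id_col: str = "NobelAPI") -> pd.DataFrame:
    """Return every row whose id value occurs on more than one row."""
    dup_mask = df.duplicated(subset=[id_col], keep=False)
    return df[dup_mask].sort_values(id_col, kind="stable")


# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "item": ["Q1", "Q2", "Q3"],
    "NobelAPI": [10, 11, 10],
})
print(find_duplicate_ids(toy))
```

On the real data the call would be `find_duplicate_ids(WDNobel)`; an empty result means each Nobel id maps to exactly one item.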


### Compare WD <-> Nobel 

In [9]:
dfmerge = pd.merge(WDNobel, NobelTot,how='outer', left_on="NobelAPI",right_on="id",indicator=True)
dfmerge['_merge'] = dfmerge['_merge'].str.replace('left_only','WD_only').str.replace('right_only','Nobel_only')
dfmerge.rename(columns={"_merge": "WD_Nobel"},inplace = True)    
dfmerge[-10:]  

Unnamed: 0,item,itemLabel,NobelAPI,id,name_en,name_se,link,gender,awardYear,category,WD_Nobel
945,http://www.wikidata.org/entity/Q22669757,Charles M. Rice,987,987,Charles M. Rice,Charles M. Rice,http://masterdataapi.nobelprize.org/2/laureate...,male,2020,Physiology or Medicine,both
946,http://www.wikidata.org/entity/Q193803,Roger Penrose,988,988,Roger Penrose,Roger Penrose,http://masterdataapi.nobelprize.org/2/laureate...,male,2020,Physics,both
947,http://www.wikidata.org/entity/Q65807,Reinhard Genzel,989,989,Reinhard Genzel,Reinhard Genzel,http://masterdataapi.nobelprize.org/2/laureate...,male,2020,Physics,both
948,http://www.wikidata.org/entity/Q493956,Andrea M. Ghez,990,990,Andrea Ghez,Andrea Ghez,http://masterdataapi.nobelprize.org/2/laureate...,female,2020,Physics,both
949,http://www.wikidata.org/entity/Q17280087,Emmanuelle Charpentier,991,991,Emmanuelle Charpentier,Emmanuelle Charpentier,http://masterdataapi.nobelprize.org/2/laureate...,female,2020,Chemistry,both
950,http://www.wikidata.org/entity/Q56068,Jennifer Doudna,992,992,Jennifer A. Doudna,Jennifer A. Doudna,http://masterdataapi.nobelprize.org/2/laureate...,female,2020,Chemistry,both
951,http://www.wikidata.org/entity/Q2344210,Louise Glück,993,993,Louise Glück,Louise Glück,http://masterdataapi.nobelprize.org/2/laureate...,female,2020,Literature,both
952,http://www.wikidata.org/entity/Q204344,World Food Programme,994,994,,,http://masterdataapi.nobelprize.org/2/laureate...,,2020,Peace,both
953,http://www.wikidata.org/entity/Q1359990,Paul Milgrom,995,995,Paul R. Milgrom,Paul R. Milgrom,http://masterdataapi.nobelprize.org/2/laureate...,male,2020,Economic Sciences,both
954,http://www.wikidata.org/entity/Q377265,Robert B. Wilson,996,996,Robert B. Wilson,Robert B. Wilson,http://masterdataapi.nobelprize.org/2/laureate...,male,2020,Economic Sciences,both


#### Check that Wikidata and api.nobelprize.org are in sync

In [10]:
dfmerge["WD_Nobel"].value_counts()  

both    955
Name: WD_Nobel, dtype: int64
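
Since everything is in sync here, the output shows only `both`. The toy example below, with invented ids and labels, illustrates how the outer merge with `indicator=True` would flag records that exist on only one side.

```python
import pandas as pd

# Invented data: id 1 exists only in "Wikidata", id 4 only in the "Nobel" frame.
wd = pd.DataFrame({"NobelAPI": [1, 2, 3], "itemLabel": ["A", "B", "C"]})
nobel = pd.DataFrame({"id": [2, 3, 4], "name_en": ["B", "C", "D"]})

merged = pd.merge(wd, nobel, how="outer", left_on="NobelAPI", right_on="id", indicator=True)

# Translate the generic merge labels into the names used in this notebook.
labels = {"left_only": "WD_only", "right_only": "Nobel_only", "both": "both"}
merged["WD_Nobel"] = merged["_merge"].astype(str).map(labels)

counts = merged["WD_Nobel"].value_counts().to_dict()
print(counts)
```

Any count other than `both` points at a record that needs attention on one of the two sides.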

In [11]:
Nobelonly = dfmerge[dfmerge["WD_Nobel"]=="Nobel_only"].copy() 

from IPython.display import HTML
# The id column is numeric, so convert it to text before building the links.
nobel_id = Nobelonly["id"].astype(int).astype(str)
Nobelonly["Nobel"] = "<a href='https://api.nobelprize.org/v1/laureate.json?id=" + nobel_id + "'>link</a>"
Nobelonly["Nobelhtml"] = "<a href='https://www.nobelprize.org/laureate/" + nobel_id + "'>html</a>"
pd.set_option("display.max_columns", None) 
HTML(Nobelonly.to_html(escape=False))

Unnamed: 0,item,itemLabel,NobelAPI,id,name_en,name_se,link,gender,awardYear,category,WD_Nobel,Nobel,Nobelhtml


In [12]:
WDonly = dfmerge[dfmerge["WD_Nobel"]=="WD_only"].copy() 
WDonly

Unnamed: 0,item,itemLabel,NobelAPI,id,name_en,name_se,link,gender,awardYear,category,WD_Nobel


In [13]:
print("End run: ", datetime.now())

End run:  2020-10-12 12:13:22.146677


In [14]:
print('Time elapsed (hh:mm:ss.ms) {}'.format(datetime.now() - start_time))

Time elapsed (hh:mm:ss.ms) 0:00:03.582088


#### Lesson learned
There is a delay between the announcement and the moment the HTML page for a new laureate goes live. The list below is used to check when the page is ready, so that the link can be added to Wikidata and to the Wikipedia article.

In [15]:
# see task xxx we can create links JSON and HTML using the ID
dfmerge["linkjson"] = "<a href='https://api.nobelprize.org/v1/laureate.json?id=" + dfmerge["id"].astype(str) + "'>json</a>"
dfmerge["linkhtml"] = "<a href='https://www.nobelprize.org/laureate/" + dfmerge["id"].astype(str) + "'>html</a>"
dfmerge["wd"] = "<a href='" + dfmerge["item"].astype(str) + "'>Wikidata</a>"

pd.set_option("display.max_columns", None) 
# Select columns with a list: indexing with a set is deprecated since pandas 1.5 and has no stable order.
HTML(dfmerge[['linkhtml','awardYear','name_en','category','wd','linkjson']].tail(15).to_html(escape=False))

Unnamed: 0,linkhtml,awardYear,name_en,category,wd,linkjson
940,html,2019,Abhijit Banerjee,Economic Sciences,Wikidata,json
941,html,2019,Esther Duflo,Economic Sciences,Wikidata,json
942,html,2019,Michael Kremer,Economic Sciences,Wikidata,json
943,html,2020,Harvey J. Alter,Physiology or Medicine,Wikidata,json
944,html,2020,Michael Houghton,Physiology or Medicine,Wikidata,json
945,html,2020,Charles M. Rice,Physiology or Medicine,Wikidata,json
946,html,2020,Roger Penrose,Physics,Wikidata,json
947,html,2020,Reinhard Genzel,Physics,Wikidata,json
948,html,2020,Andrea Ghez,Physics,Wikidata,json
949,html,2020,Emmanuelle Charpentier,Chemistry,Wikidata,json


Status 2020-10-09:
* OK, all laureates have HTML pages
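
The manual check of the HTML links could be sketched as a small helper. The URL pattern is the one used for `linkhtml` above. Treating HTTP status 200 as "page is ready" is an assumption that has not been verified against the site, and `laureate_html_url`, `pages_ready` and `urllib3_status` are hypothetical names.

```python
from typing import Callable, Dict, Iterable


def laureate_html_url(laureate_id: int) -> str:
    """Build the laureate page URL from the Nobel id, as in the linkhtml column."""
    return f"https://www.nobelprize.org/laureate/{laureate_id}"


def pages_ready(ids: Iterable[int], status_of: Callable[[str], int]) -> Dict[int, bool]:
    """Map each id to True when its page answers with status 200 (assumed to mean 'live')."""
    return {i: status_of(laureate_html_url(i)) == 200 for i in ids}


def urllib3_status(url: str) -> int:
    """Status lookup based on urllib3, the HTTP library used earlier in this notebook."""
    import urllib3  # imported lazily so the helpers above can be tested offline
    return urllib3.PoolManager().request("GET", url).status
```

With network access, `pages_ready(dfmerge["id"].tail(15), urllib3_status)` would report which of the newest laureates already have a page.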

Templates supporting Wikidata Property 8024 = Wikidata [Q91652187](https://www.wikidata.org/wiki/Q91652187)
* ar:Wikipedia [قالب:جائزة_نوبل](https://ar.wikipedia.org/wiki/%D9%82%D8%A7%D9%84%D8%A8:%D8%AC%D8%A7%D8%A6%D8%B2%D8%A9_%D9%86%D9%88%D8%A8%D9%84)
* ca:Wikipedia [Plantilla:Nobelprize](https://ca.wikipedia.org/wiki/Plantilla:Nobelprize)
* en:Wikipedia [Template:Nobelprize](https://en.wikipedia.org/wiki/Template:Nobelprize)
* ka:Wikipedia [თარგი:Nobelprize](https://ka.wikipedia.org/wiki/%E1%83%97%E1%83%90%E1%83%A0%E1%83%92%E1%83%98:Nobelprize)
* nn:Wikipedia [Mal:Nobelprize](https://nn.wikipedia.org/wiki/Mal:Nobelprize)
* pt:Wikipedia [Predefinição:Prémio Nobel](https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Pr%C3%A9mio_Nobel)
* sv:Wikipedia [Mall:Nobelprize](https://sv.wikipedia.org/wiki/Mall:Nobelprize)
* sr:Wikipedia [Шаблон:Nobelprize](https://sr.wikipedia.org/wiki/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:Nobelprize)

* Articles using the [template](https://www.wikidata.org/wiki/Q91652187), with view statistics for those articles this year and during [Nobel week](https://www.nobelprize.org/press/#/publication/5f1719e411aab40004a17f36/552bd85dccc8e20c00e7f979?&sh=false), 5-12 October 2020; see also the [video](https://youtu.be/Iu1JtefueM8)
  * ar:Wikipedia [using](https://petscan.wmflabs.org/?psid=17573153) / not using
    * [statistics](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&range=this-year&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://ar.wikipedia.org/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2582%25D8%25A7%25D9%2584%25D8%25A8_%25D8%25AC%25D8%25A7%25D8%25A6%25D8%25B2%25D8%25A9_%25D9%2586%25D9%2588%25D8%25A8%25D9%2584_%25D9%258A%25D8%25B3%25D8%25AA%25D8%25B9%25D9%2585%25D9%2584_%25D8%25AE%25D8%25A7%25D8%25B5%25D8%25A9_%25D9%2588%25D9%258A%25D9%2583%25D9%258A_%25D8%25A8%25D9%258A%25D8%25A7%25D9%2586%25D8%25A7%25D8%25AA_P8024) / [Nobel week](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&start=2020-10-05&end=2020-10-12&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://ar.wikipedia.org/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2582%25D8%25A7%25D9%2584%25D8%25A8_%25D8%25AC%25D8%25A7%25D8%25A6%25D8%25B2%25D8%25A9_%25D9%2586%25D9%2588%25D8%25A8%25D9%2584_%25D9%258A%25D8%25B3%25D8%25AA%25D8%25B9%25D9%2585%25D9%2584_%25D8%25AE%25D8%25A7%25D8%25B5%25D8%25A9_%25D9%2588%25D9%258A%25D9%2583%25D9%258A_%25D8%25A8%25D9%258A%25D8%25A7%25D9%2586%25D8%25A7%25D8%25AA_P8024)
  * ca:Wikipedia is missing the tracking category [Q91672712](https://www.wikidata.org/wiki/Q91672712#sitelinks-wikipedia)
  * en:Wikipedia [using](https://petscan.wmflabs.org/?psid=17573140) / [not using](https://petscan.wmflabs.org/?psid=17579263) 
    * [statistics](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&range=this-year&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://en.wikipedia.org/wiki/Category:Nobelprize_template_using_Wikidata_property_P8024) / [Nobel week](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&start=2020-10-05&end=2020-10-12&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://en.wikipedia.org/wiki/Category:Nobelprize_template_using_Wikidata_property_P8024) as a [chart](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&start=2020-10-05&end=2020-10-10&subjectpage=0&subcategories=0&sort=views&direction=1&view=chart&target=https://en.wikipedia.org/wiki/Category:Nobelprize%20template%20using%20Wikidata%20property%20P8024)
  * ka:Wikipedia is missing the tracking category [Q91672712](https://www.wikidata.org/wiki/Q91672712#sitelinks-wikipedia)
  * nn:Wikipedia [using](https://petscan.wmflabs.org/?psid=17573155) / not using 
     * [statistics](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&range=this-year&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://nn.wikipedia.org/wiki/Kategori:Sider_som_nyttar_Mal:Nobelprize) / [Nobel week](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&start=2020-10-05&end=2020-10-12&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://nn.wikipedia.org/wiki/Kategori:Sider_som_nyttar_Mal:Nobelprize)
  * pt:Wikipedia [using](https://petscan.wmflabs.org/?psid=17573158) / [not using](https://petscan.wmflabs.org/?psid=17579612) 
     * [statistics](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&range=this-year&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://pt.wikipedia.org/wiki/Categoria:Predefini%25C3%25A7%25C3%25A3o_sobre_pr%25C3%25A9mios_Nobel_que_usam_a_propriedade_do_Wikidata_P8024) / [Nobel week](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&start=2020-10-05&end=2020-10-12&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://pt.wikipedia.org/wiki/Categoria:Predefini%25C3%25A7%25C3%25A3o_sobre_pr%25C3%25A9mios_Nobel_que_usam_a_propriedade_do_Wikidata_P8024)
  * sv:Wikipedia [using](https://petscan.wmflabs.org/?psid=17573129) / not using 
    * [statistics](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&range=this-year&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://sv.wikipedia.org/wiki/Kategori:Sidor_som_anv%25C3%25A4nder_Mall:Nobelprize) / [Nobel week](https://pageviews.toolforge.org/massviews/?platform=all-access&agent=user&source=category&start=2020-10-05&end=2020-10-12&subjectpage=0&subcategories=0&sort=views&direction=1&view=list&target=https://sv.wikipedia.org/wiki/Kategori:Sidor_som_anv%25C3%25A4nder_Mall:Nobelprize)
  * sr:Wikipedia is missing the tracking category [Q91672712](https://www.wikidata.org/wiki/Q91672712#sitelinks-wikipedia)
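
The Nobel week statistics links in the list above all follow the same pattern, so they can be generated from the tracking category URL. This is a sketch: the query parameters are copied from the links above, and `massviews_url` is a hypothetical helper, not part of the Massviews tool.

```python
from urllib.parse import urlencode

MASSVIEWS = "https://pageviews.toolforge.org/massviews/"


def massviews_url(category_url: str, start: str, end: str, view: str = "list") -> str:
    """Build a Massviews link for one tracking category and a date range (YYYY-MM-DD)."""
    params = {
        "platform": "all-access",
        "agent": "user",
        "source": "category",
        "start": start,
        "end": end,
        "subjectpage": 0,
        "subcategories": 0,
        "sort": "views",
        "direction": 1,
        "view": view,
        "target": category_url,
    }
    # Keep ':' and '/' readable; an already percent-encoded category name is
    # encoded once more (% becomes %25), as in the links above.
    return MASSVIEWS + "?" + urlencode(params, safe=":/")


print(massviews_url(
    "https://en.wikipedia.org/wiki/Category:Nobelprize_template_using_Wikidata_property_P8024",
    "2020-10-05",
    "2020-10-12",
))
```

Passing `view="chart"` gives the chart variant used for en:Wikipedia.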
  
Task [T251055](https://phabricator.wikimedia.org/T251055)  

# Next Nobelprize announcement
see https://www.nobelprize.org/