# Wikidata SPARQL query framework examples

We only need to import the helper functions; after that, we can query Wikidata with simple function calls instead of writing long SPARQL queries.

In [1]:
import functions as f

## Get the information of one human from Wikidata

Other options are `get_person_locations` and `get_exhibitions_by_id`, which return less information; the module also contains some less recommended alternatives.

In [2]:
van_gogh_response = f.get_all_person_info_strict("Vincent van Gogh")
df = f.results_dataframe(van_gogh_response)
df

Unnamed: 0,name,birth_place,birth_date,death_date,death_place,gender,citizenship,locations,occupation,location_dates,id
0,Vincent van Gogh,Zundert,1853-03-30T00:00:00Z,1890-07-29T00:00:00Z,Auvers-sur-Oise,male,Kingdom of the Netherlands,"Amsterdam,Paris,Saint-Rémy-de-Provence,The Hag...","drawer,printmaker,painter","[{'location': 'Amsterdam', 'start_time': '1891...",Q5582


The module can process dates, including the start and end dates of the periods a person spent at each work location.
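
To illustrate what processing such a date involves, here is a minimal sketch that extracts the year from a timestamp in the format shown in the table above. The helper `year_from_timestamp` is a hypothetical stand-in written for this example; it is not the module's `find_year`, whose implementation may differ.

```python
from datetime import datetime
from typing import Optional


def year_from_timestamp(timestamp: Optional[str]) -> Optional[int]:
    """Return the year of a timestamp such as '1853-03-30T00:00:00Z' (hypothetical helper)."""
    if not timestamp:
        return None
    try:
        # Drop the trailing 'Z' so strptime only has to parse the date-time part.
        return datetime.strptime(timestamp.rstrip("Z"), "%Y-%m-%dT%H:%M:%S").year
    except ValueError:
        # Anything that does not match the expected format is treated as missing.
        return None


print(year_from_timestamp("1853-03-30T00:00:00Z"))  # 1853
print(year_from_timestamp(None))                    # None
```

Returning `None` for unparsable input keeps the helper safe to apply to a whole column in which some dates are missing.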

We can pretty-print the information from the response:

In [3]:
print(f"Birthplace: {van_gogh_response['birth_place']}, deathplace: {van_gogh_response['death_place']}")
print(f"Birthyear: {f.find_year(van_gogh_response['birth_date'])}, deathdate: {f.find_year(van_gogh_response['death_date'])}")
print(f"Gender: {van_gogh_response['gender']}\nCitizenship: {van_gogh_response['citizenship']}\nOccupations: {str(van_gogh_response['occupation']).strip('[]')}")
print(); print("Work locations:")

places = f.get_places_with_years_from_response(van_gogh_response, return_type="list") #Can also use string return, and use stringlist_to_list() to convert to list
for place in places:
    name,period = place.replace(",", " and ").split(":")
    print(f"{name}, between {period}")

Birthplace: Zundert, deathplace: Auvers-sur-Oise
Birthyear: 1853, deathdate: 1890
Gender: male
Citizenship: Kingdom of the Netherlands
Occupations: drawer,printmaker,painter

Work locations:
Amsterdam, between 1891-1911 and 1877-1878
Saint-Rémy-de-Provence, between 1889-1890
The Hague, between 1881-1883 and 1869-1873
Ramsgate, between 1876-1876
City of Brussels, between 1880-1881
Etten-Leur, between 1881-1881 and 1876-1876
Dordrecht, between 1877-1877
Nuenen, between 1883-1885
Paris, between 1875-1876 and 1886-1888
Auvers-sur-Oise, between 1890-1890
Van Gogh House, between 1883-1883
London, between 1873-1875
Arles, between 1888-1889
Hoogeveen, between 1883-1883
Antwerp, between 1885-1886
Borinage, between 1878-1879
Tilburg, between 1866-1868
Maison Van Gogh, between 1879-1880
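
The place strings used above follow the pattern `name:start-end,start-end`. As a rough sketch, they could also be parsed into structured data rather than printed directly. The function `parse_place_periods` is hypothetical and assumes every period has both a start and an end year, which may not hold for every Wikidata entry.

```python
from typing import List, Tuple


def parse_place_periods(entry: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Split 'Amsterdam:1891-1911,1877-1878' into a name and sorted (start, end) pairs."""
    # Split on the first colon only; hyphens in the place name are left untouched.
    name, _, periods = entry.partition(":")
    parsed = []
    for period in periods.split(","):
        if not period:
            continue  # no period information for this place
        start, _, end = period.partition("-")
        parsed.append((int(start), int(end)))
    return name, sorted(parsed)  # chronological order


print(parse_place_periods("Amsterdam:1891-1911,1877-1878"))
# ('Amsterdam', [(1877, 1878), (1891, 1911)])
print(parse_place_periods("Saint-Rémy-de-Provence:1889-1890"))
# ('Saint-Rémy-de-Provence', [(1889, 1890)])
```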


**Note**: It is generally recommended to use `get_all_person_info_strict`, as above, rather than `get_all_person_info`: the strict variant restricts queries to humans, whereas a plain name lookup might in some cases return a statue or a building instead of the person.
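
For readers curious how a query can be restricted to humans, the sketch below builds a SPARQL query that requires the item to be an instance of (`P31`) human (`Q5`). The function `build_human_query` is hypothetical and only illustrates the idea; it is not necessarily the query the module sends.

```python
def build_human_query(name: str, language: str = "en") -> str:
    """Build a SPARQL query for items labelled `name` that are humans (hypothetical helper)."""
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person rdfs:label "{escaped}"@{language} .
  ?person wdt:P31 wd:Q5 .   # P31 = instance of, Q5 = human
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
LIMIT 5
"""


print(build_human_query("Vincent van Gogh"))
```

Without the `wdt:P31 wd:Q5` line, any item carrying the same label (a statue, a building, a film) could match.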

## A complete example

Let's load the WikiArt artists from the [PainterPalette](https://github.com/me9hanics/PainterPalette) dataset and select a few of them to collect temporal and geographical information: birth and death dates and places, and work locations (with years).

In [4]:
import pandas as pd
import numpy as np

artists = pd.read_csv("https://raw.githubusercontent.com/me9hanics/PainterPalette/main/datasets/wikiart_artists.csv")
artists["death_place"] = None #None for strings
artists["death_year"] = np.nan #NaN for floats
artists["locations"] = None #This is to not have warnings from pandas.
artists["locations_with_years"] = None

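# .copy() gives an independent DataFrame, so the .loc assignments below do not trigger pandas' SettingWithCopyWarning
examples = artists[(artists["artist"]=="Vincent van Gogh") | (artists["artist"].str.contains("Rembrandt"))].copy() #3 artists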
print("Artists to query:", examples["artist"].values)

for index, artist in examples["artist"].items():
    response = f.get_all_person_info_strict(artist)
    if response is None:
        print(f"Could not find {artist}")
        continue

    examples.loc[index, "death_place"] = response["death_place"]
    examples.loc[index, "death_year"] = f.find_year(response["death_date"])
    examples.loc[index, "locations"] = response['locations'] #can also use: f.get_places_from_response(response)
    examples.loc[index, "locations_with_years"] = f.get_places_with_years_from_response(response, return_type = "string")

    # pd.isna catches both None and NaN (missing years are stored as NaN in the float column)
    if pd.isna(examples.loc[index, "death_place"]):
        print(f"Could not find death place for {artist}")
    if pd.isna(examples.loc[index, "death_year"]):
        print(f"Could not find death year for {artist}")
    if pd.isna(examples.loc[index, "locations"]):
        print(f"Could not find locations for {artist}")
    if pd.isna(examples.loc[index, "locations_with_years"]):
        print(f"Could not find locations with years for {artist}")

examples.drop(columns=["pictures_count","styles"])

Artists to query: ['Rembrandt' 'Vincent van Gogh' 'Rembrandt Peale']


Unnamed: 0,artist,movement,styles_extended,birth_place,birth_year,death_year,death_place,gender,citizenship,occupations,locations,locations_with_years
997,Rembrandt,Baroque,"{Baroque:587},{Tenebrism:128},{Unknown:52}",Leiden,1606.0,1669.0,Amsterdam,male,Dutch Republic,"painter, collector, art collector, etcher, pri...","Amsterdam,Leiden","['Amsterdam:1623-1625,1631-1669', 'Leiden:1625..."
1046,Vincent van Gogh,Post-Impressionism,"{Cloisonnism:11},{Impressionism:2},{Japonism:1...",Zundert,1853.0,1890.0,Auvers-sur-Oise,male,Kingdom of the Netherlands,"painter, printmaker, drawer, art dealer","Amsterdam,Paris,Saint-Rémy-de-Provence,The Hag...","['Amsterdam:1891-1911,1877-1878', 'Saint-Rémy-..."
2459,Rembrandt Peale,Neoclassicism,"{Neoclassicism:85},{Romanticism:1},{Unknown:1}",Pennsylvania,1778.0,1860.0,Philadelphia,male,United States of America,printmaker,"Boston,London,Baltimore,Washington, D.C.,New Y...",[]


## Querying multiple people with one request

Functions such as `get_multiple_people_all_info` look up all names in a single query, which speeds up the process but can randomly miss some instances.<br>
For this reason, the *highly recommended* function is `get_multiple_people_all_info_fast_retry_missing`, which retries each missing instance with a separate query, as we did before. The retry also allows different languages to be tried.

The power of this function shows especially when querying 200+ names. Wikidata limits the number of queries per minute, and this approach needs far fewer queries, so we spend much less time waiting 1-2 minutes for timeouts to pass.
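
To make the saving concrete, here is a minimal sketch of how many names can share one request through a SPARQL `VALUES` clause. Both `chunked` and `build_batch_query` are hypothetical helpers, and the chunk size of 50 is an arbitrary assumption; the module's actual batching strategy may differ.

```python
from typing import Iterator, List


def chunked(names: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `names` with at most `size` items each."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def build_batch_query(names: List[str], language: str = "en") -> str:
    """Build one SPARQL query that matches any of the given labels (hypothetical helper)."""
    literals = " ".join(
        '"{}"@{}'.format(n.replace("\\", "\\\\").replace('"', '\\"'), language)
        for n in names
    )
    return f"""
SELECT ?person ?name WHERE {{
  VALUES ?name {{ {literals} }}
  ?person rdfs:label ?name ;
          wdt:P31 wd:Q5 .   # restrict to humans
}}
"""


names = [f"Artist {i}" for i in range(200)]
queries = [build_batch_query(chunk) for chunk in chunked(names, 50)]
print(len(queries))  # 4 requests instead of 200
```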

In theory, this gathers the same or better results as calling `get_all_person_info_strict` for each person separately: it also checks for human instances, but it additionally tries different languages, and with many matches it can be considerably faster because the queries run in parallel.
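
The batch-then-retry pattern described above can be sketched independently of Wikidata. In the example below, `fetch_with_retry`, `fake_batch`, and `fake_one` are all hypothetical: the two fetchers are stand-ins passed in as arguments, so the sketch makes no claim about how the module implements its own retry.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def fetch_with_retry(
    names: List[str],
    fetch_batch: Callable[[List[str]], List[Record]],
    fetch_one: Callable[[str], Optional[Record]],
) -> List[Record]:
    """Run one batch query, then retry every name the batch did not return."""
    results = list(fetch_batch(names))
    found = {record["name"] for record in results}
    for name in names:
        if name in found:
            continue
        record = fetch_one(name)  # slower per-name fallback
        if record is not None:
            results.append(record)
            found.add(record["name"])
    return results


def fake_batch(names: List[str]) -> List[Record]:
    # Simulates a batch query that randomly misses one person.
    return [{"name": n} for n in names if n != "Henri Matisse"]


def fake_one(name: str) -> Optional[Record]:
    return {"name": name}


people = fetch_with_retry(["Franz Marc", "Henri Matisse"], fake_batch, fake_one)
print([p["name"] for p in people])  # ['Franz Marc', 'Henri Matisse']
```

Only the names missed by the batch cost an extra request, which is why the approach stays cheap when most names are found on the first attempt.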

In [5]:
example_names = ["Bracha L. Ettinger", "M.F. Husain", "Gerhard von Graevenitz", "Karl Schmidt-Rottluff", "Inigo Manglano-Ovalle", "Jean-Pierre Raynaud",
                 "Laszlo Moholy-Nagy", "Jose de Guimaraes", "Beatriz González", "John McLaughlin", "Angelo de Sousa", "J.M.W. Turner", "Ha Chong-Hyun",
                 "Lee Quinones", "LeRoy Neiman", "Ayse Erkmen", "Jay DeFeo", "JCJ Vanderheyden", "Li Yuan-chia", "Ding Yi", "Ad Reinhardt", "Alexander Calder",
                 "Alexander Rodchenko", "Alexej von Jawlensky", "Anni Albers", "Anthony Caro", "Beauford Delaney", "Clyfford Still", "David Smith", "Eduardo Chillida",
                 "Erik Bulatov", "Francis Picabia", "Franz Marc", "Giacomo Balla", "Hans Richter", "Henri Matisse", "Henry Moore", "Hilma af Klint", "Jean Arp"]

example_returns = f.get_multiple_people_all_info_fast_retry_missing(example_names)

In [6]:
example_returns[:2]

[{'name': 'Gerhard von Graevenitz',
  'birth_place': 'Schilde',
  'birth_date': '1934-09-19T00:00:00Z',
  'death_date': '1983-08-20T00:00:00Z',
  'death_place': 'Switzerland',
  'gender': 'male',
  'citizenship': 'Germany',
  'locations': 'Amsterdam',
  'occupation': 'painter,photographer',
  'location_dates': [{'location': 'Amsterdam',
    'start_time': None,
    'end_time': None,
    'point_in_time': '1970-01-01T00:00:00Z'}],
  'id': 'Q641688'},
 {'name': 'Karl Schmidt-Rottluff',
  'birth_place': 'Rottluff',
  'birth_date': '1884-12-01T00:00:00Z',
  'death_date': '1976-08-10T00:00:00Z',
  'death_place': 'West Berlin',
  'gender': 'male',
  'citizenship': 'Germany',
  'locations': '',
  'occupation': 'lithographer,drawer,art collector,designer,graphic artist,university teacher,sculptor,painter,Q686932,illustrator',
  'location_dates': [],
  'id': 'Q161143'}]

Let's see how many artists we did not gather information for:

In [7]:
print(f"Found {len(example_returns)} artists")
found_names = [instance['name'] for instance in example_returns]
missing = [name for name in example_names if name not in found_names]
print(f"Missing {len(missing)} artists: {missing}")

Found 37 artists
Missing 2 artists: ['Jose de Guimaraes', 'J.M.W. Turner']


We managed to get data for 37 out of 39 artists. One of the missing artists is José de Guimarães, who was missed because we queried an English-normalized spelling instead of his name as originally written; the other is J.M.W. Turner, whose name we queried in abbreviated form.<br>
This is a very reasonable outcome.

In fact, if we used `get_all_person_info` for J.M.W. Turner, we would get a wrong instance, not the painter!
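
Spelling differences such as "Jose de Guimaraes" versus "José de Guimarães" also matter when comparing queried names with returned names. The sketch below shows one possible accent-insensitive comparison using Unicode normalization. `strip_accents` and `missing_names` are hypothetical helpers; they only affect how names are compared locally, not what Wikidata matches.

```python
import unicodedata
from typing import List


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def missing_names(queried: List[str], found: List[str]) -> List[str]:
    """Return queried names with no accent- and case-insensitive match in `found`."""
    found_keys = {strip_accents(name).casefold() for name in found}
    return [name for name in queried if strip_accents(name).casefold() not in found_keys]


print(strip_accents("José de Guimarães"))  # Jose de Guimaraes
print(missing_names(["Jose de Guimaraes", "J.M.W. Turner"], ["José de Guimarães"]))
# ['J.M.W. Turner']
```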

We can check the results in a table:

In [8]:
f.results_dataframe(example_returns)[:10]

Unnamed: 0,name,birth_place,birth_date,death_date,death_place,gender,citizenship,locations,occupation,location_dates,id
0,Gerhard von Graevenitz,Schilde,1934-09-19T00:00:00Z,1983-08-20T00:00:00Z,Switzerland,male,Germany,Amsterdam,"painter,photographer","[{'location': 'Amsterdam', 'start_time': None,...",Q641688
1,Karl Schmidt-Rottluff,Rottluff,1884-12-01T00:00:00Z,1976-08-10T00:00:00Z,West Berlin,male,Germany,,"lithographer,drawer,art collector,designer,gra...",[],Q161143
2,LeRoy Neiman,Saint Paul,1921-06-08T00:00:00Z,2012-06-20T00:00:00Z,New York City,male,United States,,"painter,actor",[],Q3124601
3,JCJ Vanderheyden,'s-Hertogenbosch,1928-06-23T00:00:00Z,2012-02-27T00:00:00Z,'s-Hertogenbosch,male,Kingdom of the Netherlands,,"conceptual artist,visual artist,film director,...",[],Q1846544
4,Alexander Calder,Lawnton,1898-07-22T00:00:00Z,1976-11-11T00:00:00Z,New York City,male,United States,"Tarragona,Florida,Berlin,Calvi,Palma,Barcelona...","drawer,manufacturer,printmaker,designer,jewelr...","[{'location': 'Tarragona', 'start_time': None,...",Q151580
5,Alexander Rodchenko,Saint Petersburg,1891-12-05T00:00:00Z,1956-12-03T00:00:00Z,Moscow,male,Soviet Union,"Kazan,Paris,Moscow","visual artist,graphic artist,sculptor,typograp...","[{'location': 'Kazan', 'start_time': None, 'en...",Q312631
6,Alexej von Jawlensky,Torzhok,1864-03-25T00:00:00Z,1941-03-15T00:00:00Z,Wiesbaden,male,Weimar Republic,"Germany,Zurich,Ascona,Paris,Saint Petersburg,M...","drawer,printmaker,art collector,painter","[{'location': 'Germany', 'start_time': None, '...",Q156426
7,Anni Albers,Berlin,1899-06-12T00:00:00Z,1994-05-09T00:00:00Z,Orange,female,Germany,,"textile designer,lithographer,textile artist,d...",[],Q86078
8,Anthony Caro,Surrey,1924-03-08T00:00:00Z,2013-10-23T00:00:00Z,London,male,United Kingdom of Great Britain and Ireland,,"visual artist,sculptor,artist",[],Q529591
9,Beauford Delaney,Knoxville,1901-12-31T00:00:00Z,1979-03-26T00:00:00Z,14th arrondissement of Paris,male,United States,France,painter,"[{'location': 'France', 'start_time': '1953-01...",Q2893161


We can turn the birth and death dates into years for some clarity.

In [9]:
df = f.results_dataframe(example_returns)
df['birth_date'] = df['birth_date'].apply(f.find_year)
df['death_date'] = df['death_date'].apply(f.find_year)
df[:10]

Unnamed: 0,name,birth_place,birth_date,death_date,death_place,gender,citizenship,locations,occupation,location_dates,id
0,Gerhard von Graevenitz,Schilde,1934.0,1983.0,Switzerland,male,Germany,Amsterdam,"painter,photographer","[{'location': 'Amsterdam', 'start_time': None,...",Q641688
1,Karl Schmidt-Rottluff,Rottluff,1884.0,1976.0,West Berlin,male,Germany,,"lithographer,drawer,art collector,designer,gra...",[],Q161143
2,LeRoy Neiman,Saint Paul,1921.0,2012.0,New York City,male,United States,,"painter,actor",[],Q3124601
3,JCJ Vanderheyden,'s-Hertogenbosch,1928.0,2012.0,'s-Hertogenbosch,male,Kingdom of the Netherlands,,"conceptual artist,visual artist,film director,...",[],Q1846544
4,Alexander Calder,Lawnton,1898.0,1976.0,New York City,male,United States,"Tarragona,Florida,Berlin,Calvi,Palma,Barcelona...","drawer,manufacturer,printmaker,designer,jewelr...","[{'location': 'Tarragona', 'start_time': None,...",Q151580
5,Alexander Rodchenko,Saint Petersburg,1891.0,1956.0,Moscow,male,Soviet Union,"Kazan,Paris,Moscow","visual artist,graphic artist,sculptor,typograp...","[{'location': 'Kazan', 'start_time': None, 'en...",Q312631
6,Alexej von Jawlensky,Torzhok,1864.0,1941.0,Wiesbaden,male,Weimar Republic,"Germany,Zurich,Ascona,Paris,Saint Petersburg,M...","drawer,printmaker,art collector,painter","[{'location': 'Germany', 'start_time': None, '...",Q156426
7,Anni Albers,Berlin,1899.0,1994.0,Orange,female,Germany,,"textile designer,lithographer,textile artist,d...",[],Q86078
8,Anthony Caro,Surrey,1924.0,2013.0,London,male,United Kingdom of Great Britain and Ireland,,"visual artist,sculptor,artist",[],Q529591
9,Beauford Delaney,Knoxville,1901.0,1979.0,14th arrondissement of Paris,male,United States,France,painter,"[{'location': 'France', 'start_time': '1953-01...",Q2893161
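
The `location_dates` column still holds nested dictionaries. As a rough sketch, they could be flattened into one row per person and location. `flatten_location_dates` is a hypothetical helper that assumes the value is a list of dictionaries with the keys shown in the raw response above (`location`, `start_time`, `end_time`, `point_in_time`).

```python
from typing import Dict, List, Optional


def _year(timestamp: Optional[str]) -> Optional[int]:
    """Take the first four characters of a timestamp as the year; None stays None."""
    return int(timestamp[:4]) if timestamp else None


def flatten_location_dates(person: Dict) -> List[Dict]:
    """Turn one person's nested location_dates into a list of flat rows."""
    rows = []
    for entry in person.get("location_dates", []):
        rows.append({
            "name": person["name"],
            "location": entry["location"],
            "start_year": _year(entry.get("start_time")),
            "end_year": _year(entry.get("end_time")),
            "point_in_time_year": _year(entry.get("point_in_time")),
        })
    return rows


person = {
    "name": "Gerhard von Graevenitz",
    "location_dates": [{"location": "Amsterdam", "start_time": None,
                        "end_time": None, "point_in_time": "1970-01-01T00:00:00Z"}],
}
print(flatten_location_dates(person))
```

The resulting list of dictionaries can be passed to `pd.DataFrame(...)` to obtain a regular table.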
