# Wikidata SPARQL query framework examples

We just need to import the helper functions, and then we can query Wikidata with simple function calls instead of writing long SPARQL queries.

In [1]:
import functions as f

## Get the information of one human from Wikidata

(Other options are *get_person_info* and *get_person_locations*, which return less information.)

In [2]:
van_gogh_response = f.get_all_person_info("Van Gogh")

Print some information from the dictionary:

In [3]:
print(f"Birthplace: {van_gogh_response['birth_place']}, deathplace: {van_gogh_response['death_place']}")
print(f"Birthyear: {f.find_year(van_gogh_response['birth_date'])}, deathdate: {f.find_year(van_gogh_response['death_date'])}")
print(f"Gender: {van_gogh_response['gender']}, citizenship: {van_gogh_response['citizenship']}, occupations: {str(van_gogh_response['occupation']).strip('[]')}")
print(); print("Work locations:")

places_str = f.get_places_with_years_from_response(van_gogh_response)
places_list = f.stringlist_to_list(places_str)
for place in places_list:
    name,period = place.replace(","," and ").split(":")
    print(f"{name}, between {period}")

Birthplace: Zundert, deathplace: Auvers-sur-Oise
Birthyear: 1853, deathdate: 1890
Gender: male, citizenship: Kingdom of the Netherlands, occupations: 'drawer', 'printmaker', 'painter'

Work locations:
Saint-Rémy-de-Provence, between 1889-1890
The Hague, between 1881-1883 and 1869-1873
Ramsgate, between 1876-1876
City of Brussels, between 1880-1881
Etten-Leur, between 1881-1881 and 1876-1876
Dordrecht, between 1877-1877
Nuenen, between 1883-1885
Paris, between 1875-1876 and 1886-1888
Auvers-sur-Oise, between 1890-1890
Van Gogh House, between 1883-1883
London, between 1873-1875
Amsterdam, between 1877-1878
Arles, between 1888-1889
Hoogeveen, between 1883-1883
Antwerp, between 1885-1886
Borinage, between 1878-1879
Tilburg, between 1866-1868
Maison Van Gogh, between 1879-1880
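The loop above splits each `place:years` string by hand. As a rough sketch of the same idea, the hypothetical helper below (not part of `functions`) turns one such string into a place name and a list of integer year ranges. It assumes the `Place:start-end,start-end` layout seen in the output and four-digit positive years.

```python
def parse_place_periods(entry):
    """Split 'Place:1881-1883,1869-1873' into (place, [(1881, 1883), (1869, 1873)])."""
    # rpartition splits on the last colon, so a colon inside the place name is preserved
    name, _, periods = entry.rpartition(":")
    ranges = []
    for period in periods.split(","):
        start, _, end = period.partition("-")
        ranges.append((int(start), int(end)))
    return name, ranges


print(parse_place_periods("The Hague:1881-1883,1869-1873"))
# ('The Hague', [(1881, 1883), (1869, 1873)])
```

Keeping the years as integers makes it easier to sort or filter locations by period later on.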


**NOTE**: It is generally better to use `get_all_person_info_improved`, which restricts queries to humans (otherwise a name might return a statue or a building instead of the person).
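To illustrate what "restricting to humans" can look like at the SPARQL level, here is a minimal sketch. The function `build_human_query` is hypothetical and is not necessarily how `get_all_person_info_improved` builds its query; it only shows the standard Wikidata pattern of requiring `wdt:P31 wd:Q5` (instance of: human) next to an exact label match.

```python
def build_human_query(name, lang="en"):
    """Return a SPARQL query for items labelled `name` that are instances of human (Q5)."""
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person WHERE {\n"
        f'  ?person rdfs:label "{escaped}"@{lang} ;\n'
        "          wdt:P31 wd:Q5 .  # P31 = instance of, Q5 = human\n"
        "}\n"
        "LIMIT 5"
    )


print(build_human_query("Vincent van Gogh"))
```

The resulting string can be pasted into the Wikidata Query Service, where the `wd:`, `wdt:` and `rdfs:` prefixes are predefined.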

Now for something bigger: using artists from the [PainterPalette](https://github.com/me9hanics/PainterPalette) dataset, let's select a few artists and collect temporal and geographical information about them, such as birth and death dates and places, and work locations with years:

In [5]:
import pandas as pd
import numpy as np

artists_wikiart = pd.read_csv("https://raw.githubusercontent.com/me9hanics/PainterPalette/main/datasets/wikiart_artists.csv")
artists_wikiart["death_place"] = None #None for strings
artists_wikiart["death_year"] = np.nan #NaN for floats
artists_wikiart["locations"] = None #This is to not have warnings from pandas.
artists_wikiart["locations_with_years"] = None

examples = artists_wikiart[(artists_wikiart["artist"]=="Vincent van Gogh") | (artists_wikiart["artist"].str.contains("Rembrandt"))].copy() #3 artists; copy so the .loc assignments below do not write to a slice

for index, artist in examples["artist"].items():
    response = f.get_person_info(artist)
    if response is None:
        print(f"Could not find {artist}")
        continue

    examples.loc[index, "death_place"] = response.get("death_place")
    examples.loc[index, "death_year"] = f.find_year(response.get("death_date"))
    examples.loc[index, "locations"] = f.get_places_from_response(response)
    examples.loc[index, "locations_with_years"] = f.get_places_with_years_from_response(response)

    if examples.loc[index, "death_place"] is None:
        print(f"Could not find death place for {artist}")
    if pd.isna(examples.loc[index, "death_year"]): #float column: a missing value is stored as NaN, never None
        print(f"Could not find death year for {artist}")
    if examples.loc[index, "locations"] is None:
        print(f"Could not find locations for {artist}")
    if examples.loc[index, "locations_with_years"] is None:
        print(f"Could not find locations with years for {artist}")

examples.drop(columns=["pictures_count","styles"])

| Unnamed: 0 | artist | movement | styles_extended | birth_place | birth_year | death_year | death_place | gender | citizenship | occupations | locations | locations_with_years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 997 | Rembrandt | Baroque | {Baroque:587},{Tenebrism:128},{Unknown:52} | Leiden | 1606.0 | 1669.0 | Amsterdam | male | Dutch Republic | painter, collector, art collector, etcher, pri... | ['Amsterdam', 'Leiden'] | ['Amsterdam:1623-1625,1631-1669', 'Leiden:1625... |
| 1046 | Vincent van Gogh | Post-Impressionism | {Cloisonnism:11},{Impressionism:2},{Japonism:1... | Zundert | 1853.0 | 1874.0 | Breda | male | Kingdom of the Netherlands | painter, printmaker, drawer, art dealer | ['Saint-Rémy-de-Provence', 'The Hague', 'Ramsg... | ['Saint-Rémy-de-Provence:1889-1890', 'The Hagu... |
| 2459 | Rembrandt Peale | Neoclassicism | {Neoclassicism:85},{Romanticism:1},{Unknown:1} | Pennsylvania | 1778.0 | 1860.0 | Philadelphia | male | United States of America | printmaker | ['Boston', 'London', 'Baltimore', 'Washington,... | [] |


## Querying multiple people with one request

Functions such as `get_multiple_people_all_info` look up all names in a single query, which is faster but can miss some people.<br>
To handle this, `get_multiple_people_all_info_fast_retry_missing` retries each missing name with a separate query, as we did before. For any names that are still missing, it then searches for labels in other languages.

In theory this gives the same or better results than calling `get_all_person_info` for each person separately: it checks for human instances first and tries different languages, and when most names match in the batch query it can be considerably faster.<br>
(Note that Wikidata also limits the number of queries per minute, and this approach needs far fewer queries.)
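The batch-then-retry idea can be sketched independently of Wikidata. In the example below, `fetch_with_fallback`, `fake_batch` and `fake_single` are hypothetical stand-ins, not functions from the `functions` module; real lookups would send SPARQL requests. The sketch only shows the control flow: one batched call first, then one call per name that is still missing.

```python
def fetch_with_fallback(names, batch_lookup, single_lookup):
    """Try one batched lookup, then retry each missing name individually."""
    results = {record["name"]: record for record in batch_lookup(names)}
    for name in names:
        if name not in results:
            record = single_lookup(name)  # one extra query per missing name
            if record is not None:
                results[name] = record
    still_missing = [name for name in names if name not in results]
    return list(results.values()), still_missing


# Stand-in lookups: the batch call only knows one name, the single call another
def fake_batch(names):
    return [{"name": n} for n in names if n == "Henri Matisse"]

def fake_single(name):
    return {"name": name} if name == "Jean Arp" else None


found, missing = fetch_with_fallback(
    ["Henri Matisse", "Jean Arp", "Nobody"], fake_batch, fake_single
)
print([r["name"] for r in found])  # ['Henri Matisse', 'Jean Arp']
print(missing)                     # ['Nobody']
```

With this structure the number of requests is one plus the number of names the batch query missed, rather than one per name.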

In [6]:
example_names = ["Bracha L. Ettinger", "M.F. Husain", "Gerhard von Graevenitz", "Karl Schmidt-Rottluff", "Inigo Manglano-Ovalle", "Jean-Pierre Raynaud",
                 "Laszlo Moholy-Nagy", "Jose de Guimaraes", "Beatriz González", "John McLaughlin", "Angelo de Sousa", "J.M.W. Turner", "Ha Chong-Hyun",
                 "Lee Quinones", "LeRoy Neiman", "Ayse Erkmen", "Jay DeFeo", "JCJ Vanderheyden", "Li Yuan-chia", "Ding Yi", "Ad Reinhardt", "Alexander Calder",
                 "Alexander Rodchenko", "Alexej von Jawlensky", "Anni Albers", "Anthony Caro", "Beauford Delaney", "Clyfford Still", "David Smith", "Eduardo Chillida",
                 "Erik Bulatov", "Francis Picabia", "Franz Marc", "Giacomo Balla", "Hans Richter", "Henri Matisse", "Henry Moore", "Hilma af Klint", "Jean Arp"]

example_returns = f.get_multiple_people_all_info_fast_retry_missing(example_names)

In [7]:
example_returns[:2]

[{'name': 'Gerhard von Graevenitz',
  'birth_place': 'Schilde',
  'birth_date': '1934-09-19T00:00:00Z',
  'death_date': '1983-08-20T00:00:00Z',
  'death_place': 'Switzerland',
  'gender': 'male',
  'citizenship': 'Germany',
  'occupation': ['painter', 'photographer'],
  'work_locations': [{'location': 'Amsterdam',
    'start_time': None,
    'end_time': None,
    'point_in_time': '1970-01-01T00:00:00Z'}]},
 {'name': 'Karl Schmidt-Rottluff',
  'birth_place': 'Rottluff',
  'birth_date': '1884-12-01T00:00:00Z',
  'death_date': '1976-08-10T00:00:00Z',
  'death_place': 'West Berlin',
  'gender': 'male',
  'citizenship': 'Germany',
  'occupation': ['lithographer',
   'drawer',
   'art collector',
   'designer',
   'graphic artist',
   'university teacher',
   'sculptor',
   'painter',
   'Q686932',
   'illustrator'],
  'work_locations': []}]
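Each `work_locations` entry above carries `start_time`, `end_time` and `point_in_time` fields, any of which may be `None`. As a hedged sketch, the hypothetical helpers below collapse one entry into the `place:start-end` form used earlier; the library's own handling may differ, and the code assumes timestamps that begin with a four-digit positive year.

```python
def year_of(timestamp):
    """Return the year of a timestamp like '1934-09-19T00:00:00Z', or None if absent."""
    if not timestamp:
        return None
    return int(timestamp[:4])


def summarize_location(entry):
    """Turn one work_locations entry into 'Place:start-end', or just 'Place' without years."""
    start = year_of(entry.get("start_time"))
    end = year_of(entry.get("end_time"))
    if start is None and end is None:
        # Fall back to a single point in time, used as both start and end
        start = end = year_of(entry.get("point_in_time"))
    if start is None and end is None:
        return entry["location"]
    # If only one bound is known, reuse it for the other
    start = end if start is None else start
    end = start if end is None else end
    return f"{entry['location']}:{start}-{end}"


entry = {"location": "Amsterdam", "start_time": None, "end_time": None,
         "point_in_time": "1970-01-01T00:00:00Z"}
print(summarize_location(entry))  # Amsterdam:1970-1970
```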

Let's see how many artists we did not gather information for:

In [8]:
print(f"Found {len(example_returns)} artists")
found_names = [instance['name'] for instance in example_returns]
missing = [name for name in example_names if name not in found_names]
print(f"Missing {len(missing)} artists: {missing}")

Found 37 artists
Missing 2 artists: ['Jose de Guimaraes', 'J.M.W. Turner']


We retrieved data for 37 of the 39 artists. One of the missing artists is José de Guimarães, who was missed because we queried an English-normalized form of his name; the other is J.M.W. Turner, whose name we queried in abbreviated form.<br>
This is a very reasonable result.

In fact, if we used `get_all_person_info` for J.M.W. Turner, we would get the wrong instance, not the painter!
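The José de Guimarães miss comes down to accents that were removed from the queried name. When comparing queried names with returned names, one possible approach is to strip accents on both sides first. The helpers below are hypothetical and use only the standard library; they help with matching names after the fact, and do not by themselves make a label query find the accented form.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_name(a, b):
    """Compare two names while ignoring accents and letter case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(same_name("Jose de Guimaraes", "José de Guimarães"))  # True
```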