<a href="https://colab.research.google.com/github/hawc2/wikidata/blob/main/Wikidata_SPARQL_Queries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Querying and Visualizing Wikidata Overview




## Instructions

This notebook guides you through querying and visualizing Wikidata.

# Install Packages

In [None]:
!pip install SPARQLWrapper
%load_ext google.colab.data_table 
import sys
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

Collecting SPARQLWrapper
  Downloading SPARQLWrapper-1.8.5-py3-none-any.whl (26 kB)
Collecting rdflib>=4.0
  Downloading rdflib-6.0.2-py3-none-any.whl (407 kB)
Collecting isodate
  Downloading isodate-0.6.0-py2.py3-none-any.whl (45 kB)
Installing collected packages: isodate, rdflib, SPARQLWrapper
Successfully installed SPARQLWrapper-1.8.5 isodate-0.6.0 rdflib-6.0.2


# Get Your Data

Use SPARQLWrapper to query the Wikidata SPARQL endpoint.

## Set up SPARQL Query

### Relevant Data

https://docs.google.com/spreadsheets/d/1grYsGIwp6yey0ZPdkFXJlzlwZSQ6-4mRHO2XoZYicSQ/edit#gid=0

### Potential Additional SPARQL Properties

1) Social media and website information.

2) Can we pull first and last names separately? P735 and P734 are the properties for given name and family name.

3) Geolocating with SPARQL.

4) What about sexual orientation/identity?

5) Tracing familial and professional connections: "trained by", "influenced by", family connections to other regions, regional cultural background.

6) No field is used often for "exhibiting". "Has works in collection" is a consistent field, but there is little on individual gallery shows or events.

7) Are there other institutions that hold these artists? For example, do Detroit/Michigan artists have connections with Philly? What connections do Philly artists have with other cities and with artists outside Philly?

8) Network connections: does the "significant person" field capture peers, colleagues, people who exhibited together, or artists who collaborated?

9) Can we complicate how we're defining ethnicity? synatra has some other properties.
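Several of the ideas above come down to adding more `OPTIONAL` clauses to the query. The following is a minimal sketch, not part of the original notebook: the helper `optional_clauses` and the variable names are hypothetical, P735 and P734 come from item 2, and P856 (official website) is one possible candidate for item 1.

```python
# Hypothetical helper: build OPTIONAL triple patterns for extra Wikidata properties.
# The variable names (givenName, familyName, website) are illustrative choices.
EXTRA_PROPERTIES = {
    "givenName": "P735",   # given name
    "familyName": "P734",  # family name
    "website": "P856",     # official website
}

def optional_clauses(properties, subject="?artist"):
    """Return one OPTIONAL line per property, ready to paste into the WHERE block."""
    lines = []
    for variable, pid in properties.items():
        lines.append(f"    OPTIONAL {{ {subject} wdt:{pid} ?{variable}. }}")
    return "\n".join(lines)

extra_block = optional_clauses(EXTRA_PROPERTIES)
print(extra_block)
```

Given name and family name are Wikidata items, so they would also need matching `rdfs:label` lines in the label service block and entries in the `SELECT` list before they show up as readable text.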

In [None]:
sparql.setQuery("""
SELECT
    ?artist ?artistLabel ?sexGenderLabel ?birthdayLabel ?birthdayPrecision ?deathDateLabel ?deathdayPrecision ?sexualOrientationLabel
    (group_concat(DISTINCT(?occupationLabel);separator=", ") as ?occupations)
    (group_concat(DISTINCT(?residenceLabel);separator=", ") as ?residences)
    (group_concat(DISTINCT(?educationLabel);separator=", ") as ?educations)
    (group_concat(DISTINCT(?employerLabel);separator=", ") as ?employers)
    (group_concat(DISTINCT(?birthPlaceLabel);separator=", ") as ?birthPlaces)
    (group_concat(DISTINCT(?birthPlaceGeoLabel);separator=", ") as ?birthPlacesGeo)
    (group_concat(DISTINCT(?deathPlaceLabel);separator=", ") as ?deathPlaces)
    (group_concat(DISTINCT(?deathPlaceGeoLabel);separator=", ") as ?deathPlacesGeo)
    (group_concat(DISTINCT(?significantPersonLabel);separator=", ") as ?significantPersons)
    (group_concat(DISTINCT(?influenceLabel);separator=", ") as ?influences)
    (group_concat(DISTINCT(?worksInCollectionLabel);separator=", ") as ?museums)
    
WHERE
{
    ?artist wdt:P5008 wd:Q94124522. # PMA African American artists
    ?artist wdt:P106 ?occupation.
    OPTIONAL { ?artist wdt:P21 ?sexGender. }
    OPTIONAL { ?artist wdt:P569 ?birthdayLabel. }
    OPTIONAL { ?artist p:P569/psv:P569 [ wikibase:timeValue ?birthdayLabel; wikibase:timePrecision ?birthdayPrecision ]. }
    OPTIONAL { ?artist wdt:P570 ?deathDateLabel. }
    OPTIONAL { ?artist p:P570/psv:P570 [ wikibase:timeValue ?deathDateLabel; wikibase:timePrecision ?deathdayPrecision ]. }
    OPTIONAL { ?artist wdt:P19 ?birthPlace. }
    OPTIONAL { ?artist wdt:P20 ?deathPlace. }
    OPTIONAL { ?artist wdt:P19 ?birthPlace. ?birthPlace wdt:P625 ?birthPlaceGeo}
    OPTIONAL { ?artist wdt:P20 ?deathPlace. ?deathPlace wdt:P625 ?deathPlaceGeo}
    OPTIONAL { ?artist wdt:P551 ?residence. }
    OPTIONAL { ?artist wdt:P69 ?education. }
    OPTIONAL { ?artist wdt:P108 ?employer. }
    OPTIONAL { ?artist wdt:P91 ?sexualOrientation. }
    OPTIONAL { ?artist wdt:P3342 ?significantPerson. }
    OPTIONAL { ?artist wdt:P737 ?influence. }
    OPTIONAL { ?artist wdt:P6379 ?worksInCollection. }
    SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "en". 
    ?artist rdfs:label ?artistLabel . 
    ?occupation rdfs:label ?occupationLabel .
    ?sexGender rdfs:label ?sexGenderLabel .
    ?birthPlace rdfs:label ?birthPlaceLabel .
    ?deathPlace rdfs:label ?deathPlaceLabel .
    ?residence rdfs:label ?residenceLabel .
    ?education rdfs:label ?educationLabel .
    ?employer rdfs:label ?employerLabel .
    ?sexualOrientation rdfs:label ?sexualOrientationLabel .
    ?significantPerson rdfs:label ?significantPersonLabel .
    ?influence rdfs:label ?influenceLabel .
    ?birthPlaceGeo rdfs:label ?birthPlaceGeoLabel .
    ?deathPlaceGeo rdfs:label ?deathPlaceGeoLabel .
    ?worksInCollection rdfs:label ?worksInCollectionLabel .
  }
}
GROUP BY ?artist ?artistLabel ?sexGenderLabel ?birthdayLabel ?deathDateLabel ?sexualOrientationLabel ?birthdayPrecision ?deathdayPrecision
ORDER BY ?artistLabel
""")

In [None]:
sparql.setReturnFormat(JSON)
results = sparql.query().convert()


EndPointInternalError: ignored
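SPARQLWrapper raises `EndPointInternalError` when the endpoint answers with an internal server error (usually HTTP 500). With a query this large, a timeout on the server side is one plausible cause. If the failure is transient, retrying after a pause can help. The sketch below is an assumption about how that could be done, not the notebook's own method; `run_with_retries` is a hypothetical helper. It catches any exception to stay self-contained, and in the notebook you would narrow that to `EndPointInternalError`.

```python
import time

def run_with_retries(run_query, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call run_query(); on failure wait base_delay, 2*base_delay, ... and try again."""
    last_error = None
    for attempt in range(attempts):
        try:
            return run_query()
        except Exception as error:  # in the notebook, narrow this to EndPointInternalError
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff between tries
    raise last_error
```

In this notebook the call would look like `results = run_with_retries(lambda: sparql.query().convert())`. Retrying will not rescue a query that is simply too heavy; in that case dropping some `OPTIONAL` clauses or splitting the query is the more likely fix.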

In [None]:
results

{'head': {'vars': ['artist',
   'artistLabel',
   'sexGenderLabel',
   'birthdayLabel',
   'birthdayPrecision',
   'deathDateLabel',
   'deathdayPrecision',
   'sexualOrientationLabel',
   'occupations',
   'residences',
   'educations',
   'employers',
   'birthPlaces',
   'birthPlacesGeo',
   'deathPlaces',
   'deathPlacesGeo',
   'significantPersons',
   'influences',
   'museums']},
 'results': {'bindings': [{'artist': {'type': 'uri',
     'value': 'http://www.wikidata.org/entity/Q89042076'},
    'artistLabel': {'type': 'literal',
     'value': 'A.J. Smith',
     'xml:lang': 'en'},
    'birthPlaces': {'type': 'literal', 'value': ''},
    'birthPlacesGeo': {'type': 'literal', 'value': ''},
    'birthdayLabel': {'datatype': 'http://www.w3.org/2001/XMLSchema#dateTime',
     'type': 'literal',
     'value': '1952-01-01T00:00:00Z'},
    'birthdayPrecision': {'datatype': 'http://www.w3.org/2001/XMLSchema#integer',
     'type': 'literal',
     'value': '9'},
    'deathPlaces': {'type': 'l

# Wrangle your Data

1) Why do we have multiple records/rows for some artists (John Woodrow Wilson, for instance)? More fields may need to be concatenated.

2) Standardize null values as NaN or blank.

3) Birthdate/deathdate values could also be simplified and clarified. The time component doesn't look worth keeping; *just simplify it to year*.

4) Set up geopandas to geolocate addresses.
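For item 3, the precision values returned by the query say how much of each timestamp is meaningful: in Wikidata's time model 9 means year, 10 means month and 11 means day. A minimal sketch of trimming a timestamp accordingly follows. `simplify_date` is a hypothetical helper, and it assumes positive four-digit years, which holds for the artists shown here.

```python
def simplify_date(value, precision):
    """Trim a Wikidata timestamp to the part its precision actually supports."""
    if not isinstance(value, str) or not value:
        return None                      # missing date
    try:
        level = int(float(precision))    # handles '9', 11 and 11.0 alike
    except (TypeError, ValueError):
        level = 11                       # assumption: treat unknown precision as day
    date_part = value.split("T")[0]      # '1952-01-01T00:00:00Z' -> '1952-01-01'
    if level <= 9:
        return date_part[:4]             # year only
    if level == 10:
        return date_part[:7]             # year and month
    return date_part                     # full date
```

Once the dataframe exists, this could be applied row by row, for example `df.apply(lambda r: simplify_date(r['birthdate'], r['birthdatePrecision']), axis=1)`.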

## Create Dataframe


In [None]:
df = pd.json_normalize(results['results']['bindings'])
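A variable that is unbound for every artist never appears in the bindings, so normalizing them directly can leave a column missing. One way around this, sketched here with a hypothetical `flatten_bindings` helper in plain Python, is to walk the variable list in `results['head']['vars']` and fill in `None` for anything absent or empty, which also standardizes null values in one pass.

```python
def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of flat dicts, one per row."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for name in variables:
            value = binding.get(name, {}).get("value")
            # Treat both missing variables and empty strings as null.
            row[name] = value if value not in ("", None) else None
        rows.append(row)
    return rows
```

`pd.DataFrame(flatten_bindings(results))` would then give columns named after the query variables (`artist`, `artistLabel`, ...) rather than `artist.value`, so the column selection that follows would need to be adjusted to match.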



In [None]:
cols = ['artist.value',
        'artistLabel.value',
        'sexGenderLabel.value',
        'birthdayLabel.value',
        'birthdayPrecision.value',
        'deathDateLabel.value',
        'deathdayPrecision.value',
        'birthPlaces.value',
        'birthPlacesGeo.value',
        'deathPlaces.value',
        'deathPlacesGeo.value',
        'residences.value',
        'educations.value',
        'occupations.value',
        'employers.value',
        'sexualOrientationLabel.value',
        'significantPersons.value',
        'influences.value',
        'museums.value']

In [None]:
df = df[cols]

In [None]:
df.columns = ['qnumber', 'name', 'gender', 'birthdate', 'birthdatePrecision', 'deathdate', 'deathdatePrecision', 'birthplace', 'birthplaceGeo', 'deathplace', 'deathplaceGeo', 'residence', 'education', 'occupation', 'employer', 'sexualOrientation', 'significantPerson', 'influence', 'museums']
df

Unnamed: 0,qnumber,name,gender,birthdate,birthdatePrecision,deathdate,deathdatePrecision,birthplace,birthplaceGeo,deathplace,deathplaceGeo,residence,education,occupation,employer,sexualOrientation,significantPerson,influence,museums
0,http://www.wikidata.org/entity/Q89042076,A.J. Smith,male,1952-01-01T00:00:00Z,9,,,,,,,,,"professor, artist, printmaker",University of Arkansas at Little Rock,,,,Philadelphia Museum of Art
1,http://www.wikidata.org/entity/Q4661979,Aaron Douglas,male,1899-05-26T00:00:00Z,11,1979-02-02T00:00:00Z,11,Topeka,Point(-95.67804 39.04833),Nashville,Point(-86.774444444 36.162222222),,"University of Kansas, University of Nebraska–L...","illustrator, painter, muralist",Fisk University,,,,"Metropolitan Museum of Art, National Gallery o..."
2,http://www.wikidata.org/entity/Q28858134,Akili Ron Anderson,male,1946-02-19T00:00:00Z,11,,,"Washington, D.C.",Point(-77.036666666 38.895),,,"Washington, D.C.",Howard University,"scenographer, visual artist, printmaker, photo...",,,,,Philadelphia Museum of Art
3,http://www.wikidata.org/entity/Q90042639,Alfred A. Smith,male,1896-09-17T00:00:00Z,11,1940-01-01T00:00:00Z,9,New York City,Point(-74.006015 40.712728),,,"New York City, Paris",,artist,,,,,Philadelphia Museum of Art
4,http://www.wikidata.org/entity/Q4727179,Alison Saar,female,1956-02-05T00:00:00Z,11,,,Los Angeles,Point(-118.24368 34.05223),,,,"Scripps College, Otis College of Art and Design","photographer, artist, illustrator, painter, sc...",,,,,"Studio Museum Harlem, San Francisco Museum of ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
220,http://www.wikidata.org/entity/Q28120443,William Majors,male,1930-07-21T00:00:00Z,11,1982-08-29T00:00:00Z,11,Indianapolis,Point(-86.158055555 39.768611111),Portsmouth,Point(-70.76075 43.07572),,Herron School of Art and Design,"artist, university teacher",,,,,"Metropolitan Museum of Art, Museum of Modern A..."
221,http://www.wikidata.org/entity/Q19757452,William Plummer,male,1873-01-01T00:00:00Z,9,1943-01-01T00:00:00Z,9,,,,,Smyth County,,"inventor, cabinetmaker",,,,,Philadelphia Museum of Art
222,http://www.wikidata.org/entity/Q8019143,William T. Williams,male,1942-07-17T00:00:00Z,11,,,Cross Creek Township,Point(-78.8997 35.0886),,,,"Yale University, Pratt Institute, Yale School ...","professor, artist, painter, printmaker",Brooklyn College,,,,"Museum of Modern Art, National Gallery of Art,..."
223,http://www.wikidata.org/entity/Q20861416,Willie Birch,male,1942-01-01T00:00:00Z,9,,,New Orleans,Point(-90.07507 29.95465),,,New Orleans,"Maryland Institute College of Art, Southern Un...","university teacher, artist",Touro College,,,,"National Gallery of Art, Philadelphia Museum o..."
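The `birthplaceGeo` and `deathplaceGeo` columns already hold coordinates as WKT text such as `Point(-95.67804 39.04833)`, where Wikidata writes longitude first and latitude second. A minimal sketch for turning that text into numbers, using a hypothetical `parse_points` helper and assuming plain decimal coordinates:

```python
import re

# Matches 'Point(<longitude> <latitude>)' with plain decimal numbers.
POINT_PATTERN = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")

def parse_points(text):
    """Return a list of (latitude, longitude) tuples found in a WKT string."""
    if not isinstance(text, str):
        return []                        # missing value
    # WKT order is longitude then latitude; swap to the more common (lat, lon).
    return [(float(lat), float(lon)) for lon, lat in POINT_PATTERN.findall(text)]
```

Because the query concatenates values with `", "`, a cell can contain more than one point; the function returns a list for that reason. It could be applied with `df['birthplaceGeo'].apply(parse_points)`.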


In [None]:
## Gathering duplicate entries so I can figure out which columns still need to be concatenated.
dupe = df[df.duplicated(subset=['name'], keep=False)]
dupe

Unnamed: 0,qnumber,name,gender,birthdate,birthdatePrecision,deathdate,deathdatePrecision,birthplace,birthplaceGeo,deathplace,deathplaceGeo,residence,education,occupation,employer,sexualOrientation,significantPerson,influence,museums
17,http://www.wikidata.org/entity/Q2893161,Beauford Delaney,male,1901-12-30T00:00:00Z,11,1979-03-26T00:00:00Z,11.0,Knoxville,Point(-83.95 35.966666666),14th arrondissement of Paris,Point(2.326888888 48.833022222),,"Harvard University, Austin-East High School",painter,,,,,"Museum of Fine Arts, Metropolitan Museum of Ar..."
18,http://www.wikidata.org/entity/Q2893161,Beauford Delaney,male,1901-12-31T00:00:00Z,11,1979-03-26T00:00:00Z,11.0,Knoxville,Point(-83.95 35.966666666),14th arrondissement of Paris,Point(2.326888888 48.833022222),,"Harvard University, Austin-East High School",painter,,,,,"Museum of Fine Arts, Metropolitan Museum of Ar..."
19,http://www.wikidata.org/entity/Q2893161,Beauford Delaney,male,1901-12-30T00:00:00Z,11,1979-03-25T00:00:00Z,11.0,Knoxville,Point(-83.95 35.966666666),14th arrondissement of Paris,Point(2.326888888 48.833022222),,"Harvard University, Austin-East High School",painter,,,,,"Museum of Fine Arts, Metropolitan Museum of Ar..."
20,http://www.wikidata.org/entity/Q2893161,Beauford Delaney,male,1901-12-31T00:00:00Z,11,1979-03-25T00:00:00Z,11.0,Knoxville,Point(-83.95 35.966666666),14th arrondissement of Paris,Point(2.326888888 48.833022222),,"Harvard University, Austin-East High School",painter,,,,,"Museum of Fine Arts, Metropolitan Museum of Ar..."
34,http://www.wikidata.org/entity/Q5083521,Charles Wilbert White,male,1918-04-02T00:00:00Z,11,1979-11-03T00:00:00Z,11.0,Chicago,Point(-87.627777777 41.881944444),Los Angeles,Point(-118.24368 34.05223),,School of the Art Institute of Chicago,"artist, painter, printmaker",,,,,"Metropolitan Museum of Art, Museum of Modern A..."
35,http://www.wikidata.org/entity/Q5083521,Charles Wilbert White,male,1918-04-02T00:00:00Z,11,1979-10-03T00:00:00Z,11.0,Chicago,Point(-87.627777777 41.881944444),Los Angeles,Point(-118.24368 34.05223),,School of the Art Institute of Chicago,"artist, painter, printmaker",,,,,"Metropolitan Museum of Art, Museum of Modern A..."
78,http://www.wikidata.org/entity/Q1374436,Henry Ossawa Tanner,male,1859-06-21T00:00:00Z,11,1937-05-25T00:00:00Z,11.0,Pittsburgh,Point(-80.0 40.441666666),6th arrondissement of Paris,Point(2.332233333 48.850530555),Henry O. Tanner House,"Académie Julian, Pennsylvania Academy of the F...","printmaker, photographer, painter, university ...",Clark University,,,,"National Gallery of Art, Art Institute of Chic..."
79,http://www.wikidata.org/entity/Q1374436,Henry Ossawa Tanner,male,1859-06-21T00:00:00Z,11,1937-05-24T00:00:00Z,11.0,Pittsburgh,Point(-80.0 40.441666666),6th arrondissement of Paris,Point(2.332233333 48.850530555),Henry O. Tanner House,"Académie Julian, Pennsylvania Academy of the F...","photographer, painter, university teacher, pri...",Clark University,,,,"National Gallery of Art, Art Institute of Chic..."
82,http://www.wikidata.org/entity/Q325076,Horace Pippin,male,1888-01-01T00:00:00Z,9,1946-07-05T00:00:00Z,11.0,West Chester,Point(-75.605 39.9586),West Chester,Point(-75.605 39.9586),Pennsylvania,,"painter, sculptor, soldier",National Guard of the United States,,,,"The Phillips Collection, Whitney Museum of Ame..."
83,http://www.wikidata.org/entity/Q325076,Horace Pippin,male,1888-01-01T00:00:00Z,9,1946-01-01T00:00:00Z,9.0,West Chester,Point(-75.605 39.9586),West Chester,Point(-75.605 39.9586),Pennsylvania,,"painter, sculptor, soldier",National Guard of the United States,,,,"The Phillips Collection, Whitney Museum of Ame..."
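In the rows above, the duplicates share a `qnumber` and differ mainly in `birthdate` or `deathdate`. Those variables are in the query's `GROUP BY`, so each distinct date recorded on Wikidata appears to produce its own row. One possible way to collapse them after the fact, sketched in plain Python with a hypothetical `collapse_duplicates` helper, is to merge rows by `qnumber` and keep every distinct value. It assumes missing values are `None` or empty strings.

```python
def collapse_duplicates(rows, key="qnumber"):
    """Merge rows that share the same key, joining differing values with ' | '."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for column, value in row.items():
            seen = target.setdefault(column, [])
            # Keep each distinct non-empty value once, in first-seen order.
            if value not in (None, "") and value not in seen:
                seen.append(value)
    return [
        {column: " | ".join(str(v) for v in values) for column, values in record.items()}
        for record in merged.values()
    ]
```

With the dataframe from this notebook, `pd.DataFrame(collapse_duplicates(df.to_dict('records')))` would give one row per artist, with conflicting dates kept side by side for manual review.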


In [None]:
dupe.to_csv('Dupe.csv', index=True)
from google.colab import files

## Download Data as CSV

In [None]:
from datetime import datetime
# Build the file name once so the download cell uses exactly the same path.
csv_name = 'Wikidata_{}.csv'.format(datetime.now().strftime("%Y-%m-%d_%Hh%Mm%Ss"))
df.to_csv(csv_name, index=True)

In [None]:
from google.colab import files
files.download(csv_name)

# Geocode Data

https://towardsdatascience.com/pythons-geocoding-convert-a-list-of-addresses-into-a-map-f522ef513fd6

In [None]:
!pip install geopandas
!pip install geopy

In [None]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="sample app")

In [None]:
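# 2 - create location column (rate-limited: Nominatim's usage policy allows at most 1 request per second)
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df['location'] = df['birthplace'].apply(lambda place: geocode(place) if isinstance(place, str) and place else None)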

In [None]:
df['location']
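Many artists share a birthplace, so geocoding row by row sends the same request repeatedly. A small cache avoids that. The sketch below wraps any geocoding function; `make_cached_geocoder` is a hypothetical helper, not part of geopy.

```python
def make_cached_geocoder(geocode):
    """Wrap a geocoding function so each distinct place name is looked up once."""
    cache = {}

    def cached(place):
        if not isinstance(place, str) or not place.strip():
            return None                  # skip blanks and missing values
        if place not in cache:
            cache[place] = geocode(place)
        return cache[place]

    return cached
```

In this notebook it could be used as `df['location'] = df['birthplace'].apply(make_cached_geocoder(geolocator.geocode))`, which issues one request per distinct place name instead of one per row.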

In [None]:
# 3 - create latitude, longitude and altitude from location column (returns tuple)
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else (None, None, None))
# 4 - split point column into latitude, longitude and altitude columns
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)