# Wikidata Project

This project explores data from Wikidata's open knowledge graph using SPARQL, then analyzes and visualizes it in Python with the matplotlib and plotly graphing libraries.

There are two sections in this notebook.

The first section looks at the musical tonalities of all works available in Wikidata and the movements their composers belong to. I was curious about the relationship between musical movements and tonalities and wanted to use this data to answer questions such as "what were the most popular tonalities during the Romantic music period?", "what is the tonality landscape like for works of the First Viennese School?", and "which movements had the most pieces in D minor?". The queries and resulting graphs attempt to address these questions.

The second section displays a map of indigenous languages in what is now the contiguous United States. See the sections below for details on how this is done.

## Getting Started
The `requirements.txt` file has been updated to include the `plotly` and `ipywidgets` packages. Note that `ipywidgets` also needs to be enabled as a Jupyter extension. Binder should handle this through the `postBuild` file, but for reference, the following command enables the extension:

```
jupyter nbextension enable --py widgetsnbextension
```

Import the required Python packages.

In [1]:
import requests
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from ipywidgets import interact

In [2]:
#from ipywidgets import interact, interactive, fixed, interact_manual
#import ipywidgets as widgets

## Wikidata Query Setup

The code below is unchanged from the example.

Define the Wikidata SPARQL endpoint. Numerous query examples using this endpoint can be found on the [Wikidata examples page](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples).

In [3]:
wikidata_endpoint = 'https://query.wikidata.org/sparql'

`do_query` is a simple helper function that sets the HTTP headers and submits the request using the excellent Python [requests](https://requests.readthedocs.io/en/master/) package. It raises an exception on any non-200 response and otherwise returns the parsed JSON.

In [4]:
def do_query(query):
    rsp = requests.post(
        wikidata_endpoint,
        data=query,
        headers={
            'Content-type': 'application/sparql-query',
            'Accept': 'application/sparql-results+json',
            'User-Agent': 'https://github.com/JSTOR-Labs/sw-dev-project'
        }
    )
    if rsp.status_code != 200:
        raise Exception(f"Query failed with status code {rsp.status_code}.")
    return rsp.json()
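
For reference, the JSON that `do_query` returns nests each value under `results`, then `bindings`, then the variable name, then `value`. The following is a minimal sketch of flattening that structure. `flatten_bindings` is a hypothetical helper that is not part of the original notebook, and the sample response is hand-written to mirror the format rather than fetched from Wikidata.

```python
def flatten_bindings(response):
    """Turn a SPARQL JSON response into a list of {variable: value} dicts."""
    rows = []
    for binding in response['results']['bindings']:
        # Each binding maps a variable name to a dict holding 'type' and 'value'.
        rows.append({var: cell['value'] for var, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON results format (not live data).
sample_response = {
    'head': {'vars': ['movement', 'movementLabel']},
    'results': {'bindings': [
        {
            'movement': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8361'},
            'movementLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'baroque music'},
        },
    ]},
}

rows = flatten_bindings(sample_response)
# The QID is the last path segment of the entity URI.
qid = rows[0]['movement'].rsplit('/', 1)[-1]
print(rows[0]['movementLabel'], qid)
```

The cells below index into `['results']['bindings']` directly, which is equivalent; a helper like this only becomes worthwhile when several queries share the same post-processing.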


# Tonality in Musical Movements



In [97]:
movements_dict = {}
movements_tonalities = {}

In [98]:
#get all movements
def movements():
    print('\nMovements with tonalities')
    
    query = '''
        SELECT ?movement ?movementLabel
        WHERE
        {
          ?work wdt:P826 ?tonality . #find all works that have tonality
          ?work wdt:P86 ?composer . #find composer of work
          ?composer wdt:P135 ?movement . #find musical movement of composer

          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
        }
        GROUP BY ?movement ?movementLabel
    '''
        
    results = do_query(query)['results']['bindings']
    
    for m in results:
        #split URI string to get QID
        qid = m['movement']['value'].split('/')[-1]
        movements_dict[m['movementLabel']['value']] = qid
        
    print(movements_dict)

In [99]:
movements()


Movements with tonalities
{'Classical period': 'Q17723', 'Romantic music': 'Q207591', 'Western classical music': 'Q9730', 'baroque music': 'Q8361', 'Impressionism in music': 'Q837182', 'Harlem Renaissance': 'Q829895', 'First Viennese School': 'Q702207', 'Russian symbolism': 'Q1879488', 'musical modernism': 'Q2426218', 'Romanticism': 'Q37068', 'avant-garde': 'Q102932', '20th-century classical music': 'Q1338153', 'jazz': 'Q8341', 'vocal': 'Q2529757', 'glam rock': 'Q76092', 'new wave': 'Q187760', 'art rock': 'Q217467', 'pop rock': 'Q484641', 'blue-eyed soul': 'Q885561', 'art pop': 'Q25094849', 'social contract': 'Q1326430', 'classicism': 'Q170292'}


In [100]:
#get counts of tonalities for each movement
def tonality_counts():
    print('\nCount of tonalities by movement')

    for qid in movements_dict.values():
        query = '''
            SELECT ?tonalityLabel (COUNT(?tonalityLabel) as ?count)
                WHERE
                {
                  ?composer wdt:P135 wd:%s . #find all composers of movement
                  ?work wdt:P86 ?composer . #find works of composers
                  ?work wdt:P826 ?tonality . #find tonalities of works

                  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
                }
                GROUP BY ?tonalityLabel
                ORDER BY DESC(?count)
        ''' % qid
        
        results = do_query(query)['results']['bindings']
        
        movements_tonalities[qid] = {}
        for t in results:
            movements_tonalities[qid][t['tonalityLabel']['value']] = int(t['count']['value'])
            
    print(movements_tonalities)

In [101]:
tonality_counts()


Count of tonalities by movement
{'Q17723': {'C major': 90, 'D major': 83, 'F major': 66, 'E-flat major': 61, 'G major': 55, 'B-flat major': 55, 'A major': 33, 'C minor': 29, 'D minor': 12, 'G minor': 12, 'A minor': 8, 'F minor': 6, 'B major': 4, 'E major': 3, 'E minor': 3, 'C-sharp minor': 2, 'A-flat major': 2, 'B minor': 1, 'F-sharp major': 1, 'E-flat minor': 1}, 'Q207591': {'E-flat major': 56, 'C minor': 49, 'D minor': 41, 'B-flat major': 39, 'C major': 38, 'F major': 38, 'D major': 37, 'A minor': 34, 'G major': 33, 'A-flat major': 33, 'G minor': 33, 'A major': 30, 'F minor': 28, 'E minor': 26, 'C-sharp minor': 22, 'B minor': 18, 'E major': 17, 'D-flat major': 16, 'F-sharp minor': 13, 'B major': 12, 'E-flat minor': 11, 'F-sharp major': 9, 'B-flat minor': 9, 'G-flat major': 8, 'G-sharp minor': 6, 'atonality': 1}, 'Q9730': {'A-flat major': 27, 'G minor': 24, 'E-flat major': 20, 'F minor': 19, 'C major': 19, 'C minor': 18, 'A minor': 18, 'F major': 18, 'C-sharp minor': 16, 'E major': 1
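
Once `movements_tonalities` is populated, a question such as "which movements had the most pieces in D minor?" reduces to a lookup in the nested dictionary. The following is a minimal sketch, assuming the dictionary has the shape shown in the output above. `rank_movements_by_tonality` is a hypothetical helper, and the sample data is a small excerpt of the printed counts.

```python
def rank_movements_by_tonality(counts_by_movement, labels_by_qid, tonality):
    """Return (movement label, count) pairs for one tonality, highest count first."""
    ranked = []
    for qid, counts in counts_by_movement.items():
        count = counts.get(tonality, 0)
        if count:
            # Fall back to the QID when no label is known for it.
            ranked.append((labels_by_qid.get(qid, qid), count))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Excerpt of the counts printed above (Classical period and Romantic music).
sample_counts = {
    'Q17723': {'C major': 90, 'D major': 83, 'D minor': 12},
    'Q207591': {'E-flat major': 56, 'C minor': 49, 'D minor': 41},
}
# movements_dict maps label -> QID, so invert it for display.
sample_movements = {'Classical period': 'Q17723', 'Romantic music': 'Q207591'}
labels_by_qid = {qid: label for label, qid in sample_movements.items()}

ranking = rank_movements_by_tonality(sample_counts, labels_by_qid, 'D minor')
print(ranking)
```

Raw counts favor movements with more catalogued works, so dividing each count by the movement's total may give a fairer comparison.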

In [107]:
def plot(id):
    print('Tonality Count')
    for tonality, count in movements_tonalities[id].items():
        print('{} {}'.format(tonality, count))

    
    plt.title('Tonalities')
    plt.pie(movements_tonalities[id].values(), labels=movements_tonalities[id].keys(), shadow=True)
    plt.axis('equal')
    plt.subplots_adjust(0,0,1,1)
    plt.show()
    
#pass the label -> QID mapping directly so the dropdown shows movement names
interact(plot, id=movements_dict)


interactive(children=(Dropdown(description='id', options={'Classical period': 'Q17723', 'Romantic music': 'Q20…

<function __main__.plot(id)>

In [103]:
import random

def plotly():
    #randomly pick 5 movements
    sample = random.sample(list(movements_dict.items()), 5)
    
    print(sample)
    print(movements_tonalities[sample[0][1]].keys())
    print(movements_tonalities[sample[0][1]].values())

    #one bar trace per sampled movement, so all 5 are plotted
    fig = go.Figure(data=[
        go.Bar(name=label, x=list(movements_tonalities[qid].keys()), y=list(movements_tonalities[qid].values()))
        for label, qid in sample
    ])
    fig.update_layout(barmode='stack')
    fig.show()

In [106]:
plotly()

[('Harlem Renaissance', 'Q829895'), ('20th-century classical music', 'Q1338153'), ('Russian symbolism', 'Q1879488'), ('Romanticism', 'Q37068'), ('art rock', 'Q217467')]
dict_keys(['G major', 'C major', 'F minor', 'C minor', 'A-flat major', 'B-flat minor', 'D minor', 'B-flat major', 'E-flat minor'])
dict_values([3, 3, 2, 2, 2, 2, 1, 1, 1])


In this example, four separate queries are used to count entities that are an `instance of` (property P31) four types of academic journal. The same information could have been requested in a single SPARQL query using a `GROUP BY` clause, but Wikidata has become so large (now ~90M entities) that queries of that kind often time out before completing. More targeted queries are sometimes required to work around these timeouts.
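
The per-movement loop earlier in this section follows the same targeted-query pattern: one small query per movement QID instead of one large grouped query. The following is a minimal sketch of building such queries, assuming it is worth checking the QID format before interpolating it into the SPARQL text. `build_tonality_query`, `QUERY_TEMPLATE`, and `QID_PATTERN` are hypothetical names that are not part of the original notebook.

```python
import re

# Wikidata item identifiers are 'Q' followed by a positive integer.
QID_PATTERN = re.compile(r'Q[1-9]\d*')

QUERY_TEMPLATE = '''
SELECT ?tonalityLabel (COUNT(?tonalityLabel) AS ?count)
WHERE
{
  ?composer wdt:P135 wd:%s .
  ?work wdt:P86 ?composer .
  ?work wdt:P826 ?tonality .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?tonalityLabel
ORDER BY DESC(?count)
'''


def build_tonality_query(qid):
    """Return the tonality-count query for a single movement QID."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f'not a Wikidata QID: {qid!r}')
    return QUERY_TEMPLATE % qid


print(build_tonality_query('Q8361'))
```

Each generated string can be passed to `do_query` as in the cells above.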




# Map of Languages in the Contiguous United States

In [87]:
mapdata = {
    'lat': [],
    'lon': [],
    'text': [],
    'count_size': [],
    'count_color': [],
}

In [88]:
def languages_query():
    print('\nLanguages in the Contiguous United States:')
    print('querying...')
    
    query = '''
        SELECT ?language ?languageLabel ?coordinates ?statecount

        WHERE {
          ?language wdt:P625 ?coordinates . #get coordinates of language

          #subquery
          { SELECT ?language (COUNT(?state) as ?statecount)
                   WHERE {
                      ?language wdt:P31 wd:Q34770 . #find languages
                      ?language wdt:P2341 ?state . #that are indigenous to a state
                      ?state wdt:P361 wd:Q578170 . #a state that is a part of the contiguous united states
                     }

            GROUP BY ?language
          }
          #end of subquery

          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

          }
    '''
    
    results = do_query(query)['results']['bindings']
    
    for lang in results:
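        #append coordinates: values are WKT literals such as 'Point(-85.0 44.5)', longitude first
        coord = lang['coordinates']['value'][6:-1]
        lon_coord, lat_coord = coord.split()
        
        #store as floats so plotly treats them as numeric coordinates
        mapdata['lat'].append(float(lat_coord))
        mapdata['lon'].append(float(lon_coord))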
        
        #append text (language label)
        text = lang['languageLabel']['value']
        mapdata['text'].append(text)
        
        #append count
        count = int(lang['statecount']['value'])
        mapdata['count_size'].append(count+6)
        mapdata['count_color'].append(count*200)

    print('...done')
    print('\nMapdata results:')
    print(mapdata)


In [89]:
languages_query()


Languages in the contiguous United States:

Query results:
[{'language': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q56749'}, 'coordinates': {'datatype': 'http://www.opengis.net/ont/geosparql#wktLiteral', 'type': 'literal', 'value': 'Point(-85.0 44.5)'}, 'statecount': {'datatype': 'http://www.w3.org/2001/XMLSchema#integer', 'type': 'literal', 'value': '4'}, 'languageLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Potawatomi'}}, {'language': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q12714767'}, 'coordinates': {'datatype': 'http://www.opengis.net/ont/geosparql#wktLiteral', 'type': 'literal', 'value': 'Point(-83.0 43.0)'}, 'statecount': {'datatype': 'http://www.w3.org/2001/XMLSchema#integer', 'type': 'literal', 'value': '4'}, 'languageLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Mesquakie language'}}, {'language': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q1185119'}, 'coordinates': {'datatype': 'http://www.opengis.net/ont/ge
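
The `coordinates` values in the results above are WKT literals of the form `Point(longitude latitude)`. The following is a minimal sketch of parsing them with a regular expression instead of fixed string slicing. `parse_wkt_point` and `WKT_POINT` are hypothetical names that are not part of the original notebook, and the sketch assumes only simple two-dimensional points appear.

```python
import re

# Matches 'Point(<lon> <lat>)'; the numeric check is left to float().
WKT_POINT = re.compile(r'^Point\(\s*(\S+)\s+(\S+)\s*\)$', re.IGNORECASE)


def parse_wkt_point(literal):
    """Return (lat, lon) as floats from a WKT literal such as 'Point(-85.0 44.5)'."""
    match = WKT_POINT.match(literal.strip())
    if match is None:
        raise ValueError(f'not a WKT point: {literal!r}')
    # WKT puts longitude first, so swap the order for the return value.
    lon, lat = (float(group) for group in match.groups())
    return lat, lon


lat, lon = parse_wkt_point('Point(-85.0 44.5)')
print(lat, lon)
```

Returning floats rather than strings keeps the values ready for `go.Scattergeo`, and the explicit error makes an unexpected literal easier to spot than a silent bad slice.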

In [90]:
#display map
def display_map(data):
    fig = go.Figure(data=go.Scattergeo(
        lon = data['lon'],
        lat = data['lat'],
        text = data['text'],
        mode = 'markers',
        
        marker = dict(
            size=data['count_size'],
            color=data['count_color'],
            )
        ))

    fig.update_layout(
            title = 'Languages in Contiguous United States',
            geo_scope = 'north america'
        )
    fig.show()

In [91]:
display_map(mapdata)