**Use LOD to produce a chart**

Our triplestore can be used as a basis for data visualisation

An example question could be "what are the film bases for each film item per film work?"

To begin with, we reuse the same Python functions as before

In [1]:
import altair
import pandas
import pydash
import requests

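def value_extract(row, col):
    """Extract the "value" entry from the dictionary held in row[col]."""
    return pydash.get(row[col], "value")

def sparql_query(query):
    """Send a SPARQL request and return the results as a dataframe."""
    # The endpoint is plain http, so no TLS options are needed.
    r = requests.post('http://138.197.180.196:3030/test-data/sparql', data={'query': query})
    r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
    data = pydash.get(r.json(), "results.bindings")
    data = pandas.DataFrame.from_dict(data)
    # Each cell is a dictionary such as {"type": ..., "value": ...}; keep only the value.
    for x in data.columns:
        data[x] = data.apply(value_extract, col=x, axis=1)
    return data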
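
The endpoint wraps every cell of its JSON answer in a small dictionary, which is why a value-extraction step is needed. As a rough illustration, the sketch below does the same flattening with plain dictionaries and no network access. The names sample_response and flatten_bindings are made up for this example, and the sample is hand-written to follow the standard SPARQL JSON results layout rather than copied from the server.

```python
# A hand-written response in the SPARQL JSON results layout (an assumption
# about what the endpoint returns; nothing here is fetched from the server).
sample_response = {
    "head": {"vars": ["film_work_url", "film_item_url"]},
    "results": {
        "bindings": [
            {
                "film_work_url": {
                    "type": "uri",
                    "value": "http://www.wikidata.org/entity/Q2316927",
                },
                "film_item_url": {
                    "type": "uri",
                    "value": "http://my-archive.org/acetate_print_of_simple_men",
                },
            },
        ]
    },
}

def flatten_bindings(response):
    """Reduce each binding to a plain {variable: value} dictionary."""
    bindings = response.get("results", {}).get("bindings", [])
    return [{var: cell.get("value") for var, cell in row.items()} for row in bindings]

rows = flatten_bindings(sample_response)
print(rows[0]["film_work_url"])
```

Each dictionary in rows corresponds to one row of the dataframe that sparql_query builds.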

Write a query which gets a list of all "film work" and "film item" relationships

I have spread the SPARQL query over multiple lines because we are now going to be building more complex queries

In [2]:
dataframe = sparql_query(
    '''
    select ?film_work_url ?film_item_url 
    where { 
        ?film_work_url <http://my-archive.org/has_film_item> ?film_item_url .
    }
    '''
    ) 

dataframe.head()

,film_work_url,film_item_url
0,http://www.wikidata.org/entity/Q2316927,http://my-archive.org/acetate_print_of_simple_men
1,http://www.wikidata.org/entity/Q455552,http://my-archive.org/polyester_print_of_amateur


?film_work_url and ?film_item_url are called "variables": each one holds whatever element of a matching statement sits in that position

The same rows are returned whatever names the variables are given, although arbitrary names make both the query and the result confusing to read

In [3]:
dataframe = sparql_query(
    '''
    select ?banana ?grapefruit 
    where { 
        ?banana <http://my-archive.org/has_film_item> ?grapefruit .
    }
    '''
    ) 

dataframe.head()

,banana,grapefruit
0,http://www.wikidata.org/entity/Q2316927,http://my-archive.org/acetate_print_of_simple_men
1,http://www.wikidata.org/entity/Q455552,http://my-archive.org/polyester_print_of_amateur
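
To see why the names make no difference, it can help to imitate the matching in plain Python. The sketch below is only a toy stand-in for what a triplestore does, not how the server is implemented; the match function and the TRIPLES list are invented for this example, with the two statements copied from the results above.

```python
HAS_FILM_ITEM = "http://my-archive.org/has_film_item"

# Two statements copied from the results above, as (subject, predicate, object).
TRIPLES = [
    ("http://www.wikidata.org/entity/Q2316927", HAS_FILM_ITEM,
     "http://my-archive.org/acetate_print_of_simple_men"),
    ("http://www.wikidata.org/entity/Q455552", HAS_FILM_ITEM,
     "http://my-archive.org/polyester_print_of_amateur"),
]

def match(pattern, triples):
    """Return one {variable: value} dictionary per triple that fits the pattern.

    Terms starting with "?" are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No break happened, so every term fitted this triple.
            results.append(binding)
    return results

sensible = match(("?film_work_url", HAS_FILM_ITEM, "?film_item_url"), TRIPLES)
fruity = match(("?banana", HAS_FILM_ITEM, "?grapefruit"), TRIPLES)

# Compare the values only, ignoring the variable names.
print([list(b.values()) for b in sensible] == [list(b.values()) for b in fruity])
```

The variable name only decides the column header; the position of the variable in the pattern decides what it holds.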


We can also make use of our statements to return labels as well as URLs

Here we are not just filtering the graph: we are pulling out different statements and joining them together wherever they share a variable

In [4]:
dataframe = sparql_query(
    '''
    select ?film_work_url ?film_work ?film_item_url ?film_item 
    where { 
        ?film_work_url <http://my-archive.org/has_film_item> ?film_item_url .
        ?film_work_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_work .
        ?film_item_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_item .
    }
    '''
    ) 

dataframe.head()

,film_work_url,film_work,film_item_url,film_item
0,http://www.wikidata.org/entity/Q2316927,Simple Men,http://my-archive.org/acetate_print_of_simple_men,Acetate Print of Simple Men
1,http://www.wikidata.org/entity/Q455552,Amateur,http://my-archive.org/polyester_print_of_amateur,Polyester Print of Amateur
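
The joining in the query above happens because the same variable appears in more than one line of the "where" block. As a rough sketch of the idea (a toy model, not the triplestore's actual algorithm), the plain Python below applies the three patterns one after another and keeps only the combinations where shared variables agree. The extend and join functions are invented for this example, and the label statements are inferred from the result table above.

```python
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HAS_FILM_ITEM = "http://my-archive.org/has_film_item"
WORK = "http://www.wikidata.org/entity/Q2316927"
ITEM = "http://my-archive.org/acetate_print_of_simple_men"

# Statements inferred from the first row of the result above.
TRIPLES = [
    (WORK, HAS_FILM_ITEM, ITEM),
    (WORK, LABEL, "Simple Men"),
    (ITEM, LABEL, "Acetate Print of Simple Men"),
]

def extend(binding, pattern, triple):
    """Fit one triple to one pattern, respecting the variables bound so far."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # clashes with an earlier binding
        elif term != value:
            return None  # a fixed term does not match
    return new

def join(patterns, triples):
    """Apply the patterns in turn, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = extend(binding, pattern, triple)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings

rows = join(
    [
        ("?film_work_url", HAS_FILM_ITEM, "?film_item_url"),
        ("?film_work_url", LABEL, "?film_work"),
        ("?film_item_url", LABEL, "?film_item"),
    ],
    TRIPLES,
)
print(rows[0]["?film_work"], "/", rows[0]["?film_item"])
```

Because ?film_work_url is already bound by the first pattern, the second pattern can only pick up the label belonging to that film work.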


Add the item base and the base label

In [5]:
dataframe = sparql_query(
    '''
    select ?film_work_url ?film_work ?film_item_url ?film_item ?film_base_url ?film_base
    where { 
        ?film_work_url <http://my-archive.org/has_film_item> ?film_item_url .
        ?film_work_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_work .
        ?film_item_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_item .
        ?film_item_url <http://www.wikidata.org/entity/P186> ?film_base_url .
        ?film_base_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_base .
    }
    '''
    ) 

dataframe.head()

,film_work_url,film_work,film_item_url,film_item,film_base_url,film_base
0,http://www.wikidata.org/entity/Q2316927,Simple Men,http://my-archive.org/acetate_print_of_simple_men,Acetate Print of Simple Men,http://www.wikidata.org/entity/Q124686,Acetate
1,http://www.wikidata.org/entity/Q455552,Amateur,http://my-archive.org/polyester_print_of_amateur,Polyester Print of Amateur,http://www.wikidata.org/entity/Q188245,Polyester


The variables in the "select" area can be reduced to only those data elements which we are interested in

Here the question was about the "film base" of each "film item", grouped per "film work", so the film work label and the film base label are the only columns we need

In [6]:
dataframe = sparql_query(
    '''
    select ?film_work ?film_base 
    where { 
        ?film_work_url <http://my-archive.org/has_film_item> ?film_item_url .
        ?film_work_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_work .
        ?film_item_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_item .
        ?film_item_url <http://www.wikidata.org/entity/P186> ?film_base_url .
        ?film_base_url <http://www.w3.org/2000/01/rdf-schema#label> ?film_base .
    }
    '''
    ) 

dataframe.head()

,film_work,film_base
0,Simple Men,Acetate
1,Amateur,Polyester
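
As the queries grow, the repeated full URLs start to crowd out the variables. SPARQL allows PREFIX declarations to shorten them. The sketch below only builds the query text and does not send it; it assumes the endpoint accepts standard SPARQL prefixes, and the prefix name "arch" is made up for this example ("rdfs" and "wd" are conventional but equally arbitrary). The expand helper is hypothetical and is there only to show that a prefixed name stands for the same URL as before.

```python
# Prefix names are arbitrary labels; "arch" is invented for this example.
PREFIXES = {
    "arch": "http://my-archive.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "wd": "http://www.wikidata.org/entity/",
}

# One "PREFIX name: <url>" declaration per line, placed before the select.
header = "\n".join(f"PREFIX {name}: <{url}>" for name, url in PREFIXES.items())

query = header + """
select ?film_work ?film_base
where {
    ?film_work_url arch:has_film_item ?film_item_url .
    ?film_work_url rdfs:label ?film_work .
    ?film_item_url rdfs:label ?film_item .
    ?film_item_url wd:P186 ?film_base_url .
    ?film_base_url rdfs:label ?film_base .
}
"""

def expand(prefixed_name):
    """Turn a prefixed name such as rdfs:label back into its full URL."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local

print(query)
print(expand("wd:P186"))
```

The resulting string could be passed to sparql_query in place of the longer query text.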


Now that we have retrieved the required data into a table, we can use a Python library to draw the chart

In [7]:
altair.Chart(dataframe).mark_bar().encode(
    y='film_work',
    x='count(film_base)',
    color='film_base'
)
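
The count(film_base) encoding asks the chart to count rows for each combination of film work and film base. As a hedged illustration of that aggregation, the sketch below performs the same count with the standard library on the two rows displayed above; the rows list is typed in by hand for this example rather than taken from the dataframe.

```python
from collections import Counter

# The two rows shown in the table above, typed in by hand.
rows = [
    {"film_work": "Simple Men", "film_base": "Acetate"},
    {"film_work": "Amateur", "film_base": "Polyester"},
]

# One counter entry per (film work, film base) pair: the length of each
# coloured bar segment in the chart.
counts = Counter((row["film_work"], row["film_base"]) for row in rows)

for (film_work, film_base), n in sorted(counts.items()):
    print(f"{film_work}: {n} x {film_base}")
```

With one film item per film work, every bar has length one; a film work with several items would get a longer bar, split by colour if the bases differ.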