# Welcome to Jupyter!

This repo contains an introduction to [Jupyter](https://jupyter.org) and [IPython](https://ipython.org).

Outline of some basics:

* [Notebook Basics](../examples/Notebook/Notebook%20Basics.ipynb)
* [IPython - beyond plain python](../examples/IPython%20Kernel/Beyond%20Plain%20Python.ipynb)
* [Markdown Cells](../examples/Notebook/Working%20With%20Markdown%20Cells.ipynb)
* [Rich Display System](../examples/IPython%20Kernel/Rich%20Output.ipynb)
* [Custom Display logic](../examples/IPython%20Kernel/Custom%20Display%20Logic.ipynb)
* [Running a Secure Public Notebook Server](../examples/Notebook/Running%20the%20Notebook%20Server.ipynb#Securing-the-notebook-server)
* [How Jupyter works](../examples/Notebook/Multiple%20Languages%2C%20Frontends.ipynb) to run code in different languages.

You can also get this tutorial and run it on your laptop:

    git clone https://github.com/ipython/ipython-in-depth

Install IPython and Jupyter:

with [conda](https://www.anaconda.com/download):

    conda install ipython jupyter

with pip:

    # first, always upgrade pip!
    pip install --upgrade pip
    pip install --upgrade ipython jupyter

Start the notebook in the tutorial directory:

    cd ipython-in-depth
    jupyter notebook

In [2]:
import sys
!{sys.executable} -m pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
import numpy as np
import datetime
import func_lib

endpoint_url = "https://query.wikidata.org/sparql"
item = "item"


class Relation:
    """
    The class returned when createRelation is called.
    It stores the generated SPARQL query as a string field (query_str).
    Call Relation.query() to run the query and build the DataFrame.
    """

    def __init__(self, entity_id: str, property_id: str, isSubject: bool, rowVerbose: bool,
                 colVerbose: bool, time_property: str, time: str, name: str, label: bool, limit=10000):
        self.entity_id = entity_id
        self.query_str = ""
        self.dic = {}
        self.result_dic = {"Entity ID": []}
        self.df = pd.DataFrame()
        self.count = 0
        self.time_property = time_property
        self.time = time
        self.limit = limit
        self.focus = "Entity ID"
        if property_id:
            self.extend(property_id, isSubject, name, rowVerbose, colVerbose, limit, time_property, time, label)

    def generate_html(self, name: str):
        html = (self.df).to_html()
        text_file = open(name, "w", encoding='utf-8')
        text_file.write(html)
        text_file.close()

    def query(self, require=None):
        if self.query_str == "":
            self.result_dic = {"Entity ID": ['http://www.wikidata.org/entity/' + str(self.entity_id)]}
            return self.result_dic
        results = get_results(endpoint_url, self.query_str)
        result_dict = {"Entity ID": ['http://www.wikidata.org/entity/' + str(self.entity_id)]}
        for i in range(1, self.count + 1):
            result_dict[self.dic[i]["name"] + '_' + self.dic[i]['property_id']] = []
            if self.dic[i]["colVerbose"]:
                result_dict[self.dic[i]["name"] + '_rank_' + self.dic[i]['property_id'] + '_rank'] = []
                for key, value in self.dic[i]["property_name_dic"].items():
                    result_dict[
                        self.dic[i]["name"] + "_" + value + '_' + self.dic[i]['property_id'] + '_' + str(key)] = []
                for key, value in self.dic[i]["ref_dic"].items():
                    result_dict[self.dic[i]["name"] + "_ref_" + self.dic[i]['property_id'] + '_' + str(key)] = []

            if self.dic[i]["label"]:
                result_dict[self.dic[i]["name"] + '_' + self.dic[i]['property_id'] + 'Label'] = []

        for result in results['results']['bindings']:
            for key, value in result_dict.items():
                if key in result.keys():
                    result_dict[key].append(result[key]['value'])
                else:
                    result_dict[key].append('NA')
        result_dict["Entity ID"] = ['http://www.wikidata.org/entity/' + str(self.entity_id)] * len(
            result_dict[self.dic[self.count]["name"] + '_' + self.dic[self.count]["property_id"]])
        self.result_dic = result_dict
        self.df = pd.DataFrame.from_dict(self.result_dic)
        for i in range(1, self.count + 1):
            if self.dic[i]["colVerbose"] and not self.dic[i]["rowVerbose"]:
                col = self.dic[i]['name'] + '_rank_' + self.dic[i]['property_id'] + '_rank'
                if any(self.df[col] == 'http://wikiba.se/ontology#PreferredRank'):
                    self.df = self.df.loc[self.df[col] == 'http://wikiba.se/ontology#PreferredRank']
                else:
                    self.df = self.df.loc[self.df[col] == 'http://wikiba.se/ontology#NormalRank']
#         if require is not None:
#             for r in require:
#                 self.df = self.df.loc[self.df[r] != 'NA']
        self.df = pd.DataFrame(data=self.df)
#         if self.df.shape[0] >= 10000:
#             print("Warning: Your query leads to too many results. Only 10,000 returned.")
        return self.df

    def extend(self, property_id: str, isSubject: bool, name: str, rowVerbose=False, colVerbose=False, limit=None,
               time_property=None, time=None, search=None, label=False):
        self.count += 1
        self.dic[self.count] = {}
        self.dic[self.count]["name"] = name
        self.dic[self.count]["focus"] = self.focus
        self.dic[self.count]["property_id"] = property_id
        self.dic[self.count]["isSubject"] = isSubject
        self.dic[self.count]["limit"] = limit
        self.dic[self.count]["rowVerbose"] = rowVerbose
        self.dic[self.count]["colVerbose"] = colVerbose
        self.dic[self.count]['time_property'] = time_property
        self.dic[self.count]['time'] = time
        self.dic[self.count]['search'] = search
        self.dic[self.count]['label'] = label
        if rowVerbose or colVerbose:
            self.dic[self.count]["property_name_dic"], self.dic[self.count][
                "ref_dic"] = self.search_property_for_verbose()
        if time_property and time:
            self.time_property = time_property
            self.time = time
        if limit:
            self.limit = limit
        self.query_str = self.define_query_relation()

    def changeFocus(self, name="Entity ID"):
        self.focus = name
        
    def applyFunction(self, objcolumn, func, name):
        if type(func) == str:
            if func.startswith('F'):
                try:
                    func_id = int(func[1:])
                    if func_id == 0:
                        self.df[name] = self.df[objcolumn]
                    else:
                        if func_id >= func_lib.func_num():
                            print("Not available.")
                        else:
                            self.df[name] = self.df[objcolumn].apply(func_lib.func_list[func_id])
                except ValueError:
                    raise Exception("Not a valid function id, a valid function id should be 'Fn', n is an integer.")
            else:
                raise Exception("Not a valid function id, a valid function id should be 'Fn', n is an integer.")
        else:
            self.df[name] = self.df[objcolumn].apply(func)

    def define_query_relation(self):
        rdf_triple, time_filter, limit_statement = """""", """""", """"""
        if self.count < 1:
            return None
        focusChanges = 0
        for i in range(1, self.count + 1):
            if self.dic[i]["rowVerbose"] or self.dic[i]["colVerbose"]:
                if self.dic[i]["search"] is None and not self.dic[i]["isSubject"]:
                        rdf_triple += """OPTIONAL {"""
                if self.dic[i]["focus"] == "Entity ID":
#                     if self.dic[i]["search"] is None:
#                         rdf_triple += """OPTIONAL {"""
                    rdf_triple += """wd:""" + self.entity_id + """ p:""" + self.dic[i][
                        'property_id'] + """ ?statement_""" + str(i) + """. """ \
                                  + """?statement_""" + str(i) + """ ps:""" + self.dic[i][
                                      'property_id'] + """ ?""" + \
                                  self.dic[i]['name'] \
                                  + """_""" + self.dic[i]['property_id'] + """. """
                else:
                    rdf_triple += """?""" + self.dic[i]["focus"] + """ p:""" + self.dic[i][
                        'property_id'] + """ ?statement_""" + str(i) + """. """ \
                                  + """?statement_""" + str(i) + """ ps:""" + self.dic[i][
                                      'property_id'] + """ ?""" + \
                                  self.dic[i]['name'] \
                                  + """_""" + self.dic[i]['property_id'] + """. """
                for key, value in self.dic[i]["property_name_dic"].items():
                    rdf_triple += """OPTIONAL { """ + """?statement_""" + str(i) + """ pq:""" + str(key) \
                                  + """ ?""" + self.dic[i]['name'] + """_""" + value + """_""" + self.dic[i][
                                      'property_id'] + """_""" + str(key) + """.} """
                for key, value in self.dic[i]["ref_dic"].items():
                    rdf_triple += """OPTIONAL { ?statement_""" + str(
                        i) + """ prov:wasDerivedFrom ?refnode_""" + str(
                        i) + """. ?refnode_""" + str(i) \
                                  + """ pr:""" + str(key) + """ ?""" + self.dic[i]['name'] + """_ref_""" + \
                                  self.dic[i][
                                      'property_id'] + """_""" + str(key) + """.} """
                rdf_triple += """OPTIONAL { ?statement_""" + str(i) + """ wikibase:rank ?""" + self.dic[i][
                    'name'] + """_rank_""" + self.dic[i]['property_id'] + """_rank. } """
            # non-verbose version
            else:
                if self.dic[i]["focus"] == "Entity ID":
                    if self.dic[i]["isSubject"]:
#                         if self.dic[i]["search"] is None:
#                             rdf_triple += """OPTIONAL {"""
                        rdf_triple += """?""" + self.dic[i]["name"] + """_""" + self.dic[i][
                            'property_id'] + """ wdt:""" + self.dic[i][
                                          "property_id"] + """ wd:""" + self.entity_id + """. """
                    else:
                        if self.dic[i]["search"] is None:
                            rdf_triple += """OPTIONAL {"""
                        rdf_triple += """wd:""" + self.entity_id + """ wdt:""" + self.dic[i][
                            "property_id"] + """ ?""" + \
                                      self.dic[i]["name"] + """_""" + self.dic[i]['property_id'] + """. """
                else:
                    if self.dic[i]["isSubject"]:
#                         if self.dic[i]["search"] is None:
#                             rdf_triple += """OPTIONAL {"""
                        rdf_triple += """?""" + self.dic[i]["name"] + """_""" + self.dic[i][
                            'property_id'] + """ wdt:""" + self.dic[i]["property_id"] + """ ?""" + self.dic[i][
                                          'focus'] + """. """
                    else:
                        if self.dic[i]["search"] is None:
                            rdf_triple += """OPTIONAL {"""
                        rdf_triple += """?""" + self.dic[i]['focus'] + """ wdt:""" + self.dic[i][
                            "property_id"] + """ ?""" + self.dic[i]["name"] + """_""" + self.dic[i][
                                          'property_id'] + """. """
            if not self.dic[i]["isSubject"]:
                if i < self.count and self.dic[i]["focus"] != self.dic[i + 1]["focus"] and self.dic[i]["search"] is None:
                    focusChanges += 1
                elif self.dic[i]["search"] is None:
                    rdf_triple += """} """
        for i in range(focusChanges):
            rdf_triple += """} """
        for i in range(1, self.count + 1):
            if self.dic[i]['search'] is not None and self.dic[i]["search"] != '!NA':
                if isinstance(self.dic[i]['search'], tuple):
                    if isinstance(self.dic[i]['search'][0], str):
                        rdf_triple += """FILTER (YEAR(?""" + self.dic[i]['name'] + """_""" + self.dic[i][
                            'property_id'] + """) >= """ + \
                                      self.dic[i]['search'][0] + """ && YEAR(?""" + self.dic[i]['name'] + \
                                      """_""" + self.dic[i]['property_id'] + """) <= """ + self.dic[i]['search'][
                                          1] + """) """
                    else:
                        rdf_triple += """FILTER (?""" + self.dic[i]['name'] + """_""" + self.dic[i]['property_id'] + \
                                      """ >= """ + str(self.dic[i]['search'][0]) + """ && ?""" + self.dic[i]['name'] + \
                                      """_""" + self.dic[i]['property_id'] + """ <= """ + str(
                            self.dic[i]['search'][1]) + """) """
                else:
                    rdf_triple += """FILTER (?""" + self.dic[i]['name'] + """_""" + self.dic[i][
                        'property_id'] + """ = """ + \
                                  """wd:""" + self.dic[i]['search'] + """) """
        if self.time_property is not None:
            time_filter = """?""" + self.dic[1]["name"] + """ p:""" + self.time_property + """ ?pubdateStatement.	
                          ?pubdateStatement ps:""" + self.time_property + """ ?date	
                          FILTER (YEAR(?date) = """ + self.time + """)"""
        if self.limit is not None:
            limit_statement = """LIMIT """ + str(self.limit)
        label_statement = """Service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }"""
        query = """SELECT DISTINCT"""
        for i in range(1, self.count + 1):
            if self.dic[i]["rowVerbose"] or self.dic[i]["colVerbose"]:
                query += """ ?""" + self.dic[i]["name"] + """_""" + self.dic[i]['property_id']
                if self.dic[i]["label"]:
                    query += """ ?""" + self.dic[i]["name"] + """_""" + self.dic[i]['property_id'] + """Label"""
                for key, value in self.dic[i]["property_name_dic"].items():
                    query += """ ?""" + self.dic[i]["name"] + """_""" + value + """_""" + self.dic[i][
                        'property_id'] + """_""" + str(key)
                for key, value in self.dic[i]["ref_dic"].items():
                    query += """ ?""" + self.dic[i]["name"] + """_ref_""" + self.dic[i]['property_id'] + """_""" + str(
                        key)
                query += """ ?""" + self.dic[i]["name"] + """_rank_""" + self.dic[i]['property_id'] + """_rank"""
            else:
                query += """ ?""" + self.dic[i]["name"] + """_""" + self.dic[i]['property_id']
                if self.dic[i]["label"]:
                    query += """ ?""" + self.dic[i]["name"] + """_""" + self.dic[i]['property_id'] + """Label"""
        query += """ WHERE {""" + rdf_triple + time_filter + label_statement + """} """ + limit_statement
        return query

    def search_property_for_verbose(self):
        property_to_name = {}
        ref_to_name = {}
        rdf_triple, time_filter, limit_statement = """""", """""", """"""
        if self.dic[self.count]["rowVerbose"] or self.dic[self.count]["colVerbose"]:
            for i in range(1, self.count):
                if self.dic[i]["focus"] == "Entity ID":
                    if self.dic[i]["isSubject"]:
                        rdf_triple += """?""" + self.dic[i]["name"] + """ wdt:""" + self.dic[i][
                            "property_id"] + """ wd:""" + self.entity_id + """ ."""
                    else:
                        rdf_triple += """wd:""" + self.entity_id + """ wdt:""" + self.dic[i]["property_id"] + """ ?""" + \
                                      self.dic[i]["name"] + """ ."""
                else:
                    last = self.dic[i]["focus"].rfind('_')
                    focus = self.dic[i]["focus"][:last]
                    if self.dic[i]["isSubject"]:
                        rdf_triple += """?""" + self.dic[i]["name"] + """ wdt:""" + self.dic[i][
                            "property_id"] + """ ?""" + focus + """ ."""
                    else:
                        rdf_triple += """?""" + focus + """ wdt:""" + self.dic[i][
                            "property_id"] + """ ?""" + self.dic[i]["name"] + """ ."""
            if self.dic[self.count]["focus"] == "Entity ID":
                rdf_triple += """wd:""" + self.entity_id + """ p:""" + self.dic[self.count][
                    'property_id'] + """ ?statement.""" + \
                              """?statement """ + """ps:""" + self.dic[self.count]['property_id'] + """ ?item.""" + \
                              """?statement """ + """?pq """ + """?obj.""" + \
                              """?qual wikibase:qualifier ?pq.""" + \
                              """OPTIONAL{ ?statement prov:wasDerivedFrom ?refnode. ?refnode ?pr ?r.}"""
            else:
                last = self.dic[self.count]["focus"].rfind('_')
                focus = self.dic[self.count]["focus"][:last]
                rdf_triple += """?""" + focus + """ p:""" + self.dic[self.count][
                    'property_id'] + """ ?statement.""" + \
                              """?statement """ + """ps:""" + self.dic[self.count]['property_id'] + """ ?item.""" + \
                              """?statement """ + """?pq """ + """?obj.""" + \
                              """?qual wikibase:qualifier ?pq.""" + \
                              """OPTIONAL{ ?statement prov:wasDerivedFrom ?refnode. ?refnode ?pr ?r.}"""
        if self.time_property is not None:
            time_filter = """?""" + self.dic[1]["name"] + """ p:""" + self.time_property + """ ?pubdateStatement.	
                                  ?pubdateStatement ps:""" + self.time_property + """ ?date	
                                  FILTER (YEAR(?date) = """ + self.time + """)"""
        if self.limit is not None:
            limit_statement = """LIMIT """ + str(self.limit)
        label_statement = """Service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }"""
        query = """SELECT DISTINCT """
        if self.dic[self.count]["rowVerbose"] or self.dic[self.count]["colVerbose"]:
            query += """?item""" + """ ?qual""" + """ ?qualLabel""" + """ ?obj """ + """?pr ?prLabel"""
            query += """ WHERE {""" + rdf_triple + time_filter + label_statement + """} """ + limit_statement
            query_result = get_results(endpoint_url, query)
            for result in query_result['results']['bindings']:
                if 'qual' in result:
                    property_to_name[result['qual']['value'].split('/')[-1]] = result['qualLabel']['value'].replace(' ',
                                                                                                                    '_')
                if 'pr' in result:
                    ref_to_name[result['pr']['value'].split('/')[-1]] = result['prLabel']['value'].replace(' ', '_')
        else:
            query += """?""" + self.dic[self.count]["name"] + """ """
        return property_to_name, ref_to_name

    def __str__(self):
        return str(self.df)

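    def __getattr__(self, col_name):
        # __getattr__ runs only when normal attribute lookup fails. Guard against
        # infinite recursion if `df` itself is missing (e.g. during copy/pickle).
        if col_name == "df":
            raise AttributeError(col_name)
        if col_name in self.df.columns:
            return self.df[col_name]
        else:
            print(col_name + " has not been found.")
            return None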


def createRelation(entity_id: str, property_id=None, isSubject=None, rowVerbose=None, colVerbose=None,
                   time_property=None, time=None, name=None, label=False, limit=None):
    if property_id and not name:
        print("Please specify the name of the first column")
        return None
    return Relation(entity_id, property_id, isSubject, rowVerbose, colVerbose, time_property, time, name, label, limit)

def get_Firstname(name: str):
    return name.split(' ')[0]

def get_Lastname(name: str):
    return name.split(' ')[-1]

def remove_prefix(text, prefix):
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # TODO adjust user agent; see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


def get_name(id: str):
    query = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 	
                PREFIX wd: <http://www.wikidata.org/entity/> 	
                select  *	
                where {	
                wd:""" + id + """ rdfs:label ?label .	
                FILTER (langMatches( lang(?label), "EN" ) )	
                } 	
                LIMIT 1"""
    results = get_results(endpoint_url, query)
    result = ''
    for res in results["results"]["bindings"]:
        result = res['label']['value']
    return result




# Query Courthouses in the US

Jenny tests

In [34]:
r = createRelation("Q1137809") # create relation for Q1137809 = courthouses

In [35]:
r.extend("P31", True, "Courthouse", limit=10, label=True) # extend via property P31 = is instance of

In [36]:
r.query()

Unnamed: 0,Entity ID,Courthouse_P31,Courthouse_P31Label
0,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q128652,Palace of Justice
1,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q481170,Courthouses in Wuppertal
2,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q1006058,Q1006058
3,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q719011,Palace of Justice
4,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q850537,Law Courts of Brussels
5,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q876884,Palace of Justice
6,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q1281321,Q1281321
7,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q1229654,East Hawaii Cultural Center
8,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q1503389,United States Post Office
9,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q2267420,Q2267420


In [37]:
r.changeFocus('Courthouse_P31')

In [38]:
r.extend('P17', False, 'Country', label=True)

In [40]:
r.query()

Unnamed: 0,Entity ID,Courthouse_P31,Courthouse_P31Label,Country_P17,Country_P17Label
0,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q719011,Palace of Justice,http://www.wikidata.org/entity/Q218,Romania
1,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q201780,1841 Goshen Courthouse,http://www.wikidata.org/entity/Q30,United States of America
2,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q2269681,St. Paul's Abbey,http://www.wikidata.org/entity/Q55,Netherlands
3,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q3001513,Courthouse Place,http://www.wikidata.org/entity/Q30,United States of America
4,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q3361019,Justice Court in Nîmes,http://www.wikidata.org/entity/Q142,France
5,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q1322792,Palais de justice de Bordeaux,http://www.wikidata.org/entity/Q142,France
6,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q3361006,Lille Courthouse,http://www.wikidata.org/entity/Q142,France
7,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q3361012,Marseille Courthouse,http://www.wikidata.org/entity/Q142,France
8,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q3361024,Palais de Justice of Strasbourg,http://www.wikidata.org/entity/Q142,France
9,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q3361033,Argyle Township Court House and Jail,http://www.wikidata.org/entity/Q16,Canada


In [41]:
df = r.df
df_us_courthouses = list(df[df.Country_P17Label=='United States of America'].Courthouse_P31Label)
df_us_courthouses

['1841 Goshen Courthouse', 'Courthouse Place']

## Without limiting the sample, try to grab all courthouses in the US

In [42]:
r = createRelation("Q1137809") # create relation for Q1137809 = courthouses
r.extend("P31", True, "Courthouse", label=True) # extend via property P31 = is instance of
r.changeFocus('Courthouse_P31')
r.extend('P17', False, 'Country', label=True)
r.query()
df = r.df
df_us_courthouses = list(df[df.Country_P17Label=='United States of America'].Courthouse_P31Label)
df_us_courthouses

['St. Stephens Courthouse',
 'Richard C. Lee United States Courthouse',
 'Pioneer Courthouse',
 'United States Post Office, Courthouse, and Customhouse',
 'United States Post Office, Courthouse, and Customhouse',
 'United States Post Office, Courthouse, and Federal Office Building',
 'Somerset County Court House complex',
 'Schoharie County Courthouse Complex',
 'Susquehanna County Courthouse Complex',
 'United States Post Office and Court House',
 'United States Post Office and Court House',
 "Stutsman County Courthouse and Sheriff's Residence/Jail",
 'US Court House-Aiken, South Carolina',
 'Ronald N. Davies Federal Building and U.S. Courthouse',
 'Pocahontas County Courthouse and Jail',
 'Strom Thurmond Federal Building and United States Courthouse',
 'United States Court House and Custom House',
 'United States Courthouse, Post Office and Customs House',
 'United States Customhouse and Post Office',
 'Pierce Courthouse',
 'Joseph F. Weis, Jr. United States Courthouse',
 'Old United

In [44]:
len(df_us_courthouses)

410

## Grab all courthouses in the US as well as states and lon/lat data 

In [55]:
building_type = 'Courthouse'
qnum = 'Q1137809'

In [65]:
r = createRelation(qnum, label=True) # create relation for Q1137809 = courthouses
r.extend('P31', True, building_type, label=True) # extend via property P31 = is instance of
r.changeFocus('%s_P31' % building_type)
r.extend('P17', False, 'Country', label=True) # extend via property P17 = is in country
r.extend('P131', False, 'State', label=True)
r.extend('P625', False, 'Lon_Lat')
r.query()

# filter r's dataframe to only include entities in the US
df = r.df
r.df = df[df.Country_P17Label=='United States of America']

r.df.head()

Unnamed: 0,Entity ID,Courthouse_P31,Courthouse_P31Label,Country_P17,Country_P17Label,State_P131,State_P131Label,Lon_Lat_P625
1,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q5127180,Clark County Court House,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q1603,Kentucky,Point(-84.178056 37.992778)
2,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q5123209,City Hall Post Office and Courthouse,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q11299,Manhattan,Point(-74.0075 40.7121)
3,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q5049154,"Cass County Court House, Jail, and Sheriff's H...",http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q34109,Fargo,Point(-96.793056 46.871944)
4,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q4838521,Babylon Town Hall,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q3461796,Babylon Village,Point(-73.324111111 40.696416666)
5,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q4968884,Bristol County Courthouse Complex,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q771,Massachusetts,Point(-71.0925 41.9145)


In [66]:
r.df

Unnamed: 0,Entity ID,Courthouse_P31,Courthouse_P31Label,Country_P17,Country_P17Label,State_P131,State_P131Label,Lon_Lat_P625
1,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q5127180,Clark County Court House,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q1603,Kentucky,Point(-84.178056 37.992778)
2,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q5123209,City Hall Post Office and Courthouse,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q11299,Manhattan,Point(-74.0075 40.7121)
3,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q5049154,"Cass County Court House, Jail, and Sheriff's H...",http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q34109,Fargo,Point(-96.793056 46.871944)
4,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q4838521,Babylon Town Hall,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q3461796,Babylon Village,Point(-73.324111111 40.696416666)
5,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q4968884,Bristol County Courthouse Complex,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q771,Massachusetts,Point(-71.0925 41.9145)
...,...,...,...,...,...,...,...,...
1147,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q18158200,Somerville Courthouse,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q67979,Somerville,Point(-86.7958 34.475)
1151,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q7084281,Old Isle of Wight Courthouse,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q1370,Virginia,Point(-76.6322 36.9817)
1158,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q18154867,Napa County Courthouse Plaza,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q60537,Napa,Point(-122.284 38.2975)
1161,http://www.wikidata.org/entity/Q1137809,http://www.wikidata.org/entity/Q18159020,Winston E. Arnow Federal Building,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q486306,Pensacola,Point(-87.2156 30.4144)


One problem is that not every building in Wikidata lists a state as its administrative territorial entity (P131 = "located in the administrative territorial entity"). The value is often a city, borough, or county instead.

For example, Thurgood Marshall United States Courthouse's administrative territorial entity is Manhattan and Appomattox Court House's administrative territorial entity is Virginia.
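
One possible way to normalise these values, sketched here under assumptions, is to follow "located in" links upward until a state is reached. The function name `resolve_state` and the small hand-written `LOCATED_IN` and `STATES` tables are hypothetical stand-ins for illustration. They are not part of this notebook's `Relation` class and not the result of a live query.

```python
from typing import Dict, Optional, Set

# Hypothetical, hand-written stand-in for P131 ("located in") links.
LOCATED_IN: Dict[str, str] = {
    "Manhattan": "New York City",
    "New York City": "New York",
    "Fargo": "Cass County",
    "Cass County": "North Dakota",
}

# Hypothetical set of labels that we treat as states.
STATES: Set[str] = {"New York", "North Dakota", "Virginia", "Kentucky"}


def resolve_state(place: str,
                  located_in: Dict[str, str],
                  states: Set[str],
                  max_hops: int = 10) -> Optional[str]:
    """Walk up the located-in chain from `place` until a state is found.

    Returns None if the chain ends, or if `max_hops` is exhausted, before
    a state is reached. `max_hops` also protects against cycles.
    """
    current = place
    for _ in range(max_hops):
        if current in states:
            return current
        if current not in located_in:
            return None
        current = located_in[current]
    return current if current in states else None


print(resolve_state("Manhattan", LOCATED_IN, STATES))  # New York
print(resolve_state("Virginia", LOCATED_IN, STATES))   # Virginia
```

In a real run the parent links would have to come from additional P131 lookups. The dictionary above only illustrates the traversal.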

Since longitude and latitude (P625) are given for these entities, anyone who needs address information can still look it up from the coordinates.
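
The `Lon_Lat_P625` column holds strings such as `Point(-84.178056 37.992778)`, with longitude first. A minimal sketch of turning them into numbers is shown below. The helper name `parse_point` is an assumption for illustration, and it only handles the plain `Point(lon lat)` shape seen in the output above. Anything else, including the `'NA'` placeholder that `Relation.query` writes for missing values, yields `None`.

```python
from typing import Optional, Tuple


def parse_point(wkt: str) -> Optional[Tuple[float, float]]:
    """Parse 'Point(lon lat)' into a (longitude, latitude) tuple of floats.

    Returns None for anything that does not match that shape, e.g. 'NA'.
    """
    text = wkt.strip()
    if not (text.startswith("Point(") and text.endswith(")")):
        return None
    # Drop the 'Point(' prefix and the trailing ')' and split on whitespace.
    parts = text[len("Point("):-1].split()
    if len(parts) != 2:
        return None
    try:
        lon, lat = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    return lon, lat


print(parse_point("Point(-74.0075 40.7121)"))  # (-74.0075, 40.7121)
print(parse_point("NA"))                       # None
```

Because `applyFunction` accepts a callable, something like `r.applyFunction('Lon_Lat_P625', parse_point, 'lon_lat')` should add the parsed tuples as a new column.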

# Can we see all government buildings in the US in one go?

In [67]:
building_type = 'Gov_Building'
qnum = 'Q16831714' # qnum of gov building

In [68]:
r = createRelation(qnum, label=True)
r.extend('P31', True, building_type, label=True) # extend via property P31 = is instance of
r.changeFocus('%s_P31' % building_type)
r.extend('P17', False, 'Country', label=True) # extend via property P17 = is in country
r.extend('P131', False, 'State', label=True)
r.extend('P625', False, 'Lon_Lat')
r.query()

# filter r's dataframe to only include entities in the US
df = r.df
r.df = df[df.Country_P17Label=='United States of America']

r.df.head()

Unnamed: 0,Entity ID,Gov_Building_P31,Gov_Building_P31Label,Country_P17,Country_P17Label,State_P131,State_P131Label,Lon_Lat_P625
0,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q5440392,"Federal Reserve Bank of San Francisco, Los Ang...",http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q65,Los Angeles,Point(-118.258611 34.042778)
1,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q7071913,O'Neill House Office Building,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q61,"Washington, D.C.",Point(-77.0068 38.8857)
5,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q6133196,James E. Rudder State Office Building,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q16559,Austin,Point(-97.739444 30.271944)
7,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q5440314,Federal Office Building,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q43199,Omaha,Point(-95.936667 41.255278)
8,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q5547508,Georgia Governor's Mansion,http://www.wikidata.org/entity/Q30,United States of America,http://www.wikidata.org/entity/Q486633,Fulton County,Point(-84.399217 33.846176)


In [70]:
len(r.df)

117

Answer: No. Apparently WikiData doesn't transitively populate instance relationships through subclasses, so only 117 buildings are listed as direct instances of `government building`, even though there are 400+ courthouses and `courthouse` is a subclass of `government building`.
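For reference, plain SPARQL can follow subclasses with a property path: `wdt:P31/wdt:P279*` means "instance of, followed by zero or more subclass-of hops". The sketch below only builds such a query string. `build_instance_query` is a hypothetical helper for illustration, not something `func_lib` provides, and the `limit` default simply mirrors the one in `Relation`.

```python
def build_instance_query(class_qid, country_qid=None, limit=10000):
    """Build a SPARQL query for instances of a class or of any of its subclasses.

    Hypothetical helper for illustration; it is not part of func_lib.
    """
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        # P31 = instance of, then zero or more P279 = subclass of hops
        "  ?item wdt:P31/wdt:P279* wd:%s ." % class_qid,
    ]
    if country_qid is not None:
        # P17 = country; restricts results to a single country
        lines.append("  ?item wdt:P17 wd:%s ." % country_qid)
    lines.append('  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }')
    lines.append("}")
    lines.append("LIMIT %d" % limit)
    return "\n".join(lines)


# Q16831714 = government building, Q30 = United States of America
query = build_instance_query("Q16831714", country_qid="Q30")
print(query)
```

The string could be sent to `endpoint_url` with `SPARQLWrapper` (`setQuery`, `setReturnFormat(JSON)`, `query().convert()`). That step needs network access, and a query this broad may be slow on the public endpoint.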

# Trace through class hierarchies under `government building`

## Find every subclass of `government building`

In [79]:
gov_building_subclass = 'Gov_Building'
qnum = 'Q16831714' # qnum of government building
r = createRelation(qnum, label=True)
r.extend('P279', True, gov_building_subclass, label=True) # extend via property P279 = is subclass of
r.query()
r.df

Unnamed: 0,Entity ID,Gov_Building_P279,Gov_Building_P279Label
0,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q481289,official residence
1,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q1407236,weigh house
2,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q1137809,courthouse
3,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q757292,border checkpoint
4,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q218653,custom house
5,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q5469110,Forest Service Guard Station
6,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q100705206,Prefectural Office Building
7,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q3250715,province building
8,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q861951,police station
9,http://www.wikidata.org/entity/Q16831714,http://www.wikidata.org/entity/Q30124446,legislative building


## Find every subclass of `government building` -> `official residence`

In [82]:
gov_building_subclass = 'Official_Residence'
qnum = 'Q481289' # qnum of official residence
r = createRelation(qnum, label=True)
r.extend('P279', True, gov_building_subclass, label=True) # extend via property P279 = is subclass of
r.query()
r.df

Unnamed: 0,Entity ID,Official_Residence_P279,Official_Residence_P279Label
0,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q2128615,Raj Bhavan
1,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q2114972,Presidential palace
2,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q5588918,Government House of the British Empire and Com...
3,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q3012073,Daikansho
4,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q1180262,residenz
5,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q23925100,Q23925100
6,http://www.wikidata.org/entity/Q481289,http://www.wikidata.org/entity/Q47163308,ambassador's residence


## Find every subclass of `government building` -> `official residence` -> `ambassador's residence`

In [87]:
gov_building_subclass = 'Ambassadors_Residence'
qnum = 'Q47163308' # qnum of ambassador's residence
r = createRelation(qnum, label=True)
r.extend('P279', True, gov_building_subclass, label=True) # extend via property P279 = is subclass of
r.query()
r.df

Unnamed: 0,Entity ID,Ambassadors_Residence_P279,Ambassadors_Residence_P279Label


In [88]:
gov_building_subclass = 'Ambassadors_Residence'
qnum = 'Q47163308' # qnum of ambassador's residence
r = createRelation(qnum, label=True)
r.extend('P31', True, gov_building_subclass, label=True) # extend via property P31 = is instance of
r.query()
r.df

Unnamed: 0,Entity ID,Ambassadors_Residence_P31,Ambassadors_Residence_P31Label
0,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q7272144,Quincy House
1,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q1137916,Winfield House
2,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q3298156,Deerfield Residence
3,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q4969437,"British Ambassador's residence in Washington, ..."
4,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q5501970,"French ambassador's residence in Washington, D.C."
5,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q7315412,Residence of the Ambassador of the Netherlands...
6,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q7382168,"Russian ambassador's residence in Washington, ..."
7,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q14954758,49 Belgrave Square
8,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q47163357,Dutch ambassador's residence in Tokyo
9,http://www.wikidata.org/entity/Q47163308,http://www.wikidata.org/entity/Q47163498,German ambassador's residence in Tokyo


# Find every subclass of `public building`

In [80]:
gov_building_subclass = 'Public_Building'
qnum = 'Q294422' # qnum of public building
r = createRelation(qnum, label=True) # create relation for Q294422 = public building
r.extend('P279', True, gov_building_subclass, label=True) # extend via property P279 = is subclass of
r.query()
r.df

Unnamed: 0,Entity ID,Public_Building_P279,Public_Building_P279Label
0,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q180370,hospital
1,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q1137809,courthouse
2,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q3917681,embassy
3,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q16410648,Q16410648
4,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q39364723,hospital building
5,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q3530380,toguna
6,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q55221816,Q55221816
7,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q6908719,moot hall
8,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q35054,post office
9,http://www.wikidata.org/entity/Q294422,http://www.wikidata.org/entity/Q32350958,bingo hall


# Find every subclass of `civic building`

In [81]:
gov_building_subclass = 'Civic_Building'
qnum = 'Q52177058' # qnum of civic building
r = createRelation(qnum, label=True) # create relation for Q52177058 = civic building
r.extend('P279', True, gov_building_subclass, label=True) # extend via property P279 = is subclass of
r.query()
r.df

Unnamed: 0,Entity ID,Civic_Building_P279,Civic_Building_P279Label
0,http://www.wikidata.org/entity/Q52177058,http://www.wikidata.org/entity/Q367885,village hall
1,http://www.wikidata.org/entity/Q52177058,http://www.wikidata.org/entity/Q543654,city hall
2,http://www.wikidata.org/entity/Q52177058,http://www.wikidata.org/entity/Q1137809,courthouse
3,http://www.wikidata.org/entity/Q52177058,http://www.wikidata.org/entity/Q7138926,parliament building
4,http://www.wikidata.org/entity/Q52177058,http://www.wikidata.org/entity/Q5177802,county hall
5,http://www.wikidata.org/entity/Q52177058,http://www.wikidata.org/entity/Q1500368,Municipal office


# Conclusion for now

At this point, it became clear that manually walking each subclass to consolidate a list of all instances under "government building" is not a practical way to do this.
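To make concrete what the manual process amounts to, here is a minimal sketch of a breadth-first walk over subclass edges. The `SUBCLASSES` dictionary is a hand-copied subset of the tables above (labels only, and only the branches expanded in this notebook); `all_subclasses` is a hypothetical helper, not part of `func_lib`.

```python
from collections import deque

# A subset of the direct subclass (P279) edges shown in the tables above.
# Only the branches that were actually queried in this notebook are expanded.
SUBCLASSES = {
    "government building": ["official residence", "weigh house", "courthouse",
                            "border checkpoint", "custom house", "police station"],
    "official residence": ["Raj Bhavan", "Presidential palace", "residenz",
                           "ambassador's residence"],
    "ambassador's residence": [],  # the P279 query returned no rows
}


def all_subclasses(root, edges):
    """Breadth-first walk returning every class reachable from root (root excluded)."""
    found = []
    visited = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in visited:  # guard against repeats and cycles
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


for name in all_subclasses("government building", SUBCLASSES):
    print(name)
```

Each level of this walk corresponds to one `createRelation` / `extend('P279', ...)` / `query()` round trip in the cells above, which is why doing it by hand does not scale.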

I asked Dinghao if there were plans to include some functionality in the system to address this, and we had a good correspondence about it.

#### My question to Dinghao:

I'd like to grab every government building in wikidata. However, because WikiData doesn't transitively populate instance relationships all the way through subclasses, when I query for all instances of "government building", only 117 buildings are listed as instances of "government building" (even though there are 400+ courthouses alone and courthouse is a subclass of government building).

With your code, is there a way to recursively search through subclasses and find all government buildings?

In the SPARQL web interface, I am able to recursively see all the subclasses under "government buildings" by adding a plus sign (+) after the P-number for "is subclass of" (P279), and this shows me all the classes, as follows:

https://query.wikidata.org/#SELECT%20%3Fs%20%3Fdesc%0AWHERE%0A%7B%0A%20%20%3Fs%20wdt%3AP279%2B%20wd%3AQ16831714%20.%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%3Fs%20rdfs%3Alabel%20%3Fdesc%20filter%20%28lang%28%3Fdesc%29%20%3D%20%22en%22%29.%0A%20%20%20%7D%0A%20%7D

#### Dinghao's response:

Hi Jenny. A short answer is no. This is actually a very good point. This problem has actually been raised before. As you can see, "government buildings" actually has a lot of subclasses, and all of these subclasses may contain a lot of instances. This might require great computational cost. At that time, we hadn't built up our system yet, so we decided to shelve it. But now I believe it's a good time to reconsider this problem. I'll try to fix this problem this week if you are not in a hurry. But if you really need it to be done as soon as possible, please let me know. Thank you for your great question which improves the system.

# Next steps

Once Dinghao implements recursive collection of instances under a class, I'll use it to collect all instances under `government building`, filter them, and pull the related data we want. The result will go into the knowledge graph.