# NLP4Stat - Query results from the Knowledge DB Demo

15-07-2021

Contents of the notebook:
- connection to the DBs
- retrieval of the SQL table names
- querying the KDB, already populated in the previous demo, to retrieve some of its content

## Library imports and connections to the DBs

In [1]:
import os 
import re
import logging
import sys
import pyodbc
import hashlib
import pandas as pd
import numpy as np
from datetime import datetime
from SPARQLWrapper import SPARQLWrapper, POST, DIGEST, GET
from SPARQLWrapper import JSON, INSERT, DELETE
import sparql_dataframe

In [2]:
def connect_db(DSN, DBA, UID, PWD):

    connection = pyodbc.connect('DSN={};DBA={};UID={};PWD={}'.format(DSN, 
                                                                     DBA,
                                                                     UID,
                                                                     PWD))
    cursor = connection.cursor()

    return connection, cursor


def connect_virtuoso(DSN, UID, PWD):

    sparql = SPARQLWrapper(DSN)
    sparql.setHTTPAuth(DIGEST)
    sparql.setCredentials(UID, PWD)
    sparql.setMethod(GET)

    return sparql


In [3]:
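# Connection to CDB 
# The password is read from an environment variable instead of being hard-coded;
# the variable name VIRTUOSO_PWD is illustrative
connection, cursor = connect_db('Virtuoso All', 
                                'ESTAT', 
                                'dba', 
                                os.environ['VIRTUOSO_PWD'])


# Connection to the KDB 
endpoint = "http://virtuoso-test.kapcode.fr:8890/sparql/"
sparql = connect_virtuoso(endpoint, 
                          'dba', 
                          os.environ['VIRTUOSO_PWD'])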
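The contents list mentions retrieving the SQL table names, but no cell performs that step. A minimal sketch using pyodbc's `Cursor.tables()` catalog call on the `cursor` opened above; the helper name `list_tables` is hypothetical, and whether the Virtuoso ODBC driver reports the expected tables under `tableType="TABLE"` is an assumption:

```python
def list_tables(cursor, schema=None):
    """Return the names of the tables visible through an open pyodbc cursor.

    cursor.tables() wraps the ODBC SQLTables catalog function; each result row
    exposes table_cat, table_schem, table_name, table_type and remarks.
    """
    # tableType="TABLE" restricts the listing to base tables (no views or system tables)
    rows = cursor.tables(schema=schema, tableType="TABLE")
    return [row.table_name for row in rows]
```

Calling `list_tables(cursor)` would then return a plain list of names; `schema` narrows the listing if the driver exposes several schemas.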


## Querying the populated KDB 

#### Get all links from the KDB 

In [43]:

SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?id ?url ?title ?resourceinfo ?resourcetype 
WHERE { 

     ?estatid rdf:about ?id .
     ?estatid dct:title ?title .
     ?estatid estat:resourceInformation ?resourceinfo .
     ?estatid dct:type ?resourcetype.
     ?estatid dct:source ?url .

}
ORDER BY ASC(?id)
"""

sparql.setQuery(SelectStatements)
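sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'title' into 'tit')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)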
results

| | id | url | title | resourceinfo | resourcetype |
|---|---|---|---|---|---|
| 0 | 1 | https://ec.europa.eu/eurostat/statistics-expla... | Accident at work | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
| 1 | 10 | https://ec.europa.eu/eurostat/statistics-expla... | Gross domestic product (GDP) | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
| 2 | 100 | https://ec.europa.eu/eurostat/statistics-expla... | Toxicity | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
| 3 | 1000 | https://ec.europa.eu/eurostat/statistics-expla... | Structural fund | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
| 4 | 10000 | https://ec.europa.eu/eurostat/statistics-expla... | Figure 3: EU-28, index of production (volume) ... | https://nlp4statref/knowledge/resource/authori... | Other |
| ... | ... | ... | ... | ... | ... |
| 9995 | 9497 | https://ec.europa.eu/eurostat/statistics-expla... | 05 Ageing Europe Pensions income and expenditu... | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
| 9996 | 9498 | https://ec.europa.eu/eurostat/databrowser/view... | Figure 11: Aggregate replacement ratio, 2010 a... | https://nlp4statref/knowledge/resource/authori... | Other |
| 9997 | 9499 | https://ec.europa.eu/eurostat/databrowser/view... | Figure 19: Mean consumption expenditure, by ty... | https://nlp4statref/knowledge/resource/authori... | Other |
| 9998 | 95 | https://ec.europa.eu/eurostat/statistics-expla... | Agriculture statistics at regional level | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
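The ids are bound to string literals, so `ORDER BY ASC(?id)` compares them lexicographically, which is why the rows run 1, 10, 100, 1000, 10000 and end with 95. In SPARQL, `ORDER BY ASC(xsd:integer(?id))` (with `PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>`) should give numeric order; alternatively, a small pandas sketch can re-sort after the fact. It assumes every id is a string of decimal digits, and `sort_by_numeric_id` is a hypothetical helper:

```python
import pandas as pd

def sort_by_numeric_id(df: pd.DataFrame, column: str = "id") -> pd.DataFrame:
    """Return a copy of df sorted by `column` read as an integer.

    Assumes every value in `column` is a string of decimal digits.
    """
    numeric = df[column].astype(int)
    # Sort on the integer values but keep the original string ids in the result
    order = numeric.sort_values(kind="stable").index
    return df.loc[order].reset_index(drop=True)
```

Applied as `results = sort_by_numeric_id(results)`, note that positional lookups such as `results.iloc[9995, 4]` then point at a different row.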


In [48]:
results.iloc[9995, 4]

'https://nlp4statref/knowledge/resource/authority/resource-type#statistic-reference-metadata'
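The resource type is either a URI whose fragment (the part after `#`) names the type, as in the value above, or the plain string "Other". A small pandas sketch that reduces the column to readable labels for display; `short_resource_type` is a hypothetical helper:

```python
import pandas as pd

def short_resource_type(values: pd.Series) -> pd.Series:
    """Keep only the text after the last '#'; values without '#' pass through unchanged."""
    return values.str.rsplit("#", n=1).str[-1]
```

For example, `results['resourcetype'] = short_resource_type(results['resourcetype'])` would turn the URI above into `statistic-reference-metadata`.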

### Search for a specific concept

In [31]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?concept_id ?prefLabel ?definition
WHERE { 
     ?estatid skos:prefLabel "Business demography".
     ?estatid skos:Concept ?concept_id .
     ?estatid skos:prefLabel ?prefLabel.
     ?estatid skos:definition ?definition .

}
"""

sparql.setQuery(SelectStatements)
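sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'prefLabel' into 'prefLab')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)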
results

| | concept_id | prefLabel | definition |
|---|---|---|---|
| 0 | 2014 | Business demography | Business demography covers: events in th... |
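The concept label is hard-coded in the query text. A sketch that takes it as a parameter and escapes it as a SPARQL string literal, so a label containing quotes or line breaks cannot break the query; matching stays exact, as in the cell above. `concept_query` is a hypothetical name, and the triple patterns are copied from the cell above:

```python
def concept_query(label: str) -> str:
    """Build the concept lookup with the preferred label as a parameter."""
    # Escape backslashes first, then quotes and line breaks, so the label stays one string literal
    escaped = (
        label.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept_id ?prefLabel ?definition
WHERE {{
     ?estatid skos:prefLabel "{escaped}" .
     ?estatid skos:Concept ?concept_id .
     ?estatid skos:prefLabel ?prefLabel .
     ?estatid skos:definition ?definition .
}}
"""
```

It would be used as `sparql.setQuery(concept_query("Business demography"))`.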


In [32]:
results['definition'][0]

'Business demography    covers:    events in the life cycle of an enterprise such as      births and other creations of enterprises     ,      deaths and other cessations of units     , and their ratio to the      business population     ;     the follow-up of enterprises over time, thus offering information on their      survival     or discontinuity;     development over time of certain characteristics like      size     , thus offering information on the growth of enterprises, or a      cohort     of enterprises, by type of activity.    Summarizing, business demography statistics presents data on:    the active population of enterprises;     their birth;     survival (followed up to five years after birth), and;     death.'
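The stored definition keeps runs of spaces and stray spaces before punctuation, which look like residue from removed markup. A minimal sketch to normalise such text for display; `clean_definition` is a hypothetical helper, and only whitespace is changed:

```python
import re

def clean_definition(text: str) -> str:
    """Collapse runs of whitespace and drop the space left before , ; and ."""
    text = " ".join(text.split())
    return re.sub(r"\s+([,;.])", r"\1", text)
```

`clean_definition(results['definition'][0])` would then read "... births and other creations of enterprises, deaths and other cessations of units, ...".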

#### Related links

In [52]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?related
WHERE { 
     estat:2014 skos:related ?related.
}
"""

sparql.setQuery(SelectStatements)
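sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters, not a suffix)
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)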
results

| | related |
|---|---|
| 0 | 267 |
| 1 | 274 |
| 2 | 275 |
| 3 | 276 |
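`skos:related` only returns bare identifiers. The relatedEditorialContent cell further down resolves identifiers to titles by matching them against `rdf:about`; a sketch generalising that join to any predicate, assuming the `skos:related` values can be matched the same way. `OPTIONAL` keeps identifiers that have no title. `linked_titles_query` is a hypothetical name, and its arguments are spliced in verbatim, so they should come from trusted code rather than user input:

```python
def linked_titles_query(subject_id: str, predicate: str) -> str:
    """Follow `predicate` from estat:<subject_id> and attach each target's dct:title if any."""
    return f"""
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?target ?title
WHERE {{
     estat:{subject_id} {predicate} ?target .
     OPTIONAL {{
          ?resource rdf:about ?target .
          ?resource dct:title ?title .
     }}
}}
"""
```

For the concept above this would be `sparql.setQuery(linked_titles_query("2014", "skos:related"))`.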


#### Statistical Information

In [53]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?statisticalInformation
WHERE { 
     estat:2014 estat:statisticalInformation ?statisticalInformation.
}
"""

sparql.setQuery(SelectStatements)
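sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters, not a suffix)
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)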
results


| | statisticalInformation |
|---|---|
| 0 | 277 |
| 1 | 2014 |


#### Source information

In [54]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?sourceInformation
WHERE { 
     estat:2014 estat:sourceInformation ?sourceInformation.
}
"""

sparql.setQuery(SelectStatements)
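sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters, not a suffix)
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)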
results

### Search for a description that contains the concept

In [55]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?id ?resourceinfo ?resourcetype ?description
WHERE { 
     ?estatid skos:description ?description.
     ?estatid rdf:about ?id.
     ?estatid estat:resourceInformation ?resourceinfo .
     ?estatid dct:type ?resourcetype.
     FILTER regex(?description, "Business demography", "i")

}
"""

sparql.setQuery(SelectStatements)
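sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'resourcetype' into 'resourcetyp')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)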
results

| | id | resourceinfo | resourcetype | description |
|---|---|---|---|---|
| 0 | 3754 | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... | Eurostat compiles data on culture-related ente... |
| 1 | 9133 | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... | Business demography data has been collected on... |
| 2 | 6479 | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... | Eurostat’s structural business statistics de... |
| 3 | 7736 | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... | This is the first publication covered by the m... |
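`FILTER regex(?description, "Business demography", "i")` treats the phrase as a regular expression, so metacharacters such as `(` or `+` in a user-supplied phrase would need escaping. A sketch of the same search written as a plain case-insensitive substring test with the SPARQL 1.1 `CONTAINS` and `LCASE` functions; `description_search_query` is a hypothetical name, and lower-casing in Python is assumed to agree with `LCASE` for ASCII phrases:

```python
def description_search_query(phrase: str) -> str:
    """Build a case-insensitive substring search over skos:description."""
    # Lower-case the phrase to match LCASE(...) and escape it as a SPARQL string literal
    escaped = phrase.lower().replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?id ?resourceinfo ?resourcetype ?description
WHERE {{
     ?estatid skos:description ?description .
     ?estatid rdf:about ?id .
     ?estatid estat:resourceInformation ?resourceinfo .
     ?estatid dct:type ?resourcetype .
     FILTER (CONTAINS(LCASE(STR(?description)), "{escaped}"))
}}
"""
```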


### Select articles that use one of the datasets

In [56]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?articleid ?title ?url 
WHERE { 
     ?estatid estat:StatisticsExplainedData "3754".
     ?estatid rdf:about ?articleid.
     ?estatid dct:title ?title .
     ?estatid dct:source ?url .
     
}
"""

sparql.setQuery(SelectStatements)
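sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'title' into 'tit')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)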
results

| | articleid | title | url |
|---|---|---|---|
| 0 | 3752 | Culture statistics - cultural enterprises | https://ec.europa.eu/eurostat/statistics-expla... |
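The cell above checks only ID 3754, the first of the four IDs returned by the description search (3754, 9133, 6479, 7736). A sketch that checks all of them in one round trip with a SPARQL `VALUES` block, returning the matching ID next to each article; `articles_using_datasets_query` is a hypothetical name, and restricting IDs to digit strings is a safeguard assumed here because they are spliced into the query text:

```python
from typing import Iterable

def articles_using_datasets_query(dataset_ids: Iterable[str]) -> str:
    """Build one query returning the articles linked to any of the given IDs."""
    ids = list(dataset_ids)
    # Only plain numeric IDs are accepted, since they are spliced into the query text
    if not ids or not all(i.isdigit() for i in ids):
        raise ValueError("expected a non-empty list of numeric ID strings")
    values = " ".join(f'"{i}"' for i in ids)
    return f"""
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?articleid ?title ?url
WHERE {{
     VALUES ?dataset {{ {values} }}
     ?estatid estat:StatisticsExplainedData ?dataset .
     ?estatid rdf:about ?articleid .
     ?estatid dct:title ?title .
     ?estatid dct:source ?url .
}}
"""
```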


#### relatedEditorialContent

In [61]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?relatedEditorialContent ?title
WHERE { 
     estat:3752 estat:relatedEditorialContent ?relatedEditorialContent.
     ?editorial rdf:about ?relatedEditorialContent.
     ?editorial dct:title ?title.
     
}
"""

sparql.setQuery(SelectStatements)
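sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'title' into 'tit')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)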
results

| | relatedEditorialContent | title |
|---|---|---|
| 0 | 3770 | European statistical system network on culture... |


#### relatedLegalInformation

In [58]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?relatedLegalInformation ?title
WHERE { 
     estat:3752 estat:relatedLegalInformation ?relatedLegalInformation.
     ?legalinfo rdf:about ?relatedLegalInformation.
     ?legalinfo dct:title ?title.
     
}
"""

sparql.setQuery(SelectStatements)
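sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'title' into 'tit')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)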
results

| | relatedLegalInformation | title |
|---|---|---|
| 0 | 3774 | European Council Work Plan for Culture (2019-2... |
| 1 | 3775 | European Council Work Plan for Culture (2015-2... |
| 2 | 3776 | Regulation (EU) No 1295/2013 of the European P... |
| 3 | 3777 | Summaries of EU Legislation: Creative Europe P... |
| 4 | 3778 | Communication from the Commission to the Europ... |


#### relatedStatisticData

In [60]:
SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?relatedStatisticData ?title
WHERE { 
     estat:3752 estat:relatedStatisticData ?relatedStatisticData.
     ?statinfo rdf:about ?relatedStatisticData.
     ?statinfo dct:title ?title.
     
}
"""

sparql.setQuery(SelectStatements)
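sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and truncated 'relatedStatisticData')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)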
results

| | relatedStatisticData | title |
|---|---|---|
| 0 | 3754 | Cultural enterprises CP2021.xlsx |
| 1 | 3771 | Enterprises in cultural sectors |
| 2 | 3772 | Structural business statistics |
| 3 | 3773 | Business demography |
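The three relatedEditorialContent, relatedLegalInformation and relatedStatisticData cells differ only in the predicate. A sketch that fetches all three in one query by binding the predicate through a `VALUES` block, so each row also reports which relation it came from; `related_content_query` and `RELATED_PREDICATES` are hypothetical names, and the article ID is spliced in verbatim, so it should come from trusted code:

```python
RELATED_PREDICATES = (
    "estat:relatedEditorialContent",
    "estat:relatedLegalInformation",
    "estat:relatedStatisticData",
)

def related_content_query(article_id: str, predicates=RELATED_PREDICATES) -> str:
    """Build one query listing the titled resources linked to an article by several predicates."""
    values = " ".join(predicates)
    return f"""
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?predicate ?related ?title
WHERE {{
     VALUES ?predicate {{ {values} }}
     estat:{article_id} ?predicate ?related .
     ?resource rdf:about ?related .
     ?resource dct:title ?title .
}}
ORDER BY ?predicate ?related
"""
```

`sparql.setQuery(related_content_query("3752"))` would cover the article used in the three cells above.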


#### Details on one of the related links

In [42]:

SelectStatements = """
PREFIX estat: <https://nlp4statref/knowledge/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?id ?url ?title ?resourceinfo ?resourcetype 
WHERE { 

     ?estatid rdf:about "3774".
     ?estatid rdf:about ?id .
     ?estatid dct:title ?title .
     ?estatid estat:resourceInformation ?resourceinfo .
     ?estatid dct:type ?resourcetype.
     ?estatid dct:source ?url .

}
ORDER BY ASC(?id)
"""

sparql.setQuery(SelectStatements)
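sparql.setMethod(POST)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()['results']['bindings']
results = pd.json_normalize(results)
# Keep only the '<variable>.value' columns (drops the type/datatype/xml:lang metadata)
results = results.filter(regex=r'\.value$')
# Remove the '.value' suffix (rstrip('.value') strips characters and turned 'url' into 'ur')
results.columns = results.columns.str.replace(r'\.value$', '', regex=True)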
results

| | id | url | title | resourceinfo | resourcetype |
|---|---|---|---|---|---|
| 0 | 3774 | https://eur-lex.europa.eu/legal-content/EN/TXT... | European Council Work Plan for Culture (2019-2... | https://nlp4statref/knowledge/resource/authori... | https://nlp4statref/knowledge/resource/authori... |
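Every query cell repeats the same steps to turn the JSON response into a DataFrame. A sketch of a helper that could replace them, assuming the standard SPARQL 1.1 JSON results layout (`head.vars`, `results.bindings`, one `{"type": ..., "value": ...}` object per bound variable); `bindings_to_frame` is a hypothetical name:

```python
import pandas as pd

def bindings_to_frame(bindings, columns=None):
    """Convert SPARQL JSON result bindings into a DataFrame, one column per variable.

    Only each term's "value" is kept, so the type/datatype/xml:lang metadata is
    dropped without any column renaming. Variables unbound in a row become NaN.
    Passing `columns` (e.g. the response's head.vars) fixes the column order and
    keeps the columns even when there are no rows.
    """
    rows = [{var: term["value"] for var, term in row.items()} for row in bindings]
    return pd.DataFrame(rows, columns=columns)
```

In a cell this would read `data = sparql.query().convert()` followed by `results = bindings_to_frame(data['results']['bindings'], data['head']['vars'])`. Building rows from the `value` entries directly avoids the suffix-stripping step altogether.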
