# Task: Find corresponding data in DBpedia and Wikidata

## Overview:
Find instances in DBpedia and Wikidata that are equivalent to (i.e., co-refer with) the movies in your RDF dataset. In other words, find resources that you can link to via the _owl:sameAs_ property.


## Task Details 

1. Use the SPARQL Endpoints of DBpedia (https://dbpedia.org/sparql) and Wikidata (https://query.wikidata.org) to get the data
> __Hint__: Try a few queries in the SPARQL endpoints before incorporating the query in your code
2. If using Python, you can query the SPARQL endpoints with RDFLib or SPARQLWrapper. To find the correct match, you can use the title of the movie, its publication date (or year), and/or the directors' names. 
> __Hint__: Exact matches do not always give the desired results, as labels might differ across knowledge bases, e.g., “Charles Chaplin” vs. “Charlie Chaplin”
3. If a match is found, add exactly one _owl:sameAs_ link to the matching DBpedia resource and exactly one to the matching Wikidata resource to your dataset from Task 1, e.g.,
`<https://firstname-lastname.org/resource/the_godfather> owl:sameAs  <http://dbpedia.org/resource/The_Godfather>`
> __Hint__: Use RDFLib to load the data you have saved in Task 1 and add the links to the corresponding movies
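
The hint about differing labels can be handled with approximate string comparison instead of exact equality. The following is a minimal sketch using Python's standard `difflib`; the helper names `label_similarity` and `best_match` and the threshold of 0.8 are illustrative assumptions, not part of the task description.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Return a similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(label, candidates, threshold=0.8):
    """Return the candidate most similar to `label`, or None if none reaches `threshold`."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = label_similarity(label, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Labels as they might come back from a knowledge base (made-up candidate list).
candidates = ["Charlie Chaplin", "Billy Wilder", "Sydney Chaplin"]
print(best_match("Charles Chaplin", candidates))
```

A threshold that is too low produces false links, so it may be worth combining the label score with a second signal such as the release year.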

<br>

## Submission 2: 

Save the new dataset containing the _owl:sameAs_ statements in N3 format in the output folder under the name __movies_task_2.n3__.


<br>

## Your code

In [137]:
from rdflib import URIRef, Literal, Graph, Namespace
from rdflib.namespace import FOAF, RDF, RDFS, XSD, DC
import urllib
from datetime import datetime
from SPARQLWrapper import SPARQLWrapper, JSON, N3

In [181]:
sparql_query = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dct: <http://purl.org/dc/terms/>

    Select distinct ?scientist ?name ?birthdate ?description ?thumbnail WHERE {
        ?scientist rdf:type dbo:Scientist ;           
            rdfs:label ?name ;
            dbo:birthDate ?birthdate; 
            dct:description ?description .
     FILTER ((lang(?name)="en") && (lang(?description)="en") 
                 && (SUBSTR(STR(?birthdate),6)=SUBSTR(STR(bif:curdate('')),6)) 
                 && (STRLEN(STR(?birthdate))>6)) .
     OPTIONAL {?scientist dbo:thumbnail ?thumbnail }.
    }
"""


# Hard-coded test query for one movie.
# The language test is grouped with both containment checks (&& binds tighter than ||),
# and str() strips language tags so that contains() accepts either argument order.
sparql_query_db = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    Select distinct ?movie ?title ?director ?director_name WHERE {
        ?movie rdf:type dbo:Film ;
            rdfs:label ?title ;
            dbo:director ?director .
        ?director foaf:name ?director_name .
    FILTER (lang(?title) = "en" && (contains(str(?title), "The Apartment") || contains("The Apartment", str(?title))))
    FILTER (contains(str(?director_name), "Billy Wilder") || contains("Billy Wilder", str(?director_name))) .
    }
"""

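# Query template with named placeholders, to be filled in later via
# sparql_query_db_template.format(title=..., director=...).
# Literal SPARQL braces are doubled so that str.format keeps them; the values
# are inserted inside double quotes and must not contain unescaped quotes.
# A separate name is used so the hard-coded test query above is not overwritten.
sparql_query_db_template = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    Select distinct ?movie ?title ?director ?director_name WHERE {{
        ?movie rdf:type dbo:Film ;
            rdfs:label ?title ;
            dbo:director ?director .
        ?director foaf:name ?director_name .
    FILTER (lang(?title) = "en" && (contains(str(?title), "{title}") || contains("{title}", str(?title))))
    FILTER (contains(str(?director_name), "{director}") || contains("{director}", str(?director_name))) .
    }}
"""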

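# Wikidata counterpart. This completion is a sketch, not a tested query:
# wd:Q11424 is "film", wdt:P31 is "instance of", wdt:P57 is "director".
# The title is matched exactly on the English label to keep the query fast.
sparql_query_wd = """
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    Select distinct ?movie ?title ?director ?director_name WHERE {
        VALUES ?title {"The Apartment"@en}
        ?movie wdt:P31 wd:Q11424 ;
            rdfs:label ?title ;
            wdt:P57 ?director .
        ?director rdfs:label ?director_name .
    FILTER (lang(?director_name) = "en")
    }
"""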
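
Filling a query template with `str.format` has two pitfalls: SPARQL's own braces must be doubled, and inserted values may contain quotes. The sketch below shows one way to handle both; `sparql_string`, `FILM_QUERY_TEMPLATE`, and `build_film_query` are hypothetical names, and the query is deliberately shorter than the ones in this notebook.

```python
def sparql_string(value):
    """Quote a Python string as a SPARQL string literal (escapes backslashes and double quotes only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Literal SPARQL braces are doubled so that str.format leaves them alone.
FILM_QUERY_TEMPLATE = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?movie ?title WHERE {{
    ?movie a dbo:Film ;
        rdfs:label ?title .
    FILTER (lang(?title) = "en" && contains(str(?title), {title}))
}}
"""


def build_film_query(title):
    """Return a complete query string for one movie title."""
    return FILM_QUERY_TEMPLATE.format(title=sparql_string(title))


print(build_film_query('The "Apartment"'))
```

The resulting string can be passed to `sparql.setQuery(...)` once per movie in the dataset.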

In [165]:
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(sparql_query_db)

sparql.setReturnFormat(JSON)
results = sparql.query().convert()

In [184]:
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
file = open('../output_data/movies_task_1.n3')
type(file)

_io.TextIOWrapper

In [185]:
Graph?

In [166]:
results['results']['bindings']

[{'movie': {'type': 'uri',
   'value': 'http://dbpedia.org/resource/The_Apartment'},
  'title': {'type': 'literal', 'xml:lang': 'en', 'value': 'The Apartment'},
  'director': {'type': 'uri',
   'value': 'http://dbpedia.org/resource/Billy_Wilder'},
  'director_name': {'type': 'literal',
   'xml:lang': 'en',
   'value': 'Billy Wilder'}}]

In [130]:
for line in results['results']['bindings']:
    if 'Pramathesh Barua' in line['director_name']['value'] :
        print(line['director']['value'])

http://dbpedia.org/resource/Pramathesh_Barua
http://dbpedia.org/resource/Pramathesh_Barua
http://dbpedia.org/resource/Pramathesh_Barua
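
As the output above shows, the same URI can appear once per result row. A small helper can flatten the JSON bindings and keep each value only once. This is a sketch with hypothetical function names; the sample dictionary is hand-written in the shape of SPARQL JSON results rather than fetched from the endpoint.

```python
def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def distinct_values(rows, variable):
    """Return the distinct values bound to `variable`, in first-seen order."""
    return list(dict.fromkeys(row[variable] for row in rows if variable in row))


# Hand-written sample mimicking two result rows with the same director.
binding = {
    "director": {"type": "uri", "value": "http://dbpedia.org/resource/Pramathesh_Barua"},
    "director_name": {"type": "literal", "xml:lang": "en", "value": "Pramathesh Barua"},
}
sample = {"results": {"bindings": [binding, dict(binding)]}}

rows = flatten_bindings(sample)
print(distinct_values(rows, "director"))
```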
