# TASK: Model the data in RDF



## Overview
In this task, you model the `input_movie_data.csv` data set as Linked Open Data in RDF.
This means that entities and relationships must be identified by IRIs. Re-using existing RDF vocabularies is a Linked Data best practice for this.

## Task details
1. Re-use the schema.org vocabulary (https://schema.org/) to model the data in RDF. 
2. Use self-defined, valid URLs as IRIs to represent all entities, with your first and last name as the domain. For example, the movie “The Godfather” can be represented as http://firstname-lastname.org/resource/the_godfather  
> __HINT__: URLs must be valid, but do not have to resolve!
3. Re-using schema.org, find the classes of the corresponding instances and represent the instances with that class (or classes). For example, if we were re-using the DBpedia vocabulary, the movie “The Godfather” would be of type <http://dbpedia.org/ontology/Film> resulting in a triple representation: <br>
`<http://firstname-lastname.org/resource/the_godfather> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>` 

> Find the corresponding RDF classes in the schema.org vocabulary and use them as the rdf:type of movies and persons.  

4. Re-using schema.org, find the appropriate object property specifying that a movie has a director. 
> __HINT__: Some movies have more than one director.
5. In addition, find appropriate properties to describe 
  - the name of a movie including the English language tag, 
  - the publication year of a movie, typed with the literal datatype for a year 
  >__HINT__: not xsd:date; see the XSD specification for the correct datatype for a year
  - the name of a person
  > __HINT__: Watch out for the correct domain and range of properties, as well as subclass relationships
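
As a rough illustration of points 3 to 5, the sketch below writes the triples for one movie as N-Triples lines using only the Python standard library. The schema.org terms chosen here (`Movie`, `Person`, `director`, `name`, `datePublished`) and the helper names `iri`, `lit` and `movie_triples` are assumptions made for illustration, not a confirmed solution; check the domains and ranges on schema.org before relying on them. The example values for "The Godfather" are likewise only sample input.

```python
SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"
BASE = "http://firstname-lastname.org/resource/"  # placeholder domain from the task text


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def lit(text, lang=None, datatype=None):
    """Format a literal with an optional language tag or datatype IRI."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    if lang:
        return f'"{escaped}"@{lang}'
    if datatype:
        return f'"{escaped}"^^{iri(datatype)}'
    return f'"{escaped}"'


def movie_triples(slug, title, year, directors):
    """Return N-Triples lines for one movie; directors maps slug -> name."""
    movie = iri(BASE + slug)
    lines = [
        f"{movie} {iri(RDF_TYPE)} {iri(SCHEMA + 'Movie')} .",
        f"{movie} {iri(SCHEMA + 'name')} {lit(title, lang='en')} .",
        # Assumed property; the year is typed as xsd:gYear rather than xsd:date
        f"{movie} {iri(SCHEMA + 'datePublished')} {lit(year, datatype=XSD_GYEAR)} .",
    ]
    for person_slug, name in directors.items():
        person = iri(BASE + person_slug)
        lines += [
            f"{person} {iri(RDF_TYPE)} {iri(SCHEMA + 'Person')} .",
            f"{person} {iri(SCHEMA + 'name')} {lit(name)} .",
            # One director triple per person covers movies with several directors
            f"{movie} {iri(SCHEMA + 'director')} {person} .",
        ]
    return lines


triples = movie_triples(
    "the_godfather", "The Godfather", "1972", {"francis_ford_coppola": "Francis Ford Coppola"}
)
print("\n".join(triples))
```

The same triples could equally be added to an rdflib `Graph`; plain strings are used here only to keep the sketch free of dependencies.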
  
<br><br>

## Submission 1

Save the RDF data set, serialized as N3, to the output_data folder under the name __movies_task_1.n3__.

<br>

## Your code

In [1]:
# Example code

In [49]:
from rdflib import URIRef, Literal, Graph, Namespace
from rdflib.namespace import FOAF, RDF, RDFS, XSD, DC
import urllib
import csv
from datetime import datetime
from SPARQLWrapper import SPARQLWrapper, JSON, N3

In [50]:
# Defines further Namespaces
EX = Namespace("https://ex1.org/")
DBO = Namespace("http://dbpedia.org/ontology/")
RSC = Namespace("http://philip-broehl.org/resource/")

In [53]:
g = Graph()
with open("../input_data/input_movie_data.csv", newline="") as input_file:
    reader = csv.reader(input_file, delimiter=',')
    next(reader, None)  # skip the header row
    for line in reader:
        title = line[1]
        year = line[2]
        director_list = line[3].split(', ')  # there can be multiple directors

        # Build the movie IRI from the lower-cased title, spaces replaced by underscores
        film = RSC[title.lower().replace(' ', '_')]

        g.add((film, RDF.type, DBO.Film))
        g.add((film, RDFS.label, Literal(title, lang='en')))
        # NOTE: xsd:gYear is a datatype, not a property; it is used as the predicate here
        g.add((film, XSD.gYear, Literal(year)))

        for director in director_list:
            person = RSC[director.lower().replace(' ', '_')]
            g.add((person, RDF.type, DBO.Person))
            g.add((person, FOAF.name, Literal(director, lang='en')))
            g.add((film, DBO.director, person))

# serialize() returns bytes in rdflib < 6 and str in rdflib >= 6
n3_output = g.serialize(format="n3")
print(n3_output.decode("utf-8") if isinstance(n3_output, bytes) else n3_output)

g.serialize(destination='../output_data/movies_task_1.n3', format='n3')

@prefix ns1: <http://dbpedia.org/ontology/> .
@prefix ns2: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://philip-broehl.org/resource/12_angry_men> a ns1:Film ;
    rdfs:label "12 Angry Men"@en ;
    ns1:director <http://philip-broehl.org/resource/sidney_lumet> ;
    xsd:gYear "1957" .

<http://philip-broehl.org/resource/12_years_a_slave> a ns1:Film ;
    rdfs:label "12 Years a Slave"@en ;
    ns1:director <http://philip-broehl.org/resource/steve_mcqueen> ;
    xsd:gYear "2013" .

<http://philip-broehl.org/resource/2001:_a_space_odyssey> a ns1:Film ;
    rdfs:label "2001: A Space Odyssey"@en ;
    ns1:director <http://philip-broehl.org/resource/stanley_kubrick> ;
    xsd:gYear "1968" .

<http://philip-broehl.org/resource/a_beautiful_mind> a ns1:Film ;
    rdfs:label "A Beautiful Mind"@en ;
    ns1:director <http://philip-broehl.org/resource/ron_howard> ;
    xsd:gYear "2001" .

<http://ph
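
The row handling in the loop above (skip the header, read title, year and directors by position, split multiple directors) can be sketched in isolation as follows. The sample rows, the header names and the function name `read_movies` are hypothetical; only the column positions (1 = title, 2 = year, 3 = directors) are taken from the notebook code.

```python
import csv
import io

# Hypothetical sample with the same column positions the loop above reads
# (index 1 = title, 2 = year, 3 = directors); the header names are assumed.
SAMPLE = '''id,title,year,director
1,12 Angry Men,1957,Sidney Lumet
2,The Matrix,1999,"Lana Wachowski, Lilly Wachowski"
'''


def read_movies(handle):
    """Yield (title, year, [directors]) tuples, skipping the header row."""
    reader = csv.reader(handle, delimiter=",")
    next(reader, None)  # header
    for row in reader:
        # Split on commas and trim whitespace so "A, B" and "A,B" behave the same
        directors = [name.strip() for name in row[3].split(",")]
        yield row[1], row[2], directors


movies = list(read_movies(io.StringIO(SAMPLE)))
print(movies)
```

Splitting on `","` and stripping each name is slightly more forgiving than splitting on `", "`, at the cost of breaking names that themselves contain a comma.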

In [22]:
FOAF.title

rdflib.term.URIRef('http://xmlns.com/foaf/0.1/title')

In [35]:
s = "Hubert, Peter, Heike"
s = s.split(', ')
s[0].lower().replace('u', 'ooo')

'hooobert'
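
The lower-case-and-replace approach tried above leaves characters such as `:` in the local name (see `2001:_a_space_odyssey` in the output). A minimal sketch of a stricter helper follows; the function name `to_slug` and the choice to collapse every non-alphanumeric run into a single underscore are assumptions, not part of the original notebook.

```python
import re
import unicodedata


def to_slug(label):
    """Turn a human-readable label into an ASCII local name for an IRI."""
    # Decompose accented characters and drop the combining marks (é -> e)
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one underscore
    return re.sub(r"[^a-z0-9]+", "_", ascii_label.lower()).strip("_")


print(to_slug("2001: A Space Odyssey"))  # 2001_a_space_odyssey
print(to_slug("Amélie"))                 # amelie
```

Two different labels can map to the same slug with this scheme, so a real data set may need an extra disambiguation step.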