## In this lab, we will explore how to work with RDF files in Python 3 and in the open-source ontology editor Protégé. We will:


1. Install the Python 3 module rdflib for working with RDF.
2. Load an RDF file from disk and from the internet using rdflib.
3. Add new triples to the loaded RDF graph.
4. Convert the RDF file to different formats.
5. Install Protégé on our systems.
6. Load the RDF files in Protégé.


In [1]:
# Install rdflib in this Colab notebook.
!pip3 install rdflib
# rdflib 6 and later bundle JSON-LD support; rdflib-jsonld is only needed for older releases.
!pip3 install rdflib-jsonld
!pip3 install sparqlwrapper

Collecting rdflib
  Downloading rdflib-6.2.0-py3-none-any.whl (500 kB)
Collecting isodate
  Downloading isodate-0.6.1-py2.py3-none-any.whl (41 kB)
Installing collected packages: isodate, rdflib
Successfully installed isodate-0.6.1 rdflib-6.2.0
Collecting rdflib-jsonld
  Downloading rdflib_jsonld-0.6.2-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: rdflib-jsonld
Successfully installed rdflib-jsonld-0.6.2
Collecting sparqlwrapper
  Downloading SPARQLWrapper-2.0.0-py3-none-any.whl (28 kB)
Installing collected packages: sparqlwrapper
Successfully installed sparqlwrapper-2.0.0


## Load an RDF file from disk and the internet using rdflib

To load a file from Google Drive in Colab, we need to:

1.   Mount the Drive so it is visible to the notebook.
2.   Load the file in Python.



In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)


# Go to the folder where you uploaded the data file. Ideally, in your Google Drive, create a folder "Colab Notebooks/LD_Lab_3" and put the RDF file in it.

In [55]:
# Now that the Colab notebook and the data are in place, load the file.

# In the lines below, write the code to load the RDF file "Sharknado.rdf" using rdflib.

from rdflib import Graph, plugin
from SPARQLWrapper import SPARQLWrapper
from rdflib.serializer import Serializer

g = Graph()
# The file is parsed as N3/Turtle here, despite its .rdf extension.
g.parse("Sharknado.rdf", format="n3")

<Graph identifier=N2475e57fc04c4275abe84f9be20a9e85 (<class 'rdflib.graph.Graph'>)>

In [5]:
# Print all the triples present in the file
for subj, pred, obj in g:
    # each iteration yields one (subject, predicate, object) triple
    print(str(subj)+" --- "+str(pred)+" ---> "+str(obj))
print("graph has {} statements.".format(len(g)))

http://ex.org/Sharknado --- http://ex.org/stars ---> http://ex.org/IanZiering
http://ex.org/Sharknado --- http://ex.org/title ---> Sharknado
http://ex.org/Sharknado --- http://ex.org/firstAired ---> 2013-07-11
http://ex.org/Sharknado --- http://ex.org/stars ---> http://ex.org/JohnHeard
http://ex.org/Sharknado --- http://www.w3.org/1999/02/22-rdf-syntax-ns#type ---> http://ex.org/Movie
graph has 5 statements.


In [6]:
# Print the graph in Turtle format
print(g.serialize(format="turtle"))

# Print the graph in N-Triples format
print(g.serialize(format="nt"))

# Print the graph in RDF/XML format
print(g.serialize(format="xml"))

# Print the graph in JSON-LD format
print(g.serialize(format="json-ld"))


@prefix ex: <http://ex.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Sharknado a ex:Movie ;
    ex:firstAired "2013-07-11"^^xsd:date ;
    ex:stars ex:IanZiering,
        ex:JohnHeard ;
    ex:title "Sharknado"@en .


<http://ex.org/Sharknado> <http://ex.org/stars> <http://ex.org/IanZiering> .
<http://ex.org/Sharknado> <http://ex.org/title> "Sharknado"@en .
<http://ex.org/Sharknado> <http://ex.org/firstAired> "2013-07-11"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://ex.org/Sharknado> <http://ex.org/stars> <http://ex.org/JohnHeard> .
<http://ex.org/Sharknado> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Movie> .

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
   xmlns:ex="http://ex.org/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="http://ex.org/Sharknado">
    <rdf:type rdf:resource="http://ex.org/Movie"/>
    <ex:firstAired rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2013-07-11</ex:first

In [60]:
# https://rdflib.readthedocs.io/en/stable/intro_to_creating_rdf.html
from rdflib import URIRef, BNode, Literal
from rdflib.namespace import FOAF

tara = URIRef("http://ex.org/people/Tara")
node = BNode()  # a GUID is generated

name = Literal('Tara Reid')  # passing a string
g.bind("foaf", FOAF) #<- bind URI of FOAF to "foaf"
g.add((tara, FOAF.name, name))
# Add the relation between node: tara and the node: sharknado.



In [8]:
# Tara Reid hasActed Sharknado
hasActed = URIRef("http://ex.org/hasActed")
sharknado = URIRef("http://ex.org/Sharknado")

In [9]:
g.add((tara, hasActed, sharknado))

<Graph identifier=Need67ed58eb34328b302e7e37f429791 (<class 'rdflib.graph.Graph'>)>

In [59]:
print(g.serialize(format="nt"))
print(g.serialize(format="turtle"))

<http://ex.org/people/Tara> "Tara Reid" "Tara Reid" .
<http://ex.org/Sharknado> <http://ex.org/stars> <http://ex.org/IanZiering> .
<http://ex.org/Sharknado> <http://ex.org/title> "Sharknado"@en .
<http://ex.org/people/Tara> <http://xmlns.com/foaf/0.1/name> "Tara Reid" .
<http://ex.org/Sharknado> <http://ex.org/firstAired> "2013-07-11"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://ex.org/Sharknado> <http://ex.org/stars> <http://ex.org/JohnHeard> .
<http://ex.org/Sharknado> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Movie> .

@prefix ex: <http://ex.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Sharknado a ex:Movie ;
    ex:firstAired "2013-07-11"^^xsd:date ;
    ex:stars ex:IanZiering,
        ex:JohnHeard ;
    ex:title "Sharknado"@en .

<http://ex.org/people/Tara> foaf:name "Tara Reid" ;
    "Tara Reid" "Tara Reid" .




**Given the following information, construct an RDF graph and serialize it to N-Triples and JSON-LD:**

Alice and Bob know each other. They both work for a company named Fictional Dynamics. Bob also knows Charlie. Charlie works for a company named Actual Dynamics. Charlie knows his colleague Dave. Alice is female. Bob, Charlie, and Dave are male. Bob was born on 01-01-1990. Alice was born on 02-02-1884. Charlie was born on 03-03-1992. Dave was born on 04-04-1991. Both Bob and Charlie have a monthly salary of 10,000. Alice has a monthly salary of 12,000, while Dave has a monthly salary of 15,000. Alice’s full name is “Alice Smith” and Bob’s full name is “Bob Wilson”.

Use the URI prefix ex: (http://ex.org/) for all resources.

In [105]:
# Solution
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import XSD

g = Graph()

alice = URIRef("http://ex.org/people/Alice")
bob = URIRef("http://ex.org/people/Bob")
charlie = URIRef("http://ex.org/people/Charlie")
# Note: "%s" is kept literally in these IRIs; no string formatting is applied.
fictional_dynamics = URIRef("http://ex.org/Fictional%sDynamics")
actual_dynamics = URIRef("http://ex.org/Actual%sDynamics")
organization = URIRef("http://ex.org/ORG")
dave = URIRef("http://ex.org/people/Dave")
male = URIRef("http://ex.org/Male")
female = URIRef("http://ex.org/Female")

alice_name = Literal("Alice Smith")
bob_name = Literal("Bob Wilson")
fictional_dynamics_name = Literal("Fictional Dynamics")
actual_dynamics_name = Literal("Actual Dynamics")
alice_dob = Literal('1884-02-02', datatype=XSD.date)
bob_dob = Literal('1990-01-01', datatype=XSD.date)
charlie_dob = Literal('1992-03-03', datatype=XSD.date)
dave_dob = Literal('1991-04-04', datatype=XSD.date)
alice_salary = Literal('12000', datatype=XSD.integer)
bob_salary = Literal('10000', datatype=XSD.integer)
charlie_salary = Literal('10000', datatype=XSD.integer)
dave_salary = Literal('15000', datatype=XSD.integer)


worksFor = URIRef("http://ex.org/people/worksFor")
hasGender = URIRef("http://ex.org/people/hasGender")
hasCompanyName = URIRef("http://ex.org/people/hasCompanyName")
hasFullName = URIRef("http://ex.org/people/hasFullName")
hasDOB = URIRef("http://ex.org/people/hasDOB")
hasMonthlySalary = URIRef("http://ex.org/people/hasMonthlySalary")
hasColleague = URIRef("http://ex.org/people/hasColleague")

knows = URIRef("http://ex.org/people/knows")

g.add((fictional_dynamics, hasCompanyName, fictional_dynamics_name))
g.add((actual_dynamics, hasCompanyName, actual_dynamics_name))


g.add((alice, worksFor, fictional_dynamics))
g.add((alice, hasGender, female))
g.add((alice, hasDOB, alice_dob))
g.add((alice, hasMonthlySalary, alice_salary))
g.add((alice, hasFullName, alice_name))
g.add((alice, knows, bob))

g.add((bob, worksFor, fictional_dynamics))
g.add((bob, hasGender, male))
g.add((bob, hasDOB, bob_dob))
g.add((bob, hasMonthlySalary, bob_salary))
g.add((bob, hasFullName, bob_name))
g.add((bob, knows, charlie))
g.add((bob, knows, alice))

g.add((charlie, worksFor, actual_dynamics))
g.add((charlie, hasGender, male))
g.add((charlie, hasDOB, charlie_dob))
g.add((charlie, hasMonthlySalary, charlie_salary))
g.add((charlie, knows, dave))

g.add((dave, worksFor, actual_dynamics))
g.add((dave, hasGender, male))
g.add((dave, hasDOB, dave_dob))
g.add((dave, hasMonthlySalary, dave_salary))

<Graph identifier=N81f4af99acee48fc9ca677e1c1ad93e6 (<class 'rdflib.graph.Graph'>)>

In [35]:
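# Uncomment to dump the graph in the formats requested above (or others):
# print(g.serialize(format="nt"))
# print(g.serialize(format="json-ld"))
# print(g.serialize(format="turtle"))
# print(g.serialize(format="xml"))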


Extract the following information using SPARQL queries:

1. Extract the people who earn more than 10,000.
2. Extract the people who earn more than 10,000 and were born before the year 1994.
3. Extract the people who earn more than 10,000 or were born in the year 1994.
4. Find the distinct salaries received by people and list them in sorted order.
5. Delete the information about "Bob" from the dataset. #processUpdate

In [75]:
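# Task 1: people who earn more than 10,000.
# "ex:people/hasMonthlySalary" does not work with PREFIX ex: <http://ex.org/>,
# because "/" is read as a property-path operator. Declare a prefix for the
# full namespace instead (or write the IRI out in angle brackets).
query = """
    PREFIX people: <http://ex.org/people/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT DISTINCT ?a
    WHERE {
        ?a people:hasMonthlySalary ?b .
        FILTER(?b > "10000"^^xsd:integer)
    }
"""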

res = g.query(query)
for row in res:
    print(f"{row.a}")


http://ex.org/people/Alice
http://ex.org/people/Dave


In [79]:
# Task 2: people who earn more than 10,000 and were born before 1994.
query = """
    SELECT DISTINCT ?a
    WHERE {
        ?a <http://ex.org/people/hasMonthlySalary> ?b .
        ?a <http://ex.org/people/hasDOB> ?c .
        FILTER(?b > "10000"^^xsd:integer && ?c < "1994-01-01"^^xsd:date)
    }
"""

res = g.query(query)
for row in res:
    print(f"{row.a}")

http://ex.org/people/Alice
http://ex.org/people/Dave


In [81]:
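# Distinct salaries in ascending order. The || filter keeps anyone who earns
# more than 10,000 or was born before 1994; every person in this graph was
# born before 1994, so all three salary values are returned.
query = """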
    SELECT DISTINCT ?b
    WHERE {
        ?a <http://ex.org/people/hasMonthlySalary> ?b .
        ?a <http://ex.org/people/hasDOB> ?c .
        FILTER(?b > "10000"^^xsd:integer || ?c < "1994-01-01"^^xsd:date)
    }
    ORDER BY ASC(?b)
"""

res = g.query(query)
for row in res:
    print(f"{row.b}")

10000
12000
15000


In [114]:
# Delete every triple whose subject is Bob. Triples that only mention Bob as
# an object (e.g. Alice knows Bob) are not matched by this pattern.
query = """
    DELETE { ?s ?p ?o }
    WHERE {
        VALUES (?s) { (<http://ex.org/people/Bob>) }
        ?s ?p ?o .
    }
"""

# Graph.update() runs a SPARQL Update request against the graph.
g.update(query)
# print(g.serialize(format="turtle"))

In [115]:
print(g.serialize(format="turtle"))

@prefix ns1: <http://ex.org/people/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns1:Alice ns1:hasDOB "1884-02-02"^^xsd:date ;
    ns1:hasFullName "Alice Smith" ;
    ns1:hasGender <http://ex.org/Female> ;
    ns1:hasMonthlySalary 12000 ;
    ns1:knows ns1:Bob ;
    ns1:worksFor <http://ex.org/Fictional%sDynamics> .

ns1:Charlie ns1:hasDOB "1992-03-03"^^xsd:date ;
    ns1:hasGender <http://ex.org/Male> ;
    ns1:hasMonthlySalary 10000 ;
    ns1:knows ns1:Dave ;
    ns1:worksFor <http://ex.org/Actual%sDynamics> .

<http://ex.org/Fictional%sDynamics> ns1:hasCompanyName "Fictional Dynamics" .

ns1:Dave ns1:hasDOB "1991-04-04"^^xsd:date ;
    ns1:hasGender <http://ex.org/Male> ;
    ns1:hasMonthlySalary 15000 ;
    ns1:worksFor <http://ex.org/Actual%sDynamics> .

<http://ex.org/Actual%sDynamics> ns1:hasCompanyName "Actual Dynamics" .


