---
Relation Extraction Exercises
=====

<img src="http://www.accessola2.com/olita/insideolita/wordpress/wp-content/uploads/2014/04/hitchens_app2_Slide9.jpg" style="width: 600px;"/>

We are going to explore [RDFLib](https://github.com/RDFLib/rdflib), a Python library for working with RDF graphs.

In [1]:
reset -fs

In [2]:
from io import StringIO

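try:
    import rdflib
except ImportError:
    # pip.main() was removed in pip 10; call pip through the current interpreter instead
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "rdflib"])
    import rdflib

print("RDFLib Version:", rdflib.__version__)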

Collecting rdflib
  Downloading rdflib-4.2.1.tar.gz (889kB)
Collecting isodate (from rdflib)
  Downloading isodate-0.5.4.tar.gz
Building wheels for collected packages: rdflib, isodate
  Running setup.py bdist_wheel for rdflib: started
  Running setup.py bdist_wheel for rdflib: finished with status 'done'
  Stored in directory: /Users/brianspiering/Library/Caches/pip/wheels/5f/cc/22/bba65aa21c61d538dffb4960bd74ece936f3b0ef71ade47fb5
  Running setup.py bdist_wheel for isodate: started
  Running setup.py bdist_wheel for isodate: finished with status 'done'
  Stored in directory: /Users/brianspiering/Library/Caches/pip/wheels/8a/b6/2c/9346712a1822e562d16c4d6a5ae242ebf45128a8cb2e7b4abe
Successfully built rdflib isodate
Installing collected packages: isodate, rdflib
Successfully installed isodate-0.5.4 rdflib-4.2.1
RDFLib Version: 4.2.1


In [5]:
contents = '''subject|predicate|object
Jill|Likes|Snowboarding
Snowboarding|Is a|Sport
'''  

fake_file = StringIO(contents)

In [6]:
graph = rdflib.Graph()

for line in fake_file:
    triple = line.split("|")                          # a list of 3 strings (the header row and trailing "\n" are not removed)
    triple = tuple(rdflib.URIRef(t) for t in triple)  # wrap each string in a URIRef
    graph.add(triple)                                 # add the (subject, predicate, object) tuple to the graph

Is a does not look like a valid URI, trying to serialize this will break.


In [None]:
graph.

In [7]:
list(graph.subjects())

[rdflib.term.URIRef('subject'),
 rdflib.term.URIRef('Snowboarding'),
 rdflib.term.URIRef('Jill')]

In [8]:
list(graph.subject_predicates())

[(rdflib.term.URIRef('subject'), rdflib.term.URIRef('predicate')),
 (rdflib.term.URIRef('Snowboarding'), rdflib.term.URIRef('Is a')),
 (rdflib.term.URIRef('Jill'), rdflib.term.URIRef('Likes'))]

In [9]:
list(graph.subject_objects())

[(rdflib.term.URIRef('subject'), rdflib.term.URIRef('object\n')),
 (rdflib.term.URIRef('Snowboarding'), rdflib.term.URIRef('Sport\n')),
 (rdflib.term.URIRef('Jill'), rdflib.term.URIRef('Snowboarding\n'))]
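
Notice that the header row (`subject`, `predicate`, `object`) ended up in the graph and that every object carries a trailing `\n`. One way to avoid both is sketched below with the standard-library `csv` module. The helper name `read_triples` is made up for this illustration and is not part of RDFLib.

```python
import csv
from io import StringIO


def read_triples(handle, delimiter="|"):
    """Yield (subject, predicate, object) string tuples, skipping the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # discard the header line
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        yield tuple(field.strip() for field in row)


contents = """subject|predicate|object
Jill|Likes|Snowboarding
Snowboarding|Is a|Sport
"""

triples = list(read_triples(StringIO(contents)))
print(triples)
```

Each cleaned tuple can then be wrapped in `rdflib.URIRef` and passed to `graph.add`, as in the loop above. `Is a` still contains a space, so the URI warning would remain.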

---
Let's learn more about Donald Trump
----

![](http://vignette4.wikia.nocookie.net/animal-jam-clans-1/images/4/49/Donald-trump-hair-blown-by-wind.jpg/revision/latest?cb=20160203231507)

In [10]:
from IPython.display import IFrame

IFrame("http://dbpedia.org/page/Donald_Trump",
      width=800,
      height=600)

In [11]:
url = 'http://dbpedia.org/resource/Donald_Trump'  # Note: 'resource', not 'page'

g = rdflib.Graph()
g.parse(url)  # Graph.load() is gone from newer RDFLib releases; parse() also accepts a URL

In [12]:
# Take a peek at one subject, object, and predicate (printed in that order)
print(next(g.subjects()), next(g.objects()), next(g.predicates()), sep=" | ")

http://dbpedia.org/resource/Donald_Trump | http://nl.dbpedia.org/resource/Donald_Trump | http://www.w3.org/2002/07/owl#sameAs


![](http://assets.devx.com/articlefigs/19556.jpg)

__1__) Find Donald Trump's birthplace within the graph object

[RTFM](https://rdflib.readthedocs.org/en/stable/intro_to_graphs.html)

__2__) Add "Fred_Trump" as "relation of" to the graph object

[RTFM](https://rdflib.readthedocs.org/en/stable/intro_to_creating_rdf.html)  
[Source](https://en.wikipedia.org/wiki/Donald_Trump)

__HINT__: Check out http://dbpedia.org/ontology/

---
THERE MUST BE A BETTER WAY!
---

SPARQL is SQL for RDF graphs.

SQL is a query language for tabular data (pandas DataFrames or database tables).

SPARQL instead matches triple patterns against the graph.
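
To make "matching patterns" concrete, here is a rough pure-Python sketch of the idea. It is not how a SPARQL engine is implemented. The function names and the extra `Jack`/`Chess` triple are invented for the example. Terms starting with `?` act as variables, and a variable shared between two patterns must take the same value in both, which is what lets a query walk from one triple to the next.

```python
def match_pattern(triples, pattern, binding=None):
    """Yield variable bindings that make one triple pattern match."""
    binding = binding or {}
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must agree with any value bound to it earlier
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def match_all(triples, patterns):
    """Match several patterns in turn, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for extended in match_pattern(triples, pattern, binding)
        ]
    return bindings


triples = [
    ("Jill", "Likes", "Snowboarding"),
    ("Snowboarding", "Is a", "Sport"),
    ("Jack", "Likes", "Chess"),  # invented triple, so that one candidate gets filtered out
]

# "Who likes something that is a sport?"
results = match_all(triples, [("?who", "Likes", "?thing"), ("?thing", "Is a", "Sport")])
print(results)
```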

In [13]:
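try:
    from SPARQLWrapper import SPARQLWrapper, JSON
except ImportError:
    # pip.main() was removed in pip 10; call pip through the current interpreter instead
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "SPARQLWrapper"])
    from SPARQLWrapper import SPARQLWrapper, JSON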

Collecting SPARQLWrapper
  Downloading SPARQLWrapper-1.7.6.tar.gz
Collecting keepalive>=0.5 (from SPARQLWrapper)
  Downloading keepalive-0.5.tar.gz
Building wheels for collected packages: SPARQLWrapper, keepalive
  Running setup.py bdist_wheel for SPARQLWrapper: started
  Running setup.py bdist_wheel for SPARQLWrapper: finished with status 'done'
  Stored in directory: /Users/brianspiering/Library/Caches/pip/wheels/25/f9/b1/1890e8eccdc8fbdba78e6e92b41e4fe4c6b9c9512ce57e0834
  Running setup.py bdist_wheel for keepalive: started
  Running setup.py bdist_wheel for keepalive: finished with status 'done'
  Stored in directory: /Users/brianspiering/Library/Caches/pip/wheels/34/0a/41/2d4e439360f77d3ac7fdf950f4dcf230a65e742bc0299c1535
Successfully built SPARQLWrapper keepalive
Installing collected packages: keepalive, SPARQLWrapper
Successfully installed SPARQLWrapper-1.7.6 keepalive-0.5


In [14]:
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?pred ?object
    WHERE { <http://dbpedia.org/resource/Donald_Trump> ?pred ?object}
""")
sparql.setReturnFormat(JSON)
sparql.query().convert()

{'head': {'link': [], 'vars': ['pred', 'object']},
 'results': {'bindings': [{'object': {'type': 'uri',
     'value': 'http://dbpedia.org/class/yago/Administrator109770949'},
    'pred': {'type': 'uri',
     'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}},
   {'object': {'type': 'uri',
     'value': 'http://dbpedia.org/class/yago/Adult109605289'},
    'pred': {'type': 'uri',
     'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}},
   {'object': {'type': 'uri',
     'value': 'http://dbpedia.org/class/yago/Alumnus109786338'},
    'pred': {'type': 'uri',
     'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}},
   {'object': {'type': 'uri',
     'value': 'http://dbpedia.org/class/yago/AmericanBillionaires'},
    'pred': {'type': 'uri',
     'value': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'}},
   {'object': {'type': 'uri',
     'value': 'http://dbpedia.org/class/yago/AmericanBusinessWriters'},
    'pred': {'type': 'uri',
     'value': 'http://www.w3
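
The converted response is a nested dictionary: `head.vars` lists the variable names and `results.bindings` holds one entry per match. Below is a small sketch of flattening it into plain rows. The helper name `bindings_to_rows` is hypothetical, and the sample data is copied from the first two bindings shown above.

```python
def bindings_to_rows(response):
    """Turn a SPARQL JSON response into a list of {variable: value} dicts."""
    variables = response["head"]["vars"]
    rows = []
    for binding in response["results"]["bindings"]:
        # A variable may be missing from a binding, so fall back to None
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


sample = {
    "head": {"link": [], "vars": ["pred", "object"]},
    "results": {"bindings": [
        {"object": {"type": "uri",
                    "value": "http://dbpedia.org/class/yago/Administrator109770949"},
         "pred": {"type": "uri",
                  "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}},
        {"object": {"type": "uri",
                    "value": "http://dbpedia.org/class/yago/Adult109605289"},
         "pred": {"type": "uri",
                  "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["pred"], "|", row["object"])
```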

__3__) Find Donald Trump's birthplace within the query results

[Help](https://www.w3.org/2009/Talks/0615-qbe/)

__HINT__: Write your query as regular SQL, then rewrite it in SPARQL.

__4__) Using SPARQL, walk the graph to find a list of all the people who were born in the same place as Donald Trump.

It is okay for it to take multiple steps.

---
Optional
----

Parse the abstract with `str` methods and find as many associated elements in the RDF graph as you can.

Which ones were easy?

Which ones were hard?

How would you do the inverse, going from an RDF graph to a narrative summary?

<br>
<br>
----