# Self study 9

## Libraries
For this task you will need the RDFLib library. You can find the documentation at this [link](https://rdflib.readthedocs.io/en/stable/). After installing it, you can run the following imports.

In [23]:
from rdflib import Graph
from rdflib.term import BNode, Literal, URIRef
from rdflib.namespace import RDF

## Tasks
In this task you are expected to collect some RDF graphs from the web and merge them into a local RDF graph.

We will use RDF graphs from DBLP, a website describing bibliographic data about scientific articles in computer science. Each researcher has an associated DBLP person ID. For example, the person ID of Daniele is 57/8026. This ID can be used to create Daniele's IRI: [http://dblp.org/pid/57/8026](http://dblp.org/pid/57/8026). Starting from here, it is possible to access different representations of the resource by adding a file extension. For example:

* By adding ".html", one can access the HTML page describing Daniele: [http://dblp.org/pid/57/8026.html](http://dblp.org/pid/57/8026.html)
* By adding ".rdf", one can access the RDF/XML page describing Daniele: [http://dblp.org/pid/57/8026.rdf](http://dblp.org/pid/57/8026.rdf)
* By adding ".ttl", one can access the Turtle page describing Daniele: [http://dblp.org/pid/57/8026.ttl](http://dblp.org/pid/57/8026.ttl)
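
The pattern above (person ID, then optional file extension) can be captured in a small helper. This is only a sketch: the function name `dblp_iri` is hypothetical, and it assumes the `https://dblp.org/pid/` prefix used in the code snippets further down.

```python
def dblp_iri(pid: str, extension: str = "") -> str:
    """Build the DBLP IRI for a person ID, optionally with a file extension."""
    # Strip stray slashes so that "/57/8026/" and "57/8026" give the same IRI.
    return "https://dblp.org/pid/" + pid.strip("/") + extension


daniele_pid = "57/8026"
print(dblp_iri(daniele_pid))          # IRI identifying the person
print(dblp_iri(daniele_pid, ".ttl"))  # Turtle representation
print(dblp_iri(daniele_pid, ".nt"))   # N-Triples representation
```

Keeping the person IDs in a list and mapping them through such a helper makes it easy to switch between representations later.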

### Task 1
The code snippets below provide the building blocks for a simple crawler that starts from a person and collects all articles written by that person or their co-authors (i.e., only using two types of links: authorOf and coCreatorWith). To fulfill the task you should extend them as follows:

* Select about 5 teachers or project supervisors you have worked with. 
* Identify their DBLP personal identifiers, and consequently their IRIs.
* Collect the RDF describing them and combine it into a single RDF graph.
* Analyse the graph (i.e. iterate over the graph and check its content) to answer the following questions:
    * How many articles have they authored?
    * Which person has co-authored with the most people?
    * What are their Google Scholar IDs?

### Task 2
Starting from the IRIs of the people you used in Task 1, create a co-author network, i.e., a graph describing who wrote papers with whom. In this graph, the nodes represent people, and an edge indicates that the two people it connects wrote an article together. Create this graph as a knowledge graph, using the Graph class from RDFLib.

In detail:

* Use the profiles you used in Task 1 as seeds, and collect their RDF graphs.
* Populate the co-author network with the information about the co-authorships. For each author, store their IRI and a Literal with their name.
* Develop a strategy to decide the next person to crawl.
* Stop when the co-author network contains approximately 1000 authors.
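
One simple strategy for choosing the next person is breadth-first: crawl the seeds first, then their co-authors, then the co-authors of those, until the author budget is reached. The sketch below shows only the control flow, using plain strings and a toy in-memory network so it runs without network access. The names `crawl_coauthors` and `get_coauthors` are hypothetical; in the real task, `get_coauthors` would fetch a person's graph and return the objects of their coCreatorWith statements, and the authors and edges would be stored in an RDFLib Graph.

```python
from collections import deque


def crawl_coauthors(seeds, get_coauthors, max_authors=1000):
    """Breadth-first crawl of a co-author network.

    Returns the set of discovered authors and the set of undirected edges.
    """
    queue = deque(seeds)   # people whose co-authors still need to be fetched
    authors = set(queue)   # everyone discovered so far
    edges = set()

    while queue and len(authors) < max_authors:
        person = queue.popleft()
        for other in get_coauthors(person):
            if other not in authors:
                if len(authors) >= max_authors:
                    continue  # budget reached: do not add new people
                authors.add(other)
                queue.append(other)
            # Sort the pair so that (a, b) and (b, a) count as one edge.
            edges.add(tuple(sorted((person, other))))
    return authors, edges


# Toy network standing in for DBLP: person -> list of co-authors.
toy_network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

authors, edges = crawl_coauthors(["A"], lambda p: toy_network.get(p, []), max_authors=4)
print(sorted(authors))
print(sorted(edges))
```

Other strategies are possible, for example prioritising people who are co-authors of several seeds. Breadth-first keeps the network centred on the seeds, at the price of leaving out edges between authors that were discovered but never crawled.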

### Code snippets
Feel free to reuse the following code to solve the above tasks.

This function retrieves a graph from DBLP, given the IRI.

In [31]:
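def get_graph(iri: str) -> Graph:
    """Download and parse the N-Triples representation of a DBLP resource."""
    g = Graph()
    # Appending '.nt' to the IRI selects the N-Triples serialization.
    g.parse(iri + '.nt', format='nt')
    return g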

This example uses Daniele's DBLP IRI to retrieve the graph describing him, his publications and his collaborators.

In [15]:
daniele_iri = "https://dblp.org/pid/57/8026"
daniele_graph = get_graph(daniele_iri)

The following code counts the statements stored in the graph.

In [30]:
i = 0
for s, p, o in daniele_graph:
    i += 1

print(f"Number of triples: {i}")

Number of triples: 221


The following code prints the statements where the object is a literal.

In [29]:
for s, p, o in daniele_graph:
    if isinstance(o, Literal):
        print(s, p, o)

https://dblp.org/pid/57/8026 https://dblp.org/rdf/schema#creatorName Daniele Dell'Aglio
Na6989d274a82427da77627a194ba4394 http://purl.org/spar/literal/hasLiteralValue 1217255869
https://dblp.org/pid/57/8026 https://dblp.org/rdf/schema#primaryCreatorName Daniele Dell'Aglio
N682b05e24f2a41918823633701c76b6f http://purl.org/spar/literal/hasLiteralValue 81485655252
https://dblp.org/pid/57/8026 http://www.w3.org/2000/01/rdf-schema#label Daniele Dell'Aglio
N434ebc633e8040ed9eb68b3c089ebb2b http://purl.org/spar/literal/hasLiteralValue Q57225906
Nba74d284024b4b35a9bace8445b8f0a6 http://purl.org/spar/literal/hasLiteralValue 57/8026
https://dblp.org/pid/57/8026.nt http://www.w3.org/2000/01/rdf-schema#label provenance information for RDF data of dblp person '57/8026'
Nfb66a5032c6844e5bae77a284bf0756d http://purl.org/spar/literal/hasLiteralValue dandellaglio
Neb1238ec9a264717959b071a1bebc9cb http://purl.org/spar/literal/hasLiteralValue 0000-0003-4904-2511
https://dblp.org/pid/57/8026.nt http://pur

This code prints the statements having rdf:type as predicate.

In [28]:
for s, p, o in daniele_graph.triples((None, RDF.type, None)):
    print(s, p, o)

https://dblp.org/pid/57/8026 http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://dblp.org/rdf/schema#Person
https://dblp.org/pid/57/8026 http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://dblp.org/rdf/schema#Creator
Nba74d284024b4b35a9bace8445b8f0a6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/spar/datacite/PersonalIdentifier
N6bf36e7957fb4bec92c2c799890e47d9 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/spar/datacite/PersonalIdentifier
N682b05e24f2a41918823633701c76b6f http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/spar/datacite/PersonalIdentifier
Nfb66a5032c6844e5bae77a284bf0756d http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/spar/datacite/PersonalIdentifier
Neb1238ec9a264717959b071a1bebc9cb http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/spar/datacite/PersonalIdentifier
N434ebc633e8040ed9eb68b3c089ebb2b http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/spar/datacite/Identifi

More information on how to navigate graphs is available [here](https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html).