# RDF - NLS - Encyclopaedia Britannica

This notebook creates the RDF triples used to build our RDFLib graph.

For each post-processed edition dataframe obtained from **Merging_EB_Terms.ipynb** (e.g. results_eb_1_edition_dataframe, results_eb_2_edition_dataframe, etc.), we add the information from the dataframe obtained from **Metadata_EB.ipynb** (metadata_eb_dataframe).

The idea is to have, for each edition dataframe (and also each supplement dataframe), all the information in one place. It is currently split across several dataframes.


This notebook stores the final dataframes in the results_NLS directory, named according to the scheme **final_eb_< NUM_EDITION >_dataframe**.

Each entry in these new dataframes has the following columns (the example shows one entry from the first edition):

- MMSID:                                              
- editionTitle:                          First edition, 1771, Volume 1, A-B
- editor:                                                  Smellie, William
- editor_date:                                                   1740-1795
- genre:                                                       encyclopedia
- language:                                                             eng
- termsOfAddress:                                                       NaN
- numberOfPages:                                                        832
- physicalDescription:               3 v., 160 plates : ill. ; 26 cm. (4to)
- place:                                                         Edinburgh
- publisher:              Printed for A. Bell and C. Macfarquhar; and so...
- referencedBy:           [Alston, R.C.  Engl. language III, 560, ESTC T...
- shelfLocator:                                                        EB.1
- editionSubTitle:        Illustrated with one hundred and sixty copperp...
- volumeTitle:            Encyclopaedia Britannica; or, A dictionary of ...
- year:                                                                1771
- volumeId:                                                       144133901
- metsXML:                                               144133901-mets.xml
- permanentURL:                            https://digital.nls.uk/144133901
- publisherPersons:                     [C. Macfarquhar, Colin Macfarquhar]
- volumeNum:                                                              1
- letters:                                                              A-B
- part:                                                                   0
- editionNum:                                                             1
- supplementTitle:                                                         
- supplementSubTitle:                                                      
- supplementsTo:                                                         []
- numberOfVolumes:                                                        6
- term:                                                                  OR
- definition:             A NEW A D I C T I A A, the name of several riv...
- relatedTerms:                                                          []
- header:                                           EncyclopaediaBritannica
- startsAt:                                                              15
- endsAt:                                                                15
- numberOfTerms:                                                         22
- numberOfWords:                                                         54
- positionPage:                                                           0
- typeTerm:                                                         Article
- altoXML:                                  144133901/alto/188082904.34.xml

### Loading the necessary libraries

In [25]:
import rdflib
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph
import networkx as nx
import matplotlib.pyplot as plt
from rdflib import Graph, Namespace, Literal
from rdflib.plugins.sparql import prepareQuery

### Functions

### 1. Loading the graph

In [26]:
g = Graph()
g.parse("../../results_NLS/edition1st.ttl", format="ttl") 

<Graph identifier=Nc78759a950004b57be36919a19117826 (<class 'rdflib.graph.Graph'>)>

List all the resources with the property eb:editor

In [27]:
eb = Namespace("https://w3id.org/eb#")

q1 = prepareQuery('''
  SELECT ?Subject WHERE { 
    ?Subject eb:editor ?FullName. 
  }
  ''',
  initNs = { "eb": eb}
)


for r in g.query(q1):
      print(r.Subject)

https://w3id.org/eb/i/Edition/992277653804341
https://w3id.org/eb/i/Edition/9929192893804340


The same query, but also returning the editor of each resource.

In [28]:
q2 = prepareQuery('''
  SELECT ?Subject ?FullName WHERE { 
    ?Subject eb:editor ?FullName.
  } 
  ''',
  initNs = { "eb": eb}
)

for r in g.query(q2):
  print(r.Subject, r.FullName)

https://w3id.org/eb/i/Edition/992277653804341 https://w3id.org/eb/i/Person/Smellie,William
https://w3id.org/eb/i/Edition/9929192893804340 https://w3id.org/eb/i/Person/Smellie,William


The same kind of query, asking for the first 10 resources with the property eb:name

In [32]:
q2 = prepareQuery('''
  SELECT ?Subject ?FullName WHERE { 
    ?Subject eb:name ?FullName.
  } 
  ''',
  initNs = { "eb": eb}
)

cont=0
for r in g.query(q2):
    print(r.Subject, r.FullName)
    cont+=1
    if cont == 10:
        break

https://w3id.org/eb/i/Article/992277653804341_144133902_CURATE_1 CURATE
https://w3id.org/eb/i/Article/992277653804341_144133902_DRAGOMAN_1 DRAGOMAN
https://w3id.org/eb/i/Article/9929192893804340_144850367_IMAGE_1 IMAGE
https://w3id.org/eb/i/Article/9929192893804340_144850366_AVOWEE_1 AVOWEE
https://w3id.org/eb/i/Article/992277653804341_144133902_CERTHIA_1 CERTHIA
https://w3id.org/eb/i/Article/992277653804341_144133903_PARATHENAR_1 PARATHENAR
https://w3id.org/eb/i/Article/9929192893804340_144850366_BISANT_1 BISANT
https://w3id.org/eb/i/Article/992277653804341_144133901_ARCTAPELIOTES_1 ARCTAPELIOTES
https://w3id.org/eb/i/Article/9929192893804340_144850368_PROVOST_1 PROVOST
https://w3id.org/eb/i/Article/992277653804341_144133901_AUSTRAL_2 AUSTRAL


Asking for resources whose name is "Smellie, William"

In [30]:
from rdflib import XSD
q3 = prepareQuery('''
  SELECT ?Subject WHERE { 
    ?Subject eb:name ?Family.
  } 
  ''',
    initNs = { "eb": eb}
)

for r in g.query(q3, initBindings = {'?Family' : Literal('Smellie, William', datatype=XSD.string)}):
  print(r.Subject)

https://w3id.org/eb/i/Person/Smellie,William


Asking for resources whose name is ABACUS

In [31]:
from rdflib import XSD
q3 = prepareQuery('''
  SELECT ?Subject WHERE { 
    ?Subject eb:name ?Term.
  } 
  ''',
    initNs = { "eb": eb}
)

for r in g.query(q3, initBindings = {'?Term' : Literal('ABACUS', datatype=XSD.string)}):
  print(r.Subject)

https://w3id.org/eb/i/Article/9929192893804340_144850366_ABACUS_4
https://w3id.org/eb/i/Article/992277653804341_144133901_ABACUS_4


In [16]:
#G = rdflib_to_networkx_multidigraph(g)

# Plot Networkx instance of RDF Graph
#pos = nx.spring_layout(G, scale=2)
#edge_labels = nx.get_edge_attributes(G, 'r')
#nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
#nx.draw(G, with_labels=True)

#if not in interactive mode for 
#plt.show()
