# Knowledge Graph construction and query with extracted software metadata

This notebook first builds a knowledge graph from the metadata extracted from software repositories. The graph is then queried to assess which good practices those repositories follow.

In [1]:
import morph_kgc
import pyoxigraph

## KG Construction
The knowledge graph is generated with Morph-KGC, which uses RML mappings to transform the JSON file into RDF. The tool requires a few configuration parameters: the desired output serialisation and the path to the RML mapping file. The KG is then generated and stored as an Oxigraph store in the variable `graph`, which is also saved as a `.nq` file.

In [2]:
config = """
             [CONFIGURATION]
             output_format=N-QUADS
             
             [SOMEF-json]
             mappings=../mappings/mapping-somef-star.ttl
         """

In [3]:
graph = morph_kgc.materialize_oxigraph(config)

INFO | 2023-07-03 15:52:54,517 | 145 mapping rules retrieved.
INFO | 2023-07-03 15:52:54,526 | Mappings processed in 1.346 seconds.
INFO | 2023-07-03 15:53:03,752 | Number of triples generated in total: 16278.
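
The configuration string passed to Morph-KGC is written in INI syntax, so it can be inspected with the standard library before materialising the graph. The following is a minimal sketch that only checks the text parses as expected; it is not a description of how Morph-KGC loads the configuration internally, and the reading of `SOMEF-json` as a data source section is an assumption based on its content.

```python
import configparser
import textwrap

# Same settings as the notebook's `config`, dedented for readability.
config = textwrap.dedent("""
    [CONFIGURATION]
    output_format=N-QUADS

    [SOMEF-json]
    mappings=../mappings/mapping-somef-star.ttl
""")

parser = configparser.ConfigParser()
parser.read_string(config)

# List every section with its key/value pairs.
for section in parser.sections():
    print(section, dict(parser[section]))
```

A check like this catches a misspelled key or a missing section header before the longer materialisation step runs.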


In [4]:
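# Record the creation date of the 20230628 graph; the statement is stored in the `default` named graph.
graph.add(pyoxigraph.Quad(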
    pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/20230628'),
    pyoxigraph.NamedNode('http://purl.org/dc/terms/created'),
    pyoxigraph.Literal('2023-06-28 00:00:00', datatype=pyoxigraph.NamedNode('http://www.w3.org/2001/XMLSchema#dateTime')),
    pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/default')))
# Attribute the 20230628 graph to SOMEF, also in the `default` named graph.
graph.add(pyoxigraph.Quad(
    pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/20230628'),
    pyoxigraph.NamedNode('http://www.w3.org/ns/prov#wasAttributedTo'),
    pyoxigraph.Literal('SOftware Metadata Extraction Framework (SOMEF)', datatype=pyoxigraph.NamedNode('http://www.w3.org/2001/XMLSchema#string')),
    pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/default')))
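
Note that the `created` literal above separates date and time with a space, whereas the `xsd:dateTime` lexical form uses a `T` separator. A small sketch of producing that form with the standard library is shown here; the helper name `to_xsd_datetime` is hypothetical and is not part of the notebook or of pyoxigraph.

```python
from datetime import datetime

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def to_xsd_datetime(value: datetime) -> str:
    """Return an ISO 8601 string with a 'T' separator and second precision."""
    return value.isoformat(timespec="seconds")


created = to_xsd_datetime(datetime(2023, 6, 28))
print(created)  # 2023-06-28T00:00:00
```

The resulting string could be passed as the literal value together with the `XSD_DATETIME` datatype IRI.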

In [35]:
type(graph)

oxigraph.Store

In [5]:
with open('/Users/aiglesias/GitHub/oeg-software-graph/data/somef-kg.nq', 'w') as result:
    result.write(str(graph))
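
Because N-Quads is a line-based format with one statement per line, a quick sanity check on the saved file is to count its non-blank, non-comment lines. The sketch below uses only the standard library; the helper name `count_statements` and the temporary sample file are hypothetical, and the sample holds the two provenance quads added earlier rather than the real dump.

```python
import tempfile
from pathlib import Path


def count_statements(path):
    """Count N-Quads statements: lines that are neither blank nor comments."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


# Hypothetical sample file with the two provenance quads from this notebook.
sample = (
    '<https://w3id.org/okn/i/graph/20230628> '
    '<http://purl.org/dc/terms/created> '
    '"2023-06-28 00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> '
    '<https://w3id.org/okn/i/graph/default> .\n'
    '<https://w3id.org/okn/i/graph/20230628> '
    '<http://www.w3.org/ns/prov#wasAttributedTo> '
    '"SOftware Metadata Extraction Framework (SOMEF)" '
    '<https://w3id.org/okn/i/graph/default> .\n'
)

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.nq"
    sample_path.write_text(sample, encoding="utf-8")
    n_statements = count_statements(sample_path)

print(n_statements)  # 2
```

Pointing `count_statements` at the real `.nq` file gives a figure to compare against the number of triples reported in the Morph-KGC log.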

## KG querying - FAIRness assessment

In [52]:
query = """
            PREFIX sd: <https://w3id.org/okn/o/sd#>
            
            SELECT DISTINCT ?s ?p ?o
            WHERE {
                GRAPH <https://w3id.org/okn/i/graph/default> {?s ?p ?o}
            }
"""

In [53]:
q_res = graph.query(query)

for solution in q_res:
    print(solution['s'], solution['p'], solution['o'])


<https://w3id.org/okn/i/graph/20230628> <http://www.w3.org/ns/prov#wasAttributedTo> "SOftware Metadata Extraction Framework (SOMEF)"
<https://w3id.org/okn/i/graph/20230628> <http://purl.org/dc/terms/created> "2023-06-28 00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
