# rdf_demo

This notebook takes a sample EPC domestic certificates CSV file and converts it to RDF data, using the information in the 'certificates.csv-metadata.json' CSVW metadata file.

## Import packages

The imports include the [`csvw_functions`](https://github.com/stevenkfirth/csvw_functions) package for working with CSVW files.

In [1]:
import csvw_functions
import rdflib
import json

## View the metadata file

This is a CSVW table metadata file. It refers to both the CSV file and to the 'epc_domestic_certificates-schema-metadata.json' CSVW schema metadata file.

The CSV file 'certificates.csv' is a sample of five EPC domestic certificates.

In [2]:
with open('certificates.csv-metadata.json') as f:
    display(json.load(f))

{'$schema': 'https://raw.githubusercontent.com/stevenkfirth/csvw_metadata_json_schema/main/schema_files/table_description.schema.json',
 '@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': '../epc_domestic_certificates-schema-metadata.json',
 'url': 'certificates.csv',
 '@type': 'Table'}
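The `url` and `tableSchema` values above are relative references, which CSVW resolves against the location of the metadata file. The sketch below illustrates that resolution with the standard library only. The base URL `https://example.org/data/demo/...` and the helper `resolve_references` are hypothetical and used purely for illustration; this is not how `csvw_functions` is implemented.

```python
from urllib.parse import urljoin

# Hypothetical location of the metadata file (an assumption for this sketch;
# the notebook itself reads the file from the local directory).
metadata_url = 'https://example.org/data/demo/certificates.csv-metadata.json'

# The two relative references taken from the metadata displayed above.
metadata = {
    'tableSchema': '../epc_domestic_certificates-schema-metadata.json',
    'url': 'certificates.csv',
}

def resolve_references(base_url, table_description):
    """Resolve the 'url' and 'tableSchema' references against base_url.

    Hypothetical helper, not part of csvw_functions.
    """
    return {
        key: urljoin(base_url, table_description[key])
        for key in ('url', 'tableSchema')
    }

resolved = resolve_references(metadata_url, metadata)
for key, value in resolved.items():
    print(key, '->', value)
```

With this assumed base, the CSV file resolves to the same directory as the metadata file and the schema file resolves to the parent directory.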

## Convert the CSV data to RDF data

This is a two-stage process:
1. The CSV data and CSVW metadata are combined using the `create_annotated_table_group` function.
2. The RDF data is created using the `create_rdf` function.

The first three RDF triples are displayed.

In [3]:
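# Stage 1: combine the CSV data with its CSVW metadata.
annotated_table_group_dict=\
    csvw_functions.create_annotated_table_group(
        'certificates.csv-metadata.json'
    )
# Stage 2: create the RDF data as an N-Triples string.
rdf_ntriples=csvw_functions.create_rdf(annotated_table_group_dict,mode='minimal')
# Show the first three triples (one triple per line).
rdf_ntriples.split('\n')[:3]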



['<http://purl.org/berg/epc_data#Certificate_63ff1a3a9c341ac8c4bc081160d572dfc98e7fa98dc5244d3620cd22ada45045> <http://purl.org/berg/epc_domestic_vocab#LMK_KEY> "63ff1a3a9c341ac8c4bc081160d572dfc98e7fa98dc5244d3620cd22ada45045"^^<http://www.w3.org/2001/XMLSchema#string> .',
 '<http://purl.org/berg/epc_data#Certificate_63ff1a3a9c341ac8c4bc081160d572dfc98e7fa98dc5244d3620cd22ada45045> <http://purl.org/berg/epc_domestic_vocab#ADDRESS1> "__REMOVED__"^^<http://www.w3.org/2001/XMLSchema#string> .',
 '<http://purl.org/berg/epc_data#Certificate_63ff1a3a9c341ac8c4bc081160d572dfc98e7fa98dc5244d3620cd22ada45045> <http://purl.org/berg/epc_domestic_vocab#ADDRESS2> "__REMOVED__"^^<http://www.w3.org/2001/XMLSchema#string> .']
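Each line of the output holds one triple: a subject IRI, a predicate IRI, and an object, terminated by ` .`. As a rough illustration of that layout, the sketch below splits such lines with plain string operations. It assumes the subject and predicate are IRIs in angle brackets, as in the output shown here, and is not a full N-Triples parser. The helper `split_ntriple` and the shortened identifier `Certificate_example` are hypothetical.

```python
from collections import Counter

def split_ntriple(line):
    """Split one N-Triples line into (subject, predicate, object) strings.

    Rough sketch: assumes subject and predicate contain no spaces,
    which holds when both are IRIs in angle brackets.
    """
    subject, predicate, rest = line.strip().split(' ', 2)
    obj = rest.rstrip()
    if obj.endswith('.'):
        obj = obj[:-1].rstrip()  # drop the terminating ' .'
    return subject, predicate, obj

# Hypothetical sample in the same shape as the output above,
# with a shortened certificate identifier.
sample = '\n'.join([
    '<http://purl.org/berg/epc_data#Certificate_example> '
    '<http://purl.org/berg/epc_domestic_vocab#ADDRESS1> '
    '"__REMOVED__"^^<http://www.w3.org/2001/XMLSchema#string> .',
    '<http://purl.org/berg/epc_data#Certificate_example> '
    '<http://purl.org/berg/epc_domestic_vocab#ADDRESS2> '
    '"__REMOVED__"^^<http://www.w3.org/2001/XMLSchema#string> .',
])

triples = [split_ntriple(line) for line in sample.split('\n') if line.strip()]

# Count how many triples each subject has.
triples_per_subject = Counter(subject for subject, _, _ in triples)
print(triples_per_subject)
```

For anything beyond a quick inspection, parsing the string with `rdflib` is the more robust route.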

**NOTE:** The warnings raised when running the conversion cell are valid.

The first refers to the '$schema' property in the CSVW table metadata file, which is used to help write the file contents but which isn't part of the CSVW specification.

The remaining warnings highlight some issues with data types in the data. Specifically, some of the columns with a declared datatype of integer appear to contain non-integer (decimal) values.
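To locate cells like these before conversion, a simple check against the integer lexical form can help. The sketch below is a minimal illustration using the standard library. The helper `find_non_integer_cells`, the column name `EXAMPLE_INTEGER_COLUMN`, and the sample rows are all hypothetical; the notebook does not state which columns are affected.

```python
import csv
import io
import re

# Integer lexical form: an optional sign followed by digits only.
INTEGER_PATTERN = re.compile(r'[+-]?[0-9]+')

def find_non_integer_cells(csv_text, integer_columns):
    """Return (row_number, column, value) for non-empty cells that are not integers.

    Hypothetical helper; row numbers count data rows from 1 (header excluded).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for column in integer_columns:
            value = row[column]
            # Empty cells are skipped; only malformed values are reported.
            if value and not INTEGER_PATTERN.fullmatch(value):
                problems.append((row_number, column, value))
    return problems

# Hypothetical sample data, not taken from 'certificates.csv'.
sample_csv = (
    'LMK_KEY,EXAMPLE_INTEGER_COLUMN\n'
    'row_a,55\n'
    'row_b,55.5\n'
    'row_c,\n'
)

problems = find_non_integer_cells(sample_csv, ['EXAMPLE_INTEGER_COLUMN'])
print(problems)
```

Running the same check over the real CSV file, with the integer columns listed in the schema metadata, would show which rows trigger the datatype warnings.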

## Convert the RDF triples to Turtle format

This converts the RDF into the easier-to-read Turtle format.

The Turtle output is displayed and saved to the file 'certificates.ttl'.


In [4]:
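# Parse the N-Triples string into an rdflib graph.
g=rdflib.Graph().parse(data=rdf_ntriples, format='ntriples')
# Bind namespace prefixes so the Turtle output uses short names.
g.bind('sosa',rdflib.SOSA)
g.bind('epc_data',rdflib.URIRef('http://purl.org/berg/epc_data#'))
g.bind('epc_domestic_vocab',rdflib.URIRef('http://purl.org/berg/epc_domestic_vocab#'))
# Save the graph as Turtle, then print the same serialization.
g.serialize('certificates.ttl',format='ttl')
print(g.serialize(format='ttl'))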

@prefix epc_data: <http://purl.org/berg/epc_data#> .
@prefix epc_domestic_vocab: <http://purl.org/berg/epc_domestic_vocab#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

epc_data:Certificate_63ff1a3a9c341ac8c4bc081160d572dfc98e7fa98dc5244d3620cd22ada45045 epc_domestic_vocab:ADDRESS "__REMOVED__"^^xsd:string ;
    epc_domestic_vocab:ADDRESS1 "__REMOVED__"^^xsd:string ;
    epc_domestic_vocab:ADDRESS2 "__REMOVED__"^^xsd:string ;
    epc_domestic_vocab:ADDRESS3 "__REMOVED__"^^xsd:string ;
    epc_domestic_vocab:BUILDING_REFERENCE_NUMBER "10002887813"^^xsd:string ;
    epc_domestic_vocab:BUILT_FORM "Semi-Detached"^^xsd:string ;
    epc_domestic_vocab:CO2_EMISSIONS_CURRENT 2.9 ;
    epc_domestic_vocab:CO2_EMISSIONS_POTENTIAL 0.9 ;
    epc_domestic_vocab:CO2_EMISS_CURR_PER_FLOOR_AREA 38.0 ;
    epc_domestic_vocab:CONSTITUENCY "E14000625"^^xsd:string ;
    epc_domestic_vocab:CONSTITUENCY_LABEL "Charnwood"^^xsd:string ;
    epc_domestic_vocab:CONSTRUCTION_AGE_BAND "England and Wales: 1

This demonstrates the conversion of the CSV data to valid RDF data.

The definitions of the terms in the 'epc_domestic_vocab' namespace are given in the [epc_domestic_vocab.ttl](../epc_domestic_vocab.ttl) file.

