#Converting AHRC Data to RDF

2016.11.25

##1. Find each entity in the JSON file and assign it a variable name in Python

Use safeJSON to import the file and convert it to a Python object. (I originally used json, but found it difficult to deal with values that do not exist; safeJSON resolves this by replacing any nonexistent value with SafeNone.)

In [457]:
import safeJSON
from pprint import pprint

with open('json_files/file19.json') as data_file:    
    data = safeJSON.load(data_file)

pprint(data)

{'ad': {'csvSurveyEnabled': False,
        'lastRefreshDate': '03 October 2016',
        'popUpAskAgainGapDays': 14,
        'popUpWaitSeconds': 30,
        'surveyEnabled': True},
 'fba': {},
 'feedbackAnswers': {},
 'headerData': {'csvSurveyEnabled': False,
                'lastRefreshDate': '03 October 2016',
                'popUpAskAgainGapDays': 14,
                'popUpWaitSeconds': 30,
                'surveyEnabled': True},
 'pagedListHolder': {'firstElementOnPage': 0,
                     'firstLinkedPage': 0,
                     'firstPage': True,
                     'lastElementOnPage': 4,
                     'lastLinkedPage': 0,
                     'lastPage': True,
                     'maxLinkedPages': 10,
                     'nrOfElements': 5,
                     'page': 0,
                     'pageCount': 1,
                     'pageList': [{'author': [{'id': '14f219700de6ba0cf22951ceafa277b8',
                                               'otherNames': 'Fürs
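
The difficulty with the standard json module described at the start of this section can be sketched as follows. This is only an illustration of the idea, not how safeJSON is implemented: the helper name `dig` and the small sample record are assumptions made for the example.

```python
import json

# Hypothetical sample, shaped like a small part of a project record.
sample = json.loads('{"project": {"fund": {"start": "2011-07-01"}}}')

def dig(obj, *keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj

# An existing path returns its value.
print(dig(sample, 'project', 'fund', 'start'))
# A missing path returns the default instead of raising KeyError.
print(dig(sample, 'project', 'fund', 'funder', 'id'))
```

With plain dictionary indexing, the second lookup would raise a KeyError, which is the situation SafeNone is meant to avoid.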

###Project URI

Identify the element for the project's ID (projectOverview.projectComposition.project.id), assign it to the variable project_id and print the value:

In [458]:
project_id = data['projectOverview']['projectComposition']['project']['id']
project_id

'EB8275BC-FF3C-4A89-8A50-043C60520F29'

Construct the project's URI:

In [459]:
ahproject_base_uri = 'http://data.open.ac.uk/meta/ontology/ahproject/'
project_uri = ahproject_base_uri + 'project/' + project_id
project_uri

'http://data.open.ac.uk/meta/ontology/ahproject/project/EB8275BC-FF3C-4A89-8A50-043C60520F29'
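
The same pattern (base URI, entity kind, identifier) recurs for organisations, people and publications later in this notebook, so it could be wrapped in a small helper. The sketch below is one possible shape: the function name `make_uri` and the percent-encoding of the identifier are my own assumptions, not part of the original workflow.

```python
from urllib.parse import quote

AHPROJECT_BASE_URI = 'http://data.open.ac.uk/meta/ontology/ahproject/'

def make_uri(kind, identifier, fragment=None):
    """Return base + kind + '/' + identifier, with an optional '#fragment'."""
    # quote() leaves letters, digits and '-' untouched, so GUID-style ids
    # pass through unchanged; anything unusual is percent-encoded.
    uri = AHPROJECT_BASE_URI + kind + '/' + quote(identifier, safe='')
    if fragment:
        uri += '#' + fragment
    return uri

print(make_uri('project', 'EB8275BC-FF3C-4A89-8A50-043C60520F29'))
```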

###Terms directly linked to Project

####Title

Identify the element for the project's title (projectOverview.projectComposition.project.title), assign it to the variable project_title and print the value:

In [460]:
project_title = data['projectOverview']['projectComposition']['project']['title']
project_title

'Dropping-Out of Socialism: Alternative Life-Styles in the Socialist Bloc 1960-1990'

####Status

Identify the element for the project's status (projectOverview.projectComposition.project.status), assign it to the variable project_status and print the value:

In [461]:
project_status = data['projectOverview']['projectComposition']['project']['status']
project_status

'Closed'

####Abstract

Identify the element for the project's abstract (projectOverview.projectComposition.project.abstractText), assign it to the variable project_abstract and print the value:

In [462]:
project_abstract = data['projectOverview']['projectComposition']['project']['abstractText']
project_abstract

"The project 'Dropping out of Socialism' is a study of the hidden side of life in the former Soviet bloc. It investigates those cultures and life-styles under socialism which literally 'dropped out' of the picture - both out of the picture that was portrayed by the Eastern European communist regimes to its own public and the West and the picture that has since been created by historians and political scientists looking at the last decades of communism in Europe. 'Dropping out of Socialism' is devoted to the historical documentation and interpretation of social phenomena on the margins of socialist society: beatniks, hippies, punks, trampers, new-agers, hobos and any other group of people who had decided to ignore rather than comply with the official socialist code of behaviour and participation.\\n\\nMost people express surprise when told that such cultures existed at all under the repressive conditions of the Eastern European communist regimes. Indeed, even in Eastern Europe itself li

####URL

Identify the element for the project's URL (projectOverview.projectComposition.project.url), assign it to the variable project_url and print the value:

In [463]:
project_url = data['projectOverview']['projectComposition']['project']['url']
project_url

'http://gtr.rcuk.ac.uk:80/projects?ref=AH%2FI002502%2F1'

####Potential impact

Identify the element for the project's potential impact (projectOverview.projectComposition.project.potentialImpactText), assign it to the variable project_potential_impact and print the value:

In [464]:
project_potential_impact = data['projectOverview']['projectComposition']['project']['potentialImpactText']
project_potential_impact

"Youth policy makers\\nThis project is all about dealing with societal drop-outs and their culture. Most of our subjects will have been young or youngish when they decided to leave mainstream society and culture. Their study and evaluation necessarily involves understanding their motivations and actions. While the project examines socialist societies, which demanded a large degree of conformity, and explores a time that is thirty years in the past, many of its findings will be of value to modern-day policy makers and youth specialists. Generational conflict, rebellion against mainstream culture, rejection of commercialism, the search for higher meaning in communitarian movements, itinerant life-styles etc. still provide the hallmarks of many non-conformist youth cultures today.\\n\\nTeachers and Youth Workers\\nA project whose primary goal is not to 'solve' the youth problem, but understand young drop-outs and alternatives can have great educational and socially integrative benefits. L

###Subject and Topic Keywords

Define the list variables 'research_topics' and 'research_subjects':

In [465]:
research_topics = data['projectOverview']['projectComposition']['project']['researchTopic']
research_topics

[{'id': '78037', 'percentage': 0, 'text': 'Cultural History'}]

In [466]:
research_subjects = data['projectOverview']['projectComposition']['project']['researchSubject']
research_subjects


[{'id': '127553', 'percentage': 0, 'text': 'History'}]

We can then iterate through these lists in a for loop and convert the results to RDF, as shown in section 2, below.
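
Before any RDF is involved, the iteration itself can be sketched in plain Python. The example below pairs each keyword's URI with its label; the helper name `keyword_uris` is hypothetical, and plain dictionaries stand in for the safeJSON objects.

```python
ahproject_base_uri = 'http://data.open.ac.uk/meta/ontology/ahproject/'

# Values copied from the outputs above.
research_topics = [{'id': '78037', 'percentage': 0, 'text': 'Cultural History'}]
research_subjects = [{'id': '127553', 'percentage': 0, 'text': 'History'}]

def keyword_uris(keywords, kind):
    """Yield (uri, label) for each keyword that has an id."""
    for keyword in keywords:
        keyword_id = keyword.get('id')
        if keyword_id is None:
            continue  # without an id no URI can be built
        yield ahproject_base_uri + kind + '/' + keyword_id, keyword.get('text')

for uri, label in keyword_uris(research_topics, 'topic'):
    print(uri, label)
for uri, label in keyword_uris(research_subjects, 'subject'):
    print(uri, label)
```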

###Fund and related terms

####Fund

Construct a URI for 'fund':

In [467]:
fund_uri = project_uri + '#fund'
fund_uri

'http://data.open.ac.uk/meta/ontology/ahproject/project/EB8275BC-FF3C-4A89-8A50-043C60520F29#fund'

Identify the element for the fund type (projectOverview.projectComposition.project.fund.type), assign it to the variable fund_type and print the value:

In [468]:
fund_type = data['projectOverview']['projectComposition']['project']['fund']['type']
fund_type

'INCOME_ACTUAL'

####Grant reference

Identify the element for the project's grant reference (projectOverview.projectComposition.project.grantReference), assign it to the variable grant_reference and print the value:

In [469]:
grant_reference = data['projectOverview']['projectComposition']['project']['grantReference']
grant_reference

'AH/I002502/1'

####Grant category

Identify the element for the project's grant category (projectOverview.projectComposition.project.grantCategory), assign it to the variable grant_category and print the value:

In [470]:
grant_category = data['projectOverview']['projectComposition']['project']['grantCategory']
grant_category

'Research Grant'

####Start date

Identify the element for the project's start date (projectOverview.projectComposition.project.fund.start), assign it to the variable fund_start and print the value:

In [471]:
fund_start = data['projectOverview']['projectComposition']['project']['fund']['start']
fund_start

'2011-07-01'

In [472]:
type(fund_start)

str

Convert from string to datetime object:

In [473]:
from datetime import datetime
fund_start_datetime = datetime.strptime(fund_start,'%Y-%m-%d')
fund_start_datetime

datetime.datetime(2011, 7, 1, 0, 0)

####End date

Identify the element for the project's end date (projectOverview.projectComposition.project.fund.end), assign it to the variable fund_end and print the value:

In [474]:
fund_end = data['projectOverview']['projectComposition']['project']['fund']['end']
fund_end


'2015-01-31'

In [475]:
type(fund_end)

str

Convert from string to datetime object:

In [476]:
from datetime import datetime
fund_end_datetime = datetime.strptime(fund_end,'%Y-%m-%d')
fund_end_datetime

datetime.datetime(2015, 1, 31, 0, 0)
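
Because the start and end dates are parsed in the same way, the conversion could be shared. The sketch below is an assumption-laden variant rather than the method used above: the helper name `parse_iso_date` is hypothetical, it returns a date (not a datetime) because the source strings carry no time component, and it returns None for missing or malformed input.

```python
from datetime import date, datetime

def parse_iso_date(value):
    """Parse 'YYYY-MM-DD' into a date; return None for missing or malformed input."""
    if not isinstance(value, str):
        return None  # covers None and any non-string placeholder
    try:
        return datetime.strptime(value, '%Y-%m-%d').date()
    except ValueError:
        return None

fund_start_date = parse_iso_date('2011-07-01')
fund_end_date = parse_iso_date('2015-01-31')
print(fund_start_date, fund_end_date)
# Length of the funding period in days.
print((fund_end_date - fund_start_date).days)
```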

###Funder and related terms

####Funder

To construct a URI for 'funder', first identify the element for the project's funder ID (projectOverview.projectComposition.project.fund.funder.id), assign it to the variable funder_id and print the value, then append the ID to the base URI:

In [477]:
funder_id = data['projectOverview']['projectComposition']['project']['fund']['funder']['id']
funder_id

'1291772D-DFCE-493A-AEE7-24F7EEAFE0E9'

In [478]:
funder_uri = ahproject_base_uri + 'organisation/' + funder_id
funder_uri

'http://data.open.ac.uk/meta/ontology/ahproject/organisation/1291772D-DFCE-493A-AEE7-24F7EEAFE0E9'

####Funder name

Identify the element for the project's funder name (projectOverview.projectComposition.project.fund.funder.name), assign it to the variable funder_name and print the value:

In [479]:
funder_name = data['projectOverview']['projectComposition']['project']['fund']['funder']['name']
funder_name

'AHRC'

####Funder URL

Identify the element for the project's funder URL (projectOverview.projectComposition.project.fund.funder.url), assign it to the variable funder_url and print the value:

In [480]:
funder_url = data['projectOverview']['projectComposition']['project']['fund']['funder']['url']
funder_url

'http://gtr.rcuk.ac.uk:80/organisation/1291772D-DFCE-493A-AEE7-24F7EEAFE0E9'

###Lead Research Organisation and related terms

####Lead Research Organisation

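Construct a URI for 'Lead Research Organisation' by identifying the organisation's ID (projectOverview.projectComposition.leadResearchOrganisation.id) and appending it to the base URI: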

In [481]:
lead_research_org_id = data['projectOverview']['projectComposition']['leadResearchOrganisation']['id']
lead_research_org_id

'FD94FDDE-5BBC-4E12-911F-0B70DBCFA743'

In [482]:
lead_research_org_uri = ahproject_base_uri + 'organisation/' + lead_research_org_id
lead_research_org_uri

'http://data.open.ac.uk/meta/ontology/ahproject/organisation/FD94FDDE-5BBC-4E12-911F-0B70DBCFA743'

####Lead Research Organisation Name

Identify the element for the lead research organisation's name (projectOverview.projectComposition.leadResearchOrganisation.name), assign it to the variable lead_research_org_name and print the value:

In [483]:
lead_research_org_name = data['projectOverview']['projectComposition']['leadResearchOrganisation']['name']
lead_research_org_name

'University of Bristol'

####Department

Identify the element for the department (projectOverview.projectComposition.leadResearchOrganisation.department), assign it to the variable lead_research_org_dept and print the value:

In [484]:
lead_research_org_dept = data['projectOverview']['projectComposition']['leadResearchOrganisation']['department']
lead_research_org_dept

'School of Humanities'

####Lead Research Organisation Type

Identify the element for the type of lead research organisation (projectOverview.projectComposition.leadResearchOrganisation.typeInd), assign it to the variable lead_research_org_type and print the value:

In [485]:
lead_research_org_type = data['projectOverview']['projectComposition']['leadResearchOrganisation']['typeInd']
lead_research_org_type

'RO'

####Lead Research Organisation URL

Identify the element for the lead research organisation's URL (projectOverview.projectComposition.leadResearchOrganisation.url), assign it to the variable lead_research_org_url and print the value:

In [486]:
lead_research_org_url = data['projectOverview']['projectComposition']['leadResearchOrganisation']['url']
lead_research_org_url

'http://gtr.rcuk.ac.uk:80/organisation/FD94FDDE-5BBC-4E12-911F-0B70DBCFA743'

####Lead Research Organisation Address

Construct the URI for the lead research organisation's address:

In [487]:
lead_research_org_address_uri = lead_research_org_uri + '#address'
lead_research_org_address_uri

'http://data.open.ac.uk/meta/ontology/ahproject/organisation/FD94FDDE-5BBC-4E12-911F-0B70DBCFA743#address'

#####Address Lines 1-5

Identify the elements for lines 1-5 of the lead research organisation's address (projectOverview.projectComposition.leadResearchOrganisation.address.line1 to .line5), assign them to the variables lead_research_org_address_line1 to lead_research_org_address_line5 and print the values:

In [488]:
lead_research_org_address_line1 = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['line1']
lead_research_org_address_line2 = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['line2']
lead_research_org_address_line3 = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['line3']
lead_research_org_address_line4 = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['line4']
lead_research_org_address_line5 = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['line5']

print('Line 1: %s' % lead_research_org_address_line1)
print('Line 2: %s' % lead_research_org_address_line2)
print('Line 3: %s' % lead_research_org_address_line3)
print('Line 4: %s' % lead_research_org_address_line4)
print('Line 5: %s' % lead_research_org_address_line5)

Line 1: University of Bristol
Senate House
Tyndall Avenue
Line 2: SafeNone
Line 3: SafeNone
Line 4: SafeNone
Line 5: SafeNone


In [489]:
type(lead_research_org_address_line2)

safeJSON.SafeNoneClass

Construct a string that concatenates all existing values:

In [490]:
lead_research_org_address_lines = ''
if (type(lead_research_org_address_line1) != safeJSON.SafeNoneClass):
    lead_research_org_address_lines = lead_research_org_address_lines + lead_research_org_address_line1
    if ((type(lead_research_org_address_line2) != safeJSON.SafeNoneClass) or (type(lead_research_org_address_line3) != safeJSON.SafeNoneClass) or (type(lead_research_org_address_line4) != safeJSON.SafeNoneClass) or (type(lead_research_org_address_line5) != safeJSON.SafeNoneClass)):
        lead_research_org_address_lines = lead_research_org_address_lines + ', '
if (type(lead_research_org_address_line2) != safeJSON.SafeNoneClass):
    lead_research_org_address_lines = lead_research_org_address_lines + lead_research_org_address_line2
    if ((type(lead_research_org_address_line3) != safeJSON.SafeNoneClass) or (type(lead_research_org_address_line4) != safeJSON.SafeNoneClass) or (type(lead_research_org_address_line5) != safeJSON.SafeNoneClass)):
        lead_research_org_address_lines = lead_research_org_address_lines + ', '
if (type(lead_research_org_address_line3) != safeJSON.SafeNoneClass):
    lead_research_org_address_lines = lead_research_org_address_lines + lead_research_org_address_line3
    if ((type(lead_research_org_address_line4) != safeJSON.SafeNoneClass) or (type(lead_research_org_address_line5) != safeJSON.SafeNoneClass)):
        lead_research_org_address_lines = lead_research_org_address_lines + ', '
if (type(lead_research_org_address_line4) != safeJSON.SafeNoneClass):
    lead_research_org_address_lines = lead_research_org_address_lines + lead_research_org_address_line4
    if (type(lead_research_org_address_line5) != safeJSON.SafeNoneClass):
        lead_research_org_address_lines = lead_research_org_address_lines + ', '
if (type(lead_research_org_address_line5) != safeJSON.SafeNoneClass):
    lead_research_org_address_lines = lead_research_org_address_lines + lead_research_org_address_line5
lead_research_org_address_lines

'University of Bristol\r\nSenate House\r\nTyndall Avenue'

In [491]:
lead_research_org_address_lines = lead_research_org_address_lines.replace("\r\n", ", ")
lead_research_org_address_lines

'University of Bristol, Senate House, Tyndall Avenue'
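
The two steps above (joining the lines that exist, then replacing embedded line breaks) could also be written more compactly. The sketch below is an alternative under stated assumptions: `MISSING` is a hypothetical stand-in for SafeNone so that the example runs without safeJSON, and `join_address` is an illustrative name.

```python
class _Missing:
    """Stand-in for safeJSON's SafeNone (hypothetical, for this sketch only)."""
    def __repr__(self):
        return 'SafeNone'

MISSING = _Missing()

# Line 1 holds the whole address with embedded CRLFs; lines 2-5 are missing.
address_lines = [
    'University of Bristol\r\nSenate House\r\nTyndall Avenue',
    MISSING, MISSING, MISSING, MISSING,
]

def join_address(lines):
    """Join the non-empty string entries with ', ' and flatten CRLF breaks."""
    present = [line for line in lines if isinstance(line, str) and line]
    return ', '.join(present).replace('\r\n', ', ')

print(join_address(address_lines))
```

Filtering first and joining afterwards removes the need to look ahead at the remaining lines before adding each separator.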

#####Postcode

Identify the element for the lead research organisation's postcode (projectOverview.projectComposition.leadResearchOrganisation.address.postCode), assign it to the variable lead_research_org_postcode and print the value:

In [492]:
lead_research_org_postcode = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['postCode']
lead_research_org_postcode

'BS8 1TH'

#####Region

Identify the element for the lead research organisation's region (projectOverview.projectComposition.leadResearchOrganisation.address.region), assign it to the variable lead_research_org_region and print the value:

In [493]:
lead_research_org_region = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['region']
lead_research_org_region

'South West'

#####Country

Identify the element for the lead research organisation's country (projectOverview.projectComposition.leadResearchOrganisation.address.country), assign it to the variable lead_research_org_country and print the value:

In [494]:
lead_research_org_country = data['projectOverview']['projectComposition']['leadResearchOrganisation']['address']['country']
lead_research_org_country

SafeNone

###Person and related terms

Define the 'people' object:

In [495]:
people = data['projectOverview']['projectComposition']['personRole']
people

[{'firstName': 'Josie',
  'id': '7729A3BA-8B99-4DBA-A577-FF76802F2226',
  'role': [{'name': 'CO_INVESTIGATOR'}],
  'surname': 'McLellan',
  'url': 'http://gtr.rcuk.ac.uk:80/person/7729A3BA-8B99-4DBA-A577-FF76802F2226'},
 {'firstName': 'Juliane Christiane Angelika',
  'id': '2EE07735-86D6-485D-B530-DB03D951E316',
  'role': [{'name': 'PRINCIPAL_INVESTIGATOR'}],
  'surname': 'Furst',
  'url': 'http://gtr.rcuk.ac.uk:80/person/2EE07735-86D6-485D-B530-DB03D951E316'}]

In [496]:
for person in people:
    person_role = person['role']
person_role

[{'name': 'PRINCIPAL_INVESTIGATOR'}]

In [497]:
for role in person_role:
    person_role_name = role['name']
person_role_name

'PRINCIPAL_INVESTIGATOR'

We can then iterate through this object in a for loop and convert the results to RDF, as shown in section 2, below.
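
Note that the two exploratory loops above keep only the last value they see. When every person and role is needed, the loops have to be nested. A minimal sketch, using plain dictionaries copied from the 'people' output (url fields omitted for brevity) and a hypothetical helper name `person_roles`:

```python
people = [
    {'firstName': 'Josie',
     'id': '7729A3BA-8B99-4DBA-A577-FF76802F2226',
     'role': [{'name': 'CO_INVESTIGATOR'}],
     'surname': 'McLellan'},
    {'firstName': 'Juliane Christiane Angelika',
     'id': '2EE07735-86D6-485D-B530-DB03D951E316',
     'role': [{'name': 'PRINCIPAL_INVESTIGATOR'}],
     'surname': 'Furst'},
]

def person_roles(people):
    """Return a list of (person_id, role_name) pairs, one per role held."""
    pairs = []
    for person in people:
        # A person may hold several roles, so 'role' is itself a list.
        for role in person.get('role', []):
            pairs.append((person['id'], role['name']))
    return pairs

for person_id, role_name in person_roles(people):
    print(person_id, role_name)
```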

###Publication and related terms

Define the 'publications' object:

In [498]:
publications = data['projectOverview']['projectComposition']['project']['publication']
publications

[{'author': [{'id': '6e2b4917d74658949c7636ab43a87b39',
    'otherNames': 'McLellan J',
    'url': 'http://gtr.rcuk.ac.uk:80/person/6e2b4917d74658949c7636ab43a87b39'}],
  'date': '2012-01-01',
  'firstAuthorName': 'McLellan J',
  'id': 'B4EDCF26-08D9-4192-9625-2C52633EA845',
  'parentPublicationTitle': 'History Workshop Journal',
  'title': 'Glad to be Gay Behind the Wall: Gay and Lesbian Activism in 1970s East Germany',
  'url': 'http://gtr.rcuk.ac.uk:80/publication/B4EDCF26-08D9-4192-9625-2C52633EA845'},
 {'author': [{'id': '7135b2b3b18c98e2306058bde50ce08d',
    'otherNames': 'Fürst J.',
    'url': 'http://gtr.rcuk.ac.uk:80/person/7135b2b3b18c98e2306058bde50ce08d'}],
  'date': '2014-01-01',
  'firstAuthorName': 'Fürst J.',
  'id': 'D687F1A6-ED35-4F61-B976-1566BB5270E8',
  'parentPublicationTitle': 'Contemporary European History',
  'title': "Love, Peace and Rock'n Roll on Gorky Street: The 'Emotional Style' of the Soviet Hippie Community",
  'url': 'http://gtr.rcuk.ac.uk:80/publicat

In [499]:
type(publications)

safeJSON.SafeList

Where a project has no publications, publications is an empty list ([]); its type is still safeJSON.SafeList.

We can then iterate through this object in a for loop and convert the results to RDF, as shown in section 2, below.

###Collaborating Organisations and related terms

Define the 'collaborators' object:

In [500]:
collaborators = data['projectOverview']['projectComposition']['collaborator']
collaborators

[{'id': 'E6BE1E2D-6D42-47FB-AB0A-A2FC1914A896',
  'name': 'Sciences Po',
  'url': 'http://gtr.rcuk.ac.uk:80/organisation/E6BE1E2D-6D42-47FB-AB0A-A2FC1914A896'},
 {'address': {'country': 'United Kingdom',
   'line1': 'University of Exeter',
   'line2': 'Clydesdale House',
   'line3': 'Clydesdale Road',
   'line4': 'Exeter',
   'postCode': 'EX4 4QX',
   'region': 'South West'},
  'id': '961756BF-E31F-4A13-836F-0A09BA02385C',
  'name': 'University of Exeter',
  'url': 'http://gtr.rcuk.ac.uk:80/organisation/961756BF-E31F-4A13-836F-0A09BA02385C'}]

We can then iterate through this object in a for loop and convert the results to RDF, as shown in section 2, below.
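
As the output above shows, a collaborator may or may not carry an address, so any loop over collaborators has to tolerate its absence. The sketch below uses plain dictionaries copied from that output (url fields omitted); the helper name `address_summary` is hypothetical, and `.get` stands in for the SafeNone behaviour.

```python
collaborators = [
    {'id': 'E6BE1E2D-6D42-47FB-AB0A-A2FC1914A896',
     'name': 'Sciences Po'},
    {'address': {'country': 'United Kingdom',
                 'line1': 'University of Exeter',
                 'line2': 'Clydesdale House',
                 'line3': 'Clydesdale Road',
                 'line4': 'Exeter',
                 'postCode': 'EX4 4QX',
                 'region': 'South West'},
     'id': '961756BF-E31F-4A13-836F-0A09BA02385C',
     'name': 'University of Exeter'},
]

def address_summary(org):
    """Return the address lines, postcode and country that exist, comma-separated."""
    address = org.get('address') or {}
    parts = [address.get('line%d' % n) for n in range(1, 6)]
    parts += [address.get('postCode'), address.get('country')]
    return ', '.join(part for part in parts if part)

for org in collaborators:
    print(org['name'], '|', address_summary(org))
```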

###Outputs

Define the 'outputs' object:

In [508]:
outputs = data['projectOverview']['projectComposition']['project']['output']
outputs

{'artisticAndCreativeProductOutput': [{'description': "The Estonian film maker Terje Toomistu became interested in the former Soviet hippies in Tallinn. In the course of research she met the PI, Juliane F&uuml;rst, who became a consultant and co-producer on her documentary film of the Soviet hippie community. The documentary was filmed as a part road movie, part animation, part archival footage in 2015 and relies in its historical narrative heavily on F&uuml;rst's work carried out under the aegis of the project 'Dropping out of Socialism&quot;. It is due to be completed later this year.",
   'id': '5F946473-52E8-4A06-A51A-056139D74EF6',
   'impact': 'The film has already incurred great interest in the film-making community and been taken on by the television station Arte for future dissemination. The crowd-funding campaign as well as publicity for the film online has already raised awareness of the topic among an interested audience worldwide.',
   'title': "Documentary 'Soviet Hippies

In [509]:
research_database_model = data['projectOverview']['projectComposition']['project']['output']['researchDatabaseAndModelOutput']
research_database_model

[{'description': "This data base contains a large collection of photos relating to the research project 'Dropping out of Socialism', which have been acquired and scanned by the Wende Museum. The database is designed to be open to further additions and will support the Wende Museum LA in its educational work.",
  'id': '9F1EE7D6-E5EB-4486-95CE-422CE6A20EDC',
  'impact': 'This is the first time that a large number of hitherto inaccessible and unknown photographic material of little known aspects of Soviet life will be made available to the wider public and in an easily navigable format with English titles and explanations.',
  'title': "Digital Photo Archive 'Dropping out of Socialism'",
  'type': 'Database/Collection of data'}]

In [521]:
for output_category in outputs:
    print(outputs[output_category])
    print(type(outputs[output_category]))
    #output_id = outputs[output_category]['id']
    #if (type(output_id) != safeJSON.SafeNoneClass):
        #print(output_id)

[]
<class 'safeJSON.SafeList'>
[]
<class 'safeJSON.SafeList'>
[{'title': "Digital Photo Archive 'Dropping out of Socialism'", 'description': "This data base contains a large collection of photos relating to the research project 'Dropping out of Socialism', which have been acquired and scanned by the Wende Museum. The database is designed to be open to further additions and will support the Wende Museum LA in its educational work.", 'type': 'Database/Collection of data', 'impact': 'This is the first time that a large number of hitherto inaccessible and unknown photographic material of little known aspects of Soviet life will be made available to the wider public and in an easily navigable format with English titles and explanations.', 'id': '9F1EE7D6-E5EB-4486-95CE-422CE6A20EDC'}]
<class 'safeJSON.SafeList'>
[]
<class 'safeJSON.SafeList'>
[{'title': "Documentary 'Soviet Hippies' by Terje Toomistu", 'description': "The Estonian film maker Terje Toomistu became interested in the former So

As the above shows, the JSON syntax for keyFindingsOutput renders it as a dictionary rather than a list, presumably because only one key findings output is permitted. This means it needs to be treated differently from the other output categories.

In [542]:
for output_category in outputs:
    if (output_category != 'keyFindingsOutput'):
        output_list = outputs[output_category]
        for output in output_list:
            output_class = output_category[0].upper() + output_category[1:]
            output_class_uri_string = ahproject_base_uri + output_class
            output_id = output['id']
            output_uri_string = ahproject_base_uri + 'output/' + output_id
            output_description = output['description']
            output_title = output['title']
            output_impact = output['impact']
            output_url = output['url']
            output_type = output['type']
            output_sector = output['sector']
            output_geographic_reach = output['geographicReach']
            output_year_first_provided = output['yearFirstProvided']
            if (type(output_id) != safeJSON.SafeNoneClass):
                print(output_title)

Digital Photo Archive 'Dropping out of Socialism'
Documentary 'Soviet Hippies' by Terje Toomistu
SafeNone
SafeNone
SafeNone
SafeNone


We can then iterate through these objects in a for loop and convert the results to RDF, as shown in section 2, below.
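
Instead of skipping keyFindingsOutput, one alternative is to normalise every category to a list before looping. The sketch below assumes, as noted above, that keyFindingsOutput is a single dictionary while the other categories are lists. The helper name `as_list` is hypothetical, and the sample data is a heavily shortened stand-in whose keyFindingsOutput content is a placeholder.

```python
def as_list(value):
    """Wrap a single dict in a list; pass lists through; treat anything else as empty."""
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list):
        return value
    return []

# Shortened stand-in for the 'outputs' object.
outputs = {
    'artisticAndCreativeProductOutput': [
        {'id': '5F946473-52E8-4A06-A51A-056139D74EF6'}],
    'researchDatabaseAndModelOutput': [
        {'id': '9F1EE7D6-E5EB-4486-95CE-422CE6A20EDC'}],
    'keyFindingsOutput': {'description': 'placeholder text'},
}

for output_category, value in outputs.items():
    # Same class-name rule as the cell above: capitalise the first letter.
    output_class = output_category[0].upper() + output_category[1:]
    for output in as_list(value):
        print(output_class, output.get('id'))
```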

##2. Use RDFLib to construct relationships between variables from the JSON file and external ontologies

In [502]:
import rdflib
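
The conversion cell that follows guards nearly every triple with the same SafeNone check. That repetition could be folded into one helper. The sketch below shows only the shape of the idea and runs without rdflib or safeJSON: `RecordingGraph`, `MISSING` and `add_if_present` are all hypothetical names, with the first two standing in for rdflib's Graph and safeJSON's SafeNone.

```python
class RecordingGraph:
    """Hypothetical stand-in for rdflib.Graph that only records add() calls."""
    def __init__(self):
        self.triples = []

    def add(self, triple):
        self.triples.append(triple)

MISSING = None  # stand-in for safeJSON's SafeNone in this sketch

def add_if_present(graph, subject, predicate, value, wrap=str):
    """Add (subject, predicate, wrap(value)) unless value is missing."""
    if value is MISSING:
        return False
    graph.add((subject, predicate, wrap(value)))
    return True

g = RecordingGraph()
add_if_present(g, 'project', 'status', 'Closed')    # recorded
add_if_present(g, 'address', 'country', MISSING)    # skipped
print(g.triples)
```

In the notebook itself, the missing-value test would presumably be the `type(value) != safeJSON.SafeNoneClass` check used throughout, and `wrap` a function that builds the typed Literal.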

In [549]:
from rdflib import Graph, Literal, BNode, Namespace, RDF, URIRef, XSD
from rdflib.namespace import DC, FOAF, SKOS

#Define namespaces
ahproject = Namespace('http://data.open.ac.uk/meta/ontology/ahproject#')
dataopen = Namespace('http://data.open.ac.uk/meta/ontology/')
doap = Namespace('http://usefulinc.com/ns/doap#')
fabio = Namespace('http://purl.org/spar/fabio/') 
frapo = Namespace('http://purl.org/cerif/frapo/') 
gr = Namespace('http://purl.org/goodrelations/v1#')
org = Namespace('http://www.w3.org/ns/org#')
prism = Namespace('http://prismstandard.org/namespaces/basic/2.0/')  
projectfunding = Namespace("http://vocab.ox.ac.uk/projectfunding#")
vcard = Namespace('http://www.w3.org/2006/vcard/ns#')
vivo = Namespace('http://vivoweb.org/ontology/core#')

g = Graph()
project = URIRef(project_uri)
fund = URIRef(fund_uri)
funder = URIRef(funder_uri)
lead_research_org = URIRef(lead_research_org_uri)
lead_research_org_address = URIRef(lead_research_org_address_uri)

# Add triples using store's add method.

#Terms directly linked to Project
g.add( (project, RDF.type, projectfunding.Project ))
if (type(project_url) != safeJSON.SafeNoneClass):
    g.add( (project, FOAF.homepage, Literal(project_url,datatype=XSD.string) ))
if (type(project_status) != safeJSON.SafeNoneClass):
    g.add( (project, dataopen.status, Literal(project_status,datatype=XSD.string) ))
if (type(project_title) != safeJSON.SafeNoneClass):
    g.add( (project, DC.title, Literal(project_title,datatype=XSD.string) ))
if (type(project_abstract) != safeJSON.SafeNoneClass):
    #abstract is a DCMI Terms property, not one of the 15 DC elements
    g.add( (project, rdflib.namespace.DCTERMS.abstract, Literal(project_abstract,datatype=XSD.string) ))
if (type(project_potential_impact) != safeJSON.SafeNoneClass):
    g.add( (project, ahproject.potentialImpact, Literal(project_potential_impact,datatype=XSD.string) ))

#Subject and Topic keywords
for research_topic in research_topics:
    research_topic_id = research_topic['id']
    research_topic_text = research_topic['text']
    if (type(research_topic_id) != safeJSON.SafeNoneClass):
        research_topic_uri_string = ahproject_base_uri + 'topic/' + research_topic_id
        research_topic_uri = URIRef(research_topic_uri_string)
        g.add( (research_topic_uri, RDF.type, SKOS.Concept ))
        g.add( (project, DC.subject, research_topic_uri ))
        if (type(research_topic_text) != safeJSON.SafeNoneClass):
            g.add( (research_topic_uri, SKOS.prefLabel, Literal(research_topic_text,datatype=XSD.string) ))

for research_subject in research_subjects:
    research_subject_id = research_subject['id']
    research_subject_text = research_subject['text']
    if (type(research_subject_id) != safeJSON.SafeNoneClass):
        research_subject_uri_string = ahproject_base_uri + 'subject/' + research_subject_id
        research_subject_uri = URIRef(research_subject_uri_string)  
        g.add( (research_subject_uri, RDF.type, SKOS.Concept ))
        g.add( (project, DC.subject, research_subject_uri ))
        if (type(research_subject_text) != safeJSON.SafeNoneClass):
            g.add( (research_subject_uri, SKOS.prefLabel, Literal(research_subject_text,datatype=XSD.string) ))

#Fund and related terms
g.add( (fund, RDF.type, projectfunding.Funding ))
g.add( (fund, projectfunding.funds, project ))
g.add( (fund, projectfunding.grantNumber, Literal(grant_reference,datatype=XSD.string) ))
if (type(fund_type) != safeJSON.SafeNoneClass):
    g.add( (fund, gr.category, Literal(fund_type,datatype=XSD.string) ))
if (type(grant_category) != safeJSON.SafeNoneClass):
    g.add( (fund, doap.category, Literal(grant_category,datatype=XSD.string) ))
g.add( (fund, projectfunding.startDate, Literal(fund_start_datetime,datatype=XSD.dateTime) ))
g.add( (fund, projectfunding.endDate, Literal(fund_end_datetime,datatype=XSD.dateTime) ))

#Funder and related terms
g.add( (funder, RDF.type, projectfunding.FundingBody ))
g.add( (funder, projectfunding.provides, fund ))
if (type(funder_name) != safeJSON.SafeNoneClass):
    g.add( (funder, vcard.hasOrganizationName, Literal(funder_name,datatype=XSD.string) ))
if (type(funder_url) != safeJSON.SafeNoneClass):
    g.add( (funder, FOAF.homepage, Literal(funder_url,datatype=XSD.string) ))

#Lead Research Organisation and related terms
g.add( (lead_research_org, RDF.type, org.Organization ))
g.add( (lead_research_org, org.HeadOf, project ))
if (type(lead_research_org_name) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org, vcard.hasOrganizationName, Literal(lead_research_org_name,datatype=XSD.string) ))
if (type(lead_research_org_dept) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org, vcard.hasOrganizationUnit, Literal(lead_research_org_dept,datatype=XSD.string) ))
if (type(lead_research_org_type) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org, org.classification, Literal(lead_research_org_type,datatype=XSD.string) ))
if (type(lead_research_org_url) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org, FOAF.homepage, Literal(lead_research_org_url,datatype=XSD.string) ))
g.add( (lead_research_org, org.siteAddress, lead_research_org_address ))
g.add( (lead_research_org_address, RDF.type, vcard.Address ))
if lead_research_org_address_lines:  #always a str at this point, so test for non-empty
    g.add( (lead_research_org_address, frapo.hasPostalAddressLine, Literal(lead_research_org_address_lines,datatype=XSD.string) ))
if (type(lead_research_org_postcode) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org_address, vcard.hasPostalCode, Literal(lead_research_org_postcode,datatype=XSD.string) ))
if (type(lead_research_org_region) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org_address, vcard.region, Literal(lead_research_org_region,datatype=XSD.string) ))
if (type(lead_research_org_country) != safeJSON.SafeNoneClass):
    g.add( (lead_research_org_address, vcard.hasCountryName, Literal(lead_research_org_country,datatype=XSD.string) ))

#People
for person in people:
    person_id = person['id']
    person_firstname = person['firstName']
    person_surname = person['surname']
    person_url = person['url']
    person_role = person['role']
    if (type(person_id) != safeJSON.SafeNoneClass):
        person_uri_string = ahproject_base_uri + 'person/' + person_id
        person_uri = URIRef(person_uri_string)
        for role in person_role:
            person_role_name = role['name']
            if (type(person_role_name) != safeJSON.SafeNoneClass):
                if (person_role_name == 'PRINCIPAL_INVESTIGATOR'):
                    g.add( (project, projectfunding.hasPrincipalInvestigator, person_uri ))
                else:
                    g.add( (project, projectfunding.hasCoInvestigator, person_uri ))
        g.add( (person_uri, RDF.type, FOAF.Person ))
        if (type(person_firstname) != safeJSON.SafeNoneClass):
            g.add( (person_uri, FOAF.givenName, Literal(person_firstname,datatype=XSD.string) ))
        if (type(person_surname) != safeJSON.SafeNoneClass):
            g.add( (person_uri, FOAF.familyName, Literal(person_surname,datatype=XSD.string) ))
        if (type(person_url) != safeJSON.SafeNoneClass):
            g.add( (person_uri, FOAF.homepage, Literal(person_url,datatype=XSD.string) ))

#Publications
for publication in publications:
    publication_id = publication['id']
    publication_title = publication['title']
    publication_url = publication['url']
    publication_parent = publication['parentPublicationTitle']
    publication_isbn = publication['isbn']
    publication_date_str = publication['date']
    if (type(publication_id) != safeJSON.SafeNoneClass):
        publication_uri_string = ahproject_base_uri + 'publication/' + publication_id
        publication_uri = URIRef(publication_uri_string)
        g.add( (project, frapo.hasOutput, publication_uri ))
        g.add( (publication_uri, RDF.type, DC.BibliographicResource ))
        if (type(publication_title) != safeJSON.SafeNoneClass):
            g.add( (publication_uri, DC.title, Literal(publication_title,datatype=XSD.string) ))
        if (type(publication_url) != safeJSON.SafeNoneClass):
            g.add( (publication_uri, fabio.hasURL, Literal(publication_url,datatype=XSD.string) ))
        if (type(publication_parent) != safeJSON.SafeNoneClass):
            g.add( (publication_uri, DC.isPartOf, Literal(publication_parent,datatype=XSD.string) ))
        if (type(publication_isbn) != safeJSON.SafeNoneClass):
            g.add( (publication_uri, prism.isbn, Literal(publication_isbn,datatype=XSD.string) ))
        if (type(publication_date_str) != safeJSON.SafeNoneClass):
            publication_date = datetime.strptime(publication_date_str,'%Y-%m-%d')
            g.add( (publication_uri, DC.issued, Literal(publication_date,datatype=XSD.dateTime) ))
        
#Collaborating Organisations
for collab_org in collaborators:
    collab_org_id = collab_org['id']
    collab_org_name = collab_org['name']
    collab_org_url = collab_org['url']
    collab_org_address_line1 = collab_org['address']['line1']
    collab_org_address_line2 = collab_org['address']['line2']
    collab_org_address_line3 = collab_org['address']['line3']
    collab_org_address_line4 = collab_org['address']['line4']
    collab_org_address_line5 = collab_org['address']['line5']
    collab_org_postcode = collab_org['address']['postCode']
    collab_org_region = collab_org['address']['region']
    collab_org_country = collab_org['address']['country']
    if (type(collab_org_id) != safeJSON.SafeNoneClass):
        collab_org_uri_string = ahproject_base_uri + 'organisation/' + collab_org_id
        collab_org_uri = URIRef(collab_org_uri_string)
        g.add( (project, vivo.hasCollaborator, collab_org_uri ))
        g.add( (collab_org_uri, RDF.type, org.Organization ))
        if (type(collab_org_name) != safeJSON.SafeNoneClass):
            g.add( (collab_org_uri, vcard.hasOrganizationName, Literal(collab_org_name,datatype=XSD.string) ))
        if (type(collab_org_url) != safeJSON.SafeNoneClass):
            g.add( (collab_org_uri, FOAF.homepage, Literal(collab_org_url,datatype=XSD.string) ))
        # Collect the address lines that are actually present
        collab_org_address_line_list = [line for line in (collab_org_address_line1, collab_org_address_line2, collab_org_address_line3, collab_org_address_line4, collab_org_address_line5) if type(line) != safeJSON.SafeNoneClass]
        if (collab_org_address_line_list or (type(collab_org_postcode) != safeJSON.SafeNoneClass) or (type(collab_org_region) != safeJSON.SafeNoneClass) or (type(collab_org_country) != safeJSON.SafeNoneClass)):
            collab_org_address_uri_string = collab_org_uri_string + '#address'
            collab_org_address_uri = URIRef(collab_org_address_uri_string)
            g.add( (collab_org_uri, org.siteAddress, collab_org_address_uri ))
            g.add( (collab_org_address_uri, RDF.type, vcard.Address ))
            if collab_org_address_line_list:
                # Join the present lines with ', ' and flatten embedded line breaks
                collab_org_address_lines = ', '.join(collab_org_address_line_list).replace("\r\n", ", ")
                g.add( (collab_org_address_uri, frapo.hasPostalAddressLine, Literal(collab_org_address_lines,datatype=XSD.string) ))
            if (type(collab_org_postcode) != safeJSON.SafeNoneClass):
                g.add( (collab_org_address_uri, vcard.hasPostalCode, Literal(collab_org_postcode,datatype=XSD.string) ))
            if (type(collab_org_region) != safeJSON.SafeNoneClass):
                g.add( (collab_org_address_uri, vcard.region, Literal(collab_org_region,datatype=XSD.string) ))
            if (type(collab_org_country) != safeJSON.SafeNoneClass):
                g.add( (collab_org_address_uri, vcard.hasCountryName, Literal(collab_org_country,datatype=XSD.string) ))

#Outputs
for output_category in outputs:
    if (output_category != 'keyFindingsOutput'):
        output_list = outputs[output_category]
        for output in output_list:
            output_id = output['id']
            if (type(output_id) != safeJSON.SafeNoneClass):
                output_uri_string = ahproject_base_uri + 'output/' + output_id
                output_uri = URIRef(output_uri_string)
                g.add( (project, frapo.hasOutput, output_uri ))
                output_class = output_category[0].upper() + output_category[1:]
                output_class_uri_string = ahproject_base_uri + output_class
                output_class_uri = URIRef(output_class_uri_string)
                g.add( (output_uri, RDF.type, output_class_uri ))
                output_description = output['description']
                if (type(output_description) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, DC.description, Literal(output_description,datatype=XSD.string )))
                output_title = output['title']
                if (type(output_title) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, DC.title, Literal(output_title,datatype=XSD.string )))
                output_impact = output['impact']
                if (type(output_impact) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, ahproject.Impact, Literal(output_impact,datatype=XSD.string )))
                output_url = output['url']
                if (type(output_url) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, FOAF.homepage, Literal(output_url,datatype=XSD.string )))
                output_type = output['type']
                if (type(output_type) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, DC.type, Literal(output_type,datatype=XSD.string )))
                output_sector = output['sector']
                if (type(output_sector) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, ahproject.sector, Literal(output_sector,datatype=XSD.string )))
                output_geographic_reach = output['geographicReach']
                if (type(output_geographic_reach) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, ahproject.geographicReach, Literal(output_geographic_reach,datatype=XSD.string )))
                output_year_first_provided = output['yearFirstProvided']
                if (type(output_year_first_provided) != safeJSON.SafeNoneClass):
                    g.add( (output_uri, DC.available, Literal(output_year_first_provided,datatype=XSD.integer )))

# Iterate over triples in store and print them out.
print("--- printing raw triples ---")
for s, p, o in g:
    print((s, p, o))

--- printing raw triples ---
(rdflib.term.URIRef('http://data.open.ac.uk/meta/ontology/ahproject/project/EB8275BC-FF3C-4A89-8A50-043C60520F29'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/homepage'), rdflib.term.Literal('http://gtr.rcuk.ac.uk:80/projects?ref=AH%2FI002502%2F1', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))
(rdflib.term.URIRef('http://data.open.ac.uk/meta/ontology/ahproject/project/EB8275BC-FF3C-4A89-8A50-043C60520F29'), rdflib.term.URIRef('http://purl.org/cerif/frapo/hasOutput'), rdflib.term.URIRef('http://data.open.ac.uk/meta/ontology/ahproject/publication/D687F1A6-ED35-4F61-B976-1566BB5270E8'))
(rdflib.term.URIRef('http://data.open.ac.uk/meta/ontology/ahproject/organisation/1291772D-DFCE-493A-AEE7-24F7EEAFE0E9'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/homepage'), rdflib.term.Literal('http://gtr.rcuk.ac.uk:80/organisation/1291772D-DFCE-493A-AEE7-24F7EEAFE0E9', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))

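The cell above repeats the same "only add the triple if the value exists" guard for every optional field, and builds the collaborator address by hand. A minimal sketch of how both patterns could be factored into helpers is shown here. It makes several assumptions: `MissingValue` is a hypothetical stand-in for safeJSON's `SafeNone` sentinel, a plain list of tuples stands in for the rdflib graph `g`, and the names `is_present`, `add_if_present` and `join_address_lines` are illustrative and not part of the notebook or of any library.

```python
class MissingValue:
    """Hypothetical stand-in for the sentinel safeJSON returns for absent keys."""


MISSING = MissingValue()


def is_present(value):
    """Mirror the type check used in the cell above, against the stand-in sentinel."""
    return type(value) != MissingValue


def add_if_present(triples, subject, predicate, value):
    """Append (subject, predicate, value) only when the value exists."""
    if is_present(value):
        triples.append((subject, predicate, value))


def join_address_lines(lines):
    """Join the address lines that exist with ', ' and flatten embedded line breaks."""
    present = [line for line in lines if is_present(line)]
    return ', '.join(present).replace('\r\n', ', ')


# A plain list stands in for the rdflib Graph in this sketch.
triples = []
org = 'organisation/example-id'  # illustrative identifier, not real data

add_if_present(triples, org, 'name', 'Example University')
add_if_present(triples, org, 'homepage', MISSING)  # skipped: value is missing

address = join_address_lines(
    ['Main Building', MISSING, 'Campus Road\r\nExampletown', MISSING, MISSING]
)
add_if_present(triples, org, 'addressLine', address)

for triple in triples:
    print(triple)
```

In the notebook itself the helper would call `g.add(...)` with a `Literal` rather than appending to a list. Note that `join_address_lines` returns an empty string when no line is present, so a caller would still need to check for that before adding the address triple, as the original cell does.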

Write the output to a Turtle file:

In [551]:
# Let rdflib write the file itself; this works whether serialize() returns bytes or str
g.serialize(destination="AHRCDataToRDF_V2_20161125.ttl", format="turtle")