In [1]:
import json

In [2]:
with open('nomenclature_all.json') as json_file:
    chenhall = json.load(json_file)

In [3]:
len(chenhall)

14922

Take a look at a "random" entry: the item at index 42.

In [4]:
random_item = chenhall[42]
random_item

{'@context': {'nomo': 'http://nomenclature.info/nom/ontology/',
  'skos': 'http://www.w3.org/2004/02/skos/core#',
  'skos-xl': 'http://www.w3.org/2008/05/skos-xl#',
  'dct': 'http://purl.org/dc/terms/',
  'dc': 'http://purl.org/dc/elements/1.1/',
  'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'foaf': 'http://xmlns.com/foaf/0.1/',
  'owl': 'http://www.w3.org/2002/07/owl#',
  'cs': 'http://purl.org/vocab/changeset/schema'},
 '@id': 'http://nomenclature.info/nom/14888',
 '@type': ['http://www.w3.org/2004/02/skos/core#Concept'],
 'dc:identifier': [{'@type': 'http://www.w3.org/2001/XMLSchema#string',
   '@value': '14888'}],
 'skos:prefLabel': [{'@language': 'en', '@value': 'Pinwheel'},
  {'@language': 'fr', '@value': 'Virevent'}],
 'nomo:English-Term-Contributor': [{'@language': 'en',
   '@value': 'American Association for State and Local History (AASLH)'},
  {'@language': 'fr',
   '@value': 'American Association for State and Local History (AASLH)'}],
 'nomo:French-Term-Contribu

We know that the fully formed "Chair" example has the id "http://nomenclature.info/nom/1090", so loop through the entire dataset and check whether each item has this id. Print the index of any item that matches.

In [5]:
for index, item in enumerate(chenhall):
    item_id = item['@id']
    if item_id == 'http://nomenclature.info/nom/1090':
        print(index)

13843
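
The lookup above can also be wrapped in a small helper. This is only a sketch: `find_index_by_id` and `sample_items` are hypothetical names, and the two-item list stands in for the full `chenhall` list so that the cell runs on its own. It returns the first match, or `None` if the id is not present.

```python
# Hypothetical stand-in for `chenhall`; only the '@id' key is needed here.
sample_items = [
    {'@id': 'http://nomenclature.info/nom/14888'},
    {'@id': 'http://nomenclature.info/nom/1090'},
]

def find_index_by_id(items, target_id):
    """Return the index of the first item whose '@id' equals target_id, or None."""
    return next(
        (index for index, item in enumerate(items) if item.get('@id') == target_id),
        None,
    )

chair_index = find_index_by_id(sample_items, 'http://nomenclature.info/nom/1090')
print(chair_index)
```

Against the real data, the call would be `find_index_by_id(chenhall, 'http://nomenclature.info/nom/1090')`.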


Now that we know the index of the "Chair" item, print out its contents.

In [6]:
chair = chenhall[13843]
chair

{'@context': {'nomo': 'http://nomenclature.info/nom/ontology/',
  'skos': 'http://www.w3.org/2004/02/skos/core#',
  'skos-xl': 'http://www.w3.org/2008/05/skos-xl#',
  'dct': 'http://purl.org/dc/terms/',
  'dc': 'http://purl.org/dc/elements/1.1/',
  'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'foaf': 'http://xmlns.com/foaf/0.1/',
  'owl': 'http://www.w3.org/2002/07/owl#',
  'cs': 'http://purl.org/vocab/changeset/schema'},
 '@id': 'http://nomenclature.info/nom/1090',
 '@type': ['http://www.w3.org/2004/02/skos/core#Concept'],
 'dc:identifier': [{'@type': 'http://www.w3.org/2001/XMLSchema#string',
   '@value': '1090'}],
 'skos:prefLabel': [{'@language': 'en', '@value': 'Chair'},
  {'@language': 'es', '@value': 'Silla'},
  {'@language': 'fr', '@value': 'Chaise'}],
 'skos:hiddenLabel': [{'@language': 'fr', '@value': 'siège'}],
 'nomo:Definition-Source': [{'@language': 'en',
   '@value': 'Parks Canada Descriptive and Visual Dictionary of Objects'},
  {'@language': 'fr',
   '@value

It looks like the "Other references" are listed under the item's "skos:exactMatch" key.

In [7]:
chair_links = chair['skos:exactMatch']
chair_links

[{'@id': 'http://fr.dbpedia.org/resource/Chaise'},
 {'@id': 'http://dbpedia.org/resource/Chair'},
 {'@id': 'http://vocab.getty.edu/aat/300037772'},
 {'@id': 'http://data.culture.fr/thesaurus/resource/ark:/67717/T69-33'},
 {'@id': 'http://www.wikidata.org/entity/Q15026'},
 {'@id': 'https://d-nb.info/gnd/4058247-4'},
 {'@id': 'https://data.bnf.fr/ark:/12148/cb12467435t'}]

Split each URL on forward slashes, so that we can pull out just the domain (the third piece, at index 2).

In [8]:
for link in chair_links:
    url = link['@id']
    url_split = url.split('/')
    url_domain = url_split[2]
    print(url, url_domain)

http://fr.dbpedia.org/resource/Chaise fr.dbpedia.org
http://dbpedia.org/resource/Chair dbpedia.org
http://vocab.getty.edu/aat/300037772 vocab.getty.edu
http://data.culture.fr/thesaurus/resource/ark:/67717/T69-33 data.culture.fr
http://www.wikidata.org/entity/Q15026 www.wikidata.org
https://d-nb.info/gnd/4058247-4 d-nb.info
https://data.bnf.fr/ark:/12148/cb12467435t data.bnf.fr
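
Splitting on "/" works for the URLs seen here, but it would raise an IndexError on a value with no scheme. As a possible alternative (not what the cell above does), the standard library's `urllib.parse.urlparse` can extract the domain. The helper name `link_domain` and the `sample_links` list are assumptions for illustration; the URLs are copied from the chair output.

```python
from urllib.parse import urlparse

# A few of the chair links from above, copied in so this cell runs on its own.
sample_links = [
    {'@id': 'http://vocab.getty.edu/aat/300037772'},
    {'@id': 'https://d-nb.info/gnd/4058247-4'},
    {'@id': 'https://data.bnf.fr/ark:/12148/cb12467435t'},
]

def link_domain(link):
    """Return the network location (domain) of a link's '@id' URL."""
    return urlparse(link['@id']).netloc

for link in sample_links:
    print(link['@id'], link_domain(link))
```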


Now that we can pull out the domains for the chair example, do the same for every item in the list. The "domain_count" dictionary keeps track of how many times each domain is encountered. Items without a "skos:exactMatch" key are skipped.

In [9]:
domain_count = {}
for item in chenhall:
    if 'skos:exactMatch' in item:
        item_links = item['skos:exactMatch']
        for link in item_links:
            url = link['@id']
            url_split = url.split('/')
            url_domain = url_split[2]
            if url_domain not in domain_count:
                domain_count[url_domain] = 1
            else:
                domain_count[url_domain] += 1
print(domain_count)

{'vocab.getty.edu': 14795, 'fr.dbpedia.org': 1, 'dbpedia.org': 1, 'data.culture.fr': 1, 'www.wikidata.org': 1, 'd-nb.info': 1, 'data.bnf.fr': 1}
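
The same tally can be written more compactly with `collections.Counter`, which removes the "is the key already there?" branch. This is a sketch under assumptions: `count_domains` and `sample_items` are hypothetical names, and the example.org ids are made up so the cell runs without the JSON file.

```python
from collections import Counter

# Hypothetical miniature dataset: one item with two links, one with one, one with none.
sample_items = [
    {'@id': 'http://example.org/item/1',
     'skos:exactMatch': [{'@id': 'http://vocab.getty.edu/aat/300037772'},
                         {'@id': 'http://dbpedia.org/resource/Chair'}]},
    {'@id': 'http://example.org/item/2',
     'skos:exactMatch': [{'@id': 'http://vocab.getty.edu/aat/300037772'}]},
    {'@id': 'http://example.org/item/3'},
]

def count_domains(items, key='skos:exactMatch'):
    """Count how often each domain appears among the links stored under `key`."""
    counts = Counter()
    for item in items:
        # .get with a default of [] skips items that lack the key.
        for link in item.get(key, []):
            counts[link['@id'].split('/')[2]] += 1
    return counts

domain_count = count_domains(sample_items)
print(domain_count.most_common())
```

`most_common()` lists the domains from most to least frequent, which makes a lopsided distribution easy to spot.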


***Uh oh, it looks like the "chair" example is literally the only item in the entire dataset whose "skos:exactMatch" links point anywhere other than the Getty AAT (vocab.getty.edu). Is this true?***
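
One way to start checking that question is to list the ids of the items that have at least one non-AAT link, and to tally which keys occur across the dataset in case references to other sources are stored under a different key. Whether this file uses any other key for external links is an open assumption, not something the output above shows. The names `items_with_non_aat_links`, `key_counts`, and `sample_items` are hypothetical, and the sample list stands in for `chenhall`.

```python
from collections import Counter

AAT_DOMAIN = 'vocab.getty.edu'

# Hypothetical stand-in for `chenhall`: one AAT-only item and the chair item.
sample_items = [
    {'@id': 'http://example.org/item/1',
     'skos:exactMatch': [{'@id': 'http://vocab.getty.edu/aat/300037772'}]},
    {'@id': 'http://nomenclature.info/nom/1090',
     'skos:exactMatch': [{'@id': 'http://vocab.getty.edu/aat/300037772'},
                         {'@id': 'http://www.wikidata.org/entity/Q15026'}]},
]

def items_with_non_aat_links(items, key='skos:exactMatch'):
    """Return the '@id' of every item with at least one link outside the AAT domain."""
    found = []
    for item in items:
        domains = {link['@id'].split('/')[2] for link in item.get(key, [])}
        if domains - {AAT_DOMAIN}:
            found.append(item['@id'])
    return found

# Tally how many items carry each top-level key, to spot other candidate link fields.
key_counts = Counter(key for item in sample_items for key in item)

print(items_with_non_aat_links(sample_items))
print(key_counts.most_common())
```

If the first list comes back with only the chair id when run on `chenhall`, the key tally is the next place to look.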