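Given a list of labels in a CSV file, this notebook creates one SKOS concept per label, with the label as the concept's preferred label (`skos:prefLabel`). The CSV file is expected to have a single header row, followed by one label per row in the first column. Columns are split on `#` (see the `delimiter` argument in the conversion cell), so commas inside a label are kept. Each concept is created in the specified namespace, with its row number as its ID: the header is row 0, so the first label becomes concept 1. Rows with an empty first column are skipped but still counted. The preferred label is given the specified language tag, and the resulting RDF is exported as a Turtle file.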
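
The numbering rule is easiest to see on a small input. The following is a minimal sketch that uses only the standard library and an in-memory CSV. The helper name `preview_concepts` and the sample labels are made up for illustration and are not part of the notebook. It assumes the same conventions as the conversion cell: the header is row 0, columns are split on `#`, and rows with an empty first column are skipped but still counted.

```python
import csv
import io


def preview_concepts(csv_text, namespace="http://www.example.org/", delimiter="#"):
    """Return (concept URI, label) pairs, numbered the way the notebook numbers them.

    Hypothetical helper for illustration only; it builds no RDF.
    """
    pairs = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for i, row in enumerate(reader):
        if i == 0:
            continue  # row 0 is the header
        if row and row[0]:
            # the row number is appended directly to the namespace
            pairs.append((namespace + str(i), row[0]))
    return pairs


# Header, one label, one blank row, one label containing a comma
sample = "label\nApple\n\nCherry, sour\n"
pairs = preview_concepts(sample)
for uri, label in pairs:
    print(uri, label)
```

Here `Apple` becomes concept 1 and `Cherry, sour` becomes concept 3: the blank row uses up ID 2, and the comma stays inside the label because the delimiter is `#`.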

In [None]:
import csv

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS, Namespace, NamespaceManager

inputFilename = "example.csv"  # name of CSV file containing the labels
outputFilename = "output.ttl"  # name of turtle file where the RDF output will be written to
chosenNamespacePath = "http://www.example.org/"  # namespace in which the concepts will be created. The IDs will start with this
prefix = "ex"  # prefix for the namespace. This makes the generated RDF more readable, as e.g. the concept http://www.example.org/1 is written as ex:1
languageTag = "en"  # language tag for the labels, e.g. 'en' for English, 'nl' for Dutch. See https://en.wikipedia.org/wiki/IETF_language_tag
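
Because concept URIs are built by appending the row number directly to `chosenNamespacePath`, the namespace should end in `/` or `#`. Without a separator, `http://www.example.org` plus row 1 yields `http://www.example.org1`. The following is a minimal sketch of a guard that could be applied before the conversion. The function name `ensure_separator` is hypothetical and not part of the notebook, and defaulting to `/` is an assumption.

```python
def ensure_separator(namespace_path):
    """Return namespace_path unchanged if it ends in '/' or '#', else append '/'.

    Hypothetical helper; choosing '/' as the default separator is an assumption.
    """
    if namespace_path.endswith(("/", "#")):
        return namespace_path
    return namespace_path + "/"


print(ensure_separator("http://www.example.org"))         # separator added
print(ensure_separator("http://www.example.org/"))        # unchanged
print(ensure_separator("http://www.example.org/vocab#"))  # unchanged
```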

In [None]:
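with open(inputFilename, 'r', encoding="utf-8", newline='') as csvfile:
    # Columns are split on '#', so commas inside a label are kept; use delimiter=',' for a regular comma-separated file
    reader = csv.reader(csvfile, delimiter='#')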

    graph = Graph()

    # Bind prefixes so the Turtle output is written with e.g. ex:1 instead of full URIs
    chosenNamespace = Namespace(chosenNamespacePath)
    graph.bind(prefix, chosenNamespace)
    graph.bind("skos", SKOS)
    graph.bind("rdf", RDF)
    
    print("converting...")
    # Row 0 is the header; every later row number doubles as the concept ID
    for i, row in enumerate(reader):
        if i == 0:
            continue
        #print("row %s"%i)  # delete the '#' before 'print' if you want to keep track of which row is being processed
        if len(row) > 0 and row[0]:
            concept = URIRef(chosenNamespacePath + str(i))
            graph.add((concept, RDF.type, SKOS.Concept))
            graph.add((concept, SKOS.prefLabel, Literal(row[0], lang=languageTag)))
    print("done")
    
    graph.serialize(destination=outputFilename, format="turtle")