## Populate an RDF database

This notebook walks through the main steps to download CSV files, process them, and build an RDF dataset from them according to an ontology.

To measure cell execution time in Jupyter notebooks, install the extension with <code>pip install ipython-autotime</code> and load it with <code>%load_ext autotime</code>.

In [1]:
# required libraries
import pandas as pd
import os
import ast
import unicodedata
import re
from pathlib import Path

In [2]:
# parameters and URLs
path = str(Path.cwd().parent)
grammyCategoriesUrl = path + '/ProvaPython/GrammyCategories.csv'


# output folder for the generated RDF files
savePath = path + '/ProvaPython/rdf/'

## Grammy

In [3]:
# Load the CSV files in memory
categories = pd.read_csv(grammyCategoriesUrl, sep=',')

print(categories)

# Convert the 'Sub Genres' column from a string to a Python list
categories['Sub Genres'] = categories['Sub Genres'].apply(ast.literal_eval)

print(categories.head())

# Check the data types of the DataFrame
print(categories.dtypes)

                                          Macro Genre  \
0                              Pop & Dance/Electronic   
1                           Rock, Metal & Alternative   
2                       R&B, Rap & Spoken Word Poetry   
3   Jazz, Traditional Pop, Contemporary Instrument...   
4                            Country & American Roots   
5                     Gospel & Contemporary Christian   
6   Latin, Global, Reggae & New Age, Ambient or Chant   
7   Children's, Comedy, Audio Books, Visual Media ...   
8                         Package, Notes & Historical   
9   Production, Engineering, Composition & Arrange...   
10                                          Classical   

                                           Sub Genres  
0   ['Best Pop Solo Performance', 'Best Pop Duo/Gr...  
1   ['Best Rock Performance', 'Best Metal Performa...  
2   ['Best R&B Performance', 'Best Traditional R&B...  
3   ['Best Jazz Performance', 'Best Jazz Vocal Alb...  
4   ['Best Country Solo Performance
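
The `Sub Genres` column is read as text, so each cell has to be parsed into a real list before it can be iterated. Below is a minimal sketch of what `ast.literal_eval` does on one such cell. The sample string is hypothetical and only shaped like the real values, and `parse_list_cell` is an illustrative helper that is not part of the notebook.

```python
import ast

def parse_list_cell(cell):
    """Parse a stringified Python list; return [] if the cell is malformed."""
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        # Truncated or non-literal content ends up here
        return []
    return list(value) if isinstance(value, (list, tuple)) else []

# Hypothetical sample cell, shaped like the values in the 'Sub Genres' column
sample = "['Best Pop Solo Performance', 'Best Pop Vocal Album']"
parsed = parse_list_cell(sample)
print(type(parsed).__name__, len(parsed))
```

A wrapper like this could be passed to `.apply(...)` in place of `ast.literal_eval` if some rows might be malformed. Whether silently returning an empty list is acceptable depends on the dataset.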

In [None]:
def normalize_uri(name):
    # Strip accents, replace spaces with hyphens, and drop commas and apostrophes;
    # other characters such as '&' and '/' are left unchanged
    name = unicodedata.normalize('NFKD', name).encode('ASCII', 'ignore').decode('ASCII')
    name = name.replace(" ", "-")
    name = name.replace(",", "").replace("'", "")
    return name
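
Because `normalize_uri` leaves characters such as `&` and `/` in place, they end up in the fragment of the generated URIs. Below is a sketch of a stricter variant that keeps only letters, digits, and hyphens. `slugify_for_uri` is a hypothetical name, and spelling out `&` as "and" is an assumption rather than something the ontology prescribes. Adopting it would change the URIs printed later in this notebook.

```python
import re
import unicodedata

def slugify_for_uri(name):
    """Hypothetical stricter variant of normalize_uri: keep only [A-Za-z0-9-]."""
    # Decompose accented characters and drop anything that is not ASCII
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Spell out '&' so the meaning it carries is not lost
    ascii_name = ascii_name.replace("&", " and ")
    # Drop apostrophes so that "Children's" becomes "Childrens"
    ascii_name = ascii_name.replace("'", "")
    # Collapse every remaining run of non-alphanumeric characters into one hyphen
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_name).strip("-")

print(slugify_for_uri("Pop & Dance/Electronic"))  # Pop-and-Dance-Electronic
print(slugify_for_uri("Children's, Comedy"))      # Childrens-Comedy
```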

The next cells require <code>RDFLib</code>:

<code>pip3 install rdflib</code> ([Documentation](https://rdflib.readthedocs.io/en/stable/gettingstarted.html))

In [5]:
# Load the required libraries
from rdflib import Graph, Literal, RDF, URIRef, Namespace
# rdflib knows about some namespaces, like FOAF
from rdflib.namespace import FOAF, XSD, SKOS, RDFS



In [None]:
# Define the melody ontology namespace, which is not built into RDFLib
ME = Namespace("http://www.dei.unipd.it/~gdb/ontology/melody#")

# Create the graph
g = Graph()

# Bind the namespaces to a prefix for more readable output
g.bind("xsd", XSD)
g.bind("mel", ME)
g.bind("skos", SKOS)
g.bind("rdfs", RDFS)


In [None]:
# Declare the GrammyCategorySchema individual as a skos:ConceptScheme
# (attribute access on a Namespace already returns a URIRef)
SCHEME_URI = ME.GrammyCategorySchema
g.add((SCHEME_URI, RDF.type, SKOS.ConceptScheme))


for _, row in categories.iterrows():
    macro_genre = row['Macro Genre']
    sub_genres = row['Sub Genres']

    # URI of the macro genre
    macro_genre_uri = URIRef(ME + normalize_uri(macro_genre))
    print(f"Macro Genre URI: {macro_genre_uri}")
    
    # Add the macro genre as a SKOS Concept and a GrammyCategory
    g.add((macro_genre_uri, RDF.type, SKOS.Concept))
    g.add((macro_genre_uri, RDF.type, ME.GrammyCategory))
    g.add((macro_genre_uri, RDFS.label, Literal(macro_genre, lang="en")))
    g.add((macro_genre_uri, SKOS.inScheme, SCHEME_URI))
    

    # Add each sub-genre as a SKOS Concept and a GrammyCategory, linked to its macro genre
    for sub_genre in sub_genres:
        sub_genre_uri = URIRef(ME + normalize_uri(sub_genre))
        print(f"Sub-Genre URI: {sub_genre_uri}")
        g.add((sub_genre_uri, RDF.type, SKOS.Concept))
        g.add((sub_genre_uri, RDF.type, ME.GrammyCategory))
        g.add((sub_genre_uri, RDFS.label, Literal(sub_genre, lang="en")))
        g.add((sub_genre_uri, SKOS.broader, macro_genre_uri))

Macro Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Pop-&-Dance/Electronic
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Pop-Solo-Performance
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Pop-Duo/Group-Performance
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Pop-Vocal-Album
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Dance/Electronic-Recording
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Dance-Pop-Recording
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Dance/Electronic-Album
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Remixed-Recording
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Female-Pop-Vocal-Performance
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Male-Pop-Vocal-Performance
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#Best-Pop-Performance-by-a-Duo-or-Group-with-Vocals
Sub-Genre UR
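
Since URIs are minted from normalized names, two different labels that normalize to the same string would silently merge into a single node in the graph. Below is a minimal sketch of a check that could be run before populating the graph. `find_uri_collisions` is a hypothetical helper and the labels are made up to show one collision. `normalize_uri` is repeated here only to keep the sketch self-contained.

```python
import unicodedata
from collections import defaultdict

def normalize_uri(name):
    # Same logic as the notebook's helper, repeated so the sketch is self-contained
    name = unicodedata.normalize("NFKD", name).encode("ASCII", "ignore").decode("ASCII")
    return name.replace(" ", "-").replace(",", "").replace("'", "")

def find_uri_collisions(names):
    """Map each normalized local name to the distinct labels that produce it."""
    by_slug = defaultdict(set)
    for label in names:
        by_slug[normalize_uri(label)].add(label)
    # Keep only local names that are shared by more than one label
    return {slug: sorted(found) for slug, found in by_slug.items() if len(found) > 1}

# Hypothetical labels, chosen so that the first two collide
labels = ["Best Rock Song", "Best Rock, Song", "Best Metal Performance"]
print(find_uri_collisions(labels))
```

An empty dictionary means every label maps to its own URI. A non-empty one lists the labels that would end up sharing a node.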

In [8]:
g.serialize(destination="GrammyCategories.ttl", format="turtle")
print("Created global Turtle file: GrammyCategories.ttl")

Created global Turtle file: GrammyCategories.ttl
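
The serialize call writes to the current working directory, while the `savePath` folder defined earlier goes unused. Below is a minimal sketch of building the destination under an output folder with `pathlib`, creating the folder if it does not exist yet. `turtle_destination` is a hypothetical helper, and a temporary directory stands in for `savePath`.

```python
import tempfile
from pathlib import Path

def turtle_destination(out_dir, filename="GrammyCategories.ttl"):
    """Create out_dir if missing and return the output file path as a string."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return str(folder / filename)

# A temporary directory stands in for savePath in this sketch
with tempfile.TemporaryDirectory() as tmp:
    destination = turtle_destination(Path(tmp) / "rdf")
    print(Path(destination).name)
    # In the notebook the result would then be passed on as:
    # g.serialize(destination=destination, format="turtle")
```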
