## Populate an RDF database

This notebook reports the main steps to download CSV files, process them, and create an RDF dataset from them according to an ontology.

To measure cell execution time in Jupyter notebooks, install the extension with <code>pip install ipython-autotime</code> and load it with <code>%load_ext autotime</code>.

In [63]:
# required libraries
import pandas as pd
import os
import ast
import unicodedata
import re
from pathlib import Path

In [64]:
# parameters and paths
path = str(Path(os.path.abspath(os.getcwd())).parent.parent.absolute())
print(path)
grammyCategoriesUrl = os.path.join(path, 'csv', 'GrammyCategoriesUppercase.csv')


# saving folder (trailing separator kept so a file name can be appended directly)
savePath = os.path.join(path, 'PopulateRDFdb', 'PopulateGrammyCategories') + os.sep

c:\Users\fgall\Desktop\MELODY
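
As a minimal sketch, the same two locations could also be built with `pathlib`, so that separators follow the platform. The helper `build_paths` and the root `/home/user/MELODY` are hypothetical and only serve the illustration; the pure path classes are used so the result can be inspected on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath


def build_paths(root, path_cls):
    """Return (csv file, save folder) as strings under the given root."""
    root = path_cls(root)
    csv_file = root / "csv" / "GrammyCategoriesUppercase.csv"
    save_dir = root / "PopulateRDFdb" / "PopulateGrammyCategories"
    return str(csv_file), str(save_dir)


# Hypothetical POSIX root, only for illustration
posix_csv, posix_save = build_paths("/home/user/MELODY", PurePosixPath)
# Root taken from the notebook output above
win_csv, win_save = build_paths(r"c:\Users\fgall\Desktop\MELODY", PureWindowsPath)

print(posix_csv)
print(win_save)
```

In the notebook itself, `Path` (already imported) would pick the right flavour automatically.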


## Grammy

In [65]:
# Load the CSV files in memory
categories = pd.read_csv(grammyCategoriesUrl, sep=',')

print(categories)

# Convert the 'Sub Genres' column from a string to a Python list
categories['Sub Genres'] = categories['Sub Genres'].apply(ast.literal_eval)

print(categories.head())

# Check the data types of the DataFrame
print(categories.dtypes)

                                          Macro Genre  \
0                                           OfTheYear   
1                                  PopDanceElectronic   
2                                RockMetalAlternative   
3                              RnBRapSpokenWordPoetry   
4   JazzTraditionalPopContemporaryInstrumentalMusi...   
5                                CountryAmericanRoots   
6                         GospelContemporaryChristian   
7                 LatinGlobalReggaeNewAgeAmbientChant   
8   ChildrensComedyAudioBooksVisualMediaMusicVideo...   
9                              PackageNotesHistorical   
10        ProductionEngineeringCompositionArrangement   
11                                          Classical   
12                                              Other   

                                           Sub Genres  
0   ['ProducerOfTheYearClassical', 'BestNewArtist'...  
1   ['BestMalePopVocalPerformance', 'BestTradition...  
2   ['BestRockSong', 'BestUrban/a
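
The 'Sub Genres' column is read from the CSV as text, so each cell holds the string form of a list until `ast.literal_eval` parses it. The sketch below shows the conversion on a single made-up cell value, plus a hypothetical helper `parse_list_cell` that falls back to an empty list on malformed input; the notebook applies `ast.literal_eval` directly and does not use such a helper.

```python
import ast

# Made-up cell value, shaped like the strings in the 'Sub Genres' column
raw = "['BestRockSong', 'BestNewArtist']"
parsed = ast.literal_eval(raw)
print(type(raw).__name__, "->", type(parsed).__name__)


def parse_list_cell(value):
    """Hypothetical helper: parse a stringified list, or return [] on bad input."""
    try:
        result = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return []
    # Only accept actual lists; other literals (numbers, dicts) are rejected
    return result if isinstance(result, list) else []


print(parse_list_cell(raw))
print(parse_list_cell("not a list"))
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for data coming from a file.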

In [66]:
def normalize_uri(name):
    # Remove accents and special characters
    name = unicodedata.normalize('NFKD', name).encode('ASCII', 'ignore').decode('ASCII')
    name = name.replace(" ", "-")
    name = name.replace(",", "").replace("'", "")
    return name
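
To make the behaviour of `normalize_uri` concrete, the sketch below repeats the function and runs it on two made-up labels. It also shows that characters such as `/` pass through unchanged, which may be undesirable in a local name. The stricter variant `normalize_uri_strict` is hypothetical and not part of the notebook; it relies on `re`, which the notebook imports.

```python
import re
import unicodedata


def normalize_uri(name):
    # Same steps as the notebook function
    name = unicodedata.normalize('NFKD', name).encode('ASCII', 'ignore').decode('ASCII')
    name = name.replace(" ", "-")
    name = name.replace(",", "").replace("'", "")
    return name


def normalize_uri_strict(name):
    """Hypothetical variant: keep only ASCII letters, digits, '-' and '_'."""
    name = unicodedata.normalize('NFKD', name).encode('ASCII', 'ignore').decode('ASCII')
    name = name.replace(" ", "-")
    return re.sub(r"[^A-Za-z0-9_-]", "", name)


# Made-up labels, only for illustration
print(normalize_uri("Música Latina, Children's"))
print(normalize_uri("Rock/Metal"))
print(normalize_uri_strict("Rock/Metal"))
```

Whether dropping such characters is acceptable depends on the labels: two different labels could collapse to the same URI, so a check for collisions would be worth adding in that case.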

We need to install <code>RDFLib</code>:

<code>pip3 install rdflib</code> ([Documentation](https://rdflib.readthedocs.io/en/stable/gettingstarted.html))

In [67]:
# Load the required libraries
from rdflib import Graph, Literal, RDF, URIRef, Namespace
# rdflib knows about some namespaces, like FOAF
from rdflib.namespace import FOAF, XSD, SKOS, RDFS



In [68]:
# Construct the MELODY ontology namespace, which is not known by RDFLib
ME = Namespace("http://www.dei.unipd.it/~gdb/ontology/melody#")

# Create the graph
g = Graph()

# Bind the namespaces to a prefix for more readable output
g.bind("xsd", XSD)
g.bind("mel", ME)
g.bind("skos", SKOS)
g.bind("rdfs", RDFS)


In [69]:
SCHEME_URI = URIRef(ME.GrammyCategorySchema)
# GrammyCategorySchema individual, typed as a skos:ConceptScheme
g.add((SCHEME_URI, RDF.type, SKOS.ConceptScheme))


for _, row in categories.iterrows():
    macro_genre = row['Macro Genre']
    sub_genres = row['Sub Genres']

    # URI of the macro genre
    macro_genre_uri = URIRef(ME + normalize_uri(macro_genre))
    #print(macro_genre)
    print(f"Macro Genre URI: {macro_genre_uri}")
    
    # Add the macro genre as a SKOS Concept and GrammyCategory
    g.add((macro_genre_uri, RDF.type, SKOS.Concept))
    g.add((macro_genre_uri, RDF.type, ME.GrammyCategory))
    g.add((macro_genre_uri, RDFS.label, Literal(macro_genre, lang="en")))
    g.add((macro_genre_uri, SKOS.inScheme, SCHEME_URI))
    

    # Add each sub-genre as a SKOS Concept and GrammyCategory
    for sub_genre in sub_genres:
        sub_genre_uri = URIRef(ME + normalize_uri(sub_genre))
        print(f"Sub-Genre URI: {sub_genre_uri}")
        g.add((sub_genre_uri, RDF.type, SKOS.Concept))
        g.add((sub_genre_uri, RDF.type, ME.GrammyCategory))
        g.add((sub_genre_uri, RDFS.label, Literal(sub_genre, lang="en")))
        g.add((sub_genre_uri, SKOS.broader, macro_genre_uri))

Macro Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#OfTheYear
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#ProducerOfTheYearClassical
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#BestNewArtist
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#AlbumOfTheYear
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#AlbumOfTheYearClassical
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#SongOfTheYear
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#RecordOfTheYear
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#ProducerOfTheYearNonClassical
Macro Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#PopDanceElectronic
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#BestMalePopVocalPerformance
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#BestTraditionalPopVocalAlbum
Sub-Genre URI: http://www.dei.unipd.it/~gdb/ontology/melody#BestFemalePopVocalPerformance
Sub-Genre URI: http:

In [70]:
g.serialize(destination=savePath + "GrammyCategoriesUppercase.ttl", format="turtle")
print("Created global Turtle file: GrammyCategoriesUppercase.ttl")

Created global Turtle file: GrammyCategoriesUppercase.ttl
