# GDMA Project
Author: Julian Schelb (1069967)

In [2]:
from neo4j import GraphDatabase
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### Connection to the database instance

In [3]:
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "subatomic-shrank-Respond"))
database_name = "cddb"

### Task 1: Import Data

*Task description:* In this project you are given the data of FreeDB (cddb) that contains information about music records. The data is stored in a relational database, i.e.,
PostgreSQL. You will find the SQL dump file to load the data into your local
PostgreSQL at the course website. After importing the data into Postgres you
need to accomplish the following tasks:

Write code or describe the operations in order to load the data into Neo4j. Note
that it is not obligatory to write SQL Code. It is totally acceptable if you decide
to export the database to CSV files and then use the CSV files to import the
data into Neo4j. However, you need to provide a sufficient amount of details on
how you modelled the data in Neo4j and elaborate on all the implementation
and design decisions you took. Note that for this task, you are not allowed to
use any external tools, e.g. the ETL tool of Neo4j.

![](Figures/cddb_as_graph-Default.png)

**Notes:**

- Relation between artist and CD?
- Create indexes?
- Create constraints?
- Data cleaning? -> Remove line breaks from e.g. song titles
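
The data-cleaning note above could be handled before the CSV export or in pandas after loading the export. The following is only a minimal sketch of one way to remove line breaks from titles; the helper name `strip_linebreaks` is hypothetical and not part of the project code.

```python
import re

# Hypothetical helper (not part of the original notebook): replaces every run
# of carriage returns / line feeds, including whitespace directly around it,
# with a single space so that a title always fits on one CSV line.
def strip_linebreaks(title):
    if title is None:
        return None
    return re.sub(r"\s*[\r\n]+\s*", " ", title)

print(strip_linebreaks("Intro\r\n(Live)"))
```

Applied to a pandas column, this could look like `df["song"] = df["song"].map(strip_linebreaks)`, assuming the titles were loaded into a DataFrame `df` with a `song` column.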

#### Remove all Nodes

In [5]:
# DETACH DELETE removes every node together with its relationships.
# A plain DELETE fails for nodes that still have relationships attached.
query = """
MATCH (a) DETACH DELETE a
"""

with driver.session(database=database_name) as session:
    result = session.run(query)

#### Importing Albums

In [6]:
query = """
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///albums_202206062116.csv' AS row FIELDTERMINATOR ','
WITH toInteger(row[0]) as albumid, row[1] as album SKIP 0 LIMIT 100000
CREATE (n:Album {id: albumid, album: album})
//RETURN albumid, album
"""

with driver.session(database=database_name) as session:
    result = session.run(query)

#### Importing Genres

In [7]:
query = """
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///genres_202206062119.csv' AS row FIELDTERMINATOR ','
WITH toInteger(row[0]) as genreid, row[1] as genre SKIP 0 LIMIT 150000
CREATE (n:Genre {id: genreid, genre: genre})
"""

with driver.session(database=database_name) as session:
    result = session.run(query)

#### Importing Artists

In [8]:
query = """
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///artists_202206062119.csv' AS row FIELDTERMINATOR ','
WITH toInteger(row[0]) as artistid, row[1] as artist SKIP 0 LIMIT 150000
CREATE (n:Artist {id: artistid, artist: artist})
"""

with driver.session(database=database_name) as session:
    result = session.run(query)

#### Importing Songs

Song titles with a trailing backslash cause problems during the CSV import because the backslash escapes the closing quote of the field. To mitigate this, I appended a space character to those song titles in the source table. The affected rows can be found with:

``` sql
SELECT *
FROM cddb.songs s
WHERE song LIKE '%\\'
  AND song NOT LIKE '%\\\\'
```

Statement to update the rows:

``` sql
UPDATE cddb.songs 
SET song = song || ' ' 
WHERE song LIKE '%\\' 
  AND song NOT LIKE '%\\\\'
```
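
For readers who prefer to apply the same fix on the exported data instead of in PostgreSQL, the rule in the `UPDATE` statement can be sketched in Python. This is only an illustration under that assumption; the function name `pad_trailing_backslash` is hypothetical and does not appear in the project.

```python
# Hypothetical helper mirroring the UPDATE statement above: a title that ends
# in exactly the pattern "single backslash, not preceded by another backslash"
# gets one space appended, so the backslash no longer escapes the closing quote.
def pad_trailing_backslash(title):
    if title.endswith("\\") and not title.endswith("\\\\"):
        return title + " "
    return title

print(repr(pad_trailing_backslash("Side A\\")))
```

Like the SQL version, this sketch leaves titles that end in two or more backslashes unchanged.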

In [10]:
query = """
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:///songs_202206072212.csv' AS row FIELDTERMINATOR ','
WITH toInteger(row[0]) as songid, row[1] as song
CREATE (n:Song {id: songid, song: song})
"""

with driver.session(database=database_name) as session:
    result = session.run(query)