# GDMA Project
Author: Julian Schelb (1069967)

In [2]:
from neo4j import GraphDatabase
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### Connection to the database instance

In [3]:
# Connect to the local Neo4j instance over Bolt and open a session on the "cddb" database
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "subatomic-shrank-Respond"))
database_name = "cddb"
session = driver.session(database=database_name)

***

### Task 2: Cypher Queries
Write the Cypher code required for the following queries.

**Query 1: Given an artist name, e.g. 'radiohead', list all the albums of that artist.**


In [3]:
query = """
MATCH (ar:Artist)-[r:CONTRIBUTED_TO]->(ab:Album)
WHERE ar.artist = "radiohead"
RETURN ab.album as album
"""
        
dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data

Unnamed: 0,album
0,there there [single]
1,toronto live 06-06
2,"there, there"
3,the bends
4,hail to the thief
5,you can scream
6,ok computer
7,kid a
8,street spirit (fade out); (2 meter sessionssin...
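
Since the task treats the artist name as an input, one option is to pass it as a query parameter instead of hard-coding it in the Cypher text. The sketch below assumes the same schema as the query above and a session object like the one opened earlier. `albums_of` and `ALBUMS_BY_ARTIST` are hypothetical names introduced only for illustration. `$artist` is a Cypher parameter that `session.run` fills in from its keyword arguments.

```python
# Same pattern as Query 1, but with the artist name as a parameter
ALBUMS_BY_ARTIST = """
MATCH (ar:Artist)-[:CONTRIBUTED_TO]->(ab:Album)
WHERE ar.artist = $artist
RETURN ab.album AS album
"""


def albums_of(session, artist):
    """Return the album names for one artist (hypothetical helper)."""
    # Keyword arguments of session.run are passed to Cypher as query parameters
    result = session.run(ALBUMS_BY_ARTIST, artist=artist)
    return [record["album"] for record in result]
```

With the session from above, a call such as `albums_of(session, "radiohead")` would be expected to return the same albums as the hard-coded query.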


***

**Query 2: Given a song name, e.g. 'family reunion', list the artist and album name of CDs released in 1999 which contain that song.**

In [4]:
query = """
MATCH (s:Song)<-[r:CONTAINS]-(c:CD)-[r2:CONTAINS]->(ab:Album)
MATCH (c)-[r3:CONTAINS]->(ar:Artist)
WHERE s.song = "family reunion" AND c.ayear = 1999
RETURN s.song as song, c.ayear as year, collect(ab.album) as albums, collect(ar.artist) as artists
"""
        
dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data

Unnamed: 0,song,year,albums,artists
0,family reunion,1999,[what's my age again (australia maxi-single);],[blink 182]


***

**Query 3: Given a track number, e.g. 12, show the (non-duplicated) artist, song, and album names of songs located on that track of a CD.**

In [5]:
query = """
MATCH (s:Song)<-[r:CONTAINS]-(c:CD)-[r2:CONTAINS]->(ab:Album)
MATCH (c)-[r3:CONTAINS]->(ar:Artist)
WHERE r.track = 12
RETURN r.track,
       collect(distinct s.song) as songs,
       collect(distinct ab.album) as albums,
       collect(distinct ar.artist) as artists
"""
        
dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data

Unnamed: 0,r.track,songs,albums,artists
0,12,"[the love of my man, walk over god's heaven, o...","[love songs, the gospel soul of etta james, en...","[etta james, fats domino, tenacious d, jimmy r..."
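
The query above collects three independent lists, so the result does not show which artist, song, and album belong together. If the combinations themselves are wanted, deduplicating whole rows keeps them paired, which is what `RETURN DISTINCT` over the three columns would do in Cypher. The toy illustration below uses made-up rows, not data from the database, to show the difference between the two kinds of deduplication.

```python
# Made-up (artist, song, album) rows, including one exact duplicate
rows = [
    ("artist a", "song x", "album 1"),
    ("artist a", "song x", "album 1"),
    ("artist b", "song y", "album 2"),
]

# Per-column deduplication, analogous to collect(DISTINCT ...) on each column:
# the values are unique, but the pairing between columns is lost
artists = sorted({artist for artist, _, _ in rows})
songs = sorted({song for _, song, _ in rows})

# Row-level deduplication, analogous to RETURN DISTINCT artist, song, album:
# each remaining tuple is still a complete (artist, song, album) combination
distinct_rows = sorted(set(rows))
```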


***

**Query 4: Augment a list of all artists with their self-titled albums (i.e., albums with the same name as the artist who released them). Artists with no self-titled album must be included with null as the album.**

In [11]:
query = """
MATCH (ar:Artist)
OPTIONAL MATCH (ar)-[r:CONTRIBUTED_TO]->(ab:Album)
WHERE ar.artist = ab.album
RETURN DISTINCT ar.artist, 
CASE collect(ab.album) WHEN [] THEN null ELSE collect(ab.album) END as albums
"""
        
dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data

Unnamed: 0,ar.artist,albums
0,etta james,
1,fats domino,[fats domino]
2,tenacious d,[tenacious d]
3,jimmy reed,
4,fats waller,
...,...,...
61170,camelia ashbach,
61171,shearwater,
61172,steve macdonald,
61173,danielle ahart,
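
The `WHERE` clause that follows `OPTIONAL MATCH` is part of the optional pattern, so artists without a matching album are kept with null instead of being filtered out, much like a left outer join. The sketch below imitates that behaviour in plain Python on made-up data. `self_titled` and the sample dictionary are hypothetical and only illustrate the idea.

```python
# Made-up data: artist name -> names of the albums the artist contributed to
albums_by_artist = {
    "artist a": ["artist a", "other album"],
    "artist b": ["debut"],
    "artist c": [],
}


def self_titled(albums_by_artist):
    """Map every artist to its self-titled albums, or None if there are none."""
    result = {}
    for artist, albums in albums_by_artist.items():
        matches = [album for album in albums if album == artist]
        # Keep the artist even without a match, as OPTIONAL MATCH does;
        # an empty list becomes None, like the CASE expression in the query
        result[artist] = matches or None
    return result


augmented = self_titled(albums_by_artist)
```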


***

**Query 5: Find how many albums (not CDs) were issued in each year of the 20th century (years 19XY).**

In [7]:
query = """
MATCH (c:CD)-[r:CONTAINS]->(ab:Album)
WHERE 1900 <= c.ayear <= 1999
RETURN DISTINCT c.ayear as year, count(ab) as albums
ORDER BY year DESC
"""
        
dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data

Unnamed: 0,year,albums
0,1999,6961
1,1998,6138
2,1997,5559
3,1996,5349
4,1995,5133
...,...,...
93,1904,4
94,1903,1
95,1902,1
96,1901,7
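
One point worth checking against the data: `count(ab)` counts one row per matched CD-album link, whereas `count(DISTINCT ab)` counts each album node once per year. Whether the two differ depends on whether an album can be linked from several CDs in the same year, which is an assumption not verified here. The sketch below shows the difference on made-up links.

```python
from collections import defaultdict

# Made-up (year, cd_id, album_id) links; album 10 appears on two CDs in 1999
links = [(1999, 1, 10), (1999, 2, 10), (1999, 3, 11), (1998, 4, 12)]

per_link = defaultdict(int)   # analogous to count(ab): one per matched link
per_album = defaultdict(set)  # analogous to count(DISTINCT ab): one per album
for year, _cd_id, album_id in links:
    per_link[year] += 1
    per_album[year].add(album_id)

link_counts = dict(per_link)
distinct_counts = {year: len(albums) for year, albums in per_album.items()}
```

If the two counts agree for every year in the real database, the query above already counts albums rather than CD links.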


***

**Query 6: Find all artists that have released at least 5 albums associated with the genre 'rock'.**

In [8]:
query = """
MATCH (ar:Artist)-[r:CONTRIBUTED_TO]->(ab:Album),
(g:Genre)<-[r2:BELONGS_TO]-(cd:CD)-[r3:CONTAINS]->(ab)
WHERE g.genre = "rock"
WITH DISTINCT ar.artist as artist, count(DISTINCT ab.id) as albums_released
WHERE albums_released >= 5
RETURN artist, albums_released
ORDER BY albums_released DESC
"""
        
dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data

Unnamed: 0,artist,albums_released
0,queen,30
1,jimi hendrix,30
2,eric clapton,26
3,u2,23
4,b'z,22
...,...,...
147,godiego,5
148,guns n' roses,5
149,tmn,5
150,the police,5
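
The `WITH ... WHERE` step in the query above aggregates first and filters on the aggregate afterwards, similar to `GROUP BY ... HAVING` in SQL. A minimal Python sketch of the same group-then-filter logic follows. The function name and the sample pairs are made up for illustration.

```python
from collections import defaultdict


def artists_with_min_albums(pairs, minimum):
    """Return [(artist, distinct_album_count)] with count >= minimum, largest first."""
    albums = defaultdict(set)
    for artist, album_id in pairs:
        # A set drops repeated album ids, like count(DISTINCT ab.id)
        albums[artist].add(album_id)
    # Filter on the aggregated count, like the WHERE after WITH
    counts = [(artist, len(ids)) for artist, ids in albums.items() if len(ids) >= minimum]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Made-up (artist, album_id) pairs; the repeated pair is counted only once
pairs = [("artist a", 1), ("artist a", 1), ("artist a", 2), ("artist b", 3)]
frequent = artists_with_min_albums(pairs, minimum=2)
```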
