## Spotify Query

In this notebook we run some queries on our Spotify property graph. The queries are divided into three parts:
1. ***Example queries***: two example queries that show how to use the newly added information, such as *record label* and *instruments*.
1. ***Italian tracks and Italian artists from 2017 to 2020***: queries about the Italian tracks and artists present in the Top 100 Italy.
1. ***Italian tracks abroad***: we want to find out whether Italian tracks are also listened to outside Italy.

In [1]:
# required libraries
import pandas as pd
import os
from pathlib import Path
import datetime

### Connection to Neo4j

In [2]:
# Neo4j connection parameters
class Neo4jParams:
  def __init__(self, user, psw, dbname, dbpsw, uri):
    self.user = user
    self.psw = psw
    self.dbname = dbname
    # the parameter was named db_psw, so this line silently read the global dbpsw
    self.dbpsw = dbpsw
    self.uri = uri

In [3]:
#DB parameters
user="neo4j"
psw="neo4j"
dbname="SpotifyDB"
dbpsw="SpotifyDB"
uri = "bolt://localhost:7687"

params = Neo4jParams(user,psw,dbname,dbpsw,uri)

In [4]:
from neo4j import GraphDatabase

# test class

class Driver:

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def print_greeting(self, message):
        with self.driver.session() as session:
            greeting = session.write_transaction(self._create_and_return_greeting, message)
            print(greeting)

    @staticmethod
    def _create_and_return_greeting(tx, message):
        result = tx.run("CREATE (a:Greeting) "
                        "SET a.message = $message "
                        "RETURN a.message + ', from node ' + id(a)", message=message)
        return result.single()[0]


if __name__ == "__main__":
    greeter = Driver("bolt://localhost:7687", "neo4j", "SpotifyDB")
    greeter.print_greeting("hello, world")
    greeter.close()

hello, world, from node 44364
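
The query cells in this notebook each open a driver and a session and close them by hand. A minimal sketch of a helper that wraps the session in a `with` block, so it is closed even if the query raises. `fetch_rows` is a hypothetical name, not part of this notebook or of the neo4j driver, and it assumes `driver` behaves like the object returned by `GraphDatabase.driver`.

```python
def fetch_rows(driver, cypher, **parameters):
    """Run one Cypher query and return its rows as plain lists.

    Hypothetical helper: `driver` is assumed to behave like the object
    returned by GraphDatabase.driver(...), i.e. driver.session() yields a
    context manager whose run() returns an iterable of records.
    """
    with driver.session() as session:
        result = session.run(cypher, **parameters)
        # Materialise the rows before the session is closed.
        return [record.values() for record in result]
```

The driver itself is not closed by this helper; the caller would still call `driver.close()` once all queries are done.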


## Queries

#### Example query
1. Show artists of the same record label

2. Show the most commonly played instrument in rock groups

#### Italian tracks and Italian artists from 2017 to 2020
1. On average, how many tracks by Italian artists are present in the Top 100 for each year (bar chart)

1. Who is the artist with the highest number of tracks in the Top 100 Italy for each year (artist names)

1. How many different Italian artists entered the Top 100 Italy at least once in each year (bar chart)

1. On average, how many tracks by Italian artists are present in the Top 100 across the different months (line chart)

1. Show the top 3 artists with the most tracks in the Top 100 at the same time (artist names)

1. Show the youngest artist who entered the first 10 positions of the Top 100 Italy for each year


#### Italian tracks abroad
1. How many tracks by Italian artists are present in the Top 100 of a different country (bar chart)

1. Show the top 5 countries that listen the most to Italian tracks (show names and numbers)

1. Who is the artist with the most tracks present in the Top 100 of a different country (show name)

### Example Queries

#### Query 1 

#### Query 2

### Italian Tracks and Italian Artists

#### On average, how many tracks by Italian artists are present in the Top 100 for each year (bar chart)

In [5]:
# On average, how many tracks by Italian artists are present in the Top 100 for each year (bar chart)

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country{id:"IT"})
    WITH ch,ch.date.year AS year, COUNT(DISTINCT t) as numTracks
    RETURN year,avg(numTracks)
    ORDER BY year
""")

for r in result:
    returnedData = r.values()
    print("Year: {}".format(returnedData[0]))
    print("avgNumItalianTracks: {}".format(returnedData[1]))
    print("")

session.close()
driver.close()

Year: 2017
avgNumItalianTracks: 31.11320754716981

Year: 2018
avgNumItalianTracks: 59.01923076923076

Year: 2019
avgNumItalianTracks: 66.80769230769234

Year: 2020
avgNumItalianTracks: 61.45454545454546
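
The query above works in two steps: it first counts the distinct Italian tracks in each chart, then averages those counts per year. The same two steps can be sketched in plain Python. The function name and the sample rows are made up for illustration and are not taken from the database.

```python
from collections import defaultdict

def average_tracks_per_chart_by_year(rows):
    """rows: iterable of (chart_id, year, track_id) tuples."""
    # Step 1: distinct tracks per chart (the WITH ... COUNT(DISTINCT t) step).
    tracks_per_chart = defaultdict(set)
    for chart_id, year, track_id in rows:
        tracks_per_chart[(chart_id, year)].add(track_id)

    # Step 2: average the per-chart counts within each year (the avg() step).
    counts_by_year = defaultdict(list)
    for (chart_id, year), tracks in tracks_per_chart.items():
        counts_by_year[year].append(len(tracks))
    return {year: sum(c) / len(c) for year, c in sorted(counts_by_year.items())}

# Made-up sample data, not taken from the database.
sample_rows = [
    ("chart-a", 2017, "t1"), ("chart-a", 2017, "t2"), ("chart-a", 2017, "t1"),
    ("chart-b", 2017, "t3"),
    ("chart-c", 2018, "t1"), ("chart-c", 2018, "t4"),
]
print(average_tracks_per_chart_by_year(sample_rows))
```

The repeated `t1` row in `chart-a` is counted once, which mirrors why the query uses `COUNT(DISTINCT t)`: a track by a group with several Italian members matches the pattern once per member.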



#### How many tracks by Italian artists were released from 2017 to 2020

In [6]:
# How many tracks by Italian artists were released from 2017 to 2020

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPartOf]->(alb:Album)
    WHERE alb.releaseDate.year >= 2017
    WITH alb.releaseDate.year AS year, COUNT(DISTINCT t) as numTracks
    RETURN year,numTracks
    ORDER BY year
""")

for r in result:
    returnedData = r.values()
    print("Year: {}".format(returnedData[0]))
    print("numItalianTracks: {}".format(returnedData[1]))
    print("")

session.close()
driver.close()


Year: 2017
numItalianTracks: 171

Year: 2018
numItalianTracks: 324

Year: 2019
numItalianTracks: 330

Year: 2020
numItalianTracks: 310



#### How many different Italian artists entered the Top 100 Italy at least once in each year (bar chart)

In [7]:
# How many different Italian artists entered the Top 100 Italy at least once in each year

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country{id:"IT"})
    WITH ch.date.year AS year, COUNT(DISTINCT a) as numArtists
    RETURN year,numArtists
    ORDER BY year
""")

for r in result:
    returnedData = r.values()
    print("Year: {}".format(returnedData[0]))
    print("numItalianArtist: {}".format(returnedData[1]))
    print("")

session.close()
driver.close()


Year: 2017
numItalianArtist: 75

Year: 2018
numItalianArtist: 99

Year: 2019
numItalianArtist: 112

Year: 2020
numItalianArtist: 125



#### Ratio between #numItalianTracks / #numItalianArtist 

In [8]:
years = [2017,2018,2019,2020]
numItalianTracks = [171,324,330,310]
numItalianArtists = [75,99,112,125]

for i in range(0,len(years)):
    print("Year: {}".format(years[i]))
    print("Ratio: {:.2f}".format(numItalianTracks[i]/numItalianArtists[i]))
    print("")

Year: 2017
Ratio: 2.28

Year: 2018
Ratio: 3.27

Year: 2019
Ratio: 2.95

Year: 2020
Ratio: 2.48
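
The same ratio can be computed without indexing into parallel lists. This is a small sketch using `zip`, with the values copied from the two query outputs above; the snake_case variable names are introduced here for illustration only.

```python
years = [2017, 2018, 2019, 2020]
num_italian_tracks = [171, 324, 330, 310]
num_italian_artists = [75, 99, 112, 125]

# Pair the three lists element by element instead of using an index.
ratios = {
    year: tracks / artists
    for year, tracks, artists in zip(years, num_italian_tracks, num_italian_artists)
}

for year, ratio in ratios.items():
    print("Year: {}  Ratio: {:.2f}".format(year, ratio))
```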



#### On average, how many tracks by Italian artists are present in the Top 100 across the different months (line chart)

In [9]:
# On average, how many tracks by Italian artists are present in the Top 100 for each month

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country{id:"IT"})
    WITH ch,ch.date.month AS month, COUNT(DISTINCT t) as numTracks
    RETURN month,avg(numTracks)
    ORDER BY month
""")

for r in result:
    returnedData = r.values()
    print("Month: {}".format(returnedData[0]))
    print("avgNumItalianTracks: {}".format(returnedData[1]))
    print("")

session.close()
driver.close()


Month: 1
avgNumItalianTracks: 46.58823529411764

Month: 2
avgNumItalianTracks: 52.31249999999999

Month: 3
avgNumItalianTracks: 51.611111111111114

Month: 4
avgNumItalianTracks: 47.555555555555564

Month: 5
avgNumItalianTracks: 52.35294117647059

Month: 6
avgNumItalianTracks: 54.64705882352941

Month: 7
avgNumItalianTracks: 56.833333333333336

Month: 8
avgNumItalianTracks: 54.294117647058826

Month: 9
avgNumItalianTracks: 57.22222222222222

Month: 10
avgNumItalianTracks: 59.58823529411764

Month: 11
avgNumItalianTracks: 61.230769230769226

Month: 12
avgNumItalianTracks: 58.53333333333333



#### Who is the artist with the highest number of tracks in the Top 100 Italy for each year (artist names)


In [10]:
# Who is the artist with the highest number of tracks in the Top 100 Italy for each year (artist names)

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country{id:"IT"})
    WITH a,ch.date.year AS year, COUNT(DISTINCT t) as numTracks
    ORDER BY numTracks DESC
    WITH year,COLLECT(a) AS artists, COLLECT(numTracks) as orderedNumTracks
    RETURN DISTINCT year,artists[0],orderedNumTracks[0]
    ORDER BY year
""")

for r in result:
    returnedData = r.values()
    print("Year: {}".format(returnedData[0]))
    print("Artist: {}".format(returnedData[1]["name"]))
    print("numTracks: {}".format(returnedData[2]))
    print("")
session.close()
driver.close()



Year: 2017
Artist: Guè
numTracks: 20

Year: 2018
Artist: Gemitaiz
numTracks: 28

Year: 2019
Artist: MadMan
numTracks: 30

Year: 2020
Artist: tha Supreme
numTracks: 35
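
The query above picks the top artist per year by sorting on the track count and taking the first collected element of each group. The sketch below reaches the same result in plain Python with a different technique, a running maximum per year. The function name and the sample rows are made up for illustration.

```python
def top_artist_per_year(rows):
    """rows: iterable of (year, artist_name, num_tracks) tuples."""
    best = {}
    for year, artist, num_tracks in rows:
        # Keep the row with the largest count seen so far for this year.
        if year not in best or num_tracks > best[year][1]:
            best[year] = (artist, num_tracks)
    return dict(sorted(best.items()))

# Made-up sample data, not taken from the database.
sample_rows = [
    (2017, "Artist A", 12), (2017, "Artist B", 20),
    (2018, "Artist A", 28), (2018, "Artist C", 5),
]
print(top_artist_per_year(sample_rows))
```

On ties this sketch keeps the first artist it sees, so the result depends on the input order, much as the query depends on the sort order.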



#### Show the top 3 artists with the most tracks in the Top 100 at the same time (artist names)

In [11]:
# Show the top 3 artists with the most tracks in the Top 100 at the same time (artist names)

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country{id:"IT"})
    WITH a,ch, COUNT(DISTINCT t) as numTracks
    ORDER BY numTracks DESC
    WITH a,COLLECT(ch) AS charts, COLLECT(numTracks) as orderedNumTracks
    RETURN DISTINCT a,charts[0],orderedNumTracks[0]
    LIMIT 3
""")

for r in result:
    returnedData = r.values()
    print("Artist: {}".format(returnedData[0]["name"]))
    print("Chart: {}".format(returnedData[1]["id"]))
    print("numTracks: {}".format(returnedData[2]))
    print("")
session.close()
driver.close()



Artist: tha Supreme
Chart: top-100-IT-2019-11-17
numTracks: 24

Artist: Marracash
Chart: top-100-IT-2019-11-17
numTracks: 19

Artist: Ultimo
Chart: top-100-IT-2019-04-07
numTracks: 18



#### Show the youngest artist who entered the first 10 positions of the Top 100 Italy for each year

In [12]:
# Show the youngest artist who entered the first 10 positions of the Top 100 Italy for each year

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[r:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country{id:"IT"})
    WHERE r.position <=10 AND p.birthDate IS NOT NULL
    WITH ch,a,p
    ORDER BY p.birthDate DESC
    WITH ch.date.year AS year, COLLECT(a.name) AS artistsNames, COLLECT(p.birthDate) AS artistsBirthDates
    RETURN year,artistsNames[0],artistsBirthDates[0]
""")

for r in result:
    returnedData = r.values()
    print("Year: {}".format(returnedData[0]))
    print("Artist: {}".format(returnedData[1]))
    print("BirthDate: {}".format(str(returnedData[2])))
    print("")
session.close()
driver.close()

Year: 2020
Artist: Rondodasosa
BirthDate: 2002-04-29

Year: 2018
Artist: Martina Attili
BirthDate: 2001-07-11

Year: 2019
Artist: tha Supreme
BirthDate: 2001-03-17

Year: 2017
Artist: Måneskin
BirthDate: 2001-01-18
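
The rows above are not in chronological order because the query has no `ORDER BY year`. One way to get them in order is to sort on the client side, as sketched below with the rows copied from the output above. Representing the birth dates as `datetime.date` is an assumption made for illustration; the driver returns its own date type.

```python
import datetime

# Rows copied from the output above: (year, artist, birth date).
rows = [
    (2020, "Rondodasosa", datetime.date(2002, 4, 29)),
    (2018, "Martina Attili", datetime.date(2001, 7, 11)),
    (2019, "tha Supreme", datetime.date(2001, 3, 17)),
    (2017, "Måneskin", datetime.date(2001, 1, 18)),
]

# Sort by the first field (the year) without touching the original list.
rows_by_year = sorted(rows, key=lambda row: row[0])

for year, artist, birth_date in rows_by_year:
    print("Year: {}  Artist: {}  BirthDate: {}".format(year, artist, birth_date))
```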



### Italian tracks abroad

#### How many tracks by Italian artists are present in the Top 100 of a different country (bar chart)

In [13]:
# How many tracks by Italian artists are present in the Top 100 of a different country (bar chart)
# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country)
    WHERE c2.id<>"IT"
    WITH ch.date.year AS year, COUNT(DISTINCT t) AS numItalianTracks
    RETURN year,numItalianTracks
    ORDER BY year
""")

for r in result:
    returnedData = r.values()
    print("Year: {}".format(returnedData[0]))
    print("numItalianTracks: {}".format(returnedData[1]))
    print("")
session.close()
driver.close()

Year: 2017
numItalianTracks: 24

Year: 2018
numItalianTracks: 39

Year: 2019
numItalianTracks: 34

Year: 2020
numItalianTracks: 37



#### Show the top 5 countries that listen the most to Italian tracks (show names and numbers)

In [15]:
# Show the top 5 countries that listen the most to Italian tracks (show names and numbers)
# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country)
    WHERE c2.id<>"IT"
    RETURN ch.name, COUNT(DISTINCT t) AS numItalianTracks
    ORDER BY numItalianTracks DESC
    LIMIT 5
""")

for r in result:
    returnedData = r.values()
    print("Chart: {}".format(returnedData[0]))
    print("numItalianTracks: {}".format(returnedData[1]))
    print("")
session.close()
driver.close()

Chart: TOP 100 Denmark
numItalianTracks: 51

Chart: TOP 100 Switzerland
numItalianTracks: 26

Chart: TOP 100 Norway
numItalianTracks: 17

Chart: TOP 100 Austria
numItalianTracks: 10

Chart: TOP 100 France
numItalianTracks: 10
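
The query returns chart names rather than country names. A small sketch of how the country could be derived from the chart name, assuming every chart name follows the `TOP 100 <Country>` pattern seen in the output above. `country_from_chart` is a hypothetical helper, and the rows are copied from that output.

```python
# Rows copied from the output above: (chart name, number of Italian tracks).
rows = [
    ("TOP 100 Denmark", 51),
    ("TOP 100 Switzerland", 26),
    ("TOP 100 Norway", 17),
    ("TOP 100 Austria", 10),
    ("TOP 100 France", 10),
]

PREFIX = "TOP 100 "

def country_from_chart(chart_name):
    # Drop the prefix only when it is present; otherwise return the name as is.
    if chart_name.startswith(PREFIX):
        return chart_name[len(PREFIX):]
    return chart_name

tracks_by_country = {country_from_chart(name): count for name, count in rows}
print(tracks_by_country)
```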



#### Who is the artist present in the highest number of different countries

In [18]:
# Who is the artist present in the highest number of different countries
# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country)
    WHERE c2.id<>"IT"
    RETURN a, COUNT(DISTINCT c2) AS numCountries, COLLECT(DISTINCT ch.name) AS charts
    ORDER BY numCountries DESC
    LIMIT 1
""")

for r in result:
    returnedData = r.values()
    print("Artist: {}".format(returnedData[0]["name"]))
    print("numCountries: {}".format(returnedData[1]))
    print("Countries: {}".format(returnedData[2]))
    print("")
session.close()
driver.close()

Artist: Gigi D'Agostino
numCountries: 30
Countries: ['TOP 100 Brazil', 'TOP 100 Poland', 'TOP 100 Switzerland', 'TOP 100 Belgium', 'TOP 100 Norway', 'TOP 100 Australia', 'TOP 100 Taiwan', 'TOP 100 Mexico', 'TOP 100 Germany', 'TOP 100 Ireland', 'TOP 100 Singapore', 'TOP 100 Sweden', 'TOP 100 Finland', 'TOP 100 USA', 'TOP 100 Portugal', 'TOP 100 France', 'TOP 100 Austria', 'TOP 100 New Zealand', 'TOP 100 UK', 'TOP 100 Malaysia', 'TOP 100 Netherlands', 'TOP 100 Spain', 'TOP 100 Turkey', 'TOP 100 Canada', 'TOP 100 Philippines', 'TOP 100 Denmark', 'TOP 100 Indonesia', 'TOP 100 Costa Rica', 'TOP 100 Ecuador', 'TOP 100 Colombia']



#### Who is the artist with the most tracks present in the Top 100 of a different country (show name)

In [19]:
# Who is the artist with the most tracks present in the Top 100 of a different country (show name)

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist)-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country)
WHERE c2.id<>"IT"
RETURN a, COUNT(DISTINCT t) AS numTracks
ORDER BY numTracks DESC
LIMIT 5
""")

for r in result:
    returnedData = r.values()
    print("Artist: {}".format(returnedData[0]["name"]))
    print("numItalianTracks: {}".format(returnedData[1]))
    print("")
session.close()
driver.close()

Artist: NODE
numItalianTracks: 46

Artist: Sfera Ebbasta
numItalianTracks: 11

Artist: Morgan Sulele
numItalianTracks: 8

Artist: Mahmood
numItalianTracks: 3

Artist: Muti
numItalianTracks: 3



In [None]:
# For the artist "NODE": how many tracks are present in each Top 100 of a different country

# connect to the DB
driver = GraphDatabase.driver(params.uri, auth=(params.user, params.dbpsw))
# create a session
session = driver.session()

result = session.run("""
    MATCH (c1:Country{id:"IT"})<-[:hasNationality]-(p:Person)-[:isMemberOf]->(a:Artist{name:"NODE"})-[:partecipateIn]->(t:Track)-[:isPositionedIn]->(ch:Chart)-[:isReferredTo]->(c2:Country)
    WHERE c2.id<>"IT"
    RETURN a.name,ch.name, COUNT(DISTINCT t) AS numItalianTracks
    ORDER BY numItalianTracks DESC
""")

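for r in result:
    returnedData = r.values()
    # the query returns a.name, ch.name and the count, so each value is printed directly
    print("Artist: {}".format(returnedData[0]))
    print("Chart: {}".format(returnedData[1]))
    print("numItalianTracks: {}".format(returnedData[2]))
    print("")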
session.close()
driver.close()


