# GDMA Project
Author: Julian Schelb (1069967)

In [248]:
from neo4j import GraphDatabase
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### Connection to the database instance

In [249]:
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "subatomic-shrank-Respond"))
database_name = "cddb"
session = driver.session(database = database_name)

### Task 4: Searching and Ranking

Implement a simple search engine that enables search by artist, album and
song name/title. The results must be ranked based on importance. It is up to
you to come up with how the importance of each result is computed and you
must justify your decision (it goes without saying that you need to come up
with a meaningful definition). However, the importance should ideally take into
account user preferences/likes. As such, this task is split in two parts:

**1. Write a Cypher query that adds a relationship :LIKES between a node with**

label :User and an artist, album, or song. Every user should be identified
just by a numerical userID (no more information is necessary). If a user
already exists in the system, no additional node should be added. After
coming up with the necessary Cypher query, add a significant number of
users and likes.

**2. Implement a simple Python function that has the following arguments:**

- the userID of the user submitting the search (the user ID may not
exist in the database),
- a string that contains one or more keywords for the search, and
- an optional argument that indicates whether the search is on all or
a specific field, i.e., artist, album, song.
The search must return exactly 10 results.

Python must only be used to call the database. You should not write any
code in Python that implements functionality necessary for the task. However,
submitting multiple queries in the same function call is allowed. Also, for this
task of the project, you are not only allowed but also encouraged to use functions
from the GDS library of Neo4j. Hence, before making any decisions, have a
careful look at the available functions. Again, you have to justify the use of any
function that you employ

#### Adding Example Users

**Function to create Example User:**

In [250]:
def createExampleUser(user_id = 1, genre = "rock", limit = 50):
    
    ####### DELETE USER NODE #######
    query1 = """
    MATCH (u:User)
    WHERE u.id = $user_id
    DETACH DELETE u
    """
    
    ####### CREATE USER NODE #######
    query2 = """
    MERGE (u:User {id:  $user_id})
    RETURN u.id as user_id
    """
    
    ####### LINK TO LIKED SONGS #######
    query3 = """
    MATCH (g:Genre)<-[r:BELONGS_TO]-(c:CD)
    MATCH (c)-[r2:CONTAINS]->(s:Song)
    WHERE g.genre = $genre
    WITH s 
    LIMIT $limit
    MATCH (u:User)
    WHERE u.id = $user_id
    MERGE (u)-[r3:LIKES]->(s)
    """
    
    ####### LINK TO LIKED ARTISTS #######
    query4 = """
    MATCH (g:Genre)<-[r:BELONGS_TO]-(c:CD)
    MATCH (c)-[r2:CONTAINS]->(t:Artist)
    WHERE g.genre = $genre
    WITH t 
    LIMIT $limit
    MATCH (u:User)
    WHERE u.id = $user_id
    MERGE (u)-[r3:LIKES]->(t)
    """
    
    ####### LINK TO LIKED ALBUMS #######
    query5 = """
    MATCH (g:Genre)<-[r:BELONGS_TO]-(c:CD)
    MATCH (c)-[r2:CONTAINS]->(t:Album)
    WHERE g.genre = $genre
    WITH t 
    LIMIT $limit
    MATCH (u:User)
    WHERE u.id = $user_id
    MERGE (u)-[r3:LIKES]->(t)
    """

    with driver.session(database = database_name) as session:
        session.run(query1, user_id = user_id)
        session.run(query2, user_id = user_id)
        session.run(query3, user_id = user_id, genre = genre, limit = limit)
        session.run(query4, user_id = user_id, genre = genre, limit = limit)
        session.run(query5, user_id = user_id, genre = genre, limit = limit)
    

**Adding some Likes to model User Preference:**

In [282]:
# User 1 likes "Rock" music
createExampleUser(user_id = 1, genre = "rock", limit = 50)
# User 2 likes "classic" music
createExampleUser(user_id = 2, genre = "classic", limit = 50)
# User 3 likes "pop" music
# createExampleUser(user_id = 3, genre = "pop", limit = 50)
# User 4 likes "hip-hop" music
# createExampleUser(user_id = 4, genre = "hip-hop", limit = 50)
# User 5 likes "hip-hop" music
# createExampleUser(user_id = 5, genre = "hard rock", limit = 50)

***

#### Implementing Search Engine

In [255]:
def createUser(user_id: int = 1):
    
    query_create_user = """
    MERGE (u:User {id:  $user_id})
    RETURN u.id as user_id
    """
    
    session.run(query_create_user, user_id = user_id)

In [256]:
createUser(100)

##### Feature 1: User Preference

Delete Projection:

In [257]:
def deletePrefProj():
    
    query_pref_delete_proj = """
    // DELETE EXISTING PROJECTION
    CALL gds.graph.drop('searchdomain_preference', false) 
    YIELD graphName 
    RETURN graphName
    """
    
    session.run(query_pref_delete_proj)

In [258]:
deletePrefProj()

Create Projection:

In [259]:
def createPrefProj(user_id: int = 1):

    query_pref_create_proj = f"""
    // CREATE NEW PROJECTION WITH SEARCH RELEVANT SUB GRAPH
    CALL gds.graph.project.cypher(
      'searchdomain_preference',
      ' // Liked Artists, Albums and Songs
        MATCH (u:User)-[:LIKES]->(n) 
        WHERE u.id = {user_id}
            AND (n:Song OR n:Album OR n:Artist) 
        RETURN id(n) AS id, labels(n) AS labels 
        LIMIT 100000

        UNION

        // CDs linked to liked Artists, Albums and Songs
        MATCH (u:User)-[:LIKES]->(x)-[:APPEARED_ON]->(n:CD) 
        WHERE u.id = {user_id}
        RETURN id(n) AS id, labels(n) AS labels 
        LIMIT 100000',

        'MATCH (u:User)-[:LIKES]->(n)
        WHERE u.id = {user_id}
        AND (n:CD OR n:Song OR n:Album OR n:Artist) 
        MATCH (n)-[r:APPEARED_ON]->(m:CD) 
        RETURN id(n) AS source, id(m) AS target, type(r) AS type 
        LIMIT 100000' 
    )
    YIELD
      graphName, nodeCount AS nodes, relationshipCount AS rels
    RETURN graphName, nodes, rels
    """

    session.run(query_pref_create_proj)

In [260]:
createPrefProj()

Calculate Centrality:

In [261]:
def calcPrefScore():
    
    # https://neo4j.com/docs/graph-data-science/current/algorithms/eigenvector-centrality/
    query_pref_calc_score = """
    CALL gds.eigenvector.mutate('searchdomain_preference',  {
      mutateProperty: 'score_eig'
    })
    YIELD centralityDistribution, nodePropertiesWritten, ranIterations
    RETURN centralityDistribution.min AS minimumScore, centralityDistribution.mean AS meanScore, nodePropertiesWritten
    """
    
    session.run(query_pref_calc_score)

In [262]:
calcPrefScore()

***

##### Feature 2: Content Match with Search Input

Delete Projection:

In [263]:
def deleteContProj():
    
    query_cont_delete_proj = """
    // DELETE EXISTING PROJECTION
    CALL gds.graph.drop('searchdomain_content', false) 
    YIELD graphName 
    RETURN graphName
    """
    
    session.run(query_cont_delete_proj)

In [264]:
deleteContProj()

Create Projection:

In [265]:
def createContProj(search_input: str = "", search_mask: str = ""):
    
    query_cont_create_proj = f"""
    // CREATE NEW PROJECTION WITH SEARCH RELEVANT SUB GRAPH
    CALL gds.graph.project.cypher(
      "searchdomain_content",

      " // Artists, Albums and Songs which match query

            CALL {{
                CALL db.index.fulltext.queryNodes('artists', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'artists'
                RETURN node, score
                UNION 
                CALL db.index.fulltext.queryNodes('songs', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'songs'
                RETURN node, score
                UNION 
                CALL db.index.fulltext.queryNodes('albums', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'albums'
                RETURN node, score
            }}

            WITH node, score
            ORDER BY score desc
            LIMIT 100
            MATCH (node)-[r:APPEARED_ON]->(c:CD)
            RETURN id(node) AS id, labels(node) AS labels 

        UNION

        // CDS linked to Artists, Albums and Songs which match query

            CALL {{
                CALL db.index.fulltext.queryNodes('artists', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'artists'
                RETURN node, score
                UNION 
                CALL db.index.fulltext.queryNodes('songs', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'songs'
                RETURN node, score
                UNION 
                CALL db.index.fulltext.queryNodes('albums', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'albums'
                RETURN node, score
            }}

            WITH node, score
            ORDER BY score desc
            LIMIT 100
            MATCH (node)-[r:APPEARED_ON]->(c:CD)
            RETURN id(c) AS id, labels(c) AS labels 
            ",

        "  CALL {{
                CALL db.index.fulltext.queryNodes('artists', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'artists'
                RETURN node, score
                UNION 
                CALL db.index.fulltext.queryNodes('songs', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'songs'
                RETURN node, score
                UNION 
                CALL db.index.fulltext.queryNodes('albums', '{search_input}') 
                YIELD node, score
                WHERE '{search_mask}' = 'all' or '{search_mask}' = 'albums'
                RETURN node, score
            }}

            WITH node, score
            ORDER BY score desc
            LIMIT 100
            MATCH (node)-[r:APPEARED_ON]->(c:CD)
            RETURN id(node) AS source, id(c) AS target, type(r) AS type 
        "
    )
    YIELD
      graphName, nodeCount AS nodes, relationshipCount AS rels
    RETURN graphName, nodes, rels
    """

    session.run(query_cont_create_proj)

In [266]:
createContProj(search_input, search_mask)

Calulcate Centrality:

In [267]:
def calcContScore():

    # https://neo4j.com/docs/graph-data-science/current/algorithms/eigenvector-centrality/
    query_cont_calc_score = """
    CALL gds.eigenvector.mutate('searchdomain_content',  {
      mutateProperty: 'score_eig'
    })
    YIELD centralityDistribution, nodePropertiesWritten, ranIterations
    RETURN centralityDistribution.min AS minimumScore, centralityDistribution.mean AS meanScore, nodePropertiesWritten
    """
    
    session.run(query_cont_calc_score)

In [268]:
calcContScore()

***

Combined Search Results:

In [273]:
def getResults():

    query_results = """
    CALL {

        CALL {
            CALL gds.eigenvector.stream('searchdomain_content')
            YIELD nodeId, score
            WHERE gds.util.asNode(nodeId):CD
            WITH gds.util.asNode(nodeId).id AS nodeId, score as score_cont
            RETURN nodeId, score_cont
        }
        RETURN nodeId, score_cont, 0 as score_pref

        UNION 

        CALL {
            CALL gds.eigenvector.stream('searchdomain_preference')
            YIELD nodeId, score
            WHERE gds.util.asNode(nodeId):CD
            WITH gds.util.asNode(nodeId).id AS nodeId, score as score_pref
            RETURN nodeId, score_pref
        }
        RETURN nodeId, 0 as score_cont, score_pref

    }

    WITH DISTINCT nodeId, count(*) as count, sum(score_cont) as score_cont, sum(score_pref) as score_pref

    MATCH (n:CD)
    WHERE n.id = nodeId
    OPTIONAL MATCH (n)-[:CONTAINS]->(ar:Artist)
    OPTIONAL MATCH (n)-[:CONTAINS]->(ab:Album)
    OPTIONAL MATCH (n)-[:CONTAINS]->(so:Song)

    RETURN nodeId, 
    count,
    score_cont, score_pref,
    collect(DISTINCT ar.artist) as artists,
    collect(DISTINCT ab.album) as albums, 
    collect(DISTINCT so.song) as songs


    ORDER BY score_cont DESC, score_pref DESC
    LIMIT 20
    """

    dtf_data = pd.DataFrame([dict(_) for _ in session.run(query_results)])
    return dtf_data


In [275]:
results = getResults()
#results.head(30)

***

### Search:

In [279]:
def searchInGraph(user_id: int = 1, search_input: str = "", search_mask: str = "all"):
    
    # Make sure user exists
    createUser(user_id)
    
    # Preference
    deletePrefProj()
    createPrefProj()
    calcPrefScore()
    
    # Match with query 
    deleteContProj()
    createContProj(search_input, search_mask)
    calcContScore()
    
    return getResults()     

In [284]:
user_id = 1
search_input = "Jimi Hendrix purple haze are you experienced"
search_mask = "all"

results = searchInGraph(user_id = 1, search_input = search_input, search_mask= search_mask)
results.head(30)

Unnamed: 0,nodeId,count,score_cont,score_pref,artists,albums,songs
0,7923,1,0.265149,0.0,[signature licks],[jimi hendrix],"[fire (verse);, foxey lady (solo, slow);, fire..."
1,162186,2,0.133564,0.034758,[jimi hendrix],[experience hendrix: the best of jimi hendrix],"[if six was nine, foxey lady, bold as love, ni..."
2,136907,2,0.133564,0.034758,[jimi hendrix],[are you experienced],"[love or confusion, remember, highway chile, m..."
3,677,2,0.133564,0.034758,[jimi hendrix],[are you experienced],"[third stone from the sun, remember, fire, can..."
4,46232,2,0.133564,0.034758,[jimi hendrix],[experience hendrix the best of jimi hendrix],"[foxey lady, dolly dagger, bold as love, if 6 ..."
5,2023,2,0.133564,0.034758,[jimi hendrix],[experience hendrix - the best of jimi hendrix],"[bold as love, night bird flying, if 6 was 9, ..."
6,35734,2,0.133564,0.034758,[jimi hendrix],[are you experienced],"[i don't live today, 51st anniversary, stone f..."
7,33321,2,0.133564,0.017759,[jimi hendrix],[astro man(alchemy); - studio outtakes 1966-68],"[purple haze 1 (4);, 51st anniversary (5);, la..."
8,30138,2,0.133564,0.017759,[jimi hendrix],[astro man box set],"[can you see me (pre unre);, purple haze (pre ..."
9,20222,2,0.133564,0.017759,[jimi hendrix],[best of jimi hendrix],"[blues, blues, free spirit, star spangled bann..."


In [283]:
user_id = 2
search_input = "Ludwig van Beethoven Für Elise Symphony"
search_mask = "all"

results = searchInGraph(user_id = 1, search_input = search_input, search_mask= search_mask)
results.head(30)

Unnamed: 0,nodeId,count,score_cont,score_pref,artists,albums,songs
0,140152,1,0.14467,0.0,[pollini maurizio],[kaiser 5.2],[erläuterungen von joachim kaiser mit musikbei...
1,141988,1,0.108912,0.0,[philip jones bläserensemble],[trumpet voluntary],"[sonata pian' e forte (giovanni gabrieli);, ea..."
2,124071,1,0.073155,0.0,[ludwig van beethoven],[beethoven for meditation],"[piano and wind quintet andante cantabile, sep..."
3,117454,1,0.073155,0.0,[ludwig van beethoven],"[greatest hits, beethoven]","[i. adagio sostenuto from moonlight sonata, ii..."
4,126561,1,0.073155,0.0,[ludwig van beethoven],[ludwig van beethoven],"[lettre a elise, bagatelle en la mineur, opus ..."
5,130422,1,0.073155,0.0,[ludwig van beethoven],[bebekler ve çocuklar için beethoven],"[piano sonata. adagio cantablie. op.13 no. 8, ..."
6,117642,1,0.073155,0.0,[ludwig van beethoven],[masters of classical music],[symphony no. 8 in f major allegretto scherzan...
7,126848,1,0.073155,0.0,[ludwig van beethoven],[beethoven for meditation],"[cello sonata no. 3 adagio cantabile, violin s..."
8,113520,1,0.073155,0.0,[ludwig van beethoven],[the beethoven treasury (reader's digest);],"[minuet in g, für elise]"
9,114751,1,0.073155,0.0,[ludwig van beethoven],[beethoven greatest hits],"[choral fantasy conclusion, moonlight sonata a..."
