# Visualizing FastRP Embeddings

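In this notebook we visualize graph embeddings created by the FastRP algorithm. The embeddings are read from a Neo4j database, projected to two dimensions with t-SNE, and plotted with Altair.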

In [1]:
from neo4j import GraphDatabase
from sklearn.manifold import TSNE
import numpy as np
import altair as alt
import pandas as pd

## Connect to the Neo4j database

The class below wraps the Neo4j driver. The database can be hosted locally or in the cloud; set the `uri` and `pwd` variables further down to match your instance.

In [2]:
class Neo4jConnection:
    
    def __init__(self, uri, user, pwd):
        self.__uri = uri
        self.__user = user
        self.__pwd = pwd
        self.__driver = None
        try:
            self.__driver = GraphDatabase.driver(self.__uri, auth=(self.__user, self.__pwd))
        except Exception as e:
            print("Failed to create the driver:", e)
        
    def close(self):
        if self.__driver is not None:
            self.__driver.close()
        
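    def query(self, query, parameters=None, db=None):
        """Run a Cypher query and return the records as a list (None if the query fails)."""
        assert self.__driver is not None, "Driver not initialized!"
        session = None
        response = None
        try:
            # Use the named database if one is given, otherwise the server default.
            session = self.__driver.session(database=db) if db is not None else self.__driver.session()
            # Materialize the result before the session is closed.
            response = list(session.run(query, parameters))
        except Exception as e:
            print("Query failed:", e)
        finally:
            if session is not None:
                session.close()
        return response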

In [13]:
uri = ''  # e.g. 'bolt://localhost:7687' for a local instance
pwd = ''  # password for the "neo4j" user

conn = Neo4jConnection(uri=uri, user="neo4j", pwd=pwd)
conn.query('MATCH (n) RETURN COUNT(n) AS count')

[<Record count=933>]
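As an alternative to typing `uri` and `pwd` into the cell above, the connection settings can be read from environment variables so that credentials stay out of the notebook. The following is a minimal sketch: the helper `load_neo4j_settings` and the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` are assumptions chosen for illustration, not something the original notebook defines.

```python
import os


def load_neo4j_settings(env=os.environ):
    """Collect connection settings from a mapping of environment variables.

    The variable names are hypothetical; rename them to match your setup.
    """
    return {
        # Fall back to the default Bolt address of a local Neo4j instance.
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "pwd": env.get("NEO4J_PASSWORD", ""),
    }


settings = load_neo4j_settings()
# The keys match the Neo4jConnection constructor defined earlier:
# conn = Neo4jConnection(**settings)
```

Passing the mapping as an argument keeps the helper easy to check without touching the real environment.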

## Import the data from the database

We restrict the query to a handful of countries so that the plot stays readable.

In [28]:
query = '''MATCH (p:Place)-[:IN_COUNTRY]->(country)
           WHERE country.code IN ["E", "GB", "F", "TR", "I", "D", "GR"]
           RETURN p.name AS place, p.embedding AS embedding, country.code AS country
'''

df = pd.DataFrame([dict(record) for record in conn.query(query)])
df.head()

|   | place | embedding | country |
|---|-------|-----------|---------|
| 0 | Manchester | [-0.015820825472474098, 0.03900183364748955, -...] | GB |
| 1 | Dover | [0.028780877590179443, -0.08504947274923325, 0...] | GB |
| 2 | Craigavon | [0.05607718229293823, -0.0077710277400910854, ...] | GB |
| 3 | Kingston upon Hull | [0.09304574877023697, 0.03708397597074509, -0....] | GB |
| 4 | Fishguard | [0.07234527170658112, -0.13793089985847473, 0....] | GB |
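The country list in the query above is written directly into the Cypher string. Because `Neo4jConnection.query` already forwards a `parameters` argument to `session.run`, the list could instead be passed as a query parameter, which makes it easier to change the selection. The sketch below assumes that approach; the names `COUNTRY_CODES`, `PLACE_QUERY`, and `build_place_query` are introduced here for illustration only.

```python
# Country codes taken from the query in the notebook.
COUNTRY_CODES = ["E", "GB", "F", "TR", "I", "D", "GR"]

# Same query as above, with the list replaced by the $codes parameter.
PLACE_QUERY = '''
MATCH (p:Place)-[:IN_COUNTRY]->(country)
WHERE country.code IN $codes
RETURN p.name AS place, p.embedding AS embedding, country.code AS country
'''


def build_place_query(codes):
    """Return the Cypher text and the parameter map for the given country codes."""
    return PLACE_QUERY, {"codes": list(codes)}


query, parameters = build_place_query(COUNTRY_CODES)
# With a live connection:
# df = pd.DataFrame([dict(record) for record in conn.query(query, parameters=parameters)])
```

Changing the selection then only requires editing `COUNTRY_CODES`, not the query text.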


## Visualization through t-Distributed Stochastic Neighbor Embedding (t-SNE)

In [29]:
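# Stack the per-place embedding lists into a 2-D array (one row per place).
X = np.array(df.embedding.tolist())
# Project to two dimensions; fixing random_state keeps the layout repeatable across runs.
X_embedded = TSNE(n_components=2, random_state=6).fit_transform(X)

tsne_df = pd.DataFrame(data={
    "place": df.place,
    "country": df.country,
    "x": X_embedded[:, 0],
    "y": X_embedded[:, 1]
})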
tsne_df.head()

|   | place | country | x | y |
|---|-------|---------|---|---|
| 0 | Manchester | GB | 34.92033 | -9.860373 |
| 1 | Dover | GB | 21.83075 | -12.030138 |
| 2 | Craigavon | GB | -14.447485 | -21.973227 |
| 3 | Kingston upon Hull | GB | 34.278549 | -12.137051 |
| 4 | Fishguard | GB | 23.143465 | -6.587614 |


In [30]:
alt.Chart(tsne_df).mark_circle(size=60).encode(
    x='x',
    y='y',
    color='country',
    tooltip=['place', 'country']
).properties(width=700, height=400)

## A comment on the above

How clearly the clusters separate depends on the number of embedding dimensions FastRP was run with. Try regenerating the embeddings with different dimensions and re-running the notebook to see how the plot changes.
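One way to try a different dimension is sketched here, under two assumptions: the Neo4j Graph Data Science library is installed on the server, and a graph has already been projected into its catalog under the hypothetical name `places`. The helper `fastrp_write_query` is illustrative and not part of the original notebook; it only builds the Cypher call and its parameters, which would then be passed to `conn.query`.

```python
def fastrp_write_query(graph_name, dimension, write_property="embedding"):
    """Build a gds.fastRP.write call that stores embeddings on the nodes.

    graph_name is assumed to name a graph already projected in the GDS catalog.
    """
    query = (
        "CALL gds.fastRP.write($graph_name, {"
        "embeddingDimension: $dimension, "
        "writeProperty: $write_property}) "
        "YIELD nodePropertiesWritten "
        "RETURN nodePropertiesWritten"
    )
    parameters = {
        "graph_name": graph_name,
        "dimension": int(dimension),
        "write_property": write_property,
    }
    return query, parameters


# "places" is a hypothetical projected-graph name; adjust it to your setup.
for dimension in (16, 64, 256):
    query, parameters = fastrp_write_query("places", dimension)
    print(dimension, parameters)
    # With a live connection: conn.query(query, parameters=parameters),
    # then re-run the import, t-SNE, and plotting cells.
```

Each call overwrites the `embedding` property, so the rest of the notebook can be re-run unchanged after every write.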