# Exercise 0: Creating a Neo4j Sandbox and answering some basic graph questions

## Introduction

We will be using free Neo4j Sandbox database instances for this course.  We will get started with a very basic graph of movies as a means of quickly introducing the Cypher query language.  The following references will help to quickly learn Cypher:

- [Neo4j Cheat Sheet and Quick Reference](https://dev.neo4j.com/neo4j_cheatsheet)
- [Cypher Reference Card](https://dev.neo4j.com/cypher_ref_card)

## Create a Sandbox

You will need to create an instance by going to [this link](https://dev.neo4j.com/try), authenticating, and clicking "New Project."  From there, select the Movies graph and click "Launch Project."

<img src="images/select_project.png" width="600">

Once the instance starts, you will need the connection details for the instance as shown below:

<img src="images/connection_details.png" width="600">

In particular, we will need the Bolt URL and the password.

## Connecting to the instance

For this exercise, you can choose how to connect to the instance.  One option is to work with Neo4j directly in the browser by clicking the "Open" button on the Sandbox webpage.  Another is to work in Python.  We will use both throughout the course: the browser is typically good for exploratory data analysis (EDA), while Python is better suited to more in-depth problem solving.  The questions in this exercise can be solved using either.  The remainder of this notebook illustrates how to do the exercise entirely within Python.

In [None]:
from py2neo import Graph

## Connection details

You will need your specific Bolt URL and password to make the connection as shown below.

In [None]:
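uri = ""       # Bolt URL from the Sandbox connection details
username = ""  # typically "neo4j" for a Sandbox instance
pwd = ""       # password from the Sandbox connection details

# Open a connection to the Sandbox instance
conn = Graph(uri, auth=(username, pwd))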

## Basic concepts around notation

The main features of graphs are nodes and relationships like this:

<img src="images/basic_graph.png" width="600">

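Nodes are represented in Cypher with `( )` while relationships are represented with `[ ]`, joined to the nodes by dashes and an arrow that gives the direction.  For example, `(p:Person)-[:ACTED_IN]->(m:Movie)` matches a `Person` node connected to a `Movie` node by an `ACTED_IN` relationship.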

## What is the schema of this graph?

(This is run via `CALL db.schema.visualization()` in the browser.)

<img src="images/movie_schema.png" width="300">

## Count the number of nodes in the graph

In [None]:
query = """MATCH (n) RETURN COUNT(n)"""
result = conn.query(query)
result

## What information is present about each node?

In [None]:
query = """MATCH (p:Person) RETURN p LIMIT 5"""
result = conn.query(query).data()
result

In [None]:
query = """MATCH (p:Person) RETURN p LIMIT 5"""
result_df = conn.query(query).to_data_frame()
result_df.head()

## _Question:_ How would you modify the above to return just the name and birth year of the person?

In [None]:
# your code here...

## Identifying some relationships

In [None]:
query = """MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) 
           RETURN m.title
"""

# Alternatively:
#
#query = """MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
#           WHERE p.name = 'Tom Hanks'
#           RETURN m.title
#"""

result = conn.query(query).data()

for record in result:
    print(record['m.title'])
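
The query above hard-codes the actor's name.  As a minimal sketch, the same lookup can be written with a Cypher parameter (`$name`) so it can be reused for any person.  This assumes `conn` is the py2neo `Graph` created earlier and that `query()` accepts a parameter dictionary as its second argument; `movies_for` is a hypothetical helper name, not part of py2neo.

```python
def movies_for(conn, actor_name):
    """Return the titles of the movies that `actor_name` acted in.

    `conn` is assumed to be the py2neo Graph created earlier in this notebook.
    """
    query = """
        MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
        RETURN m.title AS title
        ORDER BY title
    """
    # The dictionary supplies the value for the $name placeholder in the query.
    records = conn.query(query, {"name": actor_name}).data()
    return [record["title"] for record in records]
```

Calling `movies_for(conn, 'Tom Hanks')` should return the same titles as the cell above, sorted alphabetically.  Passing the name as a parameter also avoids having to escape quotes inside the query string.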

## Match co-actors

Note that in Cypher `COLLECT()` is an aggregating function that gathers values into a list.  The opposite operation is the `UNWIND` clause, which expands a list back into individual rows.

In [None]:
query = """
    MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(p:Person)
    RETURN m.title AS title, COLLECT(p.name) AS coActors
"""

result_df = conn.query(query).to_data_frame()
result_df.head()
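
To make the relationship between `COLLECT()` and `UNWIND` concrete, here is a plain-Python analogy that needs no database.  The rows are made-up toy data, not taken from the Movies graph, and `collect` and `unwind` are hypothetical helper names that only mimic the behavior of their Cypher counterparts.

```python
from collections import defaultdict

# Toy rows standing in for query results (made-up data, not from the Movies graph).
rows = [
    {"title": "Movie A", "name": "Actor 1"},
    {"title": "Movie A", "name": "Actor 2"},
    {"title": "Movie B", "name": "Actor 3"},
]

def collect(rows, key, value):
    """Group rows by `key` and gather each group's values into a list (like COLLECT)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return [{key: k, value + "s": v} for k, v in grouped.items()]

def unwind(rows, key, list_key, value):
    """Expand each list back into one row per element (like UNWIND)."""
    return [{key: row[key], value: item} for row in rows for item in row[list_key]]

collected = collect(rows, "title", "name")
print(collected)

# Unwinding the collected lists recovers the original rows.
print(unwind(collected, "title", "names", "name") == rows)  # → True
```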

## _Question:_ How many distinct co-actors are there for Tom Hanks?

In [None]:
# your code here...

## Searching n-hops from a target node

In [None]:
query = """
    MATCH (p:Person {name: 'Tom Hanks'})-[*1..3]-(p2:Person) 
    RETURN DISTINCT p2.name AS name
    ORDER BY name
    LIMIT 10
"""

result = conn.query(query).data()

for record in result:
    print(record['name'])
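
The pattern `[*1..3]` in the query above follows between one and three relationships of any type and in either direction.  As a rough illustration of what "within n hops" means, the sketch below runs a breadth-first search on a small toy graph in plain Python.  The graph and the `within_hops` helper are hypothetical; unlike the Cypher query, the sketch ignores node labels and never returns the start node.

```python
from collections import deque

# Toy undirected graph as an adjacency list (hypothetical data, not the Movies graph).
toy_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def within_hops(graph, start, max_hops):
    """Return the nodes reachable from `start` in 1..max_hops hops, excluding `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(n for n in distance if n != start)

print(within_hops(toy_graph, "A", 3))  # → ['B', 'C', 'D']
```

Node `E` is four hops from `A`, so it is excluded when the limit is three.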

## _Question:_ How many actors are within 3 degrees (hops) of Tom Hanks?

In [None]:
# your code here...