---
# Neo4j with Python

---

<img src="images/neo4j-python.png">

**Neo4j** is one of the most popular graph databases. Its query language is Cypher, and CQL stands for _Cypher Query Language_. Neo4j is written in Java.

A _graph database_ is a database that stores data as graph structures: it stores an application's data as nodes, relationships and properties. An RDBMS (Relational DataBase Management System) stores data as the rows and columns of tables; in the same way, a GDBMS stores data as graphs.

A graph is a set of nodes and the relationships that connect those nodes. A graph database stores data in _nodes_ and _relationships_, and both can carry _properties_. Properties are key-value pairs that hold the actual data. In graph diagrams, a node is usually drawn as a circle and a relationship between two nodes as an arrow.

## Fundamental building blocks of Neo4j

Neo4j is a graph database, adopting a labeled property graph model. In Neo4j terminology, vertices are called nodes, and edges are called relationships.

**Nodes:**

- Nodes are typically used to represent entities (or complex value types).
- Nodes can have properties, which are key/value pairs. Values can be primitives or collections of primitives.
- Nodes can have zero or more relationships connecting them to other nodes.

**Relationships:**

- Relationships connect nodes and provide context to them.
- Every relationship has a start node and an end node, so every relationship has a direction. Direction can be ignored at query time, though, so the presence of a direction does not mean it must be used.
- Every relationship must have exactly one relationship type.
- Relationships can have properties (key/value pairs; values can be primitives or collections of primitives).

**Properties:**

- Nodes and relationships can have properties (key/value pairs; values can be primitives or collections of primitives).
- Properties can quantify relationships.

**Labels:**

- Nodes can have zero or more labels.
- Labels can represent roles, categories or types.
- Labels are used to define indexes and constraints.
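
The following minimal plain-Python model of a labeled property graph makes these building blocks concrete. It is only an illustration under our own assumptions. The classes `GraphNode` and `GraphRelationship` and the function `degree` are hypothetical and are not part of Neo4j or `py2neo`.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: two nodes are never "the same" by value
class GraphNode:
    labels: set                                     # zero or more labels
    properties: dict = field(default_factory=dict)  # key/value pairs


@dataclass(eq=False)
class GraphRelationship:
    start: GraphNode  # every relationship has a start node...
    type: str         # ...exactly one type...
    end: GraphNode    # ...and an end node, so it always has a direction
    properties: dict = field(default_factory=dict)


def degree(node, relationships):
    """Count the relationships attached to `node`, ignoring their direction."""
    return sum(1 for r in relationships if node is r.start or node is r.end)


tom = GraphNode({"Person"}, {"name": "Tom Hanks", "born": 1956})
movie = GraphNode({"Movie"}, {"title": "Forrest Gump", "released": 1994})
acted = GraphRelationship(tom, "ACTED_IN", movie, {"role": "Forrest Gump"})

print(degree(tom, [acted]))  # 1
```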

Here is a visual representation of how an RDBMS can be transformed into a graph database:

<img src="images/RDBMS_vs_GRAPHDB.png">
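
The transformation in the picture can be sketched in a few lines of Python. The sketch assumes hypothetical `persons`, `movies` and `acted_in` tables; these table and column names are ours and are not taken from the picture. Rows of the entity tables become nodes, and the table name becomes the label. Each row of the join table becomes a directed `ACTED_IN` relationship, and its extra columns become relationship properties.

```python
# Hypothetical relational rows: two entity tables and one join table.
persons = [{"id": 1, "name": "Tom Hanks"}, {"id": 2, "name": "Gary Sinise"}]
movies = [{"id": 10, "title": "Forrest Gump"}]
acted_in = [  # join table: person_id, movie_id, role
    {"person_id": 1, "movie_id": 10, "role": "Forrest Gump"},
    {"person_id": 2, "movie_id": 10, "role": "Lieutenant Dan Taylor"},
]


def rows_to_graph(persons, movies, acted_in):
    """Turn entity rows into nodes and join-table rows into relationships."""
    # Each table becomes a label; each row becomes a node keyed by (label, id).
    nodes = {("Person", p["id"]): {"label": "Person", "name": p["name"]} for p in persons}
    nodes.update({("Movie", m["id"]): {"label": "Movie", "title": m["title"]} for m in movies})
    # Foreign keys disappear: each join row becomes a typed, directed relationship,
    # and its remaining columns become relationship properties.
    relationships = [
        (("Person", row["person_id"]), "ACTED_IN", ("Movie", row["movie_id"]), {"role": row["role"]})
        for row in acted_in
    ]
    return nodes, relationships


nodes, rels = rows_to_graph(persons, movies, acted_in)
print(len(nodes), len(rels))  # 3 2
```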

## Installing Neo4j

Head to http://neo4j.com/download/ and click the download link. You'll also need [Java 7](http://java.com/en/download/windows_xpi.jsp?locale=en) installed. On Mac or Linux, untar the download into the folder of your choice, and then run `bin/neo4j start` from that folder. Windows users receive an installer package and can run the service from the dashboard that starts up after installation. Once the server is running, visit http://localhost:7474/ to test it.

# Interaction of Python and Neo4j through `py2neo` Library

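[`py2neo`](http://py2neo.org/) is a client library and comprehensive toolkit for working with Neo4j from within Python applications and from the command line. The core library has no external dependencies and has been carefully designed to be easy and intuitive to use. The examples in this notebook use the py2neo 2.x API (for example `authenticate()` and `graph.cypher.execute()`); later major versions of py2neo changed several of these calls. The simplest way to install `py2neo` is with `pip`: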
    
    pip install py2neo
    
The simplest way to try out a connection to the Neo4j server is via the console: type `neo4j` in the console and then confirm the connection to the Neo4j server.

Neo4j supports optional authentication for database servers, and it is enabled by default. To use a server with authentication enabled, you must specify a user name and password for the host:port combination. You can pass them in code with the `authenticate()` function or set them in the `NEO4J_AUTH` environment variable. By default, both the user name and the password are `neo4j`. This default password generally has to be changed before the database can be used.

There are two ways to set up authentication for a new server installation:

* Set an initial password for the neo4j user.
* Copy auth details from another (initialised) server.

`py2neo` provides a command line tool to help with changing user passwords as well as checking whether a password change is required. For a new installation, use:

    $ neoauth neo4j neo4j my-p4ssword
    Password change succeeded

After a password has been set, the tool can also be used to validate credentials:

    $ neoauth neo4j my-p4ssword
    Password change not required

Once you have started a local Neo4j server, open a new Python console and enter the following:

```python
from py2neo import authenticate, Graph

# set up authentication parameters
authenticate("localhost:7474", "YOUR_USERNAME", "YOUR_PASSWORD")

# connect to the authenticated graph database
try:
    graph = Graph()
    print("Connected successfully!")
except Exception as error:
    print("Something went wrong:", error)
```

```
Connected successfully!
```


```python
graph.neo4j_version
```

```
(2, 3, 1)
```

This imports the `Graph` class from `py2neo` and creates an instance bound to the default Neo4j server URI, http://localhost:7474/db/data/. To connect to a server at a different address, pass the URI as a string argument to the `Graph` constructor.

You can now open the Neo4j Browser for the local server at http://localhost:7474/browser/ or with the following command:

```python
graph.open_browser()
```

As we said above, nodes and relationships are the fundamental building blocks of a Neo4j graph and both have a corresponding class in `py2neo`.

Let's create a new database that collects data about movies, actors, directors, and so on, along with various additional information. The scheme below represents a graph for ["Forrest Gump"](https://en.wikipedia.org/wiki/Forrest_Gump).

<img src="images/scheme.jpg" width="70%">

Of course, all the fields and data in this graph could also be stored in a relational database. A possible MySQL schema is shown below:

<img src="images/MySQL_scheme.jpg" width="60%">
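
For comparison, here is a rough sketch of the relational version. It uses Python's built-in `sqlite3` module instead of MySQL. The table and column names (`person`, `movie`, `acted_in`, `role`) are our assumptions, since the exact schema is only shown in the picture. Answering "who acted in Forrest Gump?" requires joining through the link table. In Cypher, the same question is a single pattern: `MATCH (p:Person)-[a:ACTED_IN]->(m:Movie {title: 'Forrest Gump'}) RETURN p.name, a.role`.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, born INTEGER, country TEXT);
    CREATE TABLE movie  (id INTEGER PRIMARY KEY, title TEXT, released INTEGER);
    CREATE TABLE acted_in (person_id INTEGER REFERENCES person(id),
                           movie_id  INTEGER REFERENCES movie(id),
                           role TEXT);
    INSERT INTO person VALUES (1, 'Tom Hanks', 1956, 'USA'), (2, 'Gary Sinise', 1955, 'USA');
    INSERT INTO movie VALUES (1, 'Forrest Gump', 1994);
    INSERT INTO acted_in VALUES (1, 1, 'Forrest Gump'), (2, 1, 'Lieutenant Dan Taylor');
""")

# "Who acted in Forrest Gump?" needs two joins through the link table.
rows = conn.execute("""
    SELECT p.name, a.role
    FROM person p
    JOIN acted_in a ON a.person_id = p.id
    JOIN movie m    ON m.id = a.movie_id
    WHERE m.title = 'Forrest Gump'
    ORDER BY p.name
""").fetchall()
conn.close()

print(rows)  # [('Gary Sinise', 'Lieutenant Dan Taylor'), ('Tom Hanks', 'Forrest Gump')]
```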

```python
from py2neo import Node, Relationship

tom_hanks = Node("Person", name="Tom Hanks", born=1956, country="USA")
gary_sinise = Node("Person", name="Gary Sinise", born=1955, country="USA")
robert_zemeckis = Node("Person", name="Robert Zemeckis", born=1952, country="USA")
forrest_gump = Node("Movie", title="Forrest Gump", released=1994, duration_min=142,
                    country="USA", lang="English")

tom_hanks_acted_in_forrest_gump = Relationship(tom_hanks, "ACTED_IN", forrest_gump, role="Forrest Gump")
gary_sinise_acted_in_forrest_gump = Relationship(gary_sinise, "ACTED_IN", forrest_gump, role="Lieutenant Dan Taylor")
robert_zemeckis_directed_forrest_gump = Relationship(robert_zemeckis, "DIRECTED", forrest_gump)

graph.create(tom_hanks_acted_in_forrest_gump)
graph.create(gary_sinise_acted_in_forrest_gump)
graph.create(robert_zemeckis_directed_forrest_gump)
```

```
(<Relationship graph=u'http://localhost:7474/db/data/' ref=u'relationship/93' start=u'node/95' end=u'node/93' type=u'DIRECTED' properties={}>,)
```

When first created, `Node` and `Relationship` objects exist only in the client; nothing has yet been written to the server. The `Graph.create` method shown above creates corresponding server-side objects and automatically binds each local object to its remote counterpart.

```python
# Once a Node or Relationship has been created in the graph, you can add a new property like this:
forrest_gump.properties["box_office_Mdol"] = 677.9
forrest_gump.push()  # send the local change to the server
```

After running the code above, you will see the following graph in the "Database information" window of the Neo4j Browser:

<img src="images/graph.jpg">

The properties of a `Node` or `Relationship` become visible when you hover over or click on it.

Basic information about the graph:

```python
print("Relationships amount:")
print(graph.size)

print("\nRelationships types:")
print(graph.relationship_types)

print("\nNodes amount:")
print(graph.order)

print("\nExisting Labels:")
print(graph.node_labels)

print("\nInfo about Node with id = 11:")
# The same Node or Relationship may have a different id in your database
print(graph.node(11))

print("\nThe number of relationships attached to the node:")
print("tom_hanks:", tom_hanks.degree)
print("forrest_gump:", forrest_gump.degree)

# Labels and properties of a Node (and likewise of a Relationship) can also be obtained like this:
print("\ntom_hanks Label:", tom_hanks.labels)
print("tom_hanks was born:", tom_hanks.properties['born'])

print("\nInfo about Relationship with id = 21:")
print(graph.relationship(21))

print("\nRelationship's nodes:")
print(tom_hanks_acted_in_forrest_gump.nodes)
```

```
Relationships amount:
8

Relationships types:
frozenset([u'DIRECTED', u'ACTED_IN', u'BASED_ON'])

Nodes amount:
8

Existing Labels:
frozenset([u'Movie', u'Person'])

Info about Node with id = 11:
(n11:Movie {box_office_Mdol:677.9,country:"USA",duration_min:142,lang:"English",released:1994,title:"Forrest Gump"})

The number of relationships attached to the node:
tom_hanks: 2
forrest_gump: 3

tom_hanks Label: LabelSet(['Person'])
tom_hanks was born: 1956

Info about Relationship with id = 21:
(:Person {born:1952,country:"USA",name:"Robert Zemeckis"})-[r21:DIRECTED]->(:Movie {box_office_Mdol:677.9,country:"USA",duration_min:142,lang:"English",released:1994,title:"Forrest Gump"})

Relationship's nodes:
(<Node graph=u'http://localhost:7474/db/data/' ref=u'node/24' labels=set(['Person']) properties={'country': u'USA', 'name': u'Tom Hanks', 'born': 1956}>, <Node graph=u'http://localhost:7474/db/data/' ref=u'node/25' labels=set(['Movie']) properties={'lang': u'English', 'title': u'Forrest Gump
```

Some of the output above, such as the results of `graph.node(11)` and `graph.relationship(21)`, is written in Cypher notation. We will look at Cypher below.

Let's extend our graph database with one new movie ["The Green Mile"](https://en.wikipedia.org/wiki/The_Green_Mile):

```python
michael_clarke_duncan = Node("Person", name="Michael Clarke Duncan", born=1957, country="USA")
frank_darabont = Node("Person", name="Frank Darabont", born=1959, country="France")
stephen_king = Node("Person", name="Stephen King", born=1947, country="USA")
green_mile = Node("Movie", title="The Green Mile", released=1999, duration_min=188,
                  country="USA", lang="English", box_office_Mdol=290.7)

graph.create(Relationship(tom_hanks, "ACTED_IN", green_mile, role="Paul Edgecomb"))
graph.create(Relationship(gary_sinise, "ACTED_IN", green_mile, role="Burt Hammersmith"))
graph.create(Relationship(michael_clarke_duncan, "ACTED_IN", green_mile, role="John Coffey"))
graph.create(Relationship(frank_darabont, "DIRECTED", green_mile))
graph.create(Relationship(stephen_king, "BASED_ON", green_mile))
```

```
(<Relationship graph=u'http://localhost:7474/db/data/' ref=u'relationship/98' start=u'node/99' end=u'node/96' type=u'BASED_ON' properties={}>,)
```

Refresh the http://localhost:7474/browser/ page to see the updated graph.

# A quick Cypher introduction

**Cypher** is a pattern-oriented, declarative query language that mixes SQL with graph traversal patterns. If you know SQL well, you'll probably see the parallels quickly. This is just a brief introduction to get you started. For more complete documentation, see the [Cypher section of the Neo4j manual](http://neo4j.com/docs/stable/cypher-query-lang.html) and the [Cypher developer guide](http://neo4j.com/developer/cypher-query-language/). Note that, as in SQL, much of Cypher is case-insensitive. Notable exceptions to this rule include identifiers, labels, property keys, and relationship types.

`py2neo` executes Cypher through Neo4j's HTTP transactional endpoint. The `execute()` method lets you run plain Cypher from Python code.

### `CREATE` clause for new data insertion:

The Cypher `CREATE` clause creates nodes (with or without properties) and relationships between nodes (with or without properties). It can also give a node one or more labels; a relationship always has exactly one type.

_**Basic syntax**_:

* for a single node:

        CREATE (
                <node_name>:<label_name_1>: ... :<label_name_N>
                {
                    <property_1_name>:<property_1_value>, ..., <property_M_name>:<property_M_value>
                }
               )

* for a relationship between two nodes:

        CREATE (
                <node_1_name>:<label_1_name_1>: ... :<label_1_name_N1>
                {
                    <property_1_name>:<property_1_value>, ..., <property_M1_name>:<property_M1_value>
                }
               )-[<relationship_name>:<relationship_type> { <property_name>:<property_value>, ... }]->(
                <node_2_name>:<label_2_name_1>: ... :<label_2_name_N2>
                {
                    <property_1_name>:<property_1_value>, ..., <property_M2_name>:<property_M2_value>
                }
               )

_**Analogy with SQL**_:

    INSERT INTO <table_name> (<field_1>, ..., <field_N>) VALUES (<value_1>, ..., <value_N>);
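
Cypher statements are often assembled in Python code. The sketch below builds a `CREATE` statement for a single node from a list of labels and a property dictionary. The helpers `cypher_literal` and `cypher_create_node` are hypothetical and handle only strings, numbers and booleans. In real code it is safer to pass property values as query parameters. Labels and relationship types, however, cannot be parameterized in the Neo4j versions used here, so they have to be written into the query text, as this helper does.

```python
def cypher_literal(value):
    """Render a Python str/bool/int/float as a Cypher literal (illustrative only)."""
    if isinstance(value, str):
        # Escape backslashes first, then single quotes, inside a single-quoted string.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    if isinstance(value, bool):  # checked before int, because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    raise TypeError(f"unsupported property type: {type(value).__name__}")


def cypher_create_node(name, labels, properties):
    """Build 'CREATE (name:Label1:Label2 {key: value, ...})' from Python values."""
    label_part = "".join(":" + label for label in labels)
    props = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in properties.items())
    return f"CREATE ({name}{label_part} {{{props}}})"


print(cypher_create_node("keanu", ["Person"], {"name": "Keanu Reeves", "born": 1964}))
# CREATE (keanu:Person {name: 'Keanu Reeves', born: 1964})
```

Labels are inserted as-is, so they must be trusted, valid identifiers.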

Let's create a new person, a new film, ["Inception"](https://en.wikipedia.org/wiki/Inception), and "The Matrix" trilogy using the `py2neo.Graph.cypher` attribute:

```python
graph.cypher.execute("CREATE (single_actor:Person { name:'Sylvester Stallone', born:1946, country:'USA' })")
```

```python
# Labels and relationship types cannot be passed as query parameters,
# so they are written directly into the query text.
graph.cypher.execute("""
    CREATE (actor:Person { name:'Leonardo DiCaprio', born:1974, country:'USA' })-[:ACTED_IN]->
           (film:Movie { title:"Inseption", released:2010, duration_min:148,
                         country:"USA", lang:"English", box_office_Mdol:825.5 })
""")
```



```python
graph.cypher.execute(
    """
    CREATE (matrix1:Movie { title: 'The Matrix', released: 1999, duration_min:136, box_office_Mdol:463.5 })
    CREATE (matrix2:Movie { title: 'The Matrix Reloaded', released: 2003, duration_min:138, box_office_Mdol:742.1 })
    CREATE (matrix3:Movie { title: 'The Matrix Revolutions', released: 2003, duration_min:129, box_office_Mdol:427.3 })
    CREATE (keanu:Person { name:'Keanu Reeves', born:1964, country:"Canada" })
    CREATE (laurence:Person { name:'Laurence Fishburne', born:1961, country:"USA" })
    CREATE (carrieanne:Person { name:'Carrie-Anne Moss', born:1967, country:"Canada" })
    CREATE (keanu)-[:ACTED_IN { role: 'Neo' }]->(matrix1)
    CREATE (keanu)-[:ACTED_IN { role: 'Neo' }]->(matrix2)
    CREATE (keanu)-[:ACTED_IN { role: 'Neo' }]->(matrix3)
    CREATE (laurence)-[:ACTED_IN { role: 'Morpheus' }]->(matrix1)
    CREATE (laurence)-[:ACTED_IN { role: 'Morpheus' }]->(matrix2)
    CREATE (laurence)-[:ACTED_IN { role: 'Morpheus' }]->(matrix3)
    CREATE (carrieanne)-[:ACTED_IN { role: 'Trinity' }]->(matrix1)
    CREATE (carrieanne)-[:ACTED_IN { role: 'Trinity' }]->(matrix2)
    CREATE (carrieanne)-[:ACTED_IN { role: 'Trinity' }]->(matrix3)
    """
)
```



### `RETURN` clause for returning query result:

The Cypher `RETURN` clause returns nodes, relationships, or some or all of their properties. It is usually combined with `MATCH` or `CREATE`.

_**Basic syntax**_:
    
    RETURN <node_name>.<property_1_name>, ... , <node_name>.<property_N_name>

### `MATCH` clause for data selection:

The Cypher `MATCH` clause retrieves nodes, relationships and properties from the database. A `MATCH` can be followed by a `RETURN` clause or by an update clause.

_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    RETURN <node_name>.<property_1_name>, ... , <node_name>.<property_N_name>
    
_**Analogy with SQL**_:

    SELECT <field_1>, ..., <field_N> FROM <table_name>;
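
Conceptually, `MATCH (n:Label) RETURN n.a, n.b` works like a filter on the label followed by a projection of the requested properties. The plain-Python sketch below shows that idea; the `nodes` list and the `match_return` helper are hypothetical. It also shows that a missing property comes back as `null`, here `None`.

```python
# Hypothetical in-memory nodes as (labels, properties) pairs.
nodes = [
    ({"Person"}, {"name": "Tom Hanks", "born": 1956}),
    ({"Movie"}, {"title": "Forrest Gump", "released": 1994}),
    ({"Movie"}, {"title": "The Green Mile", "released": 1999}),
]


def match_return(nodes, label, *keys):
    """Rough analogue of MATCH (n:label) RETURN n.key1, n.key2, ..."""
    return [
        tuple(props.get(k) for k in keys)  # a missing property becomes None (null in Cypher)
        for labels, props in nodes
        if label in labels                 # the label plays the role of the table in FROM
    ]


print(match_return(nodes, "Movie", "title", "released"))
# [('Forrest Gump', 1994), ('The Green Mile', 1999)]
```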

```python
# Return all nodes
graph.cypher.execute("MATCH (n) RETURN n")
# is roughly equivalent to
# SELECT * FROM table_1
# UNION
# SELECT * FROM table_2
# ...
```

```
    | n
----+------------------------------------------------------------------------------------------------------------------------
  1 | (n92:Person {born:1956,country:"USA",name:"Tom Hanks"})
  2 | (n93:Movie {box_office_Mdol:677.9,country:"USA",duration_min:142,lang:"English",released:1994,title:"Forrest Gump"})
  3 | (n94:Person {born:1955,country:"USA",name:"Gary Sinise"})
  4 | (n95:Person {born:1952,country:"USA",name:"Robert Zemeckis"})
  5 | (n96:Movie {box_office_Mdol:290.7,country:"USA",duration_min:188,lang:"English",released:1999,title:"The Green Mile"})
  6 | (n97:Person {born:1957,country:"USA",name:"Michael Clarke Duncan"})
```

```python
# Return a specific Movie node
graph.cypher.execute("MATCH (movie:Movie { title:'The Matrix' }) RETURN movie")
```

```
   | movie
---+----------------------------------------------------------------------------------------
 1 | (n103:Movie {box_office_Mdol:463.5,duration_min:136,released:1999,title:"The Matrix"})
```

```python
# Return the title and release year of the film
graph.cypher.execute("MATCH (movie:Movie { title:'The Matrix' }) RETURN movie.title, movie.released")
```

```
   | movie.title | movie.released
---+-------------+----------------
 1 | The Matrix  |           1999
```

```python
# Return the title and release year of all films
graph.cypher.execute("MATCH (movie:Movie) RETURN movie.title, movie.released")
# Analogy with SQL:
# SELECT title, released FROM movie;
```

```
   | movie.title            | movie.released
---+------------------------+----------------
 1 | Forrest Gump           |           1994
 2 | The Green Mile         |           1999
 3 | Inseption              |           2010
 4 | The Matrix             |           1999
 5 | The Matrix Reloaded    |           2003
 6 | The Matrix Revolutions |           2003
```

```python
# Return Person names and years of birth, ordered by year in descending order:
graph.cypher.execute("""
    MATCH (person:Person)
    RETURN person.name, person.born
    ORDER BY person.born DESC
""")
# Analogy with SQL:
# SELECT name, born FROM person ORDER BY born DESC;
```

```
    | person.name           | person.born
----+-----------------------+-------------
  1 | Leonardo DiCaprio     |        1974
  2 | Carrie-Anne Moss      |        1967
  3 | Keanu Reeves          |        1964
  4 | Laurence Fishburne    |        1961
  5 | Frank Darabont        |        1959
  6 | Michael Clarke Duncan |        1957
  7 | Tom Hanks             |        1956
  8 | Gary Sinise           |        1955
  9 | Robert Zemeckis       |        1952
 10 | Stephen King          |        1947
 11 | Sylvester Stallone    |        1946
```

```python
# Count all nodes:
graph.cypher.execute("MATCH (n) RETURN COUNT(*)")
```

```
   | count(*)
---+----------
 1 |       17
```

```python
# Count Person nodes:
graph.cypher.execute("MATCH (n:Person) RETURN COUNT(*)")
# Analogy with SQL:
# SELECT COUNT(*) FROM person;
```

```
   | count(*)
---+----------
 1 |       11
```

```python
# Count relationships by type:
graph.cypher.execute("MATCH (n)-[r]->() RETURN TYPE(r), COUNT(*)")
```

```
   | type(r)  | count(*)
---+----------+----------
 1 | BASED_ON |        1
 2 | ACTED_IN |       15
 3 | DIRECTED |        2
```

```python
# List the first 10 relationships together with their start and end nodes:
graph.cypher.execute("""
    MATCH (n)-[r]->(m)
    RETURN n.name AS FROM, type(r) AS `->`, m.title AS TO
    LIMIT 10
""")
```

```
    | FROM                  | ->       | TO
----+-----------------------+----------+---------------------
  1 | Tom Hanks             | ACTED_IN | The Green Mile
  2 | Tom Hanks             | ACTED_IN | Forrest Gump
  3 | Gary Sinise           | ACTED_IN | The Green Mile
  4 | Gary Sinise           | ACTED_IN | Forrest Gump
  5 | Robert Zemeckis       | DIRECTED | Forrest Gump
  6 | Michael Clarke Duncan | ACTED_IN | The Green Mile
  7 | Frank Darabont        | DIRECTED | The Green Mile
  8 | Stephen King          | BASED_ON | The Green Mile
  9 | Leonardo DiCaprio     | ACTED_IN | Inseption
 10 | Keanu Reeves          | ACTED_IN | The Matrix Reloaded
```

### `WHERE` clause:

As in SQL, Cypher provides a `WHERE` clause that filters the results of a `MATCH` query.


_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    WHERE <condition> <boolean_operator> <condition>
    RETURN <node_name>.<property_1_name>, ... , <node_name>.<property_N_name>
    
_**Analogy with SQL**_:

    SELECT <field_1>, ..., <field_N> 
    FROM <table_name>
    WHERE <condition> <boolean_operator> <condition>;
    
Cypher supports largely the same comparison and Boolean operators as SQL: `=`, `<>`, `<`, `>`, `<=`, `>=`, `AND`, `OR`, `NOT` and `XOR`.

```python
# Get only those persons whose country is not the USA:
graph.cypher.execute("""
    MATCH (person:Person)
    WHERE person.country <> "USA"
    RETURN person.name, person.country
""")
# Analogy with SQL:
# SELECT name, country FROM person WHERE country <> 'USA';
```

```
   | person.name      | person.country
---+------------------+----------------
 1 | Frank Darabont   | France
 2 | Keanu Reeves     | Canada
 3 | Carrie-Anne Moss | Canada
```

```python
# Get only the persons whose names end with "s":
graph.cypher.execute("""
    MATCH (person:Person)
    WHERE person.name =~ ".*s$"
    RETURN person.name
""")
# "WHERE person.name =~ '.*s$'" is equivalent to "WHERE person.name ENDS WITH 's'"

# Analogy with SQL:
# SELECT name FROM person WHERE name LIKE '%s';
```

```
   | person.name
---+------------------
 1 | Tom Hanks
 2 | Robert Zemeckis
 3 | Keanu Reeves
 4 | Carrie-Anne Moss
```
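
The equivalence claimed in the comment above can be checked in plain Python. Cypher's `=~` operator requires the regular expression to match the whole string, which corresponds to `re.fullmatch`. The small `like_to_regex` helper is hypothetical and supports only `%` and `_`. It shows how the SQL pattern `LIKE '%s'` maps to the same regular expression. The sample names are a subset of the Person nodes above.

```python
import re

names = ["Tom Hanks", "Gary Sinise", "Keanu Reeves", "Carrie-Anne Moss"]

# Cypher's =~ must match the whole string, like re.fullmatch.
regex_hits = [n for n in names if re.fullmatch(r".*s$", n)]
suffix_hits = [n for n in names if n.endswith("s")]  # ENDS WITH 's'


def like_to_regex(pattern):
    """Translate a simple SQL LIKE pattern ('%' and '_' only) into a full-match regex."""
    return "".join(".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern)


like_hits = [n for n in names if re.fullmatch(like_to_regex("%s"), n)]
print(regex_hits == suffix_hits == like_hits)  # True
```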

```python
# Get the persons who were born after 1955 in the USA or France, or whose name starts with 'S' and contains 'St'
graph.cypher.execute("""
    MATCH (person:Person)
    WHERE (person.born > 1955 AND person.country IN ["USA", "France"])
    OR (person.name STARTS WITH 'S' AND person.name CONTAINS 'St')
    RETURN person.name, person.born
""")
# Analogy with SQL:
# SELECT name, born FROM person
# WHERE (born > 1955 AND country IN ('USA', 'France')) OR (name LIKE 'S%' AND name LIKE '%St%');
```

```
   | person.name           | person.born
---+-----------------------+-------------
 1 | Tom Hanks             |        1956
 2 | Michael Clarke Duncan |        1957
 3 | Frank Darabont        |        1959
 4 | Stephen King          |        1947
 5 | Leonardo DiCaprio     |        1974
 6 | Laurence Fishburne    |        1961
```

```python
# Calculate the total box office and the average duration of "The Matrix" trilogy:
graph.cypher.execute("""
    MATCH (movie:Movie)
    WHERE movie.title STARTS WITH 'The Matrix'
    RETURN SUM(movie.box_office_Mdol) AS total, AVG(movie.duration_min)
""")
# Analogy with SQL:
# SELECT SUM(box_office_Mdol) AS total, AVG(duration_min) FROM movie
# WHERE title LIKE 'The Matrix%';
```

```
   | total  | avg(movie.duration_min)
---+--------+-------------------------
 1 | 1632.9 |           134.333333333
```
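
You can verify both aggregated values by hand from the properties used when the trilogy was created. This short Python check recomputes them: 463.5 + 742.1 + 427.3 = 1632.9 and (136 + 138 + 129) / 3 = 403 / 3 ≈ 134.33. Note that the average of integer durations is a floating-point number.

```python
# Box office (millions of dollars) and duration (minutes) of the three Matrix nodes created above.
matrix = {
    "The Matrix": (463.5, 136),
    "The Matrix Reloaded": (742.1, 138),
    "The Matrix Revolutions": (427.3, 129),
}

total = sum(box for box, _ in matrix.values())                           # SUM(movie.box_office_Mdol)
avg_duration = sum(minutes for _, minutes in matrix.values()) / len(matrix)  # AVG(movie.duration_min)

print(round(total, 1), round(avg_duration, 2))  # 1632.9 134.33
```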

```python
# Find all movies Tom Hanks acted in
graph.cypher.execute("""
    MATCH (movie:Movie)<-[:ACTED_IN]-(actor:Person { name: "Tom Hanks" })
    RETURN actor.name, movie.title
""")
```

```
   | actor.name | movie.title
---+------------+----------------
 1 | Tom Hanks  | The Green Mile
 2 | Tom Hanks  | Forrest Gump
```

```python
# All other movies that actors in "The Matrix" acted in, ordered by number of occurrences:
graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)
    RETURN movie.title, COUNT(*)
    ORDER BY COUNT(*) DESC
""")
```

```
   | movie.title            | count(*)
---+------------------------+----------
 1 | The Matrix Revolutions |        3
 2 | The Matrix Reloaded    |        3
```

Note how we can filter movies without a `WHERE` clause, simply by specifying properties inside the pattern.

```python
# Let's see who acted in each of these movies:
graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)
    RETURN movie.title, COLLECT(actor.name), COUNT(*) AS count
    ORDER BY COUNT(*) DESC
""")
```

```
   | movie.title            | COLLECT(actor.name)                                           | count
---+------------------------+---------------------------------------------------------------+-------
 1 | The Matrix Revolutions | [u'Carrie-Anne Moss', u'Keanu Reeves', u'Laurence Fishburne'] |     3
 2 | The Matrix Reloaded    | [u'Carrie-Anne Moss', u'Keanu Reeves', u'Laurence Fishburne'] |     3
```
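
Cypher has no `GROUP BY`. The non-aggregated columns in `RETURN` (here `movie.title`) implicitly become the grouping key for `COLLECT` and `COUNT`. The plain-Python sketch below imitates the last query on a hypothetical list of `(actor, movie)` pairs built from the Matrix data. It also reflects the rule that a single `MATCH` pattern never traverses the same relationship twice. That is why "The Matrix" itself does not appear among the results.

```python
from collections import defaultdict

actors = ("Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss")
movies = ("The Matrix", "The Matrix Reloaded", "The Matrix Revolutions")
# Hypothetical ACTED_IN relationships as (actor, movie) pairs: every actor in every film.
acted_in = [(actor, movie) for actor in actors for movie in movies]

# MATCH (:Movie {title: "The Matrix"})<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)
rows = [
    (actor, other)
    for actor, first in acted_in if first == "The Matrix"
    # the second ACTED_IN must be a different relationship from the first one
    for same_actor, other in acted_in if same_actor == actor and other != first
]

# RETURN movie.title, COLLECT(actor.name), COUNT(*): movie.title is the implicit grouping key
groups = defaultdict(list)
for actor, movie in rows:
    groups[movie].append(actor)

result = [(title, sorted(names), len(names)) for title, names in groups.items()]
result.sort(key=lambda row: row[2], reverse=True)  # ORDER BY COUNT(*) DESC
for row in result:
    print(row)  # each Matrix sequel with the same three actors and a count of 3
```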

```python
# Co-acting, that is, actors who acted together:
graph.cypher.execute("""
    MATCH (:Movie { title: "The Matrix" })<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(colleague)
    RETURN actor.name, COLLECT(DISTINCT colleague.name)
""")
```

```
   | actor.name         | COLLECT(DISTINCT colleague.name)
---+--------------------+----------------------------------------------
 1 | Carrie-Anne Moss   | [u'Laurence Fishburne', u'Keanu Reeves']
 2 | Laurence Fishburne | [u'Carrie-Anne Moss', u'Keanu Reeves']
 3 | Keanu Reeves       | [u'Laurence Fishburne', u'Carrie-Anne Moss']
```

Let's list some of the important and frequently used functions:

**String functions:**

* `UPPER` - it is used to change all letters into upper case letters;
* `LOWER` - it is used to change all letters into lower case letters;
* `SUBSTRING` - it is used to get a substring of a given string;
* `REPLACE` - it is used to replace a substring of a string with a given substring;
* `LENGTH(string)` - it returns the length of a string;
* `TRIM` - it returns the original string with whitespace removed from both sides;
* `SPLIT(original, splitPattern)` - it returns the sequence of strings that are delimited by the split pattern;
* `REVERSE` - it returns the original string reversed.

**Relationship functions:**

* `STARTNODE` - it is used to know the Start Node of a Relationship;
* `ENDNODE` - it is used to know the End Node of a Relationship;
* `ID` - it is used to know the `ID` of a Relationship;
* `TYPE` - it is used to get the type of a Relationship as a string.

**Aggregation functions:**

* `COUNT` - it returns the number of rows returned by `MATCH` command;
* `MAX` - it returns the maximum value from a set of rows returned by `MATCH` command;
* `MIN` - it returns the minimum value from a set of rows returned by `MATCH` command;
* `SUM` - it returns the summation value of all rows returned by `MATCH` command;
* `AVG` - it returns the average value of all rows returned by `MATCH` command;
* `COLLECT` - it collects all the values into a list. It will ignore `NULL`s;
* `DISTINCT` - it removes duplicates from the values.

**Predicates:**

* `ALL` - it tests whether a predicate holds for all elements of a collection;
* `ANY` - it tests whether a predicate holds for at least one element in the collection;
* `EXISTS` - it returns true if a match for the pattern exists in the graph, or the property exists in the node, relationship or map.

**Collection functions:**

* `NODES` - it returns all nodes in a path;
* `RELATIONSHIPS` - it returns all relationships in a path;
* `LABELS` - it returns a collection of string representations for the labels attached to a node;
* `KEYS` - it returns a collection of string representations for the property names of a node, relationship, or map;
* `RANGE(start, end [, step])` - it returns the numbers from `start` to `end` inclusive, with an optional non-zero `step`;
* `HEAD` - it returns the first element in a collection;
* `LAST` - it returns the last element in a collection.
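
Two of these functions behave differently from their Python namesakes, which makes them easy to trip over. `RANGE` includes its end value, so `RANGE(1999, 2005)` in a query below covers 2005. `SUBSTRING(original, start, length)` takes a 0-based start and a length rather than an end index. The helpers below are hypothetical plain-Python equivalents, written only to illustrate these semantics; they are not a Neo4j or `py2neo` API.

```python
def cypher_range(start, end, step=1):
    """Like Cypher RANGE: inclusive of `end`, unlike Python's range()."""
    if step == 0:
        raise ValueError("step must be non-zero")
    stop = end + (1 if step > 0 else -1)  # extend by one so `end` itself is included
    return list(range(start, stop, step))


def substring(original, start, length=None):
    """Like Cypher SUBSTRING: 0-based start index and an optional length."""
    return original[start:] if length is None else original[start:start + length]


def head(items):
    return items[0] if items else None   # HEAD of an empty collection is null


def last(items):
    return items[-1] if items else None  # LAST of an empty collection is null


title = "The Green Mile"
print(cypher_range(1999, 2005))       # [1999, 2000, 2001, 2002, 2003, 2004, 2005]
print(substring(title, 4, 5))         # Green
print(title.split(" "), title[::-1])  # SPLIT and REVERSE
print(head(title.split(" ")), last(title.split(" ")))  # The Mile
```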

```python
graph.cypher.execute("""
    MATCH (a)-[movie:DIRECTED]->(b)
    RETURN STARTNODE(movie), LENGTH(STARTNODE(movie).name) AS length
""")
```

```
   | STARTNODE(movie)                                                | length
---+-----------------------------------------------------------------+--------
 1 | (n95:Person {born:1952,country:"USA",name:"Robert Zemeckis"})   |     15
 2 | (n98:Person {born:1959,country:"France",name:"Frank Darabont"}) |     14
```

```python
graph.cypher.execute("""
    MATCH (a:Movie)
    RETURN a.title AS title, EXISTS((a)<-[:DIRECTED]-()) AS director_is_known
""")
```

```
   | title                  | director_is_known
---+------------------------+-------------------
 1 | Forrest Gump           |              True
 2 | The Green Mile         |              True
 3 | Inseption              |             False
 4 | The Matrix             |             False
 5 | The Matrix Reloaded    |             False
 6 | The Matrix Revolutions |             False
```

```python
graph.cypher.execute("""
    MATCH (a:Movie)
    WHERE a.released IN RANGE(1999, 2005) AND ANY (x IN SPLIT(a.title, ' ') WHERE LOWER(x) = "the")
    RETURN REVERSE(a.title), HEAD(KEYS(a)), LABELS(a)
""")
```

```
   | REVERSE(a.title)       | HEAD(KEYS(a)) | LABELS(a)
---+------------------------+---------------+------------
 1 | eliM neerG ehT         | lang          | [u'Movie']
 2 | xirtaM ehT             | title         | [u'Movie']
 3 | dedaoleR xirtaM ehT    | title         | [u'Movie']
 4 | snoituloveR xirtaM ehT | title         | [u'Movie']
```

### `DELETE` and `REMOVE` clauses:

The Cypher `DELETE` clause deletes nodes and relationships.

_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    DELETE <node_name_list>
    
_**Analogy with SQL**_:

    DELETE FROM <table_name>
    WHERE <some_column> = <some_value>;

The Cypher `REMOVE` clause removes labels from nodes and properties from nodes or relationships.

_**Basic syntax**_:

    MATCH (<node_name>:<label_name>)
    REMOVE <node_name>.<property_1_name>, ..., <node_name>.<property_N_name> 
    
_**Analogy with SQL**_:

    ALTER TABLE <table_name>
    DROP COLUMN <column_name>;
    
The following query clears the whole database by deleting every node together with its relationships:

    MATCH (n)
    OPTIONAL MATCH (n)-[r]-()
    DELETE n, r
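
The query above deletes `n` and `r` together because Neo4j does not allow deleting a node that still has relationships. The toy class below models that rule. It is hypothetical: an in-memory illustration, not a reproduction of Neo4j's behavior in every detail. Deleting an isolated node succeeds, deleting a connected node fails, and clearing removes relationships and nodes together.

```python
class TinyGraph:
    """Hypothetical in-memory graph used only to illustrate deletion rules."""

    def __init__(self):
        self.nodes = set()
        self.relationships = []  # (start, type, end) triples

    def delete_node(self, node):
        # Refuse to leave dangling relationships behind, as Neo4j does.
        if any(node in (start, end) for start, _, end in self.relationships):
            raise ValueError(f"{node!r} still has relationships")
        self.nodes.discard(node)

    def clear(self):
        # Mirrors MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r:
        # relationships and nodes are removed together, so nothing is left dangling.
        self.relationships.clear()
        self.nodes.clear()


g = TinyGraph()
g.nodes.update({"Tom Hanks", "Forrest Gump", "Sylvester Stallone"})
g.relationships.append(("Tom Hanks", "ACTED_IN", "Forrest Gump"))

g.delete_node("Sylvester Stallone")  # fine: no relationships
try:
    g.delete_node("Tom Hanks")       # fails: still connected
except ValueError as err:
    print(err)                       # 'Tom Hanks' still has relationships

g.clear()
print(len(g.nodes), len(g.relationships))  # 0 0
```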

```python
# Find the Person node that has no outgoing relationships
graph.cypher.execute("""
    MATCH (a:Person)
    WHERE NOT (a)-[]->()
    RETURN a
""")
```

```
   | a
---+-------------------------------------------------------------------
 1 | (n111:Person {born:1946,country:"USA",name:"Sylvester Stallone"})
```

```python
# Remove the born property
graph.cypher.execute("""
    MATCH (a:Person)
    WHERE NOT (a)-[]->()
    REMOVE a.born
    RETURN a.name, a.born, a.country, labels(a)
""")
```

```
   | a.name             | a.born | a.country | labels(a)
---+--------------------+--------+-----------+-------------
 1 | Sylvester Stallone |        | USA       | [u'Person']
```

```python
# Remove the Person label
graph.cypher.execute("""
    MATCH (a:Person)
    WHERE NOT (a)-[]->()
    REMOVE a:Person
    RETURN a.name, a.born, a.country, labels(a)
""")
```

```
   | a.name             | a.born | a.country | labels(a)
---+--------------------+--------+-----------+-----------
 1 | Sylvester Stallone |        | USA       | []
```

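```python
# Delete the node for Sylvester Stallone. Its Person label was removed in the previous
# step, so MATCH (a:Person) would no longer find it; match it by name instead.
graph.cypher.execute("""
    MATCH (a { name: 'Sylvester Stallone' })
    WHERE NOT (a)--()
    DELETE a
""")
```

```python
# Check whether the node for Sylvester Stallone is still there
graph.cypher.execute("""
    MATCH (a { name: 'Sylvester Stallone' })
    RETURN a
""")
```

```
  | a
--+---
```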

### `SET` clause for adding or updating properties:

The Cypher `SET` clause adds new properties to an existing node or relationship, or updates the values of existing properties.

_**Basic syntax**_:
    
    MATCH (<node_name>:<label_name>)
    SET <node_name>.<property_1_name> = <value_1>, ..., <node_name>.<property_N_name> = <value_N>
   
_**Analogy with SQL**_:
    
    ALTER TABLE <table_name>
    ADD <column_name> <datatype>
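When the new values come from Python variables, it is safer to send them as query parameters than to paste them into the statement. The sketch below builds such a parameterised `MATCH ... SET` statement; `build_set_query` is a hypothetical helper, and `param_fmt` selects the placeholder style: the older `{name}` form by default, or `$name` as expected by newer Neo4j versions.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_set_query(label, match_key, match_value, updates, param_fmt="{%s}"):
    """Return (statement, parameters) for a parameterised MATCH ... SET ... RETURN.

    Label and property keys are validated and written into the text;
    all values travel separately as query parameters.
    """
    if not updates:
        raise ValueError("nothing to set")
    for name in [label, match_key] + list(updates):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError("not a plain identifier: %r" % name)
    # One "n.<key> = <placeholder>" per property; the "new_" prefix keeps
    # these parameter names apart from "match_value"
    assignments = ", ".join(
        "n.%s = %s" % (key, param_fmt % ("new_" + key)) for key in sorted(updates))
    statement = "MATCH (n:%s {%s: %s}) SET %s RETURN n" % (
        label, match_key, param_fmt % "match_value", assignments)
    params = {"new_" + key: value for key, value in updates.items()}
    params["match_value"] = match_value
    return statement, params


statement, params = build_set_query("Movie", "title", "Inseption", {"IMDb_rating": 8.7})
print(statement)
# MATCH (n:Movie {title: {match_value}}) SET n.IMDb_rating = {new_IMDb_rating} RETURN n
print(params)
```

With py2neo 2.x the pair should then be runnable as `graph.cypher.execute(statement, params)`.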

In [132]:
# Let's add a new property *IMDb_rating* to all movies, setting it to 9 for each of them
graph.cypher.execute("""
    MATCH (a:Movie) 
    SET a.IMDb_rating = 9
    RETURN a
    SKIP 2
""")
# SKIP N omits the first N rows of the result

   | a                                                                                                                               
---+----------------------------------------------------------------------------------------------------------------------------------
 1 | (n102:Movie {IMDb_rating:9,box_office_Mdol:825.5,country:"USA",duration_min:148,lang:"English",released:2010,title:"Inseption"})
 2 | (n103:Movie {IMDb_rating:9,box_office_Mdol:463.5,duration_min:136,released:1999,title:"The Matrix"})                            
 3 | (n104:Movie {IMDb_rating:9,box_office_Mdol:742.1,duration_min:138,released:2003,title:"The Matrix Reloaded"})                   
 4 | (n105:Movie {IMDb_rating:9,box_office_Mdol:427.3,duration_min:129,released:2003,title:"The Matrix Revolutions"})                

In [107]:
# Let's update property *IMDb_rating* of 'Inseption'
graph.cypher.execute("""
    MATCH (a:Movie {title: 'Inseption'})
    SET a.IMDb_rating = 8.7
    RETURN a.title, a.IMDb_rating
""")

   | a.title   | a.IMDb_rating
---+-----------+---------------
 1 | Inseption |           8.7

### `MERGE` clause (`CREATE + MATCH` together):

The Neo4j CQL `MERGE` clause is used to create nodes, relationships and properties, or to retrieve them if they already exist. It combines `CREATE` and `MATCH`: `MERGE` searches the graph for the given pattern; if the pattern exists, it returns the matching results, otherwise it creates the new node or relationship and returns it.

`MERGE` corresponds to the `merge()` and `merge_one()` methods of the `py2neo` `Graph` object. We will show their usage below.

_**Basic syntax**_:
    
    MERGE (<node_name>:<label_name> { <property_1_name>:<property_1_value>, ..., <property_N_name>:<property_N_value> })
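One consequence of "match the pattern, otherwise create it" is that `MERGE` compares the *whole* property map. The toy model below, a hypothetical in-memory sketch that does not use Neo4j, makes this visible: merging the same map twice yields one node, but changing a single value creates a second node.

```python
def merge_node(nodes, label, props):
    """Toy, in-memory model of MERGE (n:<label> {props}); not Neo4j itself.

    `nodes` is a list of {"labels": set, "props": dict} records and
    `label=None` means the pattern has no label. Returns (node, created).
    """
    for node in nodes:
        has_label = label is None or label in node["labels"]
        # The whole property map must match, not just one "key" property
        same_props = all(k in node["props"] and node["props"][k] == v
                         for k, v in props.items())
        if has_label and same_props:
            return node, False  # MATCH part: pattern found
    node = {"labels": set() if label is None else {label}, "props": dict(props)}
    nodes.append(node)          # CREATE part: pattern missing
    return node, True


db = []
print(merge_node(db, None, {"name": "Robert De Niro", "age": 72})[1])  # True
print(merge_node(db, None, {"name": "Robert De Niro", "age": 72})[1])  # False
print(merge_node(db, None, {"name": "Robert De Niro", "age": 73})[1])  # True
print(len(db))                                                        # 2
```

A common way to avoid such duplicates is to merge only on the identifying property and set the rest afterwards, e.g. `MERGE (a:Person {name: 'Robert De Niro'}) SET a.age = 72`.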

In [129]:
# Create the node (no label here) only if no node with exactly these properties exists yet
graph.cypher.execute("""
    MERGE (a { name:'Robert De Niro', age:72 })
    RETURN a
""")

   | a                                    
---+---------------------------------------
 1 | (n112 {age:72,name:"Robert De Niro"})

### `UNION` clause:

`UNION` combines the rows returned by two queries into a single result set and removes duplicate rows; `UNION ALL` does the same but keeps the duplicates. Both queries must return the same columns, i.e. the same number of columns with the same names.


_**Basic syntax**_:
    
    <MATCH Command_1>
        UNION
    <MATCH Command_2>
   
_**Analogy with SQL**_:
    
    SELECT <selection_1>
        UNION
    SELECT <selection_2>;
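The difference between `UNION` and `UNION ALL` can be illustrated without a database. The function below is a hypothetical toy model over lists of row dictionaries (column name to value); it checks the column names, concatenates the rows, and for plain `UNION` drops rows it has already seen. Real Cypher gives no ordering guarantee; this sketch just keeps the first occurrence of each row.

```python
def cypher_union(left, right, keep_duplicates=False):
    """Toy model of Cypher UNION (default) and UNION ALL over lists of row dicts."""
    # Both sides must expose the same column names (checked on the first rows only)
    if left and right and set(left[0]) != set(right[0]):
        raise ValueError("both queries must return the same column names")
    rows = left + right                  # UNION ALL: plain concatenation
    if keep_duplicates:
        return rows
    seen, unique = set(), []
    for row in rows:                     # UNION: drop rows already seen
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


people = [{"name": "Tom Hanks"}, {"name": "Stephen King"}]
extra = [{"name": "The Green Mile"}, {"name": "Tom Hanks"}]  # made-up overlap
print(len(cypher_union(people, extra)))                        # 3
print(len(cypher_union(people, extra, keep_duplicates=True)))  # 4
```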

In [134]:
# UNION ALL keeps duplicate rows; plain UNION would remove them
graph.cypher.execute("""
    MATCH (n:Person)
    RETURN n.name AS name
    UNION ALL MATCH (n:Movie)
    RETURN n.title AS name
""")    

    | name                  
----+------------------------
  1 | Tom Hanks             
  2 | Gary Sinise           
  3 | Robert Zemeckis       
  4 | Michael Clarke Duncan 
  5 | Frank Darabont        
  6 | Stephen King          
  7 | Leonardo DiCaprio     
  8 | Keanu Reeves          
  9 | Laurence Fishburne    
 10 | Carrie-Anne Moss      
 11 | Forrest Gump          
 12 | The Green Mile        
 13 | Inseption             
 14 | The Matrix            
 15 | The Matrix Reloaded   
 16 | The Matrix Revolutions

---
>### Exercise:
> One way to create a new Neo4j database is the following:
> before starting Neo4j Community, click the Browse option and choose another directory or create a new one. After that, connect to this directory.
> <img src="images/new_db_1.jpg">
> <img src="images/new_db_2.jpg">

> 1\. Create a new Neo4j database called "imdb" and connect to it. We will store in it the data scraped earlier from the IMDb website.

> 2\. Read the "imdb_movies_1500.json" file into the `data` variable. It contains 1500 records of scraped data about the most popular movies, their main actors and director(s).

> 3\. Saving JSON data to a Neo4j database is straightforward. The following code shows how to do it for the "imdb_movies_1500.json" file (please look at the content of this JSON file and its available fields before working with the code).

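> <span style="margin-left:4.5em"></span><code style="color: darkblue"># Import the py2neo classes used below and create a new \`py2neo\` graph object</code><br></br>
> <span style="margin-left:4.5em"></span>`from py2neo import Graph, Relationship`<br></br>
> <span style="margin-left:4.5em"></span>`graph = Graph()`<br></br>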

> <span style="margin-left:4.5em"></span><code style="color: darkblue"># To make sure that your database never contains more than one node with a given label</code><br></br>
> <span style="margin-left:4.5em"></span><code style="color: darkblue"># and a given property value, create a uniqueness constraint with the IS UNIQUE syntax.</code><br></br>
> <span style="margin-left:4.5em"></span><code style="color: darkblue"># The following statements define one unique property for each label</code><br></br>
> <span style="margin-left:4.5em"></span>`graph.cypher.execute("CREATE CONSTRAINT ON (m:Movie) ASSERT m.title IS UNIQUE")`<br></br>
> <span style="margin-left:4.5em"></span>`graph.cypher.execute("CREATE CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE")`<br></br>
> <span style="margin-left:4.5em"></span>`graph.cypher.execute("CREATE CONSTRAINT ON (d:Director) ASSERT d.name IS UNIQUE")`<br></br>

> <span style="margin-left:4.5em"></span>`for row in data:`<br></br>
> <span style="margin-left:6.5em"></span>`actors = row["actors"]`<code style="margin-left:5.8em; color: darkblue"># Collect all actors</code><br></br>
> <span style="margin-left:6.5em"></span>`directors = row["directors"]`<code style="margin-left:2.5em; color: darkblue"># Collect all directors</code><br></br>

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Add a new record to "Movie" label with the unique "title" value using \`merge_one()\` function</code><br></br>
> <span style="margin-left:6.5em"></span>`movie = graph.merge_one("Movie", "title", row["title"])`<br></br>
> <span style="margin-left:6.5em"></span><code style="color: darkblue"># All other fields are movie's properties</code><br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["description"] = row["description"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["genres"]      = row["genres"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["rating"]      = row["rating"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["released"]    = row["released"]`<br></br>
> <span style="margin-left:6.5em"></span>`movie.properties["runtime"]     = row["runtime"]`<br></br>
> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Save properties</code><br></br>
> <span style="margin-left:6.5em"></span>`movie.push()`<br></br>

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Add data about actor(s) to the database as before</code><br></br>
> <span style="margin-left:6.5em"></span>`for person in actors:`<br></br>
> <span style="margin-left:8.5em"></span>`actor = graph.merge_one("Actor", "name", person["name"])`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["born"] = person["born"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["city"] = person["city"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["country"] = person["country"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["died"] = person["died"]`<br></br>
> <span style="margin-left:8.5em"></span>`actor.properties["image_url"] = person["image_url"]`<br></br>
> <span style="margin-left:8.5em"></span><code style="color: darkblue"># Save properties</code><br></br>
> <span style="margin-left:8.5em"></span>`actor.push()`<br></br>

> <span style="margin-left:8.5em"></span><code style="color: darkblue"># Define a relationship between the actor and the movie</code><br></br>
> <span style="margin-left:8.5em"></span>`graph.create_unique(Relationship(actor, "ACTED_IN", movie))`<br></br>

> <span style="margin-left:6.5em"></span><code style="color: darkblue"># Add data about director(s) to the database as before</code><br></br>
> <span style="margin-left:6.5em"></span>`for person in directors:`<br></br>
> <span style="margin-left:8.5em"></span>`director = graph.merge_one("Director", "name", person["name"])`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["born"] = person["born"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["city"] = person["city"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["country"] = person["country"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["died"] = person["died"]`<br></br>
> <span style="margin-left:8.5em"></span>`director.properties["image_url"] = person["image_url"]`<br></br>
> <span style="margin-left:8.5em"></span><code style="color: darkblue"># Save properties</code><br></br>
> <span style="margin-left:8.5em"></span>`director.push()`<br></br>
        
> <span style="margin-left:8.5em"></span><code style="color: darkblue"># Define a relationship between the director and the movie</code><br></br>
> <span style="margin-left:8.5em"></span>`graph.create_unique(Relationship(director, "DIRECTED", movie))`<br></br>

> Use the code above to fill your database with data from the "imdb_movies_1500.json" file.
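The `merge_one()`/`push()`/`create_unique()` approach above issues several separate calls per record. As an alternative sketch, each record could instead be turned into parameterised Cypher `MERGE` statements. The helper below (`record_to_statements`, `MOVIE_KEYS`, `PERSON_KEYS` and `_set_clause` are hypothetical names) only builds the statements; it assumes the JSON fields used in the code above and the older `{name}` parameter syntax (pass `param_fmt="$%s"` for newer Neo4j versions).

```python
MOVIE_KEYS = ["description", "genres", "rating", "released", "runtime"]
PERSON_KEYS = ["born", "city", "country", "died", "image_url"]


def _set_clause(var, keys, param_fmt):
    """E.g. 'p.born = {born}, p.city = {city}' for var='p'."""
    return ", ".join("%s.%s = %s" % (var, key, param_fmt % key) for key in keys)


def record_to_statements(row, param_fmt="{%s}"):
    """Turn one JSON movie record into (statement, parameters) pairs that MERGE
    the movie, its actors and directors, and the ACTED_IN / DIRECTED relationships."""
    title = param_fmt % "title"
    movie_params = {key: row[key] for key in MOVIE_KEYS}
    movie_params["title"] = row["title"]
    statements = [("MERGE (m:Movie {title: %s}) SET %s"
                   % (title, _set_clause("m", MOVIE_KEYS, param_fmt)),
                   movie_params)]
    for field, label, rel_type in (("actors", "Actor", "ACTED_IN"),
                                   ("directors", "Director", "DIRECTED")):
        for person in row[field]:
            # MERGE the person on its unique name, then connect it to the movie
            statement = ("MATCH (m:Movie {title: %s}) "
                         "MERGE (p:%s {name: %s}) SET %s "
                         "MERGE (p)-[:%s]->(m)") % (
                title, label, param_fmt % "name",
                _set_clause("p", PERSON_KEYS, param_fmt), rel_type)
            params = {key: person[key] for key in PERSON_KEYS}
            params.update(title=row["title"], name=person["name"])
            statements.append((statement, params))
    return statements
```

With py2neo 2.x, each pair could then be run in a loop such as `for statement, params in record_to_statements(row): graph.cypher.execute(statement, params)`, which may also be a useful route if your py2neo version no longer offers `merge_one()`.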

> 4\. Display the total number of nodes and relationships, and also the number of nodes for each label and of relationships for each relationship type.

> 5\. Display, in chronological order, the list of movies in which Keanu Reeves acted (show only the title and release year).

> 6\. Find the number of all (unique!) actors in the database and the number of (also unique) actors who acted in movies released after 1990 with a rating from 3 to 4.

> 7\. Display all movies with at least two known directors, and also all movies (call them "free movies") that have no connections to other movies, i.e. none of their actors acted in other movies from the database and their director(s) did not direct any other movie in the database. In the Neo4j browser, "free movies" look like this:

> <img src="images/free_movies.jpg">