# About

This notebook introduces lists in Neo4j: the list data structure, the `range()` function, the `collect()` aggregating function, and the `UNWIND` clause.

In [1]:
from neo4j import GraphDatabase, Record, ResultSummary, EagerResult
from neo4j.time import Date

import pandas as pd
pd.set_option('display.max_colwidth', 100)

import os 
import sys
from dotenv import load_dotenv 
load_dotenv()

# Add the utils directory to sys.path
sys.path.append(os.path.abspath("../utils"))

from Neo4jParser import Neo4jParser


NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

## List, Range, and Collect

* A list in Neo4j is a structure like this: `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` or `['spencer', 'joe', 'myer', 'sarah']`
* We can generate lists in Neo4j with `range()`.
    * `range()` takes three arguments: start, end, and an optional step (default 1).
        * e.g. `range(0, 10, 2)` returns `[0, 2, 4, 6, 8, 10]`
    * It works similarly to `range()` in Python, except that the end value is inclusive.
    * We can index lists with either a single number `[0]` or a slice `[50..60]`.
        * NOTE: The upper bound of a slice is exclusive, so `[50..60]` returns the elements at indexes 50 through 59.
* To get the number of elements in a list, use `size()`.
* We can filter a list with a list comprehension, similar to Python.
    * e.g. `RETURN [x in range(0, 10) where x % 2 = 0 | x]`
        * This query returns all even numbers in the list.
* `collect()` is an aggregating function that gathers the values of an expression across rows into a single list.
    * ```
      MATCH (n:Person)
      RETURN collect(n.name);
      ```
    * This query would return a list of names
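
To make the inclusive end value concrete, here is a minimal Python sketch that mimics the behaviours listed above without a database. `cypher_range` is a hypothetical helper written for this illustration; it is not part of the Neo4j driver.

```python
def cypher_range(start, end, step=1):
    """Mimic Cypher's range(): unlike Python's range(), `end` is inclusive.

    Hypothetical helper for illustration only; not part of the neo4j driver.
    """
    if step == 0:
        raise ValueError("step must not be zero")
    # Move the stop value one unit past `end` in the direction of travel
    # so that `end` itself can be included.
    stop = end + 1 if step > 0 else end - 1
    return list(range(start, stop, step))


# Cypher: range(0, 10, 2)
evens = cypher_range(0, 10, 2)
# Cypher: range(0, 100)[50..60]  (upper bound of the slice is exclusive)
window = cypher_range(0, 100)[50:60]
# Cypher: size(range(0, 10))
count = len(cypher_range(0, 10))
# Cypher: [x in range(0, 10) where x % 2 = 0 | x]
filtered = [x for x in cypher_range(0, 10) if x % 2 == 0]

print(evens)     # [0, 2, 4, 6, 8, 10]
print(window)    # [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
print(count)     # 11
print(filtered)  # [0, 2, 4, 6, 8, 10]
```

Note that `size(range(0, 10))` is 11, not 10, because both endpoints are included.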

In [2]:
# Return a list from 0 to 10 in steps of 2 (the end value is inclusive).
result = driver.execute_query(
    """ 
    RETURN range(0, 10, 2) as lst
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 2 ms and completed after 4 ms.

Query executed against database: 'neo4j':  
    RETURN range(0, 10, 2) as lst
    


{'lst': [0, 2, 4, 6, 8, 10]}

In [3]:
# Return a list from -100 to 0 in steps of 10
result = driver.execute_query(
    """ 
    RETURN range(-100, 0, 10) as lst
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 1 ms and completed after 2 ms.

Query executed against database: 'neo4j':  
    RETURN range(-100, 0, 10) as lst
    


{'lst': [-100, -90, -80, -70, -60, -50, -40, -30, -20, -10, 0]}

In [4]:
# Slice a list with [start..end]; the end index is exclusive
result = driver.execute_query(
    """ 
    RETURN range(0, 100)[50..60];
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 1 ms and completed after 2 ms.

Query executed against database: 'neo4j':  
    RETURN range(0, 100)[50..60];
    


{'range(0, 100)[50..60]': [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]}

In [5]:
# Use list comprehension to return all multiples of 9 from 0 to 100
result = driver.execute_query(
    """ 
    RETURN [x in range(0, 100) where x % 9 = 0 | x] AS list;
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 2 ms and completed after 3 ms.

Query executed against database: 'neo4j':  
    RETURN [x in range(0, 100) where x % 9 = 0 | x] AS list;
    


{'list': [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99]}

In [6]:
# List comprehension that builds a map per element: x and x^x (x raised to the power of x)
result = driver.execute_query(
    """ 
    RETURN [x in range(0, 5) | {original_x: x, x_factor_of_x: x^x}] AS list;
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 2 ms and completed after 5 ms.

Query executed against database: 'neo4j':  
    RETURN [x in range(0, 5) | {original_x: x, x_factor_of_x: x^x}] AS list;
    


{'list': [{'original_x': 0, 'x_factor_of_x': 1.0},
  {'original_x': 1, 'x_factor_of_x': 1.0},
  {'original_x': 2, 'x_factor_of_x': 4.0},
  {'original_x': 3, 'x_factor_of_x': 27.0},
  {'original_x': 4, 'x_factor_of_x': 256.0},
  {'original_x': 5, 'x_factor_of_x': 3125.0}]}
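
The values come back as floats because `^` is Cypher's exponentiation operator and it returns a float. As a rough Python analogue of the comprehension above (a sketch only; Python's `range()` needs `6` as its stop value to include 5, and `float()` is added to match the Cypher result type):

```python
# Python analogue of:
#   [x in range(0, 5) | {original_x: x, x_factor_of_x: x^x}]
# range(0, 6) is used because Python excludes the stop value.
powers = [
    {"original_x": x, "x_factor_of_x": float(x ** x)}
    for x in range(0, 6)
]

for row in powers:
    print(row)
```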

In [None]:
# Practice pattern comprehension
result = driver.execute_query(
    """ 
    MATCH (p:Person {name: "Tom Hanks"})
    RETURN [(p)-->(m:Movie) | m.title] AS list;
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 1 ms and completed after 3 ms.

Query executed against database: 'neo4j':  
    MATCH (p:Person {name: "Tom Hanks"})
    RETURN [(p)-->(m:Movie) | m.title] AS list;
    


{'list': ["You've Got Mail",
  'Sleepless in Seattle',
  'Joe Versus the Volcano',
  'That Thing You Do',
  'Cloud Atlas',
  'The Da Vinci Code',
  'The Green Mile',
  'Apollo 13',
  'Cast Away',
  "Charlie Wilson's War",
  'The Polar Express',
  'A League of Their Own',
  'That Thing You Do']}
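
`That Thing You Do` appears twice in this output. A pattern comprehension emits one element per matching path, and `(p)-->(m:Movie)` matches an outgoing relationship of any type, so two relationships to the same movie would yield two entries. The sketch below reproduces that behaviour in plain Python on a toy edge list; the relationship types are an assumption made for illustration, since the output above shows only titles.

```python
# Toy edge list: (person, relationship type, movie title).
# ASSUMPTION: the relationship types are illustrative; the notebook output
# does not show which relationships produced each title.
edges = [
    ("Tom Hanks", "ACTED_IN", "Apollo 13"),
    ("Tom Hanks", "ACTED_IN", "That Thing You Do"),
    ("Tom Hanks", "DIRECTED", "That Thing You Do"),
]

# Analogue of [(p)-->(m:Movie) | m.title]: one element per matching edge,
# regardless of relationship type, so duplicates are possible.
titles = [title for person, _rel_type, title in edges if person == "Tom Hanks"]

# Order-preserving de-duplication, if each title should appear once.
unique_titles = list(dict.fromkeys(titles))

print(titles)         # ['Apollo 13', 'That Thing You Do', 'That Thing You Do']
print(unique_titles)  # ['Apollo 13', 'That Thing You Do']
```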

In [15]:
# Aggregate every Person's name into a single list with collect()
result = driver.execute_query(
    """ 
    MATCH (n:Person)
    RETURN collect(n.name) as list_of_names;
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 28 ms and completed after 29 ms.

Query executed against database: 'neo4j':  
    MATCH (n:Person)
    RETURN collect(n.name) as list_of_names;
    


{'list_of_names': ['Aaron Sorkin',
  'Al Pacino',
  'Angela Scope',
  'Annabella Sciorra',
  'Anthony Edwards',
  'Audrey Tautou',
  'Ben Miles',
  'Bill Paxton',
  'Bill Pullman',
  'Billy Crystal',
  'Bonnie Hunt',
  'Brooke Langton',
  'Bruno Kirby',
  'Cameron',
  'Cameron Crowe',
  'Carrie Fisher',
  'Carrie-Anne Moss',
  'Charlize Theron',
  'Chris Columbus',
  'Christian Bale',
  'Christina Ricci',
  'Christopher Guest',
  'Clint Eastwood',
  'Corey Feldman',
  'Cuba Gooding Jr.',
  'Danny DeVito',
  'Dave Chappelle',
  'David Mitchell',
  'David Morse',
  'Demi Moore',
  'Diane Keaton',
  'Dina Meyer',
  'Ed Harris',
  'Emil Eifrem',
  'Emile Hirsch',
  'Ethan Hawke',
  'Frank Darabont',
  'Frank Langella',
  'Gary Sinise',
  'Geena Davis',
  'Gene Hackman',
  'Greg Kinnear',
  'Halle Berry',
  'Helen Hunt',
  'Howard Deutch',
  'Hugo Weaving',
  'Ian McKellen',
  'Ice-T',
  'J.T. Walsh',
  'Jack Nicholson',
  'James Cromwell',
  'James L. Brooks',
  'James Marshall',
  'James 

## `UNWIND`

* `UNWIND` expands a list into individual rows, one row per element. It is effectively the inverse of `collect()`.
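
As a mental model, `collect()` turns many rows into one list and `UNWIND` turns one list back into many rows. The following is a minimal pure-Python sketch of that round trip; `collect` and `unwind` here are hypothetical stand-ins written for illustration, not driver functions.

```python
# Rows as a query would produce them: one record per matched node.
rows = [{"name": "Aaron Sorkin"}, {"name": "Al Pacino"}, {"name": "Angela Scope"}]


def collect(records, key):
    """Aggregate one column of many rows into a single list.

    Like Cypher's collect(), null (None) values are skipped.
    """
    return [r[key] for r in records if r.get(key) is not None]


def unwind(values, alias):
    """Expand a list into one row per element (like UNWIND values AS alias)."""
    return [{alias: value} for value in values]


list_of_names = collect(rows, "name")     # many rows -> one list
unwound = unwind(list_of_names, "names")  # one list  -> many rows

print(list_of_names)
print(unwound)
```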

In [23]:
# Unwind a collect aggregation
result = driver.execute_query(
    """ 
    MATCH (n:Person)
    WITH collect(n.name) as list_of_names
    UNWIND list_of_names as names
    RETURN names LIMIT 10;
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 10 records after 48 ms and completed after 52 ms.

Query executed against database: 'neo4j':  
    MATCH (n:Person)
    WITH collect(n.name) as list_of_names
    UNWIND list_of_names as names
    RETURN names LIMIT 10;
    


{'names': ['Aaron Sorkin',
  'Al Pacino',
  'Angela Scope',
  'Annabella Sciorra',
  'Anthony Edwards',
  'Audrey Tautou',
  'Ben Miles',
  'Bill Paxton',
  'Bill Pullman',
  'Billy Crystal']}

**NOTE:** This result looks the same as if `list_of_names` were returned. This is because of the way the `parse()` method works. To show the difference, let's output the results using `simple_parse()` instead.

In [26]:
# Same query as above, parsed with simple_parse() to show one record per row
result = driver.execute_query(
    """ 
    MATCH (n:Person)
    WITH collect(n.name) as list_of_names
    UNWIND list_of_names as names
    RETURN names LIMIT 10;
    """,
    database_="neo4j"
)

data = Neo4jParser.simple_parse(result)
data

[{'names': 'Aaron Sorkin'},
 {'names': 'Al Pacino'},
 {'names': 'Angela Scope'},
 {'names': 'Annabella Sciorra'},
 {'names': 'Anthony Edwards'},
 {'names': 'Audrey Tautou'},
 {'names': 'Ben Miles'},
 {'names': 'Bill Paxton'},
 {'names': 'Bill Pullman'},
 {'names': 'Billy Crystal'}]
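
The two output shapes can be reproduced without a database. The sketch below is not the `Neo4jParser` implementation, which this notebook does not show; `to_columns` is a hypothetical helper that only illustrates how several one-value rows can end up looking like a single list once they are grouped by column name.

```python
from collections import defaultdict

# Row-oriented shape: one dict per record, as in the simple_parse() output.
records = [
    {"names": "Aaron Sorkin"},
    {"names": "Al Pacino"},
    {"names": "Angela Scope"},
]


def to_columns(rows):
    """Group row values by key (hypothetical helper, for illustration only)."""
    columns = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return dict(columns)


columns = to_columns(records)

print(len(records))  # 3 separate rows
print(columns)       # {'names': ['Aaron Sorkin', 'Al Pacino', 'Angela Scope']}
```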

In [29]:
# Return a distinct result from a list
result = driver.execute_query(
    """ 
    WITH [1,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,1,2,3,4,4,5] as list
    UNWIND list as rows
    WITH DISTINCT rows
    RETURN collect(rows) as distinct_list;
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 39 ms and completed after 40 ms.

Query executed against database: 'neo4j':  
    WITH [1,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,1,2,3,4,4,5] as list
    UNWIND list as rows
    WITH DISTINCT rows
    RETURN collect(rows) as distinct_list;
    


{'distinct_list': [1, 2, 3, 4, 5, 6]}

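**NOTE:** `DISTINCT` operates on rows, not on the elements of a list, so the method is a bit more roundabout. We convert the list to rows with `UNWIND`, apply `DISTINCT` to the rows, and then `collect()` the final result. The last two steps can also be combined as `RETURN collect(DISTINCT rows)`.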
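
For comparison, the same three steps can be sketched in plain Python. `distinct_in_order` is a hypothetical helper for illustration; it keeps the first occurrence of each value, which matches the order of the output shown above.

```python
values = [1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 1, 2, 3, 4, 4, 5]


def distinct_in_order(items):
    """Return the unique items, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:            # analogue of UNWIND: one element at a time
        if item not in seen:      # analogue of DISTINCT: drop repeats
            seen.add(item)
            result.append(item)   # analogue of collect(): rebuild a list
    return result


distinct_list = distinct_in_order(values)
print(distinct_list)  # [1, 2, 3, 4, 5, 6]

# The built-in dict preserves insertion order, so this is equivalent:
assert distinct_list == list(dict.fromkeys(values))
```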

In [33]:
# Combine two lists into one
result = driver.execute_query(
    """ 
    WITH [1,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,1,2,3,4,4,5] as a, [10, 11, 12, 10, 11, 12] as b
    WITH a + b as combined_list
    UNWIND combined_list as ab
    WITH DISTINCT ab
    RETURN collect(ab);
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 37 ms and completed after 38 ms.

Query executed against database: 'neo4j':  
    WITH [1,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,1,2,3,4,4,5] as a, [10, 11, 12, 10, 11, 12] as b
    WITH a + b as combined_list
    UNWIND combined_list as ab
    WITH DISTINCT ab
    RETURN collect(ab);
    


{'collect(ab)': [1, 2, 3, 4, 5, 6, 10, 11, 12]}

In [39]:
# Flatten a nested list: one UNWIND per nesting level (a non-list value passes through unchanged)
result = driver.execute_query(
    """ 
    WITH [1,2,['a','b','c'],[3,[10,11,12,[69],'turkey'],3,4,5]] as L
    UNWIND L as L1
    UNWIND L1 as L2
    UNWIND L2 as L3
    UNWIND L3 as L4
    RETURN collect(L4);
    """,
    database_="neo4j"
)

data = Neo4jParser.parse(result, True, False)
data

Started streaming 1 records after 48 ms and completed after 49 ms.

Query executed against database: 'neo4j':  
    WITH [1,2,['a','b','c'],[3,[10,11,12,[69],'turkey'],3,4,5]] as L
    UNWIND L as L1
    UNWIND L1 as L2
    UNWIND L2 as L3
    UNWIND L3 as L4
    RETURN collect(L4);
    


{'collect(L4)': [1, 2, 'a', 'b', 'c', 3, 10, 11, 12, 69, 'turkey', 3, 4, 5]}
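
The query above uses one `UNWIND` per nesting level, so it has to know the maximum depth (four here) in advance. As a point of comparison, a recursive Python sketch can flatten a list of any depth. `flatten` is a hypothetical helper written for this illustration; like `UNWIND`, it treats a non-list value as a single element.

```python
def flatten(value):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    if not isinstance(value, list):
        # A non-list value is treated as a single element.
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten(item))
    return flat


nested = [1, 2, ['a', 'b', 'c'], [3, [10, 11, 12, [69], 'turkey'], 3, 4, 5]]
flat = flatten(nested)
print(flat)
# [1, 2, 'a', 'b', 'c', 3, 10, 11, 12, 69, 'turkey', 3, 4, 5]
```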