
Commit

Make sure results are not empty in the Cypher tutorial
* Consistently use one-line headings for the GraphGist format.
* Add testing around headings.
* Add data so that no queries are performed on empty graphs.
* Add new example for WITH.
nawroth committed Oct 2, 2015
1 parent 195baee commit 86ad848
Showing 11 changed files with 110 additions and 64 deletions.
@@ -1,5 +1,4 @@
Importing CSV files with Cypher
===============================
= Importing CSV files with Cypher

//file:movies.csv
//file:roles.csv
@@ -1,5 +1,20 @@
= How to Compose Large Statements

Let's first add some data that we can retrieve results from:

[source,cypher]
----
CREATE (matrix:Movie {title:"The Matrix",released:1997})
CREATE (cloudAtlas:Movie {title:"Cloud Atlas",released:2012})
CREATE (forrestGump:Movie {title:"Forrest Gump",released:1994})
CREATE (keanu:Person {name:"Keanu Reeves", born:1964})
CREATE (robert:Person {name:"Robert Zemeckis", born:1951})
CREATE (tom:Person {name:"Tom Hanks", born:1956})
CREATE (tom)-[:ACTED_IN {roles:["Forrest"]}]->(forrestGump)
CREATE (tom)-[:ACTED_IN {roles:['Zachry']} ]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)
----

== Combine statements with UNION

A Cypher statement is usually quite compact.
@@ -11,11 +26,11 @@ For instance if you want to list both actors and directors without using the alt

[source,cypher]
----
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p,type(r) as rel,m
MATCH (actor:Person)-[r:ACTED_IN]->(movie:Movie)
RETURN actor.name AS name, type(r) AS acted_in, movie.title AS title
UNION
MATCH (p:Person)-[r:DIRECTED]->(m:Movie)
RETURN p,type(r) as rel,m
MATCH (director:Person)-[r:DIRECTED]->(movie:Movie)
RETURN director.name AS name, type(r) AS acted_in, movie.title AS title
----

//table
@@ -31,14 +46,23 @@ You use the `WITH` clause to combine the individual parts and declare which data
`WITH` is very much like `RETURN` with the difference that it doesn't finish a query but prepares the input for the next part.
You can use the same expressions, aggregations, ordering and pagination as in the `RETURN` clause.

The only difference is that you _have to_ alias all columns as they would otherwise not be accessible with an identifier.
Every column that you don't declare in your `WITH` clause is not available in subsequent query parts.
The only difference is that you _must_ alias all columns as they would otherwise not be accessible.
Only columns that you declare in your `WITH` clause are available in subsequent query parts.

See below for an example where we collect the movies someone appeared in, and then filter out those who appeared in only one movie.

[source,cypher]
----
MATCH (person:Person)-[:ACTED_IN]->(m:Movie)
WITH person, count(*) as appearances, collect(m.title) as movies
WHERE appearances > 1
RETURN person.name, appearances, movies
----

//table

[TIP]
If you want to filter by an aggregated value in SQL or simlilar languages you would have to use `HAVING`.
If you want to filter by an aggregated value in SQL or similar languages you would have to use `HAVING`.
That's a single purpose clause for filtering aggregated information.
In Cypher, `WHERE` can be used in both cases.
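The aggregate-then-filter step that `WITH ... WHERE` (or SQL's `HAVING`) performs can be sketched outside Cypher. A minimal Python analogy, using made-up in-memory rows rather than the movie graph:

```python
from collections import defaultdict

# Aggregate first, filter second -- the step Cypher handles with WITH ... WHERE.
# Hypothetical rows standing in for ACTED_IN relationships.
acted_in = [
    ("Tom Hanks", "Forrest Gump"),
    ("Tom Hanks", "Cloud Atlas"),
    ("Keanu Reeves", "The Matrix"),
]

movies_by_person = defaultdict(list)
for name, title in acted_in:
    movies_by_person[name].append(title)  # like collect(m.title)

# WHERE appearances > 1, applied after the aggregation -- HAVING in SQL terms
result = {name: titles for name, titles in movies_by_person.items()
          if len(titles) > 1}
```

The point of the analogy is the ordering: the predicate runs on the aggregated value, not on the raw rows.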

// example to go here


@@ -1,16 +1,43 @@
= Utilizing Data Structures

//file:movies.csv
//file:roles.csv
//file:persons.csv
//file:movie_actor_roles.csv

Cypher can create and consume more complex data structures out of the box.
As already mentioned you can create literal lists (`[1,2,3]`) and maps (`{name: value}`) within a statement.

There is a number of functions that work with lists, from simple ones like `length(list)` that returns the size of a list to
There are a number of functions that work with lists.
They range from simple ones like `size(list)` that returns the size of a list to `reduce`, which runs an expression against the elements and accumulates the results.
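As a rough analogy in Python (illustrative only, not Cypher syntax), `size(list)` corresponds to `len()` and Cypher's `reduce` to `functools.reduce` with an explicit initial accumulator:

```python
from functools import reduce

# Python counterparts of the list functions above (analogy, not the Cypher API):
# size(list) ~ len(list); reduce(acc = 0, x IN list | acc + x) ~ functools.reduce.
born = [1964, 1951, 1956]

count = len(born)                                # size(list)
total = reduce(lambda acc, x: acc + x, born, 0)  # reduce(...)
```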

// missing content here
Let's first load a bit of data into the graph.
If you want more details on how the data is loaded, see <<cypher-intro-importing-csv>>.

[source,cypher]
----
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN m.title as movie, collect(a.name)[0..5] as five_of_cast
LOAD CSV WITH HEADERS FROM "movies.csv" AS line
CREATE (m:Movie {id:line.id,title:line.title, released:toInt(line.year)});
LOAD CSV WITH HEADERS FROM "persons.csv" AS line
MERGE (a:Person {id:line.id}) ON CREATE SET a.name=line.name;
LOAD CSV WITH HEADERS FROM "roles.csv" AS line
MATCH (m:Movie {id:line.movieId})
MATCH (a:Person {id:line.personId})
CREATE (a)-[:ACTED_IN {roles:[line.role]}]->(m);
LOAD CSV WITH HEADERS FROM "movie_actor_roles.csv" AS line FIELDTERMINATOR ";"
MERGE (m:Movie {title:line.title}) ON CREATE SET m.released = toInt(line.released)
MERGE (a:Person {name:line.actor}) ON CREATE SET a.born = toInt(line.born)
MERGE (a)-[:ACTED_IN {roles:split(line.characters,",") }]->(m)
----

Now, let's try out data structures.

To begin with, collect the names of the actors per movie, and return two of them:

[source,cypher]
----
MATCH (movie:Movie)<-[:ACTED_IN]-(actor:Person)
RETURN movie.title as movie, collect(actor.name)[0..2] as two_of_cast
----

//table
@@ -26,9 +53,8 @@ There are list predicates to satisfy conditions for `all`, `any`, `none` and `si
[source,cypher]
----
MATCH path = (:Person)-->(:Movie)<--(:Person)
WHERE all(r in rels(path) WHERE type(r) = 'ACTED_IN')
AND any(n in nodes(path) WHERE n.name = 'Clint Eastwood')
RETURN path
WHERE any(n in nodes(path) WHERE n.name = 'Michael Douglas')
RETURN extract(n IN nodes(path)| coalesce(n.name, n.title))
----

//table
@@ -58,45 +84,30 @@ In a graph-query you can filter or aggregate collected values instead or work on
----
MATCH (m:Movie)<-[r:ACTED_IN]-(a:Person)
WITH m.title as movie, collect({name: a.name, roles: r.roles}) as cast
RETURN movie, extract(c2 IN filter(c1 IN cast WHERE c1.name =~ "T.*") | c2.roles )
----

//table

Cypher offers to create and consume more complex data structures out of the box.
As already mentioned you can create literal lists (`[1,2,3]`) and maps (`{name: value}`) within your statement.

There is a number of functions to work with lists, from simple ones like `length(list)` that returns the size of a list to

[source,cypher]
----
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN m.title as movie, collect(a.name)[0..5] as five_of_cast
RETURN movie, filter(actor IN cast WHERE actor.name STARTS WITH "M")
----

//table

You can also access individual elements or slices of a list quickly with `list[1]` or `list[5..-5]`.
Other functions to access parts of a list are `head(list)`, `tail(list)` and `last(list)`.
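These accessors map closely onto Python indexing and slicing; a sketch of the correspondence (an analogy, not the Cypher syntax itself):

```python
# Python analogy for Cypher list access: list[1] indexes, list[1..-1] slices
# from both ends, and head/tail/last map onto [0], [1:], and [-1].
cast = ["Tom Hanks", "Halle Berry", "Hugo Weaving", "Jim Broadbent"]

second = cast[1]         # list[1] -- zero-based in both languages
trimmed = cast[1:-1]     # roughly list[1..-1]
head, tail, last = cast[0], cast[1:], cast[-1]   # head(list), tail(list), last(list)
```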

== Unwind Lists

Sometimes you have collected information into a list, but want to use each element individually as a row.
For instance, you might want to further match patterns in the graph.
Or you passed in a collection of values but now want to create or match a node or relationship for each element.
Then you can use the `UNWIND` clause to unroll a list into a sequence of rows again.
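The unrolling itself is easy to picture in Python (hypothetical rows, just to show the shape of the operation):

```python
# UNWIND sketched in Python: turn a collected list back into one row per element.
rows = [("Tom Hanks", ["Forrest Gump", "Cloud Atlas"]),
        ("Keanu Reeves", ["The Matrix"])]

# one (actor, movie) row per list element, like UNWIND movies AS m
unwound = [(name, movie) for name, movies in rows for movie in movies]
```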

For instance, a query to find the top 5-co-actors and then follow their movies and again list the cast for each of those movies:
For instance, a query to find the top 3 co-actors and then follow their movies and again list the cast for each of those movies:

[source,cypher]
----
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(colleague:Person)
WITH colleague, count(*) as frequency, collect(distinct m) as movies
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(colleague:Person)
WHERE actor.name < colleague.name
WITH actor, colleague, count(*) AS frequency, collect(movie) AS movies
ORDER BY frequency DESC
LIMIT 5
UNWIND movies as m
LIMIT 3
UNWIND movies AS m
MATCH (m)<-[:ACTED_IN]-(a)
RETURN m.title as movie, collect(a.name) as cast
RETURN m.title AS movie, collect(a.name) AS cast
----

//table
@@ -11,12 +11,13 @@ Naturally in most cases you wouldn't want to write or generate huge statements t

That process not only includes creating completely new data but also integrating with existing structures and updating your graph.

[[cypher-intro-load-parameters]]
== Parameters

In general we recommend passing in varying literal values from the outside as named parameters.
This allows Cypher to reuse existing execution plans for the statements.
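Why parameters enable reuse can be shown with a toy cache keyed by query text (a sketch of the general idea, not Neo4j's actual plan cache):

```python
# Toy plan cache: plans are looked up by query string, so embedding literals
# creates a new entry per value, while a parameterized query reuses one entry.
plan_cache = {}

def plan_for(query):
    if query not in plan_cache:
        plan_cache[query] = f"plan#{len(plan_cache)}"
    return plan_cache[query]

# Literals in the text: one plan per distinct name
for name in ["Tom Hanks", "Keanu Reeves"]:
    plan_for(f"MATCH (p:Person {{name:'{name}'}}) RETURN p")

# Parameterized: a single plan, reused for every value
for name in ["Tom Hanks", "Keanu Reeves"]:
    plan_for("MATCH (p:Person {name: $name}) RETURN p")
```

After both loops the cache holds three plans: two literal variants plus one parameterized entry.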

Of course you can also pass in parameters for data to be imported.
Of course you can also pass in parameters for data to be imported.
Those can be scalar values, maps, lists or even lists of maps.

In your Cypher statement you can then iterate over those values (e.g. with `UNWIND`) to create your graph structures.
@@ -42,6 +43,7 @@ FOREACH (role IN movie.cast |
)
----

[[cypher-intro-importing-csv]]
== Importing CSV

Cypher provides an elegant built-in way to import tabular CSV data into graph structures.
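Conceptually, `LOAD CSV` reads header-keyed rows and lets you build one graph entity per row; the same shape in plain Python, with hypothetical inline data instead of the tutorial's movies.csv:

```python
import csv
import io

# What LOAD CSV does conceptually: iterate header-keyed rows, build one
# node-like record per row (inline sample data, not the real movies.csv).
data = io.StringIO("id,title,year\n1,The Matrix,1997\n2,Cloud Atlas,2012\n")

movies = [{"id": row["id"], "title": row["title"], "released": int(row["year"])}
          for row in csv.DictReader(data)]
```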
@@ -59,7 +61,7 @@ include::../../graphgists/intro/movies.csv[]

[source,cypher]
----
LOAD CSV WITH HEADERS FROM "movies.csv" AS line
LOAD CSV WITH HEADERS FROM "movies.csv" AS line
CREATE (m:Movie {id:line.id,title:line.title, released:toInt(line.year)});
----

@@ -71,7 +73,7 @@ include::../../graphgists/intro/persons.csv[]

[source,cypher]
----
LOAD CSV WITH HEADERS FROM "persons.csv" AS line
LOAD CSV WITH HEADERS FROM "persons.csv" AS line
MERGE (a:Person {id:line.id}) ON CREATE SET a.name=line.name;
----

@@ -83,7 +85,7 @@ include::../../graphgists/intro/roles.csv[]

[source,cypher]
----
LOAD CSV WITH HEADERS FROM "roles.csv" AS line
LOAD CSV WITH HEADERS FROM "roles.csv" AS line
MATCH (m:Movie {id:line.movieId})
MATCH (a:Person {id:line.personId})
CREATE (a)-[:ACTED_IN {roles:[line.role]}]->(m);
@@ -1,5 +1,4 @@
Uniqueness
==========
= Uniqueness

While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
In most use cases, this is a sensible thing to do.
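The rule can be sketched as a tiny traversal in Python (toy undirected graph, not Neo4j internals): when expanding a pattern like `(a)--(b)--(c)`, the edge you arrived on is excluded, otherwise every relationship would yield a trivial there-and-back match.

```python
# Relationship uniqueness, sketched: never reuse an edge within one match.
edges = [("A", "B"), ("B", "C")]

def neighbors(node):
    # yield (edge_index, other_endpoint) for every edge touching node
    for i, (x, y) in enumerate(edges):
        if x == node:
            yield i, y
        if y == node:
            yield i, x

def two_hop(start):
    matches = []
    for r1, mid in neighbors(start):
        for r2, end in neighbors(mid):
            if r2 == r1:   # the uniqueness rule: skip the edge we came in on
                continue
            matches.append((start, mid, end))
    return matches
```

Without the `r2 == r1` check, starting from `A` would also produce the spurious match `('A', 'B', 'A')` over the single A-B relationship.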
8 changes: 4 additions & 4 deletions manual/cypher/cypher-docs/src/docs/intro/index.adoc
@@ -42,10 +42,6 @@ include::../parsed-graphgists/intro/compose-statements.adoc[]

:leveloffset: 2

include::../parsed-graphgists/intro/data-structures.adoc[]

:leveloffset: 2

include::../parsed-graphgists/intro/labels.adoc[]

//include::indexes-and-constraints.adoc[]
@@ -56,6 +52,10 @@ include::../parsed-graphgists/intro/loading-data.adoc[]

:leveloffset: 2

include::../parsed-graphgists/intro/data-structures.adoc[]

:leveloffset: 2

include::../parsed-graphgists/sql/cypher-vs-sql.asciidoc[]


@@ -51,10 +51,9 @@ enum BlockType
boolean isA( List<String> block )
{
int size = block.size();
return size > 0 && ( ( block.get( 0 )
.startsWith( "=" ) && !block.get( 0 )
.startsWith( "==" ) ) || size > 1 && block.get( 1 )
.startsWith( "=" ) );
return size > 0 &&
( ( block.get( 0 ).startsWith( "=" )
&& !block.get( 0 ).startsWith( "==" )));
}

@Override
Expand Up @@ -118,7 +118,7 @@ static List<Block> parseBlocks( String input )
String[] lines = input.split( EOL );
if ( lines.length < 3 )
{
throw new IllegalArgumentException( "To little content, only "
throw new IllegalArgumentException( "Not enough content, only "
+ lines.length + " lines." );
}
List<Block> blocks = new ArrayList<>();
@@ -117,13 +117,22 @@ public void titleWithCharsToIgnore()
}

@Test
public void twoLineTitle()
public void ignore_second_level_heading()
{
Block block = Block.getBlock( Arrays.asList( "Title here", "==========" ) );
assertThat( block.type, sameInstance( BlockType.TITLE ) );
Block block = Block.getBlock( Arrays.asList( "== Title here" ) );
assertThat( block.type, sameInstance( BlockType.TEXT ) );
String output = block.process( state );
assertThat( output, containsString( "[[cypherdoc-title-here]]" ) );
assertThat( output, containsString( "= Title here =" ) );
assertThat( output, containsString( "== Title here" ) );
}

@Test
public void ignore_second_level_heading_with_id()
{
Block block = Block.getBlock( Arrays.asList( "[[my-id]]", "== Title here" ) );
assertThat( block.type, sameInstance( BlockType.TEXT ) );
String output = block.process( state );
assertThat( output, containsString( "[[my-id]]" ) );
assertThat( output, containsString( "== Title here" ) );
}

@Test
Expand Up @@ -54,11 +54,11 @@ public void fullDocumentBlockParsing() throws IOException
assertThat( types, equalTo( Arrays.asList( BlockType.TITLE, BlockType.TEXT, BlockType.HIDE,
BlockType.SETUP, BlockType.CYPHER, BlockType.QUERYTEST, BlockType.TABLE, BlockType.GRAPH, BlockType.TEXT,
BlockType.OUTPUT, BlockType.PARAMETERS, BlockType.CYPHER, BlockType.QUERYTEST, BlockType.PROFILE,
BlockType.GRAPH_RESULT, BlockType.SQL, BlockType.SQL_TABLE ) ) );
BlockType.GRAPH_RESULT, BlockType.SQL, BlockType.SQL_TABLE, BlockType.TEXT ) ) );
}

@Test
public void toLittleContentBlockParsing()
public void notEnoughContentBlockParsing()
{
expectedException.expect( IllegalArgumentException.class );
CypherDoc.parseBlocks( "x\ny\n" );
Expand Up @@ -57,3 +57,6 @@ VALUES(0)

// sqltable

[[my-id]]
== Second level heading

