
Commit

Make sure results are not empty in the Cypher tutorial
* Consistently use one-line headings for the GraphGist format.
* Add testing around headings.
* Add data so that no queries are performed on empty graphs.
* Add new example for WITH.
nawroth committed Oct 2, 2015
1 parent 195baee commit 86ad848
Showing 11 changed files with 110 additions and 64 deletions.
@@ -1,5 +1,4 @@
= Importing CSV files with Cypher


//file:movies.csv
//file:roles.csv
@@ -1,5 +1,20 @@
= How to Compose Large Statements


Let's first get some data in to retrieve results from:

[source,cypher]
----
CREATE (matrix:Movie {title:"The Matrix",released:1997})
CREATE (cloudAtlas:Movie {title:"Cloud Atlas",released:2012})
CREATE (forrestGump:Movie {title:"Forrest Gump",released:1994})
CREATE (keanu:Person {name:"Keanu Reeves", born:1964})
CREATE (robert:Person {name:"Robert Zemeckis", born:1951})
CREATE (tom:Person {name:"Tom Hanks", born:1956})
CREATE (tom)-[:ACTED_IN {roles:["Forrest"]}]->(forrestGump)
CREATE (tom)-[:ACTED_IN {roles:['Zachry']} ]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)
----

== Combine statements with UNION


A Cypher statement is usually quite compact.
@@ -11,11 +26,11 @@ For instance if you want to list both actors and directors without using the alt


[source,cypher]
----
MATCH (actor:Person)-[r:ACTED_IN]->(movie:Movie)
RETURN actor.name AS name, type(r) AS acted_in, movie.title AS title
UNION
MATCH (director:Person)-[r:DIRECTED]->(movie:Movie)
RETURN director.name AS name, type(r) AS acted_in, movie.title AS title
----


//table
@@ -31,14 +46,23 @@ You use the `WITH` clause to combine the individual parts and declare which data
`WITH` is very much like `RETURN` with the difference that it doesn't finish a query but prepares the input for the next part.
You can use the same expressions, aggregations, ordering and pagination as in the `RETURN` clause.


The only difference is that you _must_ alias all columns as they would otherwise not be accessible.
Only columns that you declare in your `WITH` clause are available in subsequent query parts.

See below for an example where we collect the movies each person appeared in, and then filter out everyone who appeared in only one movie.

[source,cypher]
----
MATCH (person:Person)-[:ACTED_IN]->(m:Movie)
WITH person, count(*) as appearances, collect(m.title) as movies
WHERE appearances > 1
RETURN person.name, appearances, movies
----

//table


[TIP]
If you want to filter by an aggregated value in SQL or similar languages, you would have to use `HAVING`.
That's a single-purpose clause for filtering aggregated information.
In Cypher, `WHERE` can be used in both cases.


// example to go here
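For instance, a sketch of that pattern (using the data created at the top of this section): count the movies per actor and filter on the aggregated count directly with `WHERE`, where SQL would need `HAVING`.

[source,cypher]
----
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WITH actor, count(movie) AS movie_count
WHERE movie_count >= 2
RETURN actor.name AS name, movie_count
----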


@@ -1,16 +1,43 @@
= Utilizing Data Structures


//file:movies.csv
//file:roles.csv
//file:persons.csv
//file:movie_actor_roles.csv

Cypher can create and consume more complex data structures out of the box.
As already mentioned you can create literal lists (`[1,2,3]`) and maps (`{name: value}`) within a statement.


There are a number of functions that work with lists.
They range from simple ones like `size(list)` that returns the size of a list to `reduce`, which runs an expression against the elements and accumulates the results.
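For instance, both can be tried on a literal list; this illustrative query needs no graph data:

[source,cypher]
----
RETURN size([1,2,3]) AS list_size,
       reduce(total = 0, n IN [1,2,3] | total + n) AS list_sum
----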


Let's first load a bit of data into the graph.
If you want more details on how the data is loaded, see <<cypher-intro-importing-csv>>.


[source,cypher]
----
LOAD CSV WITH HEADERS FROM "movies.csv" AS line
CREATE (m:Movie {id:line.id,title:line.title, released:toInt(line.year)});
LOAD CSV WITH HEADERS FROM "persons.csv" AS line
MERGE (a:Person {id:line.id}) ON CREATE SET a.name=line.name;
LOAD CSV WITH HEADERS FROM "roles.csv" AS line
MATCH (m:Movie {id:line.movieId})
MATCH (a:Person {id:line.personId})
CREATE (a)-[:ACTED_IN {roles:[line.role]}]->(m);
LOAD CSV WITH HEADERS FROM "movie_actor_roles.csv" AS line FIELDTERMINATOR ";"
MERGE (m:Movie {title:line.title}) ON CREATE SET m.released = toInt(line.released)
MERGE (a:Person {name:line.actor}) ON CREATE SET a.born = toInt(line.born)
MERGE (a)-[:ACTED_IN {roles:split(line.characters,",") }]->(m)
----

Now, let's try out data structures.

To begin with, collect the names of the actors per movie, and return two of them:

[source,cypher]
----
MATCH (movie:Movie)<-[:ACTED_IN]-(actor:Person)
RETURN movie.title as movie, collect(actor.name)[0..2] as two_of_cast
----


//table
@@ -26,9 +53,8 @@ There are list predicates to satisfy conditions for `all`, `any`, `none` and `single`.
[source,cypher]
----
MATCH path = (:Person)-->(:Movie)<--(:Person)
WHERE any(n in nodes(path) WHERE n.name = 'Michael Douglas')
RETURN extract(n IN nodes(path)| coalesce(n.name, n.title))
----


//table
@@ -58,45 +84,30 @@ In a graph-query you can filter or aggregate collected values instead or work on
----
MATCH (m:Movie)<-[r:ACTED_IN]-(a:Person)
WITH m.title as movie, collect({name: a.name, roles: r.roles}) as cast
RETURN movie, filter(actor IN cast WHERE actor.name STARTS WITH "M")
----

//table


You can also access individual elements or slices of a list quickly with `list[1]` or `list[5..-5]`.
Other functions to access parts of a list are `head(list)`, `tail(list)` and `last(list)`.
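For instance, applied to a literal list (an illustrative query):

[source,cypher]
----
WITH [1,2,3,4,5] AS list
RETURN list[1] AS second, list[1..3] AS slice, head(list) AS first, tail(list) AS rest, last(list) AS final
----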

== Unwind Lists


Sometimes you have collected information into a list, but want to use each element individually as a row.
For instance, you might want to further match patterns in the graph.
Or you passed in a collection of values but now want to create or match a node or relationship for each element.
Then you can use the `UNWIND` clause to unroll a list into a sequence of rows again.
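In its simplest form, `UNWIND` turns a literal list into one row per element (an illustrative query):

[source,cypher]
----
UNWIND [1,2,3] AS value
RETURN value
----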


For instance, here is a query that finds the top 3 co-actor pairs, then follows their movies and again lists the cast for each of those movies:


[source,cypher]
----
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(colleague:Person)
WHERE actor.name < colleague.name
WITH actor, colleague, count(*) AS frequency, collect(movie) AS movies
ORDER BY frequency DESC
LIMIT 3
UNWIND movies AS m
MATCH (m)<-[:ACTED_IN]-(a)
RETURN m.title AS movie, collect(a.name) AS cast
----


//table
@@ -11,12 +11,13 @@ Naturally in most cases you wouldn't want to write or generate huge statements t


That process not only includes creating completely new data but also integrating with existing structures and updating your graph.


[[cypher-intro-load-parameters]]
== Parameters


In general we recommend passing in varying literal values from the outside as named parameters.
This allows Cypher to reuse existing execution plans for the statements.


Of course you can also pass in parameters for data to be imported.
Those can be scalar values, maps, lists or even lists of maps.


In your Cypher statement you can then iterate over those values (e.g. with `UNWIND`) to create your graph structures.
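As a sketch of that pattern (the parameter name `actors` and its shape are assumptions for illustration, not taken from the manual), a list of maps can be unwound and merged into the graph; `{actors}` is the parameter placeholder syntax used by the Cypher version this manual describes:

[source,cypher]
----
// assumes a parameter like: actors = [{name:"Tom Hanks", born:1956}, {name:"Keanu Reeves", born:1964}]
UNWIND {actors} AS row
MERGE (person:Person {name: row.name})
ON CREATE SET person.born = row.born
RETURN person.name, person.born
----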
@@ -42,6 +43,7 @@ FOREACH (role IN movie.cast |
)
----


[[cypher-intro-importing-csv]]
== Importing CSV


Cypher provides an elegant built-in way to import tabular CSV data into graph structures.
@@ -59,7 +61,7 @@ include::../../graphgists/intro/movies.csv[]


[source,cypher]
----
LOAD CSV WITH HEADERS FROM "movies.csv" AS line
CREATE (m:Movie {id:line.id,title:line.title, released:toInt(line.year)});
----


@@ -71,7 +73,7 @@ include::../../graphgists/intro/persons.csv[]


[source,cypher]
----
LOAD CSV WITH HEADERS FROM "persons.csv" AS line
MERGE (a:Person {id:line.id}) ON CREATE SET a.name=line.name;
----


@@ -83,7 +85,7 @@ include::../../graphgists/intro/roles.csv[]


[source,cypher]
----
LOAD CSV WITH HEADERS FROM "roles.csv" AS line
MATCH (m:Movie {id:line.movieId})
MATCH (a:Person {id:line.personId})
CREATE (a)-[:ACTED_IN {roles:[line.role]}]->(m);
@@ -1,5 +1,4 @@
= Uniqueness


While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
In most use cases, this is a sensible thing to do.
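For example, in a co-actor query like the following sketch (run against a movie graph such as the one used elsewhere in this tutorial), the two `ACTED_IN` relationships in the pattern are always distinct, so an actor is never reported as their own co-actor through the same relationship:

[source,cypher]
----
MATCH (actor:Person {name:"Tom Hanks"})-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(coactor:Person)
RETURN movie.title AS movie, coactor.name AS coactor
----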
8 changes: 4 additions & 4 deletions manual/cypher/cypher-docs/src/docs/intro/index.adoc
@@ -42,10 +42,6 @@ include::../parsed-graphgists/intro/compose-statements.adoc[]


:leveloffset: 2

include::../parsed-graphgists/intro/labels.adoc[]

//include::indexes-and-constraints.adoc[]

@@ -56,6 +52,10 @@ include::../parsed-graphgists/intro/loading-data.adoc[]

:leveloffset: 2

include::../parsed-graphgists/intro/data-structures.adoc[]

:leveloffset: 2

include::../parsed-graphgists/sql/cypher-vs-sql.asciidoc[]




@@ -51,10 +51,9 @@ enum BlockType
boolean isA( List<String> block )
{
int size = block.size();
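// a block counts as a title only when its first line is a one-line level-1 heading ("= Title"), not "== ..."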
return size > 0 &&
( ( block.get( 0 ).startsWith( "=" )
&& !block.get( 0 ).startsWith( "==" )));
}


@Override
@@ -118,7 +118,7 @@ static List<Block> parseBlocks( String input )
String[] lines = input.split( EOL );
if ( lines.length < 3 )
{
throw new IllegalArgumentException( "Not enough content, only "
+ lines.length + " lines." );
}
List<Block> blocks = new ArrayList<>();
@@ -117,13 +117,22 @@ public void titleWithCharsToIgnore()
}


@Test
public void ignore_second_level_heading()
{
Block block = Block.getBlock( Arrays.asList( "== Title here" ) );
assertThat( block.type, sameInstance( BlockType.TEXT ) );
String output = block.process( state );
assertThat( output, containsString( "== Title here" ) );
}

@Test
public void ignore_second_level_heading_with_id()
{
Block block = Block.getBlock( Arrays.asList( "[[my-id]]", "== Title here" ) );
assertThat( block.type, sameInstance( BlockType.TEXT ) );
String output = block.process( state );
assertThat( output, containsString( "[[my-id]]" ) );
assertThat( output, containsString( "== Title here" ) );
} }


@Test
@@ -54,11 +54,11 @@ public void fullDocumentBlockParsing() throws IOException
assertThat( types, equalTo( Arrays.asList( BlockType.TITLE, BlockType.TEXT, BlockType.HIDE,
BlockType.SETUP, BlockType.CYPHER, BlockType.QUERYTEST, BlockType.TABLE, BlockType.GRAPH, BlockType.TEXT,
BlockType.OUTPUT, BlockType.PARAMETERS, BlockType.CYPHER, BlockType.QUERYTEST, BlockType.PROFILE,
BlockType.GRAPH_RESULT, BlockType.SQL, BlockType.SQL_TABLE, BlockType.TEXT ) ) );
}


@Test
public void notEnoughContentBlockParsing()
{
expectedException.expect( IllegalArgumentException.class );
CypherDoc.parseBlocks( "x\ny\n" );
@@ -57,3 +57,6 @@ VALUES(0)


// sqltable


[[my-id]]
== Second level heading
