# Lecture 7: CQL 2
Gittu George, January 30 2022

_Attribution: This notebook is developed using some materials provided by neo4j._

## Today's Agenda
- Working with patterns in queries
- Aggregation in Cypher
- Controlling the query chain
- Controlling results returned

## Learning objectives
- Using CQL to query the graph database

Check the ticks in [this cheat sheet](https://canvas.ubc.ca/files/19165620/download?download_frd=1) to see all the topics we will cover by the end of this lecture. These are all you need to finish your assignment.

## Working with Patterns in Queries

- Traversal in a MATCH clause

We already understand this query from the last class, but understanding how the graph engine performs this traversal can help you write efficient CQL queries.

```sql
MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements'
RETURN follower.name, reviewer.name
```
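One way to see how the engine carries out this traversal is to prefix the query with PROFILE (or EXPLAIN, which shows the plan without running the query). The sketch below assumes the movie graph used in this lecture. The plan often starts from the part of the pattern that narrows the search most, but the exact plan depends on the Neo4j version, the available indexes, and the data.

```sql
PROFILE
// PROFILE runs the query and reports the operators the engine used.
// Replace PROFILE with EXPLAIN to see the plan without executing it.
MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements'
RETURN follower.name, reviewer.name
```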

### Specifying multiple MATCH patterns

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie),
(m)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
RETURN a.name, m.title, d.name
```

```{note}
When multiple patterns are specified in a MATCH clause, no relationship is traversed more than once.
```

We can also specify the above query using a single pattern.

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
RETURN a.name, m.title, d.name
```

- Why multiple MATCH patterns?

Even though both queries above mean the same thing, some queries cannot be expressed as a single pattern, especially complex ones. For example:

Suppose we want to retrieve the movies that Meg Ryan acted in and their respective directors and the other actors that acted in these movies. Here is the query to do this:

```sql
MATCH (meg:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person),
(other:Person)-[:ACTED_IN]->(m)
WHERE meg.name = 'Meg Ryan'
RETURN m.title AS movie, d.name AS director, other.name AS `co-actors`
```

```{note}
You can specify aliases or column headings using AS; you will be seeing it a lot in all upcoming queries.
```

```{tip}
It's a good practice to use AS wherever you can, as it can guide your thoughts and print results in a meaningful way.
```

### Multiple MATCH clauses

```sql
MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie)
MATCH (actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title AS movie, actor.name
```

We can also write the same query using multiple patterns:

```sql
MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie),
(actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title AS movie, actor.name AS `Actor name`
```

A best practice is to traverse as few nodes as possible, so in this example using multiple MATCH patterns is the better choice.
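One practical difference between the two forms follows from relationship uniqueness. Within a single MATCH clause a relationship is not traversed twice, so `actor` cannot be matched through Val Kilmer's own ACTED_IN relationship. With two separate MATCH clauses it can, and Val Kilmer may then appear in his own co-actor list. A minimal sketch that excludes him explicitly, assuming the same movie graph:

```sql
MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie)
MATCH (actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
  AND actor <> valKilmer   // keep only the other actors
RETURN m.title AS movie, actor.name AS `Actor name`
```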

```{important}
Usually, multiple MATCH clauses are used when dealing with subqueries using WITH. We will discuss this soon.
```

### Specifying varying length paths

Syntax: 

```sql
// Retrieve all paths of any length with the relationship :RELTYPE from nodeA to nodeB and beyond:
(nodeA)-[:RELTYPE*]->(nodeB)
// Retrieve the paths of length 2 with the relationship :RELTYPE from nodeA to nodeB:
(nodeA)-[:RELTYPE*2]->(nodeB)
// Retrieve the paths of length 3 with the relationship :RELTYPE from nodeA to nodeB:
(nodeA)-[:RELTYPE*3]->(nodeB)
```

Experiment with the below query by varying depths, and see how it behaves.

```sql
MATCH (follower:Person)-[:FOLLOWS*3]-(p:Person)
WHERE follower.name = 'Paul Blythe'
RETURN p.name
```

What if we want to retrieve paths of length 1, 2, or 3 with the relationship :RELTYPE, that is, from nodeA to nodeB, on to nodeC, and on to nodeD (up to three hops)?

Syntax: 

```
(nodeA)-[:RELTYPE*1..3]->(nodeB)
```

```sql
MATCH (follower:Person)-[:FOLLOWS*1..3]-(p:Person)
WHERE follower.name = 'Paul Blythe'
RETURN p.name
```
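When experimenting with depths, it can help to see how many hops each match took. The sketch below binds the whole path to a variable (`path` is a name chosen here for illustration) and uses the built-in `length()` function to report the hop count. The same person can appear on several rows if they are reachable by more than one path.

```sql
MATCH path = (follower:Person)-[:FOLLOWS*1..3]-(p:Person)
WHERE follower.name = 'Paul Blythe'
// length(path) is the number of relationships in the matched path
RETURN p.name AS person, length(path) AS hops
ORDER BY hops
```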

### Finding the shortest path

The shortestPath() function is useful when there are many possible ways to get from node A to node B and you only want the shortest one.

```sql
MATCH p = shortestPath((m1:Movie)-[*]-(m2:Movie))
WHERE m1.title = 'A Few Good Men' AND
m2.title = 'The Matrix'
RETURN p
```

```{note}
Here we give no relationship type and specify * for the length. This means any relationship type, over any number of hops, can be used for the traversal.
```
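On larger graphs an unbounded `*` can be expensive, so a common precaution is to cap the path length. The sketch below is one way to do that; the upper bound of 10 is an arbitrary assumption, not a recommended value. It returns the hop count and a readable list of stops instead of the raw path, using `title` for Movie nodes and `name` for Person nodes.

```sql
MATCH p = shortestPath((m1:Movie)-[*..10]-(m2:Movie))
WHERE m1.title = 'A Few Good Men' AND
      m2.title = 'The Matrix'
// nodes(p) lists the nodes on the path; coalesce() picks whichever property exists
RETURN length(p) AS hops,
       [n IN nodes(p) | coalesce(n.title, n.name)] AS stops
```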

## Aggregation in Cypher

When you return results as values, Cypher automatically returns the values grouped by a common value.

```sql
MATCH (p:Person)-[:REVIEWED]->(m:Movie)
RETURN p.name, m.title
```

Aggregation in Cypher is different from aggregation in SQL. In Cypher, you need not specify a grouping key. As soon as an aggregation function is used, all non-aggregated result columns become grouping keys. The grouping is implicitly done, based upon the fields in the RETURN clause.

- count()

count() is a common way to aggregate your data. You can use count() to count:

- nodes
- relationships
- paths
- rows during query processing

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name, d.name, count(m)
```

The query engine processed all nodes and relationships in the pattern so that it could count the movies for each actor/director pair in the graph. The results are grouped by the two non-aggregated columns: the name of the actor and the name of the director.
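To make the implicit grouping easier to inspect, the sketch below adds `collect()`, which gathers the values in each group into a list. The aliases are chosen here for illustration. Each row corresponds to one actor/director pair, with the count and the titles behind that count.

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
// a.name and d.name are not aggregated, so they act as the grouping keys
RETURN a.name AS actor,
       d.name AS director,
       count(m) AS numMovies,
       collect(m.title) AS movies
ORDER BY numMovies DESC
```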

There are many places we can use count(); check out this [document](https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-count).

There are more aggregating functions, such as avg(), stDev(), min(), max(), and sum(). For the entire list, check the [Cypher manual](https://neo4j.com/docs/cypher-manual/current/functions/aggregating/).
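As a small illustration of these functions, the sketch below summarizes the review ratings per movie. It assumes that REVIEWED relationships carry a numeric `rating` property, as in the movie graph queries later in this lecture.

```sql
MATCH (:Person)-[rv:REVIEWED]->(m:Movie)
// m.title is the only non-aggregated column, so results are grouped per title
RETURN m.title AS movie,
       count(rv) AS numReviews,
       avg(rv.rating) AS avgRating,
       min(rv.rating) AS lowest,
       max(rv.rating) AS highest
```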

## Controlling the Query Chain

- Intermediate processing using WITH

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a.name, count(a) AS numMovies
```

During the execution of a MATCH clause, you can specify that you want some intermediate calculations or values that will be used for further processing of the query. You use the WITH clause to perform intermediate processing that is not possible in a RETURN clause.

Example: Using WITH

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(a) AS numMovies
WHERE 1 < numMovies < 4
RETURN a.name, numMovies
```

In the above example, we start the query processing by retrieving all actors and their movie counts. We only want to return actors with 2 or 3 movies, so all other actors and their aggregated results are filtered out. Here the WITH clause does the counting, and the intermediate result is tested in the subsequent WHERE clause.
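WITH can carry more than one intermediate value. As a sketch, the variant below also collects the movie titles so they can be returned alongside the count, and writes the range test as two explicit comparisons, which is equivalent to the chained form.

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
// one row per actor: the count and the list of titles behind it
WITH a, count(m) AS numMovies, collect(m.title) AS movies
WHERE numMovies > 1 AND numMovies < 4
RETURN a.name AS actor, numMovies, movies
```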

- Subqueries with WITH 

Here is an example where we retrieve all movies reviewed by a person. For a particular movie found, we want the list of directors of the movie, so we do a second query, a subquery as follows:

```sql
MATCH (m:Movie)<-[rv:REVIEWED]-(r:Person)
WITH m, rv, r
MATCH (m)<-[:DIRECTED]-(d:Person)
RETURN m.title, rv.rating, r.name, d.name
```

For the second MATCH clause, we use the movie nodes, m, found by the first. The RETURN clause has access to the movie, the rating given by the reviewer, the name of the reviewer, and the name of each director of that movie (one row per director).
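If a movie has several directors, the query above returns one row per director for each review. A possible variant, sketched here with aliases chosen for illustration, gathers the director names into a single list per review using `collect()`:

```sql
MATCH (m:Movie)<-[rv:REVIEWED]-(r:Person)
WITH m, rv, r
MATCH (m)<-[:DIRECTED]-(d:Person)
// grouped by movie title, rating, and reviewer name
RETURN m.title AS movie, rv.rating AS rating, r.name AS reviewer,
       collect(d.name) AS directors
```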

## Controlling Results Returned

What have you learned until now?

- How to query the database using both simple and complex patterns
- How to control query processing by chaining queries using WITH. 

Now we will focus on controlling how results are processed in the RETURN and WITH clauses.

### Dealing with duplicates

- Duplicate results

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title, m.released
```

[DISTINCT](https://neo4j.com/docs/cypher-manual/current/clauses/return/#return-unique-results) can be used to deal with duplicates. 

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released
```

Using DISTINCT in the RETURN clause here means that rows with identical values are returned only once.

Another way to avoid duplicates is by using WITH and DISTINCT together.

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
WITH DISTINCT m
RETURN m.released, m.title
```
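DISTINCT can also be used inside an aggregating function. As a sketch, the query below compares the number of matched rows with the number of distinct movies. The two numbers differ whenever a movie is reached through more than one relationship.

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
// count(m) counts every matched row; count(DISTINCT m) counts each movie once
RETURN count(m) AS matchedRows, count(DISTINCT m) AS distinctMovies
```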

### Ordering results

If you want the results to be sorted, specify the expression to use for the sort using the ORDER BY keyword and whether you want the order to be descending using the DESC keyword. Ascending order is the default.

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title, m.released ORDER BY m.released DESC
```

- Ordering by multiple properties

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title, m.released ORDER BY m.released DESC , m.title
```
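The sort direction can be set per sort key, and ORDER BY can refer to the aliases defined in RETURN. A sketch of the same query with aliases (names chosen here for illustration) and the default ascending order written out explicitly:

```sql
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title AS title, m.released AS year
// newest first; titles in alphabetical order within the same year
ORDER BY year DESC, title ASC
```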

### Limiting the number of results

```sql
MATCH (p:Person)-->(m:Movie)
RETURN m.title as title, m.released as year ORDER BY m.released DESC LIMIT 10
```

```sql
MATCH (actor:Person)-[r:ACTED_IN]-(m:Movie) 
RETURN actor.name AS actor, COUNT(r) AS movienumber ORDER BY movienumber DESC LIMIT 5;
```

- Limiting the number of intermediate results

```{tip}
Using ORDER BY and LIMIT together can be helpful in many scenarios, just like how you did it in your SQL assignments.
```

```sql
MATCH (p:Person)-->(m:Movie)
WITH DISTINCT m.title AS title, m.released AS year ORDER BY m.released DESC LIMIT 1
MATCH (pp:Person)-->(mm:Movie)
WHERE mm.title=title
RETURN pp.name,title,year
```

```{important}
An expression in WITH must be aliased (use AS). As we discussed before, whatever we pass in WITH is available to the subquery. If you pass a property of a node, you must give it an alias and refer to it by that alias in your subquery.
```
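As a sketch of this rule, the query below passes two expressions through WITH, a property and an aggregate, and refers to them only by their aliases afterwards. The alias names are chosen here for illustration. Writing `WITH a.name, max(m.released)` without aliases would be rejected.

```sql
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
// a.name and max(m.released) are expressions, so each needs an alias
WITH a.name AS actor, max(m.released) AS latestYear
ORDER BY latestYear DESC
LIMIT 5
RETURN actor, latestYear
```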

If you pass an entire node or relationship in WITH, you don't need to alias it (but it's okay to do so if you want). Both of the following queries work:

```sql
MATCH (m:Movie)<-[rv:REVIEWED]-(r:Person)
WITH m, rv, r
MATCH (m)<-[:DIRECTED]-(d:Person)
RETURN m.title, rv.rating, r.name, d.name
```

```sql
MATCH (m:Movie)<-[rv:REVIEWED]-(r:Person)
WITH m AS mov, rv AS movierel, r AS person
MATCH (mov)<-[:DIRECTED]-(d:Person)
RETURN mov.title, movierel.rating, person.name, d.name
```

## Can you?

- Specify multiple MATCH patterns.
- Specify multiple MATCH clauses.
- Specify varying length paths.
- Perform intermediate processing with WITH.
- Perform subqueries with WITH.
- Count results returned.
- Eliminate duplication in results returned.
- Order results returned.
- Limit the number of results returned.

## Class activity

- Practice CQL.

## iClicker 1

***Question 1:*** Working with patterns in queries.

Given this Cypher query:

```
MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements' RETURN follower.name, reviewer.name
```

What is the first node that is retrieved by the query engine?

Select the correct answer.

A) The first Person node with a FOLLOWS relationship

B) The first Person node with a REVIEWED relationship

C) The Movie node for the movie, The Replacements

D) The first Movie node in the alphabetical list of movies in the graph

```{toggle}

***Answer: C***

```

***Question 2:***

Context: We want a query that returns a list of people who acted in movies released later than 2005 and, for those movies, also returns the title and release year of the movie as well as the name of the writer.

How can you correct this query?

```
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
(m)<-[:WROTE]-(w:Person)
WHERE m.released > 2005
RETURN a.name, m.title, m.released, w.name
```

Select the correct answers.

A) The second line should be: (m2:Movie)<-[:WROTE]-(w:Person).

B) Add a comma after the first pattern in the MATCH clause.

C) The second line should be: (m2:Movie)<-[:WROTE]-(a).

D) Add a MATCH clause at the beginning of the second line.


```{toggle}

***Answer: B, D***

```

***Question 3:*** Suppose you have a graph of Person nodes representing a social network. A Person node can have an IS_FRIENDS_WITH relationship with any other Person node. Like in Facebook, there can be a long path of connections between people.

What Cypher MATCH clause would you use to find all people in this graph that are two to four hops away from each other?

Select the correct answer.

A) MATCH (p:Person)-[:IS_FRIENDS_WITH*2..4]->(p2:Person)

B) MATCH (p:Person)-[:IS_FRIENDS_WITH*2-4]->(p2:Person)

C) MATCH (p:Person)-[:IS_FRIENDS_WITH,2-4]->(p2:Person)

D) MATCH (p:Person)-[:IS_FRIENDS_WITH,2,4]->(p2:Person)

```{toggle}

***Answer: A***

```

***Question 4:***

(using WITH)
Given this code snippet, what variables can you use in the RETURN clause?

```
MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
WITH a, count(a) AS numMovies
WHERE 1 < numMovies < 4
RETURN ??
```

Select the correct answers.

A) a

B) r

C) m

D) numMovies

```{toggle}

***Answer: A, D***

```
 

***Question 5:*** This code returns the titles of all movies that have been reviewed. Multiple people can review a movie. How can you change this code so that a movie title will only be returned once?

```
MATCH ()-[:REVIEWED]->(m:Movie)
RETURN m.title
```

Select the correct answers.

A) MATCH ()-[:REVIEWED]->(m:Movie) RETURN DISTINCT m.title

B) MATCH ()-[:REVIEWED]->(m:Movie) RETURN UNIQUE m.title

C) MATCH ()-[:REVIEWED]->(m:Movie) WITH DISTINCT m RETURN m.title

D) MATCH ()-[:REVIEWED]->(m:Movie) WITH UNIQUE m RETURN m.title

```{toggle}

***Answer: A, C***

```

***Question 6:***
How many property values can you order in the returned result?

Select the correct answer.

A) One

B) As many as you need to

C) Two

D) Three

```{toggle}

***Answer: B***

```

***Question 7:***
We want to retrieve the names of the five oldest actors in our dataset. What code will do this?
Select the correct answers.

A) MATCH (p:Person)-[:ACTED_IN]->() WITH p LIMIT 5 RETURN DISTINCT p.name, p.born ORDER BY p.born

B) MATCH (p:Person) WITH p LIMIT 5 RETURN DISTINCT p.name, p.born ORDER BY p.born

C) MATCH (p:Person)-[:ACTED_IN]->() RETURN DISTINCT p.name, p.born ORDER BY p.born LIMIT 5

D) MATCH (p:Person) RETURN DISTINCT p.name, p.born ORDER BY p.born LIMIT 5

```{toggle}

***Answer: C, D***

```