
Cypher: limit, order by performance #1919

SaboteurSpk opened this Issue · 3 comments



Using Neo4j 2.0.1, the query:

match (n:States)
where n.name =~ '.*rul.*'
return n.name order by n.name desc
limit 200

returns its result after 14 seconds. The number of n:States nodes is about 450 000, about 130 000 nodes fulfil the condition, and a count with the same condition returns in 1 second. There is an index on :States(name).

Profiling returns

ColumnFilter(symKeys=["n", "n.name"], returnItemNames=["n.name"], _rows=200, _db_hits=0)
Top(orderBy=["SortItem(Cached(n.name of type Any),false)"], limit="Literal(200)", _rows=200, _db_hits=0)
Extract(symKeys=["n"], exprKeys=["n.name"], _rows=131072, _db_hits=131072)
Filter(pred="LiteralRegularExpression(Property(n,name(1)),Literal(.*rul.*))", _rows=131072, _db_hits=458752)
NodeByLabel(label="States", identifier="n", _rows=458752, _db_hits=0)

Now, this type of match (a table with a filter and ordering) is common to virtually all applications, so its performance is important.

This query probably doesn't use the index (because of the condition), but it should be able to take advantage of it because of the order by.
When running this type of query, SQL Server works as follows:

Using an index scan in descending order, it walks over every value and evaluates the condition. As soon as the number of items fulfilling the condition reaches 200, the work is done and the result is returned. In many cases this finishes very quickly (enough matching items are found near the top of the index).
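The early-termination strategy described above can be sketched as follows (a minimal model, not SQL Server's or Neo4j's actual implementation; the function name and sample data are made up):

```python
import re

def topk_via_index_scan(sorted_names, pattern, k):
    # Walk an index that is already sorted in descending order and
    # evaluate the filter on each entry; stop as soon as k matches
    # have been collected -- the rest of the index is never touched.
    rx = re.compile(pattern)
    out = []
    for name in sorted_names:
        if rx.search(name):
            out.append(name)
            if len(out) == k:  # early termination
                break
    return out

# Tiny made-up index, sorted descending as an index scan would deliver it:
index = sorted(["ruler", "rule", "peru", "unruly", "zebra", "apple"], reverse=True)
print(topk_via_index_scan(index, "rul", 2))  # ['unruly', 'ruler']
```

The scan's cost is proportional to how far down the index the k-th match sits, not to the total number of matches.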

In Neo4j with Cypher, the server apparently sorts all items fulfilling the condition and then returns the first 200.

Is there a plan to optimize this type of query?

Thanks, and sorry, I'm new to Neo4j and GitHub :)


What you are asking for is unfortunately not a graph query; it is a full-text search query.

Currently Neo4j's schema indexes are pure exact-lookup indexes, so no luck there.
What happens here is that all nodes with that label are pulled through the comparison, and then all names are sorted into the 200-element window that is returned.
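As a rough model of that pipeline (hypothetical names, not Neo4j internals): every row carrying the label is filtered first, and only then is a 200-element window sorted out of the matches, so the cost grows with the number of matches rather than with the limit:

```python
import heapq
import re

def topk_filter_then_sort(names, pattern, k):
    # Filter *every* row first (the 131072 db hits in the profile above),
    # then keep only the k largest, i.e. ORDER BY name DESC LIMIT k.
    rx = re.compile(pattern)
    matches = [n for n in names if rx.search(n)]
    return heapq.nlargest(k, matches)  # k-element sort window

print(topk_filter_then_sort(["ruler", "rule", "peru", "unruly", "zebra"], "rul", 2))
# ['unruly', 'ruler']
```

The result is identical to the early-terminating index scan, but all 130 000 matches are materialized before the window is cut.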

Do you run this query on cold caches, or is the 14 s measured after a second run? The second-run time is the more realistic value (provided you have enough RAM available for caching).

If you really need to do this now, you have to resort to legacy indexes and the node_auto_index for sub-pattern matching:

start n=node:node_auto_index("name:*rul*") return n
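Note that the auto index only exists if automatic node indexing was enabled before the nodes were created. Assuming a Neo4j 2.x server, the relevant settings in conf/neo4j.properties look like this (a configuration sketch, adjust to your setup):

```properties
# Enable automatic node indexing and index the "name" property
node_auto_indexing=true
node_keys_indexable=name
```

Nodes created while these settings are off will not appear in node_auto_index and would need to be re-indexed.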

Fulltext and other special indexes will be added to Neo4j's schema index approach in a future version.


Thank you for the quick response.
Yes, it's not a graph query, but being able to run these types of queries can be very useful, for example to build the list of starting points for a real graph search when you only know the name of the starting node or nodes.

I tried to use a legacy index, but it didn't help:

start n=node:node_auto_index("name:*urra*")
return n.name order by n.name desc
limit 200

The profiler shows that it used the auto index:

ColumnFilter(symKeys=["n", "n.name"], returnItemNames=["n.name"], _rows=200, _db_hits=0)
Top(orderBy=["SortItem(Cached(n.name of type Any),false)"], limit="Literal(200)", _rows=200, _db_hits=0)
Extract(symKeys=["n"], exprKeys=["n.name"], _rows=65536, _db_hits=65536)
NodeByIndexQuery(identifier="n", _db_hits=65536, _rows=65536, query="Literal(name:*urra*)", identifiers=["n"], idxName="node_auto_index", producer="NodeByIndexQuery")

The problem is not the use of the index but the order by. The engine should be clever enough to do the same as SQL Server: start the search in index order and stop after gathering enough nodes for the result.
This could also speed up many graph-search queries whose results are ordered and limited.
I hope that many applications can benefit from this feature and spread the "Graph is everywhere" idea :)
But I also understand that for truly graph-targeted applications this is not a major feature.


Just an off-topic question:
I have deleted all nodes (about 500 000 of them), and afterwards

match (n) return n;

on the empty DB took about 4 seconds, even after a restart of the DB and multiple runs.
Did I do anything wrong? :)
