BUG - Large Qty of Filters on Cross Join #7415

Closed
davidhbigelow opened this issue Jun 19, 2016 · 10 comments

@davidhbigelow

Guidelines

  • Neo4j version: 2.3.5 / 3.0
  • Operating system: OSX + Linux
  • API/Driver: Cypher / Java

Steps to Reproduce

Download the test database:
problem.db.zip

NOTE: There are only 252 nodes in this database. I deleted everything else to narrow down the problem, which is why you still see all the indexes in the database. We do not feel it is necessary to index EVERY property for every type of node in our system, especially since there is so LITTLE data to process for the filter operation on the cross join. (I have heard that putting a TON of indexes on everything is a terrible idea for neo4j.)

The following has been confirmed on multiple systems, by multiple developers across multiple versions of neo4j.

First Test (THIS WILL WORK)

match (chord:group{className:"Wicks"})
where chord.propertyString="length_in|model"
and chord.all_configs=true 


match (jar:group{className:"Jars"}) 
where jar.propertyString="diameter_in|height_in" 
and jar.all_configs=true  

match (wax:group{className:"Wax"}) 
where wax.propertyString="material" 
and wax.all_configs=true  
with chord,jar,wax 

where (wax.material <> 'Soy') 
and (
    ((jar.height_in - 0.5 <= chord.length_in)  and   (chord.length_in <= jar.height_in + 0.5))
or   (chord.length_in > jar.height_in))
and (
(chord.model in ['CD-4'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.75)) 
or (chord.model in ['CD-5'] and (1.75 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
or (chord.model in ['CD-6'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25)) 
)

return chord, jar, wax limit 10;

Second Test (THIS WILL FAIL!!!)

match (chord:group{className:"Wicks"})
where chord.propertyString="length_in|model"
and chord.all_configs=true 

match (jar:group{className:"Jars"}) 
where jar.propertyString="diameter_in|height_in" 
and jar.all_configs=true  

match (wax:group{className:"Wax"}) 
where wax.propertyString="material" 
and wax.all_configs=true  
with chord,jar,wax 

where (wax.material <> 'Soy') 
and (
    ((jar.height_in - 0.5 <= chord.length_in)  and   (chord.length_in <= jar.height_in + 0.5))
or   (chord.length_in > jar.height_in))
and (
(chord.model in ['CD-4'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.75)) 
or (chord.model in ['CD-5'] and (1.75 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
or (chord.model in ['CD-6'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25)) 
or (chord.model in ['44-24-18'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.5))  
or (chord.model in ['ECO-1'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.5))  
or (chord.model in ['ECO-2'] and (1.5 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
or (chord.model in ['ECO-4'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.5))  
or (chord.model in ['HTP-31'] and (1.5 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
or (chord.model in ['HTP-41'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25)) 
or (chord.model in ['HTP-52'] and (2.25 <= jar.diameter_in and jar.diameter_in <= 2.5)) 
or (chord.model in ['LX-8'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.5))  
or (chord.model in ['LX-10'] and (1.5 <= jar.diameter_in and jar.diameter_in <= 2.0))  
or (chord.model in ['LX-12'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25))
)

return chord, jar, wax limit 10;

Expected behavior

Awesome performance -- especially for such a small amount of data.

Actual behavior

Neo4j locks up: CPU usage jumps to 100-600% and oscillates in that range indefinitely, and the console crashes (it thinks neo4j was disconnected). It rarely comes back with data, and then only after maybe 20 minutes, which is crazy for such a small database and set of relationships.

@spacecowboy
Contributor

spacecowboy commented Jun 20, 2016

Just to clarify the difference between the two queries (a lot more conditions):

 match (chord:group{className:"Wicks"})
 where chord.propertyString="length_in|model"
 and chord.all_configs=true 

 match (jar:group{className:"Jars"}) 
 where jar.propertyString="diameter_in|height_in" 
 and jar.all_configs=true  

 match (wax:group{className:"Wax"}) 
 where wax.propertyString="material" 
 and wax.all_configs=true  
 with chord,jar,wax 

 where (wax.material <> 'Soy') 
 and (
     ((jar.height_in - 0.5 <= chord.length_in)  and   (chord.length_in <= jar.height_in + 0.5))
 or   (chord.length_in > jar.height_in))
 and (
 (chord.model in ['CD-4'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.75)) 
 or (chord.model in ['CD-5'] and (1.75 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
 or (chord.model in ['CD-6'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25)) 
+or (chord.model in ['44-24-18'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.5))  
+or (chord.model in ['ECO-1'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.5))  
+or (chord.model in ['ECO-2'] and (1.5 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
+or (chord.model in ['ECO-4'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.5))  
+or (chord.model in ['HTP-31'] and (1.5 <= jar.diameter_in and jar.diameter_in <= 2.0)) 
+or (chord.model in ['HTP-41'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25)) 
+or (chord.model in ['HTP-52'] and (2.25 <= jar.diameter_in and jar.diameter_in <= 2.5)) 
+or (chord.model in ['LX-8'] and (1.25 <= jar.diameter_in and jar.diameter_in <= 1.5))  
+or (chord.model in ['LX-10'] and (1.5 <= jar.diameter_in and jar.diameter_in <= 2.0))  
+or (chord.model in ['LX-12'] and (2.0 <= jar.diameter_in and jar.diameter_in <= 2.25))
 )

 return chord, jar, wax limit 10;

@davidhbigelow
Author

YES - that is the problem. The more filtering clauses -- the slower neo4j gets - and with this many it will often crash with a "Java Heap Space" problem... I have tried with bigger machines (more CPU and Memory) -- but the same results. (again this is a REALLY SMALL cross product!)

@davidhbigelow
Author

Sorry - DID NOT MEAN TO CLOSE THIS -- IT IS STILL VERY MUCH AN ISSUE!!!

@systay
Contributor

systay commented Jun 22, 2016

Hi David,

Thanks for reporting this. You are absolutely right! This is a bug. The fix is in #7429

If you are curious: we normalise the predicates to make them easier for the planner to work with. We are aiming for Conjunctive Normal Form (CNF, https://en.wikipedia.org/wiki/Conjunctive_normal_form), which basically means an AND of ORs. In this query, the predicates are in the reverse form, Disjunctive Normal Form (DNF), which is an OR of ANDs. When normalising something from DNF to CNF, the resulting predicate tree gets huge, and that is where all the time goes: in the planning of the query, not in its execution.
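
To give a feel for the size of that blow-up, here is a minimal Python sketch. It is not Neo4j's normaliser; it only assumes a naive "distribute OR over AND" rewrite, and it treats each branch of the reported query as three opaque literals (the model check plus the two diameter bounds). The names `dnf_to_cnf`, `model_rule` and `cnf_clause_count` are made up for illustration.

```python
from itertools import product


def dnf_to_cnf(dnf):
    """Naively distribute OR over AND.

    `dnf` is a list of conjunctions, each a list of literal names.
    The result has one OR-clause for every way of picking a single
    literal from each conjunction.
    """
    return [frozenset(choice) for choice in product(*dnf)]


def model_rule(i):
    # Hypothetical stand-in for one branch of the query, e.g.
    # (chord.model in [...] and lo <= jar.diameter_in and jar.diameter_in <= hi)
    return [f"model{i}", f"lo{i}", f"hi{i}"]


def cnf_clause_count(n_disjuncts, literals_per_disjunct=3):
    # Size of the naive CNF without materialising it.
    return literals_per_disjunct ** n_disjuncts


small = dnf_to_cnf([model_rule(i) for i in range(3)])
print(len(small))            # 27 clauses for the 3-branch query
print(cnf_clause_count(13))  # 1594323 clauses for the 13-branch query
```

Under these assumptions the first test query expands to 27 clauses, while the second expands to roughly 1.6 million. That would explain planning time and heap usage that have nothing to do with the 252 nodes in the database.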

We already had a limit beyond which we would give up on normalisation, but that limit was too high.
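
As a rough illustration of what such a limit could look like, the sketch below estimates the size of the CNF before building it and leaves the predicate alone when the estimate is too large. The threshold of 1000 and the function name `normalise_with_limit` are hypothetical; the thread does not say what limit or mechanism the actual fix in #7429 uses.

```python
from itertools import product
from math import prod


def normalise_with_limit(dnf, max_clauses=1000):
    """Convert an OR-of-ANDs to an AND-of-ORs only if the result stays small.

    `dnf` is a list of conjunctions, each a list of literal names.
    `max_clauses` is an assumed, illustrative threshold.
    """
    # The naive CNF has one clause per combination of literals,
    # so its size can be computed without building it.
    estimated = prod(len(conjunction) for conjunction in dnf)
    if estimated > max_clauses:
        # Give up: hand the planner the predicate as written.
        return ("dnf", dnf)
    return ("cnf", [frozenset(choice) for choice in product(*dnf)])


def branches(n):
    # Hypothetical stand-ins for n three-literal branches of the query.
    return [[f"model{i}", f"lo{i}", f"hi{i}"] for i in range(n)]


print(normalise_with_limit(branches(3))[0])   # cnf
print(normalise_with_limit(branches(13))[0])  # dnf
```

With a limit like this, the 3-branch query is still normalised, while the 13-branch query skips normalisation and goes to the planner unchanged.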

@davidhbigelow
Author

WOW - thanks for getting eyes on that SOOOOOO QUICKLY!!!!

Not sure how the "fixes" work. I assume there is a build somewhere that we can get? Or is there a release in which this will be "official"?

@systay
Contributor

systay commented Jun 22, 2016

We don't publish nightly builds, so unless you are willing to build it on your own, you'll have to wait until the next release. The fix will hopefully make it into our 3.1.M04 release, and also into maintenance releases for the 2.3 and 3.0 series.

M04 is coming out very soon, don't know about the release plans for the older versions though. :)

@Mats-SX
Contributor

Mats-SX commented Jul 12, 2016

@davidhbigelow @systay From what I read, this issue seems to have been resolved. Could you please verify, so that we may close this issue?

@davidhbigelow
Author

I am really sorry for my delay.... I have been swamped beyond measure...
I will try to test this in the next couple of days....

How do I get a testable version?

@Mats-SX
Contributor

Mats-SX commented Aug 8, 2016

@davidhbigelow There are several ways. Which version of Neo4j are you using? For 2.3 and 3.0 (our latest released versions) the two latest patches 2.3.6 and 3.0.4, respectively, include the fix. You can also check out the latest milestone for our next release, 3.1.0-M06.

You can download from our website directly: https://neo4j.com/download/other-releases/ or you can use Maven (http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.neo4j%22%20AND%20a%3A%22neo4j%22).

@davidhbigelow
Author

Awesome - thank you very much!!!!

Dave

@systay systay closed this as completed Aug 18, 2016