
Failing to read correct cypher schema from neo4j #666

Closed
hoptical opened this issue Oct 3, 2018 · 4 comments
hoptical commented Oct 3, 2018

Hi,
When I try to read from my Neo4j server using the command below:
val names = graph.cypher("MATCH (n:Disease) RETURN n.name AS name")
the following error is raised:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'property_Auscultation' is ambiguous, could be: property_Auscultation, property_Auscultation.;
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:97)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$36.apply(Analyzer.scala:822)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$36.apply(Analyzer.scala:824)
	at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:821)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:830)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$36.apply(Analyzer.scala:891)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$36.apply(Analyzer.scala:891)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:891)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:833)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:833)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:690)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
	at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
	at scala.collection.immutable.List.foldLeft(List.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:124)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:118)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:103)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3296)
	at org.apache.spark.sql.Dataset.select(Dataset.scala:1307)
	at org.apache.spark.sql.Dataset.select(Dataset.scala:1325)
	at org.opencypher.spark.api.io.CAPSRelationshipTable$.fromMapping(CAPSTable.scala:208)
	at org.opencypher.spark.api.io.CAPSRelationshipTable$.apply(CAPSTable.scala:177)
	at org.opencypher.spark.api.io.AbstractPropertyGraphDataSource$$anonfun$3.apply(AbstractPropertyGraphDataSource.scala:119)
	at org.opencypher.spark.api.io.AbstractPropertyGraphDataSource$$anonfun$3.apply(AbstractPropertyGraphDataSource.scala:112)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.MapLike$DefaultKeySet.foreach(MapLike.scala:174)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
	at scala.collection.SetLike$class.map(SetLike.scala:92)
	at scala.collection.AbstractSet.map(Set.scala:47)
	at org.opencypher.spark.api.io.AbstractPropertyGraphDataSource.graph(AbstractPropertyGraphDataSource.scala:112)
	at org.opencypher.okapi.impl.graph.CypherCatalog.graph(CypherCatalog.scala:124)
	at Main$.delayedEndpoint$Main$1(Main.scala:49)
	at Main$delayedInit$body.apply(Main.scala:21)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.App$$anonfun$main$1.apply(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.App$class.main(App.scala:76)
	at Main$.main(Main.scala:21)
	at Main.main(Main.scala)

This error shows that a property named "Auscultation" conflicts with another property named "auscultation".
So I checked the Cypher property schema with this command:
CALL org.opencypher.okapi.procedures.schema
And this is the result related to the property "Auscultation":

╒══════════════╤═════════════════════╤══════════════╤══════════════════╕
│"type"        │"nodeLabelsOrRelType"│"property"    │"cypherTypes"     │
╞══════════════╪═════════════════════╪══════════════╪══════════════════╡
│"Relationship"│["indicate"]         │"Auscultation"│["NULL", "STRING"]│
├──────────────┼─────────────────────┼──────────────┼──────────────────┤
│"Relationship"│["indicate"]         │"auscultation"│["NULL", "STRING"]│
└──────────────┴─────────────────────┴──────────────┴──────────────────┘

But the point is that there is no "Auscultation" property any more; I removed it earlier, yet Neo4j thinks it still exists.
This Neo4j bug was reported in neo4j/neo4j#9726.
There it was worked around by reading the schema via the APOC plugin instead of Neo4j's built-in schema procedure.

So I think CAPS should compute the schema correctly rather than rely on Neo4j's built-in method, so that it does not hit this kind of bug any more.
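The ambiguity can be understood without Spark at all: Spark SQL resolves column names case-insensitively by default (`spark.sql.caseSensitive` is `false`), so two properties whose names differ only in case map to the same resolved column. A minimal self-contained Scala sketch of that clash (the `caseInsensitiveCollisions` helper is hypothetical, not part of CAPS):

```scala
// Hypothetical helper: group property names by their lowercased form and
// keep only the groups where more than one distinct spelling occurs.
// These are exactly the names Spark's analyzer would report as ambiguous
// when spark.sql.caseSensitive is left at its default (false).
object CollisionCheck {
  def caseInsensitiveCollisions(properties: Seq[String]): Map[String, Seq[String]] =
    properties
      .groupBy(_.toLowerCase)
      .filter { case (_, spellings) => spellings.distinct.size > 1 }
}

object CollisionCheckDemo extends App {
  val props = Seq("Auscultation", "auscultation", "name")
  // Only the Auscultation/auscultation pair collides; "name" is safe.
  println(CollisionCheck.caseInsensitiveCollisions(props))
}
```

Cleaning the graph so that each label or relationship type keeps at most one spelling per property makes the collision map empty, which is what eventually resolved this issue.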

@hoptical hoptical changed the title failing to read correct cypher schema from neo4j Failing to read correct cypher schema from neo4j Oct 3, 2018
@Mats-SX Mats-SX added the Bug label Oct 3, 2018
s1ck commented Oct 8, 2018

Thanks for reporting this. However, this bug doesn't seem to be the same as reported in neo4j/neo4j#9726. The procedure that you are using (i.e. org.opencypher.okapi.procedures.schema) scans the whole database to compute the schema and does not rely on db-statistics.

In CAPS, we need to know all data types a property might have, e.g. String or Integer, in order to correctly compute the resulting Spark column type. That's also why we cannot use the APOC procedure: it does not report conflicting property types.
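The point about needing every type a property can take can be sketched as a small type-join function. This is an illustrative assumption about how such a mapping could work, not CAPS's actual implementation: the Spark column type is derived from the full set of observed Cypher types, with `NULL` contributing only nullability.

```scala
// Illustrative sketch (not CAPS's real code): derive a Spark-like column
// type from the full set of Cypher types a property was observed with.
object TypeJoin {
  final case class ColumnType(sparkType: String, nullable: Boolean)

  def join(cypherTypes: Set[String]): ColumnType = {
    // NULL does not change the value type, only whether the column is nullable.
    val nullable = cypherTypes.contains("NULL")
    val concrete = cypherTypes - "NULL"
    val sparkType = concrete.toList match {
      case List("STRING")  => "StringType"
      case List("INTEGER") => "LongType"
      case _               => "BinaryType" // conflicting types need a common fallback encoding
    }
    ColumnType(sparkType, nullable)
  }
}
```

Under this sketch, the `["NULL", "STRING"]` rows from the schema procedure above would yield a nullable string column; a schema report that collapses or drops conflicting types would make such a join impossible.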

I'll try to reproduce the issue and get back to you.

s1ck commented Oct 8, 2018

We tried the following steps to reproduce the behaviour for nodes:

  1. CREATE (:A { foo : 1234 })
  2. CREATE (:A { Foo : 1234 })
  3. MATCH (n:A) WHERE n.Foo = 1234 DELETE n
  4. CALL org.opencypher.okapi.procedures.schema
╒══════╤═════════════════════╤══════════╤═════════════╕
│"type"│"nodeLabelsOrRelType"│"property"│"cypherTypes"│
╞══════╪═════════════════════╪══════════╪═════════════╡
│"Node"│["A"]                │"foo"     │["INTEGER"]  │
└──────┴─────────────────────┴──────────┴─────────────┘

We tried the following steps to reproduce the behaviour for relationships:

  1. CREATE (:A)-[:EDGE { foo : 1234}]->(:B)-[:EDGE { Foo : 1234}]->(:C)
  2. MATCH ()-[e:EDGE]->() WHERE e.foo = 1234 DELETE e
  3. CALL org.opencypher.okapi.procedures.schema
╒══════════════╤═════════════════════╤══════════╤═════════════╕
│"type"        │"nodeLabelsOrRelType"│"property"│"cypherTypes"│
╞══════════════╪═════════════════════╪══════════╪═════════════╡
│"Node"        │["B"]                │""        │[]           │
├──────────────┼─────────────────────┼──────────┼─────────────┤
│"Node"        │["C"]                │""        │[]           │
├──────────────┼─────────────────────┼──────────┼─────────────┤
│"Node"        │["A"]                │""        │[]           │
├──────────────┼─────────────────────┼──────────┼─────────────┤
│"Relationship"│["EDGE"]             │"Foo"     │["INTEGER"]  │
└──────────────┴─────────────────────┴──────────┴─────────────┘

Could you please report the steps you took to produce the error? Thanks.

@s1ck s1ck added NOT REPRODUCIBLE and removed Bug labels Oct 8, 2018
hoptical commented Oct 8, 2018

Thanks for your answer.

I found that the property "Auscultation" had not actually been deleted, and that is why the error was produced.
I deleted that property and kept only the lowercase one (i.e. "auscultation"), and now it works properly.

Thanks for your contribution.
Best regards.

pstutz commented Oct 8, 2018

Glad to hear that, thanks for reporting the success.

@pstutz pstutz closed this as completed Oct 8, 2018