Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing in Rapidminer #81

Open
whitfa99 opened this issue Sep 13, 2016 · 14 comments
Open

Testing in Rapidminer #81

whitfa99 opened this issue Sep 13, 2016 · 14 comments
Labels

Comments

@whitfa99
Copy link

Using jdbc Release 3.0.1 on Neo4j 3.0.5, using the (GUI) Read Database operator within Rapidminer.
Connection string is
jdbc:neo4j:bolt://localhost:7687/?debug=false,noSsl,flatten=1000

The resultset for this simple cypher statement looks promising:

Match (n:mynode)
WHERE NOT n.aninteger is NULL
RETURN n.aninteger

..however it fails when I comment out the second line:

org.neo4j.driver.v1.exceptions.value.Uncoercible: Cannot coerce NULL to Java double

However, it can handle null values of a sring field just fine in the output.

Also, if I try just this:

Match (n:mynode)
RETURN n

Then I get:
org.neo4j.driver.v1.exceptions.value.LossyCoercion: Cannot coerce FLOAT to Java float without losing precision

Although, I'm not yet sure which field it's referring to here.

Also, where the Neo4j browser would happily return arrays, the driver doesn't (the field is just null).

@inserpio
Copy link
Collaborator

Hi @whitfa99, I'll investigate and ping you back soon. It'd greatly simplify our work if you could create a test graph that certifies the problem.

@inserpio
Copy link
Collaborator

Also could you please send us the Java code so that we can see the way you're using the driver?

@whitfa99
Copy link
Author

I can reproduce the first error using the domain created by this:

UNWIND [{name:'A', aninteger:1}, {name:'B',aninteger:2}, {name:'C',aninteger:null}, {name:'D',aninteger:4}] AS properties CREATE (n:mynode) SET n = properties
RETURN n

Interestingly, while reproducing this for you I first only created the 'aninteger' property, which produces the expected result when using:

Match (n:mynode)
WHERE NOT n.aninteger is NULL
RETURN n.aninteger

,but when you exclude the 2nd line here the result set is

n.aninteger
null
null
null
null

If instead you create the nodes with both 'name' and 'aninteger' properties and retry the same then it throws the Cannot coerce NULL to Java double error.

As far as the java code is concerned I'm not entirely sure because I've only added this project's jar file (neo4j-jdbc-driver-3.0.1.jar) to the classpath of the Rapidminer application and utilised it's GUI to create the jdbc connection as described here.

Full Disclosure: I am using Rapidminer 5.31, which was the last fully open source version (imho). I don't imagine it's changed much though. Also, I ended up deleting com.fasterxml.* from the jar because it was meddling with a v2.4.0 of the same package elsewhere in the classpath.

If that last bit has caused the anomaly I will gladly apologise and go crawl back under my rock.

@inserpio
Copy link
Collaborator

Thanks for the additional info @whitfa99.

com.fasterxml.* is used for the HTTP protocol. If you just need to use the BOLT one (which is faster) then you could bring the protocol-specific jar (please have a look at the documentation here). The jar is downloadable here. This can save you from loading the all-protocols driver neo4j-jdbc-driver-3.0.1.jar and removing unwanted pieces inside it.

I'm keen to think that your config should be ok, but I still need to have a look at Rapidminer (I have never used it). In the meantime, if it sounds reasonable for you, you can keep the ResultSet clean either by adding the WHERE NOT n.aninteger is NULL or by coalescing the result like this RETURN coalesce(n.aninteger, 0).

Lorenzo

@whitfa99
Copy link
Author

Thanks for the tip Lorenzo. The domain I'm using in this thread is just for reproducing the error. I can't do without those nulls :) Perhaps if it's feasible, I'll hold out for a fix, assuming it's also something that affects other people. I can see that Knime is also mentioned elsewhere here and Rapidminer is in the same space.

I'm looking to replace my existing neo4j>rapidminer method, which currently involves me using the http REST interface and then a rather slow XML deserialiser. Hoping to replace with something less convoluted and more "Bolty-er" :)

@jexp
Copy link
Member

jexp commented Sep 14, 2016

I think the problem is that the "int" type is primitive and doesn't allow for nulls at all.

We would have to coerce it to 0 then instead of null, but not sure how common that case is.
Not sure what rapidminer uses internally to access the fields.

perhaps if you can get any access to stack traces / logs of rapidminer that would be helpful.

Not sure how that's solved in SQL.

Did you try return toString(n.aninteger)

@inserpio
Copy link
Collaborator

inserpio commented Sep 14, 2016

I configured (for the very first time Rapidminer) like that:

  1. Defined neo4j driver params + uploaded the jar file:

manage database drivers

  1. Defined a connection to a specific instance of neo4j (localhost):

manage database connection

  1. Defined a very basic Rapidminer process that executes a cypher query and automatically print the result:

rapidminer process

  1. Defined the query to be executed match (n:mynode) return coalesce(n.aninteger, 0):

cypher query

  1. Run the process:

process execution result

So coalescing could be a temporary workaround...

If I remove the coalesce like this match (n:mynode) return n.aninteger, I get the above mentioned exception:

screen shot 2016-09-14 at 23 13 15

For completeness, we have to implement (as it's in the roadmap) some metadata methods for schema, because I'm observing this exception also:

screen shot 2016-09-14 at 23 15 58

@whitfa99 I can rapidly provide you a patch for that, but a new 3.1 release is planned for half if October. This mean I'll give you a snapshot jar.

@whitfa99
Copy link
Author

That would be fantastic Lorenzo. Thanks for your suggestions guys.

inserpio added a commit that referenced this issue Sep 15, 2016
…oolean returns false for NULL values. This patch solves issue #81
@inserpio
Copy link
Collaborator

Hey @whitfa99,

can you try to replace the 3.0.1 driver jar with this 3.1-SNAPSHOT version and see if it sounds good for you? https://github.com/neo4j-contrib/neo4j-jdbc/releases/download/3.1-SNAPSHOT/neo4j-jdbc-bolt-3.1.0-SNAPSHOT.jar

Let me know!
Thanks,
Lorenzo

@whitfa99
Copy link
Author

Thank you - this looks promising! I no longer get the Cannot coerce NULL to Java double error. The fields/attributes/columns are being typed/cast correctly as far as I can see, with the exception of any column which has all null values for the returned rows, whereas in my version of Rapidminer at least, it then defaults(?) to nominal (string).

I am however getting a NPE when the recordset should otherwise return empty rows such as:

Match (n:mynode)
WHERE n.aninteger = 5
RETURN n.aninteger, n.name

Also, I've run some other queries and I'm seeing some odd behaviour. Is there perhaps some code that samples X rows and decides that if a column has been null then it should be for the whole dataset? That's what I'm guessing. I have simplified one of my queries like this (you don't have the domain for this):

Match (n:mynode)-[r]-()
WHERE NOT r.adouble is NULL
RETURN r.adouble

Which returns 299 rows of populated values with Rapidminer typing it as Real. If I comment out the WHERE statement, then I get 536 rows of nulls (299 of them should not be!), with the Type now being Nominal. If I run the same statement in the Neo4j Browser, then I get all 536 rows including 299 populated values, the rest are null, the first 190 of these nulls are in the first rows.

If I then change the order in the statement in Rapidminer like this:

Match (n:mynode)-[r]-()
RETURN r.adouble
ORDER BY id(r)

Then I get the 299 populated values, plus the 237 nulls = 536 records as expected (the same as the output in Neo4j Browser). Presumably, this is because the first 190 field values are no longer all null.

Apart from that, all looks good and runs very quickly.

Also, are arrays in plan to be handled in a similar way as the browser? I don't currently get any output of them using the driver in this way.

Adrian

@inserpio
Copy link
Collaborator

@whitfa99 if you could share a minimal dataset to work on together then it'd be great.

@whitfa99
Copy link
Author

This dataset also works:

WITH REDUCE (Build=[], n in range(1,500)|Build+{NodeNo:n,adouble:CASE WHEN n>200 THEN n*0.01 END}) as mapcollection
UNWIND mapcollection AS map CREATE (n:mynode) SET n = map
RETURN n

If you then run this in Rapidminer:

Match (n:mynode)
RETURN n.NodeNo,n.adouble
ORDER BY n.NodeNo DESC

Then you get what you expect, however if you run this:

Match (n:mynode)
RETURN n.NodeNo,n.adouble
ORDER BY n.NodeNo ASC

Then you only get null values for n.adouble

@inserpio
Copy link
Collaborator

Great, thank you. Let me try.

@inserpio inserpio added the tools label May 14, 2017
michael-simons added a commit that referenced this issue Feb 27, 2024
* build(deps): Bump spring-javaformat.version from 0.0.39 to 0.0.40

Bumps `spring-javaformat.version` from 0.0.39 to 0.0.40.

Updates `io.spring.javaformat:spring-javaformat-checkstyle` from 0.0.39 to 0.0.40
- [Release notes](https://github.com/spring-io/spring-javaformat/releases)
- [Commits](spring-io/spring-javaformat@v0.0.39...v0.0.40)

Updates `io.spring.javaformat:spring-javaformat-maven-plugin` from 0.0.39 to 0.0.40
- [Release notes](https://github.com/spring-io/spring-javaformat/releases)
- [Commits](spring-io/spring-javaformat@v0.0.39...v0.0.40)

---
updated-dependencies:
- dependency-name: io.spring.javaformat:spring-javaformat-checkstyle
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: io.spring.javaformat:spring-javaformat-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Polish.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael J. Simons <michael@simons.ac>
michael-simons added a commit that referenced this issue Mar 1, 2024
This affects #81, but does not close it, as it cannot be tested with RapidMiner, as RapidMiner does not work with JDK17.
@michael-simons
Copy link
Collaborator

This works correctly in version 6, see

try (var result = stmt.executeQuery("MATCH (n:mynode) RETURN n.aninteger ORDER BY n.order_prop ASC")) {
	assertThat(result.next()).isTrue();
	assertThat(result.getInt(1)).isEqualTo(42);
	assertThat(result.next()).isTrue();
	assertThat(result.getInt(1)).isZero();
	assertThat(result.wasNull()).isTrue();
	assertThat(result.next()).isFalse();
}

I cannot test in RapidMiner, as the latest version of the product only works with a quite dated JDK 11, therefor I keep this issue open.

michael-simons added a commit that referenced this issue Mar 1, 2024
…c. (#533)

This affects #81, but does not close it, as it cannot be tested with RapidMiner, as RapidMiner does not work with JDK17.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants