Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let ResultSet.getObject() return Float.NaN in case of numeric NaN #1304

Merged
merged 2 commits into from Sep 21, 2018

Conversation

@tseylerd
Copy link
Contributor

@tseylerd tseylerd commented Sep 21, 2018

Postgres numeric column can store NaN value which is not convertable to BigDecimal. getBigDecimal() throws exception in this case. But when we call getObject() to retrieve the data we expect any type of object as return value. So it's convenient to return Float.NaN in this case.

Related DataGrip issue: https://youtrack.jetbrains.com/issue/DBE-5141

@codecov-io
Copy link

@codecov-io codecov-io commented Sep 21, 2018

Codecov Report

Merging #1304 into master will increase coverage by 0.02%.
The diff coverage is 80%.

@@             Coverage Diff              @@
##             master    #1304      +/-   ##
============================================
+ Coverage     68.49%   68.52%   +0.02%     
- Complexity     3868     3873       +5     
============================================
  Files           178      178              
  Lines         16184    16185       +1     
  Branches       2639     2639              
============================================
+ Hits          11086    11090       +4     
+ Misses         3865     3864       -1     
+ Partials       1233     1231       -2

@vlsi
Copy link
Member

@vlsi vlsi commented Sep 21, 2018

Well, technically speaking PostgreSQL has real (~Float), double precision (~Double) and numeric (~???).

@tseylerd , is there any strong reason you think numeric should be converted to Float.NaN rather than Double.NaN ?

@tseylerd
Copy link
Contributor Author

@tseylerd tseylerd commented Sep 21, 2018

@vlsi No. I think it's not very important. Which one is better? I don't know. But i know that anything is better than the exception. So if you prefer double let's make it double.

@davecramer
Copy link
Member

@davecramer davecramer commented Sep 21, 2018

@tseylerd hard to say which one is better.
I don't have a preference.

@vlsi
Copy link
Member

@vlsi vlsi commented Sep 21, 2018

I don't have a preference.

Ok, ThreadLocalRandom.current().nextBoolean() ? Float.NaN : Double.NaN then

@davecramer
Copy link
Member

@davecramer davecramer commented Sep 21, 2018

@vlsi
Copy link
Member

@vlsi vlsi commented Sep 21, 2018

Let's just stick with Double.NaN for "extra precision".

@vlsi vlsi added this to the 42.2.6 milestone Sep 21, 2018
@vlsi vlsi merged commit 265f22b into pgjdbc:master Sep 21, 2018
2 checks passed
@vlsi
Copy link
Member

@vlsi vlsi commented Sep 21, 2018

@tseylerd , could you please clarify expected release date?
E.g. when next DataGrip release is planned?

@tseylerd
Copy link
Contributor Author

@tseylerd tseylerd commented Sep 24, 2018

@vlsi We plan to release in November. But we can update the driver separately as soon as it comes out.

@arahman5
Copy link

@arahman5 arahman5 commented May 29, 2020

Hi guys, wondering if someone can help me with this. I have currently got the 42.2.12 version of the driver and I am still coming across the issue of org.postgresql.util.PSQLException: Bad value for type BigDecimal : NaN. This is happening only when I am selecting a column which has Numeric data type in the Postgres database and that column also contains NaN values. Correct me if I have got this wrong but I understand that this particular issue was targeted at solving this very problem. I would appreciate any ideas on how this can be fixed? Many thanks in advance!

@davecramer
Copy link
Member

@davecramer davecramer commented May 29, 2020

@arahman5 according to the test case in this PR this should not be happening. Can you provide us with a small test case that reproduces the problem ?

@arahman5
Copy link

@arahman5 arahman5 commented May 29, 2020

@davecramer yes sure. I load data from a table in a database by using the below code

val DF = spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://" + args(1) + ":" + args(2) + "/" + args(3))
        .option("dbtable", "tablename")
        .option("user", args(4))
        .option("password", args(5))
        .load()

The schema of this table is as follows where the value column contains decimal values and NaN values.

root
 |-- id: long (nullable = true)
 |-- file_id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)
 |-- value: decimal(38,18) (nullable = true)

I then execute the below piece of code and the code crashes with always the same error message that I previously mentioned. Calling the show method is supposed to print out the values of the value column in the console but I always get the error message

DF.createOrReplaceTempView("tablename")

val nonull = spark.sql("""select value from tablename
                                     order by time desc
                                     limit 1000""").show()

What is even more strange is that, when I am running the below SQL query in pgadmin4 I get a result containing no NaN value at all which is the correct result. However, when I execute the same exact query in scala then I continue to get the error message, thus implying the query is not getting executed in the same manner when run in scala and the NaN values are not getting filtered.

select value from tablename where NOT value = 'NaN'
order by time desc
limit 1000

@arahman5
Copy link

@arahman5 arahman5 commented May 29, 2020

Digging further into the java code, it seems like the issue is coming from the isBinary function to start with which is being called within the getNumeric function code in the script PgResultSet.java.

isbinary

The isBinary function relies on the format of the field (in my case format of the column value) to be equal to BINARY_FORMAT and my understanding is that only if it is true then the major part of the code dealing with NaN values in a Numeric column will be executed. In the script BINARY_FORMAT has been defined to have a value of 1 whereas for the case I am presenting the format of the Numeric column value is always 0 so its not a binary column but still has NaN values in it.

Why does the column need to be binary in order for the remaining of the code as shown in the image above to run?

binary_format

@davecramer
Copy link
Member

@davecramer davecramer commented May 29, 2020

@arahman5 this is very hard to read. github allows you to send links to code which is much easier to read

@davecramer
Copy link
Member

@davecramer davecramer commented May 29, 2020

better question is why there is:
private Number getNumeric(int columnIndex, int scale, boolean allowNaN) throws SQLException

and is only ever called with allowNaN false.

davecramer added a commit to davecramer/pgjdbc that referenced this issue Jul 5, 2021
'NaN'::real => Float.NaN
'NaN'::double precision => Double.NaN
'NaN'::numeric => Double.NaN
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants