Fixed type validation with support for NULL #72
Without this fix, reading a boolean column with a NULL value ("" in the input CSV) fails to parse as JSON, and in numeric columns the NULL value is coerced to 0. With this fix, an empty string is detected and handled as a NULL value.
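For readers who want to reproduce the two failure modes, here is a minimal sketch. It assumes each CSV cell reaches the conversion code as a plain string (the name `emptyCell` is only illustrative) and uses the same `JSON.parse` and `Number` calls as the removed code in `lib/helpers.js`.

```javascript
// Assumption: a NULL cell arrives from the CSV as the empty string.
const emptyCell = "";

// Boolean branch: JSON.parse("") throws instead of returning a value.
let booleanError = null;
try {
  JSON.parse(emptyCell.toLowerCase());
} catch (err) {
  booleanError = err;
}
console.log(booleanError instanceof SyntaxError); // true

// Numeric branch: Number("") does not throw, it quietly returns 0.
const numericResult = Number(emptyCell);
console.log(numericResult); // 0
```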
Sorry, this took a while to get back to. I've been testing this and had a few observations.
But there is no way to know whether the original uploaded data value was empty or NULL. And since Athena itself supports both for all fields, including number (which is an anti-pattern), I think Athena-express shouldn't (and cannot accurately) correct that, and should therefore present that data as-is. Let me know what you think.
If we leave the code "as is" it crashes with queries like this:
If the column is of Boolean or numeric type, an empty string should be parsed as a NULL value (it cannot be an empty string).
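A rough sketch of that rule, for illustration only: `coerceValue` and `NUMERIC_TYPES` are hypothetical names, not the code in this PR, and the type list is the one from the `switch` in `lib/helpers.js`. It assumes raw cells are always strings.

```javascript
// Hypothetical helper, not the PR's implementation.
const NUMERIC_TYPES = new Set([
  "integer", "tinyint", "smallint", "int", "float", "double",
]);

function coerceValue(rawValue, athenaType) {
  const isBoolean = athenaType === "boolean";
  const isNumeric = NUMERIC_TYPES.has(athenaType);

  // An empty cell in a typed column cannot be an empty string, so treat it as NULL.
  if ((isBoolean || isNumeric) && rawValue === "") {
    return null;
  }
  if (isBoolean) {
    return JSON.parse(rawValue.toLowerCase());
  }
  if (isNumeric) {
    return Number(rawValue);
  }
  // Any other type is passed through unchanged in this sketch.
  return rawValue;
}

console.log(coerceValue("", "boolean"));     // null
console.log(coerceValue("TRUE", "boolean")); // true
console.log(coerceValue("", "int"));         // null
console.log(coerceValue("0", "int"));        // 0
```

Note that this variant leaves empty strings in other column types untouched, which is narrower than replacing every empty value with NULL.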
How will we distinguish whether the original uploaded value was meant to be empty or null, given that Athena doesn't distinguish between the two? It feels like we might be making a leap here and polluting the response, since empty and null are not the same value.
If the column has been marked by Athena as
That's what I was showing with the screenshot I posted above. Athena allows empty values for Boolean and Numeric columns. Upload a sample dataset with empty and null values for Athena and you will notice both are allowed and both show up as empty.
It does not. It is interpreting them as NULL values and showing them as NULL. Try this in your example dataset:
And obviously this is allowed:
But this is not:
You're right that Athena interprets uploaded empty values for Boolean/Number as NULL.
With your PR, all empty fields (not just Boolean and number) will become null, and that will ignore genuinely empty values.
Yes, but note that this is the expected behavior in CSV parsers. Even Athena does this, if you upload a dataset with some empty values in a
So you're suggesting replacing all empty values with NULL. Is there no use case where empty values could stay as empty values?
Yes, that is the typical behavior of CSV/Athena parsers. At least
OK, I'll accept the PR.
Seems to work as expected. Are you noticing any issues?
On Sat, Jan 22, 2022 at 1:12 AM João Dias wrote:
***@***.**** commented on this pull request.
------------------------------
In lib/helpers.js:
> - case "boolean":
- updatedObjectWithDataType[key] = JSON.parse(
- input[key].toLowerCase()
- );
- break;
- case "integer":
- case "tinyint":
- case "smallint":
- case "int":
- case "float":
- case "double":
- updatedObjectWithDataType[key] = Number(input[key]);
- break;
- default:
- updatedObjectWithDataType[key] = input[key];
+ if (!input[key]) {
What if input[key] is false?
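A quick way to reason about that question, assuming (as the removed `input[key].toLowerCase()` call suggests) that raw cells are strings before conversion. `isTreatedAsNull` is a hypothetical name that mirrors the `!input[key]` check from the diff.

```javascript
// Mirrors the `!input[key]` condition from the diff.
const isTreatedAsNull = (value) => !value;

// String cells: only the empty string is falsy.
console.log(isTreatedAsNull(""));      // true
console.log(isTreatedAsNull("false")); // false
console.log(isTreatedAsNull("0"));     // false

// Already-converted values would be caught as well.
console.log(isTreatedAsNull(false));   // true
console.log(isTreatedAsNull(0));       // true

// A stricter alternative that only matches the empty string.
const isEmptyCell = (value) => value === "";
console.log(isEmptyCell(false));       // false
```

So the check is only a problem if the value has already been converted to a real boolean or number before it runs; with string input, "false" and "0" pass through to the type conversion.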
@ghdna no, sorry. I just misread something, nvm