remove embedded nul bytes if present in raw data to be read #2250
Looks like
We can skip the test; I added a suggestion. Thank you for this fix, @jozefhajnala!
Added the skip as suggested.
Databricks Connect tests failed.
Databricks Connect tests succeeded.
Looks good to me!
Let's try rebasing past #2426 and see if the Arrow tests are skipped as expected.
Signed-off-by: Jozef <jozef.hajnala@gmail.com> Signed-off-by: Jozef Hajnala <jozef.hajnala@gmail.com>
Databricks Connect tests succeeded.
Great success!
I'm encountering an error on collect. I have the latest release, 1.3.1, whose patch notes include "Embedded nul bytes are removed from strings when reading strings from Spark to R (#2250)". I do not have a reproducible example (the nul byte appears somewhere in a 13-billion-row table, and creating one artificially is difficult), so I'm just asking whether this hotfix is supposed to cover the use case of running collect() on a Spark table and returning it to R, which is where I'm getting this error.
@rexdouglass Hey, I don't think this PR covers the use case involving Arrow. You can try disabling Arrow as a workaround, but there will be some performance penalty, as deserialization is slower without Arrow. I'll try to address the use case involving Arrow in #2633. I'm hoping it's uncomplicated and can be shipped as part of sparklyr 1.4.
When trying to collect data that contains embedded nul bytes into R, the process fails with an error (a reproducible example is added as a unit test case). This PR proposes to work around the issue by omitting those bytes.
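For context, R character strings cannot hold nul bytes, so converting a raw buffer that contains 0x00 into a string raises an error. The workaround described above amounts to filtering those bytes out of the raw data before the conversion. Below is a minimal, language-neutral sketch of that filtering step in Python. It is not the sparklyr implementation: the function names `strip_nul_bytes` and `read_string` are hypothetical, and Python itself tolerates nul characters in strings, so the sketch only illustrates the "omit those bytes" idea.

```python
def strip_nul_bytes(raw: bytes) -> bytes:
    """Return a copy of `raw` with every 0x00 byte removed (hypothetical helper)."""
    return raw.replace(b"\x00", b"")


def read_string(raw: bytes) -> str:
    """Decode a raw buffer as UTF-8 after dropping embedded nul bytes.

    Assumption: the buffer is UTF-8 encoded. In UTF-8 the byte 0x00 only ever
    encodes U+0000, so removing it cannot corrupt a multi-byte character.
    """
    return strip_nul_bytes(raw).decode("utf-8")


if __name__ == "__main__":
    # A value with a nul byte embedded in the middle of the text.
    print(read_string(b"hello\x00world"))
```

Note that this changes the data silently: a value with embedded nul bytes comes back shorter than it was stored, which is the trade-off the PR accepts in exchange for the collect not failing.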