Sparklyr and Arrow Max Records #2243
Related to this, I recently discovered that setting spark.sql.execution.arrow.maxRecordsPerBatch to 100000 or more caused an error: when the value was processed, it was converted to scientific notation rather than the integer form Spark requires. I think it can be fixed by changing the R session's scipen option, although I'm not 100% certain that works. If any work is done on this, it would be good to check this behaviour as well.
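For reference, the coercion described above can be reproduced in plain R. With the default scipen of 0, R prefers the shorter scientific form when rendering large round numbers, so a numeric 100000 can reach Spark as the string "1e+05". This is a minimal sketch of that behaviour; whether raising scipen actually fixes the sparklyr code path is, as noted above, unverified, since it depends on which coercion sparklyr uses internally.

```r
# With the default options(scipen = 0), format() picks the shorter
# scientific representation for a numeric 100000:
format(100000)          # "1e+05" -- not a valid integer for Spark

# Raising the penalty against scientific notation changes format():
old <- options(scipen = 999)
format(100000)          # "100000"
options(old)            # restore the previous setting

# Caveat: as.character() on a double is NOT affected by scipen,
# so if sparklyr coerces the value that way, scipen won't help.
# Passing the value as a character string sidesteps the question.
as.character(100000)    # "1e+05" regardless of scipen
```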
@mjcarroll1985 how are you setting spark.sql.execution.arrow.maxRecordsPerBatch? Are you setting it as a character, or as a numeric? E.g. setting maxRecordsPerBatch as a character:
Versus numeric:
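The two variants being compared might look like the following sketch, built with sparklyr's spark_config(); the local master is assumed here purely for illustration:

```r
library(sparklyr)

config <- spark_config()

# Character form: the string is handed to Spark verbatim.
config$spark.sql.execution.arrow.maxRecordsPerBatch <- "100000"

# Numeric form (the variant that reportedly fails at >= 100000,
# since coercion to character can yield "1e+05"):
# config$spark.sql.execution.arrow.maxRecordsPerBatch <- 100000

sc <- spark_connect(master = "local", config = config)
```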
@rychien-official as numeric. It seems to work with a value of, say, 80000, just not with anything >= 100000.
Got it. Can you please try the character version and see if that works?
@rychien-official Yes, specifying it as a character works, thank you!
Nice! |
Hey Sparklyr Team,
I noticed that when using spark_apply with Arrow, there is a default maximum of 10000 records per batch if I forget to set "spark.sql.execution.arrow.maxRecordsPerBatch" when connecting to Spark. This is the line of code: spark_apply line 191.
A demonstration of bad results when using the default settings is here: https://github.com/rychien-official/sparklyr_maxrecords/blob/master/readme.md
I am wondering whether there is a technical reason for hard-coding the maximum records per batch at connection time. It would be nice if spark_apply could accept the maximum records per batch as a parameter. I'd be happy to create a pull request to do so.
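For illustration only, the proposed pass-through might look something like the sketch below. The parameter name arrow_max_records_per_batch is hypothetical, not sparklyr's actual API; it just shows the idea of overriding the Arrow batch size per call instead of relying on the value fixed at spark_connect() time.

```r
# Hypothetical sketch only -- not sparklyr's actual API.
# sdf is assumed to be an existing Spark DataFrame.
result <- spark_apply(
  sdf,
  function(df) df,
  arrow_max_records_per_batch = 50000   # hypothetical parameter
)
```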
Thanks!
-Ryan