Failure Writing 20GB+ Files #51
Some questions:
On the 1st one - add an overwrite option.
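The suggested snippet was cut off in this thread; assuming the comment refers to Spark's save mode, a minimal sketch (with hypothetical paths) might look like:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object OverwriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("overwrite-example").getOrCreate()

    spark.read
      .format("com.github.saurfang.sas.spark")
      .load("/data/input.sas7bdat") // hypothetical path
      .write
      // Overwrite replaces any files left behind by a failed previous run,
      // instead of failing with FileAlreadyExists on retry.
      .mode(SaveMode.Overwrite)
      .csv("/data/output") // hypothetical path

    spark.stop()
  }
}
```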
Forgot to include that in my post, but I am using that option already.
@westonsankey logs?
Added logs. It happens when writing to other formats as well (tested with Parquet).
@westonsankey can you provide the following:
@thesuperzapper - I think the files might be corrupt, but I'm doing some more testing/research on my end. I'll provide you with that information if I determine that this is not the case. Appreciate the help thus far.
@westonsankey even if the files are corrupt, perhaps we can emit a nicer error message for such files.
@westonsankey This might become an issue, because we use a crazy hack to use the Parso library, and I doubt it works in Scala 2.12.
@thesuperzapper
@westonsankey
@westonsankey could you also try setting
@thesuperzapper - Tried setting that. I wrote a small Java program using the Parso library to iterate over the rows in the SAS file to see if there were any errors parsing a single row. Turns out that is the issue - I get the same `There are no available bytes in the input stream` error from Parso.
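The row-by-row check described above can be sketched against the Parso Java API directly (Scala shown here for consistency with the rest of the project; the file path is hypothetical):

```scala
import java.io.FileInputStream
import com.epam.parso.impl.SasFileReaderImpl

object ScanSasRows {
  def main(args: Array[String]): Unit = {
    val in = new FileInputStream("/data/input.sas7bdat") // hypothetical path
    try {
      val reader = new SasFileReaderImpl(in)
      val rowCount = reader.getSasFileProperties.getRowCount
      var row = 0L
      while (row < rowCount) {
        try {
          reader.readNext() // reads one row as an Array[AnyRef]
        } catch {
          case e: Exception =>
            println(s"Failed at row $row: ${e.getMessage}")
            row = rowCount // stop at the first bad row
        }
        row += 1
      }
    } finally in.close()
  }
}
```

This pinpoints the first row where Parso itself fails, which separates a corrupt-file problem from a bug in the Spark data source.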
@westonsankey can you raise an issue here: https://github.com/epam/parso |
I've been running into failures with certain files that are larger than 20 GB in size. Specifically, these errors come in two varieties:

1. A `FileAlreadyExists` exception will be thrown because it is attempting to write a file that has already been written.
2. A `java.io.IOException` will be thrown with the message `There are no available bytes in the input stream.`

I'm loading data from a SAS file into a DataFrame, then writing it out as a CSV. A sample of what I'm doing is below:
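The original sample did not survive extraction; a minimal sketch of the described flow, assuming the spark-sas7bdat data source and hypothetical paths, might look like:

```scala
import org.apache.spark.sql.SparkSession

object SasToCsv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sas-to-csv").getOrCreate()

    // Read the sas7bdat file through the spark-sas7bdat data source.
    val df = spark.read
      .format("com.github.saurfang.sas.spark")
      .load("/data/input.sas7bdat") // hypothetical path

    // Write the DataFrame back out as CSV.
    df.write
      .option("header", "true")
      .csv("/data/output") // hypothetical path

    spark.stop()
  }
}
```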