ArrayIndexOutOfBoundsException during read #77
Comments
Thanks for reporting the issue! Would it be possible for you to share the file, or does it contain sensitive information? We are just trying to understand why this particular file is tricky.
I'm sorry, I am not allowed to share this file.
We will investigate the issue. It looks like your dataset contains deleted rows (see a similar issue: pandas-dev/pandas#15963). The simplest workaround would be to remove those deleted rows if you don't need them; the file should then be read properly.
Hi, @luca-vercelli. As you cannot share your dataset, would it be possible to try reading that file on your machine with the following change?

Current code:

    private int bytesToShort(byte[] bytes) {
        return byteArrayToByteBuffer(bytes).getShort();
    }

Fix proposal:

    private int bytesToShort(byte[] bytes) {
        return byteArrayToByteBuffer(bytes).getShort() & 0xFFFF;
    }
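To illustrate why the mask in the proposal matters, here is a minimal standalone sketch. `ByteBuffer.getShort()` returns a signed 16-bit value, so any stored value above 32767 comes back negative; `& 0xFFFF` reinterprets the same bits as an unsigned value. The class name, helper names, sample bytes, and the little-endian byte order are assumptions made for this demo and are not taken from the library. The sample bytes were picked so that the signed read yields -209, the offset reported in this issue; the actual bytes in the problematic file are unknown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnsignedShortDemo {
    // Hypothetical helper: wraps the bytes in a little-endian buffer (byte order assumed for the demo).
    static ByteBuffer toBuffer(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Mirrors the current code: the signed short is widened to int with sign extension.
    static int signedShort(byte[] bytes) {
        return toBuffer(bytes).getShort();
    }

    // Mirrors the fix proposal: masking keeps only the low 16 bits, giving 0..65535.
    static int unsignedShort(byte[] bytes) {
        return toBuffer(bytes).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x2F, (byte) 0xFF}; // 0xFF2F in little-endian order
        System.out.println("signed:   " + signedShort(raw));
        System.out.println("unsigned: " + unsignedShort(raw));
    }
}
```

With these sample bytes the signed read gives -209 and the masked read gives 65327, which is how a valid large offset could show up as a negative array index.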
Hi @xantorohara |
Thanks, @luca-vercelli! It seems this fix may solve some other problems as well.
Actually, `vars.get(0)` is an array of 4 bytes (its length is based on `SasFileConstants.PAGE_DELETED_POINTER_LENGTH = 4`), so it is more correct to use the `bytesToInt()` conversion here instead of `bytesToShort()`.
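The difference between the two conversions on a 4-byte field can be sketched as follows. Reading a 4-byte value as a short consumes only the first two bytes and drops the rest, while an int read uses all four. The class and method names, the sample bytes, and the little-endian byte order are assumptions for illustration only; they are not the library's actual implementation of `bytesToShort()` or `bytesToInt()`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointerWidthDemo {
    // Hypothetical stand-in for a short conversion: reads only the first 2 bytes.
    static int asShort(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
    }

    // Hypothetical stand-in for an int conversion: reads all 4 bytes.
    static int asInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // 0x0001FF2F stored in little-endian order (sample value, assumed for the demo).
        byte[] pointer = {(byte) 0x2F, (byte) 0xFF, (byte) 0x01, (byte) 0x00};
        System.out.println("as short: " + asShort(pointer));
        System.out.println("as int:   " + asInt(pointer));
    }
}
```

For this sample, the short read yields -209 while the int read yields 130863, so even a masked short would lose the upper two bytes of a 4-byte pointer.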
Closed, thanks to xantorohara.
While reading a certain 900 MB file, I get this error:
This is line 917 of the code:
The problem is that, for reasons I don't understand, the routine getBytesFromFile gets some strange values for offset and length: `offset[i]=-209` and `length[i]=0`.
As a workaround, I solved it this way:
If you agree with this, I can send a PR.
I am not able to reproduce the issue with a smaller file.