read_sas with chunksize/iterator raises ValueError #14734
Comments
jorisvandenbossche added the Bug and IO SAS labels on Nov 25, 2016
@pijucha, thanks for the report. I've PR'd a possible fix.
jreback added this to the 0.19.2 milestone on Nov 25, 2016
jorisvandenbossche closed this in #14743 on Nov 28, 2016
jorisvandenbossche added a commit that referenced this issue on Nov 28, 2016 (kshedden + jorisvandenbossche, commit c5f219a)
jorisvandenbossche added a commit that referenced this issue on Dec 15, 2016 (kshedden + jorisvandenbossche, commit 6c688b9)
boulund commented on May 2, 2017
Is this issue solved? I just got this error trying to iterate through a large sas7bdat file with pandas. The file is 27 GB, and I'm iterating over it with `chunksize`. Is this related to the error referenced in this issue?
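For anyone hitting this, a defensive chunked-reading loop looks roughly like the sketch below. The file path, chunk size, and helper name are placeholders, not anything from the thread; on pandas >= 0.19.2 (which contains the #14743 fix) plain iteration should stop cleanly at the end of the file.

```python
import pandas as pd

def read_sas_in_chunks(path, chunksize=10_000):
    # Hypothetical helper: read a sas7bdat file chunk by chunk without
    # knowing the number of rows up front, then stitch the pieces together.
    reader = pd.read_sas(path, format="sas7bdat", chunksize=chunksize)
    chunks = []
    try:
        for chunk in reader:
            chunks.append(chunk)
    finally:
        reader.close()
    return pd.concat(chunks, ignore_index=True)
```

For a 27 GB file, concatenating everything in memory may not be practical; processing each chunk inside the loop (and discarding it) is the more realistic variant of the same pattern.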
Is the file compressed? Check the "compression" attribute of the iterator.
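The check suggested above can be sketched as follows. The path is a placeholder, and `compression` here is an attribute of the `SAS7BDATReader` returned by `read_sas` with `iterator=True`; the attribute's name and value format may differ between pandas versions, so treat this as an assumption to verify against your installed version.

```python
import pandas as pd

def sas_compression(path):
    # Hypothetical check: open the file as an iterator (no rows are read)
    # and report the compression code recorded in the sas7bdat header,
    # e.g. b"SASYZCRL" (RLE) or b"SASYZCR2" (RDC).
    reader = pd.read_sas(path, format="sas7bdat", iterator=True)
    try:
        return reader.compression
    finally:
        reader.close()
```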
boulund commented on May 2, 2017 (edited)

@kshedden I get the following from the `compression` attribute, which I interpret as some kind of compression. So, yes, I guess? Edit: I actually got another error for one of my other files. Maybe it's related?
The SAS specification is not public and had to be reverse engineered
through examples. The compression algorithm was particularly hard to
reverse engineer (other people did most of the hard work on this, I only
made a small contribution). I have been able to validate that our code
successfully reads many compressed SAS files, but I'm pretty sure there are
some compression codes that we do not know. We know the common codes, but
it's more likely that we have missed a rare one, which is consistent with
the fault occurring late in the file.
To verify that this is a compression issue, would it be possible for you to generate this file as an uncompressed SAS file and see whether the issue still arises?
boulund commented on May 2, 2017
I see. Really appreciate your effort! Unfortunately I don't think I can generate the file without compression, but I'll look into it (I don't have access to the source data).
If you have SAS, you can convert the file from compressed to uncompressed
(of course in that case you could just use SAS to dump it to csv, but it
would be helpful to us if this flags a problem that we can fix). I can
give you SAS code to do the conversion if needed.
pijucha commented on Nov 25, 2016

`read_sas` doesn't work well with the `chunksize` or `iterator` parameters.

Code Sample and Problem Description
The following test data file in the repository has 32 lines. When we carefully read the file with `chunksize`/`iterator`, all is well. But if we don't know the length of the data, we'll easily stumble on an exception and won't read all of the data, which is painful with large files.
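The original code samples were lost in this copy; based on the issue title, a reconstruction of the failing pattern might look like the sketch below. The file name and row counts are assumptions for illustration.

```python
import pandas as pd

def reproduce(path):
    # Sketch of the reported failure: reading fewer rows than the file
    # holds works, but asking for more rows than remain raised a
    # ValueError on pandas 0.19.x (fixed in 0.19.2 via #14743).
    reader = pd.read_sas(path, format="sas7bdat", iterator=True)
    try:
        first = reader.read(10)   # fine: fewer rows than the file holds
        rest = reader.read(100)   # reading past the end triggered the bug
        return first, rest
    finally:
        reader.close()
```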
Output of `pd.show_versions()`:

INSTALLED VERSIONS
commit: 75b606a
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)i5-2520M_CPU@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.0+112.g75b606a
nose: 1.3.7
pip: 9.0.1
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0