Issue with StataReader for Stata files version 108 and older #12232
jorisvandenbossche added the IO Stata label on Feb 4, 2016
@ckingdon95 thanks for the detailed report. We don't have any test files that old, and I cannot create a file that old with the latest version of Stata, which is the only one I have access to (see link below). So we may need someone to provide a test file to troubleshoot this. Are there any version 108 files floating around on the web? http://www.stata.com/support/faqs/data-management/save-for-previous-version/
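To check whether a candidate file really is "version 108 or older", its header can be inspected directly. The sketch below assumes the usual .dta layout: formats up to 115 store the format number as a single byte at offset 0, while formats 117 and later begin with the ASCII tag `<stata_dta>` and carry the number in a `<release>` tag. The function name `stata_format_version` is hypothetical and not part of pandas.

```python
import io


def stata_format_version(fh):
    """Return the .dta format number of an open binary file object.

    Assumption: old-style files (format 115 and earlier) keep the format
    number in the first byte; format 117+ files start with b"<stata_dta>"
    and hold the number inside <release>...</release>.
    """
    head = fh.read(64)
    if not head:
        raise ValueError("empty file")
    if head.startswith(b"<stata_dta>"):
        # Newer XML-like header; .index raises ValueError if the tag is missing.
        start = head.index(b"<release>") + len(b"<release>")
        end = head.index(b"</release>", start)
        return int(head[start:end].decode("ascii"))
    # Indexing a bytes object yields an int in Python 3.
    return head[0]
```

With a real file this would be called as `with open("example.dta", "rb") as fh: stata_format_version(fh)`, where `example.dta` is a placeholder path; a result of 108 or lower means the file goes through the old-header code path discussed in this issue.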
I don't think we need _decode_bytes: it is not covered by any tests and is only called in one place. Can someone with a version < 108 file change lines 1204-1208 of stata.py to:
and report back? If we adopt this, we should remove the
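As a rough illustration of dropping _decode_bytes at this call site, the following hypothetical sketch reads all per-variable type codes as raw bytes and maps them through a dictionary keyed by integer byte values, so no text decoding (and therefore no `errors` argument) is involved. The table `OLD_TYPE_CODES`, the `0x7f + length` encoding of string types, and the function name `read_old_typlist` are assumptions for illustration, not pandas' actual code.

```python
import io

# Assumed old-format (<= 108) type codes, stored as single ASCII bytes,
# mapped to the numeric codes used by newer formats
# (251=byte, 252=int, 253=long, 254=float, 255=double).
OLD_TYPE_CODES = {
    ord("b"): 251,
    ord("i"): 252,
    ord("l"): 253,
    ord("f"): 254,
    ord("d"): 255,
}


def read_old_typlist(fh, nvar):
    """Read nvar one-byte type codes without decoding them to text."""
    raw = fh.read(nvar)
    if len(raw) != nvar:
        raise ValueError("truncated type list")
    typlist = []
    for code in raw:  # iterating over bytes yields ints in Python 3
        if code in OLD_TYPE_CODES:
            typlist.append(OLD_TYPE_CODES[code])
        elif code > 127:
            # Assumption: string variables are stored as 0x7f + length.
            typlist.append(code - 127)
        else:
            raise ValueError("unknown type code %d" % code)
    return typlist
```

Reading the whole type list in one call also avoids the per-variable `read(1)` loop, and comparing integers sidesteps any encoding configuration of the reader.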
ckingdon95 commented on Feb 5, 2016
Thanks for the reply! The Stata files I am trying to read can be found at this website: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/EXTLSMS/0,,contentMDK:21544648~pagePK:64168445~piPK:64168309~theSitePK:3358997,00.html
jreback added the Compat label on Feb 5, 2016
jreback added this to the 0.18.0 milestone on Feb 5, 2016
jreback closed this in ca4f738 on Feb 8, 2016
cldy added a commit to cldy/pandas that referenced this issue on Feb 11, 2016
e57fd92 (kshedden + cldy)
ckingdon95 commented on Feb 4, 2016
I am having an issue with the StataReader class, which is defined in pandas/io/stata.py.
I am using pandas 0.17.1.
The following is the Python code I am trying to run:
where fileName is a Stata file.
The following code is part of the _read_old_header method (which starts on line 1184) of the StataReader class in stata.py, which is called when a StataReader object is initialized:
I get no errors when my Stata files are newer than version 108, but with version 105 files there seems to be a bug in _decode_bytes. The code above passes only one argument to _decode_bytes besides self: the string returned by path_or_buf.read(1).
Here is the _decode_bytes method (line 896):
When no third argument is passed (as is the case when it is called from _read_old_header), the errors argument defaults to None. This is where the error is thrown. The error is:
That is the issue: the string's decode method expects its second argument (errors) to be a string, not None, but _decode_bytes passes errors=None by default.
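The failure can be reproduced without a Stata file. The sketch below uses two hypothetical stand-alone functions that only mimic the behaviour described above; they are not the pandas source, and the `latin-1` default encoding is an assumption standing in for the reader's configured encoding. Forwarding `errors=None` positionally to `bytes.decode` raises `TypeError`, whereas defaulting to `"strict"`, which is the standard default of `bytes.decode`, avoids it.

```python
def decode_bytes_buggy(raw, encoding="latin-1", errors=None):
    # Mirrors the reported pattern: errors defaults to None and is
    # forwarded to bytes.decode, which requires a str here.
    return raw.decode(encoding, errors)


def decode_bytes_fixed(raw, encoding="latin-1", errors="strict"):
    # "strict" is the normal default of bytes.decode, so callers that
    # omit errors get standard behaviour instead of a TypeError.
    return raw.decode(encoding, errors)
```

Calling `decode_bytes_buggy(b"b")` raises `TypeError`, while `decode_bytes_fixed(b"b")` returns `"b"`. Either defaulting `errors` to a real error-handler name or, as suggested earlier in the thread, not decoding the type byte at all would sidestep the problem.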