StataReader.variable_labels() does not read variable label correctly for stata datasets saved under Stata 13 using 'save' (but it can read datasets saved using 'saveold') #7816
Comments
|
docs are here: http://pandas.pydata.org/pandas-docs/stable/io.html#reading-from-stata-format something like:
look inside the |
jreback
added Usage Question Stata
labels
Jul 22, 2014
|
closing as a usage question |
jreback
closed this
Jul 22, 2014
shafiquejamal
commented
Jul 22, 2014
|
Hello, I'm sorry if I wasn't clear earlier. I did use the Can you please re-open this issue? It is still not resolved (I am using the latest Pandas master branch). Thanks. |
|
ok, so this is a feature/bug request then? ok |
jreback
reopened this
Jul 22, 2014
jreback
added the
Bug
label
Jul 22, 2014
jreback
added this to the
0.15.1
milestone
Jul 22, 2014
|
cc @bashtage |
shafiquejamal
commented
Jul 22, 2014
|
Yes it is a bug/feature request. I guess Stata changed something in how they save data files, which means that the Stata reader needs to be updated to accommodate this change. Many thanks! |
|
@shafiquejamal Would be helpful if you could share a simple example file .dta which produces the problem, as well as a v12 one that works. This looks like it is implemented in the v13 path - although it probably is buggy |
jreback
removed the
Usage Question
label
Jul 22, 2014
shafiquejamal
commented
Jul 22, 2014
|
Certainly. I have a couple of .dta files of about 450kb each that I can share (problem dataset I tried dragging them into this comment window, but I'm getting this error at the bottom of this comment window: "Unfortunately, we don't support that file type. Try again with a PNG, GIF, or JPG." How can I share these .dta files with you? Thanks, |
|
@shafiquejamal put them up on a public dropbox / share site. I think you can do it via gist as well. and post the link here. |
shafiquejamal
commented
Jul 22, 2014
|
Here is the dropbox link: https://www.dropbox.com/sh/4r0fhspsiwpim5p/AACBaC-lu7TaNPLUQQgU_rt4a So StataReader can handle the file ending in |
|
The bug, unfortunately, seems to be in Stata. Stata's dta file definition claims that it gives the offset to the start of this segment as one of 14 8-byte values, in . Unfortunately, this value is 0 (0000 0000 0000 0000 in the file) in this file, and is also 0 in one I just saved from Stata 13. The code appears to be a correct implementation of Stata's documented file format, so I'm not sure if this should be "fixed" (which would be to hack around Stata's problem). |
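For anyone who wants to check their own file, here is a minimal sketch of inspecting those 14 offsets. It assumes the format-117 layout, in which the offsets are stored as unsigned 8-byte integers immediately after a literal `<map>` tag and the variable-labels offset is the eighth entry. The function names, the index constant, and the default little-endian byte order are illustrative assumptions, not pandas API.

```python
import struct

MAP_TAG = b"<map>"
N_OFFSETS = 14
# Assumption: the <variable_labels> offset is the 8th map entry (index 7).
VARIABLE_LABELS_INDEX = 7


def read_map_offsets(raw: bytes, byteorder: str = "<") -> tuple:
    """Return the 14 offsets stored right after the first <map> tag.

    byteorder is "<" for little-endian (LSF) files and ">" for
    big-endian (MSF) files. Taking the first occurrence of the tag is a
    simplification; a real reader would walk the header instead.
    """
    start = raw.index(MAP_TAG) + len(MAP_TAG)
    end = start + 8 * N_OFFSETS
    return struct.unpack(f"{byteorder}{N_OFFSETS}Q", raw[start:end])


def variable_labels_offset_missing(raw: bytes, byteorder: str = "<") -> bool:
    """True when the variable-labels map entry is zero, as described above."""
    return read_map_offsets(raw, byteorder)[VARIABLE_LABELS_INDEX] == 0
```

To try it on a real file, read the first kilobyte or so with `open(path, "rb").read(1024)` and pass the bytes in.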
shafiquejamal
commented
Jul 22, 2014
|
Thanks for looking into this so quickly. I'll see about contacting folks at Stata to see whether they can fix their documentation, which would then just justify modifying Pandas. To summarize then: the problem is that the offset (to the start of the segment in the dta file that defines the variable labels) should be 1, according to Stata's documentation (in Many thanks, |
bashtage
referenced
this issue
Jul 22, 2014
Merged
BUG: Fixed failure in StataReader when reading variable labels in 117 #7818
|
I have submitted a patch that works around the difference between the docs and the implementation. The required value is technically unnecessary since it can be computed from other values. |
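As a rough illustration of how the missing value can be computed from other values, the sketch below derives the start of the variable labels from the start of the value-label names. It assumes the format-117 layout, where each value-label name occupies a fixed 33 bytes and the section is followed by the `</value_label_names>` closing tag and the `<variable_labels>` opening tag. The function and constant names are hypothetical, and this is not presented as the exact code in the patch.

```python
# Assumption: in format 117 each value-label name is a fixed 33-byte field.
VALUE_LABEL_NAME_WIDTH = 33
CLOSE_TAG_LEN = len(b"</value_label_names>")  # 20 bytes
OPEN_TAG_LEN = len(b"<variable_labels>")      # 17 bytes


def infer_variable_labels_start(value_label_names_start: int, nvar: int) -> int:
    """Estimate where the first variable label begins.

    value_label_names_start is the position of the first value-label
    name (just past its opening tag); nvar is the number of variables.
    """
    return (
        value_label_names_start
        + VALUE_LABEL_NAME_WIDTH * nvar
        + CLOSE_TAG_LEN
        + OPEN_TAG_LEN
    )
```

The result points just past the opening tag, so a reader can seek there and start reading labels without consulting the zeroed map entry.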
jreback
modified the milestone: 0.15.0, 0.15.1
Jul 22, 2014
bashtage
pushed a commit
to bashtage/pandas
that referenced
this issue
Jul 23, 2014
|
|
6265450
|
shafiquejamal
commented
Jul 23, 2014
|
Thanks! It's working with my datasets. Cheers, |
shafiquejamal commented Jul 22, 2014
If I use StataReader to read a Stata dataset saved in Stata 13 using the save command, I can get the data but not the variable labels. If, however, I use the saveold command in Stata 13, I am able to get the variable labels in Python 3 using StataReader.variable_labels(). Can anyone suggest how to accommodate Stata 13? Thanks,