BUG: DataFrame.to_stata() uses wrong struct formats and crashes in int64 #6327

Closed
bashtage opened this issue Feb 12, 2014 · 7 comments
Labels: Bug, Dtype Conversions, IO Data

@bashtage
Contributor

The relevant code is

self.DTYPE_MAP = \
    dict(
        lzip(range(1, 245), ['a' + str(i) for i in range(1, 245)]) +
        [
            (251, np.int16),
            (252, np.int32),
            (253, np.int64),
            (254, np.float32),
            (255, np.float64)
        ]
    )

and

self.TYPE_MAP = lrange(251) + list('bhlfd')

which maps h to int32 and l to int64.

http://docs.python.org/2/library/struct.html#format-characters

shows that h is 2 bytes and l is 4, and so trying to run

struct.pack('<l',2**40)

produces an error.
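
The size mismatch can be checked with the standard library alone. A minimal sketch, assuming the little-endian `<` prefix used in the example above; the names `widths`, `overflowed` and `packed` are only for illustration:

```python
import struct

# Standard sizes (in bytes) of the struct codes used in TYPE_MAP.
widths = {code: struct.calcsize('<' + code) for code in 'bhlfd'}
print(widths)

# 'l' is a 4-byte signed integer, so a value that needs more than
# 32 bits cannot be packed with it.
try:
    struct.pack('<l', 2**40)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print(exc)

# The largest signed 32-bit value still packs into 4 bytes.
packed = struct.pack('<l', 2**31 - 1)
```

So `h` and `l` are each half the width of the int32 and int64 dtypes they are paired with in DTYPE_MAP.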

The obvious fix is to use

self.TYPE_MAP = lrange(251) + list('blqfd')

but this will probably produce errors on 32-bit platforms.
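
On the 32-bit concern: according to the struct documentation, a format string that starts with `<` uses standard sizes rather than native ones, so `q` should be 8 bytes regardless of platform. A quick check, offered as a sketch and not as a test of any particular fix; the variable names are illustrative:

```python
import struct

# With an explicit byte-order prefix, struct uses standard sizes,
# which do not depend on the platform's C types.
standard = {code: struct.calcsize('<' + code) for code in 'hlq'}

# Without a prefix, native sizes apply; 'l' may be 4 or 8 bytes here.
native_l = struct.calcsize('l')

# A value above 2**32 round-trips through the standard-size 'q'.
value = 11456230400
packed = struct.pack('<q', value)
roundtrip = struct.unpack('<q', packed)[0]
```

Only the unprefixed, native mode ties the width of `l` to the platform.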

@jreback
Contributor

jreback commented Feb 12, 2014

cc @PKEuS

@bashtage can you show an actual file that produces an error? On which Stata version (and pandas version)?

@bashtage
Contributor Author

Trivial example:

import pandas as pd

# 11456230400 is larger than 2**32 (Python 2 long literal)
issue = pd.DataFrame([11456230400L], columns=['a'])
issue.to_stata('issue.dta')

This is with pandas 0.13.1 (Anaconda/x64/Windows).

Simple to reproduce using S&P 500 data from Yahoo! Finance - some of the volumes are > 2**32.

@jreback
Contributor

jreback commented Feb 12, 2014

ok! thanks for the report

jreback added the Bug label Feb 12, 2014
jreback added this to the 0.14.0 milestone Feb 12, 2014
@jreback
Contributor

jreback commented Feb 12, 2014

want to do a PR for this?

@bashtage
Contributor Author

A simple fix is in PR #6335. I'm worried that it may break on 32-bit platforms.

jreback added the Dtypes label Feb 16, 2014
@jseabold
Contributor

Indeed, this is incorrect. If you look at the statsmodels version of the reader, this was changed when moving it to pandas. The type map does not include int64 over there.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/iolib/foreign.py#L289
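
For context, the numeric type codes 251 to 255 in the Stata .dta format are byte, int, long, float and double, with widths of 1, 2, 4, 4 and 8 bytes, so the format has no 8-byte integer type. A sketch of a table consistent with that; `STATA_NUMERIC_TYPES` and `fits` are hypothetical names, and this is not the statsmodels or pandas code:

```python
import struct

# Hypothetical table: Stata type code -> (Stata type name, struct code).
STATA_NUMERIC_TYPES = {
    251: ('byte', 'b'),
    252: ('int', 'h'),
    253: ('long', 'l'),
    254: ('float', 'f'),
    255: ('double', 'd'),
}


def fits(value, type_code):
    """Return True if value can be packed with the struct code for type_code.

    This checks only the raw width and ignores the values Stata
    reserves for missing data.
    """
    fmt = '<' + STATA_NUMERIC_TYPES[type_code][1]
    try:
        struct.pack(fmt, value)
        return True
    except (struct.error, OverflowError):
        return False
```

With this table, `fits(11456230400, 253)` is false while `fits(11456230400, 255)` is true, which suggests that a value this large can only be written as a double.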

@jreback
Contributor

jreback commented Mar 5, 2014

closed via 689d765

jreback closed this as completed Mar 5, 2014