Closed
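Reading http://pandas.pydata.org/pandas-docs/dev/gotchas.html, it seems to imply that using dtype=object should preserve integers. In practice, passing a list of tuples with dtype=object still converts the integer to a float, whereas passing the data via a numpy array with dtype=object preserves the type, as I would like: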
In [1]: import pandas
In [2]: pandas.__version__
Out[2]: '0.9.1'
In [3]: data = [(6260L, 20302010L), (6262L, None)]
In [4]: df = pandas.DataFrame(data, dtype=object)
In [5]: df
Out[5]:
      0             1
0  6260  2.030201e+07
1  6262           NaN
In [6]: df.dtypes
Out[6]:
0 object
1 object
In [7]: type(df.values[0][1])
Out[7]: float
In [8]: df = pandas.DataFrame(data)
In [9]: df.dtypes
Out[9]:
0 int64
1 float64
In [10]: type(df.values[0][1])
Out[10]: numpy.float64
In [11]: df = pandas.DataFrame(pandas.np.array(data, dtype=object))
In [12]: df.dtypes
Out[12]:
0 object
1 object
In [13]: type(df.values[0][1])
Out[13]: long
The conversion seems to happen in the call to maybe_convert_objects from _convert_object_array, in pandas/core/frame.py, line 5221 (for 0.9.1). Calling it directly on an object array reproduces the float conversion:
In [3]: type(pandas.lib.maybe_convert_objects(pandas.np.array([20302010, None], dtype=object), try_float=False)[0])
Out[3]: numpy.float64
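For anyone who wants to reproduce the workaround from the session above, here is a minimal sketch. It assumes Python 3 and a recent pandas/NumPy install rather than 0.9.1, so the values are plain ints (no `L` suffix) and NumPy is imported directly instead of going through `pandas.np`. It compares default inference with building the frame from an object ndarray; behaviour on other versions may differ.

```python
import numpy as np
import pandas as pd

# Same data as in the report, written as Python 3 ints (assumption:
# Python 3, so there is no separate `long` type).
data = [(6260, 20302010), (6262, None)]

# Default inference: the missing value makes column 1 a float column.
inferred = pd.DataFrame(data)

# Workaround from the report: build an object ndarray first, so the
# original Python objects are stored unchanged.
preserved = pd.DataFrame(np.array(data, dtype=object))

print(inferred.dtypes.tolist())
print(preserved.dtypes.tolist())
print(type(preserved.iloc[0, 1]))
```

The object array route keeps the integer intact at the cost of an object column, which is the trade-off the report asks for.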