BUG: SeriesGroupBy.transform() tries to do dtype downcasting, NDFrameGroupBy.transform() - doesn't do this #13046

Closed
maxu777 opened this Issue Apr 30, 2016 · 1 comment

maxu777 commented Apr 30, 2016 edited by jreback

#### Code Sample, a copy-pastable example if possible

original: http://stackoverflow.com/questions/36960086/groupby-how-to-extract-seconds-from-datetime-with-diff

import io

import pandas as pd

data = """
idx     A         ID3              DATETIME
0   B-028  b76cd912ff "2014-10-08 13:43:27"
1   B-054  4a57ed0b02 "2014-10-08 14:26:19"
2   B-076  1a682034f8 "2014-10-08 14:29:01"
3   B-023  b76cd912ff "2014-10-08 18:39:34"
4   B-023  f88g8d7sds "2014-10-08 18:40:18"
5   B-033  b76cd912ff "2014-10-08 18:44:30"
6   B-032  b76cd912ff "2014-10-08 18:46:00"
7   B-037  b76cd912ff "2014-10-08 18:52:15"
8   B-046  db959faf02 "2014-10-08 18:59:59"
9   B-053  b76cd912ff "2014-10-08 19:17:48"
10  B-065  b76cd912ff "2014-10-08 19:21:38"
"""
df = pd.read_csv(io.StringIO(data), delim_whitespace=True, index_col=[0], parse_dates=['DATETIME'])

In [237]: df.groupby('ID3')['DATETIME'].transform(lambda x: x.diff()).dtypes
Out[237]: dtype('<M8[ns]')

In [238]: df[['ID3','DATETIME']].groupby('ID3').transform(lambda x: x.diff()).dtypes
Out[238]:
DATETIME    timedelta64[ns]
dtype: object
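
For the original Stack Overflow use case (seconds between consecutive rows of the same `ID3`), a possible workaround is to skip `transform()` and call the group-wise `diff()` directly. This is a minimal sketch, assuming a reasonably recent pandas; the frame below is a shortened, hand-built stand-in for the sample data above, and the names `delta` and `seconds` are only illustrative.

```python
import pandas as pd

# Shortened stand-in for the sample data (assumption: four rows are enough
# to show the per-group behaviour).
df = pd.DataFrame(
    {
        "ID3": ["b76cd912ff", "4a57ed0b02", "b76cd912ff", "b76cd912ff"],
        "DATETIME": pd.to_datetime(
            [
                "2014-10-08 13:43:27",
                "2014-10-08 14:26:19",
                "2014-10-08 18:39:34",
                "2014-10-08 18:44:30",
            ]
        ),
    }
)

# Group-wise diff without transform() or a lambda.
delta = df.groupby("ID3")["DATETIME"].diff()

# Seconds since the previous row of the same ID3; the first row of each
# group has no predecessor and therefore stays missing.
seconds = delta.dt.total_seconds()

print(delta.dtype)
print(seconds)
```

This sidesteps the dtype question for this particular case, but it does not address the inconsistency in `transform()` itself.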

#### Expected Output
I would expect the same dtype from both `NDFrameGroupBy.transform()` and `SeriesGroupBy.transform()`: `timedelta64[ns]`, since `diff()` on a datetime column yields timedeltas.
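
The expectation can be written as a small check. This is only a sketch, assuming a shortened hand-built frame instead of the full sample; `series_dtype` and `frame_dtype` are illustrative names, and whether the two agree depends on the pandas version (on 0.18.0 they differ, as reported here).

```python
import pandas as pd

# Shortened stand-in for the sample data (assumption, not the full sample).
df = pd.DataFrame(
    {
        "ID3": ["b76cd912ff", "4a57ed0b02", "b76cd912ff", "b76cd912ff"],
        "DATETIME": pd.to_datetime(
            [
                "2014-10-08 13:43:27",
                "2014-10-08 14:26:19",
                "2014-10-08 18:39:34",
                "2014-10-08 18:44:30",
            ]
        ),
    }
)

# SeriesGroupBy path: select the column first, then transform.
series_dtype = (
    df.groupby("ID3")["DATETIME"].transform(lambda x: x.diff()).dtype
)

# DataFrame groupby path: transform the frame, then look at the column.
frame_dtype = (
    df[["ID3", "DATETIME"]]
    .groupby("ID3")
    .transform(lambda x: x.diff())["DATETIME"]
    .dtype
)

# Both are expected to be a timedelta dtype, not datetime64.
print(series_dtype, frame_dtype)
```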

#### output of ``pd.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.4
patsy: None
dateutil: 2.5.1
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.5
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None

jreback commented Apr 30, 2016

This is a duplicate of #10972.

You are welcome to submit a PR to fix it.

jreback closed this Apr 30, 2016

jreback added the Duplicate label Apr 30, 2016

jreback added this to the No action milestone Apr 30, 2016

jreback added a commit to jreback/pandas that referenced this issue Feb 27, 2017

BUG: fix groupby.aggregate resulting dtype coercion, xref #11444, #13046

make sure .size includes the name of the grouped
11cb51f

jreback added a commit to jreback/pandas that referenced this issue Feb 27, 2017

BUG: fix groupby.aggregate resulting dtype coercion, xref #11444, #13046

make sure .size includes the name of the grouped
61fa8be

jreback modified the milestone: 0.20.0, No action Feb 27, 2017

AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017

jreback + AnkurDedania BUG: fix groupby.aggregate resulting dtype coercion, xref #11444, #13046

make sure .size includes the name of the grouped
d18d0b4