Transform with Count Agg against DateTime returns DateTime #19200

Closed
WillAyd opened this Issue Jan 12, 2018 · 2 comments

WillAyd commented Jan 12, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2018-01-01', periods=3), 'b': range(3)})
df.groupby('b')['a'].transform('count')

0   1970-01-01 00:00:00.000000001
1   1970-01-01 00:00:00.000000001
2   1970-01-01 00:00:00.000000001
Name: a, dtype: datetime64[ns]

Expected Output

The value 1.0 broadcast to each row with a float dtype, not datetime64.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 78c3ff9
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+101.g78c3ff97a
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

WillAyd commented Jan 12, 2018

This appears to be a result of the following code:

out = self._try_cast(out, self.obj)

It affects other aggregation functions like size, rank and cumcount that I would expect should always return a number and not a datetime64.
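To see why the output above lands one nanosecond after the epoch: a count of 1, reinterpreted as datetime64[ns], is 1 ns past 1970-01-01. A small illustration of that effect using NumPy directly (this is not the `_try_cast` code path itself, just the same kind of conversion):

```python
import numpy as np
import pandas as pd

counts = np.array([1, 1, 1], dtype='int64')

# Interpreting integer counts as nanoseconds since the epoch.
as_datetime = counts.astype('datetime64[ns]')
print(as_datetime[0])

# pandas interprets a bare integer the same way.
print(pd.Timestamp(1))
```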

One solution I could think of is to blacklist those particular aggregations from being cast, although I'd have to think through where exactly to apply that blacklist.
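A rough sketch of what such a blacklist could look like. The helper name `maybe_cast_back` and the set `NO_CAST` are made up for illustration and are not pandas internals; where a check like this would actually live is still the open question:

```python
import pandas as pd

# Hypothetical: aggregations whose results should always stay numeric.
NO_CAST = frozenset({'count', 'size', 'rank', 'cumcount'})


def maybe_cast_back(result, original, func_name):
    """Cast `result` to the dtype of `original` unless `func_name` is blacklisted."""
    if func_name in NO_CAST:
        return result
    return result.astype(original.dtype)


dates = pd.Series(pd.date_range('2018-01-01', periods=3))
counts = pd.Series([1, 1, 1])

# 'count' is blacklisted, so the integer result is left alone.
kept = maybe_cast_back(counts, dates, 'count')

# 'min' is not blacklisted, so a float result is cast back to the input dtype.
restored = maybe_cast_back(pd.Series([1.0, 2.0]), pd.Series([1, 2]), 'min')
```

With this shape the caller has to know the name of the aggregation, which is exactly the piece of information that is missing when only a lambda is passed around.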

_transform_fast in the linked code only receives a lambda function as an argument to execute the aggregation. I'm not sure if there's a way to inspect that lambda to see the exact agg function being used, or if that's even the approach we would want to take.
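The difficulty with inspecting the lambda can be seen in plain Python: a lambda that forwards to a method by name does not expose that name through `__name__`. The `Grouped` class and the wrapper below are only a guess at the general shape of such a callable, not the actual pandas code:

```python
class Grouped:
    """Stand-in for a groupby object with a single aggregation."""

    def count(self):
        return 1


func_name = 'count'

# A lambda that dispatches by name, similar in spirit to what a caller might pass.
wrapped = lambda obj: getattr(obj, func_name)()

# The lambda reports its own generic name, not the method it forwards to.
print(wrapped.__name__)
print(wrapped(Grouped()))
```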

jreback commented Jan 13, 2018

I thought we had an issue about this. #15562 is related.

On certain operations we should not try to cast (count, size, rank). These are normally dispatched to specific methods for non-transforming groupbys (they just call the method on Series/FrameGroupBy), so this is not a problem there. However, transform calls a cython function and then does the casting.
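For comparison, a small sketch of the non-transforming path on the frame from the original report. Calling `count` on the groupby object directly gives integers, and broadcasting those back by hand (here with `Series.map` on the key column, which is just one possible way to do it) gives the numbers `transform('count')` would be expected to return:

```python
import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2018-01-01', periods=3), 'b': range(3)})

# Non-transforming groupby: dispatches to the regular count method.
per_group = df.groupby('b')['a'].count()
print(per_group.dtype)

# Manual broadcast of the per-group counts back onto the original rows.
broadcast = df['b'].map(per_group)
print(broadcast.tolist())
```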

So need to wade into this and make a better method of doing this. Could certainly pass the name of the function into _transform_fast (we may at a higher level).

@jreback jreback added this to the Next Major Release milestone Jan 13, 2018

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 23, 2018
