Iterating over groupby fails with period-indexed dataframe #3579

Closed
goyodiaz opened this issue May 11, 2013 · 4 comments · Fixed by #3591
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby

@goyodiaz
Contributor

I found this while working with period-indexed data frames on 32-bit Windows XP:

import numpy as np
import pandas as pd
index = pd.period_range(start='1999-01', periods=5, freq='M')
s1 = pd.Series(np.random.rand(len(index)), index=index)
s2 = pd.Series(np.random.rand(len(index)), index=index)
series = [('s1', s1), ('s2',s2)]
df = pd.DataFrame.from_items(series)
grouped = df.groupby(df.index.month)
list(grouped)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\site-packages\pandas\core\groupby.py", line 595, in get_iterator
    for key, (i, group) in izip(keys, splitter):
  File "D:\Python27\lib\site-packages\pandas\core\groupby.py", line 2214, in __iter__
    sdata = self._get_sorted_data()
  File "D:\Python27\lib\site-packages\pandas\core\groupby.py", line 2231, in _get_sorted_data
    return self.data.take(self.sort_idx, axis=self.axis)
  File "D:\Python27\lib\site-packages\pandas\core\frame.py", line 2891, in take
    new_index = self.index.take(indices)
  File "D:\Python27\lib\site-packages\pandas\tseries\period.py", line 1110, in take
    taken = self.values.take(indices, axis=axis)
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

This happens with the latest stable version, 0.11.0. However, iterating over grouped.s1 or grouped.s2 works fine.

@cpcloud
Member

cpcloud commented May 11, 2013

See the docs for numpy.ndarray.astype for why this is happening. Let d1 and d2 be two dtypes. np.array([1, 2], d1).astype(d2, casting='safe') will not work if d1.itemsize > d2.itemsize, because 'safe' means you are not okay with losing information when casting from d1 to d2. You are on a 32-bit system, and it sounds like there is no 64-bit integer emulation on your OS.
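
The casting rule described above can be checked directly in NumPy. The following is a minimal sketch with made-up array values. It only illustrates the 'safe' rule for int64 and int32 and does not reproduce the pandas code path in the traceback.

```python
import numpy as np

# Illustrative data only; any int64 array behaves the same way.
wide = np.array([1, 2], dtype=np.int64)

# Under the 'safe' rule, widening is allowed and narrowing is not.
widening_ok = np.can_cast(np.int32, np.int64, casting="safe")
narrowing_ok = np.can_cast(np.int64, np.int32, casting="safe")

# astype with casting='safe' refuses the narrowing conversion.
try:
    wide.astype(np.int32, casting="safe")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A more permissive rule has to be requested explicitly.
narrow = wide.astype(np.int32, casting="same_kind")

print(widening_ok, narrowing_ok)
print(error_message)
print(narrow.dtype)
```

Here 'same_kind' is just one way to opt in to the narrowing cast. Whether that is appropriate depends on whether the values are known to fit in 32 bits.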

@jreback
Contributor

jreback commented May 11, 2013

@cpcloud this might be a bug; I'll have to look.
PeriodIndex is backed by int64, but at times conversions to the platform int are necessary
(which should be transparent).

will take a look
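
As a rough illustration of what "conversion to platform int" means here, the sketch below converts an int64 indexer to np.intp (NumPy's platform-sized integer) before calling take. The array contents are invented, and this is not necessarily how #3591 fixed the issue. On a 64-bit build np.intp is already int64, so the conversion is a no-op there.

```python
import numpy as np

# Stand-in for the int64 ordinals that back a period index (values are made up).
values = np.arange(10, 15, dtype=np.int64)

# Stand-in for a sort indexer that arrives as int64.
indices = np.array([4, 0, 2], dtype=np.int64)

# Convert the indexer to the platform integer type before indexing.
# On a 32-bit build this narrows to int32; on a 64-bit build nothing changes.
platform_indices = indices.astype(np.intp, copy=False)

# The data stays int64; only the indexer uses the platform int.
taken = values.take(platform_indices)

print(platform_indices.dtype, taken.dtype)
print(taken)
```

The point of the design is that only the indexer is narrowed, so the int64 data itself is never truncated.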

@cpcloud
Member

cpcloud commented May 11, 2013

@jreback I was just about to ask if this is a numpy or pandas issue :)

@jreback
Contributor

jreback commented May 13, 2013

closed by #3591

@jreback jreback closed this as completed May 13, 2013