Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agg() swallows exceptions. #3238

Closed
blais opened this issue Apr 2, 2013 · 12 comments · Fixed by #3277
Closed

agg() swallows exceptions. #3238

blais opened this issue Apr 2, 2013 · 12 comments · Fixed by #3277
Milestone

Comments

@blais
Copy link
Contributor

blais commented Apr 2, 2013

agg() appears to swallow exceptions.
This was the source of a hard-to-find bug today.
Here's a program that replicates the problem.

#!/usr/bin/env python                                                                                                                                                                                                                         
from pandas import *
from numpy.random import randint

df = DataFrame(randint(10, size=(20, 10)))

def raiseException(df):
  print '----------------------------------------'
  print df.to_string()
  raise TypeError

df.groupby(0).agg(raiseException)
@jreback
Copy link
Contributor

jreback commented Apr 2, 2013

Try raising Exception (instead of TypeError), and your example will work. ``TypeErroris caught internally in order to deal with issues like trying to perform asum` on non-numeric data. Prob ought to be a doc mention on this

@blais
Copy link
Contributor Author

blais commented Apr 2, 2013

I wasn't raising a TypeError myself; calling a function with an incorrect number of parameters raises a TypeError. Your solution is not acceptable.

@jreback
Copy link
Contributor

jreback commented Apr 2, 2013

what exactly are you trying to accomplish? maybe put up a sample frame, function that is closer to what you are doing

@blais
Copy link
Contributor Author

blais commented Apr 2, 2013

I have a custom aggregator function. This function calls another function; the parameter list was wrong. The error was ignored silently! It should not.

@jreback
Copy link
Contributor

jreback commented Apr 2, 2013

@wesm ?

@ghost
Copy link

ghost commented Apr 2, 2013

agg tries to be flexible by applying the aggregator to the entire dataset
at once, and then fallback to doing an item-by-item apply.
This can yield a big performance win if vectorization can be employed, especially in
groupby operations which tend to be large.

Unfortunately, it's hard to predict what kind of Error will be raised by the user-provided
aggregator if the first test fails, hence the broad exception clause.

Although I'm not very fond of this sort of magical behaviour for exactly the reason you
exemplified, I don't see a way to fix that without potentially breaking a lot of code, and
TypeError is specifically something you would expect to see in this situation.

If you have a solution that does not break existing code, let's discuss it.

@hayd
Copy link
Contributor

hayd commented Apr 2, 2013

I like the current magic, but maybe there could be some kind of raise_exceptions flag (which defaults to False)...

@ghost
Copy link

ghost commented Apr 2, 2013

This is a debugging issue, it doesn't make sense to crudify signatures with "dev-time only"
args, and that would open the flood gates to similiar signature noise across the lib.

IMO.

@changhiskhan
Copy link
Contributor

what about raising an exception if none of the items can be agg'ed while agg'ing item by item? It shouldn't break existing code (much) and it should solve the above case.
i know it's just adding on more magic but i can't really think of a general/clean solution off the top of my head.

@wesm
Copy link
Member

wesm commented Apr 2, 2013

Let me have a look at this soon and let you know what I think.

@wesm
Copy link
Member

wesm commented Apr 8, 2013

There's one unit test that breaks by no longer suppressing the TypeError but on inspection I'd rather not support the behavior-- i.e. to be able to pass an aggregation function that ONLY works on the string columns (but fails on the numeric columns). The auto-exclusion of non-numeric data when doing, say, df.groupby(keys).agg(np.sum) was only conceived as a convenience to begin with.

Any objections?

@ghost
Copy link

ghost commented Apr 8, 2013

I think that sounds right, grouper functions should be responsible for handling the data,
it's not pandas' role to guess how exceptions should be handled.

@wesm wesm closed this as completed Apr 8, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants