Confusing "MergeError: incompatible merge keys [1] category and category, must be the same type" #26136

Closed
chrish42 opened this issue Apr 18, 2019 · 3 comments

Comments

@chrish42
Contributor

commented Apr 18, 2019

I'm using pd.merge_asof() to merge two dataframes. The by keys are categoricals, but apparently not equal ones (one has a few more categories than the other). It took me a while to figure that out, because the error message was confusing: "incompatible merge keys [1] category and category, must be the same type". It'd be nice if it were clearer. I can do the pull request once we figure out how to make this better.
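A minimal sketch of the situation described above, using made-up data (the column names `time`, `sym`, and `px` are assumptions, not taken from the original report). It shows why the message is confusing: both dtypes print as "category", yet they do not compare equal. The exact error text may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data: the "sym" categoricals differ only in their category sets.
left = pd.DataFrame({
    "time": [1, 5, 10],
    "sym": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
})
right = pd.DataFrame({
    "time": [2, 3, 7],
    "sym": pd.Categorical(["a", "b", "a"], categories=["a", "b"]),
    "px": [1.0, 2.0, 3.0],
})

# Both dtypes render with the same name, but they are not equal dtypes.
same_name = str(left["sym"].dtype) == str(right["sym"].dtype)
same_dtype = left["sym"].dtype == right["sym"].dtype
print("same name:", same_name, "| equal dtypes:", same_dtype)

try:
    pd.merge_asof(left, right, on="time", by="sym")
except ValueError as exc:  # pandas' MergeError is a ValueError subclass
    print(type(exc).__name__, exc)
```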

I see the following possible solutions:

  1. Special-case the error message for dtypes that take parameters and so are not necessarily all equal. Something a bit like: "incompatible merge keys: both are category, but not equal ones" (Easiest solution.)
  2. Make a nicer error message for categories, by digging a little bit into what makes them not equal. Something like "incompatible merge keys: both categories, but the left one has 3 levels more", or "but they have different levels: ..."
  3. Change the __str__ method of CategoricalDtype to print something more informative than "category". (There is no guarantee, though, that what we print would always let people distinguish two unequal categoricals, unless we were to print out all the levels.)

Anything that sounds good here?
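To make options 1 and 2 concrete, here is a rough sketch of what a more detailed message could look like. The helper `describe_category_mismatch` is hypothetical and not part of pandas; it only uses the public `categories` and `ordered` attributes of `CategoricalDtype`.

```python
import pandas as pd


def describe_category_mismatch(left, right):
    """Hypothetical helper: explain why two CategoricalDtypes are unequal."""
    if left == right:
        return "categories are equal"

    # Categories present on one side only, in their original order.
    left_only = [c for c in left.categories if c not in right.categories]
    right_only = [c for c in right.categories if c not in left.categories]

    parts = []
    if left_only:
        parts.append(f"only in left: {left_only}")
    if right_only:
        parts.append(f"only in right: {right_only}")
    if left.ordered != right.ordered:
        parts.append("ordered flags differ")
    if not parts:
        # Ordered dtypes with the same categories in a different order.
        parts.append("same categories in a different order")

    return "incompatible merge keys: both are category, but " + "; ".join(parts)


msg = describe_category_mismatch(
    pd.CategoricalDtype(["a", "b", "c"]),
    pd.CategoricalDtype(["a", "b"]),
)
print(msg)
```

Option 1 would amount to returning only the fixed part of this message, without inspecting the categories.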

@WillAyd

Member

commented Apr 18, 2019

This is pretty tricky. Options 1 and 2 are both a form of special-casing, so I wouldn't be in favor of either of them. I'm not sure how option 3 would end up looking, or whether it even makes sense.

What would even be the requirements here? That the categoricals being merged would have to be ordered, monotonic, and that the right categorical would have to be a subset of the left?

@WillAyd WillAyd added the Categorical label Apr 18, 2019

@chrish42

Contributor Author

commented Apr 18, 2019

This is about categoricals on both sides of the by key of pd.merge_asof(). My superficial understanding is that they need to be equal, because there's the equivalent of a groupby on those happening somewhere under the hood. But I'll let someone who knows more chime in.

@jreback

Contributor

commented Apr 21, 2019

@chrish42 this is correct, the categoricals must be exactly equal. I suppose the error message could be enhanced if you wanted to do a pull request. Solution 1 is the most reasonable; the others are non-trivial / hard.
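Since the categoricals must be exactly equal, one possible workaround on the user side is to cast both by columns to a shared dtype before merging. This is only a sketch with made-up data (the column names are assumptions), not something proposed in the thread.

```python
import pandas as pd

# Hypothetical data with unequal category sets on the "sym" column.
left = pd.DataFrame({
    "time": [1, 5, 10],
    "sym": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
})
right = pd.DataFrame({
    "time": [2, 3, 7],
    "sym": pd.Categorical(["a", "b", "a"], categories=["a", "b"]),
    "px": [1.0, 2.0, 3.0],
})

# Build one dtype from the union of both category sets and apply it to both sides.
all_categories = left["sym"].cat.categories.union(right["sym"].cat.categories)
shared = pd.CategoricalDtype(categories=all_categories)
left["sym"] = left["sym"].astype(shared)
right["sym"] = right["sym"].astype(shared)

# With equal dtypes on the by key, the merge goes through.
result = pd.merge_asof(left, right, on="time", by="sym")
print(result)
```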
