Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: skip tests of in overlay symmetric_difference under pandas 1.3.3 #2101

Merged
merged 11 commits into from
Sep 16, 2021

Conversation

martinfleis
Copy link
Member

overlay based on symmetric_difference started failing on pandas master because we used an outer join based on columns full of NaNs. Refactoring the implementation to avoid it.

@martinfleis martinfleis added this to the 0.10 milestone Sep 5, 2021
@martinfleis
Copy link
Member Author

martinfleis commented Sep 13, 2021

New pandas is out breaking even more environments.

@jorisvandenbossche can we get this in to make CI useful again?

@jorisvandenbossche
Copy link
Member

Hmm, that's a pity I didn't look into it earlier, as this is a regression in pandas as far as I can see, and unfortunately it ended up in the 1.3.x release (and not just in master). See pandas-dev/pandas#43550

Now, I think this patch is a nice clean-up anyway :) (not fully sure why we were using an outer merge on keys we knew had nothing in common, so it's basically a concat on the other axis)

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Sep 13, 2021

Now, I think this patch is a nice clean-up anyway :) (not fully sure why we were using an outer merge on keys we knew had nothing in common, so it's basically a concat on the other axis)

Actually, that doesn't handle duplicate column names .. (column names present in both left and right dataframe: with concat they are combined in a single column, with merge they are kept as separate columns, so that's probably the reason we were using merge and not concat ..)

@martinfleis
Copy link
Member Author

Actually, that doesn't handle duplicate column names

True. I didn't check that and our CI is not doing that either.

I guess I will scrap this PR and skip the test for pandas 1.3.3 assuming it will get fixed there?

@jorisvandenbossche
Copy link
Member

Yes, skipping the test for pandas 1.3.3 might be best for now (it should already be fixed on master). Hopefully pandas can do a 1.3.4 relatively soon.

@jorisvandenbossche
Copy link
Member

And we should also add a test with a case that would be different between merge vs concat

@martinfleis martinfleis changed the title BUG: outer merge in overlay symmetric_difference TST: skip tests of in overlay symmetric_difference under pandas 1.3.3 Sep 15, 2021
@martinfleis
Copy link
Member Author

I've reverted the code change back, added xfail marks where needed and changed the existing test of duplicated columns to run with all how options. That would catch the bug I had here before.

@jorisvandenbossche
Copy link
Member

I pushed two changes:

  • xfailing the how fixture, which avoids having to add the if pandas_133 ... check to each test that uses it
  • I pinned to pandas 1.3.2 in one of the "latest" envs, so we have at least one build with recent pandas where we still test overlay (and that also avoids having to temporarily add the doctest skip)

@jorisvandenbossche
Copy link
Member

OK, all passing so merging we can get our CI back green :) Thanks!

@jorisvandenbossche jorisvandenbossche merged commit 8caa139 into geopandas:master Sep 16, 2021
@martinfleis martinfleis deleted the overlay_symm branch September 16, 2021 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants