Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-36058: Fix untested Pandas deprecation warnings in ap_association #162

Merged
merged 1 commit into from Oct 25, 2022

Conversation

kfindeisen
Copy link
Member

@kfindeisen kfindeisen commented Oct 19, 2022

This PR removes references to Dataset.append that weren't covered by unit tests, and tweaks the tests so that they don't make assumptions about the call signatures in application code. It does not try to expand test coverage to perform actual Pandas operations, since I don't see a way to introduce mock tables without making them brittle to external changes.

This commit removes some calls to Dataframe.append that were missed in
3fa22fb. The calls could not be
detected because they were mocked out.

I don't see a way to prevent such issues short of rewriting the tests to
use real tables in place of mocks, but I don't know how to do so while
following the correct table schema (which is subject to change).
@@ -383,8 +383,9 @@ def run(self,
inplace=True)

# Append new DiaObjects and DiaSources to their previous history.
diaObjects = loaderResult.diaObjects.append(
createResults.newDiaObjects.set_index("diaObjectId", drop=False),
diaObjects = pd.concat(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very worried about using concat in our code like this: it can change the schema out from under us. Should we be using convert_dtypes here (and anywhere we use concat)? Could we move this away from pandas entirely and use numpy.concatenate or numpy.vstack?

If we had a way to get pd.concat to raise instead of silently casting to float, that might be more reasonable, but I don't think there is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about convert_dtypes, I just followed the instructions for how to phase out append. As for using pandas in general, see my comment on #160.

Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment is my growing concern about pd.concat. I don't know what to do about this, though, if we're going to continue to use pandas in diaPipe.

@parejkoj
Copy link
Contributor

See discussion on #160 : I've just found that the diaForcedSource append is currently changing the dtype of the flags field. So, possibly your change here doesn't actually alter how that particular bug behaves, but we should definitely audit our use of pd.concat and replace it with something that doesn't cast everything to float.

That's for a different ticket (not filed yet), so I think you're good to merge here.

@kfindeisen kfindeisen merged commit 37fc9aa into main Oct 25, 2022
@kfindeisen kfindeisen deleted the tickets/DM-36058 branch October 25, 2022 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants