DM-36058: Fix untested Pandas deprecation warnings in ap_association #162

Merged
merged 1 commit into from
Oct 25, 2022
Merged
13 changes: 7 additions & 6 deletions python/lsst/ap/association/diaPipe.py
@@ -383,8 +383,9 @@ def run(self,
                           inplace=True)
 
         # Append new DiaObjects and DiaSources to their previous history.
-        diaObjects = loaderResult.diaObjects.append(
-            createResults.newDiaObjects.set_index("diaObjectId", drop=False),
+        diaObjects = pd.concat(
+            [loaderResult.diaObjects,
+             createResults.newDiaObjects.set_index("diaObjectId", drop=False)],
             sort=True)
         if self.testDataFrameIndex(diaObjects):
             raise RuntimeError(

Review thread on the pd.concat call above:

Contributor:
I'm very worried about using concat in our code like this: it can change the schema out from under us. Should we be using convert_dtypes here (and anywhere we use concat)? Could we move this away from pandas entirely and use numpy.concatenate or numpy.vstack?

If we had a way to get pd.concat to raise instead of silently casting to float, that might be more reasonable, but I don't think there is.

Member Author:
I don't know anything about convert_dtypes, I just followed the instructions for how to phase out append. As for using pandas in general, see my comment on #160.
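To make the schema concern concrete, here is a minimal sketch (the column names are invented for illustration and are not taken from ap_association): when the frames passed to pd.concat do not share identical columns, the missing entries are filled with NaN, so integer columns are silently upcast to float64; convert_dtypes() can recover nullable integer dtypes afterwards, but nothing in pd.concat itself raises.

    import pandas as pd

    # Hypothetical frames standing in for the existing and newly created objects.
    old = pd.DataFrame({"diaObjectId": [1, 2], "nDiaSources": [5, 7]})
    new = pd.DataFrame({"diaObjectId": [3], "extraColumn": [0.5]})

    merged = pd.concat([old, new], sort=True)
    print(merged.dtypes["nDiaSources"])   # float64: the NaN fill upcast the int column

    # convert_dtypes() re-infers nullable extension dtypes; an all-integral float
    # column with missing values generally comes back as Int64 (pd.NA for the gaps).
    print(merged.convert_dtypes().dtypes["nDiaSources"])

    # pd.concat has no option to raise on such a cast; a manual guard is the
    # closest equivalent (hypothetical check, not part of the PR):
    schema_changed = not merged.dtypes.loc[old.columns].equals(old.dtypes)
    print(schema_changed)  # True: the guard would fire here

For comparison, the numpy.concatenate route suggested above behaves differently: concatenating structured arrays whose dtypes do not match typically raises rather than promoting, at the cost of leaving pandas entirely.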
@@ -393,8 +394,8 @@
                 "Apdb. If this was not the case then there was an unexpected "
                 "failure in Association while matching and creating new "
                 "DiaObjects and should be reported. Exiting.")
-        mergedDiaSourceHistory = loaderResult.diaSources.append(
-            associatedDiaSources,
+        mergedDiaSourceHistory = pd.concat(
+            [loaderResult.diaSources, associatedDiaSources],
             sort=True)
         # Test for DiaSource duplication first. If duplicates are found,
         # this likely means this is duplicate data being processed and sent
@@ -445,8 +446,8 @@
 
         if self.config.doPackageAlerts:
             if len(loaderResult.diaForcedSources) > 1:
-                diaForcedSources = diaForcedSources.append(
-                    loaderResult.diaForcedSources,
+                diaForcedSources = pd.concat(
+                    [diaForcedSources, loaderResult.diaForcedSources],
                     sort=True)
                 if self.testDataFrameIndex(diaForcedSources):
                     self.log.warning(
2 changes: 1 addition & 1 deletion tests/test_diaPipe.py
@@ -108,7 +108,7 @@ def _testRun(self, doPackageAlerts=False, doSolarSystemAssociation=False):
         if not doSolarSystemAssociation:
             self.assertFalse(hasattr(task, "solarSystemAssociator"))
 
-        def concatMock(data):
+        def concatMock(_data, **_kwargs):
             return MagicMock(spec=pd.DataFrame)
 
         # Mock out the run() methods of these two Tasks to ensure they
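A note on why the mock signature gains **_kwargs (a hedged sketch; the patch target shown here is an assumption, not taken from the test file): once the pipeline calls pd.concat([...], sort=True), any stand-in for it must accept keyword arguments, otherwise the call in run() fails with a TypeError before the behaviour under test is reached.

    from unittest.mock import MagicMock, patch

    import pandas as pd


    def concatMock(_data, **_kwargs):
        # Accept the list of frames positionally and swallow keywords such as sort=True.
        return MagicMock(spec=pd.DataFrame)


    # Illustration only: patching pandas.concat is an assumed target for this sketch.
    with patch("pandas.concat", new=concatMock):
        result = pd.concat([pd.DataFrame(), pd.DataFrame()], sort=True)
        # The old single-argument form, concatMock(data), would have raised
        # "TypeError: concatMock() got an unexpected keyword argument 'sort'".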