DM-28555: Add verbosity to ApPipe and DiaPipe DB errors #106

Merged
morriscb merged 3 commits into master from tickets/DM-28555 on Mar 16, 2021

morriscb
Contributor

No description provided.

Contributor

@isullivan isullivan left a comment

Looks good. My main question is whether the grouping you do to find and drop duplicates changes the order of the diaObjects, and if it does, whether that matters.

Comment on lines 132 to 134
# if len(diaObjects) > 0:
# dups = diaObjects.iloc[[0, -1]]
# diaObjects = diaObjects.append(dups, sort=True)
Contributor

Delete this commented-out code.

Contributor Author

Thanks for the catch. Was keeping this in until the decision was made on warn vs raise for finding duplicates. Will take it out.

@@ -97,11 +98,19 @@ def run(self, exposure, apdb):
``diaObjectId``, ``filterName``, ``diaSourceId`` columns.
(`pandas.DataFrame`)
"""
visit_info = exposure.getInfo().getVisitInfo()
Contributor

Shouldn't this be visitInfo?

"Duplicate DiaObjects created after association. This may "
"cause downstream pipeline issues. Dropping duplicated rows.")
# Drop duplicates via index and keep the first appearance.
diaObjects = diaObjects.groupby(diaObjects.index).first()
Contributor

Here, and throughout these changes where you use groupby: does this change the ordering of the diaObjects?

Contributor Author

@morriscb morriscb Mar 15, 2021

It might, but since the dataFrames are indexed on the object/source dataIds and continue to be so after this interaction, it won't matter. I rarely use iloc and only do so after matching it to the proper location within the arrays.
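The ordering question can be checked on a toy catalog. The sketch below is illustrative only: the column name `ra` and the id values are made up, and the `Index.duplicated` mask is an alternative shown for comparison, not something this PR uses.

```python
import pandas as pd

# Hypothetical toy catalog indexed on diaObjectId, with id 30 duplicated.
diaObjects = pd.DataFrame(
    {"ra": [10.0, 20.0, 10.5, 30.0]},
    index=pd.Index([30, 10, 30, 20], name="diaObjectId"),
)

# groupby sorts the group keys by default, so the result comes back
# ordered by index value rather than by original row position.
grouped = diaObjects.groupby(diaObjects.index).first()

# A boolean mask built from Index.duplicated also keeps the first
# appearance, but leaves the surviving rows in their original order.
masked = diaObjects[~diaObjects.index.duplicated(keep="first")]

# Label-based lookups agree either way; only positional access differs.
assert grouped.loc[30, "ra"] == masked.loc[30, "ra"] == 10.0
```

So the row order can change, which is harmless for `.loc` lookups but would affect any positional `iloc` access. One further difference worth knowing: `GroupBy.first()` takes the first non-null value per column, so it can combine values from several duplicated rows, while the mask keeps one whole row.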

Contributor

@isullivan isullivan left a comment

Looks good!

"Duplicate DiaSources found after association and merging "
"with history. This is likely due to re-running data with an "
"already populated Apdb. If this was not the case then there "
"was a failure in Association which should not happen. "
Contributor

How about "an unexpected failure in Association, and should be reported". From personal experience, I find it frustrating to get an error message that tells me that it shouldn't happen, without also telling me how to try to debug or fix it.

Contributor Author

I expanded a bit but left the heart of what you suggested.
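For illustration, here is one way an error along these lines could carry something to debug with. This is a minimal sketch, not the merged code: the helper `raise_on_duplicate_ids` and its `label` parameter are hypothetical, and the message wording only approximates the text discussed above.

```python
from collections import Counter


def raise_on_duplicate_ids(ids, label="diaSourceId"):
    """Raise RuntimeError naming the duplicated ids (hypothetical helper)."""
    # Collect every id that appears more than once, in sorted order.
    duplicated = sorted(i for i, n in Counter(ids).items() if n > 1)
    if duplicated:
        raise RuntimeError(
            f"Duplicate {label} values found: {duplicated}. "
            "This is likely due to re-running data with an already "
            "populated Apdb. If that is not the case, this is an "
            "unexpected failure in Association and should be reported."
        )


# Example: id 2 appears twice, so the check raises.
try:
    raise_on_duplicate_ids([1, 2, 2, 3])
except RuntimeError as err:
    message = str(err)
```

Listing the offending ids in the message gives whoever hits the error a starting point for checking the Apdb contents before filing a report.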

Additionally, add code to drop duplicates for now.

Debug loadDiaCatalogs dup tests.

Properly re-index multiIndex dataframes to not break downstream
processing.

Copy debugged duplicate detection code to all points.

Commit after initial debug of must has_duplicates tests.

Respond to reviewer.

Fix variable name.
Remove commented code.
Implement tests for inputting dups in association.

Debug RuntimeError test.

Fix numpy type warnings.

Fix more numpy deprecation warnings.
@morriscb morriscb merged commit 708f8d8 into master Mar 16, 2021
@morriscb morriscb deleted the tickets/DM-28555 branch March 16, 2021 18:01