DM-28555: Add verbosity to ApPipe and DiaPipe DB errors #106
Conversation
Looks good. My main question is whether the grouping you do to find and drop duplicates changes the order of the diaObjects, and if it does, whether that matters.
# if len(diaObjects) > 0:
#     dups = diaObjects.iloc[[0, -1]]
#     diaObjects = diaObjects.append(dups, sort=True)
Delete this commented-out code.
Thanks for the catch. I was keeping this in until the decision was made on warn vs. raise for finding duplicates. Will take it out.
@@ -97,11 +98,19 @@ def run(self, exposure, apdb):
        ``diaObjectId``, ``filterName``, ``diaSourceId`` columns.
        (`pandas.DataFrame`)
        """
    visit_info = exposure.getInfo().getVisitInfo()
Shouldn't this be `visitInfo`?
    "Duplicate DiaObjects created after association. This may "
    "cause downstream pipeline issues. Dropping duplicated rows.")
# Drop duplicates via index and keep the first appearance.
diaObjects = diaObjects.groupby(diaObjects.index).first()
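The pattern in the diff can be sketched in isolation. This is a minimal standalone example, not the actual task code: the toy `diaObjects` frame, its columns, and the `association` logger name are stand-ins. The duplicated id 42 plays the role of an object associated twice.

```python
import logging

import pandas as pd

log = logging.getLogger("association")

# Hypothetical frame indexed on diaObjectId, with id 42 duplicated.
diaObjects = pd.DataFrame(
    {"ra": [10.0, 10.0, 11.0], "decl": [-5.0, -5.0, -4.0]},
    index=pd.Index([42, 42, 43], name="diaObjectId"))

if diaObjects.index.has_duplicates:
    log.warning("Duplicate DiaObjects created after association. This may "
                "cause downstream pipeline issues. Dropping duplicated rows.")
    # Drop duplicates via index and keep the first appearance.
    diaObjects = diaObjects.groupby(diaObjects.index).first()
```

After the drop, each `diaObjectId` appears exactly once and lookups by index behave as expected downstream.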
Here, and throughout these changes where you use `groupby`: does this change the ordering of the diaObjects?
It might, but since the DataFrames are indexed on the object/source dataIds and continue to be so after this operation, it won't matter. I rarely use `iloc`, and only do so after matching it to the proper location within the arrays.
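The ordering question can be checked directly. This is a toy sketch (the frame and column are hypothetical): `groupby` sorts by group key by default, so `groupby(df.index).first()` returns rows in index order, replacing any original positional order — which is harmless when downstream code looks rows up by index rather than by position.

```python
import pandas as pd

# Deliberately unsorted index with one duplicate (id 7).
df = pd.DataFrame({"flux": [1.0, 2.0, 3.0, 4.0]},
                  index=pd.Index([7, 3, 7, 5], name="diaObjectId"))

deduped = df.groupby(df.index).first()
# groupby sorts by group key, so the result index is [3, 5, 7];
# .first() keeps the first appearance of each duplicated id.
print(list(deduped.index))  # [3, 5, 7]
```

For id 7 the surviving row is the first appearance (`flux == 1.0`), matching the "keep the first appearance" comment in the diff.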
Looks good!
    "Duplicate DiaSources found after association and merging "
    "with history. This is likely due to re-running data with an "
    "already populated Apdb. If this was not the case then there "
    "was a failure in Association which should not happen. "
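The condition this message guards against can be reproduced in a small sketch. Everything here is a stand-in (the `psFlux` column and the toy frames are hypothetical, and the pipeline would raise or log rather than just build a string): merging new DiaSources with history already in the Apdb makes `diaSourceId` 100 appear twice.

```python
import pandas as pd

# Hypothetical history already stored in the Apdb, plus new detections.
history = pd.DataFrame({"psFlux": [1.0]},
                       index=pd.Index([100], name="diaSourceId"))
new_sources = pd.DataFrame({"psFlux": [1.5, 2.0]},
                           index=pd.Index([100, 101], name="diaSourceId"))
merged = pd.concat([history, new_sources])

if merged.index.has_duplicates:
    duplicate_ids = list(merged.index[merged.index.duplicated()].unique())
    # In the pipeline this condition would raise (or warn loudly);
    # here we just record which ids collided to make the error actionable.
    message = ("Duplicate DiaSources found after association and merging "
               f"with history: {duplicate_ids}. This is likely due to "
               "re-running data with an already populated Apdb.")
```

Listing the colliding ids in the message follows the reviewer's point below: an error that "should not happen" is much easier to debug when it says what to look at.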
How about "an unexpected failure in Association, and should be reported". From personal experience, I find it frustrating to get an error message that tells me that it shouldn't happen, without also telling me how to try to debug or fix it.
I expanded a bit but left the heart of what you suggested.
Additionally, add code to drop duplicates for now. Debug loadDiaCatalogs dup tests. Properly re-index multiIndex dataframes to not break downstream processing. Copy debugged duplicate detection code to all points. Commit after initial debug of has_duplicates tests. Respond to reviewer. Fix variable name. Remove commented code.
Implement tests for inputting dups in association. Debug RuntimeError test. Fix numpy type warnings. Fix more numpy deprecation warnings.
Force-pushed from db57dc3 to 0fa287e
No description provided.