Feature/cumulus 2309 granule parallel deletes #2057
Conversation
Looks good! I mostly had thoughts about how we should structure the delete operations and where that code should live. Happy to do a pair review to talk further
Also, I didn't notice anything in this PR related to updating the bulk delete functionality:
https://github.com/nasa/cumulus/blob/master/packages/api/lambdas/bulk-operation.js#L118
Is that just going to be done in a second PR? That's fine and would be logical, just wanted to make sure it was on your radar.
Yes, there was a circular dependency caused by having that helper in /lib/granules. Moving it to a separate file in /lib did the trick.
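To illustrate the fix being described, here is a minimal sketch of breaking a require cycle by extracting the shared helper into its own file. The module names and the helper's shape are assumptions for illustration, not the actual Cumulus layout:

```javascript
// Before: lib/granules.js <-> models/granules.js required each other (cycle).
// After: the helper lives alone in its own file (e.g. lib/granule-delete.js)
// and depends on neither side, so both can require it safely.

// Hypothetical standalone helper: deletes files in parallel, then
// reports the result for the granule record.
function deleteGranuleAndFiles({ granule, files = [] }) {
  return Promise.all(
    files.map((f) => Promise.resolve(`deleted ${f}`)) // placeholder for real file deletes
  ).then((deleted) => ({
    granuleId: granule.granuleId,
    deletedFiles: deleted,
  }));
}

// usage
deleteGranuleAndFiles({ granule: { granuleId: 'g-1' }, files: ['a', 'b'] })
  .then((result) => console.log(JSON.stringify(result)));
// prints {"granuleId":"g-1","deletedFiles":["deleted a","deleted b"]}
```

Because the helper has no dependency back into the model layer, both the API endpoint and the model can import it without reintroducing the cycle.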
My concerns appear to have been addressed. 👍
This is now handled in the granule's del() endpoint and the granule-delete.js helper.
So the last (🤞) change here revises the "Remove a granule from CMR" functionality. Previously this was handled by a couple of methods on the Dynamo model. Those still exist, but I copied them to a new file in /lib. I've marked the public Dynamo model function as "deprecated", but I was hesitant to remove it, really just to limit the scope of changes. We'll be removing these models altogether, so it seemed alright, but I'm open to pushback.

I'm also not 100% comfortable with the pattern here of tossing things like this in /lib as their own files. There is similar work coming up in the next PR for bulk deletes, where I had to move another related-but-different Dynamo Granule method to /lib, so I may revisit the organization here in the next PR.
My $.02:
This logic is a little weird right now because this `unpublishGranule` function is still driven by the Dynamo Granule. I think that's probably right for this phase, but once we've fully switched to PG records I think we'll want to throw an error here if we can't get the PG Collection (or, by extension, update the PG Granule).
> I think we'll want to throw an error here if we can't get the PG Collection (or, by extension, update the PG Granule).
If that is the intent, which I think it should be, then the code is not doing that. Inside the `catch`, we're only re-throwing if the error is not `RecordDoesNotExist`. But a `RecordDoesNotExist` is what would be thrown if we can't get the PG collection. So I think we want to remove this `if` condition at least, but really I think we can remove the entire try/catch?
Sorry, the code is actually doing what I'd intended but my comment was unclear.
What I'm saying in that comment (which isn't necessarily right, it's just what I meant) is: "What we should do in phase 2, once the PG reads are implemented and Dynamo is no longer the source of truth, is throw an error if we can't get the PG Collection or update the Granule." In other words, `unpublishGranule` should eventually fail if we can't get the PG Collection. BUT I don't think we want to do that now. I think the logic as written might be what we want, because this could potentially be used in the case where we don't have a PG record because it hasn't been migrated yet.
If some other error is thrown because the database connection is refused for some reason, that should be thrown.
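A minimal sketch of the error-handling pattern being discussed. The function names, signatures, and return values here are assumptions for illustration, not the actual Cumulus code; the point is only the catch logic: tolerate a missing PG record during the migration phase, but re-throw everything else (e.g. a refused database connection):

```javascript
// Hypothetical stand-in for the error class the real code checks for.
class RecordDoesNotExist extends Error {}

// Sketch: getPgCollection and updatePgGranule are injected here for
// testability; in the real code they would be DB lookups.
async function unpublishGranule(getPgCollection, updatePgGranule) {
  try {
    const collection = await getPgCollection();
    await updatePgGranule(collection);
    return 'updated';
  } catch (error) {
    // A missing PG record may simply mean it hasn't been migrated yet,
    // so swallow RecordDoesNotExist. Any other error (connection
    // refused, etc.) still propagates to the caller.
    if (!(error instanceof RecordDoesNotExist)) throw error;
    return 'skipped: no PG record';
  }
}

// usage: a missing PG collection is tolerated rather than fatal
unpublishGranule(
  async () => { throw new RecordDoesNotExist('no PG collection'); },
  async () => {}
).then(console.log); // prints "skipped: no PG record"
```

In phase 2, once PG is the source of truth, the `instanceof` guard would be dropped so that a missing PG Collection also fails loudly.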
OK, yeah. I think that makes sense
I was vaguely concerned about the possibility of introducing drift here where the Dynamo record gets updated but not the PG record, but in theory if the PG collection record lookup fails then we most likely don't even have a PG granule (since it requires a collection).
So I think we're good 👍
That reasoning for removing the Dynamo method makes sense. I've just committed the other changes with that method still in place. Once this passes CI, I'll circle back, remove that method, and update the tests. I think you're right that we should do it now: if it's used somewhere else where that would cause problems, we should know about it.
Summary of changes
Addresses CUMULUS-2309: Implement parallel delete for granules/files
Changes
**Note:** the ticket also asks for updating Bulk Granule Delete to accommodate parallel deletes. I'm working on that in a separate PR; it seemed like a good place to split this work up.
PR Checklist