Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/cumulus 2309 granule parallel deletes #2057

Merged
merged 66 commits into from Feb 19, 2021

Conversation

npauzenga
Copy link
Contributor

@npauzenga npauzenga commented Jan 27, 2021

Summary: Summary of changes

Addresses CUMULUS-2309: Implement parallel delete for granules/files

Changes

  • Granule DELETE endpoint now deletes Postgres granules, Dynamo granules, Postgres Files belonging to a granule, and S3 files belonging to a granule
  • Model changes on FilePgModel and GranulePgModel to support the new delete behavior
  • New delete() method on BasePgModel

** Note that the ticket also asks for updating Bulk Granule Delete to accommodate parallel deletes. I'm working on that in a separate PR. It seemed like a good place to split this work up.

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Adhoc testing
  • Integration tests

README.md Show resolved Hide resolved
@npauzenga npauzenga marked this pull request as ready for review February 3, 2021 14:06
Copy link
Contributor

@markdboyd markdboyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I mostly had thoughts about how we should structure the delete operations and where that code should live. Happy to do a pair review to talk further

Also, I didn't notice anything in this PR related to updating the bulk delete functionality:

https://github.com/nasa/cumulus/blob/master/packages/api/lambdas/bulk-operation.js#L118

Is that just going to be done in a second PR? That's fine and would be logical, just wanted to make sure it was on your radar.

packages/api/endpoints/granules.js Outdated Show resolved Hide resolved
packages/api/tests/endpoints/test-granules.js Outdated Show resolved Hide resolved
packages/api/tests/endpoints/test-granules.js Outdated Show resolved Hide resolved
packages/db/src/models/base.ts Outdated Show resolved Hide resolved
packages/aws-client/src/S3.ts Show resolved Hide resolved
packages/db/src/models/file.ts Outdated Show resolved Hide resolved
packages/db/src/models/file.ts Outdated Show resolved Hide resolved
packages/db/src/models/file.ts Outdated Show resolved Hide resolved
packages/db/src/models/base.ts Outdated Show resolved Hide resolved
packages/db/tests/models/test-granule-model.js Outdated Show resolved Hide resolved
@npauzenga
Copy link
Contributor Author

Was this concern addressed?

Yes, there was a circular dependency caused by having that helper in /lib/granules. Moving it to a separate file in /lib did the trick:

https://github.com/nasa/cumulus/pull/2057/files?file-filters%5B%5D=.js&file-filters%5B%5D=.ts#diff-e44c79bfab1f114c5e4c4162d4b808cfbb7d6aca9550cddc8f0e48fd5fc92f7d

@Jkovarik Jkovarik dismissed their stale review February 16, 2021 17:11

My concerns appear to have been addressed. 👍

@npauzenga
Copy link
Contributor Author

So the last (🤞 ) change here revises the "Remove a granule from CMR" functionality. Previously this was handled by a couple methods on the Dynamo model. Those still exist but I copied them to a new unpublishGranule() function in /lib and deprecated the model methods.

c6191fa

I've marked the public Dynamo model function as "deprecated" but I was hesitant to remove it. Really just to limit the scope of changes. We'll be removing these models altogether so it seemed alright but I'm open to pushback.

I'm also not 100% comfortable with the pattern here of tossing things like this in /lib as their own files. There is similar work coming up in the next PR for bulk deletes where I had to move another, related-but-different Dynamo Granule method to /lib so I may revisit the organization here in the next PR.

@markdboyd
Copy link
Contributor

My $.02:

  • I'm fine with deprecating the Dynamo model methods, but would also be fine with deleting them. Once we've refactored such that they're no longer used, I'm inclined to just bite the bullet and remove them now while it's fresh. But this model is going away entirely later so I don't have strong feelings
  • I don't really see the issue with tossing things in lib/ for now. If they were going to be used outside of the API package, I could see an argument for putting them in a new package, but I don't want to do that prematurely

packages/api/lib/utils.js Outdated Show resolved Hide resolved
}
);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is a little weird right now because this unpublishGranule function is still driven by the Dynamo Granule. I think that's probably right for this phase but once we've fully switched to PG records I think we'll want to throw an error here if we can't get the PG Collection (or, by extension, update the PG Granule).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to throw an error here if we can't get the PG Collection (or, by extension, update the PG Granule).

If that is the intent, which I think it should be, then the code is not doing that. Inside the catch, we're only re-throwing if the error is not RecordDoesNotExist. But a RecordDoesNotExist is what would be thrown if we can't get the PG collection. So I think we want to remove this if condition at least, but really I think we can remove the entire try/catch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the code is actually doing what I'd intended but my comment was unclear.

What I'm saying in that comment (which I'm not saying is right but is just what I'm saying) is "What we should do in phase 2, once the PG reads are implemented and Dynamo is no longer the source of truth, is throw an error if we can't get the PG Collection or update the Granule." In other words unpublishGranule should fail if we can't get the PG Collection. BUT I don't think we want to do that now. I think the logic as-written might be what we want because I think this could potentially be used in the case where we don't have a PG record because it hasn't been migrated?

If some other error is thrown because the database connection is refused for some reason, that should be thrown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yeah. I think that makes sense

I was vaguely concerned about the possibility of introducing drift here where the Dynamo record gets updated but not the PG record, but in theory if the PG collection record lookup fails then we most likely don't even have a PG granule (since it requires a collection).

So I think we're good 👍

@npauzenga
Copy link
Contributor Author

My $.02:

  • I'm fine with deprecating the Dynamo model methods, but would also be fine with deleting them. Once we've refactored such that they're no longer used, I'm inclined to just bite the bullet and remove them now while it's fresh. But this model is going away entirely later so I don't have strong feelings
  • I don't really see the issue with tossing things in lib/ for now. If they were going to be used outside of the API package, I could see an argument for putting them in a new package, but I don't want to do that prematurely

That reasoning for removing the Dynamo method makes sense. I've just committed the other changes with that method still in place. Once this passes CI I'll circle back, remove that method, and update the tests. I think you're right that we should do it now. If it's used somewhere else that would cause problems we should know about it.

@npauzenga npauzenga merged commit 8a64603 into feature-rds Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants