Add support for moving the snapshot to a S3 bucket #4

Closed
ghost opened this issue Jun 10, 2020 · 8 comments · Fixed by #5
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@ghost

ghost commented Jun 10, 2020

Right now the snapshot that is created is stored in the source AWS account. We want to add support for storing this snapshot in an S3 bucket, which opens up additional DR use cases.

Please refer to the doc below.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ExportSnapshot.html

@ghost ghost added enhancement New feature or request good first issue Good for newcomers labels Jun 10, 2020
@RiyaJohn
Contributor

RiyaJohn commented Aug 3, 2020

Hi! I would like to work on this issue. It would be great if I could get more info on where we plan to add this step in the state machine.

@despot
Contributor

despot commented Oct 1, 2020

@namitad @stationeros this should be reopened, as there are 24 regions in total (https://aws.amazon.com/about-aws/global-infrastructure/#:~:text=AWS%20Global%20Infrastructure%20Map,Indonesia%2C%20Japan%2C%20and%20Spain.), of which only 5 support exporting a snapshot to S3 (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ExportSnapshot.html):

  • US East (N. Virginia)
  • US East (Ohio)
  • US West (Oregon)
  • Europe (Ireland)
  • Asia Pacific (Tokyo)

This makes the feature only ~20% effective: it fails for the other ~80% of regions.

To cover the remaining ~80%, the following approach from the AWS docs should be implemented:
You can copy a snapshot from an AWS Region where S3 export isn't supported to one where it is supported, then export the copy. The S3 bucket must be in the same AWS Region as the copy. (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ExportSnapshot.html)
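
The copy-then-export approach quoted above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `export_snapshot` and `make_rds_client` are hypothetical names, the region codes are the ones corresponding to the five regions listed in this comment, and the RDS client is injected so the logic can be exercised without AWS access. Encrypted snapshots would additionally need a KMS key in the destination region for the copy, which is left out here.

```python
# Region codes for the five regions listed above (as of this thread, Oct 2020).
EXPORT_REGIONS = {"us-east-1", "us-east-2", "us-west-2", "eu-west-1", "ap-northeast-1"}


def export_snapshot(make_rds_client, snapshot_arn, source_region, export_region,
                    task_id, bucket, iam_role_arn, kms_key_id):
    """Start an S3 export, copying the snapshot across regions first if needed.

    make_rds_client is a hypothetical factory, for example
    lambda region: boto3.client("rds", region_name=region).
    """
    if source_region in EXPORT_REGIONS:
        rds = make_rds_client(source_region)
        source_arn = snapshot_arn
    else:
        if export_region not in EXPORT_REGIONS:
            raise ValueError(f"{export_region} does not support S3 export")
        # The copy is requested from a client in the destination region.
        rds = make_rds_client(export_region)
        copy_id = f"{task_id}-copy"
        copied = rds.copy_db_snapshot(
            SourceDBSnapshotIdentifier=snapshot_arn,
            TargetDBSnapshotIdentifier=copy_id,
            SourceRegion=source_region,
        )
        # Block until the copy is usable; a state machine would poll instead.
        rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=copy_id)
        source_arn = copied["DBSnapshot"]["DBSnapshotArn"]

    # The bucket must be in the same region as the snapshot being exported.
    return rds.start_export_task(
        ExportTaskIdentifier=task_id,
        SourceArn=source_arn,
        S3BucketName=bucket,
        IamRoleArn=iam_role_arn,
        KmsKeyId=kms_key_id,
    )
```

With real credentials, the factory would typically be `lambda region: boto3.client("rds", region_name=region)`; all identifiers passed in (bucket, role, key) are placeholders supplied by the caller.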

I have already created a PR for the non-cluster part: #34. Kindly review it, and if it is OK, I will do the cluster part as well.

@despot despot mentioned this issue Oct 1, 2020
5 tasks
@ghost
Author

ghost commented Oct 4, 2020

Thanks @despot for notifying about this. While I am aware of the support of the snapshot export function typically being limited to five regions, due to the limit on the number of access control list (ACL) entries for a VPC, I am also mindful of the fact that the export across region will induce a certain lag primarily due to the longer network channels between regional data centers.

IMHO the export to S3 feature can be viewed as a way of offloading snapshots to S3 for long-term retention, typically suitable for someone who wants to create data lakes and have a way to query their snapshots easily. (P.S. I do see Trapheus branching into that someday as well.)

https://aws.amazon.com/blogs/database/building-data-lakes-and-implementing-data-retention-policies-with-amazon-rds-snapshot-export-to-amazon-s3/

However, as it stands today, this is not suitable if you would like to import data back into RDS or use the exported data to restore an RDS instance or cluster.

Having said that, if we look at our implementation, we do see a potential issue: the async export might take longer than the entire state machine execution (and would probably hit a timeout), given the size of the data and various other constraints of different vendors. The chances of this increase multifold in the case of a cross-region export. On that front, I have a few larger questions:

  1. Is there a way to benchmark the cross region approach in terms of any standard performance metrics.
  2. Is there a way to notify users (via slack or email) that the export to S3 has been successful.
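
One way to keep a long-running async export from tying up a single step is to poll the export task's status from a short check function and let the state machine loop on it. The sketch below is an assumption about how that could look, not the project's implementation: `check_export` is a hypothetical name, the client is injected, and the status strings are the ones the RDS API documents for export tasks (normalized to upper case here as a precaution).

```python
# Statuses after which no further polling is useful.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "CANCELED"}


def check_export(rds_client, task_id):
    """Return the current status of an export task and whether it has finished."""
    response = rds_client.describe_export_tasks(ExportTaskIdentifier=task_id)
    tasks = response.get("ExportTasks", [])
    if not tasks:
        # Treat an unknown task as finished so the caller does not poll forever.
        return {"status": "NOT_FOUND", "done": True}
    status = tasks[0]["Status"].upper()
    return {"status": status, "done": status in TERMINAL_STATUSES}
```

A state machine could call this from a small task state, branch on `done`, and otherwise wait and retry, so no single invocation has to outlive the export.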

Do let me know your thoughts.

@despot
Contributor

despot commented Oct 6, 2020

Hi @stationeros

thanks for the review and thoughts.

Legend:
5-region part/implementation: The implementation part of the feature that does the export for regions that support exporting snapshots to S3.
S3 export supporting region: region that supports exporting a snapshot to S3

"While I am aware of the support of the snapshot export function typically being limited to five regions, due to the limit on the number of access control list (ACL) entries for a VPC, I am also mindful of the fact that the export across region will induce a certain lag primarily due to the longer network channels between regional data centers."
I understand your concerns and possible limits, though even for the 5-region feature, the same applies (the users of Trapheus would need resources to do it). I would suggest looking at this from the point of view of providing expected functionality for cases when your default region doesn't support exporting. Regardless of whether you view this as the same or a separate feature, someone might expect that they can export to S3 regardless of region, and have the required resources (200 ACLs, time, etc.) to do so. Why wouldn't we allow someone to export to a S3 export supporting region, if they have the resources? Wouldn't it be better for the system to have this as an option regardless of how many use-cases there are for it?

Additionally, as it is built now the user can decide whether to use the feature part from my PR or not. If they don't use it, the resources, commands and time of execution stay the same. This is just a helper in case they need it. So no loss, just gain.

"1. Is there a way to benchmark the cross region approach in terms of any standard performance metrics."
When I tried it with an empty RDS instance and the regions eu-central-1 (Frankfurt) and eu-west-1 (Ireland), copying the snapshot from one region (Frankfurt) to another (Ireland) usually took less time than creating the snapshot or exporting it to an S3 bucket. All the other timings are the same as in the 5-region feature part.
More serious benchmarking would require more extensive research, which hasn't been part of this feature ticket so far. I would kindly suggest merging this to get the value, and later on either

  • creating another ticket for the benchmarking, to be picked up by a person who has the time and resources to do it (as it is not part of this feature now), or
  • documenting the limits of the feature in the README/wiki once someone creates an issue saying that Trapheus couldn't copy x size of data, or that it takes y time for x size of data from one region to another, etc.
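
For the benchmarking follow-up, per-stage timings (create snapshot, cross-region copy, export) could be collected with a small helper like the one below. This is only a sketch under assumptions: `timed` is a hypothetical helper, the stage names are illustrative, and the clock is injectable so the helper can be tested without real delays.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, results, clock=time.monotonic):
    """Record how long the wrapped block took, in seconds, under results[stage]."""
    start = clock()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failed runs still leave a timing.
        results[stage] = clock() - start
```

Usage would look like `with timed("copy", results): ...` around each stage, after which `results` maps stage names to durations that can be compared across regions and snapshot sizes.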

"2. Is there a way to notify users (via slack or email) that the export to S3 has been successful."
The same approach that was taken for the 5-region feature part was followed for this feature part (it uses SES). Again, as before, I kindly suggest merging this to get the value, and creating another feature for additional parts that are not part of this feature request.
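
As an illustration of an SES-based notification, a minimal sketch could look like the following. It is not taken from the Trapheus codebase: `notify_export_result` is a hypothetical name, the sender and recipients are placeholders supplied by the caller, and the SES client is injected (with boto3 it would be `boto3.client("ses")`).

```python
def notify_export_result(ses_client, sender, recipients, task_id, status):
    """Send a plain-text email reporting the outcome of a snapshot export."""
    subject = f"Snapshot export {task_id}: {status}"
    body = f"The export task '{task_id}' finished with status {status}."
    return ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": list(recipients)},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
```

The sender address has to be verified in SES before the call succeeds; that setup is outside the scope of this sketch.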

Kind regards,
Despot

@ghost
Author

ghost commented Oct 11, 2020

Thanks @despot, sorry I couldn't catch up on this because of vacation time. Agreed that more rigorous benchmarking warrants some dedicated time, although I am glad that we have started the discussion in that direction.

Also, to answer the point "Why wouldn't we allow someone to export to a S3 export supporting region, if they have the resources? Wouldn't it be better for the system to have this as an option regardless of how many use-cases there are for it?"

I am just being ridiculously cautious about putting sleeper features in the codebase without any particular roadmap for delivering value from them, although that doesn't seem to be the case here, since we have discovered the use case of building data lakes and the like.

Your initial analysis with an empty RDS instance, albeit not very conclusive, does look like a step in the right direction. I will branch out a separate issue for the benchmark tests and hope we get to a better place in time.

@ghost ghost reopened this Oct 11, 2020
@despot
Contributor

despot commented Oct 17, 2020

... @stationeros and I hope you had a great vacation ;)

@stationeros @namitad @RiyaJohn as we are collaborating, feel free to connect with me on LinkedIn: First name: Despot, Last name: Jakimovski.

@ghost
Author

ghost commented Oct 17, 2020

@despot Thanks for reaching out. Yes, was finally able to catch a break in these desperate times :) . Will be glad to connect.

@despot
Contributor

despot commented Oct 25, 2020

When the region supports export (the default/old command), the flow is 1-2-8-9-10-11 (check the state numbers in the image below).
When the region doesn't support export (the additional new command), the flow is 1-2-3-4-5-6-7-8-9-10-11.

[Image: export snapshot to S3 when region doesn't support export]
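
The two flows described above can be expressed as a small branch. The sketch below only encodes the state numbers given in this comment; the state names are not stated in the thread, so none are assumed, and `state_flow` is a hypothetical helper.

```python
def state_flow(region_supports_export):
    """Return the sequence of state numbers visited for an export run."""
    if region_supports_export:
        # Default path: states 3 to 7 (the extra cross-region steps) are skipped.
        return [1, 2, 8, 9, 10, 11]
    # Region without export support: every state from 1 to 11 is visited.
    return list(range(1, 12))
```

The difference between the two lists, states 3 through 7, is the additional work introduced for regions that do not support export.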

@ghost ghost closed this as completed Apr 24, 2021