
Support for moving snapshots to s3 #5

Merged · 19 commits · Sep 29, 2020

Conversation

@RiyaJohn (Contributor) commented Aug 3, 2020

Description

Fixes #4

@RiyaJohn RiyaJohn requested a review from namitad as a code owner August 3, 2020 18:17
@RiyaJohn RiyaJohn marked this pull request as draft August 3, 2020 18:17
@RiyaJohn RiyaJohn changed the title from "WIP: export task to s3" to "WIP: Snapshot to s3" on Aug 3, 2020
@RiyaJohn RiyaJohn changed the title from "WIP: Snapshot to s3" to "Support for moving snapshots to s3" on Aug 9, 2020
@RiyaJohn RiyaJohn marked this pull request as ready for review August 10, 2020 14:25
@namitad (Collaborator) left a comment:

Minor comments, please address. Also, please add the unit test file for the newly added lambda.

src/export/export_snapshot_s3_function.py — two outdated review threads, resolved
    snap_arn = snap['DBSnapshotArn']
    snap_status = snapshots[0]['Status']
    if snap_status == 'available':
        return snap_arn
Collaborator:

Can we not directly return snap['DBSnapshotArn'] if the snapshot status is available?

Contributor (author):

The ARN is available even in the creating state, and I couldn't place it inside the if case with the cluster support changes now. Let me know if it's not ok.

Collaborator:

The above comment will address this issue as well, I believe. Do check and verify.
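(For illustration, a minimal sketch of the shape the reviewer seems to be suggesting — check the status first and return the ARN only when the snapshot is available. The helper name and surrounding variables are assumptions, not the PR's actual code:)

```python
# Hypothetical sketch of the suggested refactor, not the PR's actual code.
def resolve_snapshot_arn(snapshots):
    snap = snapshots[0]
    if snap['Status'] == 'available':
        # Return the ARN directly once the snapshot is confirmed available.
        return snap['DBSnapshotArn']
    raise Exception(f"Snapshot is not available yet, status is {snap['Status']}")
```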

@RiyaJohn (Contributor, author) commented Sep 7, 2020

@namitad resolved the conflicts that came up. Do take a look at the PR, which has the review changes and also adds support for cluster snapshots.

region = os.environ['Region']
rds = boto3.client('rds', region)
if is_cluster:
    snapshots_response = rds.describe_db_cluster_snapshots(DBClusterSnapshotIdentifier=snapshot_name)
Collaborator:

The pattern followed is to have separate py files for cluster and instance, as they have completely independent branches, and the isCluster check is added at the beginning itself.
Please check the pattern followed and move the cluster-specific logic to its own file. Any common processing can be added in the common dir.

Contributor (author):

Refactored to have separate py files for cluster and instance, with the common function in a util file.
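(A rough, illustrative sketch of what that split might look like — the file and function names here are assumptions, not the PR's exact code:)

```python
import boto3

# export_snapshot_s3_function.py — instance path (illustrative)
def get_instance_snapshot_arn(snapshot_name, region):
    rds = boto3.client('rds', region)
    resp = rds.describe_db_snapshots(DBSnapshotIdentifier=snapshot_name)
    return resp['DBSnapshots'][0]['DBSnapshotArn']

# export_cluster_snapshot_s3_function.py — cluster path (illustrative)
def get_cluster_snapshot_arn(snapshot_name, region):
    rds = boto3.client('rds', region)
    resp = rds.describe_db_cluster_snapshots(DBClusterSnapshotIdentifier=snapshot_name)
    return resp['DBClusterSnapshots'][0]['DBClusterSnapshotArn']
```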

    snap_arn = snap['DBSnapshotArn']
    snap_status = snapshots[0]['Status']
    if snap_status == 'available':
        return snap_arn
Collaborator:

The above comment will address this issue as well, I believe. Do check and verify.

response = rds.start_export_task(
    ExportTaskIdentifier=export_id,
    SourceArn=snapshot_arn,
    S3BucketName=bucket_name,
@despot (Contributor) commented Sep 15, 2020:

Why don't we use the same bucket name that was provided by the user in https://github.com/intuit/Trapheus/blob/master/README.md -> Instructions -> Setup -> point 2?

Contributor (author):

We could use that too, but that bucket is meant for Trapheus CloudFormation deployment files, right? So I did not want to store the snapshots there. What do you think, @namitad and @stationeros?

Contributor (author):

Also, I'm not sure how we can get that bucket name in our lambdas unless the user passes it in params, which I don't think should be done.

Collaborator:

The s3 bucket name would need to be an input, as it's not part of the state machine while executing sam deploy.
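(For context on the truncated hunk above: boto3's start_export_task also requires IamRoleArn and KmsKeyId. A minimal sketch, where the role and key values are placeholder assumptions, not the PR's actual configuration:)

```python
import boto3

def start_snapshot_export(export_id, snapshot_arn, bucket_name, region):
    rds = boto3.client('rds', region)
    return rds.start_export_task(
        ExportTaskIdentifier=export_id,
        SourceArn=snapshot_arn,
        S3BucketName=bucket_name,
        # The export API also needs an IAM role that can write to the bucket
        # and a KMS key to encrypt the exported data; placeholders below.
        IamRoleArn='arn:aws:iam::123456789012:role/rds-s3-export-role',
        KmsKeyId='alias/rds-export-key',
    )
```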


DELETE = "Delete"
RENAME = "Rename"
SNAPSHOT = "SnapshotCreation"
DB_RESTORE = "Restore"
CLUSTER_RESTORE = "ClusterRestore"
EXPORT_SNAPSHOT = "SnapshotExportTask"
Reviewer:

Can we remove the "Task" from SnapshotExportTask, as the name doesn't follow the convention?

Contributor (author):

Yes, will do.

Contributor (author):

Done: 4d22883
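(Presumably the renamed constant now looks like the line below — an assumption based on the request above, not verified against commit 4d22883:)

```python
EXPORT_SNAPSHOT = "SnapshotExport"  # "Task" suffix dropped to match the naming convention
```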

@@ -0,0 +1,52 @@
import os
import time

Reviewer:

Remove the blank lines between the imports.

Contributor (author):

@stationeros The blank lines between imports are added by PyCharm as the import layout when you run the optimize-imports command.

rds = boto3.client('rds', region)
snapshots_response = rds.describe_db_cluster_snapshots(DBClusterSnapshotIdentifier=snapshot_name)
assert snapshots_response['ResponseMetadata']['HTTPStatusCode'] == 200, \
    f"Error fetching cluster snapshots: {snapshots_response}"
Reviewer:

There is an illegal character "f" present on line 44. Can you please check that?

This comment was marked as outdated.

assert snapshots_response['ResponseMetadata']['HTTPStatusCode'] == 200, \
    f"Error fetching cluster snapshots: {snapshots_response}"
snapshots = snapshots_response['DBClusterSnapshots']
assert len(snapshots) == 1, f"More than one snapshot matches name {snapshot_name}"

This comment was marked as outdated.

Contributor (author):

I think @namitad already added info around this in the above comment. Just to add to it: f-strings have a shorter, more readable syntax and are also much faster. The con is that they require at least Python 3.6, but since the Trapheus doc already lists Python 3.7 as a prerequisite, I thought it wasn't a problem.
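(For anyone unfamiliar: the "illegal character" is just Python 3.6+ f-string syntax. A quick illustration of the equivalence, with an example value:)

```python
snapshot_name = 'mydb-snapshot'  # example value, not from the PR

# Python 3.6+ f-string, as used in this PR
msg = f"More than one snapshot matches name {snapshot_name}"

# Pre-3.6 equivalent using str.format()
msg_old = "More than one snapshot matches name {}".format(snapshot_name)

assert msg == msg_old
```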

if snap_status == 'available':
    return snap['DBClusterSnapshotArn']
else:
    raise Exception(f"Snapshot is not available yet, status is {snap_status}")
@ghost commented Sep 16, 2020:

Illegal character. Please check.

snapshot_id = instance_id + constants.SNAPSHOT_POSTFIX
snapshot_arn = get_instance_snapshot_arn(snapshot_id)
account_id = util.get_aws_account_id()
bucket_name = constants.RDS_SNAPSHOTS_BUCKET_NAME_PREFIX + account_id
@namitad (Collaborator) commented Sep 21, 2020:

@RiyaJohn is it assumed that the s3 bucket already exists? And if it doesn't, does the export task throw an exception? Can we add a test case for this scenario?

Contributor (author):

The bucket will exist, as it is created as part of our deployment here: SnapshotsBucket. I don't remember if it will throw an error or create one; will test and update here. Also, yes, I will add a test case for it.

@despot (Contributor) commented Sep 23, 2020:

It doesn't throw an exception when the bucket doesn't exist, as the deployment always creates one (which imo is correct, and better). Imo there is no need to handle a case that will not appear (having no export bucket), as that would assume the user tampering manually; usually a system doesn't handle these scenarios. Anyway, just my 2 cents.

Collaborator:

@despot agreed. If it's being created as part of the deployment, I think we are ok. But considering the scenario that the s3 bucket can be deleted, do you think it's good to have that handled?

Contributor:

@namitad thanks for considering my opinion. I was trying to find some articles about what I already practiced within the team in the past, though I would need to dig deeper. The idea is that an application usually refers to doing one thing, and as soon as the team starts handling cases (including exception handling alone) for what a user can do outside of the prescribed way, the team goes in the direction of solving the problems of the universe. More pragmatically, the team wouldn't want to handle someone manually deleting a lambda function or the kmsKeyId etc., as this is not what users should do. So I don't think a test should be written for the user deleting the S3 bucket. It all comes down to what you want the application to do: is a user deleting an S3 bucket normal behavior that you expect to handle? I don't imagine this app like that. But if you do, then either handle the case (by, for instance, creating a new S3 bucket) or raise a custom exception so you remember to do this in the future. As I think this shouldn't be handled, imo @RiyaJohn should not do anything more regarding the S3 bucket. But again, you are leading the project :), so let us know.

@RiyaJohn (Contributor, author) commented Sep 25, 2020:

Totally agree with @despot. But I already pushed code for this test case, so @namitad you can take a look at it: b4701c8
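(A minimal sketch of what such a test might look like, using botocore's Stubber to simulate a missing export bucket; the error code, names, and values here are illustrative assumptions and may differ from what b4701c8 actually does:)

```python
import boto3
import pytest
from botocore.stub import Stubber

def test_export_fails_when_bucket_missing():
    rds = boto3.client('rds', region_name='us-east-1')
    stubber = Stubber(rds)
    # Simulate the service-side failure returned when the export bucket is gone.
    stubber.add_client_error(
        'start_export_task',
        service_error_code='InvalidS3BucketFault',
        service_message='The S3 bucket rds-snapshots-123456789012 does not exist.',
    )
    with stubber, pytest.raises(Exception):
        rds.start_export_task(
            ExportTaskIdentifier='export-1',
            SourceArn='arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snapshot',
            S3BucketName='rds-snapshots-123456789012',
            IamRoleArn='arn:aws:iam::123456789012:role/export-role',
            KmsKeyId='alias/export-key',
        )
```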

src/export/export_snapshot_s3_function.py — outdated review thread, resolved
src/export/export_cluster_snapshot_s3_function.py — outdated review thread, resolved
@namitad merged commit f2ac1c4 into intuit:master on Sep 29, 2020.
Development

Successfully merging this pull request may close these issues.

Add support for moving the snapshot to a S3 bucket
3 participants