-
Notifications
You must be signed in to change notification settings - Fork 2
[PROD-1648] Bugfix describe volumes #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sync/awsdatabricks.py
Outdated
| ) -> List[dict]: | ||
| """Get all ebs volumes associated with a list of instance reservations""" | ||
|
|
||
| def get_chunk(instance_ids: list, chunk_size: int) -> Iterator[list]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the return type technically a -> Generator[list, None, None]?
Might want to also type the args like so:
| def get_chunk(instance_ids: list, chunk_size: int) -> Iterator[list]: | |
| def get_chunk(instance_ids: List[str], chunk_size: int) -> Generator[List[str], None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioning this because it can help the IDE auto-complete.
For example if you were to [item.count('a') for item in self.get_chunk(***) pycharm will autocomplete the .count() because it knows its a list of strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah yes, thanks good catch.
romainissynced
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just left a nitpick on typing that doesn't need to be fixed before merge.
sync/__init__.py
Outdated
| @@ -1,4 +1,4 @@ | |||
| """Library for leveraging the power of Sync""" | |||
| __version__ = "1.0.0" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update this to 1.0.3 now
| while next_token: | ||
| response = ec2_client.describe_volumes(Filters=filters, NextToken=next_token) | ||
| volumes += response.get("Volumes", []) | ||
| next_token = response.get("NextToken") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you had a chance to test this out on a big cluster yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I tested this out by kicking off a big job with 300 .large instances and then wrote a local scipt to call describe_volumes() on that cluster_id while the cluster was running and it worked fine (also worked fine for a cluster with 10 instances)
gorskysd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
There is a bug currently in the library where a call to the
ec2.describe_volumesonly accepts at most 199 items in theFiltersargument. However, in the case of very large clusters, there can be more than 199 instance ids passed in, which causes the call to fail.This PR splits the number of filters to chunks of size no more than 199.
Testing Performed
Checklist
Before formally opening this PR, please adhere to the following standards:
Related Jira Ticket