Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gce image support #7080

Merged
merged 5 commits into from
Sep 17, 2020
Merged

Conversation

bentsi
Copy link
Contributor

@bentsi bentsi commented Aug 20, 2020

  • Add necessary changes to scylla_util.py in order to support Scylla Machine Image in GCE.
  • Fixes and improvements for curl function.

# @param headers dict of k:v
def curl(url, byte=False, headers={}):
def curl(url, headers, byte=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change. Why switch the order around? Why make headers mandatory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually, the order of arguments is chosen by their use, importance, and common use case
example:

  • curl(url)
  • curl(url, headers)
    and if we want to encode the output as byte string we set byte=True
  • curl(url, byte=True)
  • curl(url, headers, byte=True)
    this is a better API.

Regarding making headers mandatory you are right, I will change it (but we can't use a mutable {} as a default (https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what about the removal of the default?

The goal is not to confuse the maintainer with lots of unrelated changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fixing a bug, we can't use a mutable ({}) as a default argument to the function (https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)
I can separate it as a different commit.

time.sleep(5)
retries += 1
if (retries >= max_retries):
if retries >= max_retries:
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look unrelated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsi You can add another patch for this kind of style change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is styling fix, no need to put the round brackets here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to separate commit

@bentsi bentsi force-pushed the gce-image-support branch 3 times, most recently from 354fae6 to bba96ef Compare August 24, 2020 21:47
# @param headers dict of k:v
def curl(url, byte=False, headers={}):
def curl(url, headers=None, byte=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not write headers={}?

The subject of the commit is wrong, since we could pass headers before. As fat as I can tell, this commit does nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix the commit message. It was originally done before Lubus changes that added the headers argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This way I understand the change, and it can be backported independently if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avi, FYI doing headers={} in a function signature, can bite someone very hard,
since this dict is created only once. it basically becomes a global variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unexpected for a nice language such as Python.

return curl("http://169.254.169.254/computeMetadata/v1/instance/" + path+"?recursive=true", headers={"Metadata-Flavor": "Google"})
#169.254.169.254 is metadata.google.internal
def __instance_metadata(self, path, recursive=False):
return curl(self.META_DATA_BASE_URL + path + "?recursive=%s" % str(recursive).lower(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the change, the information in the comment is not useful here

def is_ec2():
patterns_files = ((r"^Amazon EC2$", "/sys/class/dmi/id/board_vendor"),
(r"^ec2.*", "/sys/hypervisor/uuid"))
return match_patterns_in_files(patterns_files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fragile. Is there no way to do this from the metadata server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what AWS recommends. You can read more here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/identify_ec2_instances.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second one is not accurate. There is a 1:4000 chance of a random uuid starting with "ec2". They even document it as "probably".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use the instance metadata, since we have to read it anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

max_retries = 5
retries = 0
while True:
try:
req = urllib.request.Request(url,headers=headers)
req = urllib.request.Request(url, headers={} if not headers else headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write this as headers or {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. This will look better. Fixed.

@bentsi bentsi force-pushed the gce-image-support branch 3 times, most recently from f2e8df9 to 16f8796 Compare August 27, 2020 23:10
@slivne
Copy link
Contributor

slivne commented Sep 14, 2020

@avikivity / @syuu1228 - according to bentsi he is fixed everything - please review and approve so we can move forward

@@ -369,6 +394,14 @@ def __init__(self):
self._type = self.__instance_metadata("instance-type")
self.__populate_disks()

def is_aws_instance(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is_gce_instance is static, and this one isn't ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or at least a class method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole scylla_util lacks basic architecture and basic usage of OOP design principles (Base class in this case). I am not going to redesign and rewrite this...
I can't use static method since I need META_DATA_BASE_URL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use classmethod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

elif os.path.exists('/sys/class/dmi/id/board_vendor'):
with open('/sys/class/dmi/id/board_vendor') as f:
s = f.read()
return True if re.match(r'^Amazon EC2$', s) else False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to remove is_ec2() to support GCE?
There are multiple scripts relies on the function, if you remove this you will need to fix them too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, my bad, deleted it mistakenly when resolving conflicts with Lubush changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@slivne
Copy link
Contributor

slivne commented Sep 15, 2020

@bentsi ping

Bentsi Magidovich added 5 commits September 16, 2020 11:25
…ction

- It was set to {} that is incorrect and can lead to unexpected behavior
https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
- Order of the arguments changed to more convinient way
- rename deprecated logging.warn to logging.warning
- remove redundant round brackets in the if statement
@bentsi
Copy link
Contributor Author

bentsi commented Sep 16, 2020

@slivne all comments resolved

@avikivity
Copy link
Member

@syuu1228 @fruch please re-review.

Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks o.k.

return False


def is_ec2():
return aws_instance.is_aws_instance()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so above code depended on /sys
this new code depends on http call to a service ...

I actually like the check for /sys much better, since no network is involved
shouldn't we keep the /sys check and use it before the http metadata call?
(I didn't do it for GCP machines, since I didn't find a way to do it, but since we have it for AWS, why making this check slower by adding network to the game?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also GCP actually just does dns query (but I'd love it to be optimized and use /sys or any other variable instead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarzanek That's how it was originally implemented but Avi was not happy about it:
comment from above:
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so how will this method then behave on non cloud servers? or it will just wait for timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current implementation, it will wait for a timeout... but it's not critical since those functions are mainly used in the Scylla Amazon/GCE images

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants