Skip to content

[batch] Expose region variable to users#12465

Merged
danking merged 12 commits intohail-is:mainfrom
jigold:region-status
Nov 29, 2022
Merged

[batch] Expose region variable to users#12465
danking merged 12 commits intohail-is:mainfrom
jigold:region-status

Conversation

@jigold
Copy link
Contributor

@jigold jigold commented Nov 15, 2022

This adds what region the job ran into both the job environment as well as a field in the job status.

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple nits


self.timings: Timings = Timings()

self.env['HAIL_REGION'] = REGION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be

Suggested change
self.env['HAIL_REGION'] = REGION
self.env.append({'name': 'HAIL_REGION', 'value': REGION})

DOCKER_PREFIX=$(jq -r '.docker_prefix' userdata)

INTERNAL_GATEWAY_IP=$(jq -r '.internal_ip' userdata)
REGION=$(jq -r '.region' userdata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to grab this from the instance config more like how we do in GCP. I could see this getting missed if we added more regions to azure.

return 'Standard_D2ds_v4'


def nondefault_region(cloud):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used in one test can we just keep it in that test? Also this is a nit because I don't see us changing our default region to us-east1, but it does bother me slightly that it's not necessarily correct. We could make it correct by defining two regions and picking one that is not the default region.


def region_for(self, location: str) -> str:
# location = zone
if location.startswith('projects'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this "projects" about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is the return value from using Google's metadata server is a path and not just the zone name. It's like /projects/hail-vdc/random-hash/us-central1-a. I can either pass the zone directly that we know when creating the instance into the startup script or I can try and do a better job integrating the parsing with the other place in the code base where we have to parse the ZONE that starts with projects. Preferences?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ya let's just use the custom userdata for this like in your first commit and not use the metadata server's endpoint.

assert instance_config.cores == CORES
assert instance_config.cloud == CLOUD

REGION = instance_config.region_for(LOCATION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just pass REGION as the env variable here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had that previously, but you suggested using the instance config. I could be wrong though. This question is related to the comment above about how to pass the ZONE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry that's not what I meant. I meant that in your original commit I liked the gcp create_instance.py changes, with the caveat that you could get the region by using the instance_config.region_for method instead of doing a string split.

What I intended was that the environment variable given to the worker be REGION and that we get that value in the first line of the create_vm_config function by doing

region = instance_config.region_for(location)

assert False


@skip_in_azure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also work in azure, can we make a test fixture here that passes a us-east1 to the test method in gcp and eastus in azure? This PR is more testing that the region is provided in the status not necessarily that we can successfully schedule in different regions so it still makes sense as a test on azure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also have the job command print the env variable and assert that it shows up in the job log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally wanted to test it creates a VM in a different region, but I was concerned that this will continue to add to test times as it's another VM that we have to wait on to create and startup. Do you have preferences on what the test should do? I can increase the timeout on the batch tests to account for this. Also, it would have to be a highcpu instance because we've capped the number of standard instances so it will probably never be able to create a new instance before it times out.

assert instance_config.cores == CORES
assert instance_config.cloud == CLOUD

REGION = instance_config.region_for(LOCATION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry that's not what I meant. I meant that in your original commit I liked the gcp create_instance.py changes, with the caveat that you could get the region by using the instance_config.region_for method instead of doing a string split.

What I intended was that the environment variable given to the worker be REGION and that we get that value in the first line of the create_vm_config function by doing

region = instance_config.region_for(location)


def region_for(self, location: str) -> str:
# location = zone
if location.startswith('projects'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ya let's just use the custom userdata for this like in your first commit and not use the metadata server's endpoint.

@danking danking merged commit 330b295 into hail-is:main Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants