[batch] Expose region variable to users#12465
Conversation
daniel-goldstein
left a comment
There was a problem hiding this comment.
Just a couple nits
batch/batch/worker/worker.py
Outdated
|
|
||
| self.timings: Timings = Timings() | ||
|
|
||
| self.env['HAIL_REGION'] = REGION |
There was a problem hiding this comment.
I think this should be
| self.env['HAIL_REGION'] = REGION | |
| self.env.append({'name': 'HAIL_REGION', 'value': REGION}) |
| DOCKER_PREFIX=$(jq -r '.docker_prefix' userdata) | ||
|
|
||
| INTERNAL_GATEWAY_IP=$(jq -r '.internal_ip' userdata) | ||
| REGION=$(jq -r '.region' userdata) |
There was a problem hiding this comment.
Probably better to grab this from the instance config more like how we do in GCP. I could see this getting missed if we added more regions to azure.
batch/test/utils.py
Outdated
| return 'Standard_D2ds_v4' | ||
|
|
||
|
|
||
| def nondefault_region(cloud): |
There was a problem hiding this comment.
If this is only used in one test can we just keep it in that test? Also this is a nit because I don't see us changing our default region to us-east1, but it does bother me slightly that it's not necessarily correct. We could make it correct by defining two regions and picking one that is not the default region.
|
|
||
| def region_for(self, location: str) -> str: | ||
| # location = zone | ||
| if location.startswith('projects'): |
There was a problem hiding this comment.
What's this "projects" about?
There was a problem hiding this comment.
The issue is the return value from using Google's metadata server is a path and not just the zone name. It's like /projects/hail-vdc/random-hash/us-central1-a. I can either pass the zone directly that we know when creating the instance into the startup script or I can try and do a better job integrating the parsing with the other place in the code base where we have to parse the ZONE that starts with projects. Preferences?
There was a problem hiding this comment.
ah ya let's just use the custom userdata for this like in your first commit and not use the metadata server's endpoint.
batch/batch/worker/worker.py
Outdated
| assert instance_config.cores == CORES | ||
| assert instance_config.cloud == CLOUD | ||
|
|
||
| REGION = instance_config.region_for(LOCATION) |
There was a problem hiding this comment.
Can we just pass REGION as the env variable here instead?
There was a problem hiding this comment.
I thought I had that previously, but you suggested using the instance config. I could be wrong though. This question is related to the comment above about how to pass the ZONE.
There was a problem hiding this comment.
Ah sorry that's not what I meant. I meant that in your original commit I liked the gcp create_instance.py changes, with the caveat that you could get the region by using the instance_config.region_for method instead of doing a string split.
What I intended was that the environment variable given to the worker be REGION and that we get that value in the first line of the create_vm_config function by doing
region = instance_config.region_for(location)
batch/test/test_batch.py
Outdated
| assert False | ||
|
|
||
|
|
||
| @skip_in_azure |
There was a problem hiding this comment.
This should also work in azure, can we make a test fixture here that passes a us-east1 to the test method in gcp and eastus in azure? This PR is more testing that the region is provided in the status not necessarily that we can successfully schedule in different regions so it still makes sense as a test on azure.
There was a problem hiding this comment.
Let's also have the job command print the env variable and assert that it shows up in the job log
There was a problem hiding this comment.
I originally wanted to test it creates a VM in a different region, but I was concerned that this will continue to add to test times as it's another VM that we have to wait on to create and startup. Do you have preferences on what the test should do? I can increase the timeout on the batch tests to account for this. Also, it would have to be a highcpu instance because we've capped the number of standard instances so it will probably never be able to create a new instance before it times out.
batch/batch/worker/worker.py
Outdated
| assert instance_config.cores == CORES | ||
| assert instance_config.cloud == CLOUD | ||
|
|
||
| REGION = instance_config.region_for(LOCATION) |
There was a problem hiding this comment.
Ah sorry that's not what I meant. I meant that in your original commit I liked the gcp create_instance.py changes, with the caveat that you could get the region by using the instance_config.region_for method instead of doing a string split.
What I intended was that the environment variable given to the worker be REGION and that we get that value in the first line of the create_vm_config function by doing
region = instance_config.region_for(location)|
|
||
| def region_for(self, location: str) -> str: | ||
| # location = zone | ||
| if location.startswith('projects'): |
There was a problem hiding this comment.
ah ya let's just use the custom userdata for this like in your first commit and not use the metadata server's endpoint.
This adds what region the job ran into both the job environment as well as a field in the job status.