-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[qob] Add regions parameter #12581
[qob] Add regions parameter #12581
Conversation
@@ -157,6 +157,8 @@ async def serialize(self, writer: afs.WritableStream): | |||
|
|||
|
|||
class ServiceBackend(Backend): | |||
input_version = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this versioning here? Seems arbitrary, is it ever checked? Are there instructions on how or when it should be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. It was old cruft from when I thought I had to make the spec backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry one more thing I noticed.
import hail as hl | ||
|
||
from ..helpers import skip_unless_service_backend | ||
from hail.backend.service_backend import ServiceBackend | ||
|
||
CLOUD = os.environ['HAIL_CLOUD'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this impede running other query tests locally? Not sure if it does but it would be annoying to have to specify HAIL_CLOUD
to run tests that have nothing to do with it. I'd prefer in that case to put the os.environ
call in the method that actually needs it and remove the env variable from test steps that aren't testing the service backend.
import hail as hl | ||
|
||
from ..helpers import skip_unless_service_backend | ||
from hail.backend.service_backend import ServiceBackend | ||
|
||
CLOUD = os.environ.get('HAIL_CLOUD') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to treat None
and Azure the same. Can we keep this as os.environ['HAIL_CLOUD']
but just put this line inside the relevant test?
if regions_str is not None: | ||
regions = regions_str.split(',') | ||
else: | ||
regions = bc.supported_regions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regions parameter below is Optional[List[str]]
, but from this else case it looks like we don't exercise the None
option. I don't have a strong opinion about whether to do one or the other, but I think we should either make regions
not Optional, or keep this None instead of using supported_regions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try and keep the same interface as hailtop.batch
as much as possible for consistency reasons. This would mean we'd need to expose the ServiceBackend.supported_regions
and the equivalent of ServiceBackend.ANY_REGION
. I think they should be hl.supported_regions()
that is implemented as a static method on the service backend when applicable and hl.ANY_REGION
is just an object. Once we figure this out, then we can figure out what makes sense for the types further downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try and keep the same interface as hailtop.batch as much as possible for consistency reasons.
Totally agree with this, but I see the constructor of the ServiceBackend
object and its implementation as private, so I don't see why we can't currently write the type as it's being used -- if it's never None
let's use that knowledge to cut down on None checks that are currently dead code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the user-facing API should be as close to batch as possible, but the ServiceBackend
for query is not a user-facing API. Users only use hl.init
which select the backend to use.
@@ -329,6 +340,9 @@ async def _rpc(self, | |||
await write_str(infile, orjson.dumps(reference_config).decode('utf-8')) | |||
await write_str(infile, str(self.worker_cores)) | |||
await write_str(infile, str(self.worker_memory)) | |||
await write_int(infile, len(self.regions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks if self.regions
is None, so would need to change if we allow this to be optional
Some(regionsArrayBuffer.toArray) | ||
} else { | ||
None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would the service_backend
ever send an empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left further comments in the thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes so it's off my CI page. Looks fine there's just merge conflicts now.
I know you've got a lot on your plate, but it would be awesome if we could get this in tomorrow or early next week. Of all my remaining PRs, I think this one is highest priority and should be close to being mergeable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder. This looks good
CHANGELOG: Added the regions parameter to
hl.init()
for specifying which regions Query on Batch jobs should run in.