[hailtop] Add hailtop.fs for user-level RouterFS functions #12731
Conversation
@@ -202,7 +203,9 @@ async def create(*,
            "MY_BILLING_PROJECT'"
        )

    async_fs = RouterAsyncFS('file')
    if not isinstance(gcs_requester_pays_configuration, str):
It can also be None, right?
right
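For concreteness, a minimal sketch of the check being discussed, assuming the requester-pays configuration can be None (disabled), a bare project name, or a (project, buckets) pair. The alias and function below are illustrative, not hailtop's actual definitions.

```python
from typing import List, Optional, Tuple, Union

# Assumed shape of the configuration for this sketch only.
GCSRequesterPaysConfiguration = Union[str, Tuple[str, List[str]]]


def validate_requester_pays_config(
    config: Optional[GCSRequesterPaysConfiguration],
) -> None:
    if config is None or isinstance(config, str):
        # None means "no requester-pays project"; a bare string is a project
        # that applies everywhere.
        return
    project, buckets = config
    assert isinstance(project, str)
    assert all(isinstance(b, str) for b in buckets)
```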
I was initially going to throw this into another PR and then stack this on top. Do you want me to just keep it in here though?
Looks fixed as of now!
Hmm. Actually it looks like this is still failing service backend tests that explicitly request a handful of requester-pays buckets, e.g. test_requester_pays_with_project_more_than_one_partition.
Ah, OK, I follow your point. You'll need to change RouterAsyncFS to intelligently use or not use the project based on the buckets, and you hoped to do that in a follow-up PR?
I think, unfortunately, we have to pull that into this one, because folks could, at least in principle, be relying on that behavior.
In particular, IIRC, you get an error if you include a project on a non-requester-pays bucket, right?
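For illustration, a hedged sketch of the behavior being asked for: only attach the billing project to buckets that were explicitly configured as requester-pays. The helper name and configuration shape are assumptions, not the actual RouterAsyncFS/GoogleStorageAsyncFS API.

```python
from typing import Dict, List, Optional, Tuple, Union

# Assumed configuration shape for this sketch only.
RequesterPaysConfig = Union[str, Tuple[str, List[str]]]


def requester_pays_params(
    bucket: str, config: Optional[RequesterPaysConfig]
) -> Dict[str, str]:
    """Query parameters to attach to a GCS request against `bucket`."""
    if config is None:
        return {}
    if isinstance(config, str):
        # A bare project applies to every bucket the user touches.
        return {'userProject': config}
    project, buckets = config
    # Only send userProject for buckets the user explicitly listed; sending it
    # for a non-requester-pays bucket can itself be an error.
    return {'userProject': project} if bucket in buckets else {}
```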
So I think I'd appreciate a review on this. I would especially appreciate feedback on the question I wrote in the PR body, as well as on what to do about documentation and testing.
**kwargs):
    if not storage_client:
        if project is not None:
@daniel-goldstein you gotta fix copy.py line 118 as well.
@danking What do you mean here? I changed copy.py line 118 to use gcs_requester_pays_configuration, which I thought was correct.
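For reference, a rough sketch (not the actual copy.py) of threading the configuration through to the filesystem rather than a bare project. The gcs_kwargs keyword and its contents are assumptions based on the snippets above.

```python
from hailtop.aiotools.router_fs import RouterAsyncFS


def make_copy_fs(gcs_requester_pays_configuration=None) -> RouterAsyncFS:
    # Assumes RouterAsyncFS forwards gcs_kwargs to its GCS filesystem, which
    # in turn understands gcs_requester_pays_configuration.
    return RouterAsyncFS(
        'file',
        gcs_kwargs={'gcs_requester_pays_configuration': gcs_requester_pays_configuration},
    )
```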
I must've been looking at an old test
Huh, quite confused as to why the local backend test hung in GCP and not Azure. As far as I can tell, the test that hung is
Seems to have starved itself of memory? https://batch.hail.is/batches/7260182/jobs/69
Next thing I would try is running the full split. Maybe we need to run that in a memory-limited way?
Also, not the most sophisticated thing, but littering the test and the things it calls with print statements, making sure the tests print in real time, and kicking off another test is something else I'd try.
Though it does seem like something upstream of hadoop_ls_glob_2 has just sucked all the memory out of the container.
Also, ask Tim about limiting the RAM available to Hail and Java; we should probably keep at least a gig dedicated to Python. Maybe that will cause memory errors to appear closer to where they belong.
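As a sketch of that idea, one way to cap the JVM heap when running against the Spark backend locally; the numbers here are illustrative, and the local/service backends may need a different knob.

```python
import hail as hl

# Cap the JVM's share of memory so Python keeps some headroom.
hl.init(spark_conf={'spark.driver.memory': '3g', 'spark.executor.memory': '3g'})
```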
One more idea: can you add the https://pypi.org/project/pytest-timestamper/ package?
Hm, I added the timestamper, merged main, and the tests passed… they're getting retested now, though.
LGTM. This does remind me that we don't currently generate docs for hailtop anywhere. We probably need to sit down and rethink the structure of the docs a bit.
All the tests are passing, but something is hanging.
Like, pytest is hanging.
CPU use is flat after that test completes successfully.
Next thing to try is running the split on your laptop to reproduce.
So, there was a test run that got cancelled (probably because the main branch changed) but which passed the service backend tests. It confirms that this most recent run ran all the tests, but it has some fishy-looking error outputs:
Are we not cleaning up files somewhere, and that's somehow hanging the system?
I pushed a commit that will error on ResourceWarnings. Hopefully we can figure out where we're leaking, fix the leaks, and stop the hangs.
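A minimal sketch of one way to do that with the standard library; the actual commit may use pytest's warning filters instead (e.g. an error::ResourceWarning entry in the pytest configuration).

```python
import warnings

# Turn ResourceWarning (unclosed files, sockets, etc.) into an error so leaks
# fail loudly instead of being silently logged.
warnings.simplefilter('error', ResourceWarning)
```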
CHANGELOG: Introduce hailtop.fs, which makes public a filesystem module that works for the local filesystem, gs, s3, and abs. This is now used as the Backend.fs for Hail Query, but it can also be used standalone by Hail Batch users via import hailtop.fs as hfs.
Still have to do the docs, but a couple of questions remain:
I create a hidden singleton RouterFS object that is used by the functions in hailtop.fs. Should this singleton also be used by the Hail Query backends when they are initialized? How do we propagate configuration information such as requester_pays_bucket to the FS?
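To make the intended usage concrete, a hypothetical example assuming hailtop.fs exposes hadoop_*-style helpers such as exists, open, and ls; the exact names, URLs, and return shapes here are assumptions, not a spec.

```python
import hailtop.fs as hfs

# The same calls are expected to work for file://, gs://, s3://, and Azure
# Blob Storage paths via the underlying RouterFS.
path = 'gs://my-bucket/data.tsv'  # illustrative path

if hfs.exists(path):
    with hfs.open(path, 'r') as f:
        print(f.readline())

for entry in hfs.ls('gs://my-bucket/'):
    print(entry)
```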