Make ExtendedGcsFileSystem the default implementation #773

Merged
ankitaluthra1 merged 14 commits into fsspec:main from ankitaluthra1:default-to-extended-gcsfs
Mar 13, 2026

@ankitaluthra1
Collaborator

@ankitaluthra1 ankitaluthra1 commented Mar 6, 2026

  • Default HNS Support: The ExtendedGcsFileSystem is now the default implementation for gcsfs.GCSFileSystem, enabling multi storage support by default for improved directory operations.
  • New HNS Documentation: Comprehensive documentation has been added to explain Hierarchical Namespace (HNS) in Google Cloud Storage and how gcsfs leverages ExtendedGcsFileSystem for enhanced HNS interactions.
  • Configuration Change: The default value for the GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable has been changed from false to true, making multi storage support opt-out rather than opt-in.
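
For readers who want to see what opt-out means in practice, here is a minimal sketch. It assumes the variable is read as a case-insensitive boolean string with `true` as the default; the helper name `hns_support_enabled` is hypothetical and is not taken from the gcsfs code base.

```python
import os

ENV_VAR = "GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT"


def hns_support_enabled(environ=None):
    """Hypothetical helper: True unless the variable is set to something else.

    Assumption: the value is compared case-insensitively against "true",
    and an unset variable counts as "true" (the new opt-out default).
    """
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "true").strip().lower() == "true"
```

Under these assumptions, a user who wants the previous behaviour would export `GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT=false` before starting their process.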

@ankitaluthra1 ankitaluthra1 changed the title from "Make ExtendedGcsFileSystem the default implementation Brief & Direct" to "Make ExtendedGcsFileSystem the default implementation" Mar 6, 2026
@codecov

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.00%. Comparing base (bd27d9b) to head (a99e655).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #773      +/-   ##
==========================================
+ Coverage   75.02%   76.00%   +0.97%     
==========================================
  Files          14       14              
  Lines        2623     2625       +2     
==========================================
+ Hits         1968     1995      +27     
+ Misses        655      630      -25     

@martindurant
Member

Of course, this is still in draft, but:

  • we will want to make a public splash about this, between when this PR merges and release
  • GCSFileSystem effectively becomes deprecated (or do some Extended methods still call back?)
  • we need to ensure that we have full benchmark coverage and be sure that there is no speed regression even for normal buckets, which is the majority of the user base
  • Maybe after release, we can consider renaming Extended to GCSFileSystem and removing the code altogether; some downstream code probably directly instantiates it.

In short, I am recommending a release plan to go along with this, perhaps as a github project or milestone.

@martindurant
Member

Alternatively, we can delay this PR until after the next release, and ask the public to test the implementation via GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT, to see if they uncover any adverse effects the tests haven't caught.

@ankitaluthra1
Collaborator Author

> * GCSFileSystem effectively becomes deprecated (or do some Extended methods still call back?)

GCSFileSystem would still remain as the core for normal buckets. It's also used for a few operations on specialised buckets (like metadata for zonal and data for HNS).

> * we will want to make a public splash about this, between when this PR merges and release

We definitely want to make a splash, and with data (benchmark numbers); that's why there is no mention of Zonal buckets yet. We'll be covering that in a separate PR.

> * Maybe after release, we can consider renaming Extended to GCSFileSystem and removing the code altogether; some downstream code probably directly instantiates it.

Yes, I will raise more PRs for cleanup. Some are smaller (testing-related cleanup), which I'll fast-follow; others, like the ones you suggested, we can do after release and feedback.

The idea behind this change is to remove friction for customers who want to experiment, so that we get early feedback. The change is backward compatible, so we should be able to make it.

@ankitaluthra1 ankitaluthra1 marked this pull request as ready for review March 6, 2026 15:43
@martindurant
Member

> GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable has been changed from false to true

So you mean this will only have an effect for the specialised buckets? That's not how I read it.

@ankitaluthra1
Collaborator Author

ankitaluthra1 commented Mar 6, 2026

> GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable has been changed from false to true
>
> So you mean this will only have an effect for the specialised buckets? That's not how I read it.

The entry point would change for all bucket types, since that is needed to identify the bucket type. If the bucket is identified as normal, the request is passed to core.GCSFileSystem or core.GCSFile (depending on the operation), so for normal buckets there should be no change.
code ref for file based operations
example for FileSystem operation
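
A minimal sketch of the dispatch described above, with stand-in classes rather than the real gcsfs ones. The class names, the `BucketType` enum, the cached lookup table, and the subclass relationship are all assumptions made for illustration.

```python
from enum import Enum


class BucketType(Enum):
    NORMAL = "normal"
    HNS = "hns"
    ZONAL = "zonal"


class CoreFileSystem:
    """Stand-in for core.GCSFileSystem (hypothetical, no network calls)."""

    def mv(self, src, dst):
        return f"core.mv({src} -> {dst})"


class ExtendedFileSystem(CoreFileSystem):
    """Stand-in for ExtendedGcsFileSystem: the entry point for every bucket."""

    def __init__(self, bucket_types):
        # Assumption: bucket types are known up front, keyed by bucket name.
        self._bucket_types = bucket_types

    def _bucket_type(self, path):
        bucket = path.split("/", 1)[0]
        return self._bucket_types.get(bucket, BucketType.NORMAL)

    def mv(self, src, dst):
        if self._bucket_type(src) is BucketType.NORMAL:
            # Normal buckets: hand the request straight to the core class.
            return super().mv(src, dst)
        return f"extended.mv({src} -> {dst})"
```

In this sketch only the bucket-type lookup is new for normal buckets; the operation itself runs through the unchanged core path.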

@martindurant
Member

OK, sorry for the confusion. It is still a change in default behaviour for those that do have specialised buckets - do you know how common that is now?

@ankitaluthra1
Collaborator Author

ankitaluthra1 commented Mar 6, 2026

> OK, sorry for the confusion. It is still a change in default behaviour for those that do have specialised buckets - do you know how common that is now?

HNS buckets: All existing object APIs also work for HNS, so there might be some usage. For HNS, the only change is in the directory APIs (like renamedir etc.), and we fall back to the existing object APIs if the new directory APIs raise an error (example). Data path APIs remain the same as for normal buckets. Everything is tested on actual HNS buckets with the new pipeline, so we should be okay for the HNS bucket type.
Zonal buckets: Since existing object APIs don't work on zonal buckets, there would not be any existing usage, so no impact for zonal.
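
The fallback mentioned for HNS directory operations could look roughly like the sketch below. `rename_folder` and `rename_objects` are hypothetical callables standing in for the new directory API and the existing object API, and catching every exception is an assumption, not a statement about what gcsfs does.

```python
def rename_directory(src, dst, rename_folder, rename_objects):
    """Try the directory API first; fall back to object-level renames.

    Returns a label naming the path that was taken, for illustration only.
    """
    try:
        rename_folder(src, dst)
        return "folder-api"
    except Exception:
        # Assumption: any failure in the directory API triggers the fallback.
        rename_objects(src, dst)
        return "object-api"
```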

@martindurant
Member

Comment on places with statements like "more performant": will we be able to directly link benchmark data to these statements? For example, even though a hierarchical rm can delete an entire file tree in one DELETE call, actually issuing 1000 DELETE calls concurrently is almost as fast as a single one (the slow part is listing the files to be deleted). So it's best to be armed with numbers.
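
To illustrate the point about concurrent DELETE calls, here is a small simulation. It makes no network requests: `delete_object` is a hypothetical stand-in that just sleeps, and the latency and concurrency limit are made-up values, so it shows the shape of the argument and is not a benchmark.

```python
import asyncio


async def delete_object(path, latency=0.01):
    """Hypothetical stand-in for one DELETE request (sleeps, no network)."""
    await asyncio.sleep(latency)
    return path


async def delete_concurrently(paths, limit=100):
    """Issue one simulated DELETE per path, at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(path):
        async with semaphore:
            return await delete_object(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))


deleted = asyncio.run(delete_concurrently([f"dir/file-{i}" for i in range(1000)]))
```

With 100 requests in flight, the 1000 simulated deletes take roughly ten round trips instead of a thousand, which is why real measurements are needed before claiming a single hierarchical delete is faster.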

@martindurant
Member

> should we change the variable name from GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT to GCSFS_ZB_HNS_SUPPORT

I would be happy to include support for a new simpler name, but then we would need to support both, since the original one is already available, even if not widely publicised.
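
For reference, supporting both names would mean something like the following sketch. The function name `read_flag`, the precedence order (new name wins), and the string parsing are assumptions made for illustration.

```python
import os

NEW_NAME = "GCSFS_ZB_HNS_SUPPORT"
OLD_NAME = "GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT"


def read_flag(environ=None, default="true"):
    """Hypothetical lookup: prefer the new name, fall back to the old one."""
    environ = os.environ if environ is None else environ
    for name in (NEW_NAME, OLD_NAME):
        if name in environ:
            return environ[name].strip().lower() == "true"
    # Neither variable is set: use the default.
    return default == "true"
```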

@ankitaluthra1
Collaborator Author

> should we change the variable name from GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT to GCSFS_ZB_HNS_SUPPORT
>
> I would be happy to include support for a new simpler name, but then we would need to support both, since the original one is already available, even if not widely publicised.

Then there is not much value (we would still have to document the older name). Since this is temporary, I think we can continue with the existing name.

@ankitaluthra1
Collaborator Author

ankitaluthra1 commented Mar 11, 2026

> Comment on places with statements like "more performant": will we be able to directly link benchmark data to these statements? For example, even though a hierarchical rm can delete an entire file tree in one DELETE call, actually issuing 1000 DELETE calls concurrently is almost as fast as a single one (the slow part is listing the files to be deleted). So it's best to be armed with numbers.

Updated the docs with benchmark results for rename, which the same docs claim is more performant than on normal buckets.

@ankitaluthra1
Collaborator Author

@martindurant are we good to merge this PR?

@martindurant
Member

Let's do it

@ankitaluthra1 ankitaluthra1 merged commit e95afc0 into fsspec:main Mar 13, 2026
9 checks passed
@ankitaluthra1 ankitaluthra1 deleted the default-to-extended-gcsfs branch March 18, 2026 07:50