
Use bisect for listing deduplication #1010

Merged
martindurant merged 1 commit into fsspec:main from martindurant:bisect_listing on Feb 18, 2026

Conversation

@martindurant
Member

Fixes #1009

@karlanka, can you please run your benchmark again? The branch can be installed with:

pip install git+https://github.com/martindurant/s3fs.git@bisect_listing
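The benchmark script itself is not shown in this thread. As a hedged sketch only, a report in the format of the profiles that follow ("Ordered by: internal time", "List reduced from ... due to restriction <10>") can be produced with the standard library's `cProfile` and `pstats`. The helper name `profile_call` is hypothetical, and the profiled callable is whatever listing call you want to measure.

```python
import cProfile
import io
import pstats
import time
from typing import Any, Callable, Tuple


def profile_call(fn: Callable[[], Any], limit: int = 10) -> Tuple[Any, float, str]:
    """Run fn under cProfile; return (result, wall seconds, top-`limit` report).

    Hypothetical helper for illustration; not part of s3fs.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        result = fn()
    finally:
        profiler.disable()
    elapsed = time.perf_counter() - start

    buf = io.StringIO()
    # Sorting by "tottime" prints the "Ordered by: internal time" heading;
    # passing a limit restricts the table to the top `limit` rows.
    pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(limit)
    return result, elapsed, buf.getvalue()
```

With s3fs installed, `fn` could be something like `lambda: s3fs.S3FileSystem().find("my-bucket/some/prefix")` (the bucket path is a placeholder), followed by printing the report and `f"Found {len(files)} files in {elapsed:.2f} seconds"`.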

@karlanka

Looking good!

         6657371 function calls (6534658 primitive calls) in 6.716 seconds

   Ordered by: internal time
   List reduced from 1788 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      785    1.665    0.002    1.665    0.002 {method 'control' of 'select.kqueue' objects}
    41/28    1.655    0.040    1.762    0.063 {method 'acquire' of '_thread.lock' objects}
      248    0.601    0.002    2.447    0.010 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/s3fs/core.py:925(_find)
     1247    0.308    0.000    0.308    0.000 {built-in method posix.listdir}
   161941    0.225    0.000    0.427    0.000 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/dateutil/parser/_parser.py:77(get_token)
        2    0.198    0.099    0.228    0.114 {built-in method _socket.getaddrinfo}
     3404    0.115    0.000    0.115    0.000 {built-in method posix.stat}
 12470/13    0.112    0.000    1.595    0.123 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/botocore/parsers.py:454(_handle_structure)
    12457    0.080    0.000    0.932    0.000 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/dateutil/parser/_parser.py:666(_parse)
   629868    0.069    0.000    0.069    0.000 {method 'get' of 'dict' objects}


Found 12457 files in 6.73 seconds

For comparison, with s3fs==2026.2.0:

         84064310 function calls (83941544 primitive calls) in 15.649 seconds

   Ordered by: internal time
   List reduced from 1788 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 77557284    5.325    0.000    5.325    0.000 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/s3fs/core.py:1012(<genexpr>)
    12507    3.666    0.000    8.985    0.001 {built-in method builtins.any}
     1245    2.039    0.002    2.039    0.002 {method 'control' of 'select.kqueue' objects}
    77/64    1.497    0.019    1.608    0.025 {method 'acquire' of '_thread.lock' objects}
    18/16    0.532    0.030   10.696    0.668 /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py:637(wait)
     1247    0.292    0.000    0.292    0.000 {built-in method posix.listdir}
   161941    0.220    0.000    0.419    0.000 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/dateutil/parser/_parser.py:77(get_token)
        2    0.172    0.086    0.197    0.099 {built-in method _socket.getaddrinfo}
 12470/13    0.105    0.000    1.544    0.119 /Users/hmmaka/repon/data_platform/.venv/lib/python3.12/site-packages/botocore/parsers.py:454(_handle_structure)
     3404    0.082    0.000    0.082    0.000 {built-in method posix.stat}


Found 12457 files in 15.66 seconds
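In the 2026.2.0 profile, `builtins.any` (12,507 calls, 8.985 s cumulative) drives the generator expression at `s3fs/core.py:1012`, which ran 77,557,284 times: about 6,200 evaluations per `any()` call, roughly one call per file found. That pattern is consistent with a linear scan repeated for every listed entry. The sketch below is not the s3fs code from this PR; it is a generic, hypothetical illustration (`dedup_any`, `dedup_bisect`, and the sample keys are invented) of replacing a per-item `any()` scan with a binary search over a sorted list using the standard `bisect` module.

```python
import bisect
from typing import Iterable, List


def dedup_any(names: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each name using a linear any() scan.

    Each check may walk the whole output list, so n names cost O(n^2)
    comparisons, all evaluated by a Python-level generator expression.
    """
    out: List[str] = []
    for name in names:
        if not any(existing == name for existing in out):
            out.append(name)
    return out


def dedup_bisect(names: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each name using binary search.

    `index` is kept sorted, so each membership test needs O(log n)
    comparisons via bisect_left. list.insert still shifts elements,
    but that shifting is done in C rather than in a Python generator.
    """
    index: List[str] = []  # sorted copy, used only for lookups
    out: List[str] = []    # preserves first-seen order
    for name in names:
        i = bisect.bisect_left(index, name)
        if i < len(index) and index[i] == name:
            continue  # duplicate: already recorded
        index.insert(i, name)  # reuse the position bisect_left found
        out.append(name)
    return out


if __name__ == "__main__":
    import timeit

    # Hypothetical workload: 2,000 unique keys, each listed twice
    keys = [f"bucket/dir{i % 50}/file{i}.parquet" for i in range(2000)] * 2
    assert dedup_any(keys) == dedup_bisect(keys)
    for fn in (dedup_any, dedup_bisect):
        seconds = timeit.timeit(lambda: fn(keys), number=1)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Both functions return the same order-preserving result; only the cost of each membership check differs. A `set` is another common way to get fast membership tests; a sorted list with `bisect` fits naturally when the entries are already sorted or sorted order is wanted.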

@martindurant
Member Author

Thanks for the sanity check.

I'll merge and may make a release in the next ~10 days.

@martindurant martindurant merged commit 14a8f7d into fsspec:main Feb 18, 2026
16 checks passed
@martindurant martindurant deleted the bisect_listing branch February 18, 2026 17:44
@karlanka

Great, thanks a lot for the quick resolution!



Development

Successfully merging this pull request may close these issues.

find/glob slow when running in docker
