Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30461: glob: sort the resulting list #1794

Closed
wants to merge 1 commit into from

Conversation

bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented May 24, 2017

because POSIX readdir does not guarantee any order
glob often gave unexpectedly random results.

This change makes it behave similar to POSIX glob(3).

Some background:
for openSUSE Linux we build packages in the Open Build Service (OBS)
which tracks dependencies, so when e.g. a new glibc is submitted,
all packages depending on glibc are rebuilt
and if those depending binaries changed,
the new version is pushed to the mirrors.

Many python modules build their .so files from a glob.glob("*.cpp")

The old glob behaviour would often lead to the linker
randomly ordering functions in resulting object files,
thus we were not able to auto-detect
that the package did not actually change
which wastes bandwidth of distribution mirrors and users.

See also https://reproducible-builds.org/ on that topic.

This change should not break existing software
because there were no guarantees on ordering of glob results.

Measurements with 'perf' show the new code to be 4ms / 1.07x slower
(for /usr/*/* with 9854 files)

The alternative would be to patch each package individually
but that would be quite some effort and not be as nice to use
as can be seen in
https://www.riverbankcomputing.com/pipermail/pyqt/2017-May/039214.html

and there are plenty others out there
https://github.com/pytries/datrie/blob/master/setup.py#L10
https://github.com/jonashaag/bjoern/blob/master/setup.py#L6
https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/setup.py#L28

@mention-bot
Copy link

@bmwiedemann, thanks for your PR! By analyzing the history of the files in this pull request, we identified @hynek, @benjaminp and @serhiy-storchaka to be potential reviewers.

@bmwiedemann bmwiedemann changed the title glob: sort the resulting list bpo-30461: glob: sort the resulting list May 24, 2017
@serhiy-storchaka serhiy-storchaka self-requested a review May 24, 2017 20:39
@nascheme
Copy link
Member

I like the idea as we should match POSIX glob(3). Micro-optimization: use sorted(iter) rather than sorted(list(iter)).

because POSIX readdir does not guarantee any order
glob often gave unexpectedly random results.

This change makes it behave similar to POSIX glob(3).

Some background:
for openSUSE Linux we build packages in the Open Build Service (OBS)
which tracks dependencies, so when e.g. a new glibc is submitted,
all packages depending on glibc are rebuilt
and if those depending binaries changed,
the new version is pushed to the mirrors.

Many python modules build their .so files from a glob.glob("*.cpp")

The old glob behaviour would often lead to the linker
randomly ordering functions in resulting object files,
thus we were not able to auto-detect
that the package did not actually change
which wastes bandwidth of distribution mirrors and users.

See also https://reproducible-builds.org/ on that topic.

This change should not break existing software
because there were no guarantees on ordering of glob results.

Measurements with 'perf' show the new code to be 4ms / 1.07x slower
(for /usr/*/* with 9854 files)

The alternative would be to patch each package individually
but that would be quite some effort and not be as nice to use
as can be seen in
https://www.riverbankcomputing.com/pipermail/pyqt/2017-May/039214.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants