-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable localisation when SOURCE_DATE_EPOCH
is set
#10949
Disable localisation when SOURCE_DATE_EPOCH
is set
#10949
Conversation
Hmm.. the original issue report was specifically about the |
if getenv('SOURCE_DATE_EPOCH') is not None: | ||
# Disable localization during reproducible source builds | ||
# See https://reproducible-builds.org/docs/source-date-epoch/ | ||
# | ||
# Note: Providing an empty/none value to gettext.translation causes | ||
# it to consult various language-related environment variables to find | ||
# locale(s). We don't want that during a reproducible build; we want | ||
# to run through the same code path, but to return NullTranslations. | ||
# | ||
# To achieve that, specify the ISO-639-3 'undetermined' language code, | ||
# which should not match any translation catalogs. | ||
languages: Optional[List[str]] = ['und'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may seem overly complicated compared to, for example:
if getenv('SOURCE_DATE_EPOCH') is not None:
return NullTranslations(), False
...which would achieve similar results.
My thinking here is that the presence/absence of SOURCE_DATE_EPOCH
should, ideally, minimize the effect on the control flow of the program and build.
If the environment variable results in completely different control flows, then arguably the program being built is different.
My sense is that the ideal roadmap for SOURCE_DATE_EPOCH
is to introduce it as-required to achieve reproducible builds, but also to keep in mind that it should be removed in future when no-longer-required.
(and to be precise in the case of sphinx
: a potential time for removal would be when it's possible to perform deterministic documentation builds in all-available-locales -- meaning that the software would be packaged for universal applicability and should build into an identical artefact regardless of the build host's locale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and to be precise in the case of
sphinx
: a potential time for removal would be when it's possible to perform deterministic documentation builds in all-available-locales -- meaning that the software would be packaged for universal applicability and should build into an identical artefact regardless of the build host's locale)
If it is realistic to achive this, I think it's a good goal -- would you be happy to open a new issue to track the 'locale independent' workstream?
A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From searching around before doing that, I think that this would be approximately the same as #788 (although that is specific to HTML, currently).
I'll spend some more time reading before deciding whether to add more comments there or to open a separate ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I appreciate your help here!
A
To confirm: exporting That means that we can't simply configure an empty language list when running in reproducible build mode; |
Thank you for the summary of what is going on here as well as an overview of the proposed solution. It would be great to see this land in Sphinx. |
After looking back at the changes, I think my use of I'm planning to push a few commits soon to move those typecasts earlier in the code flow - to the declaration of the relevant localized string variables. Even so: the fact that this typecasting was necessary was discovered by unit test coverage, and that makes me concerned that there could be other locations in the code that are affected too (without test coverage to highlight that). I don't have a sense for the level of test coverage that's in place here, though. |
…egardless of environment locale) when reproducible builds are enabled
…is configured for reproducible output
… localization test cases
…at localization has an effect on the test application's inventory output
…, since this should not have an associated translation catalog
… LANGUAGE environment variable is configured
… (and confirming that LANGUAGE variable does not affect the result)
…for additional plural forms to the lazy loader translation method
…util.i18n.format_date method (note: mypy doesn't appear to catch this?)
This reverts commit c8d9de3.
…t_inventory_localization
@jayaddison this is only missing a CHANGES entry, then happy to merge! A |
Commit message draft, for later: This commit disables Sphinx's localisation features when reproducible
builds are requested, as determined by a non-empty SOURCE_DATE_EPOCH_
environment variable.
The `Reproducible Builds`_ project aims to provide confidence to
consumers of packaged software that the artefacts they're downloading
and installing have not been altered by the environment they were
built in, and can be replicated at a later date if required.
Builds of localised documentation using Sphinx currently account for
a large category of reproducible build testing failures, because the
builders intentionally use varying environment locales at build-time.
This can affect the contents of the ``objects.inv`` file.
During investigation, it turned out that many ``gettext``-localised
values (particularly in Python modules under ``sphinx.domains``) were
being translated at module-load-time and would not subsequently be
re-localised.
This creates two unusual effects:
1. Attempting to write a test case to build the same application in
two different languages was not initially possible, as the
first-loaded translation catalogue (as found in the
``sphinx.locale.translators`` global variable) would remain in-use
for subsequent application builds under different locales.
2. Localisation of strings could vary depending on whether the
relevant modules were loaded before or after the resource
catalogues were populated.
We fix this by performing all translations lazily so that module
imports can occur in any order and localisation of inventory entries
should occur only when translations of those items are requested.
Localisation can then be disabled by configuring the ``gettext``
language to the ISO-639-3 'undetermined' code (``'und'``), as this
should not have an associated translation catalogue. We also want to
prevent ``gettext`` from attempting to determine the host's locale
from environment variables (including ``LANGUAGE``).
.. _SOURCE_DATE_EPOCH: https://reproducible-builds.org/docs/source-date-epoch/
.. _Reproducible Builds: https://www.reproducible-builds.org/ |
SOURCE_DATE_EPOCH
is set
Thank you @jayaddison! A |
You're welcome - thank you too for the review and merge! I'll keep a lookout for any issue reports related to this in future, especially in the weeks/months after it's released. |
…x-doc#10949)" This reverts commit f82c3c9.
…x-doc#10949)" This reverts commit f82c3c9.
After further investigation of the original bug #9778 today, I believe that the fix here was unnecessary; see commentary at #9778 (comment) for details. It may make sense to scavenge some of the test coverage, but I think that this change should be reverted (and #11306 does that). |
The changes here also cause breakage when help arguments are requested from command-line tools such as |
This has been introduced in sphinx-doc/sphinx#10949
Subject: disables localization when reproducible builds are requested, as determined by a non-empty
SOURCE_DATE_EPOCH
environment variable.Feature or Bugfix
Purpose
The Reproducible Builds project aims to provide confidence to consumers of packaged software that the artefacts they're downloading and installing have not been altered by the environment they were built in, and can be replicated at a later date if required.
The project performs continuous reproducibility testing of many different software packages - many of which fail reproducibility tests for various reasons. The project then identifies ways to fix the causes of differences, or filter out irrelevant differences that should not affect the software's practical behaviour.
Builds of localized documentation using
sphinx
currently account for a large category of reproducible build testing failures, because the builders (intentionally) use varying environment locales at build-time. This can affect the contents of theobjects.inv
file.During investigation, it turned out that many
gettext
-localized values -- particularly in Python modules undersphinx.domains
-- were being translated at module-load-time (in other words, when each Python module wasimport
'ed) -- and would not subsequently be re-localized.This creates two unusual effects:
sphinx.locale.translators
global variable) -- would remain in-use for subsequent application builds under different localesA possible approach to a fix here is a naive one: all translations are performed lazily so that module imports can occur in any order and localization of inventory entries should occur only when translations of those items are requested (during the
objects.inv
dump, in the context of the investigation).With that change in place, it's relatively straightforward to then disable localization (setlanguages = None
) when a reproducible build is requested.Edit: description correction: the branch has been updated to configure
languages = ["und"]
instead oflanguages = None
.und
is a three-character ISO-639-3 code for 'undetermined' language. This is used to prevent Python's stdlibgettext
from attempting to determine the host's locale from various environment variables (including theLANGUAGE
variable).Relates