builds of openmpi are not yet reproducible #3759
Comments
|
I think we'd be amenable to (the PMIX stuff will be submitted upstream to PMIX -- but they copy a lot of their build system / methodology from us, so they'll likely be amenable, too) FWIW: the Is there a |
|
Yeah, we'll take it upstream to PMIx. If it doesn't already exist, perhaps something like |
|
i already started implementing this, and i will likely PR today |
use SOURCE_DATE_EPOCH environment variable if defined in order to make build reproducible Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
|
On 2017-06-27 16:37, Jeff Squyres wrote:
Is there a |$SOURCE_DATE_EPOCH| analog for |hostname|? I.e., does a
reproducible build require identical builds on different machines?
one common approach I have seen is to re-use $SOURCE_DATE_EPOCH as an
indicator for 'someone wants the build to be reproducible' e.g.
```bash
if [ -n "$SOURCE_DATE_EPOCH" ] ; then
hostname=someconstant
fi
```
|
|
@bmwiedemann Ralph suggested |
|
The disadvantage of an extra variable is that a dozen distributions would have to discover and use it then, while $SOURCE_DATE_EPOCH is already well-established. I usually use |
|
https://www.zq1.de/~bernhard/temp/openmpi-reproducible-patch.txt is my test-patch that made the build fully reproducible, but I considered it too unclean to upstream in this form. |
|
@bmwiedemann Just to be sure I understand: you're suggesting using Be sure to see #3779, too (but it doesn't [yet] include anything for hostname). |
|
I meant: do not use an extra variable to signify that the build should override the hostname, but look at the existence of $SOURCE_DATE_EPOCH for that. |
|
@bmwiedemann I have to admit that that's a bit weird to me. I can see the argument of using Do reproducible builds have to produce exactly the same output when run on multiple different servers? Or just at different times on the same server? |
|
That seems a little ugly as it requires one infer some intent of the user that may not exist. However, I don't feel strongly about it. I would, though, suggest that somebody in the distro world fix the ambiguity by creating a separate variable. |
|
@jsquyres we have a build farm for openSUSE and the idea is to be able to (re)create identical binaries on any of the build hosts. (Otherwise we have to re-publish build results to the mirrors, and users waste their bandwidth when nothing in the code actually changed.) |
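As a rough illustration of the "identical binaries on any build host" goal, the check below compares two build artifacts byte for byte. It is only a sketch: the function name `check_reproducible` and the file paths in the usage note are hypothetical, and real distribution tooling does considerably more than this.

```bash
# Hypothetical helper: report whether two build artifacts are byte-identical.
# Usage: check_reproducible FIRST_BUILD_FILE SECOND_BUILD_FILE
check_reproducible() {
    # cmp -s prints nothing and only sets the exit status.
    if cmp -s "$1" "$2"; then
        echo "reproducible"
    else
        echo "differs"
        return 1
    fi
}
```

For example, `check_reproducible build1/libmpi.so build2/libmpi.so` (placeholder paths) would print `differs` and return non-zero if an embedded timestamp or hostname changed between the two builds.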
|
To give some background: if the build is reproducible, then many aspects of the build process, including the machine's host name, the date, etc., become irrelevant for debugging, because the binary output only depends on the source code plus the build dependencies. The hostname is only going to be of interest to a small group of individuals who know something about that host. In some contexts, it could even be an undesired privacy leak. It is true that not all builds are going to be reproducible, so emitting the hostname somewhere might be good for debugging purposes until the build becomes reproducible. However, to aid the push towards reproducibility, this should not form part of the distributed build products, but of some auxiliary side products. For example, in Debian we're putting this stuff in a buildinfo file. But if you really need this sort of information to be part of a distributed build product, it would be better for distributors if this were not the default behaviour. For hostnames, I'd suggest that a default value of |
|
sounds good. |
use SOURCE_DATE_EPOCH environment variable if defined in order to make build reproducible, by forcing timestamps. See https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Bash_.2F_POSIX_shell Thanks Bernhard M. Wiedemann for bringing this to our attention Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
|
from a vendor/packager point of view, i understand the value of having reproducible builds. so i'd rather give everyone the option to choose what fits best. the rationale is i'd rather have when a consensus is reached, i will update #3779 accordingly. |
|
Yeah, Jeff and I discussed this a bit earlier today and feel that we need to keep that hostname in there. I think using A "standard" has to start somewhere |
|
I think there was some misunderstanding of what I wrote. My comments to do with Having too many environment variables defeats the point of having reproducible builds by default. At the very least, pick a different name, Defaulting the value to |
|
@infinity0 any better idea for the environment variable instead of |
|
Would you be happy with adding a I could suggest one of |
|
I think we are falling towards not making the build reproducible per your definition by default, but I don't think we have any particular position regarding the switch to use when indicating that desired behavior. I agree with your concern about having a bunch of flags, but I also wonder if we won't encounter situations where one distro may feel that certain fields should be set a particular way, while another distro wants something else. However, that said, nobody has asked for this before and I don't see some of the usual parties commenting here, so perhaps we should just deal with the immediate request. If @bmwiedemann is comfortable with the |
|
A few points in no particular order...
Use cases
There are a few use cases that we're concerned with:
I think that most software packages probably fall into using pre-built packager builds. But Open MPI is a bit different -- in the environments where it is used, it is quite common for the sysadmins to install a specific OS/distro, and then go manually install the latest compilers, network stacks, and MPI implementation (because they're usually more recent than what is available in the OS/distro). Don't get me wrong -- I'm not making any judgements about OS/distro release speeds -- there are very good reasons that they go slow (stability === good!). I'm just saying that we do have a fair number of users who actually download and build Open MPI manually -- perhaps more than most other open source packages. Meaning: the 3rd case is still fairly important to us, which is exactly why we have elements like the hostname and timestamp in some of our build product. I.e., that hostname and timestamp are meaningful to the sysadmins who do end-user builds.
Reproducible builds
As @rhc54 and @ggouaillardet have said, I think we're quite amenable to reproducible builds. But it may still not be our default (because of reasons cited above). Keep in mind, however, that Open MPI is highly configurable. There are a few less than 16 gazillion As such, it is extremely common for different sites -- and even different installations of Open MPI across different resources at the same site -- to have different configurations of Open MPI. This is at least part of the reason that the idea of "reproducible by default" is kinda weird to us. We are strong believers that if you build it twice the same way in an identical environment it should be totally deterministic, of course (e.g., #3755 was a great fix). But getting exactly the same binary output when building on different resources is unlikely -- but that is by design. Specifically: we're not trying to hide the differences between different resources and environments -- we actively utilize those differences.
CLI options
All the above being said, note that we can't quite do That being said, the presence of
What to do?
My remaining question, however, is still about the hostname. I guess there's a few options here -- if
I kinda like using Sidenote: we could also add a field in |
|
as far as i am concerned, i'd rather go with
if we decide to go with |
using the standard $USER and $HOSTNAME environment variables to make reproducible builds possible. See https://reproducible-builds.org/ for why this is good. This helps improve issue open-mpi#3759
using the standard $USER and $HOSTNAME environment variables to make reproducible builds possible. See https://reproducible-builds.org/ for why this is good. This helps improve issue open-mpi#3759 Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
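One plausible reading of the commit message above is that the build takes the user and host names from the environment when they are set, so a packager can pin them to constants. The sketch below illustrates that idea only; it is not the actual patch, and the function names `build_user` and `build_host` are hypothetical.

```bash
# Hypothetical helpers: prefer the environment, fall back to the usual commands.
build_user() {
    # $(whoami) is only evaluated when USER is unset or empty.
    printf '%s\n' "${USER:-$(whoami)}"
}

build_host() {
    # $(hostname) is only evaluated when HOSTNAME is unset or empty.
    printf '%s\n' "${HOSTNAME:-$(hostname)}"
}
```

A packager could then run the build with `USER` and `HOSTNAME` set to fixed placeholder values, and every build host would record the same strings.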
use SOURCE_DATE_EPOCH environment variable if defined in order to make build reproducible, by forcing timestamps. See https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Bash_.2F_POSIX_shell Thanks Bernhard M. Wiedemann for bringing this to our attention Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
use SOURCE_DATE_EPOCH environment variable if defined in order to make build reproducible, by forcing timestamps. See https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Bash_.2F_POSIX_shell Thanks Bernhard M. Wiedemann for bringing this to our attention Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If defined, use SOURCE_DATE_EPOCH environment variable make the build Reproducible by forcing timestamps. See https://reproducible-builds.org/docs/source-date-epoch/ for more information. Thanks Bernhard M. Wiedemann for bringing this to our attention. Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If defined, use SOURCE_DATE_EPOCH environment variable; make the build Reproducible by forcing timestamps. See https://reproducible-builds.org/docs/source-date-epoch/ for more information. Thanks Bernhard M. Wiedemann for bringing this to our attention. Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If defined, use SOURCE_DATE_EPOCH environment variable; make the build Reproducible by forcing timestamps. See https://reproducible-builds.org/docs/source-date-epoch/ for more information. Thanks Bernhard M. Wiedemann for bringing this to our attention. Fixes open-mpi#3759 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 7b4e8ba)
If defined, use SOURCE_DATE_EPOCH environment variable; make the build Reproducible by forcing timestamps. See https://reproducible-builds.org/docs/source-date-epoch/ for more information. Thanks Bernhard M. Wiedemann for bringing this to our attention. Fixes open-mpi#3759 **NOTE:** This was cherry-picked from master, and slightly modified / amended for the v4.0.x branch. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 7b4e8ba)
If defined, use SOURCE_DATE_EPOCH environment variable; make the build Reproducible by forcing timestamps. See https://reproducible-builds.org/docs/source-date-epoch/ for more information. Thanks Bernhard M. Wiedemann for bringing this to our attention. Fixes open-mpi#3759 **NOTE:** This was cherry-picked from master, and slightly modified / amended for the v4.1.x branch. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 7b4e8ba)
Background information
What version of Open MPI are you using?
1.10.7 on openSUSE Tumbleweed (aka Factory)
Details of the problem
For https://reproducible-builds.org/ I tried to build packages of openmpi twice and found various diffs in the binaries.
One source of such diffs should be removed by
#3756
But there are still invocations of
in
openmpi-1.10.7/autogen.pl
openmpi-1.10.7/config/opal_functions.m4
openmpi-1.10.7/config/opal_get_version.m4
openmpi-1.10.7/ompi/tools/ompi_info/Makefile.am
openmpi-1.10.7/orte/tools/orte-info/Makefile.am
openmpi-1.10.7/oshmem/tools/oshmem_info/Makefile.am
master also has
ompi/tools/mpisync/Makefile.am
opal/mca/pmix/pmix2x/pmix/config/pmix_functions.m4
opal/mca/pmix/pmix2x/pmix/config/pmix_get_version.sh
Would be nice if those `date` usages could be either dropped or use the SOURCE_DATE_EPOCH environment var to override the current time (see examples in https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal#Bash_.2F_POSIX_shell )
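A minimal sketch of what honoring SOURCE_DATE_EPOCH could look like in a POSIX shell, along the lines of the approach described on the wiki page linked above. The function name `build_date` and the date format are illustrative assumptions, not taken from the Open MPI sources.

```bash
# Hypothetical helper: print the build date, honoring SOURCE_DATE_EPOCH if set.
build_date() {
    fmt="+%Y-%m-%d"
    # Fall back to the current time when SOURCE_DATE_EPOCH is unset or empty.
    epoch="${SOURCE_DATE_EPOCH:-$(date +%s)}"
    # GNU date accepts -d "@SECONDS"; BSD date accepts -r SECONDS.
    # If neither form works, print the current date as a last resort.
    date -u -d "@$epoch" "$fmt" 2>/dev/null ||
        date -u -r "$epoch" "$fmt" 2>/dev/null ||
        date -u "$fmt"
}
```

With `SOURCE_DATE_EPOCH=0` exported, `build_date` prints `1970-01-01` no matter when or where the build runs. Without the variable it behaves like a plain `date` call.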