This issue was moved to a discussion.

Make rpm builds more reproducible #2590

Closed
sjmudd opened this issue Jul 29, 2023 · 8 comments

@sjmudd

sjmudd commented Jul 29, 2023

I was looking at rebuilding some MySQL community rpms, which are normally built by Oracle, but doing this turns out to be surprisingly hard. The spec file, https://github.com/mysql/mysql-server/blob/trunk/packaging/rpm-oel/mysql.spec.in, uses a number of macros to define various parts of the build process, BuildRequires: entries and so on, depending on the OS being used. This works for RHEL 7..9 but of course should also work for CentOS 7..9 and other similar distros.

However, to reproduce a build made by someone else you need to know the exact macro definitions in effect when the rpms were built. Unless I'm mistaken, if you build a package with rpmbuild --define 'something 1' --define 'something_else 1' name.spec then the actual command line arguments used to build the package are not explicitly recorded in the binary rpms nor, perhaps more importantly, in the src.rpm, which I think only contains the sources and the spec file used.

If that's the case, the lack of this information means that from a src.rpm I may be unable to rebuild the binary rpms in the same way as the original packager. Is this assumption correct? If so, would it make sense for the .src.rpm to also include the command line defines (and anything else that might make sense) to simplify this task?
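Until rpm records this itself, a small wrapper could at least keep the defines next to the build output. The following is only a minimal sketch under that assumption: the function names, the JSON layout and the record file are hypothetical, and rpm neither produces nor reads such a file.

```python
import json
import shlex


def rpmbuild_command(spec, defines):
    """Return the rpmbuild argument list, with one --define per macro."""
    cmd = ["rpmbuild", "-ba"]
    for name, value in defines.items():
        cmd += ["--define", f"{name} {value}"]
    cmd.append(spec)
    return cmd


def build_record(spec, defines):
    """Describe the invocation so it can be stored next to the built rpms."""
    return {
        "spec": spec,
        "defines": dict(defines),
        "command": shlex.join(rpmbuild_command(spec, defines)),
    }


def write_build_record(path, spec, defines):
    """Write the record as JSON; the file name and layout are hypothetical."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_record(spec, defines), handle, indent=2, sort_keys=True)
        handle.write("\n")
```

Calling write_build_record("build-record.json", "name.spec", {"something": "1"}) before running the command would leave a file that a later rebuilder can read to recover the same --define arguments.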

I also notice that building any software depends on the installed software on the host/container where the build process runs, yet this is also not "registered". For rpm systems it might be convenient to also record the installed rpm package list as that would also be useful for reproducing the build environment appropriately.
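As a rough illustration of what recording the package list could look like outside of rpm, here is a sketch that captures the installed set with rpm -qa and a query format. The function names are hypothetical, and the output would have to be saved by whoever runs the build.

```python
import subprocess

# One installed package per line: name, version-release, architecture.
QUERY_FORMAT = "%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n"


def parse_package_list(text):
    """Turn the query output into a sorted list of (name, version, arch)."""
    packages = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            packages.append(tuple(fields))
    return sorted(packages)


def installed_packages():
    """Query the local rpm database (requires rpm on the build host)."""
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", QUERY_FORMAT],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_package_list(result.stdout)
```

Sorting the result keeps the saved list stable between runs, which makes two build environments easier to compare later.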

Outside of rpm itself is repo configuration which is OS dependent and it seems that RHEL/CentOS/OEL and the other RH-clones all do things slightly differently which makes rebuilds more complex. I guess that's outside of the scope of this issue.

Why would improving this be useful if the source is provided? Simply because I may want to patch the originally built rpms in a specific way yet be sure that the rest of the build and packaging process is as close to the original packaging as possible.

Alternatively I may want to build a sub-module of the upstream packages which is compatible with the originally built packages and can be used without having to rebuild the whole upstream code again.

So far I've not seen a way to make this process simpler and think the suggestions above, to include more information on the build command line arguments (and maybe macro values) and the installed package list, would help the rebuild process.

Can something be done in this direction?

For context I created: https://github.com/sjmudd/mysql-rpm-builder/ which was an attempt to simplify / document the reproducible rebuild process and it has turned out to be harder than originally anticipated. It is still work in progress but maybe gives some context to where the question comes from.

@keszybz
Contributor

keszybz commented Aug 4, 2023

Hi Simon,

To achieve build reproducibility, you must have the same versions of dependencies. In particular, the compiler, linker, and compressors, python if you write .pyc files, but also various helper programs that process documentation files, etc. So generally the only feasible approach is to record the full package set installed in the buildroot, and then use the exact same versions of those packages when rebuilding. You need the same build macros too, but those are generally set through packages that drop macro files. So for most macros the issue is solved by fixing the package set which you already need to do anyway.
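To make the "exact same versions" requirement concrete, here is a minimal sketch that compares a recorded buildroot package set against the one available for the rebuild. It assumes both sets have already been collected as name to version-release mappings; the function name and the input shape are illustrative only.

```python
def compare_package_sets(original, rebuild):
    """Compare two {package name: version-release} mappings.

    Returns (missing, extra, changed): packages only in the original
    environment, packages only in the rebuild environment, and packages
    present in both at different versions.
    """
    missing = sorted(set(original) - set(rebuild))
    extra = sorted(set(rebuild) - set(original))
    changed = {
        name: (original[name], rebuild[name])
        for name in sorted(set(original) & set(rebuild))
        if original[name] != rebuild[name]
    }
    return missing, extra, changed
```

An exact rebuild in the reproducible-builds sense would need all three results to be empty. Since most macros come from files dropped by packages, an empty diff also covers those macros.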

There might be other macros which might be set in a different way. For example, in Fedora we can now set macros in a side tag. But those need to be recorded somehow. The mechanism that is used to record the list of packages should generally store this information too.

@sjmudd
Author

sjmudd commented Aug 4, 2023

I think you misunderstand. If you build for multiple OS versions, e.g. RHEL 7..9, then the dependencies are different, so as rpm works at the moment you can't explicitly configure all the dependencies at the same time. Instead, additional macro processing is added so that the spec is told which OS to build for, e.g. Oracle community MySQL rpms tend to use --define 'rh7 1' or --define 'rh8 1' or --define 'rh9 1' to provide the "hints" that select the OS specific Requires: or BuildRequires: tags, or even how to build the package. This can't be done during the build process as BuildRequires: or Requires: tags are "static" by this time (evaluated prior to the build by rpmbuild).
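For illustration, the OS "hint" could be derived from the build host rather than typed by hand. This is only a sketch: the rh7/rh8/rh9 macro names are the ones mentioned above, the function names are hypothetical, and it assumes the host provides an os-release style file with a VERSION_ID entry.

```python
def el_major_version(os_release_text):
    """Extract the major version from os-release style KEY=value text."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_ID":
            # Values may be quoted and may carry a minor version, e.g. "8.8".
            return int(value.strip().strip("\"'").split(".")[0])
    raise ValueError("VERSION_ID not found")


def os_defines(major):
    """Return the --define arguments selecting the OS specific spec sections."""
    if major not in (7, 8, 9):
        raise ValueError(f"unsupported major version: {major}")
    return ["--define", f"rh{major} 1"]
```

Deriving the define this way only removes one manual step; the value still has to be recorded somewhere for a later rebuild to use the same one.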

Yet to be reproducible you need to know which macros were provided by the user, or at least are "not part of the base rpm/build setup" and therefore are "user configurable". Without that you may not be able to determine exactly the same "input parameters" to rebuild a package in the same way as the original packager. Similarly, if I want to patch this package with different/extra functionality, I cannot be sure that my build will represent a consistent change against the original sources, and thus the whole premise of repeatable builds falls apart.

My question therefore was whether there are any plans to ensure that a built src rpm could be configured to include any rpmbuild time command line macros as metadata so that I can use that in theory to reproduce the build process faithfully.

Reason for this coming up: I'm having problems reproducing the builds because it looks like the build environment used by the original packagers may not be "pristine". To fix one specific build issue I had to add some symlinks, e.g. https://github.com/sjmudd/mysql-rpm-builder/blob/main/config/prepare__centos.8__8.0.33.sh#L44-L50, to set up the OS in a way in which the build process would complete without errors. That's clearly a very obscure example but it's real.

So I'm looking at ways to make it easier for the rpm packaging to be configured in such a way that issues such as this can be avoided and, having a "suitable spec file", I really can repeatably build from scratch successfully.

@keszybz
Contributor

keszybz commented Aug 4, 2023

Maybe I misunderstood what you mean by "reproducible". I meant exact reproducibility in the sense of https://reproducible-builds.org/. If you just want to "roughly recreate" a build, then what you say is applicable. I would suggest retitling the issue to something like "Record macros defined during creation in srpm" (or something that fits better what you need) in order to reduce confusion.

@sjmudd sjmudd changed the title from "Make rpm rebuilds more reproducible" to "Make rpm builds more reproducible" Aug 7, 2023
@sjmudd
Author

sjmudd commented Aug 7, 2023

In the end, as a first step, I'd like to be able to reproduce the builds as closely as possible to what the original builder did. So I'd like that build to be reproducible in the way referenced in the URL.

  • One thing is to record the rpm macro values used during the build process.
  • Another simple thing would be to record the full list of name / version of rpms installed on the build system.

"rpm builds" are a bit of a pain as often we may use multiple repos. The build environment does not explicitly say where the BuildRequires: packages come from, so rpm itself is unable to pull down the needed packages in order to trigger the builds. yum or dnf could do that but rpm doesn't know about this, and the lack of integration between the two sets of tools has always somewhat surprised me, though I guess that's a different story and it's water under the bridge now.

I have changed the issue title to "Make rpm builds more reproducible" as I think that's what I want, and I think that solving the 2 points above would help a lot.

I'm also aware that if you run rpmbuild -bs .....spec there's no binary build process going on so I'd guess it's quite possible that macro definitions and installed package lists may not actually be useful. However, if you build source and binary packages together, which I'd assume is the correct thing to do, e.g. rpmbuild -ba <optional extra macro definitions> whatever.spec then you will have the information needed and could save this inside the src.rpm.

I think this would pretty much help solve my problem.

My second desire, which is NOT relevant to this issue, is to then be able to modify the build config or sources or add patches so that the finally built packages provide extra features compared to the originally built packaging, yet if done correctly the result will be compatible with the original packages. You could think of this along the lines of building extra kernel modules in a separate rpm provided as part of the build process which can be installed and used on the upstream running base kernel. It's much easier to do this if you can reproduce the upstream build process first. This is what I'm doing with additional plugins for the MySQL server.

@sjmudd
Author

sjmudd commented Aug 7, 2023

Also related to your comments about how rpm works with dependencies:

I'm aware of rpm dependencies. I've been building rpm packages since RedHat 3.0.3 (that's in 1996). Also, I'm not the owner of the rpms I want to rebuild, so it's not just a matter of me rebuilding my rpms, it's also a matter of me figuring out how to rebuild others' rpms. I could certainly suggest the upstream packagers make changes to their packages but that's a longer term discussion and it requires them doing this explicitly. The point of the issue I have created is to have rpm do this for us automatically, so I don't have to figure out the details myself.

@keszybz
Contributor

keszybz commented Aug 7, 2023

> I'm also aware that if you run rpmbuild -bs .....spec there's no binary build process going on so I'd guess it's quite possible that macro definitions and installed package lists may not actually be useful. However, if you build source and binary packages together, which I'd assume is the correct thing to do, e.g. rpmbuild -ba <optional extra macro definitions> whatever.spec then you will have the information needed and could save this inside the src.rpm.

"correct thing to do" is a matter of opinion. An easier question is whether this is what always happens, and it's easy to answer: no, in some workflows people receive an srpm from somewhere and build that. See for example the (now obsolete) workflow for CentOS: RH would publish rebranded srpms and various projects would rebuild them, treating the srpm as the initial input.

In general, the macros that are defined during srpm build and the macros that are defined during binary build can be completely different.

See https://pagure.io/koji/issue/3878 for a slightly different approach to this problem.

@sjmudd
Author

sjmudd commented Aug 7, 2023

ok, so it depends on your point of view, that's clear. For downstream "repackagers" there's clearly internal macro "mangling" and perhaps build environment differences specifically due to that, and while that's clearly a very important use case it's not the same as mine. It may be that some of this behaviour needs to be optional, but again, if the current rpm build process does not allow you to complete a rebuild correctly, or it requires a large amount of investigation to work out how to achieve the end goal of reproducibly building packages, then to some extent I think it's fragile.
The link you provide just goes to show how fragile the current process is if you look at it in any detail, even if building in general seems to work. I suspect/know that things are more complex now than they were in the initial rpm days.

@sjmudd
Author

sjmudd commented Aug 7, 2023

To your comment:

> in some workflows people receive an srpm from somewhere and build that.

my answer is precisely that. When I try to rebuild the single package the process fails at the end of a 3-hour build run with an extremely obscure error message. Yet somehow it seems to work with the upstream packager as the rpms are built and shared publicly. The upstream packager's build environment is not public and I've seen that to make the build work in some cases some "munging" of the build environment is needed. That's very messy. It does not resolve the build for the 3 OS combinations I am trying to build the package on: CentOS 7..9 and I'm still trying to resolve that.

However, I think you understand my point of view. The intent of me creating this issue was to bring it up. It seems you're aware of the problem space and have shared that others also experience similar issues.

Is there anything that can be done now? What should happen next? I do not think I can do anything right now and clearly any changes would be a long term effort. Can any further progress be made?

@rpm-software-management rpm-software-management locked and limited conversation to collaborators Sep 12, 2023
@pmatilai pmatilai converted this issue into discussion #2654 Sep 12, 2023
