Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hand back SRPMs, containing sources that were modified as part of the build process #599

Closed
nim-nim opened this issue Jun 30, 2020 · 21 comments
Labels

Comments

@nim-nim
Copy link

nim-nim commented Jun 30, 2020

I’ve been investigating release and changelog autobumping as part of the needs of my Fedora SIGs.

I initially thought I would hit some limitation at the rpmbuild level.

Turns out, rpm copes fine with this process, and needs no change to accomodate autobumping.

Unfortunately specs that autobump beautifully in rpmbuild do not in mock. The whole process relies on storing the changelog and build information in sources modified at %build stage (because autobumping needs to know the previous evr to bump), but mock seems to replace those modified files with the pre-bumped ones.

Could this restriction be lifted?

I can not store the build info in a subpackage, otherwise autobumping would not work without some horrible self-depend cycle, and I can not emit the build info before %build, because other sections are executed several times by all kinds of software (including spectool), and you do not want those other executions to produce a spec bump.

https://fedoraproject.org/wiki/Changes/rpm_level_auto_release_and_changelog_bumping

Another limitation is that changelog autobumping requires passing the name and email that will end up in the changelog some way. Doing it for rpmbuild is trivial, you add two variables to your ~/.rpmmacros, you can do it with mock by passing a couple
config_opts['macros'] but that’s quite inconvenient usability-wise and makes the buildroot definition user-specific

I suppose one could say there is a security issue with allowing sources to be modified by the build process, but since you end up installing and executing the result of this build process, I don’t see why the files contained in the srpm, needs to be more protected, than the files contained in rpms

@xsuchy
Copy link
Member

xsuchy commented Jun 30, 2020

You say that setting the macros in config_opts is inconvenient but I do not see any proposal or even obvious common-sense value for

%buildsys_name  Your name
%buildsys_email Your email

Could this restriction be lifted?

There are no restrictions. Mock just rebuild SRPM first. Then just call rpmbuild on that rebuilt SRPM. See py/mockbuild/backend.py:rebuild_package()

@nim-nim
Copy link
Author

nim-nim commented Jun 30, 2020

@xsuchy Sorry, I probably made a mess of the report. Rushing to meet change deadlines is not a good mode to pass information.

There are two parts:

  1. first, autobumping makes the final SRPM dependent on the %build stage. Therefore, in autobumping mode, the SRPM you need to present to the mock user is the one produced at the end of rpmbuild -ba, not the one produced at the end of rpmbuild -bs. The two SRPMs are no longer identical and equivalent. That is because the sources packed into the final SRPM will have been modified during the %build stage (the source file containing the changelog will have been bumped, and the build evr will have been stored so the next build knows where to bump from)

  2. second, because the changelog will be updated at build time with a new build line, and rpm changelog format includes the name and email of the person that did the change, there is a need to tell mock who is performing the build (that did not matter before changelog bumping implied storing the user ID in the changelog). While that is doable using config_opts, I do not believe this is the correct way to do it. You need a user-level config file like ~/.rpmmacros or ~/.gitconfig for that, and probably some mock flags or argument variables so build systems like koji or copr can tell the mock layer who is performing a build

Of course, that is my understanding of the problem. I may have missed some awesome mock feature I had no need to use so far. Do inform me if that’s the case, I regularly discover new wonderful parts in mock I never knew about before.

@xsuchy
Copy link
Member

xsuchy commented Jun 30, 2020

You need a user-level config

https://github.com/rpm-software-management/mock/wiki#generate-custom-config-file

Users can use: ~/.config/mock.cfg

Though, I do not think it is more convenient in this case.

@nim-nim
Copy link
Author

nim-nim commented Jun 30, 2020

Ok, thank you for the hint, I will look at it. I have not needed user-level config before, so I did not knew about it.

However, that is the least critical issue. The real blocker is to get mock to output an srpm that uses the sources that were modified during %build (like rpmbuild -ba does). Without that part the bumping will always restart from the same point, making it a useless single-bump system.

Is it possible in current mock or does it need some mock change?

@nim-nim
Copy link
Author

nim-nim commented Jul 6, 2020

Thinking a little more about it, the only real problem seems to be mock builds a srpm, and uses this srpm to build packages, and returns this initial srpm with the built packages?

Could that be changed to "return the built packages with the srpm that was built at the same time" instead of returning the original rpm? As an option, maybe?

@praiskup
Copy link
Member

praiskup commented Jul 7, 2020

Mock doesn't even run -ba, only -bb. The only exception is dynamic
buildrequires case, but then the -ba is only used to make sure that the
SRPM in result directory contains as complete Requires: as possible for
further repo analysis (detecting what packages were/are needed for the
src.rpm build). It's not possible or wise to change anything else.

Could that be changed to "return the built packages with the srpm that
was built at the same time" instead of returning the original rpm? As an
option, maybe?

We could do -ba instead of -bb when requested explicitly. But that
doesn't have a proper justification.

first, autobumping makes the final SRPM dependent on the %build stage.
Therefore, in autobumping mode, the SRPM you need to present to the mock
user is the one produced at the end of rpmbuild -ba, not the one
produced at the end of rpmbuild -bs. The two SRPMs are no longer
identical and equivalent. That is because the sources packed into the
final SRPM will have been modified during the %build stage (the source
file containing the changelog will have been bumped, and the build evr
will have been stored so the next build knows where to bump from)

This is no-go for me. To stay sane, at the time of building SRPM and RPM
we need to have things like %release and %changelog (and others) ready in
hand, defined.

Also, the dist-git is place to store the sources to be built by rpmbuild,
not to store rpmbuild results. Not only this technically
complicates the whole build process (so far pretty trivial!), but it
creates a huge hole into our build systems (builders can not be allowed to
commit to sources).

second, because the changelog will be updated at build time with a new
build line, and rpm changelog format includes the name and email of the
person that did the change, there is a need to tell mock who is
performing the build (that did not matter before changelog bumping
implied storing the user ID in the changelog). While that is doable
using config_opts, I do not believe this is the correct way to do it.
You need a user-level config file like ~/.rpmmacros or ~/.gitconfig for
that, and probably some mock flags or argument variables so build
systems like koji or copr can tell the mock layer who is performing a
build

This is very similar to the first paragraph. Yes, Koji/Copr build system
could be allowed to to auto-bump, but not during the mock'ed rpmbuild time.
That should be separate, properly secured build-system task. We certainly
can use mock for abstracting that (e.g. wrapping rpmdev-bumpspec call,
if that was actually needed or useful).

@nim-nim
Copy link
Author

nim-nim commented Jul 8, 2020

@praiskup

Thank you for having taken the time to write a long answer.

First, let me state that no one is asking mock to interact with dist-git, this is not mock’s role, and (IMHO) a build tool has no business assuming a particular SCM, or interacting with this particular SCM.

The situation is a little less clear for higher level layers like koji, because they mix the buildsystem role with repo orchestration, and repo orchestration is very close to release orchestration, though we’ve been moving, in the past years, to separate the two and put bodhi in sole charge of deciding what is an official Fedora rpm or not. Thus mock + dist git, definitely no, koji + dist-git, probably no (but that depends how complete the move of gating decisions to bodhi is). Copr? No idea but I suspect it has taken at the copr level some of the release management responsibilities that require deciding what to keep as a release or not, and thus the responsibility to record those decisions.

So, no one is asking mock to store anything, that’s a release/gating management responsibility, that belongs to the entity that does release management in a workflow (that may include mock, as the build, not release, component). And the dist-git back-commit is just registering the change decision in the change management system, which is the purpose of this management system in the first place.

That’s inherently a saner model, than a model where you record decisions in git before they are effective, and hope for the best. Builds will fail and failure analysis will lead to plan changes, therefore pretending a build exists before it is built and release management commits to keep it is ultimately a source of problems (for releng, for example, which are asked to clean up the mess every time the future does not turn up the way the optimistic packager thought it would).

Second, the “proper justification” is IMHO that it is useful for some users and that it does not break the rpm design. Which is clearly the case here, or the change would not have attracted that many replies by people interested in the subject. And I don’t see how one could argue that something which works in rpmbuild without any change in the rpmbuild codebase breaks the rpm design.

“We did not do that before” is not a good justification, rpm workflows must evolve to adapt to the constrains of the language ecosystems rpm packages, those ecosystems have flourished in the past years (thanks to git) making the amount of artefacts to package grow by several orders of magnitude. This is not sustainable at the rpm level without steep tooling changes and improvements, because the packager number did not follow the same growth curve. Thanks to git devs do more, with the same amount of manpower. The rpm ecosystem must help packagers do more, with the same amount of manpower.

That means more dynamic specs: dynamic buildrequires (and I was involved there) dynamic subpackages (what @ffesti is working on right now), dynamic autobumping releases, etc. rpm maintainers are discussing how to inject the result of spec generation in the SRPM for debugging purposes, dynamic subpackages means this generation will not be effective before the end of the new postbuild section, therefore, mock will have to handle “SRPM as it exists after the start of the build” one way or another.

In a few years rpms and the rpm workflows will look very different from the workflows of the past score of years (where the ecosystem essentially stagnated) and that will be great. There would not be so many people working on core rpm changes today if the stagnation was sustainable.

And you could try (as people are trying) to move the dynamicism outside rpm and mock, with external tools munging specs and feeding the result to mock, and mock pretending it is doing business as usual static builds. But that involves a huge amount of expensive and brittle complexity, and is ultimately limiting (too many cooks for a single dish). The difference in line counts between autobumping within the spec, and autobumping from outside the spec, is pretty damning, and it will get worse the more tools you layer outside the spec, trying to coax it do do things it can not do by itself. Adding build logic in the spec is ultimately simpler than layers over layers of pre-processing that try to deconstruct the spec logic, then reconstruct it, adding changes, behind the packager’s back.

@praiskup
Copy link
Member

praiskup commented Jul 8, 2020

And you could try (as people are trying) to move the dynamicism

I think this is not personal, but I'm not trying to block dynamicism (as
you can see in original %dynamic_buildrequires proposal, I was the huge
supporter). But we need to declare what is and what isn't the
input/output of mock and rpmbuild precisely.

I'd like to avoid mixing-up two categories of things we declare in spec file:

  • Declaration of what we build the package from
  • Declaration of how we build, i.e. transform the sources into the binary
    packages

I don't think we want to/can make the first point dynamic. That would
bring a lot of build-system non-determinism, and it would go against the
"reproducible" builds initiative. I have a feeling that the proposal goes
the way to break this rule?

From a mock perspective, we should protect these points:

  • spec + buildroot [+ sources optionally] declare how the output
    complete source RPM looks like
  • the complete source RPM is a definitive answer to "how the binary RPM
    should be built in buildroot, and from what sources"
  • output from --buildsrpm should provide the same variant of source RPM
    that is provided by mock --rebuild.
  • we can not expect the %build to be executed for --buildsrpm mode

That being said, I view the current mock code to be broken. Because the
output source RPM from --buildsrpm and --rebuild isn't the same (see
the difference between -bs and -br in
rpm-software-management/rpm/issues/1304 -- mock should do -br, why would
you otherwise use mock for --buildsrpm?). I simply think it is a bug
because -ba in rpmbuild provides the -br variant, not the -bs variant.

The input source RPM for --rebuild mode is somewhat historically
misunderstood, though. Even though we enforce a source RPM on the input,
we don't actually need it. This input format could calmly be some
trivial tarball or any other archive providing just spec [+ mandatory sources].
Because the only thing we do with the input is that we
extract its content (despite the fact there are other information like
Requires headers, etc.), and we first re-build the real complete source
RPM
.

Then, we install that complete source RPM (including all the Requires,
even the dynamic), and rebuild the binary RPMs from there. At this
point I see as an absolute no-go to think about that from this
complete source RPM we will generate yet another source RPM, modified by
%build section. At least not until rpmbuild itself comes with
something entirely new (like the fresh addition -br was recently).

mock pretending it is doing business as usual static builds

Mock's basic task is to wrap rpmbuild and depsolver (chroot). Not to
pretend we'll add some other dynamics :-)

Think of it like: rpmbuild --rebuild is an equivalent to
mock --rebuild. And mock --buildsrpm the equivalent for
plain rpmbuild -br (let's fix the bug that we don't do this).

The TL/DR:
We can not affect the --buildsrpm output by %build in mock, same as we
can not affect the output from rpmbuild -br by %build.
We can not differ mock --builsrpm output source RPM from the
mock --rebuild output source RPM, same as we can not differ
outputs in rpmbuild -br vs. rpmbuild -ba.

@nim-nim
Copy link
Author

nim-nim commented Jul 8, 2020

I'd like to avoid mixing-up two categories of things we declare in spec file:

* Declaration of what we build the package from

* Declaration of how we build, i.e. transform the sources into the binary
  packages

I don't think we want to/can make the first point dynamic. That would
bring a lot of build-system non-determinism,

Ok, thanks for the reply. That helps clarifying how the proposal works. I think the misunderstanding here is that, because the mechanism modifies files in Sources, it is some kind of self modifying non deterministic monster.

The modified sources are not used by the build process itself. The modified sources are an output of the build process not an input. Modifying sources is not necessary for the bumping to work. It is necessary to store the result of the bumping in the SRPM, so the next build knows where to bump from, and you can chain bump to your heart content, without being stuck in a loop where the spec says release is R, and you compute R+1, and the next rebuild still sees R, and still computes R+1.

(You do not even need the SRPM. Exactly like you wrote, the SRPM is just a way to convey updated sources + spec to the next build. In a direct rpmbuild setup the proposal will auto-bump without any intermediary SRPM stage, because sourcedir is shared between rpmbuild calls, so each rpmbuild will see the results of the last build without intermediate SRPM import/export).

You can think of it like a %files section, except a %files attached to the SRPM. And it needs to be attached to the SRPM because the payload we import/export between rpmbuild, mock, koji, copr, obs etc is the SRPM, so info that needs carrying over to the next rebuild must be included in the SRPM itself.

It could probably be modeled better with:

  1. there is an SRPM at the start of the build process, with a set of files in %sourcelist
  2. there is an SRPM at the end of the build process, with a set of %files intended for the next rebuild (mostly the same files, with a few updates to trace a build occurred at a particular date).

Sources are not dynamic within a build. The dynamic property comes from the fact the next rebuild, will use a new set of sources as starting point.

The data model is actually a lot stricter and clear cut than the one you are used to, because software and especially rpm macros are dumb as a brick, and could not stomach the kind of fuzzy data model humans have no problem with.

With the proposal you have a clear-cut:

  1. evr in the spec file is the evr the spec was last modified → packager owned
  2. info about last official build (last build time, last packager, last evr) is in the buildsys.conf file → generated by last builder, saved (or scratched) by the part of infra that decides to keep or scratch a build (the gating authority)
  3. changelog is a detached rpm-changelog.txt in sources, which can be alimented by git or whatever else you want, with build dates added by the builder when it builds

The third part is not 100% satisfying, because it mixes builder-provided data, with packager or git or whatever provided data, but you can not have a timeline that includes both kinds of data, without multiple writers.

and it would go against the "reproducible" builds initiative. I have a feeling that the proposal goes
the way to break this rule?

Autobumping is the antithesis of reproducible builds. You can not both bump releases automatically, and change nothing between builds.

However, because the data model is clean, implementing a reproducible mode was trivial: the code just takes the "previous state" that would have been bumped in normal mode, and reuses it as-is.

https://src.fedoraproject.org/fork/nim/rpms/redhat-rpm-config/blob/forge-with-patches-auto-call-auto-changelog-auto-srpm-bump/f/buildsys-srpm.lua#_81

You can see by yourself, the code is a KISS "if reproducible, set everything to the values produced by last build, else compute new values".

@praiskup
Copy link
Member

Just my opinion: I think it is too complicated. See my tl/dr above why we can not not affect the output source RPM by %build.

Autobumping is the antithesis of reproducible builds. You can not both bump releases automatically, and change nothing between builds.

Not really. As long as the package doesn't bump the release on its own, and we rather delegate the task to build system -- we can go very easily back in time and re-do the build with the same parameters, and reproduce the build.

There are two really straight forward options to let build-system do the decission:

  • Auto-bump script which generates patch, and let build-system to commit back the change to dist-git (then do the reproducible build as we know it)
  • Don't even automatize the commit, and just specify a build-time macro incrementing the release. You can define the same value for the macro anytime you want in future.

@nim-nim
Copy link
Author

nim-nim commented Jul 10, 2020

The first solution does not work mid-term, because it requires spec syntax to stay static, and all dynamic changes being worked on will change this syntax (it will probably break even before @festi finishes his part, since I’m changing other related parts for my own needs).

Basically, as soon as rpm maintainers insisted in spring they would only support Tag evaluation in very specific places of the spec, and people just had just to generate Tags in this very specific place, the game was over for the first solution, because it assumes there is a physical Release tag to hang on (you can not have a physical release Tag in the spec when rpm maintainers require you to generate it because it simplifies the rpm spec parser work).

The second solution does not work either, because it ties release bumping to a buildsystem view of release history, meaning as soon as you move a package from one system to another things break.

You could make the second solution work, by telling the new buildsystem where the old buildsystem stopped, but that’s exactly what you object in the change.

However once you’ve made the mental break that adding to release history safely requires knowing this history (at least the part of history you’ll be grafting things upon), and moving between build systems requires relaying history, you quickly realise that they’re zip reason to make each buildsystem invent slightly different and incompatible bumping algorithms, and that you want the logic in a single place, and really, it’s not that hard to implement (less than a hundred of basic lua code for the logic itself, a bit more once you add the parts to relay the results to the rest of the spec).

And the change does use counter macros that can be set beforehand, as I already wrote reproducible builds are implemented by just keeping old counter values instead of computing new values.

@praiskup
Copy link
Member

There's nothing we can do about this in mock, see the tl/dr above. Re-repeating:
At this point in time there's nothing else than rpmbuild -br or rpmbuild -bs for
source RPM building. Don't expect that -bb will produce different source RPM
than just plain -br.

I suppose that by Florian's work you mean rpm-software-management/rpm/issues/329
but that's an entirely different thing. It is a way to declare how to generate binary
packages from clearly given sources. How that modifies source RPM output by
%build section?

@nim-nim
Copy link
Author

nim-nim commented Jul 10, 2020

No, I meant
rpm-software-management/rpm#1239

The last line in Florian’s message clearly indicates that the target is to make it possible to generate the SRPM (main) section too, meaning the SRPM result of the build won’t be known before the new section, after build

You currently cannot put the main package / initial preamble into the %postbuild section. This requires rethinking what kind of processing is done during (initial) build and how this affects other users of rpmSpecParse.

I agree 100% with Florian that it needs to be done this way. You can not square dynamic subpackages with projects that produce a single artefact otherwise (ie the vast mass of simple projects).

@praiskup
Copy link
Member

Honestly, I don't see any desire in that statement that %postbuild
should touch the main package preamble (that affects src.rpm format).
I just see a neutral statement of what is not possible, and what would
be needed if and only if we wanted it to happen.

@nim-nim
Copy link
Author

nim-nim commented Jul 10, 2020

Florian would not be writing that if he thought main should stay stuck in the preamble. He would either not write it at all (because it would be obvious for him that main has nothing to do with his feature) or write plainly "you can not generate main, because main belongs at the start of the process" (like you wrote yourself in this issue).

The fact is that a huge number of the packages we create do not have explicit subpackages sections in the preamble, because SRPM metadata = deployment package metadata.

So, what you are going to do once Florian finishes his work, and rust/python/go whatever automate their subpackage creation using Florian’s feature. Tell packagers "you can use the feature, except when you have a single deployment package to generate" (because in that case the main srpm declaration in the preamble will collide with the single generated subpackage declaration in the new section)? Or tell them "you can use the feature, but be extra careful to choose manually a srpm envelope that does not collide with generated subpackage naming, ie more manual packager busywork plus renaming all your single subpackage srpms to something else, plus dealing with renaming reviews? (that would be the only way to do it, because changing the subpackage name, would collide with all the existing user deployments that use current naming, and require heavy upgrade path surgery).

That does not square. That’s terribly packager-unfriendly. As soon as you start doing that packagers will beg for a way to reconciliate the subpackage metadata with the srpm metadata. And since subpackage metadata won’t exist before the new section, after build, that moves srpm processing in there too no matter how you try to turn things.

@praiskup
Copy link
Member

name.spec doesn't mean you have to generate name.rpm at all. That
said, the main preamble can be just an unused stub for the final RPM
generator, and you can just generate the sub-packages (or even have
one spec file generating one subpackage).

If we happen to let post-build section to modify the main spec preamble (I
hope we don't want to let it touch SourceX, really, but it is too late
anyway - unless we want to re-evaluate %build), we have to expect that
rpmbuild comes with some new option that allows us to generate the
output source RPM correctly. So from mock perspective, we don't have
the needed (hypothetical) rpmbuild feature at this moment.

@nim-nim
Copy link
Author

nim-nim commented Jul 10, 2020

I don’t understand why you say you do not have the needed rpmbuild feature at this moment. Just call rpmbuild -ba on the SRPM you built at first stage. If rpmbuild did not already work (as I am quite sure rpmbuild will still work after the changes Florian is doing) my specs would not autobump using plain rpmbuild.

There’s no deep magic in there, just more stages in rpmbuild generation, that invalidate the "srpm at start of process = srpm at end of process" assumption. But really, I don’t think there was ever any deep commitment to this assumption anywhere, it’s just an implementation artifact of our past way of doing things, that people got used to.

If mock needs something more elaborate, can you just fill what more elaborate thing mock needs (here or directly in rpm upstream issue tracker, I don’t think one is harder for you than the other)? So people like Florian and me, who are working on more dynamic generation, do not forget to take mock needs into account?

@praiskup
Copy link
Member

praiskup commented Jul 10, 2020

I don’t understand why you say you do not have the needed rpmbuild
feature at this moment. Just call rpmbuild -ba on the SRPM you built at
first stage.

Because it is another small step that opens mock for ugly hacks, and it is
even resource wasting.

If rpmbuild did not already work (as I am quite sure rpmbuild will still work
after the changes Florian is doing) my specs would not autobump using plain
rpmbuild.

It actually doesn't quite work, rpmbuild -br doesn't bake the generated
changelog into the source RPM. So you can not run rpm -q -p --changelog
anymore on such source RPM.

There’s no deep magic in there

This is higly POV statement, and I disagree with you.

just more stages in rpmbuild generation, that invalidate the "srpm at start of
process = srpm at end of process" assumption.

There's just no point in breaking this assumption in mock IMO.

I'm pretty stable here, so I am now resigning from the message ping-pong
here till others come with more convincing arguments. I plan to close
this in month or so as wontfix, unless there's more community input.

@nim-nim
Copy link
Author

nim-nim commented Jul 10, 2020

It actually doesn't quite work, rpmbuild -br doesn't bake the generated
changelog into the source RPM. So you can not run rpm -q -p --changelog
anymore.

Of course rpm -q -p --changelog works, either with -bs or -br. I was real careful to make the implementation right, even if it only took me a week-end to write it.

The changelog is not baked in the source file within the srpm at this point, because the build did not progress far enough, to commit that a build happened in the detached file, but rpmbuild already knows the release it is targetting and srpm metadata (release and changelog) reflect this target.

@nim-nim
Copy link
Author

nim-nim commented Jul 10, 2020

I could easily move changelog persistence to %prep, so sources are updated before rpmbuild -br . But that would make any rpmbuild -bs result in a release bump. Right now the release bump is provisional, and because it is not persisted to source file, you can execute as many -bs or -br as you which, only a full build will create a srpm that the next build will bump from.

If I had done otherwise, creating a srpm to feed it to mock or copr or fedpkg, would have persisted the new release and changelog, before the build was effective.

@praiskup
Copy link
Member

I plan to close this in month or so as wontfix, unless there's more community input.

Closing, ideas from the community are still welcome, but I fear there's a very low interest in this topic,
and the overall auto{bump,changelog} is overrated. E.g. after more than one month only 18 people voted:
https://docs.google.com/forms/d/e/1FAIpQLSc_QWWjEnbCKwlrosxt6CiwR37sDWO6mdfpUe4D-CbzKBhRvQ/viewanalytics

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants