Imported binaries not always properly removed from published repository #4373

Closed
jberry-suse opened this issue Jan 18, 2018 · 55 comments
Labels
Backend Things regarding the OBS backend Bug

Comments

@jberry-suse
Contributor

Issue/Feature description

osc api '/public/build/openSUSE:Factory/standard/x86_64/_repository?view=binaryversions&nometa=1'

<binary name="liblua5_3-32bit.rpm" sizek="118" hdrmd5="3970ae129623ebcd89b9cbb0256f13f6" />
<binary name="liblua5_3-5-32bit.rpm" sizek="119" hdrmd5="ab89d49de452657f6f6575b83d4ddcd3" />

The first of the two should not exist. Only the rpm with 5_3-5 in the name should exist.
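A minimal Python sketch of how such stale entries could be spotted automatically. It assumes you have the binaryversions XML (the two entries above, wrapped in the <binaryversionlist> root element that appears later in this thread) and, separately, the set of binary names the current build actually produces. published_names and stale_binaries are hypothetical helpers, not part of osc or OBS.

```python
import xml.etree.ElementTree as ET

# The two entries quoted in this report, wrapped in the <binaryversionlist>
# root element that the binaryversions view returns.
SAMPLE = """<binaryversionlist>
  <binary name="liblua5_3-32bit.rpm" sizek="118" hdrmd5="3970ae129623ebcd89b9cbb0256f13f6" />
  <binary name="liblua5_3-5-32bit.rpm" sizek="119" hdrmd5="ab89d49de452657f6f6575b83d4ddcd3" />
</binaryversionlist>"""


def published_names(xml_text):
    """Return the set of binary file names listed in a binaryversionlist document."""
    root = ET.fromstring(xml_text)
    return {b.get("name") for b in root.iter("binary")}


def stale_binaries(xml_text, expected):
    """Names published in _repository that the current build does not produce."""
    return sorted(published_names(xml_text) - set(expected))


if __name__ == "__main__":
    # Hypothetical expected set: in practice it would come from the build results.
    print(stale_binaries(SAMPLE, {"liblua5_3-5-32bit.rpm"}))
```

The hard part in practice is obtaining a trustworthy "expected" set; the comparison itself is a plain set difference.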

This has occurred at least three times, including my original report.

Expected result

Published binaries properly reflect current build.

How to Reproduce

  1. Wait ~3-4 months

Further information

https://lists.opensuse.org/opensuse-buildservice/2017-08/msg00035.html

@jberry-suse
Contributor Author

Most recent:

osc api /build/openSUSE:Factory/standard/x86_64/_repository?view=binaryversions\&nometa=1 | grep llvm3

@jberry-suse
Contributor Author

@lnussel @DimStar77, who were involved when these issues were encountered.

@lnussel lnussel added the Bug label Jan 18, 2018
@lnussel
Member

lnussel commented Jan 18, 2018

This hits us now with the llvm3 32bit packages and is currently blocking stagings. Could you please remove the stale binaries?

@hennevogel hennevogel added the Backend Things regarding the OBS backend label Jan 19, 2018
@adrianschroeter
Member

IIRC, it was still coming in from another package, and we dropped that one... right?

So I am closing this report for now.

@jberry-suse
Contributor Author

jberry-suse commented Feb 13, 2018

The goal of this issue was to fix the underlying problem, not the specific case. From what you wrote, the issue is not fixed? The point was to fix the actual problem so it stops happening every few months, rather than having to dig through logs and API calls whenever repo-checker is not making sense, just to figure out that this is happening again.

@DimStar77
Contributor

Just a note: I am seeing this again right now (yes, I can work around it, but no, I don't think I should have to).

Situation:

openssl-1_1_0 package has been renamed to openssl-1_1. This was done with a submission of openssl-1_1 in combination with a delete request for openssl-1_1_0.

Delete requests are accepted into Factory 'in phases', so as not to disrupt openSUSE:Factory/snapshot (removal of the source container has an immediate effect there, hence we don't do this anymore).

So, openssl-1_1_0 has been build-disabled and the binaries for /standard wiped. osc can confirm this:

$ osc ls -b openSUSE:Factory openssl-1_1_0 -r standard
standard/x86_64
standard/i586

Binaries for /totest and /snapshot are still in place, which is intentional.

Nevertheless, the -32bit packages are still 'offered' to the scheduler, as can be seen on the package lmms for the time being:

$ osc buildinfo openSUSE:Factory lmms standard x86_64
<buildinfo project="openSUSE:Factory" repository="standard" package="lmms" downloadurl="http://download.opensuse.org/repositories">
  <arch>x86_64</arch>
  <error>unresolvable: have choice for libcrypto.so.1.1 needed by libldap-2_4-2-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libcrypto.so.1.1(OPENSSL_1_1_0) needed by libldap-2_4-2-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libssl.so.1.1 needed by libldap-2_4-2-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libssl.so.1.1(OPENSSL_1_1_0) needed by libldap-2_4-2-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libcrypto.so.1.1 needed by libsnmp30-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libcrypto.so.1.1(OPENSSL_1_1_0) needed by libsnmp30-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libssl.so.1.1 needed by libsnmp30-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit, have choice for libssl.so.1.1(OPENSSL_1_1_0) needed by libsnmp30-32bit: libopenssl1_1-32bit libopenssl1_1_0-32bit</error>

This choice only exists, because the wipebinaries command did not take care of properly disposing of the -32bit packages of the old openssl-1_1_0 package.
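Messages like the one above are hard to read at this length. The following is a small Python sketch that splits them into (dependency, needed by, candidates) tuples so the competing providers stand out; here the stale one is libopenssl1_1_0-32bit. The regex assumes the clause format exactly as printed above, and parse_choices / competing_providers are hypothetical helpers, not osc functionality.

```python
import re

# One clause of the scheduler message:
#   "have choice for <dep> needed by <package>: <candidate> <candidate> ..."
CHOICE_RE = re.compile(r"have choice for (\S+) needed by (\S+): ([^,]+)")


def parse_choices(error):
    """Split an 'unresolvable: have choice for ...' message into
    (dependency, needed_by, candidates) tuples."""
    return [
        (dep, needer, candidates.split())
        for dep, needer, candidates in CHOICE_RE.findall(error)
    ]


def competing_providers(error):
    """Distinct groups of packages that compete for the same dependencies."""
    return {tuple(sorted(cands)) for _, _, cands in parse_choices(error)}


# Shortened excerpt of the lmms buildinfo error above.
ERROR = (
    "unresolvable: have choice for libcrypto.so.1.1 needed by libldap-2_4-2-32bit: "
    "libopenssl1_1-32bit libopenssl1_1_0-32bit, "
    "have choice for libssl.so.1.1(OPENSSL_1_1_0) needed by libsnmp30-32bit: "
    "libopenssl1_1-32bit libopenssl1_1_0-32bit"
)

if __name__ == "__main__":
    for group in competing_providers(ERROR):
        print(" vs ".join(group))
```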

(I will help OBS past this with a Prefer statement; as I said, I CAN work around it, but I should not have to.)

@DimStar77 DimStar77 reopened this Mar 1, 2018
@jberry-suse
Contributor Author

Another instance in Leap 15.0, which took a lot of time to figure out.

repo-checker sees the following:

can't install hdf5-devel-32bit-1.10.0-lp150.4.3.x86_64:
  nothing provides hdf5-devel = 1.10.0 needed by hdf5-devel-32bit-1.10.0-lp150.4.3.x86_64
    (we have hdf5-devel-1.10.1-lp150.5.3.x86_64)
can't install hdf5-mvapich2-devel-32bit-1.10.0-lp150.4.3.x86_64:
  nothing provides hdf5-mvapich2-devel = 1.10.0 needed by hdf5-mvapich2-devel-32bit-1.10.0-lp150.4.3.x86_64
    (we have hdf5-mvapich2-devel-1.10.1-lp150.5.3.x86_64)
can't install hdf5-openmpi-devel-32bit-1.10.0-lp150.4.3.x86_64:
  nothing provides hdf5-openmpi-devel = 1.10.0 needed by hdf5-openmpi-devel-32bit-1.10.0-lp150.4.3.x86_64
    (we have hdf5-openmpi-devel-1.10.1-lp150.5.5.x86_64)

The binaries being complained about are 1.10.0 instead of 1.10.1 and are half a year older than the rest:

  • 25-Nov-2017 12:13
  • 22-Apr-2018 16:35
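A version mismatch like this (a -32bit package pinning 1.10.0 while the repository has 1.10.1) can be pulled out of repo-checker output mechanically. The Python sketch below assumes the "nothing provides X = V needed by Y / (we have Z)" layout exactly as quoted above; version_mismatches is a hypothetical helper, not part of repo-checker.

```python
import re

# Matches the repo-checker pattern quoted above:
#   nothing provides <name> = <version> needed by <package>
#     (we have <available>)
MISSING_RE = re.compile(
    r"nothing provides (\S+) = (\S+) needed by (\S+)\s+\(we have (\S+)\)"
)


def version_mismatches(report):
    """Return (name, wanted_version, needed_by, available) tuples."""
    return MISSING_RE.findall(report)


REPORT = """\
can't install hdf5-devel-32bit-1.10.0-lp150.4.3.x86_64:
  nothing provides hdf5-devel = 1.10.0 needed by hdf5-devel-32bit-1.10.0-lp150.4.3.x86_64
    (we have hdf5-devel-1.10.1-lp150.5.3.x86_64)
"""

if __name__ == "__main__":
    for name, wanted, needer, have in version_mismatches(REPORT):
        # An old -32bit package pinning an older version than what is built
        # can indicate a leftover (stale) binary, as in this thread.
        print(f"{needer} wants {name} {wanted}, repo has {have}")
```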

I would appreciate it if someone could purge them and perhaps fix this.

@jberry-suse
Contributor Author

In the future, perhaps the OBS team should read through the repo-checker output and find these themselves, since the release team is currently paying for this.

@adrianschroeter
Member

I cannot see the openssl example problem anymore, but I think it is too late.

The hdf5 example is again a misuse of the OBS disable feature. The old binaries are still there in the hdf5:serial package.

Sorry, I have not seen a single occurrence of this problem that was not caused by setup problems, either by:

  • using build disable flags
  • left over aggregates
  • unclean rebuild mode of repository

I am not sure that I will even look into the issue the next time if you still have this setup :/

@DimStar77
Contributor

Wiping 'disabled packages' is an invalid setup?

@adrianschroeter
Member

adrianschroeter commented Apr 25, 2018 via email

@DimStar77
Copy link
Contributor

  • there is no maintenance in Tumbleweed;
  • build-disabled/wiped packages are always a transient state in TW (delete requests, making sure not to impact /snapshot directly)
  • at least in case of openssl, there was no multibuild involved
  • The issue is that ::import:* is not properly removed by wipebinaries (it is always -32bit stuff hanging back)

@adrianschroeter
Member

adrianschroeter commented Apr 25, 2018 via email

@mmohring
Contributor

mmohring commented Apr 25, 2018

On Wednesday, 25 April 2018, 09:26:20 CEST, Dominique Leuenberger wrote:
> The issue is that ::import:* is not properly removed by wipebinaries (it is always -32bit stuff hanging back)

I can confirm that: if an _aggregate created ::import:* entries, and you then deactivate the package containing the _aggregate and wipe the binaries, those binaries are still in :full, but the package shows no binaries. The binaries also remain in :repo, so they will still be published.

@adrianschroeter
Copy link
Member

Just checked: the wipe is indeed keeping the imports on purpose. We have additional code to ensure this.
Not sure if we can/want to change this, since it would be a potentially incompatible change.

(Again, that problem would not exist if you turned the package state into "excluded" instead of using build disabled + manual events. The state would also be reproducible then.)

@DimStar77
Contributor

> Not sure if we can/want to change this, since it would be a potentially incompatible change.

Maybe an extended API that can trigger the removal of those files? Something like osc wipebinaries --with-imports?

> (Again, that problem would not exist if you turned the package state into "excluded" instead of using build disabled + manual events. The state would also be reproducible then.)

There is a difference between what is disabled on Leap and what is disabled on TW; maybe we have to split this. To my knowledge, there is no 'external' way to change the build state to excluded, short of changing the .spec file. And even then, a package changing from succeeded to excluded leaves its binaries behind, no? So wipebinaries would still be necessary (and sharing sources between TW/SLE/Leap would become a pain if packagers had to set excludearch: i586 on all Leap packages, except those that have a baselibs.conf).

  • Leap disables all packages that do not have a baselibs.conf (as Leap is x86_64-only but needs i586 enabled for the biarch stuff); packages that lose their baselibs.conf will be newly disabled, and there should be a way to eliminate the things they built
  • Tumbleweed has generally only temporarily disabled stuff, while delete requests are being processed

All in all, it is confusing that wipebinaries does not wipe all binaries (but I believe you when you say there must be, or must have been, a use case).

@mlschroe
Member

wipebinaries should wipe all binaries iff you wipe all architectures.

@adrianschroeter
Copy link
Member

adrianschroeter commented Apr 25, 2018 via email

@mmohring
Copy link
Contributor

mmohring commented Apr 25, 2018

On Wednesday, 25 April 2018, 11:16:05 CEST, Michael Schroeder wrote:

> wipebinaries should wipe all binaries iff you wipe all architectures.

In my test case, I switched off a package with the _aggregate and wiped all binaries for all archs. Afterwards, the package shows everything empty for this repo. But as I said, the binaries are still in :full and in :repo. It is problematic that the binaries remain in :full: I cannot remove e.g. broken binaries, but have to remove them manually in :full and trigger a repo rescan. @mlschroe, by "wipe all architectures", do you mean all packages or only this specific package?

@lnussel
Copy link
Member

lnussel commented Apr 25, 2018 via email

@mlschroe
Copy link
Member

Martin: :repo is the published area, it will only get wiped if publishing is enabled. With the new full tree handling, :full is always in sync with the build area, so I can't tell if your case is correct or not.

(I sure hope that you use the "new full handling", i.e. that you don't force the old handling via a new_full_handling = 0 entry in BSConfig.pm)

@mlschroe
Member

Ludwig: I don't even know your use case. Why don't you wipe all architectures? But I think you do that, contrary to what Adrian said. Can you please confirm this?

@mlschroe
Copy link
Member

(And why do you think that disabling the build has something to do with wipe?)

@mlschroe
Copy link
Member

Ah, found it. Wipe is special as it ignores the build enable/disable flag, but the created export job doesn't do that. So the export is always dropped if the x86_64 arch is disabled. Fixing...

@mlschroe
Copy link
Member

Hmm, or maybe not... Digging deeper...

@jberry-suse
Copy link
Contributor Author

jberry-suse commented Apr 27, 2018

Bug, feature, whatever we are calling it, the problem exists. The recent addition is a new method for setting packages as excluded to avoid the disabled blank.

@mlschroe
Copy link
Member

Bah. Wild guessing doesn't help at all. Please stop that.

@jberry-suse
Copy link
Contributor Author

Apparently we are on different pages. Having debugged this problem numerous times and found the stale data, I have no doubt the problem exists.

@mlschroe
Copy link
Member

The thing is that we don't know what's going on and if there's a bug or not. And I don't see any stale entries for your hdf5 examples.

@DimStar77
Copy link
Contributor

❤️ Please, let's try to work together and find solutions. Re-iterating the same points over and over leads nowhere.

Fact is: the release team (and probably many other OBS users) has the 'disable' switch at hand to 'no longer build a package', but using this feature (exposed in the web UI) results in what the OBS team defined as an 'invalid setup'. Anybody who disables the build of a package (whose sources should be kept but no longer built) that happens to have a baselibs.conf will run into the issue that the binaries cannot be completely removed using osc wipebinaries. If it helps, I can set up a 'demo project' (which can stay there for the time being, to help everybody get the same view of the issue). Shall I?

@mlschroe
Copy link
Member

Yes please. I'd love to find out what's going on.

@mlschroe
Copy link
Member

(And I never said it's an invalid setup. It just has some drawbacks that can be avoided with the new "onlybuild" feature.)

Note that osc wipebinaries -a x86_64 will leave the 32bit imports intact. osc wipebinaries -a i586 will remove the binaries in the i586 tree and the corresponding imports in the x86_64 tree.
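To make the per-architecture semantics above concrete, here is a toy Python model. It is not OBS code: the Binary type, the wipe function, and the libproxy file names are invented for illustration. The assumption it encodes is the one stated above: wiping an arch removes what that arch built, including the -32bit rpms it exported into the x86_64 tree, but never imports that came from another arch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    name: str
    tree: str       # arch tree the file lives in, e.g. "x86_64"
    built_for: str  # arch whose build produced it; "i586" for imported -32bit rpms


def wipe(binaries, arch):
    """Model of the described semantics: wiping `arch` removes what `arch` built,
    wherever it lives, including its imports into other arch trees."""
    return {b for b in binaries if b.built_for != arch}


STATE = {
    Binary("libproxy1.rpm", "x86_64", "x86_64"),
    Binary("libproxy1-32bit.rpm", "x86_64", "i586"),  # import from the i586 build
    Binary("libproxy1.rpm", "i586", "i586"),
}

if __name__ == "__main__":
    for arch in ("x86_64", "i586"):
        left = sorted((b.tree, b.name) for b in wipe(STATE, arch))
        print(f"wipe -a {arch} leaves {left}")
```

In this model, wiping x86_64 leaves the -32bit import in the x86_64 tree, which matches the note above about the 32bit imports staying intact.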

@mlschroe mlschroe reopened this Apr 27, 2018
@mlschroe
Copy link
Member

Oops, did not mean to reopen ;)

@DimStar77
Copy link
Contributor

> (And I never said it's an invalid setup. It just has some drawbacks that can be avoided with the new "onlybuild" feature.)

Adrian did.

OK, so I set up home:dimstar:issue4373 with a branch of 'libproxy'. Once it had all built completely, I build-disabled it (osc meta pkg -e) and then wiped the binaries (osc wipebinaries --build-disabled). The result now is:

> osc api /public/build/home:dimstar:issue4373/openSUSE_Factory/x86_64/_repository?view=binaryversions
<binaryversionlist>
  <binary name="libproxy1-32bit.rpm" sizek="70" hdrmd5="760ffb6b136ed0dbfad2d0a2609af121" metamd5="9ad64b99b43415880d1d7978944d30fc" />
  <binary name="libproxy1-32bit-debuginfo.rpm" sizek="316" hdrmd5="a7358b0f10b8673e40f68cb863b41f60" metamd5="9ad64b99b43415880d1d7978944d30fc" />
</binaryversionlist>

So, as this ticket describes: the -32bit packages stayed behind and have not been wiped.

@jberry-suse
Contributor Author

My reaction was to:

> Please don't try to add more workarounds over your build-disabled workaround.

That was in response to my finding another instance of the bad data and looking to get it cleaned up. The response again implies that using a feature exposed by OBS is wrong, which is just baffling, but anyway. The "invalid setup" remark relates to that same opinion expressed multiple times above and wasn't directed at you. You just stepped in and started explaining what you didn't say, which no one is disputing.

lnussel went ahead and added the entries for the "workaround"/"feature" to the Leap 15.0 prjconf so we'll see how it works out.

@mlschroe
Member

DimStar: thanks for setting up that demo project. What I see now is that the 32bit packages get removed from the package container but stay in the _repository tree. So I was always looking at the wrong parts of the code: I thought it would be a bug in the 32bit export/import code, but now it seems to be the _repository handling code. Which completely surprises me; that part has been rock solid in the past.

So this is absolutely a bug: at no point in time must the _repository tree get out of sync with the build tree. Pretty amazing and somewhat scary.

@mlschroe mlschroe reopened this Apr 27, 2018
@mlschroe
Copy link
Member

Ok, found it. As suspected, this has nothing to do with the disabled flag. It's a bug in the wipe code: it leaves the imported rpms in place when calculating the _repository tree, but later deletes them from the build tree. When the import event that is supposed to delete all 32bit packages is then received, it thinks there's no work to do because the files are already gone. But they are still in the _repository tree.

So the good news is that this only happens with wipe, normal obs operation is not affected.
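The ordering problem described above can be reproduced in a few lines of Python. This is a simplified model under stated assumptions, not the OBS backend code: build and _repository trees are plain sets of file names, and all function names (wipe_buggy, handle_import_event, handle_import_event_fixed, rebuild_full_tree) are hypothetical. The "fixed" handler is just one illustrative way to stay consistent and is not necessarily what commit e0427a2 does; rebuild_full_tree models the admin-side repair mentioned later in the thread (obs_admin --rebuild-full-tree).

```python
def wipe_buggy(build, repo, imports):
    """Order of operations described above (simplified model)."""
    # 1. _repository is recalculated while the imported rpms are kept...
    repo = {f for f in repo if f in imports}
    # 2. ...but the build tree is then emptied completely, imports included.
    build = set()
    return build, repo


def handle_import_event(build, repo, imports):
    """The import event only removes imports it still finds in the build tree."""
    gone = imports & build
    return build - gone, repo - gone


def handle_import_event_fixed(build, repo, imports):
    """Hypothetical fix: also purge imports still listed in _repository."""
    return build - imports, repo - imports


def rebuild_full_tree(build):
    """Admin-side repair: regenerate _repository from the build tree."""
    return set(build)


BUILD = {"libproxy1.rpm", "libproxy1-32bit.rpm"}
IMPORTS = {"libproxy1-32bit.rpm"}

if __name__ == "__main__":
    b, r = wipe_buggy(set(BUILD), set(BUILD), IMPORTS)
    b, r = handle_import_event(b, r, IMPORTS)
    print("buggy wipe leaves in _repository:", sorted(r))
    print("after rebuild_full_tree:", sorted(rebuild_full_tree(b)))
```

The stale entry survives only because the import handler decides what to delete by looking at the build tree, which the wipe has already emptied.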

@mlschroe
Copy link
Member

Fixed with commit e0427a2

@mlschroe
Copy link
Member

(Btw, OBS can rebuild the _repository packages from the build tree. This will get rid of all stale entries. Just tell us the projects where we should do this.)

(obs_admin --rebuild-full-tree for the curious.)

@adrianschroeter
Copy link
Member

adrianschroeter commented Apr 29, 2018

Just for correctness: while there was an issue with wiping in the openssl case (but not in the hdf5 case), the build-disabled flag is still a bad/invalid/... setup for distributions. You still put a lot of pain on others; it may not be the release team's problem, but it is the problem of maintenance afterwards.

Also, rebuilding trees can only be done by OBS admins, so it is definitely a bad idea for the release team to rely on this. (And using a completely different mechanism in Factory makes no sense either.)

PLEASE switch to the excluded state using only buildflags before release to avoid this.

(And regarding being "baffled" about this: we have been saying this since the beginning, and we had a meeting regarding this setup with the release managers a few months ago, so it should definitely not be surprising.)

@jberry-suse
Copy link
Contributor Author

A practical alternative has only existed for a few days. It is not terribly useful to point out an undesirable approach without a workable alternative.

All of that is a side discussion to a valid bug, which was fixed 2 days ago. I am baffled that it was so hard to admit as much. Sure, the alternative approach has benefits, but the setup certainly wasn't "invalid". Just word games that dragged out a resolution.

@jberry-suse
Copy link
Contributor Author

The hdf5 binaries are no longer present, and the package sources were not updated (i.e. they were likely wiped out during the migration to the excluded workflow). I appreciate the fix and the alternative workflow, so I won't have to debug this once a month.

@adrianschroeter
Copy link
Member

adrianschroeter commented Apr 30, 2018

The problem is that single event emission never really guarantees a reproducible state. You will always need to debug what happens depending on the order of the events. Also, an inconsistency can happen again at any time due to external events. And you will never be sure that a staging (project-linked) project will behave the same. (And it is also very time-consuming on my side to debug such setups.)

"Invalid" is the correct term when taking any external branches into account, as happens with maintenance. Every single maintenance incident needs additional manual care, or you release unwanted binaries. So this was, and still is, an invalid setup when you want to do proper maintenance. Yes, not your problem.

@mlschroe
Copy link
Member

Can you two please do me a favor and stop bickering about the setup for the 32bit packages in this issue? It has absolutely nothing to do with this bug and the stale binaries in the _repository tree.

In fact, if you just disable the i586 arch and then do a 'osc wipebinaries --build-disabled' the 32bit packages will be erased like they should. You only get stale entries if you wipe the x86_64 architecture, and it does not matter at all if something is build disabled or not. You'll trigger the bug by wiping any package that has a baselibs setup, it's just that in most cases the 32bit packages will get rebuilt and so the stale entries will get replaced.

Anyway, the bug is fixed. For the baselibs setup in leap/opensuse there's now the onlybuild feature that makes the setup a bit easier and less error prone. Everyone should be happy and the world's a better place. Please move on.

@adrianschroeter
Copy link
Member

adrianschroeter commented Apr 30, 2018 via email

@jberry-suse
Copy link
Contributor Author

> Can you two please do me a favor and stop bickering about the setup for the 32bit packages in this issue? It has absolutely nothing to do with this bug and the stale binaries in the _repository tree.

Literally my point. Not sure if my English is bad or what.

@adrianschroeter If what you say is true, it would seem that removing the disable feature from OBS would be appropriate. I can only imagine how frustrating this must be for some poor soul who encounters this in their home project and does not have the understanding or resources to debug it. Otherwise, if it was fixed by @mlschroe, then again this discussion is about a different topic (i.e. the maintenance workflow).

jberry-suse added a commit to jberry-suse/openSUSE-release-tools that referenced this issue May 10, 2018
…BS bugs.

Related to openSUSE/open-build-service#4373 as disabling s390x leaves old
binaries in repo-md while publishing new ones :(((((((.
jberry-suse added a commit to jberry-suse/openSUSE-release-tools that referenced this issue May 11, 2018
…BS bugs.

Related to openSUSE/open-build-service#4373 as disabling s390x leaves old
binaries in repo-md while publishing new ones :(((((((.
7 participants