Please support generation of AppStream metadata automatically #75
Hi Richard, Anyway, if COPR supports generating appdata, I personally don't see this as a big issue. COPR should be the preferred way of creating 3rd party repos (and the most convenient). |
@hughsie so I've talked with Tomas, and generating appdata could be the default behavior and use the libappstream-builder lib, but there should be a compile-time option enabling it. If you can implement it, we'll be happy to do PR review and merge it. |
@hughsie In my view, I think it's completely fine, as long as there's a compile-time option for adding the libappstream-builder dependency. In fact, this would make things even easier for us in Mageia for supporting AppStream metadata, as well as many others who are consumers of |
I would suggest calling the switch |
@hughsie Are you still interested in doing this? If you are, I'm happy to review and test patches for integrating AppStream functionality into createrepo_c. |
@hughsie If you're still interested in this, I'm happy to review, test, and merge patches to incorporate this functionality into createrepo_c. |
I would really love to see this happen. Most methods I have seen are workarounds just to make AppStream work with the repository, which is less than ideal. This has the potential to work well with the existing solutions, without much work required for different build systems to enable it. |
I'm interested in seeing it done, but alas don't have the time or the permission to spend a week+ on writing the code. In all honesty, appstream-glib and shipping an appstream-data package is really just a stopgap until we can use ostree+flatpak where AppStream metadata is baked right into the design. |
In the most generous case, where Flatpak takes over the world, it's still not covering everything, as there's still the host components needing AppStream metadata (like drivers, among other things). In this regard, it still makes sense to be able to generate AppStream metadata for rpm repos. But at least if you do get around to doing it, I can review, test, and merge it. And naturally cut a new release with the feature shortly thereafter. |
Minor update here, if someone wants to work on this, please use @ximion's |
@Conan-Kudo Thanks for the heads up. It would be great if this clarification were applied to the repos and documentation, because there are no external indications that one of them is deprecated, or generally any indication of the differences between the two. @hughsie's repo is still getting commits and occasional fixes, there are no deprecation warnings, the READMEs don't clarify anything (and indeed @hughsie's README seems more complete), |
@ximion is writing a new Note that deprecated does not mean dead, it just means that new stuff shouldn't be using it while things slowly move over. GNOME Software moved over in GNOME 40, for example. |
That thing is pretty much complete, |
It looks like this issue has been open for 5 years, so it is a good time to revisit it. Maybe we can start with some clarification. Right now AppStream metadata is generated by a library and then added to the repository by modifyrepo_c. Is AppStream metadata used only by PackageKit, or is it widely used? Why should createrepo_c generate this metadata directly? Modules and comps groups are also generated outside of createrepo_c. And what about merging repositories and merging AppStream metadata? Is this functionality supported by a library? |
gnome-software, KDE apper and software center, cockpit, fwupd and others. In all honesty, tons of stuff :)
I believe libappstream supports this already. |
I tried to understand how AppStream metadata works, and I discovered that it is not defined in the repomd file (like rpms, modules, filelists, comps, advisory, ...) but is stored in an RPM ( |
@hughsie Thank you very much for the information, but I tried to find such data in repomd.xml in the Fedora repositories, and there is nothing like that. Could you please point me to it once again? |
I... thought the blog post should explain everything... i.e. you generate the appstream metadata using appstream-builder and then use modifyrepo to include it if createrepo has already been called. |
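For readers following along, the two-step workflow described above (run appstream-builder, then inject the result with modifyrepo_c) could be sketched roughly as follows. This is a minimal sketch, not taken from the thread: the function name `appstream_commands` is hypothetical, and the command-line flags are written as recalled from the linked blog post, so they are assumptions to verify against each tool's `--help` output. The function only builds the argument lists and does not execute anything.

```python
from pathlib import Path


def appstream_commands(packages_dir, output_dir, origin):
    """Return argv lists for the two-step workflow; nothing is executed here."""
    packages_dir = Path(packages_dir)
    output_dir = Path(output_dir)

    # Step 1: scan the RPMs and write appstream.xml.gz into output_dir.
    # ASSUMPTION: flag names are recalled from the blog post, not verified.
    build = [
        "appstream-builder",
        f"--origin={origin}",
        "--basename=appstream",
        f"--packages-dir={packages_dir}",
        f"--output-dir={output_dir}",
    ]

    # Step 2: register the generated file in the repodata that
    # createrepo_c has already written.
    # ASSUMPTION: flag names are recalled from the blog post, not verified.
    inject = [
        "modifyrepo_c",
        "--no-compress",
        "--simplify-tab",
        str(output_dir / "appstream.xml.gz"),
        str(packages_dir / "repodata"),
    ]
    return [build, inject]
```

Each returned list could then be handed to `subprocess.run(cmd, check=True)` once createrepo_c has been run on the package directory.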
I think the confusing part is that the Fedora repositories themselves don't work this way, they use this crazy approach with packing the metadata into the |
This missing feature is why the Fedora repositories don't work that way. If this feature was implemented, then Fedora would use it. |
I've always considered the In Debian, this is implemented by having our archive software add & sign the AppStream data with the rest of the metadata, and then having APT, our package management tool, download this data on the client. APT will then invoke The AppStream metadata itself is generated by appstream-generator on Debian/Ubuntu in a sandboxed environment (the software doing font rendering and image scaling from 3rd-party sources scared the security people). The generator tool is a pretty heavyweight solution that takes care of a lot of stuff specific to Linux distributions and Debian in particular, for example it has some very complicated logic to hunt down the right icon for an application across all packages without having to trace dependencies and extract half of the archive into a temporary location. On the other hand, the Which of these is right for this application I don't know, but I can definitely help in case any of these tools is missing anything you need. The generator even reads repomd files and can handle RPMs, but I don't think this feature is used much at the moment. |
Thank you very much for the clarification and the additional information. To summarize what I've gathered from the discussion:
|
Am I correct in thinking that generating the metadata requires deep inspection of the RPM file, so it needs to be available? |
It does, yes. What makes appstream repodata generation slow for Fedora and COPR right now is that we have to generate the rpm repodata, and then scan through the RPMs again to pull the necessary contents out for appstream repodata generation. That's why Fedora doesn't do it now, because it's too slow. Once integrated into createrepo_c, it would be possible to assemble the appstream and rpm repodata in one step as each RPM is read and the data is pulled, which would make it faster. |
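The difference between the two approaches can be illustrated with a toy model. This is only a sketch under stated assumptions: `read_package`, `rpm_collect`, and `appstream_collect` are hypothetical callables standing in for the real work, and the model counts package reads only. It says nothing about how expensive a header read is compared with unpacking a payload, which other comments in this thread point out is the more important cost.

```python
def two_pass(packages, read_package, rpm_collect, appstream_collect):
    """Current situation: two separate tools each read every package."""
    reads = 0
    for pkg in packages:
        rpm_collect(read_package(pkg))
        reads += 1
    for pkg in packages:
        appstream_collect(read_package(pkg))
        reads += 1
    return reads


def single_pass(packages, read_package, rpm_collect, appstream_collect):
    """Integrated situation: each package is read once and feeds both outputs."""
    reads = 0
    for pkg in packages:
        data = read_package(pkg)
        reads += 1
        rpm_collect(data)
        appstream_collect(data)
    return reads


if __name__ == "__main__":
    pkgs = ["a.rpm", "b.rpm", "c.rpm"]
    rpm_md, as_md = [], []
    print(two_pass(pkgs, str.upper, rpm_md.append, as_md.append))
    print(single_pass(pkgs, str.upper, rpm_md.append, as_md.append))
```

In this model the integrated loop halves the number of reads; whether that translates into a real speedup depends on where the time is actually spent.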
Isn't the difference that |
Yeah, I'm curious how much time it would actually save (not that the feature isn't a good idea necessarily). The bottleneck is almost certainly going to be unpacking the RPM archives rather than finding or opening the RPMs. createrepo_c only has to read the headers, so there's limited overlap in the work being performed. |
Actually, What would though speedup the Copr use-case a lot would be the |
I think then my advice to Fedora would be to stop shipping applications in rpm packages, and we should speed up the transition to Flatpak and OSTree metadata. |
Please don't make unproductive comments. This is not ever going to happen. |
What happens when I get bored of creating the appstream-data package updates? I think I'm the only person that's ever actually done it. |
The RPM team (specifically @ffesti) is aware of this. Extending the base RPM format to incorporate AppStream data has been discussed before, but the result of those discussions is that it would make the RPM headers ridiculously big. To wit: the problem is that AppStream data is extremely rich. The following components are generally part of AppStream metadata:
We can't embed this in the RPM header. The best we could do is include pointers to the payload regions for the data files so that we don't have to scan the whole RPM for them. But that's still a fair bit extra to pull off during RPM generation. But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today, since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes. If you decide not to integrate the two processes, and @hughsie decides to stop making |
The performance issues do not exist with appstream-generator - it works absolutely fine on massive Debian repositories, and Ubuntu and Arch also use it without reported issues. Still, it's expensive to run (needs quite a bit of memory), which is why we only do it every 6h, but that is absolutely sufficient (the archive doesn't get published any faster anyway). Its RPM backend exists, but I would guess it's the least-tested backend. There's no reason why createrepo_c couldn't use appstream-generator or implement something that fits its needs better based on libappstream-compose (knowing which packages are new is a huge advantage). Also, dpkg/rpm have nothing to do with AppStream metadata; it is truly part of the repository metadata, so very much in the hands of DNF/Zypper/APT and the respective repository layouts that distributions use. So, while it's unified in the Debian-based world, I wonder whether it is reasonable at all to expect any unification in the RPM world with its different package management tools. Also, holding back features for one distribution while waiting on others is not a great plan (especially since openSUSE actually already has AppStream data as part of its repository metadata; AFAIK it's only Fedora that doesn't implement this yet). |
I'm not sure it makes a difference for us, the main issue will be that it requires the RPMs to be available at all times, which is slightly in conflict with one of the main features, which is to on-demand download them only when needed. Plus we already have to deal with libmodulemd separately, and also we're not using the literal createrepo_c binary, where the support would primarily be added. Our main requirements are just that it needs to be possible to hand the RPM file directly to the library one at a time to incrementally build up the xml instead of pointing it at a directory, but that's fundamentally an appstream requirement. It might already be possible, I haven't looked deeply at the APIs recently and can't look it up at the moment. |
On Debian, we pull the packages on-demand via a network mount. On Ubuntu,
It's not really a requirement. On Debian we do have to scan multiple packages (simply because icons may be split out into a -data package and we do need to find those for AppStream), but we do later build the YAML data incrementally from cached data and don't need to re-scan stuff that was already scanned. Same applies to Arch & Co which do use the XML format. |
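The idea of building the output incrementally from cached data, without re-scanning packages that were already scanned, might look something like this. It is a minimal sketch and not how appstream-generator is actually implemented: the cache is a plain dict keyed by a package checksum that the repository is assumed to know already, and `extract` is a hypothetical stand-in for the expensive per-package scan.

```python
def scan_incrementally(packages, cache, extract):
    """Scan only packages whose checksum is not in the cache yet.

    packages: mapping of filename -> checksum (assumed to be known already,
              so a package need not be fetched just to see if it changed)
    cache:    dict of checksum -> previously extracted metadata (mutated)
    extract:  hypothetical callable doing the expensive scan of one file
    """
    result = {}
    scanned = 0
    for filename, checksum in packages.items():
        if checksum not in cache:
            # Only here would the package have to be fetched and unpacked.
            cache[checksum] = extract(filename)
            scanned += 1
        result[filename] = cache[checksum]
    return result, scanned
```

On a second run with one new package added, only that package triggers a call to `extract`; everything else is served from the cache.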
This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes... |
@xsuchy By the way, I would make sure COPR gets upgraded to createrepo_c 0.20.1 for the
Strictly speaking, couldn't You mentioned that building the appstream metadata is very expensive in terms of memory - well, so is |
Separate question: Does appstream metadata really need to be generated for all packages, or only the latest version of each package present? Because that's another potential optimization and one that |
Only the latest version, and So, for newer implementations, please use |
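Restricting the scan to the latest version of each package could be sketched as a simple pre-filter. Note the loud assumption: versions are compared here as dotted integers, which is a simplification for illustration. Real RPM ordering uses epoch, version, and release with its own comparison rules, so a real implementation would use the package manager's comparison function instead. The function name `latest_only` is hypothetical.

```python
def latest_only(packages):
    """Keep the highest version per package name.

    packages: iterable of (name, version) pairs, where version is a dotted
              numeric string such as "1.10". SIMPLIFICATION: this is not
              RPM's epoch:version-release comparison.
    """
    newest = {}
    for name, version in packages:
        key = tuple(int(part) for part in version.split("."))
        if name not in newest or key > newest[name][0]:
            newest[name] = (key, version)
    return {name: version for name, (_, version) in newest.items()}
```

Only the packages that survive this filter would then need to be handed to the metadata generator.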
Right, that's what I meant, it's not helpful to have all these tools with similar names :) So if that's the case, then maybe COPR should start with using a workflow based on
|
@dralley That would make sense to at least test :-) I see two potential issues with asgen: 1) It's written in D, so compared to C it has limited platform support (may not be a problem, arm64 and amd64 are well supported) and 2) It was originally written for Debian/Ubuntu, so it might have a bunch of workflow assumptions that do not jibe well at all with createrepo_c's expected workflow. |
Spinning off to separate discussion ximion/appstream-generator#104 |
I have probably overlooked something, but do any of the libraries support merging AppStream metadata? |
I've run some tests, but sadly there's no obvious difference. I replied directly into #323. |
I believe both libappstream and libappstream-glib support this. |
Merging stuff is most painless on a per-component level (replace one component entirely with another). Both libraries do also support merging data between components, but that's an extremely messy thing (and leads to annoying issues later, when you want to figure out where broken metadata actually came from...). So, "replace component if the same name" is IMHO the better thing to do when merging :-) |
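The "replace the whole component" strategy is simple enough to sketch in a few lines. This is an illustration only and does not use either library's API: components are modeled as plain dicts, and the `"id"` key used to decide that two components are the same is an assumption of this sketch.

```python
def merge_components(base, override):
    """Merge two component lists; on a clash the override wins wholesale.

    No field-level merging is attempted, so every component in the result
    comes from exactly one source, which keeps broken metadata traceable.
    """
    merged = {component["id"]: component for component in base}
    for component in override:
        merged[component["id"]] = component  # replace entirely, never blend
    return list(merged.values())


if __name__ == "__main__":
    repo_a = [{"id": "org.example.Foo", "summary": "old"},
              {"id": "org.example.Bar", "summary": "bar"}]
    repo_b = [{"id": "org.example.Foo", "summary": "new"}]
    print(merge_components(repo_a, repo_b))
```

Because a component is never assembled from two sources, there is no question later about which repository contributed which field.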
I also don't think any sophisticated merging is needed, at least if I'm not missing something. On metadata Not sure about the library API, but if there's a way to point the library at RPM filename, and get the metadata, this sounds like a perfectly valid (opt-in, both on build and runtime) RFE. |
LibAppStream can do the merging you want without any issues, however:
That isn't so simple, unfortunately. Because packagers like to split data across multiple packages, e.g. place icons in a |
The appstream-builder (the utility used for modifying metadata) is very I/O demanding. It is known to fail even for not really huge projects. Related: hughsie/appstream-glib#301 The tool is also about to be replaced by appstream-generator or alike: rpm-software-management/createrepo_c#75 Relates: fedora-copr#2358 Fixes: fedora-copr#2419
A lot of people ship free and nonfree code in addon yum repos for Fedora. They add the rpms to a directory, run createrepo_c and then tell the world about their awesome new repo. Some even get selected by the Fedora workstation group to be included by default in Fedora. The new users fire up gnome-software or apper and search for the awesome new tool, but nothing is found. I normally have to point them at https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/ and get them to update their release tooling.
Could we move to a model where createrepo_c automatically generates the AppStream metadata (either default on, or default off) by calling the appstream-builder executable if it is installed? The other alternative is that I write a patch for createrepo_c to use libappstream-builder.so, but that has some deps that you might find unpalatable. I'm open to ideas and am willing to write patches if you agree this is something you'd permit me to do.
Thanks!