New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefix the updateinfo file with its hash #35
Conversation
So, don't merge this right now. The unit tests pass, but I'd like to give it some more testing before I feel confident enough in this change. However, I wanted to make a pull request ASAP so that we can start the discussion around the change. All in all, is this a change you'd want? I strongly feel that it's a good change, but I'd understand if you don't want it. After all, it doesn't really matter for Fedora. |
This is similar to what is done with other Yum repository metadata. It is not just about consistency though, it fixes real issues. For example, at work, we use the Amazon CDN to implement mirroring of our Yum repositories. All files are cached on the CDN for the default duration (i think it's 24 hours), except repomd.xml which is never cached. This works great for RPMS (they never change once published) and all the metadata files (their names are prefixed with their hash, and as such the CDN sees them as different files). However, we keep running into trouble with updateinfo.xml.gz: the CDN keeps distributing the old file even though a new one was generated. Because the updateinfo metadata can be quite big, we'd rather prefix its name with a hash (like other Yum repository metadata), and let the CDN handle it, rather than telling the CDN to never cache it, which would put more load on our primary repository server.
All other Yum repository metadata are already using this stronger hash, so let's be consistent and more secure.
If someone tries to uncomment these for debugging purpose, they might not realize that they don't reflect reality any more, and might waste lots of time trying to figure out what is going on.
Now that we prefix the file name with its hash, each written file will have a different name. As a result, we need to remove the previous versions when we load it, otherwise we end up accumulating updateinfo.xml.gz metadata files.
I agree that the filename inconsistency is an eyesore, and I'm definitely in favor of fixing it. So I haven't tested this, but I think we might be able to do something like: --- a/bodhi/metadata.py
+++ b/bodhi/metadata.py
@@ -28,9 +28,12 @@ from urlgrabber.grabber import URLGrabError, urlgrab
from bodhi.util import get_repo_tag
from bodhi.model import PackageBuild, PackageUpdate
from bodhi.buildsys import get_session
-from bodhi.modifyrepo import RepoMetadata
from bodhi.exceptions import RepositoryNotFound
+import sys
+sys.path.insert(0, '/usr/share/createrepo')
+from modifyrepo import RepoMetadata
+
from yum.update_md import UpdateMetadata
log = logging.getLogger(__name__)
@@ -320,6 +323,7 @@ class ExtendedMetadata(object):
for arch in os.listdir(self.repo):
try:
repomd = RepoMetadata(join(self.repo, arch, 'repodata'))
+ repomd.unique_md_filenames = True
log.debug("Inserting updateinfo.xml.gz into %s/%s" % (self.repo, arch))
repomd.add(self.doc)
except RepositoryNotFound:
@@ -336,6 +340,7 @@ class ExtendedMetadata(object):
urlgrab(tags_url, filename=local_tags)
for arch in os.listdir(self.repo):
repomd = RepoMetadata(join(self.repo, arch, 'repodata'))
+ repomd.unique_md_filenames = True
repomd.add(local_tags)
except Exception, e:
log.exception(e) |
Amazing! I'll rework the patches to do that, then. |
So, I've been on this for a few hours, and it's not great. As a first test, I tried merely replacing Unfortunately, that all breaks on EL6: https://bugzilla.redhat.com/show_bug.cgi?id=1073239 The easy way out is to have But then: https://bugzilla.redhat.com/show_bug.cgi?id=1073256 |
Now that we prefix the updateinfo metadata with its hash, the Bodhi masher must be taught to find it. The bonus point of using a glob is that it even handles migration from non-prefixed to prefixed file name. :)
This assertion was failing due to a syntax error in my masher/updateinfo code. Adding this message allowed me to find it quickly, and it could be useful in the future.
Just found an issue with the first implementation, so I just pushed a fix. I finally found the time to test it, and our production instance is now running with these changes applied, happily prefixing the |
Awesome! Looks good to me 馃憤 |
Prefix the updateinfo file with its hash
All yum repository metadata files are prefixed by their hash. (except
repomd.xml
, for obvious reasons)For example:
However, the updateinfo one is not, which is a bit of an eyesore. 馃槈
It's more than just cosmetics and consistency, though, this is causing us a real issue.
At work, we use the Amazon CDN to implement mirroring of our Yum repositories. All files are cached on the CDN for the default duration (i think it's 24 hours), except
repomd.xml
which is never cached.This works great for RPMS (they never change once published) and all the metadata files (their names are prefixed with their hash, and as such the CDN sees them as different files).
However, we keep running into trouble with
updateinfo.xml.gz
: the CDN keeps distributing the old file even though a new one was generated.Because the updateinfo metadata can be quite big, we'd rather prefix its name with a hash (like other Yum repository metadata), and let the CDN handle it, rather than telling the CDN to never cache it, which would put more load on our primary repository server.