Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefix the updateinfo file with its hash #35

Merged
merged 7 commits into from Mar 12, 2014
Merged

Prefix the updateinfo file with its hash #35

merged 7 commits into from Mar 12, 2014

Conversation

bochecha
Copy link
Contributor

@bochecha bochecha commented Mar 4, 2014

All yum repository metadata files are prefixed by their hash. (except repomd.xml, for obvious reasons)

For example:

# sha256sum *
26ccf3fcca97e8dc2f8a3ce6d4b2f49e482b1d8d51cdf93438cdc08e3d4b6e4a  26ccf3fcca97e8dc2f8a3ce6d4b2f49e482b1d8d51cdf93438cdc08e3d4b6e4a-primary.xml.gz

However, the updateinfo one is not, which is a bit of an eyesore. 馃槈

It's more than just cosmetics and consistency, though, this is causing us a real issue.

At work, we use the Amazon CDN to implement mirroring of our Yum repositories. All files are cached on the CDN for the default duration (i think it's 24 hours), except repomd.xml which is never cached.

This works great for RPMS (they never change once published) and all the metadata files (their names are prefixed with their hash, and as such the CDN sees them as different files).

However, we keep running into trouble with updateinfo.xml.gz: the CDN keeps distributing the old file even though a new one was generated.

Because the updateinfo metadata can be quite big, we'd rather prefix its name with a hash (like other Yum repository metadata), and let the CDN handle it, rather than telling the CDN to never cache it, which would put more load on our primary repository server.

@bochecha
Copy link
Contributor Author

bochecha commented Mar 4, 2014

So, don't merge this right now. The unit tests pass, but I'd like to give it some more testing before I feel confident enough in this change.

However, I wanted to make a pull request ASAP so that we can start the discussion around the change.

All in all, is this a change you'd want?

I strongly feel that it's a good change, but I'd understand if you don't want it. After all, it doesn't really matter for Fedora.

Mathieu Bridon added 4 commits March 4, 2014 17:50
This is similar to what is done with other Yum repository metadata.

It is not just about consistency though, it fixes real issues.

For example, at work, we use the Amazon CDN to implement mirroring of
our Yum repositories. All files are cached on the CDN for the default
duration (i think it's 24 hours), except repomd.xml which is never
cached.

This works great for RPMS (they never change once published) and all
the metadata files (their names are prefixed with their hash, and as
such the CDN sees them as different files).

However, we keep running into trouble with updateinfo.xml.gz: the CDN
keeps distributing the old file even though a new one was generated.

Because the updateinfo metadata can be quite big, we'd rather prefix
its name with a hash (like other Yum repository metadata), and let the
CDN handle it, rather than telling the CDN to never cache it, which
would put more load on our primary repository server.
All other Yum repository metadata are already using this stronger hash,
so let's be consistent and more secure.
If someone tries to uncomment these for debugging purpose, they might
not realize that they don't reflect reality any more, and might waste
lots of time trying to figure out what is going on.
Now that we prefix the file name with its hash, each written file will
have a different name.

As a result, we need to remove the previous versions when we load it,
otherwise we end up accumulating updateinfo.xml.gz metadata files.
@lmacken
Copy link
Contributor

lmacken commented Mar 6, 2014

I agree that the filename inconsistency is an eyesore, and I'm definitely in favor of fixing it.

So bodhi.modifyrepo needs to go away. It went upstream into createrepo a long time ago, and has since gained support for unique metadata naming.

I haven't tested this, but I think we might be able to do something like:

--- a/bodhi/metadata.py
+++ b/bodhi/metadata.py
@@ -28,9 +28,12 @@ from urlgrabber.grabber import URLGrabError, urlgrab
 from bodhi.util import get_repo_tag
 from bodhi.model import PackageBuild, PackageUpdate
 from bodhi.buildsys import get_session
-from bodhi.modifyrepo import RepoMetadata
 from bodhi.exceptions import RepositoryNotFound

+import sys
+sys.path.insert(0, '/usr/share/createrepo')
+from modifyrepo import RepoMetadata
+
 from yum.update_md import UpdateMetadata

 log = logging.getLogger(__name__)
@@ -320,6 +323,7 @@ class ExtendedMetadata(object):
         for arch in os.listdir(self.repo):
             try:
                 repomd = RepoMetadata(join(self.repo, arch, 'repodata'))
+                repomd.unique_md_filenames = True
                 log.debug("Inserting updateinfo.xml.gz into %s/%s" % (self.repo, arch))
                 repomd.add(self.doc)
             except RepositoryNotFound:
@@ -336,6 +340,7 @@ class ExtendedMetadata(object):
                 urlgrab(tags_url, filename=local_tags)
                 for arch in os.listdir(self.repo):
                     repomd = RepoMetadata(join(self.repo, arch, 'repodata'))
+                    repomd.unique_md_filenames = True
                     repomd.add(local_tags)
             except Exception, e:
                 log.exception(e)

@bochecha
Copy link
Contributor Author

bochecha commented Mar 6, 2014

So bodhi.modifyrepo needs to go away. It went upstream into createrepo a long time ago, and has since gained support for unique metadata naming.

Amazing!

I'll rework the patches to do that, then.

@bochecha
Copy link
Contributor Author

bochecha commented Mar 6, 2014

So, I've been on this for a few hours, and it's not great.

As a first test, I tried merely replacing bodhi/modifyrepo.py by /usr/share/creatrepo/modifyrepo.py, just for easy testing.

Unfortunately, that all breaks on EL6: https://bugzilla.redhat.com/show_bug.cgi?id=1073239

The easy way out is to have bodhi/metadata.py write its minidom.Document instance to a temp file, and pass the path to this file instead of the minidom.Document instance.

But then: https://bugzilla.redhat.com/show_bug.cgi?id=1073256

Mathieu Bridon added 3 commits March 12, 2014 19:51
Now that we prefix the updateinfo metadata with its hash, the Bodhi
masher must be taught to find it.

The bonus point of using a glob is that it even handles migration from
non-prefixed to prefixed file name. :)
This assertion was failing due to a syntax error in my masher/updateinfo
code.

Adding this message allowed me to find it quickly, and it could be
useful in the future.
@bochecha
Copy link
Contributor Author

Just found an issue with the first implementation, so I just pushed a fix.

I finally found the time to test it, and our production instance is now running with these changes applied, happily prefixing the updateinfo.xml.gz file with its hash. 馃槃

@lmacken
Copy link
Contributor

lmacken commented Mar 12, 2014

Awesome! Looks good to me 馃憤

lmacken added a commit that referenced this pull request Mar 12, 2014
Prefix the updateinfo file with its hash
@lmacken lmacken merged commit 3a3cc31 into fedora-infra:develop Mar 12, 2014
@bochecha bochecha deleted the updateinfo branch March 13, 2014 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants