Possibility to auto-disable outdated mirrors #150

Open
poeml opened this Issue Jun 5, 2015 · 0 comments

Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue150

Title:       Possibility to auto-disable outdated mirrors
Priority:    feature
Status:      chatting
Superseder:
Nosy List:   poeml, rhertzog
Assigned To: poeml
Keywords:

msg543 Author: rhertzog Date: 2014-02-17.15:14:50

MirrorBrain regularly checks that mirrors are online and working, but it doesn't
detect mirrors that are stale and outdated. It would be really useful if we
could teach MirrorBrain how to detect outdated mirrors so that it could disable
them automatically.

The simple answer would be to have a parameter that we can point to a script
that tests the mirror and tells MirrorBrain whether it is up-to-date (exit
code 0), outdated (exit code 1), or whether there was an error (any other exit
code). The information about the mirror to check would be passed either via
environment variables or via command-line parameters. That way we can implement
any policy... but it requires scripting skills.
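
The exit-code contract above could be consumed roughly as follows. This is only
a sketch, not existing MirrorBrain code: the function names and the environment
variable names MB_MIRROR_ID and MB_MIRROR_BASEURL are made up for illustration.

```python
import os
import subprocess

UP_TO_DATE, OUTDATED, ERROR = "up-to-date", "outdated", "error"


def classify_exit_code(code):
    """Map the check script's exit code to a mirror state."""
    if code == 0:
        return UP_TO_DATE
    if code == 1:
        return OUTDATED
    return ERROR  # any other exit code, including death by signal


def run_check(command, mirror_id, base_url, timeout=60):
    """Run an admin-supplied check script for one mirror.

    The mirror details are passed via environment variables; the
    variable names are hypothetical, not an existing interface.
    """
    env = dict(os.environ, MB_MIRROR_ID=mirror_id, MB_MIRROR_BASEURL=base_url)
    try:
        result = subprocess.run(command, env=env, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Script missing, not executable, or hanging: treat as an error,
        # not as an outdated mirror.
        return ERROR
    return classify_exit_code(result.returncode)
```

Treating a crashed or missing script as "error" rather than "outdated" keeps a
broken check from disabling every mirror at once.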

Another approach could be to define a path that must be in sync between each
mirror and the master copy (same size and same SHA1 checksum) for the mirror to
be considered up-to-date. But since synchronization takes time, we must be able
to define a grace period before deciding to disable the mirror.
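
A sketch of that decision, assuming the size and SHA1 of the reference file on
the mirror have already been fetched somehow. All names here are hypothetical,
and the return value follows the exit-code convention proposed above (0 =
up-to-date, 1 = outdated).

```python
import hashlib
import time


def sha1_of(path, chunk_size=1 << 20):
    """SHA1 hex digest of a local file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exit_code_for(master_size, master_sha1, mirror_size, mirror_sha1,
                  master_changed_at, grace_seconds, now=None):
    """Return 0 (keep the mirror) or 1 (outdated).

    master_changed_at is the Unix time at which the master copy of the
    reference file last changed.
    """
    now = time.time() if now is None else now
    if (mirror_size, mirror_sha1) == (master_size, master_sha1):
        return 0  # identical to the master copy
    if now - master_changed_at <= grace_seconds:
        return 0  # differs, but the mirror may still be syncing
    return 1      # differs and the grace period has expired
```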

Or better, we could implement the first setting and provide a sample script that
implements the second solution, reading the required parameters from
mirrorbrain.conf.

msg547 Author: poeml Date: 2014-02-20.01:23:05

Very good idea. This would make MirrorBrain useful in more scenarios.
The current mirror checking is so minimal that it's amazing we got
so far with it. Historically, checking mirror freshness was neglected
because it doesn't matter for file trees where files never change in place
but get new names instead (at least an incremented counter). Files
that change but keep the same name were therefore always a problem. At
openSUSE, requests for some of those files were never redirected to
mirrors for that reason. Of course, it may be complicated or impossible
for admins to get rid of those files.

Fedora solved the same issue by having their redirector reply with a
Metalink that uses a Metalink protocol extension to list several
variants of a file (any of which might be encountered on a mirror). The
redirector effectively tells the client: if the mirror has this file, it's
okay, and if it has another of the listed variants, that's also okay.

Scanning the mirrors more deeply, including mtime, file size and
calculated hashes, isn't realistic in many cases, I think (it
might be in some, of course). A compromise could be mtime and file size,
the same as rsync does (unless forced to look into files with -c).
But only rsync scanning would achieve this reliably. HTTP scanning is
more fragile, and FTP scanning isn't perfect either (character set
issues, time format not standardized).
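
The mtime-and-size compromise could look like the sketch below. It assumes
both file listings are already available as dictionaries (for example parsed
from an rsync listing). The FileInfo type and the function name are invented
here, and nothing is hashed.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    size: int   # bytes
    mtime: int  # modification time, seconds since the epoch


def quick_check(master, mirror):
    """Return the paths whose size or mtime on the mirror differs from
    the master copy, or that are missing on the mirror.

    Both arguments map a relative path to a FileInfo.
    """
    return sorted(path for path, info in master.items()
                  if mirror.get(path) != info)


master = {"repo/a.rpm": FileInfo(100, 1392650000),
          "repo/repomd.xml": FileInfo(3000, 1392650090)}
mirror = {"repo/a.rpm": FileInfo(100, 1392650000),
          "repo/repomd.xml": FileInfo(2950, 1392563690)}
stale_paths = quick_check(master, mirror)  # only repomd.xml differs
```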

This is just background. The idea of checking the sync status of the
mirrors would be a big step forward.

I agree with making the check adaptable, and creating a useful default
check. There's a script to create a small timestamp file, which could be
used to detect the "sync age". Another check could be for a certain
arbitrary file. It would be easy to say "mb, use only mirrors that have
file foo" or "mb, use only mirrors where timestamp is not older than 12
hours" or "mb, use only mirrors where the content of file bar is
identical to our local copy".
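
The "timestamp is not older than 12 hours" rule might be sketched like this.
It assumes the timestamp file starts with a Unix epoch in seconds; the format
written by the existing timestamp script is not described here, so that is an
assumption, and the function name is hypothetical.

```python
import time


def is_fresh(timestamp_text, max_age_hours=12, now=None):
    """Decide whether a mirror's timestamp file is recent enough.

    timestamp_text is the content of the timestamp file as fetched
    from the mirror; its first whitespace-separated field is assumed
    to be a Unix epoch in seconds.
    """
    now = time.time() if now is None else now
    try:
        stamp = int(timestamp_text.split()[0])
    except (IndexError, ValueError):
        return False  # empty or unparsable file: don't trust the mirror
    return now - stamp <= max_age_hours * 3600
```

An empty or unparsable timestamp file counts as "not fresh" in this sketch,
which is the more cautious choice.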

A mirmon-like status report could be generated at the same time.

Several times I have wondered whether /etc/mirrorbrain.conf should contain a
setting for the DocRoot of Apache (which is the root of the file tree).
That would be very handy for implementing checks, creating timestamps and
doing further things from an 'mb' command with little effort for the user.
(The 'mb makehashes' call would also be less complicated and less
error-prone.) I think this setting is needed.

Further notes:

  • I committed a small function in r8481 that finds a random file in a local file tree, which could be used for a fully automatic test (the admin doesn't even need to specify a file then). I wrote it recently when I felt that mirror checking finally needed to be advanced...
  • There's 'mb test', which doesn't do much yet, but could be the container for the new functionality. (I also need to check what kind of functionality is in mb/mb/testmirror.py, maybe there's something useful already.)
  • Especially for a mirror that's newly added to the database, the first thing that one wants to know is if the mirror is working and if it was correctly configured (the mirror itself, but also its URLs in the mb database). It should be easy to run a test and see if everything is fine. Thinking of automatic plausibility tests...
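
For illustration, picking a random file from a local tree can be done in a
single pass. This is an independent sketch, not the function committed in
r8481, and the name random_file is invented here.

```python
import os
import random


def random_file(root, rng=random):
    """Return the path (relative to root) of a uniformly chosen regular
    file below root, or None if the tree contains no files.

    Uses reservoir sampling with a reservoir of one, so the tree is
    walked only once and the file list is never held in memory.
    """
    chosen = None
    seen = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            seen += 1
            # Replace the current pick with probability 1/seen.
            if rng.randrange(seen) == 0:
                chosen = os.path.relpath(os.path.join(dirpath, name), root)
    return chosen
```

The returned relative path could then be requested from each mirror and
compared with the local copy.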

History

Date                 User      Action  Args
2014-02-20 01:23:05  poeml     set     messages: + msg547
                                       status: unread -> chatting
2014-02-17 21:29:18  poeml     set     assignedto: poeml
                                       nosy: + poeml
2014-02-17 15:14:50  rhertzog  create

(end of migrated issue)
poeml added the enhancement label Jun 5, 2015