Debian mirrors HTTP redirector

Intro
=====

This is a work in progress. Please do send patches and provide feedback.
Thanks!

The project is similar to mirrorbrain (.org) and fedora's mirrors system.
However, it has a few differences (this list is not intended to be complete):

* it is very specific to the way Debian mirrors are constructed. Details
  regarding architectures and the different mirror types are taken into
  consideration.
* because of the previous point and considering many mirrors only support
  http, it does not perform a full mirror scan. Mirrorbrain does.
  There's a tool to detect inconsistencies between what the mirrors master
  list claims a mirror contains and what it actually contains.
* it aims to be httpd-independent. Mirrorbrain requires Apache.
* IPv6 support
* no DBMS. Although using a DBMS could provide some advantages, at the
  moment the Storable database seems to be enough. The idea is to keep
  everything simple.
* easy deployment

Live instance
=============

There's a live instance of this code (but not necessarily the latest and
greatest revision) at http://http.debian.net/

There's some more documentation there. It should be imported into the
repository, however.

TODO
====

There are some TODOs and FIXMEs in the source code. Additionally, what
needs to be done is:

* Switch to Plack. Plack allows the redirector to run as a CGI or under any
  of the many variations that exist. Using it should also make it easier to
  add an option to serve the file ourselves. Some mirror admins are
  interested in doing that to offload their mirrors.
* Use the zebra dump parser to try to make better decisions.
* Improve the mirror checker. Monitor mirrors (probably by ping), check
  IPv6 support (one time check, like --check-architectures)
* Better IPv6 support. Better handling of teredo clients.
* Consider splitting mirrors into subsets based on the master trace stamp.

Getting started
===============

Required packages:

    libcgi-simple-perl libgeo-ip-perl wget

Run ./update.sh. It will download the geoip databases and the mirrors list,
build the database used by the redirector, and check the mirrors for errors.

Check the first lines of redir.pl for the invocation (or look at the
example below.)

Keeping everything in shape
===========================

* update.sh should be run at least once a month.
* update.pl should be run whenever the master list changes.
* check.pl should be run multiple times a day.

Rebuilding the database discards any info collected by check.pl regarding
the availability of mirrors. check.pl --check-architectures should therefore
be run after update.pl; this is done automatically when running update.sh.

How often check.pl should run really depends on the kind of setup one wants.

Real life testing
=================

If using Apache, assuming you have the redir script in a /cgi alias with
+ExecCGI, you can:

    RewriteEngine On
    RewriteRule ^/?debian-(security|backports)/(.*) /cgi/redir.pl?mirror=$1&url=$2 [PT]
    RewriteRule ^/?debian-archive/(.*) /cgi/redir.pl?mirror=old&url=$1 [PT]
    RewriteRule ^/?debian/(.*) /cgi/redir.pl?mirror=archive&url=$1 [PT]
    # mirror:// method support:
    RewriteRule ^/?debian-(security|backports)\.list(?:$|\?(.+)) /cgi/redir.pl?mirror=$1.list$2 [QSA,PT]
    RewriteRule ^/?debian-archive\.list(?:$|\?(.+)) /cgi/redir.pl?mirror=old.list$1 [QSA,PT]
    RewriteRule ^/?debian\.list(?:$|\?(.+)) /cgi/redir.pl?mirror=archive.list$1 [QSA,PT]

You can, for example, make it listen on 127.0.1.10, set up a vhost, and use
the following in your sources.list:

    deb http://127.0.1.10/debian/ sid main
    deb-src http://127.0.1.10/debian/ sid main
    deb http://127.0.1.10/debian-security/ testing/updates main
    deb http://security.debian.org/ testing/updates main

Note: accessing the redirector from a local IP address is not ideal and may
only work with hacks.
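
To see how the first three RewriteRule lines in "Real life testing" map a
request path onto the mirror and url parameters of redir.pl, here is a
minimal Python sketch. It only imitates the pattern matching with Python's
re module and is not part of the redirector: the names RULES and rewrite are
made up for illustration, the mirror:// rules are left out, and it assumes
that the first matching rule wins (the [PT] flag stops further rewriting).

```python
import re

# (pattern, template) pairs transcribed from the first three RewriteRule
# lines. Python's \1 and \2 stand in for Apache's $1 and $2.
RULES = [
    (r"^/?debian-(security|backports)/(.*)", r"/cgi/redir.pl?mirror=\1&url=\2"),
    (r"^/?debian-archive/(.*)", r"/cgi/redir.pl?mirror=old&url=\1"),
    (r"^/?debian/(.*)", r"/cgi/redir.pl?mirror=archive&url=\1"),
]


def rewrite(path):
    """Return the rewritten target for path, or None if no rule matches."""
    for pattern, template in RULES:
        match = re.match(pattern, path)
        if match:
            # Substitute the captured groups into the target template.
            return match.expand(template)
    return None


if __name__ == "__main__":
    for path in (
        "/debian/dists/sid/Release",
        "/debian-security/dists/testing/updates/Release",
        "/debian-archive/dists/README",
        "/unrelated/path",
    ):
        print(path, "->", rewrite(path))
```

Running the script prints the rewritten target for each sample path, which
is a quick way to check which mirror type a given sources.list entry ends up
requesting.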

Assumptions
===========

The current IPv6 support is based on the following assumptions on the
mirror's version 6 connectivity:

* The IPv4 and IPv6 addresses belong to the same server
* The IPv4 and IPv6 addresses are on the same AS

Understanding the db
====================

The database consists of (mostly inverted) indexes that are supposed to
provide fast and cheap lookups.

In order to save space on the database, a few unusual things are done.
For example, hash entries with undef as value are valid. undef is smaller
in a Storable database than an integer. Any script using the database should
therefore test for 'exists' instead of 'defined'.

To better understand what the database looks like, run

    ./dump-db.pl | pager

Credits
=======

"This product includes GeoLite data created by MaxMind, available from
http://maxmind.com/"
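
Appendix: 'exists' versus 'defined', illustrated
================================================

The advice in "Understanding the db" to test for 'exists' instead of
'defined' can be illustrated with a small analogy in Python, where None
plays the role of Perl's undef. This is only a sketch: the index layout, the
country codes, and the mirror names below are invented for the example and
do not reflect the real database structure.

```python
# Hypothetical inverted index: country code -> mirrors serving it.
# The values are None because only the presence of the key carries
# information, like the undef-valued hash entries in the Storable db.
index = {
    "AR": {"mirror-a.example": None, "mirror-b.example": None},
    "DE": {"mirror-c.example": None},
}


def listed_by_value(country, mirror):
    """Analogous to Perl's 'defined': wrongly rejects None-valued entries."""
    return index.get(country, {}).get(mirror) is not None


def listed_by_key(country, mirror):
    """Analogous to Perl's 'exists': tests key presence only."""
    return mirror in index.get(country, {})


if __name__ == "__main__":
    # The entry is present, but the value-based check misses it.
    print(listed_by_key("AR", "mirror-a.example"))
    print(listed_by_value("AR", "mirror-a.example"))
```

The value-based check treats every entry as missing, which is the mistake a
script would make by using 'defined' on this kind of index.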