Debian mirrors HTTP redirector

Intro
=====

This is a work in progress. Please do send patches and provide feedback.
Thanks!

The project is similar to MirrorBrain (mirrorbrain.org) and Fedora's mirror
system. However, it differs from them in a few ways (this list is not
intended to be complete):

* it is tailored to the way Debian mirrors are structured: details such
  as the architectures a mirror carries and the different mirror types
  are taken into account.
* because of the previous point, and since many mirrors only support
  HTTP, it does not perform a full mirror scan (MirrorBrain does).
  Instead, there is a tool that detects inconsistencies between what the
  mirrors master list claims a mirror contains and what it actually
  contains.
* it aims to be independent of the HTTP server; MirrorBrain requires
  Apache.
* IPv6 support
* no DBMS. Although a DBMS could provide some advantages, the Storable
  database seems to be enough for now. The idea is to keep everything
  simple.
* easy deployment

Live instance
=============

There's a live instance of this code (but not necessarily the latest
and greatest revision) at http://http.debian.net/

There's some more documentation there, which should eventually be
imported into this repository.

TODO
====

There are some TODOs and FIXMEs in the source code. In addition, the
following needs to be done:

* Switch to Plack. Plack would let the redirector run as a CGI script or
  in any of the many other setups that exist. Using it should also make
  it easier to add an option to serve the file ourselves; some mirror
  admins are interested in doing that to offload their mirrors.

* Use the zebra dump parser to try to make better decisions.

* Improve the mirror checker: monitor mirrors (probably by ping) and
  check IPv6 support (a one-time check, like --check-architectures).

* Better IPv6 support, including better handling of Teredo clients.

* Consider splitting mirrors into subsets based on the master trace
  stamp.

Getting started
===============

Required packages:
    libcgi-simple-perl
    libgeo-ip-perl
    wget

Run ./update.sh. It downloads the GeoIP databases and the mirrors list,
builds the database used by the redirector, and checks the mirrors for
errors.

Check the first lines of redir.pl for the invocation (or look at the
example below).

Keeping everything in shape
===========================

* update.sh should be run at least once a month.
* update.pl should be run whenever the master list changes [1].
* check.pl should be run several times a day [2].

[1] update.pl rebuilds the database, so any information about mirror
availability collected by check.pl is lost. Run
check.pl --check-architectures after update.pl; update.sh does this
automatically.

[2] How often really depends on the kind of setup one wants.

Real life testing
=================

If you use Apache and have the redir script under a /cgi alias with
+ExecCGI, you can use the following rewrite rules:

   RewriteEngine On
   RewriteRule ^/?debian-(security|backports)/(.*) /cgi/redir.pl?mirror=$1&url=$2 [PT]
   RewriteRule ^/?debian-archive/(.*) /cgi/redir.pl?mirror=old&url=$1 [PT]
   RewriteRule ^/?debian/(.*) /cgi/redir.pl?mirror=archive&url=$1 [PT]

   # mirror:// method support:
   RewriteRule ^/?debian-(security|backports)\.list(?:$|\?(.+)) /cgi/redir.pl?mirror=$1.list$2 [QSA,PT]
   RewriteRule ^/?debian-archive\.list(?:$|\?(.+)) /cgi/redir.pl?mirror=old.list$1 [QSA,PT]
   RewriteRule ^/?debian\.list(?:$|\?(.+)) /cgi/redir.pl?mirror=archive.list$1 [QSA,PT]
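
To check what these rules hand over to redir.pl without setting up
Apache, the translation can be sketched in Python. This is an
illustration only, not part of the redirector; the names RULES and
rewrite are made up for the example. The sketch relies on two documented
mod_rewrite behaviours: a RewriteRule pattern is matched against the URL
path only, so the \?(.+) alternatives never match and the trailing
$1/$2 in the .list rules is always empty; and [QSA] appends the client's
query string, while the other rules discard it because their
substitution already contains a query string.

```python
import re
from urllib.parse import parse_qsl

# Each entry: (path pattern, builder of redir.pl parameters, QSA flag).
# The patterns are the Apache ones above, minus the never-matching
# "\?(.+)" alternative, since RewriteRule never sees the query string.
RULES = [
    (re.compile(r"^/?debian-(security|backports)/(.*)"),
     lambda m: [("mirror", m.group(1)), ("url", m.group(2))], False),
    (re.compile(r"^/?debian-archive/(.*)"),
     lambda m: [("mirror", "old"), ("url", m.group(1))], False),
    (re.compile(r"^/?debian/(.*)"),
     lambda m: [("mirror", "archive"), ("url", m.group(1))], False),
    # mirror:// method support
    (re.compile(r"^/?debian-(security|backports)\.list$"),
     lambda m: [("mirror", m.group(1) + ".list")], True),
    (re.compile(r"^/?debian-archive\.list$"),
     lambda m: [("mirror", "old.list")], True),
    (re.compile(r"^/?debian\.list$"),
     lambda m: [("mirror", "archive.list")], True),
]


def rewrite(path, query=""):
    """Return the redir.pl parameters for a request, or None if no rule matches.

    The first matching rule wins, because [PT] implies [L] in mod_rewrite.
    """
    for pattern, build, qsa in RULES:
        match = pattern.match(path)
        if match:
            params = build(match)
            if qsa:
                # [QSA]: append the client's original query string.
                params += parse_qsl(query)
            # Without [QSA] the original query string is discarded.
            return params
    return None


print(rewrite("/debian/dists/sid/Release"))
# [('mirror', 'archive'), ('url', 'dists/sid/Release')]
print(rewrite("/debian.list", "foo=bar"))  # made-up query string
# [('mirror', 'archive.list'), ('foo', 'bar')]
```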

For example, you can make Apache listen on 127.0.1.10, set up a vhost,
and use the following in your sources.list:

    deb http://127.0.1.10/debian/ sid main
    deb-src http://127.0.1.10/debian/ sid main

    deb http://127.0.1.10/debian-security/ testing/updates main
    deb http://security.debian.org/ testing/updates main

Note: accessing the redirector from a local IP address is not ideal,
since a local address cannot be geolocated, and may only work with hacks.

Assumptions
===========

The current IPv6 support is based on the following assumptions about a
mirror's IPv6 connectivity:

* The IPv4 and IPv6 addresses belong to the same server
* The IPv4 and IPv6 addresses are in the same AS

Understanding the db
====================

The database consists of (mostly inverted) indexes that are supposed to
provide fast and cheap lookups.

To save space in the database, a few unusual things are done. For
example, hash entries whose value is undef are valid, because undef
takes less space than an integer in a Storable database. Any script
using the database should therefore test with 'exists' rather than
'defined'.
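
Both points can be illustrated with a small Python analogue, where None
stands in for undef and a dict whose values are all None stands in for a
Perl hash used as a set. The mirror names, the country and archs fields
and the helper names (build_index, has, has_defined) are hypothetical;
the real indexes are built by update.pl and have their own layout.

```python
# Hypothetical input: the real data comes from the Debian mirrors master list.
MIRRORS = {
    "mirror-a.example": {"country": "DE", "archs": ["amd64", "i386"]},
    "mirror-b.example": {"country": "DE", "archs": ["amd64"]},
    "mirror-c.example": {"country": "BR", "archs": ["armel"]},
}


def build_index(mirrors, field):
    """Invert mirror -> value(s) into value -> {mirror: None}.

    Follows the idiom described above: the stored value is None
    (Perl: undef) and only the presence of a key carries information.
    """
    index = {}
    for name, info in mirrors.items():
        values = info[field]
        if isinstance(values, str):
            values = [values]
        for value in values:
            index.setdefault(value, {})[name] = None
    return index


def has_defined(index, key, mirror):
    # Like Perl's defined(): wrong here, since every stored value is None.
    return index.get(key, {}).get(mirror) is not None


def has(index, key, mirror):
    # Like Perl's exists(): only tests whether the key is present.
    return mirror in index.get(key, {})


by_country = build_index(MIRRORS, "country")
by_arch = build_index(MIRRORS, "archs")

print(has(by_country, "DE", "mirror-a.example"))          # True
print(has_defined(by_country, "DE", "mirror-a.example"))  # False
# Cheap lookup: mirrors in DE that carry amd64 (intersection of key sets).
print(sorted(by_country["DE"].keys() & by_arch["amd64"].keys()))
# ['mirror-a.example', 'mirror-b.example']
```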

To better understand what the database looks like, run ./dump-db.pl | pager

Credits
=======

"This product includes GeoLite data created by MaxMind, available from
http://maxmind.com/"