Title geographic distance ordering for finer mirror selection
Priority wish Status resolved
Superseder Nosy List poeml, theuni
Assigned To poeml Keywords
In some countries (US, Germany) there can be a wealth of mirrors. A specifically
suited mirror can be picked when there's on in the same AS or in the same
network as the client. But otherwise, any mirror from the country will be
picked, which could be from the different end of the country (East coast, West
coast in the us; north/south in Germany).
Another scenario is that no mirror is found in the client's country, but there
are several in its continent. However, which one to choose? Currently, it is a
random choice, which means that, for instance, any European country could be
sent to any European mirror; in most cases, it would probably be better to use a
neighbouring country or the closest one that can be found.
Here's how a finer mirror selection could be achieved in both these cases.
The GeoLite city(!) database provides geographical coordinates. The coordinates
of the mirrors could be filled into the database's mirror records when mirrors
are created, or later, similar as their network and AS data.
When clients' IP addresses are looked up via GeoIP during request processing,
their geographical coordinates would (should) be available as well.
Using a planar approximation of the Haversine formula, the distance to available
mirrors can be calculated, and mirrors sorted by their closeness to the client.
A simple approximation, that is not only easy to implement but also should incur
very low possible overhead, would be described by the following formular:
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
Thus, the available mirrors can be ordered according to their distance to the
http://www.movable-type.co.uk/scripts/latlong.html (Haversine formula)
http://www.movable-type.co.uk/scripts/gis-faq-5.1.html (simple approximation)
(The simple approximation should be suitable insofar the 180° border is pretty
much between Alaska and Russia, and other than islands in the Pacific Ocean all
countries should be usefully covered by it. About the Pacific Ocean I don't know
much, but it is very likely that those clients resolve to satellite links
anyway, and need to be treated specially.)
There is one problem, though: We use a weighted randomization to assign mirrors
that would be equally suitable. That is useful (and used) for load balancing.
With introducing ordering by geographic distance, the question arises how to use
that together with mirror priorities. Alternatives / thoughts:
Should we let geographic distance override priorities? Can the latter become
obsolete? I don't really think so.
Should we use geographic ordering only in certain scenarios: when no country
mirror is available, and when dealing with certain countries (US)?
Should we use just stratify according to geographic distances, instead of
sorting by distance? That is, select the 3 or 5 mirrors closest by distance, and
then do the weighted randomization on them to pick one?
Should the configured priority value and the measured geographic distance be
mixed together into a single number, that's then used for mirror
Should we do geographic distance ordering only within a set of mirrors that
shares the same priority?
At any rate, the static assignment of fallback mirrors (through the
other_countries field) should have precedence. For instance, Russian clients
should still be sent to German mirrors according to configured country-level
fallback, instead of sending them to China, because it is closer, while China
and Russia entertain few connectivity.
Since geographic distance itself is only a rough approximation of suitability
of a mirror for a client, certain analogies that one could think of, like
"radius to serve", are relatively moot, and not worth implementing.
This needs to be discussed.
I just discovered that the database scheme already contains fields for longitude
and latitude -- I totally had forgotten that. They are just not shown by 'mb',
because they are not used. That's cool because no schema migration is needed...
Geographical coordinates are meanwhile stored in the database for mirrors.
(Was implemented around r7936-r7939).
I haven't received any comments on this, but I'd still regard geographical
distance as a very useful addition to the current mirror selection algorithm.
Something that I didn't realize earlier is that the city version of the GeoIP database
also contains "regions" on a level below country. Here are some examples:
# geoiplookup_city -f /var/lib/GeoIP/GeoLiteCity.dat.updated unixheads.net
Region id: CA
City: Redwood City
It works not only for US states, but also for German Bundesländer and French
# geoiplookup_city -f /var/lib/GeoIP/GeoLiteCity.dat.updated hive.aposso.de
Region id: 02
# geoiplookup_city -f /var/lib/GeoIP/GeoLiteCity.dat.updated ftp.free.fr
Region id: A8
So another approach would be to first introduce this additional level in mirror
selection. While this is a geographical approach, it may not give optimal results e.g.
in Germany, where there is at least one huge autonomous system (680) which comprises
most universities, but geographically spans whole Germany.
Another aspect is that when geographical mirror selection becomes too fine-grained,
then mirror priorization becomes less effective: while it is currently possible to
assign only a few requests to certain mirrors, they would then mandatorily get all
traffic from their "region".
But in the US, adding the GeoIP region (state level) would likely be beneficial.
Two arguments for geographical distance:
1) Mirror selection by geographical distance could make the "region" level in the
US superfluous, achieve the same, and it would work also for locations where there
is no mirror in the exact same state. By geographical distance, it would be easy to
pick a mirror in a neighbour state, instead of falling back to country level.
2) In addition, we could artificially "increase" distances of mirrors with low
priority, to attract more requests to farther, but more powerful mirrors -- and
thereby keeping excessive traffic away from small mirrors.
On the other hand, when we introduce "state" level in mirror selection, we would
also add a state_only flag that restricts mirror use to that state. Maybe also
other_states to add more states to deal with.
Also, an AS match would always take precedence, which helps if a mirror is in an AS
that spans many states.
Experimenting with geographical distance is promising. Here's the findings for a client in Belgium, with no
mirror in the country:
[Fri Country 'BE', Continent 'EU'
[Fri Nov 05 14:49:42 2010] [warn] [client 192.168.0.117] [mod_mirrorbrain] AS '5400', Prefix '18.104.22.168/19',
lat/lng 50.833302,4.333300 state id 11, state 'Brussels Hoofdstedelijk Gewest'
same region: ftp.astral.ro (score 100) (rank 6777729) (dist 19.69)
same region: labby.co.uk (score 80) (rank 8786956) (dist 7.08)
same region: ftp-stud.hs-esslingen.de (score 100) (rank 6322872) (dist 5.27)
same region: ftp.solnet.ch (score 100) (rank 8061858) (dist 4.86)
same region: ftp.cc.uoc.gr (score 100) (rank 8365641) (dist 25.94)
same region: neacm.fe.up.pt (score 100) (rank 502272) (dist 16.17)
same region: ftp.tu-chemnitz.de (score 100) (rank 2606190) (dist 8.58)
same region: ftp.free.fr (score 100) (rank 671658) (dist 2.80)
same region: ftp.fernuni-hagen.de (score 100) (rank 10110186) (dist 3.18)
same region: openoffice.dcc.fc.up.pt (score 100) (rank 3443310) (dist 16.17)
same region: mirror.switch.ch (score 100) (rank 2029689) (dist 5.46)
same region: ftp.uni-erlangen.de (score 100) (rank 3410610) (dist 6.79)
same region: ftp.udc.es (score 100) (rank 3767694) (dist 14.25)
same region: ooo.mirror.dkm.cz (score 100) (rank 2693499) (dist 10.16)
same region: openoffice.cict.fr (score 100) (rank 10623903) (dist 7.79)
same region: ftp.heanet.ie (score 50) (rank 19739735) (dist 10.87)
same region: ftp.snt.utwente.nl (score 100) (rank 8726322) (dist 2.92)
same region: ftp.proxad.net (score 100) (rank 8012808) (dist 2.80)
same region: ftp.sunet.se (score 80) (rank 12016420) (dist 16.07)
same region: ooo.mirror.garr.it (score 100) (rank 5669853) (dist 12.09)
By geographical distance, a mirror in France or Netherlands would have been chosen, which sounds better than e.g.
one in Portugal.
Example of a client in Germany:
Country 'DE', Continent 'EU'
AS '3320', Prefix '22.214.171.124/10', lat/lng 50.666698,12.616700 state id 13, state 'Sachsen'
same country: ftp-stud.hs-esslingen.de (score 100) (rank 4746078) (dist 3.92)
same country: ftp.tu-chemnitz.de (score 100) (rank 951243) (dist 0.34)
same country: ftp.fernuni-hagen.de (score 100) (rank 2549292) (dist 5.19)
same country: ftp.uni-erlangen.de (score 100) (rank 1006179) (dist 1.94)
Here, the mirror in Chemnitz (right around the corner) would have been preferred.
An example with another client in Germany, this time one in the huge AS 680 which sports a lot of mirrors (only 3
in my test setup):
Country 'DE', Continent 'EU'AS '680', Prefix '126.96.36.199/16', lat/lng 51.966702,7.633300 state id 07, state
same AS: ftp.tu-chemnitz.de (score 100) (rank 7522635) (dist 5.40)
same AS: ftp.fernuni-hagen.de (score 100) (rank 2116017) (dist 0.64)
same AS: ftp.uni-erlangen.de (score 100) (rank 3404724) (dist 4.12)
same country: ftp-stud.hs-esslingen.de (score 100) (rank 4702914) (dist 3.56)
Here, the mirror in Hagen is indeed much closer than the other two mirrors in the AS.
Too useful to not be implemented. Committed to trunk:
To be released with the upcoming version 2.14.0.
Looks very useful indeed peter, thanks!
The feature is in production at a few sites since a few days, and I got no
catastrophic failure reports so far.
On the centos-mirror mailing list, some interesting scenarios have been discussed
where the new feature is just what's needed!
Closing as "done"!
Date User Action Args
2010-11-11 16:23:55 poeml set status: testing -> resolved
messages: + msg309
2010-11-06 03:23:59 theuni set nosy: + theuni
messages: + msg306
2010-11-05 22:42:58 poeml set status: chatting -> testing
messages: + msg305
2010-11-05 14:10:47 poeml set messages: + msg304
2010-11-04 19:43:40 poeml set messages: + msg303
2010-11-04 19:32:15 poeml set messages: + msg302
2010-09-06 00:11:32 poeml set messages: + msg224
2010-09-05 23:53:25 poeml set assignedto: poeml
2010-04-23 03:11:34 poeml set messages: + msg184
2009-12-21 19:12:21 poeml set messages: + msg111
2009-12-11 15:56:04 poeml create