geographic distance ordering for finer mirror selection #29

poeml opened this Issue Jun 5, 2015 · 0 comments


None yet

1 participant

poeml commented Jun 5, 2015
                                                                                                                 [          ]

Issue migrated (2015-06-05) from old issue tracker

Title    geographic distance ordering for finer mirror selection
 Priority   wish            Status       resolved
Superseder                Nosy List      poeml, theuni
Assigned To poeml          Keywords

msg104 (view) Author: poeml Date: 2009-12-11.15:56:03

In some countries (US, Germany) there can be a wealth of mirrors. A specifically
suited mirror can be picked when there's on in the same AS or in the same
network as the client. But otherwise, any mirror from the country will be
picked, which could be from the different end of the country (East coast, West
coast in the us; north/south in Germany).

Another scenario is that no mirror is found in the client's country, but there
are several in its continent. However, which one to choose? Currently, it is a
random choice, which means that, for instance, any European country could be
sent to any European mirror; in most cases, it would probably be better to use a
neighbouring country or the closest one that can be found.

Here's how a finer mirror selection could be achieved in both these cases.

The GeoLite city(!) database provides geographical coordinates. The coordinates
of the mirrors could be filled into the database's mirror records when mirrors
are created, or later, similar as their network and AS data.

When clients' IP addresses are looked up via GeoIP during request processing,
their geographical coordinates would (should) be available as well.

Using a planar approximation of the Haversine formula, the distance to available
mirrors can be calculated, and mirrors sorted by their closeness to the client.
A simple approximation, that is not only easy to implement but also should incur
very low possible overhead, would be described by the following formular:

distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)

Thus, the available mirrors can be ordered according to their distance to the

References: (Haversine formula) (simple approximation)

(The simple approximation should be suitable insofar the 180° border is pretty
much between Alaska and Russia, and other than islands in the Pacific Ocean all
countries should be usefully covered by it. About the Pacific Ocean I don't know
much, but it is very likely that those clients resolve to satellite links
anyway, and need to be treated specially.)

There is one problem, though: We use a weighted randomization to assign mirrors
that would be equally suitable. That is useful (and used) for load balancing.
With introducing ordering by geographic distance, the question arises how to use
that together with mirror priorities. Alternatives / thoughts:

  • Should we let geographic distance override priorities? Can the latter become
    obsolete? I don't really think so.
  • Should we use geographic ordering only in certain scenarios: when no country
    mirror is available, and when dealing with certain countries (US)?
  • Should we use just stratify according to geographic distances, instead of
    sorting by distance? That is, select the 3 or 5 mirrors closest by distance, and
    then do the weighted randomization on them to pick one?
  • Should the configured priority value and the measured geographic distance be
    mixed together into a single number, that's then used for mirror
  • Should we do geographic distance ordering only within a set of mirrors that
    shares the same priority?
  • At any rate, the static assignment of fallback mirrors (through the
    other_countries field) should have precedence. For instance, Russian clients
    should still be sent to German mirrors according to configured country-level
    fallback, instead of sending them to China, because it is closer, while China
    and Russia entertain few connectivity.
  • Since geographic distance itself is only a rough approximation of suitability
    of a mirror for a client, certain analogies that one could think of, like
    "radius to serve", are relatively moot, and not worth implementing.

This needs to be discussed.

msg111 (view) Author: poeml Date: 2009-12-21.19:12:21

I just discovered that the database scheme already contains fields for longitude
and latitude -- I totally had forgotten that. They are just not shown by 'mb',
because they are not used. That's cool because no schema migration is needed...

msg184 (view) Author: poeml Date: 2010-04-23.03:11:34

Geographical coordinates are meanwhile stored in the database for mirrors.

(Was implemented around r7936-r7939).

msg224 (view) Author: poeml Date: 2010-09-06.00:11:32

I haven't received any comments on this, but I'd still regard geographical
distance as a very useful addition to the current mirror selection algorithm.

msg302 (view) Author: poeml Date: 2010-11-04.19:32:15

Something that I didn't realize earlier is that the city version of the GeoIP database
also contains "regions" on a level below country. Here are some examples:

geoiplookup_city -f /var/lib/GeoIP/GeoLiteCity.dat.updated

Continent: NA
Country: US
Region id: CA
Region: California
City: Redwood City
Latitude: 37.491402
Longitude: -122.210999

It works not only for US states, but also for German Bundesländer and French

geoiplookup_city -f /var/lib/GeoIP/GeoLiteCity.dat.updated

Continent: EU
Country: DE
Region id: 02
Region: Bayern
City: Gunzenhausen
Latitude: 49.099998
Longitude: 10.750000

geoiplookup_city -f /var/lib/GeoIP/GeoLiteCity.dat.updated

Continent: EU
Country: FR
Region id: A8
Region: Ile-de-France
City: Paris
Latitude: 48.866699
Longitude: 2.333300

So another approach would be to first introduce this additional level in mirror
selection. While this is a geographical approach, it may not give optimal results e.g.
in Germany, where there is at least one huge autonomous system (680) which comprises
most universities, but geographically spans whole Germany.

Another aspect is that when geographical mirror selection becomes too fine-grained,
then mirror priorization becomes less effective: while it is currently possible to
assign only a few requests to certain mirrors, they would then mandatorily get all
traffic from their "region".

But in the US, adding the GeoIP region (state level) would likely be beneficial.

msg303 (view) Author: poeml Date: 2010-11-04.19:43:40

Two arguments for geographical distance:

  1. Mirror selection by geographical distance could make the "region" level in the
    US superfluous, achieve the same, and it would work also for locations where there
    is no mirror in the exact same state. By geographical distance, it would be easy to
    pick a mirror in a neighbour state, instead of falling back to country level.

  2. In addition, we could artificially "increase" distances of mirrors with low
    priority, to attract more requests to farther, but more powerful mirrors -- and
    thereby keeping excessive traffic away from small mirrors.

On the other hand, when we introduce "state" level in mirror selection, we would
also add a state_only flag that restricts mirror use to that state. Maybe also
other_states to add more states to deal with.

Also, an AS match would always take precedence, which helps if a mirror is in an AS
that spans many states.

msg304 (view) Author: poeml Date: 2010-11-05.14:10:47

Experimenting with geographical distance is promising. Here's the findings for a client in Belgium, with no
mirror in the country:

[Fri Country 'BE', Continent 'EU'
[Fri Nov 05 14:49:42 2010] [warn] [client] [mod_mirrorbrain] AS '5400', Prefix '',
lat/lng 50.833302,4.333300 state id 11, state 'Brussels Hoofdstedelijk Gewest'
same region: (score 100) (rank 6777729) (dist 19.69)
same region: (score 80) (rank 8786956) (dist 7.08)
same region: (score 100) (rank 6322872) (dist 5.27)
same region: (score 100) (rank 8061858) (dist 4.86)
same region: (score 100) (rank 8365641) (dist 25.94)
same region: (score 100) (rank 502272) (dist 16.17)
same region: (score 100) (rank 2606190) (dist 8.58)
same region: (score 100) (rank 671658) (dist 2.80)
same region: (score 100) (rank 10110186) (dist 3.18)
same region: (score 100) (rank 3443310) (dist 16.17)
same region: (score 100) (rank 2029689) (dist 5.46)
same region: (score 100) (rank 3410610) (dist 6.79)
same region: (score 100) (rank 3767694) (dist 14.25)
same region: (score 100) (rank 2693499) (dist 10.16)
same region: (score 100) (rank 10623903) (dist 7.79)
same region: (score 50) (rank 19739735) (dist 10.87)
same region: (score 100) (rank 8726322) (dist 2.92)
same region: (score 100) (rank 8012808) (dist 2.80)
same region: (score 80) (rank 12016420) (dist 16.07)
same region: (score 100) (rank 5669853) (dist 12.09)

By geographical distance, a mirror in France or Netherlands would have been chosen, which sounds better than e.g.
one in Portugal.

Example of a client in Germany:

Country 'DE', Continent 'EU'
AS '3320', Prefix '', lat/lng 50.666698,12.616700 state id 13, state 'Sachsen'
same country: (score 100) (rank 4746078) (dist 3.92)
same country: (score 100) (rank 951243) (dist 0.34)
same country: (score 100) (rank 2549292) (dist 5.19)
same country: (score 100) (rank 1006179) (dist 1.94)

Here, the mirror in Chemnitz (right around the corner) would have been preferred.

An example with another client in Germany, this time one in the huge AS 680 which sports a lot of mirrors (only 3
in my test setup):

Country 'DE', Continent 'EU'AS '680', Prefix '', lat/lng 51.966702,7.633300 state id 07, state
same AS: (score 100) (rank 7522635) (dist 5.40)
same AS: (score 100) (rank 2116017) (dist 0.64)
same AS: (score 100) (rank 3404724) (dist 4.12)
same country: (score 100) (rank 4702914) (dist 3.56)

Here, the mirror in Hagen is indeed much closer than the other two mirrors in the AS.

msg305 (view) Author: poeml Date: 2010-11-05.22:42:58

Too useful to not be implemented. Committed to trunk:

To be released with the upcoming version 2.14.0.

msg306 (view) Author: theuni Date: 2010-11-06.03:23:58

Looks very useful indeed peter, thanks!


msg309 (view) Author: poeml Date: 2010-11-11.16:23:54

The feature is in production at a few sites since a few days, and I got no
catastrophic failure reports so far.

On the centos-mirror mailing list, some interesting scenarios have been discussed
where the new feature is just what's needed!

Closing as "done"!

         Date          User  Action            Args
2010-11-11 16:23:55 poeml  set    status: testing -> resolved
                                    messages: + msg309
2010-11-06 03:23:59 theuni set    nosy: + theuni
                                    messages: + msg306
2010-11-05 22:42:58 poeml  set    status: chatting -> testing
                                    messages: + msg305
2010-11-05 14:10:47 poeml  set    messages: + msg304
2010-11-04 19:43:40 poeml  set    messages: + msg303
2010-11-04 19:32:15 poeml  set    messages: + msg302
2010-09-06 00:11:32 poeml  set    messages: + msg224
2010-09-05 23:53:25 poeml  set    assignedto: poeml
2010-04-23 03:11:34 poeml  set    messages: + msg184
2009-12-21 19:12:21 poeml  set    messages: + msg111
2009-12-11 15:56:04 poeml  create

(end of migrated issue)
@poeml poeml closed this Jun 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment