Block fastpokemap, currently 13% of tile usage. #78

Merged
merged 1 commit into from Aug 10, 2016

Contributor

zerebubuth commented Aug 10, 2016

The extra load appears to be a major contributory factor to slow tile loading and the general melting of tile servers.

The owner of the site was contacted a short time ago, probably not long enough to have reacted yet. The site is definitely in violation of the acceptable use policy.

The question is: should we go ahead with the block immediately, or allow a grace period?

Member

Firefishy commented Aug 10, 2016

👍

Zverik commented Aug 10, 2016

Why two regexps instead of one?

Contributor

zerebubuth commented Aug 10, 2016

To block *.fastpokemap.com and fastpokemap.com, but not thefastpokemap.com. It could be done with one regexp if we wanted to be less specific. But there's no real problem using two.

Zverik commented Aug 10, 2016

I'd use https?://([^.]*\.)?fastpokemap\.com/
Regarding the subject, I wonder if there is a bug in our tile serving stack (e.g. tilecdn / caches) that causes this load.

Contributor

zerebubuth commented Aug 10, 2016

Arguably the bug is allowing any referer other than openstreetmap.org. But that's definitely a matter of policy rather than technical ability. What do you think the bug might be? All the tile cache and tile server setup is in Chef, and Munin shows what's going on, if you need to dig into details.

Regarding the regexp - yeah, that looks the same. Personally, I don't find it any clearer to read with the extra optional ()? section.
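
For readers comparing the two approaches, here is a minimal Python sketch of the single-regexp form suggested above, checked against the three kinds of host mentioned earlier in the thread. It only illustrates the pattern's behaviour. How the real configuration anchors and applies the expression is not shown in this thread, so the use of `re.match`, the helper name `is_blocked` and the sample referer URLs are assumptions.

```python
import re

# Single-regexp form suggested in the thread.
BLOCKED_REFERER = re.compile(r"https?://([^.]*\.)?fastpokemap\.com/")


def is_blocked(referer: str) -> bool:
    """Return True if the Referer value matches the blocked pattern."""
    # re.match anchors at the start of the string (an assumption about
    # how the pattern would be applied in practice).
    return BLOCKED_REFERER.match(referer) is not None


if __name__ == "__main__":
    for url in (
        "http://fastpokemap.com/",          # bare domain
        "https://www.fastpokemap.com/map",  # one subdomain level
        "http://thefastpokemap.com/",       # different domain, same suffix
    ):
        print(url, is_blocked(url))
```

The optional group `([^.]*\.)?` accepts exactly one subdomain label, which is what keeps a different domain that merely ends in the same letters from matching.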

Contributor

gravitystorm commented Aug 10, 2016

Block immediately.

  • Causing more requests than openstreetmap.org itself
  • Tileservers are completely overloaded
  • Mappers are being affected (outdated tiles, dropped rendering requests >200 per second on yevaud alone)

openstreetmap-mirror merged commit 23c77c8 into openstreetmap:master Aug 10, 2016

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)

Contributor

harry-wood commented Aug 11, 2016

"We bypassed the block already" https://twitter.com/FastPokeMapCom/status/763641567454322688 And more recently this (and beginning of a conversation with me) https://twitter.com/FastPokeMapCom/status/763640367631695872

Zverik commented Aug 11, 2016

So... What can we do? Can you get a sample of headers sent from this website and see what else we can use to block it?

(I was reluctant to block it, but after the owner's comments I'm all for it)

Zverik commented Aug 11, 2016

That's what the browser sends. I expect a couple more added on the way.

Contributor

zerebubuth commented Aug 11, 2016

One possibility is to block anything without a referer header. It seems that modern browsers allow a range of options for referer headers, but I don't see faking a referer as something the standard would allow. The downside of this option is that it will also affect anyone else currently not sending a referer (mostly apps), or people loading single tiles in their browsers.

Another option is to modify the Cross-Origin headers, returning Access-Control-Allow-Origin: $ORIGIN when $ORIGIN != fastpokemap.com. This is a more targeted block but, while it seems technically possible, I'm not sure it's simple (or possible?) to do in the squid proxy layer. @Firefishy does it sound possible?
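
As a rough illustration of the second option, the sketch below decides which CORS header to return for a given `Origin` request header. It is plain Python rather than Squid configuration, and the function name `cors_headers`, the `BLOCKED_DOMAINS` set and the choice to simply omit the header for blocked origins are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical block list; the thread only names fastpokemap.com.
BLOCKED_DOMAINS = {"fastpokemap.com"}


def cors_headers(origin):
    """Return the CORS response headers for a request's Origin header.

    Allowed origins are echoed back in Access-Control-Allow-Origin.
    Blocked origins (and their subdomains) get no such header, so a
    browser would not expose the response to cross-origin script.
    """
    if not origin:
        return {}
    host = (urlsplit(origin).hostname or "").lower()
    for domain in BLOCKED_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return {}
    # "Vary: Origin" tells caches the response depends on the Origin header.
    return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
```

One caveat: CORS checks apply when script reads a cross-origin response, so this would presumably only affect clients that fetch tile data that way, not plain image tags.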

Contributor

pnorman commented Aug 11, 2016

> The downside of this option is that it will also affect anyone else currently not sending a referer (mostly apps),

Apps should have a referer or non-browser user agent. If they don't have either they're violating the usage policy.

> or people loading single tiles in their browsers.

This is more of a concern. It's not a common use case for most sites, but it is relatively common for us.

Could we severely rate limit per IP requests with a browser user-agent and no referer?

Contributor

zerebubuth commented Aug 11, 2016

Distinguishing a browser from a non-browser by UA is not easy. It might be possible, but we don't have the mechanism in place for doing that today. Most browsers do send "Mozilla" somewhere in their User-Agent, so perhaps that's good enough.

The delay pools / IP limits seem to be configurable, but definitely not something I'm familiar with. If anyone knows how Squid configs work then help would be greatly appreciated.
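
To make the idea concrete, here is a minimal Python sketch of a per-IP token bucket that only applies to requests with "Mozilla" in the User-Agent and no Referer. It is not Squid configuration and says nothing about how delay pools work. The class name, the `rate` and `burst` values and the exact header keys are assumptions chosen for illustration.

```python
import time


class NoRefererLimiter:
    """Token-bucket limit per IP for browser-like requests with no Referer."""

    def __init__(self, rate=1.0, burst=5, clock=time.monotonic):
        self.rate = rate      # tokens added per second (hypothetical value)
        self.burst = burst    # maximum bucket size (hypothetical value)
        self.clock = clock    # injectable clock, mainly for testing
        self.buckets = {}     # ip -> (tokens, time of last update)

    def allow(self, ip, headers):
        user_agent = headers.get("User-Agent", "")
        # Only requests that look like a browser and carry no Referer
        # are limited; everything else passes through untouched.
        if headers.get("Referer") or "Mozilla" not in user_agent:
            return True
        now = self.clock()
        tokens, last = self.buckets.get(ip, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.buckets[ip] = (tokens, now)
            return False
        self.buckets[ip] = (tokens - 1.0, now)
        return True
```

A small burst allowance would still let someone load a handful of single tiles by hand, which is the use case raised above, while throttling sustained traffic from one address.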

Contributor

harry-wood commented Aug 11, 2016

Seems the FastPokeMap developer had a change of heart (links: 1, 2, 3). So that's nice. Wishing them well with setting up an OpenStreetMap-powered tile server elsewhere (exactly what we'd encourage people to do!)

tjhorner commented Aug 12, 2016

"general melting of tile servers" 😆

Komzpa commented Aug 12, 2016

Does this mean that OSM can't handle a 13% increase in load?

Contributor

zerebubuth commented Aug 12, 2016

It's on top of a 66% increase in general traffic since April. So the additional 13% was the straw that broke the camel's back.

So, yes: we were already at full capacity before this happened, and there are plans to add a 3rd rendering server (although that may not resolve it unless we can also coordinate render jobs between them). Balancing budget priorities is difficult, though, as we've also had database server issues recently, and the budget isn't limitless 😞

Zverik commented Aug 12, 2016

Does TileCDN help, or is the load specifically on the rendering servers?

Contributor

pnorman commented Aug 12, 2016

It helps, but it's the rendering servers which are over capacity.

Note: over capacity means failing to keep up with updates, not always a complete outage.

Contributor

zerebubuth commented Aug 12, 2016

Without the TileCDN it wouldn't be possible to serve the amount of load we have. However, each tile cache can only handle a small "hot" fraction of the rendered tiles, so even if we had many more CDN machines, we'd still have load issues on the render machines eventually.

The rendering machines are, currently, completely independent. This is great for redundancy and fail-over, as they are effectively the same. However, it means duplication of tiles stored on disk and tiles rendered. Duplication of tiles on disk is somewhat desirable in the case of fail-over, but duplicating the renders is entirely pointless.

Adding a 3rd server, therefore, is unlikely to reduce load by 1/3rd on the existing servers from rendering. However, a lot of the load comes from serving still-fresh tiles off disk to "back-stop" the CDN, which would be split amongst the servers (sort of evenly).

What would be great, as @pnorman and I were discussing the other day, is a way to "broadcast" rendered tiles in a PUB-SUB fashion amongst the rendering servers so that they can opportunistically fill their own caches with work from other machines. At the moment, it's no more than an idea, but it seems like a feasible change to renderd.
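
The broadcast idea can be sketched in a few lines of Python. This is a toy, in-process model only: `TileBus` stands in for whatever pub-sub transport might be used, `RenderServer` is a hypothetical stand-in for a rendering machine, and none of it reflects how renderd is actually structured.

```python
class TileBus:
    """Toy in-process stand-in for a pub-sub channel between servers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, sender, key, data):
        for callback in self.subscribers:
            callback(sender, key, data)


class RenderServer:
    """Hypothetical rendering machine with its own tile cache."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.cache = {}        # (z, x, y) -> rendered bytes
        self.render_count = 0
        bus.subscribe(self.on_tile)

    def on_tile(self, sender, key, data):
        # Opportunistically keep tiles that were rendered elsewhere.
        if sender != self.name:
            self.cache.setdefault(key, data)

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.render(key)
            self.bus.publish(self.name, key, self.cache[key])
        return self.cache[key]

    def render(self, key):
        # Placeholder for the expensive rendering step.
        self.render_count += 1
        return ("tile %d/%d/%d" % key).encode()
```

With two servers on the same bus, a tile rendered by the first is already in the second's cache when it is requested there, so the duplicate render is avoided.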

Zverik commented Aug 12, 2016

If I understood munin graphs correctly, one part of the issue is i/o rates: disks fail to return rendered tiles as quickly. Can this be fixed with upgrading disks on rendering servers? As far as I know, there is some money left from last year's funding drive, and something like this was on the list.

Contributor

pnorman commented Aug 12, 2016

> If I understood munin graphs correctly, one part of the issue is i/o rates: disks fail to return rendered tiles as quickly. Can this be fixed with upgrading disks on rendering servers?

Although the tile store disks are near IO capacity, the servers are fairly balanced for their load, so this wouldn't help a great deal.

Contributor

zerebubuth commented Aug 12, 2016

Spending money is one way to alleviate issues - of which there are many, it's not just the disk I/O.

We spent money on this last year, and some of what was left over was rolled into this year's budget. Sadly, there isn't (ever) enough in the budget to get all the things we need, so we have to make hard decisions about which things we need more than others.

Another part of the solution is good engineering, and if anyone is reading this wondering how they can help - please have a look at mod_tile, renderd and the Chef cookbooks, and see if there are improvements to be made. Reducing duplicate renders has already been mentioned, but other issues could include:

  • There are far more tiles than can be feasibly stored on disk (i.e. petabytes), so the tiles on disk are trimmed by deleting the oldest and least recently used. Crawling the disk to figure out which ones those are also consumes quite a lot of disk I/O. Is there something (e.g. a secondary index) which could accelerate that process?
  • Large parts of the world have nothing other than sea in them. Is it possible to mask for these areas, so that tiles for them need not be stored?
  • Many in the mapping industry are moving or have moved to a vector tile based system, and there are several open source stacks (full, or partial) out there. What would the cookbooks look like to run those for OSM?
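
As one possible reading of the "secondary index" question in the first bullet, the Python sketch below keeps tile paths ordered by last access so that trimming can pick victims without crawling the disk. It is an in-memory toy under stated assumptions: the class name `TileAccessIndex`, the byte limit and the file names are hypothetical, and a real index would need to be persisted and updated on every tile read, which has its own I/O cost.

```python
from collections import OrderedDict


class TileAccessIndex:
    """Tracks tile sizes in least-recently-used order."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self.entries = OrderedDict()   # path -> size, oldest access first

    def touch(self, path, size):
        """Record that a tile was written or served."""
        # Subtract any previously recorded size, then re-insert the
        # path at the most-recently-used end.
        self.total_bytes += size - self.entries.pop(path, 0)
        self.entries[path] = size

    def trim(self):
        """Return paths to delete, oldest first, until under the limit."""
        victims = []
        while self.total_bytes > self.max_bytes and self.entries:
            path, size = self.entries.popitem(last=False)
            self.total_bytes -= size
            victims.append(path)
        return victims
```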

Contributor

pnorman commented Aug 12, 2016

> Adding a 3rd server, therefore, is unlikely to reduce load by 1/3rd on the existing servers from rendering. However, a lot of the load comes from serving still-fresh tiles off disk to "back-stop" the CDN, which would be split amongst the servers (sort of evenly).
>
> What would be great, as @pnorman and I were discussing the other day, is a way to "broadcast" rendered tiles in a PUB-SUB fashion amongst the rendering servers so that they can opportunistically fill their own caches with work from other machines. At the moment, it's no more than an idea, but it seems like a feasible change to renderd.

I've made #85 to discuss this in more detail

GerdP commented Aug 13, 2016

> Large parts of the world have nothing other than sea in them. Is it possible to mask for these areas, so that tiles for them need not be stored?

In the mkgmap project we calculate a grid to allow fast coastline calculation. A zip file is produced which contains an index that, for each grid element, says either "all land" or "all sea", or gives the name of an OSM file containing the coastline polygons for that grid element. See http://wiki.openstreetmap.org/wiki/Mkgmap/help/options#generating_precompiled_sea_yourself and http://www.mkgmap.org.uk/. I guess this could easily be adapted for this use.

Contributor

pnorman commented Aug 13, 2016

> Large parts of the world have nothing other than sea in them. Is it possible to mask for these areas, so that tiles for them need not be stored?

Are there many sea tiles in the store? On z13+ they're only rendered if someone requests the area, which won't happen much.

The assumption that sea tiles have nothing in them may change in the future (gravitystorm/openstreetmap-carto#2278).

Contributor

zerebubuth commented Aug 15, 2016

> Are there many sea tiles in the store? On z13+ they're only rendered if someone requests the area, which won't happen much.

Just on zooms 18 & 19, there are 274,864 (8.3% of 3,295,258) on orm at the moment. That is based on the assumption that all the 7,124-byte metatiles are sea; they could also be blank land, so perhaps the true figure is lower. That's only 0.4% by size, but still 1.8GB of tiles which needed to be rendered, encoded, written to disk, crawled to be expired and purged, etc.

At lower zooms it becomes more likely that at least one tile in the metatile is not water, so presumably the proportion of all-water metatiles would decrease.

Contributor

zerebubuth commented Aug 16, 2016

> At lower zooms it becomes more likely that at least one tile in the metatile is not water, so presumably the proportion of all-water metatiles would decrease.

I wanted to check this, so I looked at z17. Turns out that 3,077,923 metatiles have size 7,124 bytes out of 8,560,792 total (36%) and they make up 3% of the 641GB total by size.

So there are quite a lot of empty metatiles - although they're probably quite quick to render.
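
The percentages quoted for the empty metatiles can be reproduced with a short calculation. The Python sketch below uses only the counts and the 7,124-byte size given in the thread. Treating "GB" as GiB (2^30 bytes) is an assumption, although it is the reading under which the 1.8GB figure matches.

```python
METATILE_BYTES = 7124  # size assumed in the thread to indicate an empty metatile


def empty_share(empty_count, total_count):
    """Fraction of metatiles that have the 'empty' size."""
    return empty_count / total_count


def empty_gib(empty_count):
    """Disk space used by the empty metatiles, in GiB (assumed unit)."""
    return empty_count * METATILE_BYTES / 2**30


if __name__ == "__main__":
    # Zooms 18 and 19: counts quoted for the server named orm.
    z1819 = (274_864, 3_295_258)
    print(f"z18-19: {empty_share(*z1819):.1%}, {empty_gib(z1819[0]):.2f} GiB")

    # Zoom 17: share by count, and share of the 641GB total by size.
    z17 = (3_077_923, 8_560_792)
    print(f"z17: {empty_share(*z17):.1%}, {empty_gib(z17[0]) / 641:.1%} of 641")
```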

Contributor

pnorman commented Aug 16, 2016

> although they're probably quite quick to render

Yes. I see this a lot in development where I have extracts. My theory is that PostgreSQL finds there are very few rows in the first one or two levels of the index, and both those index pages and the row pages will always be in memory cache. There are then very few results to filter, order, pass to mapnik, and render. I haven't bothered to investigate in any detail, since it's a case of being abnormally fast, not slow.

I don't think we'll be able to save anything on the rendering of empty land/water tiles. Detecting them and deciding that there is nothing at lower zoom levels is style-specific and would require fetching a lower zoom MT before rendering a tile.

There might be some gains to handling them better on disk, but I suspect developer time is better spent on improvements elsewhere.

Contributor

pnorman commented Aug 18, 2016

> Hey, fellow member of FastPokeMap here, is it possible to get access to your infrastructure and we would donate $500/month for the server cost? Or is it too much stress on the server and that wouldn't cover the cost?

We aren't running a commercial tile service, nor are we set up to do so; the priority for tile.osm.org is to be part of the feedback loop for mappers, which is quite a different goal. I also can't come up with a cost figure, but I wouldn't be surprised if the costs imposed by the load FastPokeMap was putting on OSM earlier were an order of magnitude higher per month.

Switch2osm has a list of commercial providers. As a rough ballpark of your usage, 15% of 7.5k tiles/second is about 1k tiles/second, or 2,500 million/month. Given the quoted figure of 2 million hits/day, this seems about right. This is in the unlisted enterprise pricing range, but I'd ballpark it at 10-30k USD/month, depending on what you can negotiate.
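
The ballpark above can be checked with a few lines of Python. The inputs are the figures from the comment. The 30-day month and the reading of a "hit" as a page view that loads several tiles are assumptions, not something stated in the thread.

```python
SECONDS_PER_DAY = 86_400

total_rate = 7_500        # tiles/second across the service (from the comment)
share = 0.15              # FastPokeMap's approximate share (from the comment)

rate = share * total_rate             # tiles/second before rounding
rounded_rate = 1_000                  # the "about 1k" used in the comment
per_day = rounded_rate * SECONDS_PER_DAY
per_month = per_day * 30              # assuming a 30-day month

hits_per_day = 2_000_000              # figure quoted in the thread
tiles_per_hit = per_day / hits_per_day

if __name__ == "__main__":
    print(f"{rate:.0f} tiles/s, {per_month / 1e6:.0f} million tiles/month")
    print(f"{tiles_per_hit:.1f} tiles per hit")
```

A few tens of tiles per hit is plausible for a page that shows a slippy map, which would be consistent with the "seems about right" remark.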

Zverik commented Aug 18, 2016

As an alternative, you can use your own servers or a cloud, and hire a company (e.g. Thunderforest) to set up tile rendering. It may be a lot cheaper.

Contributor

zerebubuth commented Aug 18, 2016

The switch2osm site has some instructions on how to build a tile server, although I think they might be slightly out of date. If you use those as a guide and a particular piece of the stack gives you problems, ask around in IRC or on the project-specific GitHub repository.

Setting up a tileserver is, sadly, not as easy as we'd like it to be. Good luck! And if you find the instructions aren't helpful, please get in touch so that we can improve them. @pnorman where's the most appropriate place to file tickets against switch2osm?

Contributor

pnorman commented Aug 18, 2016

> @pnorman where's the most appropriate place to file tickets against switch2osm?

The eventual plan is to switch to https://github.com/switch2osm/switch2osm.github.io, but there's no ETA on it since I'm the only person working on it. Right now there's no place to file tickets against switch2osm.org, and tickets against the switch2osm.github.io tile rendering content won't be the most useful since the entire set of guides needs rewriting for 16.04 and other advancements.
