Gray tiles #366

Closed
nrenner opened this issue Feb 21, 2020 · 8 comments
Labels
service:tiles The raster map on tile.openstreetmap.org

Comments

@nrenner

nrenner commented Feb 21, 2020

[Previous discussion in openstreetmap/chef#264, opening a new issue as it doesn't seem clear if the style release really is the true source of the problem. Please let me know if this should be reported somewhere else.]

People are still reporting slow and patchy tile delivery for the standard style on openstreetmap.org (tile.openstreetmap.org) this week.

Screenshot after one minute of loading, from today by kreuzschnabel (map)

Other reports this week:

@tomhughes tomhughes transferred this issue from openstreetmap/chef Feb 21, 2020
@tomhughes
Member

Yes, demand is in excess of supply, squid is shit and the service is collapsing under the load.

Sadly there is very little I can do to help. I can sometimes fix an individual case, but normally that just moves the problem.

@nrenner
Author

nrenner commented Feb 21, 2020

I can reproduce when zooming in somewhere rural (or to z19, or in the ocean) where tiles have not been (re-)rendered yet and panning around a bit.

Don't know if you are already aware or if this helps:

In Chrome, in the network tab of the developer tools (F12) I added the custom response headers x-cache + x-tilerender (see Chrome DevTools Reference), so it shows for all tile requests what cache and tile servers are used:

Screenshot Chrome network tab with custom headers
https://www.openstreetmap.org/#map=13/34.4091/-40.3620
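The same header check can be scripted outside the browser. The following is a minimal Python sketch, not something used in this thread: the function names, the User-Agent string and the sample header values in the test data are assumptions made for illustration. It reads the `x-cache` and `x-tilerender` response headers mentioned above and tallies which cache and render server answered each request, including 404 responses.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_tile_headers(url, user_agent="tile-header-check/0.1 (hypothetical)"):
    """Request one tile; return (status, headers) with lower-cased header names."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=60) as response:
            status, raw = response.status, response.headers
    except HTTPError as error:
        # urllib raises on 404, but the error object still carries the headers.
        status, raw = error.code, error.headers
    return status, {name.lower(): value for name, value in raw.items()}


def tally_servers(responses):
    """Count (status, cache, render server) triples over a batch of responses."""
    counts = Counter()
    for status, headers in responses:
        cache = headers.get("x-cache", "unknown")
        renderer = headers.get("x-tilerender", "unknown")
        counts[(status, cache, renderer)] += 1
    return counts
```

Passing a list of `fetch_tile_headers(...)` results for one map view to `tally_servers` gives a quick count of how many distinct caches and render servers were involved. Keep such test batches small so that the script does not add to the load it is measuring.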

I observe the following pattern when panning around (my IP located in Germany):

  1. for the first few request batches tiles are fast from a German cache (kalessin, katie, keizer, konqi) and odin
  2. in the next batch(es) some of the requests are slower (couple of seconds up to one minute), still from German servers
  3. but then tiles return 404 or take >= one minute, mostly from all kinds of cache and rendering servers

If I directly connect to odin.openstreetmap.org in a custom Leaflet map, everything is fine and most requests are <100ms.

@nrenner
Author

nrenner commented Feb 21, 2020

Would it be an option to remove the flag/timestamp from the carto release that marks all tiles as dirty/outdated, and from now on rerender only when data has changed?

@tomhughes
Member

Sounds like you're just hitting the deliberate rate limits if it starts fast and then gets slow after you have reached the bucket limit.
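For readers unfamiliar with the "bucket" behaviour described here, a generic token bucket can be sketched as follows. This is only an illustration of the general technique: the class, the capacity and the refill rate are hypothetical, and it is not the actual rate-limit configuration of the tile caches, which may throttle bandwidth rather than reject individual requests.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start with a full bucket
        self.last = 0.0                # time of the previous request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical numbers: a burst of 3 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=3, rate=1.0)
burst = [bucket.allow(0.0) for _ in range(5)]
```

With a full bucket the first requests pass immediately, and later ones are held back until tokens have been refilled. That matches the "starts fast and then gets slow" pattern.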

@pnorman pnorman added the service:tiles The raster map on tile.openstreetmap.org label Feb 23, 2020
@Prince-Kassad

I just got served a 1 week old tile on z18, and that suggests to me something is really wrong. Tile rendering has never been behind on the order of weeks before the update. Something is going very wrong there.

@nrenner
Author

nrenner commented Feb 28, 2020

I wondered where those patchy gray tiles come from, so I made some more tests this week. I would expect that for dropped metatiles all corresponding tiles within that rectangle would be missing, but instead gray tiles appear randomly between loaded tiles.

My test case is a single map call with 12 tiles at zoom 19 somewhere in the ocean. I called it at various times this week, each time within a different single metatile, to ensure a new rendering. To avoid throttling, I sometimes changed my IP, avoided other map uses and did no panning or zooming, just copying the URL hash from another map.
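Whether all tiles of a view fall inside one metatile can be checked with a little tile arithmetic. The sketch below uses the standard slippy-map tile formula and assumes 8×8 metatiles; that size and all function names are assumptions for illustration, not taken from this thread.

```python
import math

METATILE = 8  # assumed metatile edge length in tiles


def tile_xy(lat, lon, zoom):
    """Convert WGS84 latitude/longitude to slippy-map tile indices at `zoom`."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def metatile_origin(x, y):
    """Return the top-left tile of the metatile that contains tile (x, y)."""
    return x - x % METATILE, y - y % METATILE


def metatiles_for_view(x0, y0, cols, rows):
    """Return the set of metatile origins touched by a cols x rows block of tiles."""
    return {
        metatile_origin(x, y)
        for x in range(x0, x0 + cols)
        for y in range(y0, y0 + rows)
    }
```

If `metatiles_for_view` returns a single origin for a 12-tile view (for example 4 columns by 3 rows), one render of that metatile is in principle enough to serve every tile in the view.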

My observations:

  • gray tiles are 404 (not found) responses from caches other than the German ones
    • unfortunately 404 responses have no x-tilerender header, so it is hard to tell which render server they come from, but I assume from one of the three still dropping tiles (scorch, rhaegal and pyrene, see Renderd throughput and mod_tile HTTP response codes stats)
  • the number of requests forwarded to other caches (and thus 404s) increases as servers get busier, with a peak in the afternoon, probably corresponding to e.g. renderd throughput (odin)
  • everything is as expected in the late evening or at night (German caches and odin)

An extreme example from yesterday at 16:28 (UTC) is this screenshot with four 404s, ten caches and successful requests to five (!) different render servers:

Screenshot 2020-02-27 17-33-26 CET

So for a single map call with 12 tiles, the same metatile was rendered five times instead of just once (see status request to odin, ysera, pyrene, scorch, bowser).

The 404 caches are all forwarding to rhaegal (according to Squid relay stats and nslookup: angor, sarkany, drogon, viserion).

Now I wonder:
Why are individual requests within a single map call distributed to random caches?
Why is there a failover/load balancing to other caches at all, as I haven't seen any 404s from odin?

@fracgiu

fracgiu commented Mar 4, 2020

I think I have the same problem, and I have some more information. In my case I access the Internet through a proxy that requires authentication.

Without going into the details: when I open the HTML page (before the map is loaded), sometimes the page asks me for the proxy credentials and sometimes it doesn't (I don't know the logic, but it seems related to the cache). When the page asks for proxy credentials and I enter them, everything works fine; otherwise I get those errors (the Open Layer Engine seems to be "offline", and pieces of the map are shown sometimes, depending on what is in the cache, I guess).

Also, even when it is "offline" (and not working), sometimes (but rarely) doing "Zoom In" and "Zoom Out" makes the browser ask me for proxy credentials, and the Open Layer Engine comes "back online" and starts to work again.

As a workaround, maybe we could make a "request on the Internet" before loading everything, or somehow "skip the cache" (I tried with the meta tag, but it had no effect). Any ideas?

I hope this helps too.

@pnorman
Collaborator

pnorman commented Dec 18, 2020

We have different render servers and are moving to a commercial CDN, so any capacity issues at the time this issue was opened are likely to be different now.

@pnorman pnorman closed this as completed Dec 18, 2020