Stability issues on nominatim service #337
Comments
|
Current working theory is that something hits the server so hard that the kernel cannot handle all the incoming requests fast enough. Clients then resend their requests, which makes the problem worse and also makes it look to dulcy as if they are SYN flooding. So we end up with thousands of connections in SYN_SENT. The problems occur only between 6am and 6pm CET. This might just coincide with the load on the machine; it might also be that the culprit is only active then. I still can't make out any patterns in the requests. At this point I fear that we need a frontend server or CDN to handle the load and keep off all blocked traffic (http requests and permanently blocked requests). At the very least, a fronting server would ensure that openstreetmap.org is not affected by these issues. Another measure I'd like to propose is to stop sending 403/429 to blocked IPs and send a 200 instead, with a block message in the display name. 403/429 just seem to encourage clients to resend their request on the spot, further increasing the load on the server. A regular 200 answer would need to be processed by them first, which gives us at least a tiny amount of breathing space. |
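The proposal to answer blocked clients with a 200 can be sketched roughly as below. This is only an illustration, not how Nominatim implements blocking: the helper name `blocked_response`, the message wording, and the choice of a JSON result list are assumptions. The only thing taken from the comment above is the idea of putting the notice where a client would read the display name.

```python
import json


def blocked_response(reason: str) -> tuple[int, dict[str, str], bytes]:
    """Build (status, headers, body) for a blocked client.

    Hypothetical helper: instead of 403/429 it returns a normal 200
    whose single result carries the block notice in `display_name`.
    """
    # One fake result, so a client that parses the reply as a list of
    # places finds the notice in the field it normally shows to users.
    body = json.dumps([{"display_name": f"Blocked: {reason}"}]).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body
```

A scripted client that only checks the status code would treat this as a success and parse the body before doing anything else, which is the breathing space the comment is hoping for.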
|
Thanks for working on that |
|
It's my understanding that the openstreetmap.org servers have been struggling under high traffic lately, and it's not clear yet whether any configuration change (network, webserver, Nominatim server) can fix that or if "simply" more hardware is needed. If a production system relies on the public service, then it's better to install your own (http://nominatim.org/release-docs/latest/admin/Installation/ global or regional extract) or use a third-party provider (many have free tiers or trials if the number of requests per day or month is low). The last section of https://operations.osmfoundation.org/policies/nominatim/ links to a couple. |
|
I've just read through openstreetmap/openstreetmap-website lib/osm.rb, where the http_client for polling the different APIs like nominatim is called. It appears to me that the start page is not reusing the connections established to nominatim. |
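To illustrate what "not reusing connections" costs, here is a small self-contained sketch using only the Python standard library. It is not the code from lib/osm.rb; the throwaway local server, the `Handler` class, and `count_connections` are all made up for the demonstration. It counts how many TCP connections the server sees for the same number of requests, with and without keep-alive reuse.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0  # one handler instance is created per TCP connection

    def setup(self):
        super().setup()
        Handler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(reuse: bool, requests: int = 3) -> int:
    """Send `requests` GETs to a local test server and return the number
    of TCP connections the server had to accept."""
    Handler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # One connection, several requests over it.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # A fresh connection (and TCP handshake) for every request.
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return Handler.connections
```

Calling `count_connections(reuse=False)` makes the server accept one connection per request, while `count_connections(reuse=True)` needs a single connection for all of them.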
|
The lookups are cached though so there will likely only be a very minimal number of queries from the main website, certainly as a proportion of the total load. |
|
Traffic from openstreetmap.org is not the issue. It makes up less than 0.01% of the requests we get (not including iD, which is a less benign client, although I think this is being fixed). The current problem is very likely a rogue external scripted client. It's active from 6am to 6pm CET only and comes from a country where 1st Nov is a holiday. |
|
What's the IP address? Can you not just blacklist it? (Guess it's not so simple!) |
|
I don't think the address has been identified or we would have done... |
|
And 1st Nov is a public holiday in a lot of countries. |
|
Hope that the culprit can be stopped. Don't people read the usage policy? Glad that someone is looking at the issue. Keep up the good work. |
|
@Firefishy found the magic setting to mitigate the issue. dulcy now seems to be able to cope with the heavy network traffic. |
Dulcy has had serious networking issues since 21st Oct. There are a lot of connections hanging in SYN_SENT or ESTABLISHED state according to https://munin.openstreetmap.org/openstreetmap.org/dulcy.openstreetmap.org/fw_conntrack.html. A reboot hasn't helped and I can't find any pattern in the IPs concerned. It might be a rogue app, we might simply be reaching capacity, or there might be another issue.
This is now starting to impact normal operations.
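One way to look for a pattern in the hanging connections is to tally connection-tracking entries by state and by originating address. The sketch below is only an assumption about how that could be done: it expects text lines in the style printed by `conntrack -L` for TCP, the sample lines use documentation addresses rather than real data from dulcy, and the names `tally` and `SAMPLE` are made up.

```python
import re
from collections import Counter

# TCP states as they appear in conntrack-style listings.
STATE_RE = re.compile(
    r"\b(SYN_SENT|SYN_RECV|ESTABLISHED|FIN_WAIT|CLOSE_WAIT|LAST_ACK|TIME_WAIT|CLOSE)\b"
)
# The first src= on a line is the originating side of the connection.
SRC_RE = re.compile(r"\bsrc=(\S+)")

# Invented sample lines (assumed format, documentation IP ranges).
SAMPLE = """\
tcp      6 118 SYN_SENT src=192.0.2.10 dst=198.51.100.5 sport=40001 dport=443 [UNREPLIED] src=198.51.100.5 dst=192.0.2.10 sport=443 dport=40001 mark=0 use=1
tcp      6 119 SYN_SENT src=192.0.2.10 dst=198.51.100.5 sport=40002 dport=443 [UNREPLIED] src=198.51.100.5 dst=192.0.2.10 sport=443 dport=40002 mark=0 use=1
tcp      6 431990 ESTABLISHED src=203.0.113.7 dst=198.51.100.5 sport=51000 dport=443 src=198.51.100.5 dst=203.0.113.7 sport=443 dport=51000 [ASSURED] mark=0 use=1
"""


def tally(lines):
    """Return (count per state, count per (state, source address))."""
    states, sources = Counter(), Counter()
    for line in lines:
        state = STATE_RE.search(line)
        src = SRC_RE.search(line)
        if not state or not src:
            continue  # skip lines that are not TCP entries
        states[state.group(1)] += 1
        sources[(state.group(1), src.group(1))] += 1
    return states, sources


if __name__ == "__main__":
    states, sources = tally(SAMPLE.splitlines())
    print(states.most_common())
    print(sources.most_common(5))
```

If a handful of source addresses dominate the SYN_SENT entries, that points towards a single misbehaving client; an even spread would fit the capacity explanation better.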