Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nominatim slow query #1139

Closed
pedropinheiro75 opened this issue Aug 14, 2018 · 13 comments
Closed

Nominatim slow query #1139

pedropinheiro75 opened this issue Aug 14, 2018 · 13 comments

Comments

@pedropinheiro75
Copy link

pedropinheiro75 commented Aug 14, 2018

I recently imported the nominatim with full planet and queries are undesirable performance. The first query, before the cache is created, takes longer than 20 seconds and sometimes I see this error:

Internal Server Error
Nominatim has encountered an internal error while accessing the database. This may happen because the database is broken or because of a bug in the software. If you think it is a bug, feel free to report it over on Github. Please include the URL that caused the problem and the complete error details below.
Message: Could not lookup place
SQL Error: DB Error: unknown error
Details:
SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name','name','brand','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name','name','brand','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (92672355,196842012,4488269,196732908,48310082,160137,91687842,70711723,83690356,77366185,74053027,92460750,82583385,113120861,177719652,86034442,82755806,76866792,83301006,92789547)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'  [nativecode=ERROR:  canceling statement due to statement timeout
CONTEXT:  PL/pgSQL function get_addressdata(bigint,integer) line 103 at FOR over SELECT rows
PL/pgSQL function get_address_by_language(bigint,integer,text[]) line 12 at FOR over SELECT rows]

Due to the delay of the queries I thought the problem was the lack of some index and then I executed:

nohup ./utils/setup.php --create-search-indices --ignore-errors &

Output:

2018-08-13 20:59:56 == Create Search indices
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
ERROR:  relation "idx_word_word_id" already exists
ERROR:  relation "idx_search_name_nameaddress_vector" already exists
ERROR:  relation "idx_search_name_name_vector" already exists
ERROR:  relation "idx_search_name_centroid" already exists
ERROR:  relation "idx_place_addressline_address_place_id" already exists
ERROR:  relation "idx_placex_rank_address" already exists
ERROR:  relation "idx_placex_pendingsector" already exists
ERROR:  relation "idx_placex_parent_place_id" already exists
ERROR:  relation "idx_placex_reverse_geometry" already exists
ERROR:  relation "idx_location_area_country_place_id" already exists
ERROR:  relation "idx_osmline_parent_place_id" already exists
ERROR:  relation "idx_search_name_country_centroid" already exists
NOTICE:  index "place_id_idx" does not exist, skipping
ERROR:  relation "idx_place_osm_unique" already exists
ERROR:  relation "idx_postcode_id" already exists
ERROR:  relation "idx_postcode_postcode" already exists
Summary of warnings:


2018-08-14 02:23:14 == Setup finished.

What does this line mean? NOTICE: index "place_id_idx" does not exist, skipping

After that, I turn on the postgresql log and called the search api with these parameters:

http://nominatim.dev/nominatim/search.php?format=jsonv2&addressdetails=1&q=cancun

Logs:

2018-08-14 14:30:49 UTC [2489-1] www-data@nominatim LOG:  duration: 16092.775 ms  statement: SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:en-en','name:en-en','short_name:en','name:en','short_name','name','brand','official_name:en-en','official_name:en','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:en-en','name:en-en','short_name:en','name:en','short_name','name','brand','official_name:en-en','official_name:en','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (197441826,111603036,182595459,139694930,144830394,118512715,74049435,99932703,159450111,141912308,112119380,80303695,80305679,112647794,80522782,115827503,128389572,149585949,74675442,150225528)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'

2018-08-14 14:31:35 UTC [2490-1] ERROR:  function transliteration(text) does not exist at character 23
2018-08-14 14:31:35 UTC [2490-2] HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
2018-08-14 14:31:35 UTC [2490-3] QUERY:  SELECT gettokenstring(transliteration(name))
2018-08-14 14:31:35 UTC [2490-4] CONTEXT:  PL/pgSQL function public.make_standard_name(text) line 5 at assignment
    automatic analyze of table "nominatim.public.placex"

Looking at the logs I thought some functions might be missing, and execute this:

nohup ./utils/setup.php --create-functions --ignore-errors &

2018-08-14 14:48:04 == Create Functions
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
NOTICE:  drop cascades to function get_addressdata(bigint,integer)
NOTICE:  drop cascades to function get_wikipedia_match(hstore,character varying)
Summary of warnings:


2018-08-14 14:48:04 == Setup finished.

Some info about my installation:

Nominatim version 3.1.0

Ec2 t2.2xlarge with T2 Unlimited | vCPU 8 | 32GB RAM | SSD 870 GB gp2 2610/3000 IOPS

 Postgresql configurations:
- shared_buffers = 2GB
- maintenance_work_mem = 10GB
- work_mem = 50MB
- effective_cache_size = 24GB
- synchronous_commit = off
- checkpoint_timeout = 10min
- checkpoint_completion_target = 0.9
- fsync = on
- full_page_writes = on
- random_page_cost = 1.1
- wal_buffers = 16MB
- max_worker_processes = 8

df -h

Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  9.5M  3.2G   1% /run
/dev/xvda1      844G  674G  170G  80% /
tmpfs            16G  4.0K   16G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/loop0       13M   13M     0 100% /snap/amazon-ssm-agent/295
/dev/loop1       87M   87M     0 100% /snap/core/4917
/dev/loop2       87M   87M     0 100% /snap/core/4830
/dev/loop3       87M   87M     0 100% /snap/core/5145
tmpfs           3.2G     0  3.2G   0% /run/user/1000

free -h 

              total        used        free      shared  buff/cache   available
Mem:            31G        171M         29G        115M        1.5G         30G
Swap:            0B          0B          0B

nominatim=# \d search_name
                Table "public.search_name"
       Column       |          Type           | Modifiers
--------------------+-------------------------+-----------
 place_id           | bigint                  |
 importance         | double precision        |
 search_rank        | smallint                |
 address_rank       | smallint                |
 name_vector        | integer[]               |
 nameaddress_vector | integer[]               |
 country_code       | character varying(2)    |
 centroid           | geometry(Geometry,4326) |
Indexes:
    "idx_search_name_centroid" gist (centroid)
    "idx_search_name_name_vector" gin (name_vector) WITH (fastupdate=off)
    "idx_search_name_nameaddress_vector" gin (nameaddress_vector) WITH (fastupdate=off)
    "idx_search_name_place_id" btree (place_id)

nominatim=# \d placex
                   Table "public.placex"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 place_id        | bigint                      | not null
 parent_place_id | bigint                      |
 linked_place_id | bigint                      |
 importance      | double precision            |
 indexed_date    | timestamp without time zone |
 geometry_sector | integer                     |
 rank_address    | smallint                    |
 rank_search     | smallint                    |
 partition       | smallint                    |
 indexed_status  | smallint                    |
 osm_id          | bigint                      | not null
 osm_type        | character(1)                | not null
 class           | text                        | not null
 type            | text                        | not null
 name            | hstore                      |
 admin_level     | smallint                    |
 address         | hstore                      |
 extratags       | hstore                      |
 geometry        | geometry(Geometry,4326)     | not null
 wikipedia       | text                        |
 country_code    | character varying(2)        |
 housenumber     | text                        |
 postcode        | text                        |
 centroid        | geometry(Geometry,4326)     |
Indexes:
    "idx_place_id" UNIQUE, btree (place_id)
    "idx_placex_adminname" btree (make_standard_name(name -> 'name'::text), rank_search) WHERE osm_type = 'N'::bpchar AND rank_search < 26
    "idx_placex_geometry" gist (geometry)
    "idx_placex_linked_place_id" btree (linked_place_id) WHERE linked_place_id IS NOT NULL
    "idx_placex_osmid" btree (osm_type, osm_id)
    "idx_placex_parent_place_id" btree (parent_place_id) WHERE parent_place_id IS NOT NULL
    "idx_placex_pendingsector" btree (rank_search, geometry_sector) WHERE indexed_status > 0
    "idx_placex_rank_address" btree (rank_address)
    "idx_placex_rank_search" btree (rank_search)
    "idx_placex_reverse_geometry" gist (geometry) WHERE rank_search <> 28 AND (name IS NOT NULL OR housenumber IS NOT NULL) AND (class <> ALL (ARRAY['waterway'::text, 'railway'::text, 'tunnel'::text, 'bridge'::text, 'man_made'::text]))
Triggers:
    placex_before_delete AFTER DELETE ON placex FOR EACH ROW EXECUTE PROCEDURE placex_delete()
    placex_before_insert BEFORE INSERT ON placex FOR EACH ROW EXECUTE PROCEDURE placex_insert()
    placex_before_update BEFORE UPDATE ON placex FOR EACH ROW EXECUTE PROCEDURE placex_update()

I did not come up with a solution with no command executed

What else can I do to try to solve this?

Firstly I posted this issue in the OpenStreetMap Help and by @mtmail recommendation I am moving it to github.

@pedropinheiro75
Copy link
Author

pedropinheiro75 commented Aug 14, 2018

@mtmail about Try running test queries with 'EXPLAIN', see a similar discussion at #1023 (comment)_

Here it is:

nominatim=# EXPLAIN SELECT place_id,0::int as exactmatch FROM search_name
nominatim-# WHERE name_vector @> ARRAY[5639492] ORDER BY
nominatim-# (CASE WHEN importance = 0 OR importance IS NULL THEN 0.75001-(search_rank::float/40) ELSE importance END) DESC LIMIT 20;
                                                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9809.24..9809.29 rows=20 width=18)
   ->  Sort  (cost=9809.24..9830.51 rows=8507 width=18)
         Sort Key: (CASE WHEN ((importance = '0'::double precision) OR (importance IS NULL)) THEN ('0.75001'::double precision - ((search_rank)::double precision / '40'::double precision)) ELSE importance END) DESC
         ->  Bitmap Heap Scan on search_name  (cost=78.03..9582.87 rows=8507 width=18)
               Recheck Cond: (name_vector @> '{5639492}'::integer[])
               ->  Bitmap Index Scan on idx_search_name_name_vector  (cost=0.00..75.90 rows=8507 width=0)
                     Index Cond: (name_vector @> '{5639492}'::integer[])
(7 rows)

nominatim=# EXPLAIN ANALYSE SELECT place_id,0::int as exactmatch FROM search_name
nominatim-# WHERE name_vector @> ARRAY[5639492] ORDER BY
nominatim-# (CASE WHEN importance = 0 OR importance IS NULL THEN 0.75001-(search_rank::float/40) ELSE importance END) DESC LIMIT 20;
                                                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9809.24..9809.29 rows=20 width=18) (actual time=2.235..2.236 rows=1 loops=1)
   ->  Sort  (cost=9809.24..9830.51 rows=8507 width=18) (actual time=2.233..2.234 rows=1 loops=1)
         Sort Key: (CASE WHEN ((importance = '0'::double precision) OR (importance IS NULL)) THEN ('0.75001'::double precision - ((search_rank)::double precision / '40'::double precision)) ELSE importance END) DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on search_name  (cost=78.03..9582.87 rows=8507 width=18) (actual time=2.192..2.193 rows=1 loops=1)
               Recheck Cond: (name_vector @> '{5639492}'::integer[])
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on idx_search_name_name_vector  (cost=0.00..75.90 rows=8507 width=0) (actual time=1.846..1.846 rows=1 loops=1)
                     Index Cond: (name_vector @> '{5639492}'::integer[])
 Planning time: 0.745 ms
 Execution time: 2.282 ms
(11 rows)

@mtmail
Copy link
Collaborator

mtmail commented Aug 15, 2018

OK, those queries are fast. It might be worth trying an EXPLAIN on the query from the first error message SELECT osm_type, osm_id, class, type, admin_level...

Do you have monitoring on disk IO, e.g. via AWS Cloudwatch(?).

Do you remember how long the initial import ran?

I see the GP storage type is network mounted (https://aws.amazon.com/ebs/details/) and the T2 instances are not EBS optimized (https://aws.amazon.com/ec2/instance-types/). I can't tell if that means the disk IO is slow or unsuitable for small burst, I only have experience with the super small T2.micro instances.

@pedropinheiro75
Copy link
Author

Here is the EXPLAIN from the first error:

EXPLAIN SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name','name','brand','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name','name','brand','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (92672355,196842012,4488269,196732908,48310082,160137,91687842,70711723,83690356,77366185,74053027,92460750,82583385,113120861,177719652,86034442,82755806,76866792,83301006,92789547)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place';

 HashAggregate  (cost=68.63..105383.77 rows=20 width=180)
   Group Key: placex.osm_type, placex.osm_id, placex.class, placex.type, placex.admin_level, placex.rank_search, placex.rank_address, placex.housenumber, placex.country_code, placex.importance, get_address_by_language(placex.place_id, '-1'::integer, '{short_name,name,brand,official_name,ref,type}'::text[]), get_name_by_language(placex.name, '{short_name,name,brand,official_name,ref,type}'::text[]), get_name_by_language(placex.name, '{ref}'::text[]), (placex.extratags -> 'place'::text)
   ->  Index Scan using idx_place_id on placex  (cost=0.57..67.53 rows=20 width=180)
         Index Cond: (place_id = ANY ('{92672355,196842012,4488269,196732908,48310082,160137,91687842,70711723,83690356,77366185,74053027,92460750,82583385,113120861,177719652,86034442,82755806,76866792,83301006,92789547}'::bigint[]))
         Filter: ((linked_place_id IS NULL) AND (((rank_address >= 0) AND (rank_address <= 30)) OR ((extratags -> 'place'::text) = 'city'::text)))
   SubPlan 1
     ->  Aggregate  (cost=5264.97..5264.98 rows=1 width=10)
           ->  Nested Loop  (cost=1.15..5264.92 rows=5 width=10)
                 ->  Index Scan using idx_place_addressline_place_id on place_addressline ai_s  (cost=0.57..1779.24 rows=1246 width=8)
                       Index Cond: (place_id = min(CASE WHEN (placex.rank_search < 28) THEN placex.place_id ELSE placex.parent_place_id END))
                       Filter: isaddress
                 ->  Index Scan using idx_place_id on placex ai_p  (cost=0.57..2.79 rows=1 width=18)
                       Index Cond: (place_id = ai_s.address_place_id)
                       Filter: (importance IS NOT NULL)
(14 rows)

Yes, I have detailed AWS CloudWatch monitoring.

What I noticed was that even running slow queries like this:

2018-08-15 19:40:39 UTC [1821-1] www-data@nominatim LOG:  duration: 11714.008 ms  statement: SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (658333,64429858,6441754,6487545,151535658,63824784,47915977,21898420,112495770,103601664,118742537,104093519,103465203,117810658,114175437,117250188,82721506,95352494,99283832,143236712)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'

the use of CPU and Disk remains almost null. (I waited a few minutes to get this print)

image

My initial setup it took about three days. For that I set up the swap , could this mess up something?

In relation about EBS on T2 instance I have another Nominatim instance with full planet with T2 Instances and everything works fine.

@lonvia
Copy link
Member

lonvia commented Aug 15, 2018

Please run ANALYSE on your database and see if that helps. If it doesn't, enable long query logging in your postgres config (log_min_duration_statement = 1000), reload the postgres service and rerun the slow query. Then report the exact query you tried and the postgres log results.

@mtmail
Copy link
Collaborator

mtmail commented Aug 15, 2018

My initial setup it took about three days

Three days is pretty good. Doubling hardware and faster discs might get you to two days. I was just wondering if took two weeks (we have user reporting that on underpowered hardware).

@pedropinheiro75
Copy link
Author

@lonvia after run psql -d nominatim -c 'ANALYZE VERBOSE'

http://nominatim.dev/nominatim/search.php?format=jsonv2&accept-language=pt-br&addressdetails=1&q=verdan

http://nominatim.dev/nominatim/search.php?format=jsonv2&accept-language=pt-br&addressdetails=1&q=netuno

http://nominatim.dev/nominatim/search.php?format=jsonv2&accept-language=pt-br&addressdetails=1&q=plutão
2018-08-16 13:29:29 UTC [2037-1] LOG:  database system was shut down at 2018-08-16 13:29:20 UTC
2018-08-16 13:29:29 UTC [2037-2] LOG:  MultiXact member wraparound protections are now enabled
2018-08-16 13:29:29 UTC [2041-1] LOG:  autovacuum launcher started
2018-08-16 13:29:29 UTC [2035-3] LOG:  database system is ready to accept connections
2018-08-16 13:29:29 UTC [2043-1] [unknown]@[unknown] LOG:  incomplete startup packet
2018-08-16 13:41:17 UTC [2168-1] www-data@nominatim LOG:  duration: 2185.926 ms  statement: SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (45050252,22415400,22203205,19410621,4163997)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'
2018-08-16 13:41:54 UTC [2171-1] www-data@nominatim LOG:  duration: 2715.655 ms  statement: SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (134143511,53735345,197242096,43840262,14573170)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'
2018-08-16 13:42:12 UTC [2172-1] www-data@nominatim LOG:  duration: 11558.166 ms  statement: SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (89121656,100474154,129955308,110139220,178136713,127603481,127603054,128135891,128135499,118073268,103931881,166244432,164342431,87947148,97083575,114067315,172273665,177880016,72092325,93795339)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'

@lonvia
Copy link
Member

lonvia commented Aug 16, 2018

Wait, are you saying that the slow queries only happen after the database was restarted and things settle down after a while? That would be perfectly normal. Postgresql has to fetch many gigabytes of index data before it can serve the queries efficiently.

@pedropinheiro75
Copy link
Author

pedropinheiro75 commented Aug 16, 2018

I understand. Extremely grateful for the help.

There's something else, can you tell me if it's normal?
If I run the same query several times the caches is created and everything works incredibly fast. If I create an AMI and lauch a new machine based on it, the same query in the first call I always see:

Internal Server Error

Nominatim has encountered an internal error while accessing the database. This may happen because the database is broken or because of a bug in the software. If you think it is a bug, feel free to report it over on Github. Please include the URL that caused the problem and the complete error details below.

Message: Could not lookup place

SQL Error: DB Error: unknown error

Details:

SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (89121656,100474154,129955308,110139220,178136713,127603481,127603054,128135891,128135499,118073268,103931881,166244432,164342431,87947148,97083575,114067315,172273665,177880016,72092325,93795339)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'  [nativecode=ERROR:  canceling statement due to statement timeout
CONTEXT:  PL/pgSQL function get_addressdata(bigint,integer) line 103 at FOR over SELECT rows
PL/pgSQL function get_address_by_language(bigint,integer,text[]) line 12 at FOR over SELECT rows]

and after a few calls I get status code 200 normally.

Postgresql Log:

2018-08-16 14:32:53 UTC [1541-1] LOG:  database system was shut down at 2018-08-16 14:32:52 UTC
2018-08-16 14:32:53 UTC [1541-2] LOG:  MultiXact member wraparound protections are now enabled
2018-08-16 14:32:53 UTC [1539-3] LOG:  database system is ready to accept connections
2018-08-16 14:32:53 UTC [1545-1] LOG:  autovacuum launcher started
2018-08-16 14:32:54 UTC [1547-1] [unknown]@[unknown] LOG:  incomplete startup packet
Interrupt requested
Interrupt requested
2018-08-16 14:33:56 UTC [1561-1] www-data@nominatim ERROR:  canceling statement due to statement timeout
2018-08-16 14:33:56 UTC [1561-2] www-data@nominatim CONTEXT:  PL/pgSQL function get_addressdata(bigint,integer) line 103 at FOR over SELECT rows
        PL/pgSQL function get_address_by_language(bigint,integer,text[]) line 12 at FOR over SELECT rows
2018-08-16 14:33:56 UTC [1561-3] www-data@nominatim STATEMENT:  SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (89121656,100474154,129955308,110139220,178136713,127603481,127603054,128135891,128135499,118073268,103931881,166244432,164342431,87947148,97083575,114067315,172273665,177880016,72092325,93795339)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'
Interrupt requested
Interrupt requested
2018-08-16 14:36:45 UTC [1574-1] www-data@nominatim ERROR:  canceling statement due to statement timeout
2018-08-16 14:36:45 UTC [1574-2] www-data@nominatim CONTEXT:  PL/pgSQL function get_addressdata(bigint,integer) line 103 at FOR over SELECT rows
        PL/pgSQL function get_address_by_language(bigint,integer,text[]) line 12 at FOR over SELECT rows
2018-08-16 14:36:45 UTC [1574-3] www-data@nominatim STATEMENT:  SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (89121656,100474154,129955308,110139220,178136713,127603481,127603054,128135891,128135499,118073268,103931881,166244432,164342431,87947148,97083575,114067315,172273665,177880016,72092325,93795339)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'
Interrupt requested
Interrupt requested
2018-08-16 14:37:19 UTC [1576-1] www-data@nominatim ERROR:  canceling statement due to statement timeout
2018-08-16 14:37:19 UTC [1576-2] www-data@nominatim CONTEXT:  PL/pgSQL function get_addressdata(bigint,integer) line 103 at FOR over SELECT rows
        PL/pgSQL function get_address_by_language(bigint,integer,text[]) line 12 at FOR over SELECT rows
2018-08-16 14:37:19 UTC [1576-3] www-data@nominatim STATEMENT:  SELECT     osm_type,    osm_id,    class,    type,    admin_level,    rank_search,    rank_address,    min(place_id) AS place_id,    min(parent_place_id) AS parent_place_id,    -1 as housenumber,    country_code,get_address_by_language(place_id,-1,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS langaddress,    get_name_by_language(name,ARRAY['short_name:pt-br','name:pt-br','short_name:pt','name:pt','short_name','name','brand','official_name:pt-br','official_name:pt','official_name','ref','type']) AS placename,    get_name_by_language(name, ARRAY['ref']) AS ref,    avg(ST_X(centroid)) AS lon,     avg(ST_Y(centroid)) AS lat,     COALESCE(importance,0.75-(rank_search::float/40)) AS importance, (SELECT max(ai_p.importance * (ai_p.rank_address + 2))   FROM place_addressline ai_s, placex ai_p   WHERE ai_s.place_id = min(CASE WHEN placex.rank_search < 28 THEN placex.place_id ELSE placex.parent_place_id END)     AND ai_p.place_id = ai_s.address_place_id      AND ai_s.isaddress      AND ai_p.importance is not null) AS addressimportance,    (extratags->'place') AS extra_place  FROM placex WHERE place_id in (89121656,100474154,129955308,110139220,178136713,127603481,127603054,128135891,128135499,118073268,103931881,166244432,164342431,87947148,97083575,114067315,172273665,177880016,72092325,93795339)    AND (        placex.rank_address between 0 and 30     OR (extratags->'place') = 'city'       )     AND linked_place_id is null  GROUP BY      osm_type,      osm_id,      class,      type,      admin_level,      rank_search,      rank_address,      housenumber,     country_code,      importance,      langaddress,      placename,      ref,      extratags->'place'

Can AMI be messing something up?

@lonvia
Copy link
Member

lonvia commented Aug 18, 2018

I think you have a wrong idea what 'caching' means in this context. It means that Postgresql fetches data from disk into RAM. Once the most frequently used data is there, it can answer many queries from the data that is cached in RAM and thus is much faster than before. When Postgresql is restarted, it forgets about all the data it has in RAM and needs to start fresh. If you create an AMI and spin it up on another machine, that is the same as restarting Postgresql. So yes, it is normal, that here too the first queries are slow.

@pedropinheiro75
Copy link
Author

pedropinheiro75 commented Aug 19, 2018

I always worked with Nominatim with an incredible performance, I did not know that the creation of the cache could take so long.
I wanted to use the AMI for fallback in case something happens in my instance within an autoscaling.
I will adapt a process to load the cache.
Thank you.

@louispotok
Copy link

@gnosis75 I'm trying to do almost exactly the same thing as you - were you able to figure out a good way to warm up the cache after creating an instance from the AMI?

@Emil-Zhou
Copy link

@gnosis75 +1

@mtmail
Copy link
Collaborator

mtmail commented May 6, 2019

@Emil-Zhou utils/warm.php was written for this purpose.

(Comments on year-long closed issues are easy to miss. There's a geocoding mailing list https://wiki.openstreetmap.org/wiki/Mailing_lists for more general question.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants