Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of separate place_classtype_* tables #1652

Open
lonvia opened this issue Jan 29, 2020 · 7 comments
Open

Get rid of separate place_classtype_* tables #1652

lonvia opened this issue Jan 29, 2020 · 7 comments

Comments

@lonvia
Copy link
Member

lonvia commented Jan 29, 2020

The place_classtype_* tables are there so that places can be searched by their type, for example restaurants in Hickeroy. They are extra tables because it is not possible to have a combined index of columns where one column is a geometry. So we copy out the place_id and geometry of places of a certain class/type into a separate table and create an index over the geometry there.

We should replace this mechanism with simple partial indexes. Todo:

  • change script utils/specialphrases.php. For each class/type create and index of the form CREATE INDEX class_type_??? ON placex USING gist(centroid) WHERE class = ?? and type = ??.
  • Find queries that use place_classtype_* and replace those with a query on placex. (Check for the existence of a index class_type_?? to find out if the combination of class and type is supported.)
  • Remove updates of place_classtype_*
  • Run performance tests to find out if the performance for queries is equal to the solution with separate tables.
@mark-lester
Copy link

mark-lester commented Jun 6, 2020

its 47,000+ lines, I suggest as part of this job

  1. split into language specific files country/[country-code].csv
  2. de normalise into amenity, type, [phrases...], i.e one line per amenity/type per file
  3. use csv format for accessibility

I am looking at the wrong file. I guess we keep the postgress SQL as is, if I feel we need to do the above I can make a script to gen the sql from country specific input files

@lonvia
Copy link
Member Author

lonvia commented Jun 6, 2020

its 47,000+ lines

Do you mean the file emitted by ./utils/specialphrases.php --wiki-import? That is not meant to be human readable, it's meant to be readable by Postgresql. So CSV would be rather counterproductive. The largest part is about creating search terms (the select getorcreate_amenity queries). This should no be touched by this issue. The issue is only about the last part where the tables are created. That needs to be replaced by creating the indexes on placex.

@mark-lester
Copy link

yes sorry. i am just feeling my way. are those phrases generated form somewhere else, I guess they must be, its a database worth. I am in the right file, but i wont do anything till i have touched base multiple times

@lonvia
Copy link
Member Author

lonvia commented Jun 6, 2020

./utils/specialphrases.php downloads them from the OSM wiki.

@mark-lester
Copy link

mark-lester commented Jun 6, 2020

yes sorry, i realized that as soon as i said it, its the input/output to the program i am supposed to be fixing, this wont be the last stupid thing i say, please bear with me

@lonvia
Copy link
Member Author

lonvia commented Sep 29, 2020

So the catch with this approach is actually the index creation itself. So far we would create a helper index over placex(class, type) which would make the creation of the place_classtype tables reasonably fast. When creating the indexes on placex directly, the helper index is not used. We end up with 350 sequential scans over placex. That's an order of magnitude slower.

@lonvia lonvia removed the help wanted label Oct 8, 2020
@lonvia
Copy link
Member Author

lonvia commented Oct 8, 2020

I've played with this in https://github.com/lonvia/Nominatim/tree/remove-special-classtype-tables-II. The most promising approach to replace the classtype tables is a unified index over the centroid column that includes the type in the index. Unfortunately we have gist indexes with included data only from Postgresql 12. So I'm putting this on the back burner for a year or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants