Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request - State value returned on query #102

Closed
daryltucker opened this issue Aug 18, 2014 · 53 comments
Closed

Feature Request - State value returned on query #102

daryltucker opened this issue Aug 18, 2014 · 53 comments
Milestone

Comments

@daryltucker
Copy link

I would like to see more of OSM's 'address' information returned upon querying photon. I specifically need the state field.

Here are two examples that would satisfy my needs:

{
    "features": [
        {
            "properties": {
                "osm_key": "boundary",
                "name": "San Francisco",
                "osm_id": 111968,
                "osm_value": "administrative",
                "state": "California"
                "country": "United States of America"
            },
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -122.4192704,
                    37.7792768
                ]
            }
        },
    ]
}



{
    "features": [
        {
            "properties": {
                "osm_key": "boundary",
                "name": "San Francisco",
                "osm_id": 111968,
                "osm_value": "administrative",
                "country": "United States of America"
            },
            address": {
                "city": "San Francisco",
                "county": "San Francisco City and County",
                "state": "California",
                "country": "United States of America",
                "country_code": "us"
            }
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -122.4192704,
                    37.7792768
                ]
            }
        },
    ]
}
@daryltucker
Copy link
Author

I attempted to make the adjustments by looking at this commit as an example. After making modifications, the state field was not seen in the reponse.

Looking deeper, and finding NominatimConnector.java, I now feel as though the state isn't imported into photon.

I've created a fork, if you're interested in taking a look. Java is definitely not my strong suit, so I'll hold off on the pull request.

@christophlingg
Copy link
Member

I see that state attribute is necessary for the US, I'll have a look on it.

@daryltucker
Copy link
Author

Awww, I was hoping this would get the 1.1 Milestone tag :P

Either way, I'm glad to see you working on photon. It's been very helpful :D

@christophlingg
Copy link
Member

@lonvia: we are using nominatim's get_addressdata to annotate a placex entry with additional information (like city, street, postcode). Could you give us a hint on how to extract the state information? see also AddressRow model

@christophlingg
Copy link
Member

@lonvia
Copy link
Collaborator

lonvia commented Oct 7, 2014

@christophlingg the patch solves half of the problem. Like is_city(), it should also check for boundaries with the correct admin_level (it's 4 for state IIRC).

@christophlingg
Copy link
Member

thanks for your help! yes it should be 4. the wiki page says 8 but I remember there was a special nominatim mapping that doubled the values.

@lonvia
Copy link
Collaborator

lonvia commented Oct 7, 2014

Jup, rank_search and rank_address double admin_levels. admin_level contains the original values as it was mapped.

@christophlingg
Copy link
Member

he @daryltucker , i created a new branch and committed all necessary changes. Do you have time for running a test import and verify that results are as expected?

@daryltucker
Copy link
Author

@christophlingg Exciting news! I'll be able to run some tests in the morning.

Thanks again!

@christophlingg
Copy link
Member

I checked the state feature with my local dataset – only iceland though. It was working as expected, however we should make a world-wide test to see if we the state property is set right in all countries.

Photon has a json export feature: we can export nominatim data to a json file which can then be used to import it on other machines. I do not have a world-wide nominatim setup, but I could import this json data on photon.komot.de (on a different port) and we can evaluate the data together. @daryltucker do you have nominatim running with world-wide data, otherwise maybe @lonvia could help us out - once more. I am also curious how her changes improved the performance.

I uploaded a special jar for this branch: http://photon.komoot.de/data/photon-0.2-SNAPSHOT.jar

json export works like this:

java -Xmx4g -jar photon-0.2-SNAPSHOT.jar -json /somewhere/photon_dump.json -host localhost -port 5432 -database nominatim_island -user christoph -password xxx

@lonvia
Copy link
Collaborator

lonvia commented Oct 24, 2014

@christophlingg I can do another export. In fact, I've already one lying around done with the performance improvements (just still have a logistics problem when it comes to 5GB files). Export now takes somewhat around a day, much better than before. I'll redo the export with the states and see to it that I put it somewhere accessible next week.

@christophlingg
Copy link
Member

great! I can get you an account on photon.komoot.de, so you could transfer it via scp. If that helps?

@lonvia
Copy link
Collaborator

lonvia commented Oct 24, 2014

Doesn't help much, I'm afraid. The problem is on the other side. It's called Telekom.

@daryltucker
Copy link
Author

@christophlingg My instance is Global OSM, yes. Plans on testing over the weekend.

@lonvia
Copy link
Collaborator

lonvia commented Oct 25, 2014

@daryltucker I'll leave the creation of the dump to you then.

@daryltucker
Copy link
Author

[main] INFO de.komoot.photon.importer.nominatim.NominatimConnector - start importing documents from nominatim ...

Thanks for providing the special snapshot. I'll provide the json file when it's complete. I'll send you a key for scp (via email).

@daryltucker
Copy link
Author

After an hour, I have no additional logging shown on my terminal and the .json file is 0 bytes.

Should I keep going? It is slowing down the server tremendously.

@lonvia
Copy link
Collaborator

lonvia commented Oct 30, 2014

It starts with sequentially scanning placex, which takes some time and server resources. It should be finished soon with that and start to show some progress.

@daryltucker
Copy link
Author

[main] INFO de.komoot.photon.importer.nominatim.NominatimConnector - imported 40,470,000 documents [522.8/second]

Good news :D

@christophlingg christophlingg modified the milestones: 0.2, 0.1.2 Oct 31, 2014
@daryltucker
Copy link
Author

Compressing and sending your way very shortly.

@christophlingg
Copy link
Member

👍

@daryltucker
Copy link
Author

Any word/updates?

@christophlingg
Copy link
Member

I checked if for places in Austria, Germany and USA and results were as expected.

@amnesia7
Copy link
Contributor

@christophlingg I assume this is different to the change that will add the county to the results for UK searches for #120?
It also seems odd that http://photon.komoot.de:2333/api?q=trafalgar%20square%20london doesn't actually mention London anywhere in the main result whereas http://photon.komoot.de/api?q=trafalgar%20square%20london at least mentions Greater London...is that right?

@daryltucker
Copy link
Author

Woohoo! I am super-excited to get to looking at this new feature!

@christophlingg
Copy link
Member

I checked various countries and the state attribute was always set right. As far as this is concerned I would not hesitate to make a release with this new feature.

However, I investigated what @amnesia7 was reporting, I rewrote some parts of the importer and I unfortunately a bug was introduced. Sometimes Nominatim marks several relations being a city for a place. In the case of trafalgar square it is Westminster and Greater London. Now Westminster is mistakenly assigned as city of all places in London. This is a data bug we don't want to have in the next release.

I don't really how this wrong assignment could have happened, I rewrote the code slightly but it has not changed: before -> after

Maybe @lonvia can help us here? I knew we talked about this problem when we implemented the code in Berlin. Maybe due to the speed improvements the ordering of the address lines has changed and that's why the last address row is not the right city any more?

It's a pity that we encounter this bug short before getting finished ;-( Probably we need to fix the bug and export the json just another time.

@lonvia
Copy link
Collaborator

lonvia commented Dec 23, 2014

@christophlingg We should be able to fix all that by looking into extratags->place and if it says city or state prefer that over the one we determine from the address_rank. Apparently, I had forgotten that when changing the importer. I can have a go at implementing it in theory but don't have the means to test it at the moment. If I make a pull request, can you take over and check if it fixes all the issues?

@christophlingg
Copy link
Member

Would be great if you could commit a pfix for that. You know the nominatim internals best.

I don't have any testing data at the moment, but we will find ways to test it.

I am still wondering why it worked previously. We did not consider extratags before so maybe it worked only by luck...

@lonvia
Copy link
Collaborator

lonvia commented Dec 24, 2014

Previously we used the internal address computation which already does take the place tag into account.

@christophlingg
Copy link
Member

I created a new snapshot version of photon that has lonvia's changes: http://photon.komoot.de/data/photon-0.2-SNAPSHOT.jar

For a new evaluation round (and a release) I need the help of someone with a global nominatim database. @lonvia , @daryltucker or @FelixWittmann do you have time and resources for a json export?

java -jar photon-0.2-SNAPSHOT.jar -json /home/xxx/photon.json -host localhost -port 5432 -database nominatim -user nominatim -password …

@daryltucker
Copy link
Author

Heck yeah! I'll get to testing this on Friday.

Are you interested in the JSON export this time, too?

@christophlingg
Copy link
Member

This would be great! That's the missing piece to get this feature released. And I don't have a global nominatim database myself.

Happy new year btw, here in Europe we are very close ;-)

@daryltucker
Copy link
Author

It has begun! I'll let you know once it's available to you guys.

@christophlingg
Copy link
Member

great!

@daryltucker
Copy link
Author

You guys should see photon-GLOBAL-2015-01-05.json.tar.bz2 :D

@christophlingg
Copy link
Member

can you let me know the complete path? cannot find it in your home directory

@christophlingg
Copy link
Member

the new data is indexed and available on port 2333

ich checked some place in Austria and Germany, unfortunately I found a wrong assignment in munich. e.g.

http://photon.komoot.de:2333/api?q=pizza+call+dachauer
http://photon.komoot.de:2333/api?q=frauenkirche

And also example @amnesia7 reported is not fixed:

http://photon.komoot.de:2333/api?q=trafalgar%20square%20london

It seems like the changes we made are not working. Quite disappointing ;-( @daryltucker are you sure you were using the 0.2-SNAPSHOT version?

@christophlingg
Copy link
Member

one positive thing: i fixed a bug in the schema which apparently reduced the index size from 66 to 46 GB

@lonvia
Copy link
Collaborator

lonvia commented Jan 11, 2015

There was a copy'n'paste error in my pull request. In this line the binarySearch should be against 'place' not 'osmValue'. I hope it fixes the problem.

@christophlingg
Copy link
Member

@lonvia, it fixed the problem in deed! I created a bunch of test cases to double check it. All known cases are fixed and tested. I think we can dare another json export, @daryltucker are you still available?

I uploaded the latest jar to http://photon.komoot.de/data/photon-0.2-SNAPSHOT.jar

@daryltucker
Copy link
Author

Running JSON Export

  • java -jar photon-0.2-SNAPSHOT.jar -json photon-GLOBAL-2015-01-12.json ...

4fd13d81183e89a8ee63f0a3addeadda photon-0.2-SNAPSHOT.jar

@daryltucker
Copy link
Author

Compressing

  • time tar -cj photon-GLOBAL-2015-01-12.json | pigz > photon-GLOBAL-2015-01-12.json.tar.bz2

@christophlingg
Copy link
Member

great, let me know when it's on the server!

@daryltucker
Copy link
Author

The above compression did not end up with a proper bz2 file. Trying the following:

time tar cvf photon-GLOBAL-2015-01-12.json.tar.bz2 --use-compress-prog=pbzip2 photon-GLOBAL-2015-01-12.json

@christophlingg
Copy link
Member

import is running, the first documents are popping up: http://photon.komoot.de:2333/api?q=paris

this will take now 2-3 hours

@christophlingg
Copy link
Member

the data is great, I could not find a bug anymore. I'll shut down this instance and will prepare a release! 👯

@christophlingg
Copy link
Member

I just released the new version, photon.komoot.de is now running the new version and exposes the state information!

Without the help of @daryltucker and @lonvia this would not have been possible, great!

@amnesia7
Copy link
Contributor

@christophlingg I'm not sure if this is fallout of the state change or not but I don't think the sydney opera house is in antarctica: http://photon.komoot.de/api/?q=sydney%20opera%20house

@lonvia
Copy link
Collaborator

lonvia commented Jan 16, 2015

The Antarctica thing is a data problem, people keep claiming Antarctica for their country. Nominatim tends to turn facts around in this case and claim the country for Antarctica. There is a workaround for that in the code somewhere but I'm not sure if @daryltucker has a recent enough version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants