[Questions] Setting up Pelias for a custom dataset #88
Comments
Questions related to the /suggest API: I have managed to load a few custom layers: one POI layer (landmarks) and a number of address layers (suburbs, addresses, etc.). My _/search?input=east perth_ However, when I perform /suggest, e.g. the result is always based on

_/suggest?input=east perth&lat=-32.117534&lon=115.932385_
_/suggest?input=east%20perth&lat=-32.117534&lon=115.932385&layers=suburbs_
_/suggest?input=east%20perth&lat=-32.117534&lon=115.932385&layers=addresses_

Based on my study of the Mapzen demo web site: https://pelias.mapzen.com/suggest?bbox=-17.518344187852207,143.1298828125,-37.0551771066608,89.736328125&input=east+pe&lat=-27.722&lon=116.433&size=10&zoom=5

My questions: I believe

Answer

You mentioned that you added these custom layers to https://github.com/pelias/api/blob/master/helper/layers.js. Can you tell me where you added these layers in that file, or share lines 15-17?

layers = expand_aliases('poi', layers, ['geoname','osmnode','osmway']);
layers = expand_aliases('admin', layers, ['admin0','admin1','admin2','neighborhood','locality','local_admin']);
layers = expand_aliases('address', layers, ['osmaddress','openaddresses']);

Did you keep the aliases 'poi', 'admin' and 'address'? The reason I ask is that there is another file/abstraction called

It's in querymixer.json that we define the composition rules for

{
"suggest": [
{
"layers": ["poi", "admin", "address"],
"precision": [5, 3, 1]
},
{
"layers": ["admin"],
"precision": []
},
{
"layers": ["poi", "admin", "address"],
"precision": [3],
"fuzzy": "AUTO"
}
],
...
}

In the above code snippet, we define multiple suggesters that target various alias layers (poi, admin, address). You could change these to your individual layers as well (for example: landmarks, suburbs, address, etc.). When you pass a layer param, this gets overridden with the given layer name.

{
"layers": ["address"],
"precision": [5, 3, 1]
}

In the above code snippet, it takes only the address layer (or alias layer) and uses precision levels 5, 3 and 1. Precision level is Elasticsearch's way of representing a certain geohash: the higher the precision level number, the higher the geohash precision, and the closer the match is to the lat/lon passed. See es documentation

The reason we do query mixing manually is an Elasticsearch bug that is being addressed here, but the fix will land in Elasticsearch's 2.0.0 release (ETA one month). For now,

As far as scoring goes, there is a weights.js file as part of the suggester-pipeline (so if you change it, you will have to re-import). We assign weights based on the layer to which the doc belongs. Say you add your custom layers with a weight in the weights file and re-import: you should start to see scores corresponding to those weights. Scoring varies for
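To make the link between precision level and geohash length concrete, here is a minimal sketch of standard geohash encoding. The `encodeGeohash` helper is written for illustration only; it is not part of Pelias or Elasticsearch, and how Elasticsearch uses the precision internally is not shown here. The coordinates are the ones from the /suggest requests in this thread.

```javascript
// Standard geohash base32 alphabet (digits plus letters, without a, i, l, o).
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Illustrative helper (an assumption, not a Pelias or Elasticsearch API):
// encode a lat/lon pair as a geohash string of `precision` characters.
function encodeGeohash(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = '';
  let bits = 0;       // number of bits collected for the current character
  let value = 0;      // numeric value of the current 5-bit character
  let useLon = true;  // bits are interleaved, starting with longitude

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const coord = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (coord >= mid) {
      value = (value << 1) | 1; // upper half of the range
      range[0] = mid;
    } else {
      value = value << 1;       // lower half of the range
      range[1] = mid;
    }
    useLon = !useLon;
    bits += 1;
    if (bits === 5) {           // every 5 bits become one base32 character
      hash += BASE32[value];
      bits = 0;
      value = 0;
    }
  }
  return hash;
}

// Each extra character narrows the cell around the point, so the
// precision-1 hash is a prefix of the precision-3 hash, and so on.
for (const precision of [1, 3, 5]) {
  console.log(precision, encodeGeohash(-31.946284, 115.845469, precision));
}
```

A suggester with precision 5 therefore matches documents in a much smaller cell around the given lat/lon than one with precision 1.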
Follow up

The custom layers added in layers.js are as below:

layers = expand_aliases('poi', layers, ['geoname','osmnode','osmway']);
layers = expand_aliases('admin', layers, ['admin0','admin1','admin2','neighborhood','locality','local_admin']);
layers = expand_aliases('address', layers, ['landmarks','suburbs','addresses','stops']);

I had previously removed all the layers in the 'poi' alias and left it empty as ('poi, layers, [' ']), but decided to leave it intact. Question: I guess I need to keep the 'poi' and 'admin' aliases even though I don't use them?

With the aliases above intact, I made no changes to querymixer.json. I did try setting different layer names there, but that doesn't change much to the

What makes the results different is the "weights". Following your suggestion, I re-imported all the data with the same weights, which significantly improved the /suggest results. However, with "East Perth" I noticed that I still don't get any result from the "suburbs" layer, so I gave "suburbs" more weight than all the other layers. With that I get the following results:

_/suggest?input=east%20perth&lat=-31.946284&lon=115.845469_

{
"type":"FeatureCollection",
"features":[
{
"type":"Feature",
"properties":{
"id":"218",
"layer":"suburbs",
"name":"EAST PERTH",
"admin0":"Western Australia",
"admin1":"Perth",
"admin2":"EAST PERTH",
"text":"EAST PERTH, Perth"
},
"geometry":{
"type":"Point",
"coordinates":[
115.876925,
-31.956572
]
}
},
{
"type":"Feature",
"properties":{
"id":"9762",
"layer":"landmarks",
"name":"EAST PERTH BACKPACKERS",
"admin0":"Western Australia",
"admin1":"Perth",
"admin2":"EAST PERTH BACKPACKERS",
"text":"EAST PERTH BACKPACKERS, Perth"
},
"geometry":{
"type":"Point",
"coordinates":[
115.872754,
-31.958912
]
}
},
{
"type":"Feature",
"properties":{
"id":"9880",
"layer":"landmarks",
"name":"EAST PERTH FOOTBALL",
"admin0":"Western Australia",
"admin1":"Perth",
"admin2":"EAST PERTH FOOTBALL",
"text":"EAST PERTH FOOTBALL, Perth"
},
"geometry":{
"type":"Point",
"coordinates":[
115.842871,
-31.935228
]
}
},
{
"type":"Feature",
"properties":{
"id":"9435",
"layer":"landmarks",
"name":"EAST PERTH CP",
"admin0":"Western Australia",
"admin1":"Perth",
"admin2":"EAST PERTH CP",
"text":"EAST PERTH CP, Perth"
},
"geometry":{
"type":"Point",
"coordinates":[
115.87493,
-31.961996
]
}
}
],
"bbox":[
115.842871,
-31.961996,
115.876925,
-31.935228
],
"date":1430899935731
}

I guess "weight" is the only thing that I need to fine-tune to get the best results for the time being.

Question: Is https://github.com/pelias/api/blob/master/helper/category_weights.js related to the category values, if they exist in the dataset?

Answer

If you only care about your datasets, I'd suggest you go with the following layer configuration:

layers = expand_aliases('poi', layers, ['landmarks']);
layers = expand_aliases('admin', layers, ['suburbs']);
layers = expand_aliases('address', layers, ['addresses','stops']);

Querymixer basically looks at querymixer.json, which looks like the following for /suggest:

{
"suggest": [
{
"layers": ["poi", "admin", "address"],
"precision": [5, 3, 1]
},
{
"layers": ["admin"],
"precision": []
},
{
"layers": ["poi", "admin", "address"],
"precision": [3],
"fuzzy": "AUTO"
}
],
...
}

You could also mess with this file, make it look like the following, and see if you get better results:

{
"suggest": [
{
"layers": ["landmarks", "suburbs", "address", "stops"],
"precision": [5, 3, 1]
},
{
"layers": ["suburbs"],
"precision": []
},
{
"layers": ["landmarks", "suburbs", "address", "stops"],
"precision": [3],
"fuzzy": "AUTO"
}
],
...
}

Weights can be used as a sorting mechanism in case you have many matches and you want to boost results from a certain dataset (for example, suburbs) higher than another dataset, say landmarks. You would set the weights like so:

suburbs: 10
landmarks: 5
...

And yes, category weights are used if you have category values (for example: https://github.com/pelias/openstreetmap/blob/master/config/category_map.js)
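As a rough illustration of the ordering effect that per-layer weights have, the sketch below sorts a list of features by a weight looked up from their layer. This is only a stand-in: in Pelias the weights are applied at import time by the suggester-pipeline, not by sorting responses afterwards. The `sortByLayerWeight` function, the `DEFAULT_WEIGHT` fallback and the sample features are assumptions made for the example.

```javascript
// Per-layer weights mirroring the example above.
const layerWeights = { suburbs: 10, landmarks: 5 };

// Assumed fallback for layers without an explicit weight (hypothetical).
const DEFAULT_WEIGHT = 1;

// Look up the weight for a feature based on its layer.
function weightOf(feature, weights) {
  const layer = feature.properties.layer;
  return Object.prototype.hasOwnProperty.call(weights, layer)
    ? weights[layer]
    : DEFAULT_WEIGHT;
}

// Hypothetical helper: return a new array with higher-weighted layers first.
// The original index is used as a tiebreaker so equal weights keep their order.
function sortByLayerWeight(features, weights) {
  return features
    .map((feature, index) => ({ feature, index }))
    .sort((a, b) =>
      (weightOf(b.feature, weights) - weightOf(a.feature, weights)) ||
      (a.index - b.index))
    .map((entry) => entry.feature);
}

// Sample features shaped like the /suggest response in this thread.
const ranked = sortByLayerWeight([
  { properties: { layer: 'landmarks', name: 'EAST PERTH BACKPACKERS' } },
  { properties: { layer: 'suburbs', name: 'EAST PERTH' } },
  { properties: { layer: 'landmarks', name: 'EAST PERTH CP' } }
], layerWeights);

console.log(ranked.map((f) => f.properties.name));
```

With suburbs weighted above landmarks, the suburb entry is ranked first and the two landmarks keep their relative order.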
Thanks @hkrishna for putting this page up. :) I have tried your suggestion of making changes to querymixer, and the results improved accordingly. With the weights setting combination (all layers having similar weights and only suburbs higher), I can get the result from the suburbs layer returned on top all the time. Thanks for all the help given. :)
Thank you @flotpk for asking all the right questions :) Your feedback is really valuable for us. Keep us posted with your thoughts, concerns and suggestions that could improve Pelias as a product.
Hi @hkrishna, I have a few more questions regarding the settings in queryMixer.json.

{
"suggest": [
{
"layers": ["landmarks", "suburbs", "address", "stops"],
"precision": [5, 3, 1]
},
{
"layers": ["suburbs"],
"precision": []
},
{
"layers": ["landmarks", "suburbs", "address", "stops"],
"precision": [3],
"fuzzy": "AUTO"
}
],
...
}

Looking at the above setting:

(b) What's the purpose of using "precision": []?
(c) What's the purpose of the last section with precision = 3 and fuzzy = AUTO?

Can I fine-tune the setting here if I would like the /suggest API to return a result for the following case: e.g. input=20A Street Name

Based on the above, can I fine-tune the setting so that /suggest returns 20 Street Name when I enter 20A Street Name?
Recently, @flotpk and I have been having a great conversation about setting up Pelias with custom datasets (writing Pelias importers, getting the API to work with the new data layers, etc.), and I thought it would be nice to put it out on GitHub for others to look at and contribute to.
I have boiled it down to a simple question-and-answer format.
Is there any mailing list where I can post questions / problems?
You could email pelias@mapzen.com or just open an issue at https://github.com/pelias/pelias/issues and add a label called 'question' (GitHub is preferred because we like things to be as open as possible).
I have been searching and can't find any documentation about mapping and importing a custom dataset (other than the openstreetmap, geonames, openaddresses and quattroshapes data). It would be great if you could point me to the guide if there is one.
Unfortunately, we do not have documentation for writing a custom importer. I have opened an issue and we'll get to it as soon as we can :) It would be great if you could add a comment with your findings and how you got it working.
Is there any roadmap for a Pelias release, e.g. a first stable / beta release?
There is definitely a roadmap: everything with the label v1.0.0 in the pelias org is for the stable/beta/v1 release. Here is an alternative view of all the GitHub repos in the pelias org: https://waffle.io/pelias/pelias?milestone=Pelias%20v1.0.0
If I understand correctly, all new layers / types created must be based on the "Pelias schema". Can I simply add a new column by changing schema/mappings/document.js?
Yes, you can add a custom column to document.js and give it an appropriate type / partial mapper.
In my custom type, I have inserted address number + street name into the address.name column, and address number into address.number and street name into address.street separately. When I perform a search through the Pelias API /search, does Pelias search against the name column or ??
Right now, /search is done against the name.default column only, so as long as you set that column to the complete street address you should be good to go. The address object was added to the schema to enable the API to query smartly, fall back on the street if it doesn't find a house number, do street interpolation, etc. However, this is a work in progress and will get implemented once we have a good address parser in place.
I wish to understand how the /search results are generated or sorted: based on what criteria?
Currently we have a bunch of Groovy scripts that take popularity, population, category weights, etc. into account. You can see the order in which they affect the sorting in the API here
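Going back to the answer about name.default, a custom importer could build its address documents roughly as sketched below. This is a minimal sketch: `buildAddressDoc` is a hypothetical helper, and only the name.default, address.number and address.street fields come from this thread. Any other fields the Pelias schema requires are left out.

```javascript
// Hypothetical helper (not part of Pelias): build an address document whose
// name.default holds the complete street address, since /search currently
// queries that column only. address.number and address.street are kept
// separately for future, smarter address queries.
function buildAddressDoc(number, street) {
  return {
    name: {
      default: `${number} ${street}`
    },
    address: {
      number: String(number),
      street: street
    }
  };
}

// Example input taken from the question about "20A Street Name".
const doc = buildAddressDoc('20A', 'Street Name');
console.log(JSON.stringify(doc, null, 2));
```

Keeping the number and street in separate fields costs little at import time and avoids a re-import once address-aware querying lands.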