[Questions] Setting up Pelias for a custom dataset #88

Closed
hkrishna opened this issue May 7, 2015 · 5 comments

Comments

hkrishna commented May 7, 2015

Recently, @flotpk and I have been having a great conversation about setting up Pelias with custom datasets (writing pelias importers, getting the API to work with the new data layers etc) and I thought it would be nice to put it out on github for others to look at and contribute to.

I have boiled it down to a simple question-and-answer format.

Is there any mailing list where I can post questions / problems ?

You could email pelias@mapzen.com or just open an issue at https://github.com/pelias/pelias/issues and add a label called 'question' (GitHub is preferred because we like things to be as open as possible).

Been searching through and can't find any documentation about mapping and importing custom dataset (other than the openstreetmap, geonames, openaddresses, quattroshapes data). Would be great if you could point me to the guide if there is one.

Unfortunately we do not have documentation for writing a custom importer. I have opened an issue and we'll get to it as soon as we can :) It would be great if you could add a comment with your findings and how you got it working.

Is there any roadmap for Pelias release - e.g. first release for the stable / beta release ?

there is definitely a roadmap - everything with a label v1.0.0 in the pelias org is for the stable/beta/v1 release. here is an alternative view of all the github repos in pelias org https://waffle.io/pelias/pelias?milestone=Pelias%20v1.0.0

if I understand correctly, all the new layer / type created must be based on "Pelias schema". Can I simply add a new column by changing schema/mappings/document.js ??

Yes - you can add a custom column to document.js and give it an appropriate type/ partial mapper.

In my custom type, I have inserted address no + street name into the address.name column, and address no into address.number and street name into address.street separately... When I perform a search through the Pelias API /search, does Pelias search against the name column or ??

Right now, /search is done against the name.default column only, so as long as you set that column to the complete street address you should be good to go. The address object was added to the schema to enable the API to query smartly (fall back on the street if it doesn't find a house number, street interpolation, etc.). However, this is a work in progress and will be implemented once we have a good address parser in place.
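As a rough illustration of the point above, here is a minimal sketch of shaping a custom record so that name.default carries the complete street address. The input field names (houseNumber, street) and the helper toDocument are assumptions made up for this example; they are not part of any Pelias importer.

```javascript
// Hypothetical source record; these field names are illustrative only.
const record = { houseNumber: '20', street: 'Example Street' };

// Build a document whose name.default holds the full street address,
// since /search currently matches against name.default only.
// The address object is filled separately for future address-aware queries.
function toDocument(rec) {
  const fullAddress = `${rec.houseNumber} ${rec.street}`.trim();
  return {
    name: { default: fullAddress },
    address: {
      name: fullAddress,
      number: rec.houseNumber,
      street: rec.street
    }
  };
}

const doc = toDocument(record);
console.log(doc.name.default);
```

With a document shaped like this, a /search for the house number plus street name has the whole string available in the one column that is actually queried.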

wish to understand how the /search results were generated or sorted - based on what criteria ??

Currently we have a bunch of Groovy scripts that take popularity, population, category weights etc. into account - you can see the order in which they affect the sorting in the API here


hkrishna commented May 7, 2015

Questions related to /suggest API:

I have managed to load a few custom layers - one POI layer (landmarks) and a number of address layers (suburbs, addresses, etc). My /search API returns results which are quite satisfactory:

_/search?input=east perth_
=> returns a list of results from 2 layers (1 from suburbs and 9 from addresses).

However, when I perform /suggest, the result is always based on the landmarks layer, e.g.:

_/suggest?input=east perth&lat=-32.117534&lon=115.932385_
=> this will return 7 results from landmarks layer

_/suggest?input=east%20perth&lat=-32.117534&lon=115.932385&layers=suburbs_
=> this will return one result

_/suggest?input=east%20perth&lat=-32.117534&lon=115.932385&layers=addresses_
=> this will return zero results

Based on my study on mapzen demo web site : https://pelias.mapzen.com/suggest?bbox=-17.518344187852207,143.1298828125,-37.0551771066608,89.736328125&input=east+pe&lat=-27.722&lon=116.433&size=10&zoom=5
=> results are returned from the local_admin and geoname layers

My questions: I believe /suggest should return results from more than one layer, and I just saw the demo site do that. But why can't I get results from multiple layers in my Pelias instance? Are there any configuration/setting files that I need to fine tune or change to get it to work?

Answer

You mentioned that you added these custom layers to https://github.com/pelias/api/blob/master/helper/layers.js Can you tell me where you added these layers in that file or share lines 15-17?

  layers = expand_aliases('poi',   layers, ['geoname','osmnode','osmway']);
  layers = expand_aliases('admin', layers, ['admin0','admin1','admin2','neighborhood','locality','local_admin']);
  layers = expand_aliases('address', layers, ['osmaddress','openaddresses']);

Did you keep the aliases 'poi', 'admin', 'address' ??

The reason I ask is that there is another file/abstraction called querymixer.json that you need to be aware of. https://github.com/pelias/api/blob/master/helper/queryMixer.json

It's in querymixer.json that we define the composition rules for /suggest. For example -

{  
  "suggest": [
    {
      "layers": ["poi", "admin", "address"],
      "precision": [5, 3, 1]
    },
    {
      "layers": ["admin"],
      "precision": []
    },
    {
      "layers": ["poi", "admin", "address"],
      "precision": [3],
      "fuzzy": "AUTO"
    }
  ],
  ...
}

In the above code snippet, we define multiple suggesters that target various alias layers (poi, admin, address) - You could change these to your individual layers as well (for ex: landmarks, suburbs, address etc). When you pass a layer param - this gets overridden with the given layer name.

    {     
      "layers": ["address"],
      "precision": [5, 3, 1]
    }

In the above code snippet, only the address layer (or alias layer) is used, with precision levels 5, 3 and 1. Precision level is Elasticsearch's way of representing a certain geohash: the higher the precision level, the higher the geohash precision, and so the closer the match is to the lat/lon passed (see the ES documentation). The reason we do query mixing manually is an Elasticsearch bug that is being addressed here, but the fix will only land in Elasticsearch's 2.0.0 release (ETA one month).
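To make the precision levels a little more concrete, here is a small sketch of the standard geohash encoding, written from the public algorithm and not taken from Pelias or Elasticsearch. It only shows that a longer hash is a refinement of a shorter one (a precision-3 cell contains the precision-5 cell for the same point); how Elasticsearch uses these cells internally is not covered here.

```javascript
// Standard geohash base32 alphabet (no a, i, l, o).
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Encode a lat/lon pair into a geohash of the given length.
// Bits alternate between longitude and latitude, starting with longitude;
// every 5 bits become one base32 character.
function geohash(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = '';
  let bits = 0;
  let value = 0;
  let useLon = true;
  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const coord = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (coord >= mid) {
      value = value * 2 + 1; // upper half of the current range
      range[0] = mid;
    } else {
      value = value * 2;     // lower half of the current range
      range[1] = mid;
    }
    useLon = !useLon;
    bits += 1;
    if (bits === 5) {
      hash += BASE32[value];
      bits = 0;
      value = 0;
    }
  }
  return hash;
}

// Cells for the lat/lon used in the /suggest requests in this thread.
const cells = [5, 3, 1].map(p => geohash(-31.946284, 115.845469, p));
console.log(cells);
```

Each shorter hash is a prefix of the longer one, which is why listing precisions 5, 3 and 1 amounts to looking in progressively larger areas around the given point.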

For now, /suggest does not take the bounding box (bbox) into account. This will be fixed with the new ES release. We have an open ticket for this as well.

As far as scoring goes, there is a weights.js file as part of the suggester-pipeline (so if you change it, you will have to re-import). We assign weights based on the layer to which the doc belongs. Let's say you add your custom layers with a weight in the weights file and re-import: you should start to see scores corresponding to those weights. Scoring differs between /suggest and /search. For /suggest it is simply the weights we define, whereas for /search, _score is calculated based on text matching and how frequently the search terms appear in the whole index (tf-idf: term frequency, inverse document frequency).
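For readers unfamiliar with tf-idf, the following is a deliberately simplified textbook version, not the formula Elasticsearch/Lucene actually uses for _score. The function name tfIdf, the smoothing term and the tiny corpus are assumptions for illustration; the only point is that a term that is rare across the index contributes more than a common one.

```javascript
// Split a document into lowercase whitespace-separated tokens.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Simplified tf-idf:
//   tf  = occurrences of the term in the document / document length
//   idf = ln(N / (1 + number of documents containing the term)) + 1
function tfIdf(term, doc, corpus) {
  const tokens = tokenize(doc);
  const tf = tokens.filter(t => t === term).length / tokens.length;
  const docsWithTerm = corpus.filter(d => tokenize(d).includes(term)).length;
  const idf = Math.log(corpus.length / (1 + docsWithTerm)) + 1;
  return tf * idf;
}

// Toy corpus built from names that appear in this thread.
const corpus = [
  'east perth',
  'east perth backpackers',
  'east perth football',
  'east perth cp'
];

// "backpackers" is rare in the corpus, "east" is in every document.
const rare = tfIdf('backpackers', corpus[1], corpus);
const common = tfIdf('east', corpus[1], corpus);
console.log(rare > common);
```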


hkrishna commented May 7, 2015

Follow up

The custom layers added in layers.js are as below:

layers = expand_aliases('poi',   layers, ['geoname','osmnode','osmway']);
layers = expand_aliases('admin', layers, ['admin0','admin1','admin2','neighborhood','locality','local_admin']);
layers = expand_aliases('address', layers, ['landmarks','suburbs','addresses','stops']);

I had previously removed all the layers in the "poi" alias and left it empty as ('poi, layers, [' ']), but decided to leave it intact. Question: I guess I need to keep the 'poi' and 'admin' aliases even though I don't use them??

I left the aliases above intact and made no changes to querymixer.json. I did try setting different layer names there, but that didn't change the /suggest results much.

What makes the difference is the weights. Following your suggestion, I re-imported all the data with the same weights, which significantly improved the /suggest results. However, with "East Perth" I noticed that I still didn't get any result from the "suburbs" layer, so I gave "suburbs" a higher weight than all the other layers. With that I get the following results:

_/suggest?input=east%20perth&lat=-31.946284&lon=115.845469_

{  
   "type":"FeatureCollection",
   "features":[  
      {  
         "type":"Feature",
         "properties":{  
            "id":"218",
            "layer":"suburbs",
            "name":"EAST PERTH",
            "admin0":"Western Australia",
            "admin1":"Perth",
            "admin2":"EAST PERTH",
            "text":"EAST PERTH, Perth"
         },
         "geometry":{  
            "type":"Point",
            "coordinates":[  
               115.876925,
               -31.956572
            ]
         }
      },
      {  
         "type":"Feature",
         "properties":{  
            "id":"9762",
            "layer":"landmarks",
            "name":"EAST PERTH BACKPACKERS",
            "admin0":"Western Australia",
            "admin1":"Perth",
            "admin2":"EAST PERTH BACKPACKERS",
            "text":"EAST PERTH BACKPACKERS, Perth"
         },
         "geometry":{  
            "type":"Point",
            "coordinates":[  
               115.872754,
               -31.958912
            ]
         }
      },
      {  
         "type":"Feature",
         "properties":{  
            "id":"9880",
            "layer":"landmarks",
            "name":"EAST PERTH FOOTBALL",
            "admin0":"Western Australia",
            "admin1":"Perth",
            "admin2":"EAST PERTH FOOTBALL",
            "text":"EAST PERTH FOOTBALL, Perth"
         },
         "geometry":{  
            "type":"Point",
            "coordinates":[  
               115.842871,
               -31.935228
            ]
         }
      },
      {  
         "type":"Feature",
         "properties":{  
            "id":"9435",
            "layer":"landmarks",
            "name":"EAST PERTH CP",
            "admin0":"Western Australia",
            "admin1":"Perth",
            "admin2":"EAST PERTH CP",
            "text":"EAST PERTH CP, Perth"
         },
         "geometry":{  
            "type":"Point",
            "coordinates":[  
               115.87493,
               -31.961996
            ]
         }
      }
   ],
   "bbox":[  
      115.842871,
      -31.961996,
      115.876925,
      -31.935228
   ],
   "date":1430899935731
}
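One observation about the response above: its bbox matches the [minLon, minLat, maxLon, maxLat] envelope of the four returned points. The sketch below recomputes it from the coordinates in the response. The helper name computeBbox is made up for this example, and this says nothing about how the API itself derives the value.

```javascript
// Coordinates copied from the four features in the response ([lon, lat]).
const features = [
  { geometry: { type: 'Point', coordinates: [115.876925, -31.956572] } },
  { geometry: { type: 'Point', coordinates: [115.872754, -31.958912] } },
  { geometry: { type: 'Point', coordinates: [115.842871, -31.935228] } },
  { geometry: { type: 'Point', coordinates: [115.87493, -31.961996] } }
];

// Envelope of a list of Point features as [minLon, minLat, maxLon, maxLat].
function computeBbox(points) {
  const lons = points.map(f => f.geometry.coordinates[0]);
  const lats = points.map(f => f.geometry.coordinates[1]);
  return [
    Math.min(...lons),
    Math.min(...lats),
    Math.max(...lons),
    Math.max(...lats)
  ];
}

console.log(computeBbox(features));
// → [ 115.842871, -31.961996, 115.876925, -31.935228 ]
```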

I guess "weight" is the only thing that I need to fine tune to get the best results for the time being.

Question: Is https://github.com/pelias/api/blob/master/helper/category_weights.js related to the category values, if they exist in the dataset?

Answer

If you only care about your datasets - I'd suggest you go with the following layer configuration

layers = expand_aliases('poi',   layers, ['landmarks']);
layers = expand_aliases('admin', layers, ['suburbs']);
layers = expand_aliases('address', layers, ['addresses','stops']);

poi stands for 'points of interest' and based on the name 'landmark' I'm assuming it can be considered as a poi. admin is short for administrative boundaries (think city, state, province, country etc) - even suburbs.
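To show what this alias configuration means in practice, here is a hypothetical re-creation of alias expansion. It is not the actual expand_aliases from helper/layers.js, whose implementation is not shown in this thread; the names expandAlias and resolveLayers are assumptions, and the sketch simply replaces an alias in the requested layer list with the concrete layers it stands for.

```javascript
// Replace `alias` in the requested layers with its concrete layers.
// If the alias was not requested, the list is returned unchanged.
function expandAlias(alias, layers, expansion) {
  if (!layers.includes(alias)) {
    return layers;
  }
  const withoutAlias = layers.filter(layer => layer !== alias);
  // A Set removes duplicates while keeping first-seen order.
  return [...new Set(withoutAlias.concat(expansion))];
}

// Apply the three aliases from the configuration suggested above.
function resolveLayers(requested) {
  let layers = requested.slice();
  layers = expandAlias('poi', layers, ['landmarks']);
  layers = expandAlias('admin', layers, ['suburbs']);
  layers = expandAlias('address', layers, ['addresses', 'stops']);
  return layers;
}

console.log(resolveLayers(['poi', 'address']));
// → [ 'landmarks', 'addresses', 'stops' ]
```

Under this reading, a request for a concrete layer such as suburbs passes through untouched, while a request for admin resolves to the same layer.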

Querymixer basically looks at querymixer.json which looks like the following for /suggest

{  
  "suggest": [
    {
      "layers": ["poi", "admin", "address"],
      "precision": [5, 3, 1]
    },
    {
      "layers": ["admin"],
      "precision": []
    },
    {
      "layers": ["poi", "admin", "address"],
      "precision": [3],
      "fuzzy": "AUTO"
    }
  ],
  ...
}

You could also mess with this file and make it look like the following and see if you get better results..

{  
  "suggest": [
    {
      "layers": ["landmarks", "suburbs", "address", "stops"],
      "precision": [5, 3, 1]
    },
    {
      "layers": ["suburbs"],
      "precision": []
    },
    {
      "layers": ["landmarks", "suburbs", "address", "stops"],
      "precision": [3],
      "fuzzy": "AUTO"
    }
  ],
  ...
}
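The earlier answer mentions that passing a layers parameter overrides the layers named in the mixer rules. A minimal sketch of that behaviour, assuming the override applies to every rule and that the parameter is a comma-separated string, could look like this. The function applyLayerParam is hypothetical and only the "suggest" rules from the file above are reproduced.

```javascript
// The "suggest" rules from the customised querymixer.json above.
const suggestRules = [
  { layers: ['landmarks', 'suburbs', 'address', 'stops'], precision: [5, 3, 1] },
  { layers: ['suburbs'], precision: [] },
  { layers: ['landmarks', 'suburbs', 'address', 'stops'], precision: [3], fuzzy: 'AUTO' }
];

// Assumption: when a layers parameter is present, it replaces the layers of
// each rule; precision and fuzzy settings are left as they are.
function applyLayerParam(rules, layersParam) {
  if (!layersParam) {
    return rules;
  }
  const layers = layersParam.split(',').map(s => s.trim()).filter(Boolean);
  return rules.map(rule => Object.assign({}, rule, { layers }));
}

// e.g. /suggest?input=east%20perth&layers=suburbs
const overridden = applyLayerParam(suggestRules, 'suburbs');
console.log(overridden.map(rule => rule.layers));
```

Read this way, a request with layers=addresses can only ever return documents from that one layer, whatever the mixer file lists.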

weights could be used as a sorting mechanism in case you have many matches and you want to boost results from a certain dataset (for ex: suburbs) higher than another dataset say landmarks - you would set the weights like so..

suburbs: 10
landmarks: 5
...
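As an illustration of weights acting as a sorting mechanism, the sketch below orders a few matches by the weight of their layer. In Pelias the weights are applied at import time through the suggester-pipeline, so this query-time sort is only a stand-in to show the effect; sortByLayerWeight and the default weight of 0 for unlisted layers are assumptions.

```javascript
// Layer weights as in the example above.
const layerWeights = { suburbs: 10, landmarks: 5 };

// Sort matches so that higher-weighted layers come first.
// Layers without a configured weight are treated as 0 (an assumption).
function sortByLayerWeight(results, weights) {
  return results
    .slice() // do not mutate the caller's array
    .sort((a, b) => (weights[b.layer] || 0) - (weights[a.layer] || 0));
}

const matches = [
  { name: 'EAST PERTH BACKPACKERS', layer: 'landmarks' },
  { name: 'EAST PERTH FOOTBALL', layer: 'landmarks' },
  { name: 'EAST PERTH', layer: 'suburbs' }
];

console.log(sortByLayerWeight(matches, layerWeights).map(m => m.name));
// → [ 'EAST PERTH', 'EAST PERTH BACKPACKERS', 'EAST PERTH FOOTBALL' ]
```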

And yes, category weights is used if you have category values (for ex: https://github.com/pelias/openstreetmap/blob/master/config/category_map.js)


flotpk commented May 8, 2015

Thanks @hkrishna for putting this page up. :)

I have tried your suggestion of making changes to querymixer, and the results improved accordingly. With the weights combination (all layers having similar weights and only suburbs higher), the result from the suburbs layer now comes out on top every time.

Thanks for all the help. :)


hkrishna commented May 8, 2015

Thank you @flotpk for asking all the right questions :) Your feedback is really valuable for us. Keep us posted with your thoughts, concerns and suggestions that could improve Pelias as a product.

hkrishna removed the question label May 8, 2015
hkrishna changed the title from "[FAQ] About all things Pelias" to "[Questions] Setting up Pelias for a custom dataset" May 8, 2015

flotpk commented Jul 16, 2015

Hi @hkrishna, I have a few more questions regarding the settings in queryMixer.json.

{  
  "suggest": [
    {
      "layers": ["landmarks", "suburbs", "address", "stops"],
      "precision": [5, 3, 1]
    },
    {
      "layers": ["suburbs"],
      "precision": []
    },
    {
      "layers": ["landmarks", "suburbs", "address", "stops"],
      "precision": [3],
      "fuzzy": "AUTO"
    }
  ],
  ...
}

Looking at the above setting:
(a) I noticed that the order of precision, either 5,3,1 or 1,3,5, will return the results in a different order. Would you be able to explain how the order is used in searching or displaying the results?

(b) What's the purpose of using "precision": [] ?

(c) What's the purpose of the last section with precision = 3 and fuzzy = AUTO ?

Can I fine tune the setting here if I would like /suggest API to return result for the following case:
e.g.
input=20 Street Name
output=20 Street Name, 205 Street Name

e.g. input=20A Street Name
output= 205 Street Name

Based on the above, can I fine tune the setting in order for /suggest to return 20 Street Name when I enter 20A Street Name ??
