Wordending #56
Comments
Yohan told me this is something we should try out to see which option has the best tradeoff between performance and storage size.
Yes, sure. Maybe there is even a better, less hacky way of doing this, e.g. something like the cross_fields approach while still using edge ngrams, where it would just boost
These docs seem to be more current and have a better example
Btw, the cross_fields approach might make the collector field obsolete. We introduced it to have equal IDF for all fields, but I wasn't aware of this feature until now... @yohanboniface, we could even give different scores to each field, not only distinguish between name and collector. And much more importantly, we would no longer be forced to copy the default fields into the language-specific collectors each time. This opens the door to multilingual support for all languages in OSM, as we would save a lot of storage...
Yes, it's a fairly recent feature, but we'll have to test whether it solves our problem.
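To make the cross_fields idea a bit more concrete, here is a minimal sketch of what such a request body could look like, written as a plain Python dict. It assumes Elasticsearch's `multi_match` query with `type: cross_fields` and the `field^boost` syntax; the helper name `cross_fields_query`, the boost values, and the choice of `operator: and` are only illustrative assumptions, not what photon actually does.

```python
def cross_fields_query(text, field_boosts):
    """Build a multi_match request body of type cross_fields.

    field_boosts maps a field name to its boost, e.g. {"name": 2, "collector": 1}.
    The field names and boosts are hypothetical, not taken from photon's mapping.
    """
    # "field^boost" is the per-field boost syntax of multi_match.
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": fields,
                # Assumption: every query term has to appear in at least one field.
                "operator": "and",
            }
        }
    }


body = cross_fields_query("berlin erlangen", {"name": 2, "collector": 1})
```

With per-field boosts like this, name and collector could be scored differently without copying values between fields, but whether the resulting ranking is good enough would still have to be tested.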
Btw, on the search logic part, the more up-to-date branch is https://github.com/komoot/photon/tree/positivescoring Also: add tests! :)
We probably also need a mailing list. Should I create a Google Group or one at OpenStreetMap?
Re tests: do you mean creating a Java test suite (master) or adding other tests? I could create the Java test stuff.
I'd go for geocoding@openstreetmap.org, to keep the topic open instead of having a mailing list dedicated to photon, then one for pelias, etc.
I was referring to search tests, like those, but all tests are good ;) BTW, Christoph already started on the Java side, I think.
Hmm, 'geocoding' mainly sends issues. I would prefer a list dedicated to discussion. One shared by Nominatim and photon would be okay, but similar projects such as GraphHopper and OSRM have separate lists ;)
Ok, I think we still need some more lightweight test cases in Java. I've created a PR for that. See e.g. this
I like the idea of a mailing list and would go for a photon-specific mailing list, as geocoding is super generic. Do you know who we can approach about setting up a new OSM mailing list? Great commit, Peter!
@christophlingg I'll give you the mail via mail ;)
Using wordending is kind of a workaround for edge ngram searches like
berlin erlange
which would match
berlinerstraße erlangen
but ideally should only match things like 'berlin erlange*'. If this workaround is used anyway, why not avoid edge ngrams altogether: tokenize the query and do a prefix query for the last term? This would save disk space and memory at the same quality. The only problem could be performance, but my simple tests on a small data set did not show any problems there.
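To illustrate the difference, here is a small self-contained Python sketch that simulates both matching strategies on whitespace-tokenized, lowercased strings. It is a toy model of the idea only, not Elasticsearch or photon code; the function names are made up for this example.

```python
def edge_ngrams(token):
    """All prefixes of a token, i.e. what an edge ngram filter would index."""
    return {token[:i] for i in range(1, len(token) + 1)}


def matches_edge_ngram(query, document):
    """Edge ngram index: every query token only has to equal
    some prefix of some document token."""
    indexed = set()
    for token in document.lower().split():
        indexed |= edge_ngrams(token)
    query_tokens = query.lower().split()
    return bool(query_tokens) and all(t in indexed for t in query_tokens)


def matches_last_term_prefix(query, document):
    """Proposed alternative: all query tokens except the last must match
    a full document token; only the last one is treated as a prefix."""
    doc_tokens = set(document.lower().split())
    query_tokens = query.lower().split()
    if not query_tokens:
        return False
    *complete, last = query_tokens
    return (all(t in doc_tokens for t in complete)
            and any(d.startswith(last) for d in doc_tokens))


print(matches_edge_ngram("berlin erlange", "berlinerstraße erlangen"))        # True (unwanted)
print(matches_last_term_prefix("berlin erlange", "berlinerstraße erlangen"))  # False
print(matches_last_term_prefix("berlin erlange", "berlin erlangen"))          # True
```

The second strategy indexes only full tokens, which is where the storage saving would come from; the cost moves to query time, because the prefix on the last term has to be expanded there.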