Wordending #56

Open
karussell opened this issue May 28, 2014 · 13 comments

Comments

@karussell
Collaborator

Using wordending is kind of a workaround for edge n-gram searches like

berlin erlange

which would match berlinerstraße erlangen but should ideally only match things like 'berlin erlange*'.
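
To make the problem concrete, here is a minimal sketch in plain Python (not photon or Elasticsearch code) of why edge n-grams on every token let `berlin erlange` match `berlinerstraße erlangen`, while treating only the last term as a prefix does not. The function names are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
def edge_ngrams(token, min_len=1):
    """Return every prefix of the token, as an edge n-gram filter would index it."""
    return {token[:i] for i in range(min_len, len(token) + 1)}


def matches_edge_ngram(query, document):
    """Every query term only has to be a prefix of some document token."""
    indexed = set()
    for token in document.lower().split():
        indexed |= edge_ngrams(token)
    return all(term in indexed for term in query.lower().split())


def matches_last_term_prefix(query, document):
    """All terms but the last must match a whole token; the last may be a prefix."""
    doc_tokens = set(document.lower().split())
    terms = query.lower().split()
    if not terms:
        return False
    *complete, last = terms
    return (all(term in doc_tokens for term in complete)
            and any(token.startswith(last) for token in doc_tokens))
```

With these definitions, the edge n-gram variant accepts `berlinerstraße erlangen` for the query `berlin erlange`, whereas the last-term-prefix variant only accepts documents that contain `berlin` as a whole token.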

When this workaround is used - why not avoid edge n-grams altogether, tokenize the query, and do a prefix query for the last term? This would save space and memory at the same quality. The only problem could be performance, but my simple tests on small data don't show any problems there.
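
A rough sketch of what such a query could look like, written as a Python function that builds an Elasticsearch query DSL body. The field name `name` and the function name are hypothetical, not photon's actual mapping, and the `term` clauses assume the indexed tokens are lowercased.

```python
def build_last_term_prefix_query(text, field="name"):
    """Build a bool query: exact term clauses plus a prefix clause for the last term.

    `field` is a hypothetical field name used only for illustration.
    """
    terms = text.lower().split()
    if not terms:
        return {"match_all": {}}
    *complete, last = terms
    # Every completed word must match a whole indexed token.
    must = [{"term": {field: term}} for term in complete]
    # The word still being typed is matched as a prefix.
    must.append({"prefix": {field: last}})
    return {"bool": {"must": must}}
```

The trade-off mentioned above applies here: the index stays small because no n-grams are stored, but the prefix clause is expanded at query time, which is where any performance cost would show up.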

@christophlingg
Member

Yohan told me this is something we should try out to see which option has the best tradeoff between performance and storage size.

@karussell
Collaborator Author

Yes, sure. Maybe there is an even better, less hacky way of doing this. E.g. something like the cross_fields approach while still using edge n-grams, where it would just boost berlin erlanger* more than berlin* erlangen somehow.

@karussell
Collaborator Author

These docs seem to be more current and have a better example

@christophlingg
Member

Btw, the cross_fields approach might make the collector field obsolete. We introduced it to have equal idf for all fields. But I wasn't aware of this feature until now...

@yohanboniface, we could even give different scores to each field, not only distinguish between name and collector. And much more importantly, we wouldn't be forced to copy the default fields into the language-specific collectors each time. This opens the door to multilingual support for all languages in OSM, as we'd save a lot of storage...
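
As a sketch of the idea, the following Python function builds an Elasticsearch `multi_match` body with `type: cross_fields` and a separate boost per field. The field names and boost values passed in are hypothetical examples, not photon's actual mapping, and whether this gives the desired ranking would still need to be tested.

```python
def build_cross_fields_query(text, field_boosts):
    """Build a multi_match query of type cross_fields with per-field boosts.

    `field_boosts` maps hypothetical field names to boost factors.
    """
    # Elasticsearch expresses a per-field boost as "field^boost".
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    return {
        "multi_match": {
            "query": text,
            "type": "cross_fields",
            "fields": fields,
            # Require every query term to appear in at least one of the fields.
            "operator": "and",
        }
    }
```

A call such as `build_cross_fields_query("berlin erlangen", {"name": 3, "city": 2})` would search both fields as if they were one, which is what could replace copying values into a shared collector field.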

@karussell
Collaborator Author

Yes, it's a fairly recent feature, but we'll have to try whether it solves our problem.

@yohanboniface
Collaborator

cross_fields can't work with fuzzy atm.

@yohanboniface
Collaborator

Btw, wordending is not the hottest topic if you have time to spend on search logic. Two things we are on:

On the search logic part, the most up-to-date branch is https://github.com/komoot/photon/tree/positivescoring

Also: add tests! :)

@karussell
Collaborator Author

Probably we also need a mailing list. Should I create a google group or one at openstreetmap?

@karussell
Collaborator Author

Re tests: do you mean creating a Java test suite (master) or adding others? I could go ahead and create the Java test stuff

@yohanboniface
Collaborator

> Probably we also need a mailing list. Should I create a google group or one at openstreetmap?

I'd go for geocoding@openstreetmap.org, to keep the argument open instead of having a mailing list dedicated to photon, then one for pelias, etc.

> Re tests: do you mean creating a Java test suite (master) or adding others? I could go ahead and create the Java test stuff

I was referring to search tests, like those, but all tests are good ;) BTW, Christoph already started on the Java side I think.

@karussell
Collaborator Author

Hmmh, 'geocoding' mainly sends issues. I would prefer a list dedicated to discussion, where nominatim and photon together would be okay, but similar projects like GraphHopper and OSRM have separate lists ;)

> I was referring to search tests, like those, but all tests are good ;) BTW, Christoph already started on the Java side I think.

Ok, we still need some more lightweight test cases in Java I think. I've created a PR for that. See e.g. this

@christophlingg
Member

I like the idea of a mailing list and would go for a photon-specific one, as geocoding is super generic. Do you know who we can approach about setting up a new OSM mailing list?

Great commit, Peter!

@karussell
Collaborator Author

@christophlingg I'll give you the mail via mail ;)


3 participants