Added support for accented characters #89

Merged
merged 4 commits into from Jan 10, 2017

Conversation

FrivalszkyP
Contributor

I added an iconv conversion to every matched string so that accented characters are transliterated to their unaccented equivalents before searching. This gives simple multi-language support, similar to how Google handles queries: a search for "perez" also returns results containing "pérez".
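To make the idea concrete, here is a minimal sketch of accent-insensitive matching. It is written in Python rather than the plugin's PHP, and it uses Unicode decomposition instead of iconv, so it illustrates the technique and is not the code from this pull request. The names `fold_accents` and `matches` are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining accent marks removed (e.g. 'é' -> 'e')."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, content: str) -> bool:
    """Accent-insensitive and case-insensitive substring match."""
    return fold_accents(query).casefold() in fold_accents(content).casefold()
```

With this sketch, `matches("perez", "Juan Pérez")` is true, because both the query and the content are folded before they are compared.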

I don't think it is necessary for this to be a configurable option because I can't really think of a situation where this would not come in handy.

Contributor

@flaviocopes flaviocopes left a comment

I haven't tested it yet, but I wonder how this will interact with the fact that Grav can handle locales other than en_US: https://github.com/getgrav/grav/blob/develop/system/src/Grav/Common/Grav.php#L137-L150

@mahagr
Member

mahagr commented Dec 19, 2016

Maybe there's a need to whitelist languages where this is allowed. This only works with languages that use Latin characters.
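The concern about non-Latin scripts can be illustrated with a small sketch. This is a Python analogy, not the plugin's PHP code, and `to_ascii` is a hypothetical helper; the iconv call in the pull request may behave differently (for instance, by emitting placeholder characters instead of dropping letters).

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Fold text to plain ASCII, silently dropping anything with no ASCII form."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


latin_query = to_ascii("p\u00e9rez")  # Latin text keeps its base letters
cyrillic_query = to_ascii("\u043f\u0440\u0438\u0432\u0435\u0442")  # Cyrillic text has no ASCII base letters

# An empty folded query is a substring of every string, so it would match all pages.
matches_everything = cyrillic_query in "any page content"
```

Here the Latin query survives as "perez", while the Cyrillic query folds to an empty string, which is why a search built on ASCII folding is only meaningful for Latin-based alphabets.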

@FrivalszkyP
Contributor Author

Right, right. It should be tested with non-Latin alphabets.

@FrivalszkyP
Contributor Author

I added a config option so that you can add this to your simplesearch.yaml file:

ignore_accented_characters: true

This way we don't have to maintain a whitelist of allowed languages, and the site administrator can decide whether they want this feature or not.

I also refactored the code a little so it is more maintainable. This adds a small amount of overhead, which I think is acceptable.

@flaviocopes
Contributor

Looks good. The only thing left is to add an entry for the option in the blueprints.yaml file, so that it's configurable from the Admin interface.

Added "ignore_accented_characters" so that it can be updated via the admin page.
@FrivalszkyP
Contributor Author

I added the changes. If you like them, please do a squash & merge if possible!

@flaviocopes flaviocopes merged commit 9b6b702 into getgrav:develop Jan 10, 2017
@flaviocopes
Contributor

Merged, thanks!

@A----

A---- commented Jan 25, 2017

The en_US locale has to be installed for this feature to work, or it won't be able to transliterate characters properly: string(5) "p?rez"
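The "p?rez" output above is what a lossy conversion to ASCII looks like when no transliteration rule is available. The sketch below reproduces that symptom in Python and shows one possible guard; it is an analogy under stated assumptions, not the plugin's PHP code, and both function names are hypothetical.

```python
def to_ascii_lossy(text: str) -> str:
    """Convert to ASCII, replacing each unconvertible character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def transliteration_failed(original: str, converted: str) -> bool:
    """Heuristic: the conversion introduced '?' characters that were not in the input."""
    return converted.count("?") > original.count("?")


converted = to_ascii_lossy("p\u00e9rez")  # the precomposed 'é' becomes a single '?'
needs_fallback = transliteration_failed("p\u00e9rez", converted)
```

A search routine could use a check like this to fall back to the original, unconverted string instead of searching for text that contains placeholder characters.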

(Also, this option could be documented in the README.md.)

@flaviocopes
Contributor

Documented

4 participants