Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for a transliteration callback #2618

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hanoii
Copy link

@hanoii hanoii commented Jun 3, 2016

Update 2016-10-06: This PR, while still open, is not currently being considered for merge. I need this on several projects and I want to reference the built package so that it can be pulled on different instances, so I will try to release packages from time to time. Those can be found on https://github.com/hanoii/chosen/releases. The code difference on those releases is what's referenced on this PR.

I needed a way to transliterate the items and search of what's on the dropdown. This made me do #2523 which was rightfully not accepted. I would still like to push a change that would allow me to do that without maintaining a custom fork of the project. Ir order to do so, I provided a harmless patch that simply adds the option to provide a transliteration callback.

As the previous pull request, this also attempts to fix or help with #536 .

Please double-check that:

  • All changes were made in CoffeeScript files, not JavaScript files.
  • You used Grunt to build the JavaScript files and tested them locally.
  • You've updated both the jQuery and Prototype versions.
  • You haven't manually updated the version number in package.json.
  • If necessary, you've updated the documentation.

@tjschuck
Copy link
Member

tjschuck commented Jun 3, 2016

/cc @pfiller

<td>transliteration_callback</td>
<td><code class="language-javascript">function(value) { return value }</code></td>
<td>
<p>By default, chosen performs no transliteration. This allows you to provide a callback performing your own alterations to the provided string argument. Callback receives a <code class="language-javascript">string</code> and must return a <code class="language-javascript">string</code>.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capitalize "Chosen".

@tjschuck
Copy link
Member

tjschuck commented Jun 3, 2016

This could actually be named even more generally, no? I.e. there's nothing specifically tying this to transliteration. It's actually just an option for a function to arbitrarily modify the search text.

I can't think of an actually good name, but it seems like naming this something like search_text_transformation instead would cover even more cases than just transliteration.

/cc @satchmorun, since he's good at naming things.

@hanoii
Copy link
Author

hanoii commented Jun 3, 2016

yes, I actually thought about that as well when doing this. Somehow I favoured transliteration because I couldn't really think of any other "useful" transformation to do, but what about:

normalisation_callback

I'd think that transliteration is a form of normalisation.

EDIT: One thing that I think this could be used is for fixing spelling mistakes while you type, although it's a pretty edge of a case.

@koenpunt
Copy link
Collaborator

koenpunt commented Jun 6, 2016

It is not really a callback, so I think naming it value_transformer would make more sense IMO.

@hanoii
Copy link
Author

hanoii commented Jun 6, 2016

Well, it is a function that'd be called by chosen, so I think theoretically is a callback, and I like the immediate reasoning that something that is named xxx_callback would be likely a function, but I won't push this too hard, so let me know the naming of your preference and I'll update the PR.

@koenpunt
Copy link
Collaborator

koenpunt commented Jun 6, 2016

A callback is generally called on completion, not in the middle of a transaction. But let's wait for @satchmorun's opinion.

@satchmorun
Copy link
Contributor

I agree with @tjschuck, though i'd be more active tense about it:

transform_search_text

normalize_search_text would also be a-ok by me. transform is a little more general, but normalization is really the only thing this is ever going to get used for.

And since I've been pinged for comment, I'll add that line 159 should perform the regex escaping after the normalization step, so that the transformation function doesn't accidentally break the regexification of the string. So something like: escapedSearchText = this.normalize_search_text(option.search_text).replace(...)

@stof
Copy link
Collaborator

stof commented Jun 8, 2016

Does the replacement of matches actually work fine when using transliteration ? I'm not sure about it

@tjschuck
Copy link
Member

tjschuck commented Jun 8, 2016

@stof What do you mean?

@hanoii Check out @satchmorun's last bit of feedback as well about the function call order.

@hanoii
Copy link
Author

hanoii commented Jun 8, 2016

Ok, attached are the proposed changes, @stof I am using it and just now updated my project with this change and works very good.

I took the liberty to modify the docs stylesheet very slightly so that the look of the table with the <pre> looks better, otherwise it was adding an extra padding.

@tjschuck
Copy link
Member

tjschuck commented Jun 8, 2016

@hanoii If you don't mind, can you back out your stylesheet changes and open a separate PR for those? That should be easy enough to review and merge separately and keep this PR more focused. Thanks!

@hanoii
Copy link
Author

hanoii commented Jun 8, 2016

@tjschuck done. I leave you to fix the styling yourself as you see fit.

@stof
Copy link
Collaborator

stof commented Jun 8, 2016

@hanoii the issue is that you search the startPos based on the normalized version to perform the highlighting replacement, which might be broken in case the normalization callback changes the length of the string (for instance, if you replace ö by oe to do German normalization of accents, the length changes and so positions will be shifted)

@hanoii
Copy link
Author

hanoii commented Jun 8, 2016

@stof Nice call, let me se if there's something to do about that. Any ideas?

@stof
Copy link
Collaborator

stof commented Jun 8, 2016

@hanoii no, I don't have an idea to fix it

@hanoii
Copy link
Author

hanoii commented Jun 9, 2016

Wow, @stof issue really set my brain on fire, if that's a valid english expression, but I couldn't stop thinking about this all day.

See if my latest commit is clear, it's a bit of a tricky logic, it's also probably not very cheap on iterations but I figured it shouldn't be that bad, and I made it so it only happens when an actual normalization happens on the search text.

I'll try to explain it with an example. And I actually gave #1137 a try as well as it was likely trigger some hind of highlighting issue as well.

Let's use the word:

"Speciꜵlwordr&bspeciꜵl"

The normalization converts the character ꜵ to "ao" and removes &.

Let's say we search for "iaolw" which should match "iꜵlw"

So I have two consecutive FORs, the first one start from the beginning of the word, and by removing the first character of the word on each iteration and normalizing it, I test whether the search is in the pattern. Because we are already there, we have a match for sure, all I am trying to find is when it stops matching, that would make the "startpos" of the real search_text regardless of any normalization.

So, some pseudo coding in the format i, str => normalize(str), still match?

Searching for "iaolw" (searchText is normalized as well)

0, "Speciꜵlwordr&bspeciꜵl" => "Speciaolwordrbspeciaol", yes
1, "peciꜵlwordr&bspeciꜵl" => "peciaolwordrbspeciaol", yes
2, "eciꜵlwordr&bspeciꜵl" => "eciaolwordrbspeciaol", yes
3, "ciꜵlwordr&bspeciꜵl" => "ciaolwordrbspeciaol", yes
4, "iꜵlwordr&bspeciꜵl" => "iaolwordrbspeciaol", yes
5, "ꜵlwordr&bspeciꜵl" => "aolwordrbspeciaol", no

Here the first for breaks and we know the start of the highlighting should happen at pos 4 of the real word

Now I do another for with a similar approach to find the length of the highlighting, starting from the previous found position and increasing the length, as soon as it matches the searchText, it means we found our length

So, pseudo coding in the format i, length, str => normalize(str), match?

4,0, "" => "", no
4,1, "i" => "i", no
4,2, "iꜵ" => "iao", no
4,3, "iꜵl" => "iaol", no
4,4, "iꜵlw" => "iaolw", yes

We found our length, 4.

So highlighting goes like this Speciꜵlwordr&bspeciꜵl

I tried a lot of use cases and seems to work very well.

BTW, found a very small bug, completely unrelated to this issue will write an issue but it's very late so just noting it here. If "search_contains" is not set, looking for a letter r in something like "Central African Republic" will match it fine but it will highlight the first r in Central. You can actually reproduce it right now in http://harvesthq.github.io/chosen/.

# logic to highlight results
if this.normalize_search_text(option.search_text) != option.search_text
for i in [0..option.search_text.length - 1] by 1
if new RegExp(this.normalize_search_text(searchText), 'i').test(this.normalize_search_text(option.search_text.substr(i))) is false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is missing the escaping of special regex chars when building the regex

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, you should cache this.normalize_search_text(searchText) (and its escaping, and potentially even the regex itself) in a local variable instead of normalizing again for each iteration. This will be faster

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, we even already have the regex. It is the zregex variable

@hanoii
Copy link
Author

hanoii commented Jun 9, 2016

@stof done.

@stof
Copy link
Collaborator

stof commented Jun 15, 2016

Looks good to me (and btw, I have an issue opened in my work project which would be a 5 min fix if this PR gets merged, so thank you)

@tjschuck
Copy link
Member

@hanoii Can you squash this down to one commit with a good message? Instructions here.

Thanks!

@hanoii
Copy link
Author

hanoii commented Jun 15, 2016

@tjschuck done

@tjschuck
Copy link
Member

@pfiller Want to give this a last look? We've got enough in the current release hopper as well that it's probably worth cutting a release shortly after this one goes in.

return search_text
}</code></pre></td>
<td>
<p>By default, Chosen performs no text normalization. This allows you to provide a callback performing your own alterations to the provided search_text argument, i.e. for transliteration. Callback receives a <code class="language-javascript">string</code> and must return a <code class="language-javascript">string</code>.</p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_text should be in a <code> tag, I think

@hanoii hanoii mentioned this pull request Aug 10, 2016
5 tasks
@hanoii
Copy link
Author

hanoii commented Aug 10, 2016

Namely, something like #2666

@hanoii
Copy link
Author

hanoii commented Aug 10, 2016

So, just following what @koenpunt mentioned on #2666, I see why something like that can be a really a one off, and just for me, so I closed that one.

Warning: very very small rant ahead.

On the other hand, this particular PR is not that heavy in terms on logic, and it does help with other issues mentioned here. I am not sure I am in the same page in rejecting this PR (yes, I am a bit biased) but trying to keep it objective, the PR doesn't introduce that much of heavy logic, and it only kicks in when this new feature (normalisation) is actually being used. This means that it doesn't affect or change the way people are using chosen right now, it only allows for a bit of further customisation, so it feels that's being rejected for "not liking it" than anything else.

That being said, I have a huge appreciation with any open source tool out there and I know you put hundreds if not thousands of hours into something like this, so thanks again.

So if there's really no further consideration for something like this, feel free to close this PR and I'll attempt to keep an up to date fork with matching releases to yours.

Thanks!

@hanoii hanoii force-pushed the transliteration branch 2 times, most recently from 564c58e to fe363ef Compare October 5, 2016 23:00
@hanoii
Copy link
Author

hanoii commented Oct 5, 2016

For what it's worth, I just updated the PR to the recent changes.

@alejoar
Copy link

alejoar commented Nov 21, 2016

Just want to say thank you to @hanoii for keeping it up. Your release works great and we just switched to it in our production server. Cheers!

@hanoii
Copy link
Author

hanoii commented Jun 20, 2017

Not sure why the checks failed, but just updated to 1.7.0, rebased and released a new one with this included.

@mdelorimier
Copy link

Hi,

Is there a directly installable version of you PR without compiling coffee script?

Regards :)

@hanoii
Copy link
Author

hanoii commented Nov 23, 2017

@mdelorimier as explained on the summary, I keep releases on https://github.com/hanoii/chosen/releases. I am not keeping up with chosen so quickly though. Will try to release a new one.

@hanoii
Copy link
Author

hanoii commented Nov 23, 2017

I spent quite a bit of time on this issue again. I kind of refactored everything into the newest code from scratch as rebasing was getting harder. Also somewhere in this PR it was requested I think by @koenpunt that the highlight logic should be moved to a separate function. I reverted that for the time being to make it easier to keep modifications up to date with master.

I also added an example to the main index page along the option documentation.

I still thing it's a nice addition to a great library. @tjschuck any reconsideration for something like this?

@hanoii
Copy link
Author

hanoii commented Nov 23, 2017

@mdelorimier a new release is up

@tjschuck
Copy link
Member

@harvesthq/chosen-developers Any of y'all have any followup thoughts on this? Looks like @stof was on board, but then I paused for a final +1 from @pfiller. We all know that's not happening 😁

@adunkman @satchmorun Any thoughts? Particularly as this pertains to fixing #536, which seems like a pretty valid use case (with the caveat that I'm viewing #536 from the lens of a pretty-much ASCII-only language, so I don't know how real the problem of searching with a for á really is in practice).

@koenpunt
Copy link
Collaborator

Well I’d say we’re missing tests. Also, if possible, a bit more code separation would do no harm; the winnow_results function is getting pretty big..

@hanoii
Copy link
Author

hanoii commented Nov 28, 2017

Tests, I wouldn't mind giving them a shot, probably with some assistance if you are willing, but I'd rather do that if/when this PR is accepted for merging.

On the same line, as mentioned above, I wouldn't mind separating winnow_results, but also rather do that either on a second issue after this gets in, because it really simplifies rebasing while chosen moves on its own.

Also if you checkout this branch and look into the docsupport you'll see a nice example of this working.

@quentindemetz
Copy link

I would love to see this merged as well. French has a lot of accents and it's really useful to be able to search without typing them all in. @tjschuck is there anything I can do to help move this along? It's frustrating to see @hanoii 's solid work just sitting idle.
Merci 🇫🇷

@garstud
Copy link

garstud commented Nov 22, 2022

Hello, I have the same transliteration problem with ChosenJS in version 1.6.2.
I tried to integrate the methods stringEscaped and winnow_results from the Hanoi patch, replacing the original methods winnow_results and get_search_regex!
It works correctly and I haven't, for the moment, lifted any side effects that would cause bugs in ChosenJS.
Yet I am wary of using a Patch from 6 years ago.

  • Do you know if the Hanoi patch causes incompatibilities in ChosenJs v1.6+?
  • is there a reason why the transliteration problem has not been fixed in ChosenJS?

Thanks for your feedback.

@stof
Copy link
Collaborator

stof commented Nov 22, 2022

  • is there a reason why the transliteration problem has not been fixed in ChosenJS?

The reason is given in the first lines of the readme: https://github.com/harvesthq/chosen#readme

@garstud
Copy link

garstud commented Nov 22, 2022

Thank you very much for your answer.
I'm not sure I understood. If it is related to the OpenSource contribution system, then I fully agree and the objective of my comment is precisely to advance the subject through my feedback and oriented questions to federate feedback from other people.
If the problem of transliteration support in ChosenJS is linked to a lack of suitable PR, is there a subject around the Hanoi patch that blocks its integration into a PR?
(With pleasure to bring my contribution on this subject in this beautiful project, if I can)

@koenpunt
Copy link
Collaborator

There's no (active) development on this project; it's in a stable state, and no new features are currently accepted, so also this PR won't be merged any time soon (if ever). The team at Harvest has to decide on the future of Chosen.
That said; if patching your version of Chosen works for you then I'd just continue doing that. I've actually also been using a modified copy for over 5 years in one of my projects.

@garstud
Copy link

garstud commented Nov 22, 2022

Ok, thank you for your explanations and your feedback. In this case, I will surely do like you and maintain a fork of ChosenJS for my specific project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Chosen Progress
Needs Review / Categorization
Development

Successfully merging this pull request may close these issues.

None yet

9 participants