New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EZP-24292: Extract template operator(s) do not work correctly with utf8 strings #1162

Open
wants to merge 1 commit into
base: master
from

Conversation

5 participants
@brookinsconsulting
Copy link
Contributor

brookinsconsulting commented Apr 22, 2015

Hello,

We write today to share a PR for a problem which was brought up in the share.ez.no forums: http://share.ez.no/forums/developer/extract-returns-garbage-when-result-contains-special-chars

It seems that we found several template operators which are not utf8 string friendly. This PR started with trying to solve the problems in this file as it applies to the extract template operator.

Once our improvements worked well for that use case, we noticed that other operators powered by this file's code were also negatively affected by this classes limitations and we went ahead and cleaned up the rest of the use cases we could find that needed help.

Since these we require mbstring php extension by default we think this should be no problem, that said we went one step further and made it more dynamic if mbstring functions are not available, https://doc.ez.no/display/EZP/Requirements+5.4

We based the dynamic function usage testing for mbstring php extension functions before using them and providing a non-utf8 compatible fall back off the existing (inconsistant) implementation found in lib/eztemplate/classes/eztemplatestringoperator.php

Minor cs improvements were made as well.

This pull request replaces usage of strlen, strpos, substr and strrpos with mb string replacements in the context of the following template operators.

  • extract
  • extract_left
  • extract_right
  • begins_with
  • ends_with
  • insert
  • remove
  • contains

Worth noting that the reverse operator remains utf8 unfriendly since php by default does not yet contain a native mb_strrev function we would need to implement our own, which I have ignored for now.

Issue ticket: https://jira.ez.no/browse/EZP-24292

Please let us know how this finds you!

Cheers,
Brookins Consulting

@rashidul0405

This comment has been minimized.

Copy link

rashidul0405 commented Apr 27, 2015

Please add this pull request.

@brookinsconsulting

This comment has been minimized.

Copy link
Contributor Author

brookinsconsulting commented May 5, 2015

ping @bdunogier

@bdunogier

This comment has been minimized.

Copy link
Member

bdunogier commented Jun 12, 2015

Ping @yannickroger I think you've worked on related problems, have you not ?

@andrerom

This comment has been minimized.

Copy link
Member

andrerom commented Jun 12, 2015

ext-mbstring is a requirement, so actually it could be swapped instead of having to feature detect this. But not a big deal of course.

@brookinsconsulting

This comment has been minimized.

Copy link
Contributor Author

brookinsconsulting commented Jun 13, 2015

Hello @andrerom

Our choice to provide for detection before use of mb-string functions was not an accident or entirely our decision.

The existing related code (default template operators which use mb-string functions) already provided / used this convention (although admittedly it did so inconsistently at best, perhaps because as we know today mb-string is a requirement).

In an effort to retain the best BC and code standards we ensured our changes mimicked this existing code convention.

We think these choices are the right choices at this time considering this PR is for eZ Publish Legacy.

What do you think? Do you think this PR is acceptable to merge?

Cheers,
Brookins Consulting

@joaoinacio

This comment has been minimized.

Copy link
Contributor

joaoinacio commented Jun 15, 2015

Hi, just a small nitpick:

The mbstring function detection is done at compile time, and for non-constant elements the function call code - not value - is stored in the template.
So, AFAICT there is a possibility for fatal errors when using templates compiled on a system with the mbstring extension in a system without it.

Given the above, a nitpick would be to perform the detection at run-time too (or as @andrerom noted, consider mbstring an actual requirement?)

@andrerom

This comment has been minimized.

Copy link
Member

andrerom commented Jun 16, 2015

@joaoinacio as @brookinsconsulting mentions, it is already the case several places in legacy, so better that it stays consistent.

@brookinsconsulting

This comment has been minimized.

Copy link
Contributor Author

brookinsconsulting commented Jun 18, 2015

@andrerom

Thank you for your evaluation and affirmation :)

We too think that consistency is important (especially in legacy).

Please let us know if there is anything more we can do to improve these improvements.

Cheers,
Brookins Consulting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment