Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EZP-24292: Extract template operator(s) do not work correctly with utf8 strings #1162

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

brookinsconsulting
Copy link
Contributor

Hello,

We write today to share a PR for a problem which was brought up in the share.ez.no forums: http://share.ez.no/forums/developer/extract-returns-garbage-when-result-contains-special-chars

It seems that we found several template operators which are not utf8 string friendly. This PR started with trying to solve the problems in this file as it applies to the extract template operator.

Once our improvements worked well for that use case, we noticed that other operators powered by this file's code were also negatively affected by this classes limitations and we went ahead and cleaned up the rest of the use cases we could find that needed help.

Since these we require mbstring php extension by default we think this should be no problem, that said we went one step further and made it more dynamic if mbstring functions are not available, https://doc.ez.no/display/EZP/Requirements+5.4

We based the dynamic function usage testing for mbstring php extension functions before using them and providing a non-utf8 compatible fall back off the existing (inconsistant) implementation found in lib/eztemplate/classes/eztemplatestringoperator.php

Minor cs improvements were made as well.

This pull request replaces usage of strlen, strpos, substr and strrpos with mb string replacements in the context of the following template operators.

  • extract
  • extract_left
  • extract_right
  • begins_with
  • ends_with
  • insert
  • remove
  • contains

Worth noting that the reverse operator remains utf8 unfriendly since php by default does not yet contain a native mb_strrev function we would need to implement our own, which I have ignored for now.

Issue ticket: https://jira.ez.no/browse/EZP-24292

Please let us know how this finds you!

Cheers,
Brookins Consulting

@rashidul0405
Copy link

Please add this pull request.

@brookinsconsulting
Copy link
Contributor Author

ping @bdunogier

@bdunogier
Copy link
Member

Ping @yannickroger I think you've worked on related problems, have you not ?

@andrerom
Copy link
Contributor

ext-mbstring is a requirement, so actually it could be swapped instead of having to feature detect this. But not a big deal of course.

@brookinsconsulting
Copy link
Contributor Author

Hello @andrerom

Our choice to provide for detection before use of mb-string functions was not an accident or entirely our decision.

The existing related code (default template operators which use mb-string functions) already provided / used this convention (although admittedly it did so inconsistently at best, perhaps because as we know today mb-string is a requirement).

In an effort to retain the best BC and code standards we ensured our changes mimicked this existing code convention.

We think these choices are the right choices at this time considering this PR is for eZ Publish Legacy.

What do you think? Do you think this PR is acceptable to merge?

Cheers,
Brookins Consulting

@joaoinacio
Copy link

Hi, just a small nitpick:

The mbstring function detection is done at compile time, and for non-constant elements the function call code - not value - is stored in the template.
So, AFAICT there is a possibility for fatal errors when using templates compiled on a system with the mbstring extension in a system without it.

Given the above, a nitpick would be to perform the detection at run-time too (or as @andrerom noted, consider mbstring an actual requirement?)

@andrerom
Copy link
Contributor

@joaoinacio as @brookinsconsulting mentions, it is already the case several places in legacy, so better that it stays consistent.

@brookinsconsulting
Copy link
Contributor Author

@andrerom

Thank you for your evaluation and affirmation :)

We too think that consistency is important (especially in legacy).

Please let us know if there is anything more we can do to improve these improvements.

Cheers,
Brookins Consulting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants