FS#2867, ACL processing may fail with utf-8 characters which include byte 'A0'. #395

Merged
merged 7 commits into from
Oct 28, 2013


Chris--S
Collaborator

More detailed discussion and explanations at https://bugs.dokuwiki.org/index.php?do=details&task_id=2867

Summary.

Under some circumstances (depending on the locale), '\s' may match the bytes 'A0' and '85'. These bytes can also occur as part of multi-byte utf-8 characters that are not whitespace. Conversely, '\S' may fail to match these bytes for the same reason.
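To make the byte overlap concrete, here is a minimal Python sketch (DokuWiki itself is PHP, so this only simulates the effect). It assumes that reading bytes one at a time under a single-byte locale can be approximated by decoding them as Latin-1; the helper name `bytes_seen_as_space` is made up for illustration.

```python
import re

# 'à' (U+00E0) encodes to C3 A0 in UTF-8; 'Å' (U+00C5) encodes to C3 85.
samples = {"à": "à".encode("utf-8"), "Å": "Å".encode("utf-8")}


def bytes_seen_as_space(raw):
    """Return, as hex, the bytes that a byte-by-byte reading treats as whitespace.

    Decoding as Latin-1 maps every byte to the code point of the same value.
    This is an assumption standing in for a single-byte locale, not a
    reproduction of PCRE's locale tables.
    """
    view = raw.decode("latin-1")
    return [f"{ord(ch):02X}" for ch in view if re.match(r"\s", ch)]


for char, raw in samples.items():
    # Prints the character, its UTF-8 bytes, and the bytes matched by \s.
    print(char, raw.hex(" ").upper(), bytes_seen_as_space(raw))
```

Neither character is whitespace, yet in this simulation each one contains a byte that '\s' accepts when the string is examined byte by byte.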

@Chris--S
Collaborator Author

More unit tests to come.

@splitbrain
Collaborator

I wonder if we have this problem in other files as well; confutils comes to mind. Should we grep the sources for \s and \S?

@splitbrain
Collaborator

@Chris--S what's the status here? Can I merge this as is?

@Chris--S
Collaborator Author

Problem fully understood. Test written. So yes, you can now.

The underlying problem: the characters which '/\s/' matches can vary depending on the locale (see setlocale() & LC_CTYPE [1]).

Outside 3rd party libraries, there aren't many uses of '\s'. In some of these uses the subject string either cannot contain non-ascii characters, or the match must happen before any could occur (e.g. config files). That is probably not the most robust of assumptions, though.

Three solutions:

  • Force the locale to one without a problematic LC_CTYPE, e.g. setlocale(LC_ALL,"C"). Unfortunately the locale is a per-process property, so another php script running its own setlocale() can change it, and conversely DokuWiki changing this value may upset other scripts.
  • Use the 'u' flag. Performance may have improved since PHP 5.2.9, but it is still worse than not using 'u'. The PCRE docs note that the character classes and metaclasses all use a less efficient comparison mechanism in UCP mode. [2]
  • Replace '\s' with an explicit class: [ \t] or [ \t\r\n].

[1] - http://php.net/manual/en/function.setlocale.php
[2] - From the PCRE docs: "Matching characters by Unicode property is not fast, because PCRE has to do a multistage table lookup in order to find a character's property."
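The third option can be sketched as follows. This is a Python simulation rather than the actual PHP change: the Latin-1 round trip stands in for byte-wise matching under a single-byte locale, and both the ACL-style line and the helper `split_fields` are hypothetical.

```python
import re


def split_fields(line, pattern):
    """Split the byte-wise view of a UTF-8 line on `pattern`.

    The line is encoded to UTF-8 and re-read as Latin-1 so that each byte is
    matched on its own (an assumption that mimics a single-byte locale).
    Each field is then turned back into UTF-8 text; broken sequences become
    U+FFFD.
    """
    byte_view = line.encode("utf-8").decode("latin-1")
    parts = re.split(pattern, byte_view)
    return [p.encode("latin-1").decode("utf-8", errors="replace") for p in parts]


# Hypothetical ACL-style line: page, group, permission level.
line = "wiki:à_propos\t@ALL\t1"

# With \s the A0 byte inside 'à' counts as a separator and the page name breaks apart.
print(split_fields(line, r"\s+"))

# With an explicit ASCII class only real spaces and tabs separate fields.
print(split_fields(line, r"[ \t]+"))
```

The explicit class keeps the three fields intact, and it does not depend on the locale or on the 'u' flag.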

splitbrain added a commit that referenced this pull request Oct 28, 2013
FS#2867, ACL processing may fail with utf-8 characters which include byte 'A0'.
@splitbrain splitbrain merged commit 1a3aae1 into master Oct 28, 2013
@splitbrain splitbrain deleted the FS#2867 branch October 28, 2013 11:27
splitbrain added a commit that referenced this pull request Apr 9, 2020
2 participants