FS#2867, ACL processing may fail with utf-8 characters which include byte 'A0'. #395

Chris--S · 2013-10-21T22:39:15Z

More detailed discussion and explanations at https://bugs.dokuwiki.org/index.php?do=details&task_id=2867

Summary.

Under some circumstances, '\s' may match the bytes 'A0' and '85'. These bytes can occur inside non-space UTF-8 characters, so a match can land in the middle of a multi-byte character. Conversely, '\S' may fail to match these bytes for the same reason.
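To make the summary concrete, here is a minimal sketch listing the UTF-8 bytes of a few non-space characters. The sample characters and the helper name containsAmbiguousByte() are chosen for illustration only and are not taken from the patch.

```php
<?php
// Illustrative helper (hypothetical name): true if the string contains
// byte 0xA0 or 0x85. The bytes are listed explicitly and no 'u' flag is
// used, so the check works on raw bytes and does not rely on \s.
function containsAmbiguousByte($s)
{
    return preg_match('/[\xA0\x85]/', $s) === 1;
}

// Sample strings written as raw UTF-8 bytes (assumed examples).
$samples = array(
    'a-grave (U+00E0)'  => "\xC3\xA0",
    'a-ogonek (U+0105)' => "\xC4\x85",
    'dagger (U+2020)'   => "\xE2\x80\xA0",
    'plain ASCII'       => "admin",
);

foreach ($samples as $label => $bytes) {
    printf(
        "%-18s %-8s %s\n",
        $label,
        strtoupper(bin2hex($bytes)),
        containsAmbiguousByte($bytes) ? 'contains A0/85' : 'clean'
    );
}
```

None of the first three samples is a whitespace character, yet each carries one of the two bytes, which is why a byte-wise '\s' that accepts 'A0' or '85' can cut a user name apart.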

Chris--S · 2013-10-21T22:39:37Z

More unit tests to come.

splitbrain · 2013-10-22T07:02:07Z

I wonder if we have this problem in other files as well; confutils comes to mind. Should we grep the sources for \s and \S?

splitbrain · 2013-10-25T06:21:51Z

@Chris--S what's the status here? Can I merge this as is?

Chris--S · 2013-10-25T12:55:19Z

Problem fully understood. Test written. So yes, you can now.

Underlying problem: the set of characters that '/\s/' matches can vary depending on the locale (see setlocale() and LC_CTYPE [1]).
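A rough way to probe this on a given host is sketched below. The function name whitespaceMatchesA0() and the candidate locale names are assumptions for illustration; whether a locale is installed, and whether '\s' then matches 'A0', depends on the platform, so no particular output is implied.

```php
<?php
// Sketch: does byte-mode \s match 0xA0 under a given LC_CTYPE locale?
function whitespaceMatchesA0($locale)
{
    // Passing "0" only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() returns false when the requested locale is not available.
    $switched = setlocale(LC_CTYPE, $locale);
    $matches  = ($switched !== false) && preg_match('/\s/', "\xA0") === 1;

    // Always restore: the locale is shared by the whole process.
    setlocale(LC_CTYPE, $previous);

    return $matches;
}

// Hypothetical candidates; which ones exist depends on the host system.
foreach (array('C', 'en_US.ISO-8859-1', 'de_DE.ISO-8859-1') as $candidate) {
    printf(
        "%-18s %s\n",
        $candidate,
        whitespaceMatchesA0($candidate) ? '\s matches A0' : 'no match (or locale unavailable)'
    );
}
```

Restoring the previous value matters because the locale is per process, which is the same concern raised for the setlocale() workaround in this thread.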

Outside 3rd-party libraries, there aren't many uses of '\s'. In some of them the subject string cannot contain non-ASCII characters, or the match must happen before any could occur (e.g. config files). That is probably not the most robust of assumptions, though.

Three solutions:

1. Force the locale to remove any problematic LC_CTYPE, e.g. setlocale(LC_ALL,"C");. Unfortunately the locale is a per-process property, so it can be changed by another PHP script running its own setlocale(), and conversely DokuWiki changing this value may upset other scripts.
2. Use the 'u' flag. Performance may have improved since PHP 5.2.9, but it is still worse than not using 'u'. The character classes and metaclasses are all noted in the PCRE docs as using a less efficient comparison mechanism when in UCP mode [2].
3. Replace '\s' with an explicit character class: [ \t] or [ \t\r\n].
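The second and third options can be compared on a small example. This is only a sketch: splitAclLine() is a hypothetical helper, and the tab-separated sample line merely imitates the page/user/permission layout of an ACL entry, it is not copied from DokuWiki.

```php
<?php
// Hypothetical helper: split an ACL-style line into fields on spaces and
// tabs only. The explicit class never matches bytes >= 0x80, so the result
// does not depend on the process locale.
function splitAclLine($line)
{
    return preg_split('/[ \t]+/', trim($line, " \t\r\n"));
}

// Assumed sample: the user name ends in U+00E0, encoded as bytes C3 A0.
$line = "wiki:start\tren\xC3\xA0\t8\n";

// Option 3: explicit character class.
$fields = splitAclLine($line);

// Option 2: the 'u' flag makes PCRE treat pattern and subject as UTF-8,
// so \s is tested against whole characters rather than single bytes.
$fieldsUtf8 = preg_split('/\s+/u', trim($line, " \t\r\n"));

// Print both results as hex so the multi-byte character stays visible.
print_r(array_map('bin2hex', $fields));
print_r(array_map('bin2hex', $fieldsUtf8));
```

The explicit class avoids both the locale dependency and the cost of UTF-8 mode, at the price of not treating other whitespace characters as separators.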

[1] - http://php.net/manual/en/function.setlocale.php
[2] - PCRE docs: "Matching characters by Unicode property is not fast, because PCRE has to do a multistage table lookup in order to find a character's property."

Chris--S added 4 commits October 19, 2013 18:24

add tests for usernames with 2 & 3 byte utf8 characters

698e7df

Merge branch 'master' into FS#2867

4b94edc

Merge branch 'master' into FS#2867

fa457f5

replace \s, \S with [ \t], [^ \t] in regexs used with acls

21c3090

unittests for auth_loadACL

1f6e92f

Chris--S added 2 commits October 25, 2013 14:42

skip FS#2867 test if \s doesn't match \xA0 after attempting to change the locale

0113757

ensure locale is set back to the original value

30eae85

splitbrain added a commit that referenced this pull request Oct 28, 2013

Merge pull request #395 from splitbrain/FS#2867

1a3aae1

FS#2867, ACL processing may fail with utf-8 characters which include byte 'A0'.

splitbrain merged commit 1a3aae1 into master Oct 28, 2013

splitbrain deleted the FS#2867 branch October 28, 2013 11:27

splitbrain added a commit that referenced this pull request Apr 9, 2020

Merge pull request #395 from cosmocode/fixNotices

88c5033

Fix PHP notices
