Skip to content

Conversation

@mlocati
Copy link
Member

@mlocati mlocati commented Feb 16, 2016

Let's consider this source code:

<div>
    <p><?php __('plain'); ?></p>
    <p><?php __('DATE \a\t TIME'); ?></p>
    <p><?php __("DATE \a\\t TIME"); ?></p>
    <p><?php __("DATE \\a\\t TIME"); ?></p>
    <p><?php __("FIELD\tFIELD"); ?></p>
</div>

Here we have only 3 different strings, but the php extractor finds these strings:

msgid "plain"
msgstr ""

msgid "DATE \\a\\t TIME"
msgstr ""

msgid "DATE \\a\\\\t TIME"
msgstr ""

msgid "DATE \\\\a\\\\t TIME"
msgstr ""

msgid "FIELD\\tFIELD"
msgstr ""

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

Here's the output of xgettext --language=PHP --keyword=__ special-chars.php:

msgid "plain"
msgstr ""

msgid "DATE \\a\\t TIME"
msgstr ""

msgid "FIELD\tFIELD"
msgstr ""

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

I know you may have some reticence about what I'm going to write, but the best approach IMHO would be to replace https://github.com/oscarotero/Gettext/blob/e93f74356280016bd50c41c62cc858c3005791e7/src/Utils/PhpFunctionsScanner.php#L42-L50 with:

$bufferFunctions[0][2][] = eval('return '.$value[1].';');

Contrary to what 99% of PHP developers think, eval is not evil. You only need to know when to use it, and this is the case: $value[1] is for sure a PHP-encoded string (ie T_CONSTANT_ENCAPSED_STRING).

The only alternative I can see is to manually unescape the string, differentiating the behaviour if the string is single quoted (we only need to look for \ and ') or double quoted string (we'll have to look for all the cases reported here).

@oscarotero
Copy link
Member

Yes, you're right: I'm reticent :D

If the string had other evaluable elements, like a variable: "hello $world" or "hello {$object->getWorld()}" they would be executed.

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

So, shall we parse the string char by char?

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

What about something like this:

function decodeString($value)
{
    $result = '';
    if ($value[0] === "'" || strpos($value, '$') === false) {
        $result = eval("return $value;");
    } else {
        // Manual and slow way of parse $value
    }

    return $result;
}

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

Here's the full code that should work:

function decodeString($value)
{
    static $simpleDictionary;
    $result = '';
    if ($value[0] === "'" || strpos($value, '$') === false) {
        $result = eval("return $value;");
    } else {
        $value = substr($value, 1, -1);
        if (!isset($simpleDictionary)) {
            $simpleDecodeDictionary = array(
                '\\' => "\\",
                '$' => '$',
                '"' => '"',
            );
        }
        while (($p = strpos($value, '\\')) !== false) {
            if (!isset($value[$p + 1])) {
                break;
            }
            if ($p > 0) {
                $result .= substr($value, 0, $p);
            }
            $value = substr($value, $p + 1);
            if (isset($simpleDictionary[$value[0]])) {
                $result .= $simpleDictionary[$value[0]];
                $value = substr($value, 1);
            } elseif (preg_match('/^([a-z0-9{}]+)/', $value, $m)) {
                $result .= eval('return "\\'.$m[1].'";');
                $value = substr($value, strlen($m[1]));
            } else {
                $result .= '\\';
            }
        }
        $result .= $value;
    }

    return $result;
}

EDIT: I updated the preg_match, to avoid future unsupported escape sequences (like \u{[0-9a-f]{1,6}} for PHP 7+

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

If it's ok I can update this pull request accordingly

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

Even simpler:

function decodeString($value)
{
    $result = '';
    if ($value[0] === "'" || strpos($value, '$') === false) {
        $result = eval("return @$value;");
    } else {
        $value = substr($value, 1, -1);
        while (($p = strpos($value, '\\')) !== false) {
            if (!isset($value[$p + 1])) {
                break;
            }
            if ($p > 0) {
                $result .= substr($value, 0, $p);
            }
            $value = substr($value, $p + 1);
            if (preg_match('/^([\\$"]|[a-z0-9{}]+)/', $value, $m)) {
                $result .= eval('return "\\'.$m[1].'";');
                $value = substr($value, strlen($m[1]));
            } else {
                $result .= '\\';
            }
        }
        $result .= $value;
    }

    return $result;
}

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

...and without preg_match, for lightning fast execution:

function decodeString($value)
{
    $result = '';
    if ($value[0] === "'" || strpos($value, '$') === false) {
        $result = eval("return @$value;");
    } else {
        $value = substr($value, 1, -1);
        while (($p = strpos($value, '\\')) !== false) {
            if (!isset($value[$p + 1])) {
                break;
            }
            if ($p > 0) {
                $result .= substr($value, 0, $p);
            }
            $value = substr($value, $p + 1);
            $p = strpos($value, '$');
            if ($p === false) {
                $result .= eval('return "\\'.$value.'";');
                $value = '';
                break;
            }
            if ($p === 0) {
                $result .= '$';
                $value = substr($value, 1);
            }
            else {
                $result .= eval('return "\\'.substr($value, 0, $p).'";');
                $value = substr($value, $p);
            }
        }
        $result .= $value;
    }

    return $result;
}

@oscarotero
Copy link
Member

I think it's a bit overcomplicated. Why not use an unique solution for all cases? For example (not tested):

function fix ($value) {
    if (strpos($value, '\\') !== false) {
        if ($value[0] === '"') {
            $value = preg_replace('/[^\\\]\\\([^nrtvefxu0-7\$\\"])/', '\\\\$1', $value);
        } else {
            $value = preg_replace("/[^\\\]\\\([^'])/", '\\\\\1', $value);
        }
    }

    return substr($value, 1, -1);
}

Eval is going to be used in very few cases, so I don't see any value using two solutions for the same problem.

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

think it's a bit overcomplicated. Why not use an unique solution for all cases? For example (not tested):

It's broken: see https://3v4l.org/WqGos

Eval is going to be used in very few cases, so I don't see any value using two solutions for the same problem.

Well, I'd use it for all the strings not enclosed in " or not containing $, and I think they are the 99%...

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

See the results from the approach of mine: https://3v4l.org/tQgip

@oscarotero
Copy link
Member

Ok, it's a bit hard to fix this with regular expressions (at least for me).
I accept your method, but if the string not contain a backslash, avoid to use eval, please. A simple substr is faster.

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

Done... let's see what Trafis says 😉

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

If this pull request will be merged, I have already another fix (the po export removes special chars like "\t" and could mess up multibyte strings)

@oscarotero
Copy link
Member

Ah, ok, thank you. (sorry for the late, I'm currently a bit bussy)

oscarotero added a commit that referenced this pull request Feb 16, 2016
Add test for some PHP special chars
@oscarotero oscarotero merged commit ec19a63 into php-gettext:master Feb 16, 2016
@mlocati mlocati deleted the specia-php-chars branch February 16, 2016 16:53
@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

See #92

@oscarotero
Copy link
Member

Seems like sensiolab does not like eval https://insight.sensiolabs.com/projects/496dc2a6-43be-4046-a283-f8370239dd47/analyses/23 😞

@mlocati
Copy link
Member Author

mlocati commented Feb 16, 2016

I'm open to any valid alternatives... As I said, 99% of developers consider eval=evil, but I really don't understand why...
It's as dangerous as a dB query: if you let dangerous strings to be executed is bad, but it's not a valid reason to don't execute queries at all...

@oscarotero
Copy link
Member

Yes, in this case I agree that it's not dangerous. But having those critical security errors is not nice.
Let's use eval() until we find a better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants