Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

!!! FEATURE: Arrays::getValueByPath with escaped dot #1903

Draft
wants to merge 5 commits into
base: 8.2
Choose a base branch
from

Conversation

gerdemann
Copy link
Contributor

It is now possible to escape the dot in the path of Arrays::getValueByPath.
With this you get array keys, which have a dot.

Example:

$foo = ['bar.baz' => 'value'];
$path = 'bar\.baz';

What I did
It is now possible to escape the dot in the path of Arrays::getValueByPath.

How I did it
I have adapted the method Arrays::getValueByPath

How to verify it
I wrote a test for it.

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch

@gerdemann gerdemann force-pushed the getvaluebypath-with-escaped-dot branch from e48558a to 7e024b5 Compare January 31, 2020 07:49
@albe albe changed the title Arrays::getValueByPath with escaped dot FEATURE: Arrays::getValueByPath with escaped dot Jan 31, 2020
@albe
Copy link
Member

albe commented Jan 31, 2020

Technically that's a feature, so should go into next release. But then again, escaping a dot might be an expected enough thing that we could relabel it as "BUGFIX"...

Anyway, one thing I'm unsure about:
What if the \ is an expected part of the path, i.e.

$foo = ['bar\\' => ['baz' => 'value']];

I know, looks weird, but assume the key comes from a path traversal and stores directory names on a windows machine... (whole lot of edgy cases).
I think in that case this could be breaky, but more importantly, it would need an "escape escape" hatch, which this doesn't cover yet.

@gerdemann
Copy link
Contributor Author

@albe I have adapted the regular expression so that your example can be escaped. This makes it a breaking change, I think.
I am not a regex expert, if someone has a better solution I would be happy.

@gerdemann gerdemann force-pushed the getvaluebypath-with-escaped-dot branch from 7e024b5 to 7795c5f Compare January 31, 2020 09:21
@kitsunet
Copy link
Member

Why would it be expected? I don't even know why it should be possible in the first place?

This is an application specific syntax expression (aka path syntax) why does it need escaping support?

@gerdemann
Copy link
Contributor Author

With this change it is possible to use the command 'configuration:show' also for e.g. NodeTypes.

./flow configuration:show --type NodeTypes --path Vendor\\Site:My\\NodeType

@kitsunet
Copy link
Member

./flow configuration:show --type NodeTypes --path "Vendor.Site:My.NodeType"

should do fine?

@kitsunet
Copy link
Member

kitsunet commented Jan 31, 2020

Note: No it doesn't, got it now!

@gerdemann
Copy link
Contributor Author

gerdemann commented Jan 31, 2020

It would also be nice to do something like this:

./flow configuration:show --type NodeTypes --path Vendor\\.Site:My\\.NodeType.superTypes.Vendor\\.Site:My\\.OtherNodeType

@kitsunet
Copy link
Member

Shouldn't we then rather think about a nice syntax extension for the scope?

Eg. [Vendor.Site:My.NodeType].properties

Obviously we need to select the separate well! The brackets look nice but might collide with something ofc.

@albe
Copy link
Member

albe commented Jan 31, 2020

@albe I have adapted the regular expression so that your example can be escaped. This makes it a breaking change, I think.
I am not a regex expert, if someone has a better solution I would be happy.

Was just fiddling around with the regex myself and was now stuck at '#(?<!\\\\)((?:\\\\\\\\)*)\\.#', which works for any uneven amount of escape characters but needs to use PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE to keep the even number of backslashes in the split. You'd then need to array_reduce array elements that only consist of backslashes into the previous. Ugh... getting wild quickly. Your solution only works for the one special case, but I guess that's good enough?

Edit: For reference, here's my fiddle with the ugly-ass "generic" solution: https://3v4l.org/pFoER

Copy link
Member

@albe albe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small optimizations

} else {
$path = preg_split('#((?<!\\\\)\\.|(?<=\\\\\\\\)\\.)#', $path);
$path = array_map(
function ($value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Micro-Optimization: this can be a static function ($value)

$path = preg_split('#((?<!\\\\)\\.|(?<=\\\\\\\\)\\.)#', $path);
$path = array_map(
function ($value) {
return str_replace('\\.', '.', str_replace('\\\\', '\\', $value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be reduced to str_replace(['\\\\', '\\.'], ['\\', '.'], $value)

@gerdemann
Copy link
Contributor Author

As alternative here the version with the brackets:
master...gerdemann:getvaluebypath-with-escaped-dot-2

@gerdemann gerdemann force-pushed the getvaluebypath-with-escaped-dot branch from 7795c5f to ca00646 Compare January 31, 2020 10:36
@kitsunet
Copy link
Member

The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex. And while the increase in processing time might be minimal the amount it gets used will probably make it tangible.

@kitsunet
Copy link
Member

The path given to utility can already be an array (probably should be split into two functions but separate problem) but I guess I woudl rather like to see the command doing the work and we accept a new argument "pathSegments" or something that is then an array of keys

@gerdemann
Copy link
Contributor Author

Yes, performance is an argument. I would have, under certain circumstances, the functionality not only in the command. What do you think about a second method "getValueByPathString" or something in the Arrays utility?

@albe
Copy link
Member

albe commented Jan 31, 2020

The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex.

Yep, that's also why I even slightly prefer the "imperfect" solution without the array_reduce.

Guess that's something we can't easily debate or refactor away. Moving the splitting outside the function will strip it of it's core functionality, plus you have to handle all those places where the function is used. I agree though, that the method should be split into two - one that takes a pathSegments array and one that takes a path string and calls the former.

So it comes down to a general decision/trade-off of "allowing dots in array keys" vs. code complexity+performance

@kitsunet
Copy link
Member

IDK, for me the nodetypes are a bit special, do we have any other place where we need this functionality?
it wa sbuild for simple paths like settings and that's what it's great for.

@gerdemann
Copy link
Contributor Author

Then it shouldn't be in a general arrays utility, should it? I use it often with other arrays as well.

@kitsunet
Copy link
Member

I mean, it fullfills a specific task and that it does fast, but yeah we should certainly talk about that.

@albe
Copy link
Member

albe commented Feb 27, 2020

So, how do we proceed? @kitsunet

@albe albe added this to In progress in Neos 5.2 & Flow 6.2 Release Board via automation Mar 5, 2020
@albe
Copy link
Member

albe commented Apr 1, 2020

Not quite sure what happened to this PR, but it somehow got rebased (?) and a whole lot of other commits pulled in :/ Would like to go over this again tomorrow and see if we can make it mergeable then @kitsunet or rather postpone to 6.3

@github-actions github-actions bot force-pushed the getvaluebypath-with-escaped-dot branch from b949ec1 to d527bc0 Compare April 1, 2020 23:41
@gerdemann
Copy link
Contributor Author

@albe I only see one commit, is there anything I can do to help?

@albe
Copy link
Member

albe commented Apr 2, 2020

D'oh - has this resolved itself? :D Thanks!

This bot is running rouge...

@kitsunet
Copy link
Member

I still feel this might result in a possibly drastic performance change

So a benchmark for the added strpos call would clear this up?

See I tend to overlook that part :D I guess having 50k calls to strpos should be fine...?

@mhsdesign
Copy link
Member

even though we have the strpos inplace, I don't particularly like overloading the code for an edgecase like that....
this function is mainly used for cli arguments, and I think we should just have another function for it.
(I agree that we need to support this functionality somehow, as it is no fun to write such regex by yourself ( only for me 😂))

@kdambekalns
Copy link
Member

https://3v4l.org/Y37ZN#v8.1.6

@mhsdesign
Copy link
Member

just tested it, the regex, that i suggested takes less steps and is bullet proof too.

(?:\\.)(*SKIP)(*FAIL)|\.

vs: (?<!\\)((?:\\\\)*)\.

@mhsdesign
Copy link
Member

As i see it its not important where we implement it: may it be directly in Arrays::getValueByPath or some other place.

Sure it would be nice if this function can handle it as feature in the future but much more important - at least as i find - is that we have a way to use ./flow configuration:show with dots inside paths as bugfix. For that case i would be fine with inlining the regex and refactor it later.

For now you can add the above into the Neos\Flow\Command\ConfigurationCommandController->showCommand

Packages/Framework/Neos.Flow/Classes/Command/ConfigurationCommandController.php:75

- $configuration = Arrays::getValueByPath($configuration, $path);
+ $configuration = Arrays::getValueByPath(
+     $configuration,
+     array_map('stripslashes', preg_split('#(?:\\\.)(*SKIP)(*FAIL)|\\.#', $path))
+ );

use it like:

./flow configuration:show --type Policy --path "roles.Neos\.Flow:Everybody"

(ps. dont miss the qoutes around the path, otherwise you need to escape the backslash)

@bwaidelich
Copy link
Member

much more important [...] is that we have a way to use ./flow configuration:show with dots inside paths as bugfix

I think, that's a great compromise for an otherwise rather profound change!

@mhsdesign
Copy link
Member

I will prepare a bugfix-pr for that

@mhsdesign mhsdesign self-assigned this Jul 16, 2022
@kitsunet
Copy link
Member

Did the bugfix ever happen, as it's not referenced in here?

Copy link
Member

@kitsunet kitsunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By now I am less sceptical about this, so if we still want, lets do it.

@mhsdesign
Copy link
Member

Did the bugfix ever happen, as it's not referenced in here?

Haha caught, no I never did that xD … yet …

@mhsdesign
Copy link
Member

mhsdesign commented Feb 13, 2023

@kitsunet there was lately a discussion in slack about something similar and a lot of pain involved with escaping things. Since escaping multiple context is always hard to grasp (like when using ddev -> shell -> flow or worse when building up escaped commands in yaml) … I just want to make sure we make the right choice!

After thinking about it i think i like your idea better: #1903 (comment)

@mhsdesign mhsdesign changed the title FEATURE: Arrays::getValueByPath with escaped dot !!! FEATURE: Arrays::getValueByPath with escaped dot Feb 13, 2023
@mhsdesign
Copy link
Member

i think this is so much better

ddev flow configuration:show --type NodeTypes --path [Vendor.Site:My.NodeType].properties

than

ddev flow configuration:show --type NodeTypes --path '"Vendor\.Site:My.NodeType.properties"'
# or
ddev flow configuration:show --type NodeTypes --path 'Neos\\.Neos:Document'

(note due to ddev we need another escaping layer in between, and its common to have to double escape stuff and hard to wrap the head around.)
Of course, when using the [ as a special char we need to allow escaping it, which will make things more complex but as this is an edgecase (i assume) i think its fine.

Btw its a breaking change ... technically ...

Copy link
Member

@mhsdesign mhsdesign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1903 (comment)

aaand target 8.3

@mhsdesign
Copy link
Member

<?php
function parseInputString(string $input): array
{
    $result = [];
    $current = '';
    $len = \strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $char = $input[$i];

        switch ($char) {
            case "[":
                if ($current !== "") {
                    $result[] = $current;
                    $current = "";
                }
                for ($i++; $i < $len; $i++) {
                    $char = $input[$i];
                    switch ($char) {
                        case "[":
                            throw new \Exception(sprintf('Unexpected char "%s" at position "%s" in "%s"', $char, $i, $input));
                        case "]":
                            if ($current !== "") {
                                $result[] = $current;
                                $current = "";
                            }
                            $nextChar = $input[$i + 1] ?? "";
                            if ($nextChar !== "" && $nextChar !== ".") {
                                throw new Exception('Missing dot after closing');
                            }

                            break 2;
                        default:
                            $current .= $char;
                    }
                }
                break;
            case "]":
                throw new \Exception(sprintf('Unexpected char "%s" at position "%s" in "%s"', $char, $i, $input));
            case ".":
                if ($current !== "") {
                    $result[] = $current;
                    $current = "";
                }
                break;
            default:
                $current .= $char;
        }
    }

    if ($current !== "") {
        $result[] = $current;
    }

    return $result;
}

$input = 'fooo.bar[dots.are.are.lowed].something.else';
$output = parseInputString($input);
print_r($output);

i will implement this in the config:show command as bugfix ;)

fooo.bar[dots.are.are.lowed].something.else

Array
(
    [0] => fooo
    [1] => bar
    [2] => dots.are.are.lowed
    [3] => something
    [4] => else
)

@mhsdesign
Copy link
Member

#2973

grüße aus dreden

@mhsdesign
Copy link
Member

it was discussed on the dd sprint 23 #2973 (comment) that we want to use json for complex cli arguments (including dot escaping) and use Array.getValueByPath on php site better with actual arrays instead of paths

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature I: Discussion Stale Issues and PRs with no recent activity, about to be closed soon.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants