Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: FEATURE: Add String.urlize Eel helper #1861

Open
wants to merge 5 commits into
base: master
from

Conversation

@jonnitto
Copy link
Member

jonnitto commented Nov 29, 2019

This Eel helper is great to create id names from headlines (for example) or to create from any type of name CSS classes

Resolves: #1865

@Sebobo

This comment has been minimized.

Copy link
Member

Sebobo commented Nov 29, 2019

Love the feature!

@davidspiola davidspiola added this to In progress in Neos 5.1 & Flow 6.1 Release Board via automation Nov 29, 2019
Copy link
Member

bwaidelich left a comment

Looks good (by reading)

Neos 5.1 & Flow 6.1 Release Board automation moved this from In progress to Reviewer approved Nov 30, 2019
@bwaidelich bwaidelich moved this from Reviewer approved to Reviews needed in Neos 5.1 & Flow 6.1 Release Board Nov 30, 2019
@bwaidelich

This comment has been minimized.

Copy link
Member

bwaidelich commented Nov 30, 2019

I added an issue for this in order to simplify release management and to collect higher-level details for the release notes: #1865

@bwaidelich bwaidelich removed this from Reviews needed in Neos 5.1 & Flow 6.1 Release Board Nov 30, 2019
@albe

This comment has been minimized.

Copy link
Member

albe commented Nov 30, 2019

Generally yes, but some questions:

  • why was Behat/Transliterator chosen instead of e.g. php-intl + Transliterator::transliterate("ASCII//TRANSLIT//IGNORE", $string) (https://www.php.net/manual/en/transliterator.transliterate.php)? Not saying it's better, just want to have a rationale for the decision to pull in that dependency
  • what about other non-URL characters or characters that have special meaning, like e.g. "%", "?" or "`" (see https://www.ietf.org/rfc/rfc3986.txt 2.2 + 2.3) - shouldn't those be stripped or at least be urlencoded?
@bwaidelich

This comment has been minimized.

Copy link
Member

bwaidelich commented Nov 30, 2019

Good remarks by @albe (as usual l). We should make sure to get this right from the start and not to introduce unwanted dependencies as ut will be painful to change this lateron

Copy link
Member

daniellienert left a comment

We had the need for such an eelhelper some time ago. I deciced to built a separate package back then (https://github.com/punktDe/eel-transliterator) which also makes use of the behat transliterator package, because I also think that we should avoid adding external dependencies for just a single eelhelper.
If there are suitable PHP built-in methods, I would rather use these. If not, I would keep it in an additional eelhelper package.

@jonnitto jonnitto requested review from bwaidelich, daniellienert and albe Dec 2, 2019
jonnitto added 2 commits Dec 2, 2019
@jonnitto

This comment has been minimized.

Copy link
Member Author

jonnitto commented Dec 2, 2019

If this comes to a merge, please squash it 😉

Copy link
Member

bwaidelich left a comment

Sorry, I think my first approval was done a bit hastily.
Since we can't easily change this, we should really get it "right" from the start.

Also, I don't think that we need to introduce the Behat dependency and can use the \Transliterator instead:

            $transliterator = \Transliterator::createFromRules(
                '$AE = [Ä {A \u0308}];'
                . '$OE = [Ö {O \u0308}];'
                . '$UE = [Ü {U \u0308}];'
                . '[ä {a \u0308}] > ae;'
                . '[ö {o \u0308}] > oe;'
                . '[ü {u \u0308}] > ue;'
                . '$AE } [:Lowercase:] > Ae;'
                . '$OE } [:Lowercase:] > Oe;'
                . '$UE } [:Lowercase:] > Ue;'
                . '$AE > AE;'
                . '$OE > OE;'
                . '$UE > UE;'
                . '\/ > \-;'
                . '::Latin-ASCII;'
                . '::Any-Upper;'
                . '::[[:Punctuation:] - [\-]] Remove;'
            );
            return strtolower(str_replace(' ', '', $transliterator->transliterate($string));

(something along those lines)

@jonnitto

This comment has been minimized.

Copy link
Member Author

jonnitto commented Dec 2, 2019

It would also need '::Any-Latin;' before '::Latin-ASCII;', but with this, € and $ get not changed…

@jonnitto jonnitto requested a review from bwaidelich Dec 2, 2019
Copy link
Member

bwaidelich left a comment

Two more mini comments.
But apart from that I'm still not perfectly sure about the actual replacement rules.
For example: Instead of removing underscores it might be preferable to replace them with a dash.
Was that behavior taken from some other project or can we rely on some "official" document on how to do these things?

];
$string = strtr($string, $specialChars);
$string = str_replace('_', ' ', $string);
$string = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0100-\u7fff] remove; [:Punctuation:] Remove; Lower()', $string);

This comment has been minimized.

Copy link
@bwaidelich

bwaidelich Dec 2, 2019

Member

Oh, nice!
Just wondering: is there a specific reason to use procedural style here?
As a little performance optimization I would suggest to go for sth like:

    /**
     * @var \Transliterator
     */
    private static $transliterator;

    private function getTransliterator(): \Transliterator
    {
        if (self::$transliterator === null) {
            self::$transliterator = \Transliterator::createFromRules('...');
        }
        return self::$transliterator;
    }

    // ...

    public function urlize(string $string): string
    {
        $string = $this->getTransliterator()->transliterate($string);
        // ...
'å' => 'aa'
];
$string = strtr($string, $specialChars);
$string = str_replace('_', ' ', $string);

This comment has been minimized.

Copy link
@bwaidelich

bwaidelich Dec 2, 2019

Member

Why that?

$string = strtr($string, $specialChars);
$string = str_replace('_', ' ', $string);
$string = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0100-\u7fff] remove; [:Punctuation:] Remove; Lower()', $string);
$string = preg_replace('/[-\s]+/', '-', $string);

This comment has been minimized.

Copy link
@bwaidelich

bwaidelich Dec 2, 2019

Member

Do we really need another regex replacement here?
Can't we just extend the $specialChars array above?

@jonnitto

This comment has been minimized.

Copy link
Member Author

jonnitto commented Dec 2, 2019

One point about behat/transliterator: I know, it would be a new dependency for the Eel package, but not for Neos: It uses the same library for the generation for the uriPathSegment.

@bwaidelich

This comment has been minimized.

Copy link
Member

bwaidelich commented Dec 2, 2019

I know, it would be a new dependency for the Eel package, but not for Neos

You're right, Neos has a dependency declared to that package (I wasn't aware of that). But since Eel can be used with Flow alone, it would still add a new dependency.
Do you think the Behat transliterator is more capable than the built-in one?

@jonnitto

This comment has been minimized.

Copy link
Member Author

jonnitto commented Dec 2, 2019

At least it would do the same, and all the special characters would be handled correctly. I mean, I would love to have a dependency-less solution, but when I look at the code from the transliterator package, I don't know If we can do it with these few lines. Currently, It works pretty well (I would still have to figure out your optimization), but If we decide not to include the behat package, I would prefer to wait a bit and release it with Neos 5.2. Otherwise, I would have to revert it and then there would be no risk to merge it for Neos 5.1

@bwaidelich

This comment has been minimized.

Copy link
Member

bwaidelich commented Dec 2, 2019

@jonnitto good points, I agree to what you wrote!

@jonnitto

This comment has been minimized.

Copy link
Member Author

jonnitto commented Dec 3, 2019

At least it would do the same, and all the special characters would be handled correctly. I mean, I would love to have a dependency-less solution, but when I look at the code from the transliterator package, I don't know If we can do it with these few lines. Currently, It works pretty well (I would still have to figure out your optimization), but If we decide not to include the behat package, I would prefer to wait a bit and release it with Neos 5.2. Otherwise, I would have to revert it and then there would be no risk to merge it for Neos 5.1

So @albe, @bwaidelich, @Sebobo, and @daniellienert: Which option we want to choose?

  • 👍 for adding behat
  • 👎 for writing our own function and wait for 5.2
  • 👀 for if you are happy with both solutions

Note: If we have one 👎, we will wait for 5.2

@bwaidelich

This comment has been minimized.

Copy link
Member

bwaidelich commented Dec 3, 2019

I think this is a great addition and I don't want to step on your toes, but my tendency is towards postponing because:

  • Once decided we'll have to "live" with the decision. For most other Eel helpers the behavior is pretty obvious, but this one is rather special
  • As long as this feature is not in the core it's relatively easy to ship in in your own site package, thus the pain of postponing this is not too great.

But I won't block this if you guys decide otherwise

public function urlize(string $string): string
{
$specialChars = [

This comment has been minimized.

Copy link
@kitsunet

kitsunet Dec 3, 2019

Member

Why do we do this conversion here and then transliterate, that seems wrong? Shouldn't the Transliteration take care of these?

This comment has been minimized.

Copy link
@bwaidelich

This comment has been minimized.

Copy link
@jonnitto

jonnitto Dec 3, 2019

Author Member

Nope. The transliteration (with this config) doesn't take care of these chars. But as I said: If we decide not to implement behat, I will further dive deeper into this and optimize the code 😉

This comment has been minimized.

Copy link
@kitsunet

kitsunet Dec 4, 2019

Member

Wait, what do you mean it doesn't take care of that? It doesn't transliterate umlauts to something else?

This comment has been minimized.

Copy link
@bwaidelich

bwaidelich Dec 4, 2019

Member

I think the example config I posted didn't cover special characters like "€". They could all be added (see http://userguide.icu-project.org/transforms/general).
But the question still remains what rules to apply.. Maybe there's some official mapping for all disallowed URL path characters

This comment has been minimized.

Copy link
@kitsunet

kitsunet Dec 4, 2019

Member

Weeell, that's the issue with naming, right. If you want to build a URL with special characters I would argue you urlencode instead of doing some transliteration ;) So whatever is allowed or not there doesn't really matter. This is more of a asciilize I guess? IDK. The problem with transliteration is that there is no universal solution because you loose information. The only way to retain at least part of the information of the original string would be to know it's langauge because the "manual" mapping we do there might be right "for us" but not necessarily for someone else. So either this needs to be given a source language to transliterate for OR must follow basic asciifolding rules, aka ü => u, ú => u etc.

This comment has been minimized.

Copy link
@bwaidelich

bwaidelich Dec 5, 2019

Member

True that. "disallowed" wasn't the right word.
But there gotta be some standards that define mappings like ü => ue and similar.. I hoped..

This comment has been minimized.

Copy link
@kitsunet

kitsunet Dec 9, 2019

Member

Just to quote from some other text:

In German the ¨ (umlaut) is always transliterated as -e, while in Swedish the diacritic is simply dropped.

There is no universal rule, most diacritics are transliterated differently depending on their source language, so unless you give that, only a "flattening" transliteration is sensible.

This comment has been minimized.

Copy link
@bwaidelich

bwaidelich Dec 10, 2019

Member

Yes, that's a good call.. It could mean that a helper expects an (optional?) locale or preset but maybe that takes it too far

@jonnitto jonnitto changed the title FEATURE: Add String.urlize Eel helper WIP: FEATURE: Add String.urlize Eel helper Dec 3, 2019
@jonnitto

This comment has been minimized.

Copy link
Member Author

jonnitto commented Dec 3, 2019

I think it is best to skip this feature for this release and look at what we can do for 5.2

@albe

This comment has been minimized.

Copy link
Member

albe commented Dec 3, 2019

Yes, let's skip and more closer inspect if we can replace the dependency with Transliterator in a way that doesn't require writing (and maintaining) a couple dozen or more lines of supporting code. If we do find, that the best solution is to pull in behat/transliterate, then that's fine and no issue if done in 5.2

@kitsunet

This comment has been minimized.

Copy link
Member

kitsunet commented Dec 10, 2019

General comment on the whole behat translit... Just look at Utility::renderValidNodeName() in CR which uses behat transliterator already. I think back in the day we had way more troubles with php-intl (which was used there before) than we ever had with behat translit and I guess it could be nice to use it in routing of custom object properties as well...

@bwaidelich

This comment has been minimized.

Copy link
Member

bwaidelich commented Dec 10, 2019

I think back in the day we had way more troubles with php-intl [...] than we ever had with behat translit

I guess it has improved quite a bit then. We're using the \Transliterator in a project and it is very reliable and much faster than a PHP implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.