Sitemap.xls and custom site_icon_meta_tags #362

vralle · 2018-09-21T20:38:57Z

Hi!

I use custom site icon meta tags through a filter:

function site_icon_tags()
{
    return array(
        '<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png?v=20180306">',
        '<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png?v=20180306">',
        '<link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png?v=20180306">',
        '<link rel="manifest" href="/site.webmanifest?v=20180306">',
        '<link rel="mask-icon" href="/safari-pinned-tab.svg?v=20180306" color="#0f3341">',
        '<link rel="shortcut icon" href="/favicon.ico?v=20180306">',
        '<meta name="msapplication-TileColor" content="#0f3341">',
        '<meta name="theme-color" content="#0f3341">',
    );
}
\add_filter('site_icon_meta_tags', __NAMESPACE__ . '\\site_icon_tags');

And I get style errors for the sitemap. White page of sitemap. Console errors 'Opening and ending tag mismatch' for sitemap.xsl.

It was unexpected to meet my html5 tags in the xml file. Now I've closed the tags in my code snippet, but it's not entirely clear what to do with this in the future

sybrew · 2018-09-23T17:43:33Z

From https://www.w3schools.com/tags/tag_link.asp:

Differences Between HTML and XHTML

In HTML the tag has no end tag.

In XHTML the tag must be properly closed.

The general consensus is to make code strict (XHTML) in plugins.
However, I do understand that portability mustn't (and aren't always to) be considered when users are adding targeted adjustments.

I'll see if I can filter the tags with some regex to make them behave strictly. This includes attribute quoting.

sybrew · 2018-09-23T18:34:29Z

We'll fix this in a future update. It takes too long to test for the upcoming patch.

With this fix, we're going to assume that all items in the filter are meta elements without content.

Here's what I got for grabbing the closing tags:

/<[^>]+?(\/?>)/

Here's what I got for strict attribute quoting, it's slow because we can't repeat-capture nested groups:

/<[^>]+?\/?>/                 # Grab tags first
/(\w+=([^"']+?)(\s|(\/?>)))/  # Grab unquoted elements for each of the tags...

vralle · 2018-09-23T18:52:32Z

<[^>]+?(\/?>)

8 matches, 1622 steps (~7ms)

other way: https://regex101.com/r/zgKJUw/1

8 matches, 186 steps (~1ms)

my favorite bullet-proof)
You can use \w+ instead link|meta : https://regex101.com/r/zgKJUw/2

8 matches, 160 steps (~0ms)

sybrew · 2018-09-23T19:12:08Z

By just removing the ? makes this the fastest one, with just 56 steps in your example:

<[^>]+(\/?>)

vralle · 2018-09-23T19:15:00Z

<[^>]+(\/?>)
See what happens if the tag is closed

sybrew · 2018-09-23T19:18:12Z

Ah, yes, it doesn't grab the closing solidus.

sybrew · 2018-09-23T19:23:20Z

Keep in mind that we can't filter the outcome of the function without using output buffering (it introduces code smell).

So, we're going to loop over the array. Whatever's faster over a single meta-tag line will be used. Keep maintainability and readability of the code in mind, too.

8 steps on a single line (112 over all those lines):

/<[^\/>]+(?:\/?[^>\/]++)*?(\/?>)/

Thanks for your input, it helped me get to that 😃

vralle · 2018-09-23T19:37:19Z

If you want to find the ending of the tag, here: https://regex101.com/r/zgKJUw/3
but this does not solve the issue with quotes of attribute values.
Parsing tags is not an easy task. My search for universal code led to the following:
first look for the contents of the tags, like here: https://regex101.com/r/zgKJUw/2
So we get the tag name and attributes. Attributes can be parsed using Wordpress. Look at how work shortcode_parse_atts and wp_kses_hair.

You can simplify the processing, but in this case, we must understand that we will encounter a dirty code. Obviously, we can not be responsible for the dirty code. But you need to choose between a simple and fast code and a handler that will work even with dirty code.

sybrew · 2018-09-23T19:51:15Z

I don't think it's necessary to scrutinize and repair every single line. If it's invalid HTML that isn't supposed to be in the <head> tag nor in that meta-tag-only filter, then it's invalid by default and we can't-nor won't-fix that.

We'll only want to convert HTML to XHTML, whatever purpose the tag has.

vralle · 2018-09-23T19:59:58Z

The easiest way is to notify users through the interface "the icon tags should be compatible with XHTML Strict, otherwise it will lead to style errors and a white window."
There is always an option to abandon the use of site map styles. I have not seen websites for a long time where this is used by the clients of web resources

vralle · 2018-09-23T20:07:47Z

Another way is to remove site_icon_meta_tags filter before output styles and turn filter on after

sybrew · 2018-09-24T03:05:40Z

It's using a WordPress Core function for consistency. Turning off the filters or adding an option for the off-chance that someone adds non-XHTML code is too much of a chore for both developers and users alike.

Let's just make it work as intended 👍

vralle · 2018-09-24T12:18:34Z

Simply looking end of tags https://regex101.com/r/zgKJUw/4
And fail, if unexpected tag.

More: https://stackoverflow.com/a/3524392/2227392

vralle · 2018-09-24T12:53:39Z

https://regex101.com/r/zgKJUw/5
This regex finds the whole tag and its parts - the name, attributes, and content, if any. In this version, we can make the tags as needed.
No other ideas as to provide for any scenario

sybrew · 2018-09-24T16:06:47Z

Brilliant, thanks! That'll help with edge case scenarios too 👍

vralle · 2018-09-24T16:32:06Z

in the case of using regex, a rational option may be where the wait for link and meta tags.
In this case, the handler will be simpler: https://regex101.com/r/zgKJUw/7
And the comment in the UI, for example, "Only link and meta tags are allowed."

sybrew · 2018-10-08T20:44:20Z

There are more restrictions set in XHTML, like only having lowercase elements and attributes names.

For balancing tags, we should use WordPress' force_balance_tags(). This will also close the single tags that should be closed. I think it also lowercases elements.

Alas, this doesn't strictly quote attributes nor lowercase the attribute names.
wp_kses_attr() could help us with this, whilst providing valid XHTML tags and their attributes (aka $allowed_protocols).

Why reinvent the wheel? 😄

sybrew · 2018-10-09T18:19:50Z

I wasn't able to fix attribute minimization due to time constraints. I must move on to TSF 3.2 ASAP for Gutenberg.

Nevertheless, attribute minimization shouldn't be used in any context regarding meta tags anyway, which is why I feel save introducing the following commit.

It should prevent errors showing in the XSL file when filtering site_icon_meta_tags.

vralle · 2018-10-10T09:07:12Z

Now work perfect. Thanks!

sybrew added the [Type] Bug label Sep 23, 2018

sybrew self-assigned this Sep 23, 2018

sybrew added this to the 3.1.4 milestone Sep 30, 2018

sybrew closed this as completed in 49194cd Oct 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sitemap.xls and custom site_icon_meta_tags #362

Sitemap.xls and custom site_icon_meta_tags #362

vralle commented Sep 21, 2018 •

edited

sybrew commented Sep 23, 2018

Differences Between HTML and XHTML

sybrew commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

sybrew commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018

sybrew commented Sep 23, 2018

sybrew commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

sybrew commented Sep 23, 2018

vralle commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

sybrew commented Sep 24, 2018

vralle commented Sep 24, 2018

vralle commented Sep 24, 2018 •

edited

sybrew commented Sep 24, 2018

vralle commented Sep 24, 2018 •

edited

sybrew commented Oct 8, 2018

sybrew commented Oct 9, 2018

vralle commented Oct 10, 2018

Sitemap.xls and custom site_icon_meta_tags #362

Sitemap.xls and custom site_icon_meta_tags #362

Comments

vralle commented Sep 21, 2018 • edited

sybrew commented Sep 23, 2018

Differences Between HTML and XHTML

sybrew commented Sep 23, 2018 • edited

vralle commented Sep 23, 2018 • edited

sybrew commented Sep 23, 2018 • edited

vralle commented Sep 23, 2018

sybrew commented Sep 23, 2018

sybrew commented Sep 23, 2018 • edited

vralle commented Sep 23, 2018 • edited

sybrew commented Sep 23, 2018

vralle commented Sep 23, 2018 • edited

vralle commented Sep 23, 2018 • edited

sybrew commented Sep 24, 2018

vralle commented Sep 24, 2018

vralle commented Sep 24, 2018 • edited

sybrew commented Sep 24, 2018

vralle commented Sep 24, 2018 • edited

sybrew commented Oct 8, 2018

sybrew commented Oct 9, 2018

vralle commented Oct 10, 2018

vralle commented Sep 21, 2018 •

edited

sybrew commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

sybrew commented Sep 23, 2018 •

edited

sybrew commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

vralle commented Sep 23, 2018 •

edited

vralle commented Sep 24, 2018 •

edited

vralle commented Sep 24, 2018 •

edited