Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sitemap.xls and custom site_icon_meta_tags #362

Closed
vralle opened this issue Sep 21, 2018 · 19 comments
Closed

Sitemap.xls and custom site_icon_meta_tags #362

vralle opened this issue Sep 21, 2018 · 19 comments
Assignees
Milestone

Comments

@vralle
Copy link

vralle commented Sep 21, 2018

Hi!

I use custom site icon meta tags through a filter:

function site_icon_tags()
{
    return array(
        '<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png?v=20180306">',
        '<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png?v=20180306">',
        '<link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png?v=20180306">',
        '<link rel="manifest" href="/site.webmanifest?v=20180306">',
        '<link rel="mask-icon" href="/safari-pinned-tab.svg?v=20180306" color="#0f3341">',
        '<link rel="shortcut icon" href="/favicon.ico?v=20180306">',
        '<meta name="msapplication-TileColor" content="#0f3341">',
        '<meta name="theme-color" content="#0f3341">',
    );
}
\add_filter('site_icon_meta_tags', __NAMESPACE__ . '\\site_icon_tags');

And I get style errors for the sitemap. White page of sitemap. Console errors 'Opening and ending tag mismatch' for sitemap.xsl.

It was unexpected to meet my html5 tags in the xml file. Now I've closed the tags in my code snippet, but it's not entirely clear what to do with this in the future

@sybrew
Copy link
Owner

sybrew commented Sep 23, 2018

From https://www.w3schools.com/tags/tag_link.asp:

Differences Between HTML and XHTML

In HTML the tag has no end tag.

In XHTML the tag must be properly closed.

The general consensus is to make code strict (XHTML) in plugins.
However, I do understand that portability mustn't (and aren't always to) be considered when users are adding targeted adjustments.

I'll see if I can filter the tags with some regex to make them behave strictly. This includes attribute quoting.

@sybrew sybrew self-assigned this Sep 23, 2018
@sybrew
Copy link
Owner

sybrew commented Sep 23, 2018

We'll fix this in a future update. It takes too long to test for the upcoming patch.

With this fix, we're going to assume that all items in the filter are meta elements without content.

Here's what I got for grabbing the closing tags:

/<[^>]+?(\/?>)/

Here's what I got for strict attribute quoting, it's slow because we can't repeat-capture nested groups:

/<[^>]+?\/?>/                 # Grab tags first
/(\w+=([^"']+?)(\s|(\/?>)))/  # Grab unquoted elements for each of the tags...

@vralle
Copy link
Author

vralle commented Sep 23, 2018

<[^>]+?(\/?>)

8 matches, 1622 steps (~7ms)

other way: https://regex101.com/r/zgKJUw/1

8 matches, 186 steps (~1ms)

my favorite bullet-proof)
You can use \w+ instead link|meta : https://regex101.com/r/zgKJUw/2

8 matches, 160 steps (~0ms)

@sybrew
Copy link
Owner

sybrew commented Sep 23, 2018

By just removing the ? makes this the fastest one, with just 56 steps in your example:

<[^>]+(\/?>)

@vralle
Copy link
Author

vralle commented Sep 23, 2018

<[^>]+(\/?>)
See what happens if the tag is closed

@sybrew
Copy link
Owner

sybrew commented Sep 23, 2018

Ah, yes, it doesn't grab the closing solidus.

@sybrew
Copy link
Owner

sybrew commented Sep 23, 2018

Keep in mind that we can't filter the outcome of the function without using output buffering (it introduces code smell).

So, we're going to loop over the array. Whatever's faster over a single meta-tag line will be used. Keep maintainability and readability of the code in mind, too.

8 steps on a single line (112 over all those lines):

/<[^\/>]+(?:\/?[^>\/]++)*?(\/?>)/

Thanks for your input, it helped me get to that 😃

@vralle
Copy link
Author

vralle commented Sep 23, 2018

If you want to find the ending of the tag, here: https://regex101.com/r/zgKJUw/3
but this does not solve the issue with quotes of attribute values.
Parsing tags is not an easy task. My search for universal code led to the following:
first look for the contents of the tags, like here: https://regex101.com/r/zgKJUw/2
So we get the tag name and attributes. Attributes can be parsed using Wordpress. Look at how work shortcode_parse_atts and wp_kses_hair.

You can simplify the processing, but in this case, we must understand that we will encounter a dirty code. Obviously, we can not be responsible for the dirty code. But you need to choose between a simple and fast code and a handler that will work even with dirty code.

@sybrew
Copy link
Owner

sybrew commented Sep 23, 2018

I don't think it's necessary to scrutinize and repair every single line. If it's invalid HTML that isn't supposed to be in the <head> tag nor in that meta-tag-only filter, then it's invalid by default and we can't-nor won't-fix that.

We'll only want to convert HTML to XHTML, whatever purpose the tag has.

@vralle
Copy link
Author

vralle commented Sep 23, 2018

The easiest way is to notify users through the interface "the icon tags should be compatible with XHTML Strict, otherwise it will lead to style errors and a white window."
There is always an option to abandon the use of site map styles. I have not seen websites for a long time where this is used by the clients of web resources

@vralle
Copy link
Author

vralle commented Sep 23, 2018

Another way is to remove site_icon_meta_tags filter before output styles and turn filter on after

@sybrew
Copy link
Owner

sybrew commented Sep 24, 2018

It's using a WordPress Core function for consistency. Turning off the filters or adding an option for the off-chance that someone adds non-XHTML code is too much of a chore for both developers and users alike.

Let's just make it work as intended 👍

@vralle
Copy link
Author

vralle commented Sep 24, 2018

Simply looking end of tags https://regex101.com/r/zgKJUw/4
And fail, if unexpected tag.

More: https://stackoverflow.com/a/3524392/2227392

@vralle
Copy link
Author

vralle commented Sep 24, 2018

https://regex101.com/r/zgKJUw/5
This regex finds the whole tag and its parts - the name, attributes, and content, if any. In this version, we can make the tags as needed.
No other ideas as to provide for any scenario

@sybrew
Copy link
Owner

sybrew commented Sep 24, 2018

Brilliant, thanks! That'll help with edge case scenarios too 👍

@vralle
Copy link
Author

vralle commented Sep 24, 2018

in the case of using regex, a rational option may be where the wait for link and meta tags.
In this case, the handler will be simpler: https://regex101.com/r/zgKJUw/7
And the comment in the UI, for example, "Only link and meta tags are allowed."

@sybrew sybrew added this to the 3.1.4 milestone Sep 30, 2018
@sybrew
Copy link
Owner

sybrew commented Oct 8, 2018

There are more restrictions set in XHTML, like only having lowercase elements and attributes names.

For balancing tags, we should use WordPress' force_balance_tags(). This will also close the single tags that should be closed. I think it also lowercases elements.

Alas, this doesn't strictly quote attributes nor lowercase the attribute names.
wp_kses_attr() could help us with this, whilst providing valid XHTML tags and their attributes (aka $allowed_protocols).

Why reinvent the wheel? 😄

@sybrew
Copy link
Owner

sybrew commented Oct 9, 2018

I wasn't able to fix attribute minimization due to time constraints. I must move on to TSF 3.2 ASAP for Gutenberg.

Nevertheless, attribute minimization shouldn't be used in any context regarding meta tags anyway, which is why I feel save introducing the following commit.

It should prevent errors showing in the XSL file when filtering site_icon_meta_tags.

@sybrew sybrew closed this as completed in 49194cd Oct 9, 2018
@vralle
Copy link
Author

vralle commented Oct 10, 2018

Now work perfect. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants