Confusing invalid "<style=" coding with style tag has fatal consequences with fix-style-tags enabled #1000

Open
idw-git opened this issue Sep 25, 2021 · 0 comments

Hello everyone,

I'm a new member here but a long-time user of (Lib)Tidy, and I believe the following issue deserves a closer look:

I was recently sent real-world samples of script-generated HTML mails containing the following markup:

<style="background:#f3f4f5; font:16px Arial, sans-serif; color:#4a4a4a">

Sending this through Tidy (5.6 and 5.8) "as is" (the message declared itself as <!DOCTYPE html>) resulted in a blank screen: with the fix-style-tags option enabled, Tidy apparently mistook the markup above for an unclosed <style> element and tried to move almost all of the HTML body into the head section. Using the "doctype loose" option and disabling fix-style-tags prevents this from occurring.

Although I'm not a C developer, I managed to (re)compile the Tidy sources using a free version of VC 10 and to prevent this from happening by inserting a tiny workaround into the "case LEX_STARTTAG" section of GetTokenFromStream() in lexer.c:

            /* parse attributes, consuming closing ">" */
            if (c != '>')
            {
                if (c == '/')
                    TY_(UngetChar)(c, doc->docIn);

                attributes = ParseAttrs( doc, &isempty );
            }

            /* ignore "pseudo tags" with trailing equal signs */
            if (isempty || c == '=')
                lexer->token->type = StartEndTag;

            /* added: drop whatever was parsed as attributes of a pseudo tag */
            if (c == '=')
                lexer->token->attributes = NULL;
            else
                lexer->token->attributes = attributes;

            lexer->lexsize = lexer->txtend = lexer->txtstart;

The two changes are the added "|| c == '='" test and the "if (c == '=') ... else" branch that clears the attribute list.

This is definitely only a workaround: it silently drops the faulty markup without any warning and, since I'm not familiar enough with Tidy's further processing, may have unwanted side effects in other places or on other input. In other words, I would appreciate it if someone could pick up this issue and deal with it more professionally than I am capable of. Please let me know if I can be of further assistance (for example, by providing the full sample).
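To make the condition I am trying to catch more concrete, here is a minimal, self-contained sketch of the check in isolation. It is not Tidy code and does not use Tidy's lexer state; the helper name is_pseudo_tag and the set of characters accepted in a tag name are my own assumptions, chosen only for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of Tidy: returns true when `s` begins with
 * "<name=", i.e. a tag name that is immediately followed by '=' instead of
 * whitespace, '/' or '>'. The accepted name characters (letters, digits,
 * '-' and ':') are an assumption made for this sketch. */
static bool is_pseudo_tag(const char *s)
{
    /* A start tag must open with '<' followed by a letter. */
    if (s == NULL || s[0] != '<' || !isalpha((unsigned char)s[1]))
        return false;

    /* Skip over the tag name. */
    const char *p = s + 1;
    while (isalnum((unsigned char)*p) || *p == '-' || *p == ':')
        ++p;

    /* The character that ends the name decides the outcome. */
    return *p == '=';
}
```

With this helper, the markup from the mail samples (<style="background:...">) is flagged, while <style> and <style type="text/css"> are not. A proper fix could presumably use such a condition to emit a warning rather than dropping the element silently, but where that belongs in Tidy is something I can't judge.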

Thanks in advance

				Michael