Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidynode.props.attribute is missing "Boolean Attributes" and empty attributes. #12980

Closed
whitehorsesupport opened this issue Dec 20, 2023 · 9 comments

Comments

@whitehorsesupport
Copy link

whitehorsesupport commented Dec 20, 2023

Description

The following code:

<?php
$html = '<!DOCTYPE html><html lang="en"><head><title>bug</title></head><body><select><option selected value="en">English</option></select><img alt="" src="someurl"></body></html>';

$tidy = new tidy();
$tidy->ParseString($html);
//echo "\n".tidy_get_output($tidy)."\n\n"; Uncomment this line to verify that tidy is seeing the Boolean Attributes

function walk_nodes($node) {
	if(!empty($node->attribute)){
		echo '<'.$node->name.'>'."\n";
		foreach($node->attribute as $attributeKey=>$attributeValue) {
			echo '    ATTR '.var_export($attributeKey,true).'=>'.var_export($attributeValue,true)."\n";
		}
	}
	if($node->hasChildren()) {
		foreach($node->child as $child) {
			walk_nodes($child);
		}
	}
}
walk_nodes($tidy->root());

Resulted in this output:

<html>
    ATTR 'lang'=>'en'
<option>
    ATTR 'value'=>'en'
<img>
    ATTR 'src'=>'someurl'

But I expected this output instead:

<html>
    ATTR 'lang'=>'en'
<option>
    ATTR 'selected'=>true
    ATTR 'value'=>'en'
<img>
    ATTR 'alt'=>''
    ATTR 'src'=>'someurl'

Note: This ticket was originally submitted regarding boolean attributes but the example above has been updated to reflect empty attributes showing the same behavior.

Documentation for tidynode.props.attribute, There isn't any mention about this and I believe it is an unintended bug.
https://www.php.net/manual/en/class.tidynode.php#tidynode.props.attribute

The value of a "Boolean Attribute" is debatable. It could be NULL but I think it would be better if it was true.(If NULL was desired the fix below would need to be tweaked.)

After reviewing the PHP source code I believe the issue is probably located here
https://github.com/php/php-src/blob/master/ext/tidy/tidy.c
on line 665

I would suggest that the fix look something like this starting on line 665 /ext/tidy/tidy.c

=			if (name && val) {
=				add_assoc_string(&attribute, name, val);
-			}
+			} else if (name) {
+				add_assoc_string(&attribute, name, true);//I am not sure how a type "Boolean" true is set here so it may need to be tweaked.
+			}

Feel free to use the above code in making a patch, I the author disclaim and release all copyright to it.

PHP Version

PHP 8.2.10

Operating System

No response

@whitehorsesupport
Copy link
Author

This needs a label of "Extension: tidy"

@nielsdos
Copy link
Member

Good catch.
Regarding your patch: you'd need add_assoc_bool to add a boolean, not add_assoc_string 🙂

The value of a "Boolean Attribute" is debatable. It could be NULL but I think it would be better if it was true.(If NULL was desired the fix below would need to be tweaked.)

NULL seems dubious indeed. Although true is better, that does break the documentation that claims the value is a string.

From the docs:

attribute

An array of string, representing the attributes names (as keys) of the current node.

What the DOM spec does in this case is return the empty string instead of a boolean. And I think that makes sense here too.
Let me know what you think.

@whitehorsesupport
Copy link
Author

I think an empty string would be a good fix. A fix like that would probably not need to wait on php8.4 so it could possibly be in a much sooner release of 8.2&8.3 hopefully.(Not sure what the policy on that is but from looking at the guidelines it appears this should be submitted against 8.2 and merged into 8.3 and master.)
The one reason I prefer a bool is it could easily be detected that it is a bool attribute vs an empty attribute.

@nielsdos Is this something you will fix or do you want me to try and open a pull request for it?(And does submitting it to 8.2 seem like the correct place?) If you want me to do it does this look correct? Or does the val need to be cast to a string somehow?

-			if (name && val) {
+ 			if (name) {
=				add_assoc_string(&attribute, name, val);
=			}

@whitehorsesupport
Copy link
Author

I just realized that attributes that just have empty values are also not in the array too. Like alt=""
So I think the best fix is my last comment.

@whitehorsesupport
Copy link
Author

I edited the issue to reflect that the bug applies to empty attributes too.

@whitehorsesupport whitehorsesupport changed the title tidynode.props.attribute is missing "Boolean Attributes" tidynode.props.attribute is missing "Boolean Attributes" and empty attributes. Dec 21, 2023
@nielsdos
Copy link
Member

I think an empty string would be a good fix. A fix like that would probably not need to wait on php8.4 so it could possibly be in a much sooner release of 8.2&8.3 hopefully.(Not sure what the policy on that is but from looking at the guidelines it appears this should be submitted against 8.2 and merged into 8.3 and master.)

Indeed, this can be fixed in stable versions.

The one reason I prefer a bool is it could easily be detected that it is a bool attribute vs an empty attribute.

Right, I get that. However, as far as I know, whether an attribute is boolean or not depends on how the browser interprets it (i.e. assigns special meaning to certain attribute names). Boolean in this context means that the value doesn't really matter: what matters is that the attribute is present. A boolean attribute may have the empty string as value, or the attribute name itself even. Source: https://developer.mozilla.org/en-US/docs/Glossary/Boolean/HTML

Also, looks like we can't distinguish between "empty attributes" and "boolean attributes" like you said.

Is this something you will fix or do you want me to try and open a pull request for it?

Either way works for me.

(And does submitting it to 8.2 seem like the correct place?)

Yes

If you want me to do it does this look correct? Or does the val need to be cast to a string somehow?

At that point, val is NULL, so that would crash.
You'll have to check if val is NULL, and if yes you can use add_assoc_str(&attribute, name, zend_empty_string); to avoid allocating a zend_string instance for the empty value.

@whitehorsesupport
Copy link
Author

You are correct that the name is what matters in HTML.

If you can do it that would be great, My c is not perfect. And I don't have a build and testing env setup for PHP to test this.

I think that this would be correct.

=			if (name && val) {
=				add_assoc_string(&attribute, name, val);
+			} else if (name) {
+				add_assoc_str(&attribute, name, zend_empty_string);
=			}

nielsdos added a commit to nielsdos/php-src that referenced this issue Dec 21, 2023
@nielsdos
Copy link
Member

I submitted a PR with the local patch I had.

nielsdos added a commit that referenced this issue Dec 22, 2023
* PHP-8.2:
  Fix GH-12980: tidynode.props.attribute is missing "Boolean Attributes" and empty attributes
nielsdos added a commit that referenced this issue Dec 22, 2023
* PHP-8.3:
  Fix GH-12980: tidynode.props.attribute is missing "Boolean Attributes" and empty attributes
@EzraDechwong0
Copy link

Description

The following code:

<?php
$html = '<!DOCTYPE html><html lang="en"><head><title>bug</title></head><body><select><option selected value="en">English</option></select><img alt="" src="someurl"></body></html>';

$tidy = new tidy();
$tidy->ParseString($html);
//echo "\n".tidy_get_output($tidy)."\n\n"; Uncomment this line to verify that tidy is seeing the Boolean Attributes

function walk_nodes($node) {
	if(!empty($node->attribute)){
		echo '<'.$node->name.'>'."\n";
		foreach($node->attribute as $attributeKey=>$attributeValue) {
			echo '    ATTR '.var_export($attributeKey,true).'=>'.var_export($attributeValue,true)."\n";
		}
	}
	if($node->hasChildren()) {
		foreach($node->child as $child) {
			walk_nodes($child);
		}
	}
}
walk_nodes($tidy->root());

Resulted in this output:

<html>
    ATTR 'lang'=>'en'
<option>
    ATTR 'value'=>'en'
<img>
    ATTR 'src'=>'someurl'

But I expected this output instead:

<html>
    ATTR 'lang'=>'en'
<option>
    ATTR 'selected'=>true
    ATTR 'value'=>'en'
<img>
    ATTR 'alt'=>''
    ATTR 'src'=>'someurl'

Note: This ticket was originally submitted regarding boolean attributes but the example above has been updated to reflect empty attributes showing the same behavior.

Documentation for tidynode.props.attribute, There isn't any mention about this and I believe it is an unintended bug.
https://www.php.net/manual/en/class.tidynode.php#tidynode.props.attribute

The value of a "Boolean Attribute" is debatable. It could be NULL but I think it would be better if it was true.(If NULL was desired the fix below would need to be tweaked.)

After reviewing the PHP source code I believe the issue is probably located here
https://github.com/php/php-src/blob/master/ext/tidy/tidy.c
on line 665

I would suggest that the fix look something like this starting on line 665 /ext/tidy/tidy.c

=			if (name && val) {
=				add_assoc_string(&attribute, name, val);
-			}
+			} else if (name) {
+				add_assoc_string(&attribute, name, true);//I am not sure how a type "Boolean" true is set here so it may need to be tweaked.
+			}

Feel free to use the above code in making a patch, I the author disclaim and release all copyright to it.

PHP Version

PHP 8.2.10

Operating System

No response

d60035d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants