Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML tag validation added for tag names #49175

Merged

Conversation

akhilgkrishnan
Copy link
Member

@akhilgkrishnan akhilgkrishnan commented Sep 6, 2023

Follow-up #49120, Also fixes the empty string "" case of #44948

Motivation / Background

As per the discussion in #49120 (comment), finally decided to have validation for invalid HTML characters in tag, content_tag helper methods.

Detail

Added validation for HTML tag names in the tag and content_tag method. The content_tag method now checks that the provided tag name adheres to the HTML specification. If an invalid HTML tag name is provided, the method raises an ArgumentError with an appropriate error message.

Examples:
Before:

content_tag("12p") 
#  "<12p></12p>"

content_tag("") 
# "<></>"

tag("image file")
#  "<image_file></image_file>"

After:

# Raises ArgumentError: Invalid HTML5 tag name: 12p
content_tag("12p") # Starting with a number

# Raises ArgumentError: Invalid HTML5 tag name:  
content_tag("") # Empty tag name

# Raises ArgumentError: Invalid HTML5 tag name: "image file"
tag("image file") # Contains a space

cc\ @byroot

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@byroot
Copy link
Member

byroot commented Sep 7, 2023

Looks fine to me now, the only outstanding thing is what to allow exactly.

The HTML5 spec is fairly restrictive:

Tags contain a tag name, giving the element's name. HTML elements all have names that only use characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z. In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.

https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#syntax-tag-name

But I don't know if perhaps there is another later spec that allow more. I'd hate to ship something like this and to break the code of people using some less known part of the spec.

@akhilgkrishnan
Copy link
Member Author

Looks fine to me now, the only outstanding thing is what to allow exactly.

The HTML5 spec is fairly restrictive:

Tags contain a tag name, giving the element's name. HTML elements all have names that only use characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z. In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.

https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#syntax-tag-name

But I don't know if perhaps there is another later spec that allow more. I'd hate to ship something like this and to break the code of people using some less known part of the spec.

As per the current HTML restriction, the implemented regex validates all these.

@akhilgkrishnan
Copy link
Member Author

@byroot Can we proceed with this?

@flavorjones
Copy link
Member

@akhilgkrishnan I see you requested a re-review, but I don't see any changes to the PR. Help me understand what you'd like me to give feedback on?

@akhilgkrishnan
Copy link
Member Author

@akhilgkrishnan I see you requested a re-review, but I don't see any changes to the PR. Help me understand what you'd like me to give feedback on?

Can I get your opinion on this? #49175 (comment)

@flavorjones

This comment was marked as off-topic.

@flavorjones

This comment was marked as off-topic.

@akhilgkrishnan akhilgkrishnan force-pushed the content_tag-html-tag-validation branch 5 times, most recently from 980369b to 3a0f388 Compare September 23, 2023 17:19
@akhilgkrishnan akhilgkrishnan changed the title HTML tag validation added for content_tag HTML tag validation added for tag names Sep 24, 2023
@akhilgkrishnan akhilgkrishnan force-pushed the content_tag-html-tag-validation branch 3 times, most recently from 7b5bb8a to db803a5 Compare September 24, 2023 16:49
@akhilgkrishnan
Copy link
Member Author

@flavorjones Let me know if any changes required here.

cc: @byroot

actionview/lib/action_view/helpers/tag_helper.rb Outdated Show resolved Hide resolved
actionview/test/template/tag_helper_test.rb Outdated Show resolved Hide resolved
actionview/test/template/tag_helper_test.rb Outdated Show resolved Hide resolved
actionview/test/template/tag_helper_test.rb Outdated Show resolved Hide resolved
actionview/test/template/tag_helper_test.rb Outdated Show resolved Hide resolved
@akhilgkrishnan akhilgkrishnan force-pushed the content_tag-html-tag-validation branch 2 times, most recently from 4226028 to d3bae93 Compare September 27, 2023 04:25
Copy link
Member

@flavorjones flavorjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the slight tweak to the exception message, this looks good. Thank you for taking the time to work with me on it!

Added validation for HTML tag names in the `tag` and `content_tag` helper method. The `tag` and
`content_tag` method now checks that the provided tag name adheres to the HTML specification. If
an invalid HTML tag name is provided, the method raises an `ArgumentError` with an appropriate error
message.

Examples:

```ruby
content_tag("12p") # Starting with a number

content_tag("") # Empty tag name

tag("div/") # Contains a solidus

tag("image file") # Contains a space
```
@akhilgkrishnan
Copy link
Member Author

Other than the slight tweak to the exception message, this looks good. Thank you for taking the time to work with me on it!

Thanks for your feedback and support for completing this. ❤️

@rafaelfranca rafaelfranca merged commit 7c03e4f into rails:main Sep 27, 2023
4 checks passed
@akhilgkrishnan akhilgkrishnan deleted the content_tag-html-tag-validation branch September 28, 2023 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants