
Incorrect HTML magic identification when preceded by a comment #102

@markedmondson

Description


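If an HTML document has a comment before the opening <html> tag, and the comment is long enough to push that tag past the range Marcel checks, the document is incorrectly identified as XML (application/xml) instead of HTML.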

Steps to reproduce

require "marcel"
require "stringio"

io = StringIO.new(<<~HTML)
  <!--/* Throwaway comment but it has to be over 64 characters to fail AND have a uppercase HTML tag */-->
  <HTML>
    <head>
    </head>
    <body>
      <h1>Magic!</h1>
    </body>
  </HTML>
HTML
Marcel::MimeType.for(io)
# => "application/xml"
io = StringIO.new(<<~HTML)
  <!--/* Throwaway comment but it has to be over 128 characters to fail AND have a lowercase HTML tag, we can pad this one out a bit to get it longer */-->
  <html>
    <head>
    </head>
    <body>
      <h1>Magic!</h1>
    </body>
  </html>
HTML
Marcel::MimeType.for(io)
# => "application/xml"
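The two inputs differ only in how far into the file the opening tag lands. This standalone check (plain Ruby, no Marcel needed) rebuilds the start of each heredoc and reports the tag's offset. The 64 and 128 windows it compares against are taken from the comments in the reproduction, not read from Marcel's tables, so treat them as an assumption.

```ruby
# Rebuild the start of each reproduction input (same comment text, same
# squiggly heredoc) and locate the opening tag.
upper_doc = <<~HTML
  <!--/* Throwaway comment but it has to be over 64 characters to fail AND have a uppercase HTML tag */-->
  <HTML>
HTML

lower_doc = <<~HTML
  <!--/* Throwaway comment but it has to be over 128 characters to fail AND have a lowercase HTML tag, we can pad this one out a bit to get it longer */-->
  <html>
HTML

# Every character here is ASCII, so String#index is also the byte offset.
upper_offset = upper_doc.index("<HTML")
lower_offset = lower_doc.index("<html")

# Window sizes as implied by the reproduction's comments (assumption).
puts "<HTML at offset #{upper_offset}, past the 0..64 window: #{upper_offset > 64}"
puts "<html at offset #{lower_offset}, past the 0..128 window: #{lower_offset > 128}"
```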

Extending the magic definitions works as a temporary workaround, but the comment could be any length, so no fixed window really solves it. The broader HTML lookup at https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2761 sits below the XML comment magic at https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2747, so the XML rule matches first.
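A minimal sketch of why ordering matters in first-match magic sniffing. The table is hypothetical and simplified, not copied from Marcel's tables.rb: the names SIMPLIFIED_MAGIC, rule_matches? and sniff are invented for illustration, the 64/128 windows follow the reproduction's comments, and the 1024-byte "broader" window is a placeholder. It only mirrors the ordering the issue describes (narrow HTML rules, then an XML rule that also matches a leading <!--, then a broader HTML rule).

```ruby
# HYPOTHETICAL, simplified magic table (not Marcel's real data).
# Order matters: the first entry with any matching rule wins.
SIMPLIFIED_MAGIC = [
  ["text/html",       [[0..64, "<HTML"], [0..128, "<html"]]],     # narrow HTML rules
  ["application/xml", [[0, "<?xml"], [0, "<!--"]]],                # XML rules; one matches a leading comment
  ["text/html",       [[0..1024, "<HTML"], [0..1024, "<html"]]], # broader HTML rule (placeholder window)
].freeze

# Integer offset: value must start exactly there.
# Range offset: value may start anywhere inside the range.
def rule_matches?(data, offset, value)
  case offset
  when Integer
    data.byteslice(offset, value.bytesize) == value
  when Range
    window = data.byteslice(offset.begin, offset.size + value.bytesize - 1)
    !window.nil? && window.include?(value)
  else
    false
  end
end

# Return the type of the first table entry with at least one matching rule.
def sniff(data, table)
  entry = table.find { |_type, rules| rules.any? { |offset, value| rule_matches?(data, offset, value) } }
  entry&.first
end

short_doc = "<!-- short -->\n<html>\n</html>\n"
long_doc  = "<!--#{'x' * 200}-->\n<html>\n</html>\n"

puts sniff(short_doc, SIMPLIFIED_MAGIC) # => text/html (narrow rule matches first)
puts sniff(long_doc, SIMPLIFIED_MAGIC)  # => application/xml (XML rule shadows the broader HTML rule)
```

In this toy table, dropping or demoting the XML comment rule lets the broader HTML rule fire for long_doc; that is an observation about the sketch, not a claim about the right fix inside Marcel.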

Temporary workaround

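require "marcel"

# Let the opening tag start anywhere in offsets 0..256 (still fails for longer comments)
Marcel::MimeType.extend "text/html", magic: [[0..256, "<HTML"]]
Marcel::MimeType.extend "text/html", magic: [[0..256, "<html"]]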
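Any fixed window can be beaten by a longer comment. One length-independent alternative, sketched here as a hypothetical helper (html_after_leading_comments? is not part of Marcel), is to skip leading whitespace and <!-- ... --> comments with Ruby's standard StringScanner and only then look for an opening <html> tag or an HTML doctype, case-insensitively.

```ruby
require "strscan"

# HYPOTHETICAL helper, not part of Marcel: skip leading whitespace and
# <!-- ... --> comments, then check whether the document opens with an
# <html> tag or an HTML doctype (case-insensitive).
def html_after_leading_comments?(data)
  scanner = StringScanner.new(data)
  loop do
    scanner.skip(/\s*/)                            # always succeeds, possibly with 0 chars
    break unless scanner.skip(/<!--.*?-->/m)       # stop once no complete comment follows
  end
  scanner.skip(/\s*/)
  !scanner.match?(/<(?:html|!doctype\s+html)[\s>]/i).nil?
end

puts html_after_leading_comments?("<!-- #{'x' * 500} -->\n<HTML>\n</HTML>\n") # => true
puts html_after_leading_comments?("<!-- c -->\n<?xml version=\"1.0\"?><root/>") # => false
```

An application could run a check like this on the first chunk of an upload before falling back to Marcel::MimeType.for. How much input to read is left open here: a check capped at N bytes has the same limit as the magic windows, just further out.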
