Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Email agent stripping out tags? #1373

Closed
irfancharania opened this issue Mar 26, 2016 · 17 comments
Closed

Email agent stripping out tags? #1373

irfancharania opened this issue Mar 26, 2016 · 17 comments

Comments

@irfancharania
Copy link
Contributor

I've updated my huginn instance to the latest version and I'm noticing that the email agent is stripping out a number of tags like table.

Here is a test page: http://rockandsock.neocities.org/index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>The web site of rockandsock</title>
    <!-- The style.css file allows you to change the look of your web pages.
         If you include the next line in all your web pages, they will all share the same look.
         This makes it easier to make new pages for your site. -->
    <link href="/style.css" rel="stylesheet" type="text/css" media="all">
  </head>
  <body>
    <h2>Email Template</h2>
    <table width="600" cellpadding="0" cellspacing="0" align="center">
        <tr>
            <td width="200">
                <h2>Side bar</h2>
                <p>Test</p>
            </td>
            <td width="400">
                <h2>Main Content</h2>
                <p>Some stuff here</p>
                <table width="100%">
                    <tbody>
                        <tr>
                            <td>1</td>
                            <td>2</td>
                            <td>3</td>
                        </tr>
                        <tr>
                            <td>1</td>
                            <td>2</td>
                            <td>3</td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
    </table>
  </body>
</html>

This is a test scenario demonstrating the issue:

{
  "schema_version": 1,
  "name": "Test HTML email",
  "description": "No description provided",
  "source_url": false,
  "guid": "4e458982c373314efea506e8f2085f39",
  "tag_fg_color": "#ffffff",
  "tag_bg_color": "#5bc0de",
  "exported_at": "2016-03-26T16:59:00Z",
  "agents": [
    {
      "type": "Agents::EmailAgent",
      "name": "A_Test - Email",
      "disabled": false,
      "guid": "9f39fc6ac572bfc0473ebae1732403ee",
      "options": {
        "subject": "You have a notification!",
        "expected_receive_period_in_days": "2",
        "body": "{{body}}",
        "content_type": "text/html"
      },
      "propagate_immediately": true
    },
    {
      "type": "Agents::WebsiteAgent",
      "name": "A_Test - in",
      "disabled": false,
      "guid": "4d2ad8ecda6496a3d70af665543f0d1b",
      "options": {
        "expected_update_period_in_days": "2",
        "url": "http://rockandsock.neocities.org/",
        "type": "html",
        "mode": "on_change",
        "extract": {
          "body": {
            "css": "body",
            "value": "."
          }
        }
      },
      "schedule": "every_12h",
      "keep_events_for": 3600,
      "propagate_immediately": false
    }
  ],
  "links": [
    {
      "source": 1,
      "receiver": 0
    }
  ],
  "control_links": [

  ]
}

The older instance sends emails as expected (with tables)...
Any ideas what the issue could be?

@dsander
Copy link
Collaborator

dsander commented Mar 26, 2016

Hi @irfancharania,

I can confirm the tables are tripped from the HTML but can not remember a change that could have caused it. Do you know which in revision of Huginn it worked?

@irfancharania
Copy link
Contributor Author

What's the easiest way for me to tell what revision an instance is?
The latest entry in changes.md in the working instance is from Oct 17, 2015

@dsander
Copy link
Collaborator

dsander commented Mar 26, 2016

git log | head -n1 should give you the sha of the latest commit of your instance.

@irfancharania
Copy link
Contributor Author

commit efb3bc413613910b69d1381c6f54214ef7168206

@dsander
Copy link
Collaborator

dsander commented Mar 26, 2016

I did some bisecting and found that 05a4c32 changed the behavior. However the rails CHANGELOGs did not seem to mention anything: rails/rails@v4.2.4...v4.2.5.1
The issue might be the upgrade of rails-html-sanitizer rails/rails-html-sanitizer@v1.0.2...v1.0.3 specifically this commit rails/rails-html-sanitizer@297161e, even if it was a bugfix that's pretty big change for a patch release. Probably nobody noticed that it would also affect HTML emails.

I think we need to figure out how the HTML sanitizer was configured before the rails 4.2.5.1 upgrade and override the defaults in our configuration.

/cc @cantino @knu

@irfancharania
Copy link
Contributor Author

I'd guess it's due to the restrictive set of allowable tags/attributes (here)[https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L107-L110]?

Would it be possible to define a more permissive global whitelist in Huginn?

@cantino
Copy link
Member

cantino commented Mar 26, 2016

Based on this article, it looks like you could set Rails::Html::Sanitizer.allowed_tags = Rails::Html::Sanitizer.allowed_tags + Set.new %w(table) in a new initializer in config/initializers and see if it fixes it for you?

@irfancharania
Copy link
Contributor Author

Can you tell me what I'm doing wrong here?

sudo -u huginn -H nano sanitizer.rb

paste in Rails::Html::Sanitizer.allowed_tags = Rails::Html::Sanitizer.allowed_tags + Set.new %w(table)

sudo bundle exec rake production:restart
service nginx restart

sudo bundle exec rake production:check
(in /home/huginn/huginn)
[  OK  ] Everything is fine

That gives me a 502 error, and 'huginn_error.log' just has connection refused entries

@cantino
Copy link
Member

cantino commented Mar 27, 2016

Actually, try ActionView::Base.sanitized_allowed_tags += Set.new(%w[table]) instead.

@irfancharania
Copy link
Contributor Author

That's better!

Silly question, but how do I add more tags?
I've tried %w[table, thead, tbody, tr, th, td] and %w(table, thead, tbody, tr, th, td) but they both only take the last listed tag

@cantino
Copy link
Member

cantino commented Mar 27, 2016

Without the commas. %w(table thead) is Ruby shorthand for ['table', 'thead']

@irfancharania
Copy link
Contributor Author

Also, for future, how would I hunt down the source of that 502 error I was getting before?

@cantino
Copy link
Member

cantino commented Mar 27, 2016

I'd expect to see it in the production.log in log/ in the app directory.

@irfancharania
Copy link
Contributor Author

Your solution works!

So you think we can add it in the main branch?

Here's what I've got: sanitizer.rb

ActionView::Base.sanitized_allowed_tags += Set.new(%w(table thead tbody tr th td))
ActionView::Base.sanitized_allowed_attributes += Set.new(%w(border cellspacing cellpadding valign))

@cantino
Copy link
Member

cantino commented Mar 27, 2016

Seems safe to me. Want to send a PR?

@irfancharania
Copy link
Contributor Author

Sent pull request #1380

@cantino
Copy link
Member

cantino commented Mar 27, 2016

Merged, thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants