Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

property "innerHTML" stopped working in 0.19 #172

Closed
panthershark opened this issue Aug 27, 2018 · 35 comments
Closed

property "innerHTML" stopped working in 0.19 #172

panthershark opened this issue Aug 27, 2018 · 35 comments

Comments

@panthershark
Copy link

This used to work in elm 0.18 and it doesn't show in the DOM. It looks like a bug

@panthershark panthershark changed the title property "innerHTML" stopped working property "innerHTML" stopped working in 0.19 Aug 27, 2018
@evancz
Copy link
Member

evancz commented Aug 28, 2018

This change was to start ruling out XSS attack vectors in Elm.

If you need help updating some code, folks on slack or discourse can help you find a way to get things going nonetheless!

@evancz evancz closed this as completed Aug 28, 2018
@panthershark
Copy link
Author

panthershark commented Aug 28, 2018

I know y'all made your decision and I know that probably will not change, but I'm still gonna take a moment and explain why I think there should be a way to use this... even if it is behind some weird flag or something emits a warning or whatever..

Elm is great. The flame wars on hacker news are dumb and this is not one of them. Please read this with an open mind and consider the trade-offs made here and how they hurt application developers.

The main reason is this: this does not prevent XSS, just obfuscates this and there is no reasonable workaround for application developers that need to inject formatted content. Here are the options that have been proposed:

  • ports - this just won't work b/c using a port to manually patch the DOM after elm diff/patch is going to cause side effects.
  • iframes - this is an option for some problems, but doesn't work for other scenarios. Also, it is a lot of extra work to make it happen b/c frames must be loaded from the server on a separate url.
  • custom element - this is an option, but it has the same problems as innerHTML. Any app that needs to support safari or ie11, must load giant piles of polyfills to use web components..
  1. these polyfills are larger than our entire elm apps.
  2. these polyfills do not support shadowDOM, only shadyDOM which does not priovide any separation from the main browser context, thus it does not protect from XSS

The assertion that it prevent XSS is at the very least partially untrue. If the goal is prevent package authors from using it, maybe a static analysis tool that must pass before publishing would be preferable to removing a feature of the browser.

Thank you for your time and for making Elm.

@evancz
Copy link
Member

evancz commented Aug 28, 2018

What is your specific scenario? What is the nature of the formatted content? Where is it from? How is it created?

@panthershark
Copy link
Author

There are 2 scenarios currently in mind:

  1. When a server api fails on the server, it formats the error message as html to go with the error code. This can be worked around, but it requires a bunch of elm to take error codes, convert them to error models, then render a template for each error type.

  2. Content is authored in a system and presented in elm. The content comes from that system as html. The workaround here is a bloated custom element or iframe that has to round trip to load the content from a special endpoint.

@panthershark
Copy link
Author

It seems that custom elements are the preferred way to divert the responsibility away from elm, but with the sad state of IE11 and Safari WRT web components, this doesn't isolate the runtime from XSS. It pushes the work to the application developer.

I think I understand where y'all are coming from with the change. With the state of browsers and compatibility, it might just be a little early to make it impossible to inject html.

@evancz
Copy link
Member

evancz commented Aug 28, 2018

Is it true that case (1) can be done by sending JSON over? Maybe with { code : Int, message : String } or something a bit more elaborate? How elaborate would it be? Would the resulting code be harder or easier to maintain than having the server produce raw HTML? I'd imagine the current risk is that changes on the frontend require changes on the backend as well, but it may be pretty non-obvious.

Can you say more about case (2) though? What is the nature of this content? Are you saying that it is produced in some other context that can only produce HTML? What parts of HTML does it produce? Arbitrary tags, or some subset for documents?

@panthershark
Copy link
Author

panthershark commented Aug 28, 2018

Scenario 1 triggers a backend refactor when upgrading to elm 0.19. While this is extra work for our small team, our architecture allows for reasonable mitigation.

Scenario 2 is a little more problematic. Here are 2 situations:

  • In our community, posts are created with a RTE. The content is html and the posts are rendered.

  • Document management system - Users upload documents in word, pdf, etc. An offline process converts them to html previews (open office, pdf-to-html utils, etc) so previews can be made available on devices (pdfs are not iOS friendly for inlining). The custom element deficiency is really annoying here and this has been mitigated with iframes.

We can go through all of my use cases, but it seems that we could also look at the proposed alternatives and decide whether those are acceptable solutions, specifically citing the deficiency in safari and ie11.

While I'm not sure I totally agree, I would accept the custom element as a viable solution if the browsers supported it natively.

@panthershark
Copy link
Author

This came from a chat in slack earlier about this:

My position is that this solution is akin to hunting a deer with a bazooka. Sure, you get the deer, but you also get a bunch of other stuff .
application developers are the collateral damage.
a rifle with a scope would just ensure XSS is not in packages, but keep compatibility with the underlying platform.

@evancz
Copy link
Member

evancz commented Aug 29, 2018

Thank you for sharing more information about your situation. In our 2 month period with the community pre-release, I do not recall anyone raising a situation like this. It is very hard to anticipate scenarios that neither your nor anyone you know has ever done, so again, I appreciate you explaining things in more detail.

For future reference, the slack message you shared is not the kind of evidence that typically goes into a decision. It seems like it is an emotionally satisfying way to vent the root frustration, but ultimately, we try to make decisions based on concrete technical considerations. If that person also has a situation like yours, it would be great to hear that example as well. Again, we did not know this sort of scenario existed for companies using Elm.

I will try to find you on Slack to discuss a bit further. In the meantime, if you find other people in the same scenario, please encourage them to share their technical details here!

@panthershark
Copy link
Author

There is nothing emotionally satisfying about the slack situation. We are all respectfully working through our points of view and this is healthy.

We are a 25 person company with 6 engineers (frontend, backend, devops). We build things very quickly and we have an aggressive product roadmap. Elm has been so good for us as a team as a result reduced our team size by 4 front end engineers (from 9 down to 5) and we actually got way better at building things. The unfortunate side effect of this is that we haven't been able to spin up Elm 0.19 ahead of release.

While my position gives me the authority redirect our resources to address the new limitation, this might cause a lot of friction for others as they would have to convince managers or others..

We love Elm and will continue to be a stanch advocate. Again, thank you for making a great language / platform.

Here is another thread about innerHTML if anyone else wants to follow along there as well. elm/virtual-dom#131

@evancz
Copy link
Member

evancz commented Aug 29, 2018

I was thinking a bit about the ports idea. You ruled it out in your comment, but I think it is actually a viable path. Here is a rough brainstorm on that route:

type alias Model =
  { previews : Dict String String  -- key is an ID and value is the HTML string
  }

port setPreviews : List (String, String) -> Cmd msg

viewPreview : String -> String -> Html msg
viewPreview key html =
  div [ id key, attribute "data-html" html ] []

And then in JavaScript having something like:

var app = Elm.Main.init();
app.ports.setPreviews.subscribe(function(previews) {
  for (var i = 0; i < previews.length; i++) {
    var node = document.getElementById(previews[i][0]);
    node.innerHTML = previews[i][1];
  }
});

Now I am not 100% sure what virtual DOM will do when people use innerHTML at different times. (It was not written with that as a feature, and I am surprised that what you seem to be describing works in the first place.) But if diffing is causing problems, it seems possible to do a trick like using Html.Keyed.node "div" [] [ ( String.concat (Dict.keys previews), viewPreviews previews ) ] to make sure things get deleted whenever there is a change to the content.

I understand that this is not the ideal path for you, but on the other side of things, we do not want packages out there that insert <script> tags under the guise of a date picker or a credit card entry view. I also understand that in npm there are no protections against that sort of thing and AFAIK there has been no high-profile use of this particular exploit. They pay the cost of having a "reporting" infrastructure where people are doing a bunch of things by hand, and for the money we have to spend on Elm, I don't think it makes sense to use it in that way. I need to balance a bunch of competing concerns like this, so given that we have one concrete example so far, I think it makes sense to pursue this seemingly tractable workaround for now.

@evancz
Copy link
Member

evancz commented Aug 29, 2018

Yeah, based on saying it all out loud, I think you could do something like this:

type alias Model = { previews : List Preview }

type alias Preview = { id : String, html : String }

port resetPreviews : () -> Cmd msg

view : Model -> Html msg
view model =
  div []
    [ ...
    , Html.Keyed.node "div" []
        [ ( String.concat (List.map Tuple.first model.previews)
          , viewPreviews model.previews
          )
        ]
    ]

viewPreviews : List Preview -> Html msg
viewPreviews previews =
  div [ id "previews" ] (List.map viewPreview previews)

viewPreview : Preview -> Html msg
viewPreview preview =
  div [ id preview.id, attribute "data-html" preview.html ] []

Now on the JavaScript side you need to handle resetPreviews messages at some interval.

var app = Elm.Main.init();
app.ports.resetPreviews.subscribe(function() {
    var kids = document.getElementById('previews').childNodes;
    for (var i = 0; i < kids.length; i++) {
      kids[i].innerHTML = kids[i].getAttribute('data-html');
    }
  }
});

I'm not 100% all the code here will work as stated, but I think this is closer than the last idea.

@panthershark
Copy link
Author

panthershark commented Aug 29, 2018

Sure, implementing a thunk with a port is an option. I assume this port needs to be called on every update.. yeah?

Also, does Elm guarantee the port run immediately after DOM patch?

Also, what about using requestAnimationFrame? I'm sure elm is using it internally and what side effects come out of it?

@evancz
Copy link
Member

evancz commented Aug 29, 2018

Yeah, definitely one downside of this idea is that you'll need to call resetPreviews from update whenever previews changes. I assumed the data would be loaded asynchronously, so there would be a natural spot to add that command. That definitely is a spot for mistakes, but I set things up with Html.Keyed so that you'll get a pretty obvious failure rather than something subtle and weird.

And about timing, Elm hooks into requestAnimationFrame by default for almost all view calls. With 0.19 it is possible to get synchronous renders on user inputs as well, solving some other issue about race conditions when typing really fast. Anyway, this is a nice default for performance, but it requires some extra care when using ports to mess with the DOM. So the actual reliable JS code may look more like this:

var app = Elm.Main.init();
app.ports.resetPreviews.subscribe(function() {
  var node = document.getElementById('previews');
  node
    ? resetPreviews(node)
    : requestAnimationFrame(resetPreviews);
});

function resetPreviews(node) {
  var node = node || document.getElementById('previews');
  if (!node) return;
  var kids = node.childNodes;
  for (var i = 0; i < kids.length; i++) {
    kids[i].innerHTML = kids[i].getAttribute('data-html');
  }
}

I think you may be able to cut some of this depending the particulars of your scenario, which I do not know well enough to say. I also wrote this such that you call document.getElementById as little as possible. It could be simpler if resetPreviews just called itself, gated by rAF but that felt worse to me.

@evancz
Copy link
Member

evancz commented Aug 29, 2018

We discussed a bit in chat, and I wanted to record that here as well.

One of the things that came up was paths that would allow innerHTML for application developers, but rule it out in packages. The goal there is that package.elm-lang.org can still guarantee that a date picker or credit card package does not insert <script> tags, but an application developer can do things if they really really need to.

There are two potential designs that came to mind for that idea:

  1. Just allow property "innerHTML" and detect it if it gets into packages. The trouble is that string may be constructed at runtime, possibly based on user input. Maybe folks encode things with UTF-8 code-points and convert to string later. Etc. So detecting this statically is not technically feasible to my knowledge.
  2. Have an explicit Html.Attributes.innerHTML for this case. When a package is published, just check that it does not exist in the code. The trouble is that code actually does not go through the package website right now. We would like to do that someday (to guarantee packages compile and their tests pass) but it is a quite large infrastructure project that would definitely take a notable number of weeks or months. And even then, it is not just a grep for innerHTML because that is a totally valid name in Elm. We would need a semantic understanding of import Html.Attributes exposing (innerHTML) or import Html.Attributes as A to figure it out precisely.

So the second design is technically possible, but it has two major downsides. (1) The time spent on that path will be at the expense of working on other things in the ecosystem. I'd like to focus on various web platform packages for example, and I am under a lot of pressure to do so. Other folks are volunteer workers, and an infrastructure project like this will take them longer and compete with the work they are already doing. The bigger picture infrastructure changes are desirable, but we'd like to integrate it with testing and it may block on some other things. (2) There are a significant number of people who do not want innerHTML available in application code. In a team of 10 or 20, it is nice to be confident that no summer intern can accidentally introduce an XSS vector by accident. In a team of two, it is nice that by default this potential security issue is not a problem. You have to go out of your way, into JS, to actually open the possibility. From there, you know exactly where the risks are, making it easier to have a "sanitize everything" function to be extra safe.

So there are other paths here, but in trying to balance all the competing concerns on design and prioritization, I think the path with ports outlined in this thread is the best overall, though clearly it is not ideal for your situation.

If you want to talk anything through, please reach out again. I know Richard and Luke have more experience with customElements when that path gets more viable on different platforms. And thank you for talking through all this!

@panthershark
Copy link
Author

When Safari and IE11 finally get their act together with custom elements, that is a much better solution. The polyfill is larger than most elm apps. If you can afford the polyfill, it seems much nicer than the ports option.

It is really nice that the 0.19 rendering is predictable now. My initial concern with ports as a viable option was the side effects of async rendering.

@evancz - thanks for the conversation and closing the loop here. :)

@wildlyinaccurate
Copy link

wildlyinaccurate commented Aug 31, 2018

Hey @evancz, I just wanted to chime in and say that I wouldn't mind spending some time to implement the second potential design that you mentioned (have an explicit Html.Attributes.innerHTML). This would be much more preferable (for me) than requiring developers to jump through unnecessary hoops like using ports or creating custom elements. After all, the "summer intern" that you mentioned will just search up how to set innerHTML in Elm and end up coding the same vulnerability in a much more convoluted way that is more difficult to audit.

I haven't read much of the Elm builder source, but I'm assuming it would be relatively trivial to generate and traverse a full AST of any package during elm publish and evaluate the import statements?

@evancz
Copy link
Member

evancz commented Sep 1, 2018

Adding explicit support for innerHTML is potentially technically feasible, but:

  1. It was a mistake in the first place. The fact that it currently works is somewhat baffling to me. I did not think my virtual DOM implementation could handle nodes like that. If I had a chance to do it again, we would not be in a position where folks were depending on this.
  2. I do not think it is the right long-term move. Ultimately, adding innerHTML support means an XSS attack vector is now upgraded from mistake to feature. It also encourages people to work with unstructured HTML in other parts of their codebase. I think using structured formats will be better in the long run, even though it has a real cost to certain users at this time.
  3. I don't think it's a good use of resources in the short term. This project is to have a server check packages, so we do not have to trust people who can spoof information. And if we have someone spending time setting up a server to do analysis of packages before publication, I think it'd be better to have them focus on "does it compile?" and "do the tests pass?" and "is there a UI to show the work queue?" In other words, that project is really valuable on its own, and I think it'd be a mistake if the core design goals of that project were overshadowed by a secondary project that aside from the positives, also makes XSS attacks more likely. This project also requires some rather coordination with the package website. As far as I can tell, it's not a "hack it together in a weekend" kind of infrastructure project.

I understand the argument that the intern could still add an XSS vulnerability. I think one of the valuable parts of Elm is that "In Elm, you get certain guarantees." If you go out to JS, all of that goes out the window. I think it would be a pretty nice guarantee that "Elm code does not have XSS vulnerabilities via HTML" and worth working towards. If your application is 85% or 90% or 95% Elm code, you have very very dramatically reduced the code that needs security audits. I think that is a big deal.

I get that people are not happy with the decision here, but even if we can find three or four examples, it is a really big thing to give up for those cases.

@wildlyinaccurate
Copy link

Thanks for (another!) really thorough reply, @evancz. I can't fault your reasoning at all, and I agree with all of your points.

@exanup
Copy link

exanup commented Dec 9, 2018

The faq page at http://faq.elm-community.org/#how-can-i-output-literal-html-and-avoid-escaping-of-entities is out of date. It would have saved about 1 hour of me tinkering around if that page contained the link to here.

@psyCodelist
Copy link

psyCodelist commented Dec 20, 2018

For those who have the similar issue, we had the same issue with innerHTML property and we used Html.Parser and Html.Parser.Util to parse strings into elm nodes. works fine for us. give it a try.
https://package.elm-lang.org/packages/hecrj/html-parser/latest/

@psyCodelist
Copy link

let
    nodes =
        case Html.Parser.run item.calculation.htmlContent of
            Ok parsedNodes ->
                Html.Parser.Util.toVirtualDom parsedNodes
            _ ->
                []
in
div [ class "formula", dir "ltr" ] nodes

@berenddeboer
Copy link

In the end, people have backends where content editors type in HTML. That needs to be displayed as HTML. Very common scenario. Elm cannot give guarantees here either way, either it goes to ports or it goes to the example just above. So with 0.19 it's simply more work. Up to a lot more work.

The problem with @psyCodelist's example is that you cannot force the content editor to type in perfect html. Which is what is required I'm afraid.

How this is normally handled is that the output from the backend goes through some escaping mechanism. Because sometimes the script tag is ok, i.e. every kind of HTML is OK, and sometimes it isn't. At Elm's level you usually cannot know. It's the backend that knows who entered a piece of content, and the privileges that editor has, i.e. how much output escaping needs to be done.

So in my opinion Elm simply needs to accept random HTML, it's up the backend to give the appropriate HTML.

@panthershark
Copy link
Author

I've got a solution that is nice and balances both worlds. I'll post the solution this weekend.

@fredcy
Copy link

fredcy commented Apr 29, 2019

The faq page at http://faq.elm-community.org/#how-can-i-output-literal-html-and-avoid-escaping-of-entities is out of date. It would have saved about 1 hour of me tinkering around if that page contained the link to here.

Any yet no one opened an issue on that FAQ item until a few days ago.

@panthershark
Copy link
Author

@berenddeboer - here is a gist that we use. In our case, this is really nice b/c we need to apply functions on the FE to a.href for the following use case:

  • wrap urls in literal html with a proxy url (hey user, you are about to leave our website)

There are a lot of cases where you might want process the HTML and this gives you a place to do this.
https://gist.github.com/doanythingfordethklok/d6e4fee5b5d07ee500683cd989ae69a8

@berenddeboer
Copy link

here is a gist that we use.

It seems this is basically the @psyCodelist solution right? I.e. parse html (so has to be valid), then insert.

@techninja1008
Copy link

@evancz I totally understand the idea of anything being written in elm having certain guarantees, and I certainly agree with it (to a degree).

With that being said, I also thoroughly agree with the side of things where you already have suitably sanitised HTML that you wish to render to the page. In that case, I'd like to propose to possible solutions that get the best of both worlds:

  1. An Elm.Dangerous (or similar) module which provides access to features like a dangerous innerHtml property. Any use of this module would immediately drop any of the Elm guarantees. Not only that, but it only takes a simple grep to find if it's in use by a particular file. This can be used to easily make sure that either packages can't be uploaded with it, or files from a package won't be compiled if they contain it.
  2. Add, or make it possible to turn on via a flag-like interface, the ability to hook into the VirtualDom render sequence from JS in a more predictable manner. This would allow for less clunky and more predictable solutions based on ports, as seen previously in this post, whilst keeping the strict JS/Elm separation.

Overall, I really like the direction that the language of going, and I can tell that it's design is in very responsible hands. Dangerously setting HTML is definitely a use case that I believe Elm should protect against, however there is clearly demand for it, and I think it's a good idea to make it possible in some sort of predictable and widely supportable manner, but not necessarily from directly inside of Elm.

@panthershark
Copy link
Author

Imho, dangerouslySetWhatever was not a good idea in react nor will it be good here. If there is safe sanitized valid html, then the libs above will convert it into vdom in a single function call so it doesn't seem to save anything.

@wildlyinaccurate
Copy link

Imho, dangerouslySetWhatever was not a good idea in react nor will it be good here. If there is safe sanitized valid html, then the libs above will convert it into vdom in a single function call so it doesn't seem to save anything.

I take issue with this "you can parse an HTML string into a VDOM object so there's no need for innerHTML" rhetoric. We shouldn't be shipping an entire HTML parser down to end users when the browser already has its own parser. And referring to said parser as "a single function call" is grossly ignorant of the CPU time required to parse many HTML strings in quick succession. Penalizing users (and developers) with this unnecessary performance overhead just to replicate something the browser already does an order of magnitude more efficiently isn't great.

@panthershark
Copy link
Author

Right. It can be parsed once on load, converted to vdom nodes, then re-used. For cases where this is truly unteneble, web components or ports are great options that don't open up attack vectors to everyone who consumes the language.

The browser already provided a dangerousSetHtml functionality. It is called innerHTML and it is still available via ports.

If you read through these threads, you'll see that originally I agreed with you. After spending some time truly thinking through the trade offs, it became clear that I was wrong.

@berenddeboer
Copy link

If there is safe sanitized valid html,

What is sanitised? A the Elm level you can't know that. If the backend has authenticated a user to emit certain html, that's what is considered sanitised.

At the Elm level you often don't know what permissions the user has, i.e. may they insert scripts? Or just use em and strong? And you don't want to repeat those permissions in Elm either.

@panthershark
Copy link
Author

suitably sanitised HTML

Apologies for misquoting earlier.

@roebuk
Copy link

roebuk commented Jul 3, 2019

I just wanted to say thanks to @evancz for providing a workaround and explaining why innerHTML is no longer supported in 0.19. His solution provided the basis of my solution.

@AdrianRibao
Copy link

It seems that pantershark changed his github's name and the link to the gist has changed, I think the correct one is this one:

https://gist.github.com/panthershark/d6e4fee5b5d07ee500683cd989ae69a8

Pamplemousse added a commit to Pamplemousse/xaviermaso.com that referenced this issue May 23, 2020
  * updating `innerHTML` not working since elm@0.19.0:
  elm/html#172
  * used proper HTML parsing from string to generate nodes
  * correct unparsable description
Pamplemousse added a commit to Pamplemousse/xaviermaso.com that referenced this issue May 23, 2020
  * updating `innerHTML` not working since elm@0.19.0:
  elm/html#172
  * used proper HTML parsing from string to generate nodes
  * correct unparsable description
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants