New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix XSS in ocamldoc #848

Merged
merged 1 commit into from Dec 7, 2016

Conversation

Projects
None yet
4 participants
@emillon
Contributor

emillon commented Oct 11, 2016

It is possible to craft comments that can expand to unfiltered HTML fragments.

For example, the following program:

(** {{: "><script>alert(1)</script>"} } *)
let n = 0

Would generate a HTML file containing:

<a href=" "><script>alert(1)</script>""> </a>

Using this technique it is possible to leak cookies to a third party.

The fix is not perfect since it does not generate usable documentation, but the generated HTML is harmless.

Note that input like {{: javascript:alert(1)} } will still run arbitrary JS but requires human interaction.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Oct 11, 2016

Member

Did you check the rest of the HTML-producing backends for other issues of this kind? I'm surprised that there would be only one place to fix.

Member

gasche commented Oct 11, 2016

Did you check the rest of the HTML-producing backends for other issues of this kind? I'm surprised that there would be only one place to fix.

@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Oct 11, 2016

Contributor

Another place to fix may be the title generation: it is possible to use file name like <u>Module.ml to inject raw html inside the documentation (however the impossibility to use / in file name decreases the usability of this injection).

Contributor

Octachron commented Oct 11, 2016

Another place to fix may be the title generation: it is possible to use file name like <u>Module.ml to inject raw html inside the documentation (however the impossibility to use / in file name decreases the usability of this injection).

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Oct 11, 2016

Contributor

I found a couple other ways to inject html:

  • (** {%html: ... %} *) will inject html verbatim in the output. That's a supported feature so it's a bigger problem and I think that this should be disabled by default or documented as dangerous.
  • passing -css-style '"><script>alert(1)</script>' will also insert this verbatim. I think that escaping should work there but it's not directly picking it from the .mli file so I would consider that less important.
Contributor

emillon commented Oct 11, 2016

I found a couple other ways to inject html:

  • (** {%html: ... %} *) will inject html verbatim in the output. That's a supported feature so it's a bigger problem and I think that this should be disabled by default or documented as dangerous.
  • passing -css-style '"><script>alert(1)</script>' will also insert this verbatim. I think that escaping should work there but it's not directly picking it from the .mli file so I would consider that less important.
@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Oct 11, 2016

Contributor

What is the threat model here ?

ocamldoc is a tool that is precisely made to generate (possibly arbitrary) html output. If you can't trust the mlis of packages you install then you shouldn't trust their code either. Should we also make the compilers produce void outputs, just in case ?

As it stands there's no subsitute for {%html: if you want to be able to do something as mundane as introducing multimedia elements in your documentation. So unless the ocamldoc language evolves we'll have to live with that (which I personally don't find a problem).

Contributor

dbuenzli commented Oct 11, 2016

What is the threat model here ?

ocamldoc is a tool that is precisely made to generate (possibly arbitrary) html output. If you can't trust the mlis of packages you install then you shouldn't trust their code either. Should we also make the compilers produce void outputs, just in case ?

As it stands there's no subsitute for {%html: if you want to be able to do something as mundane as introducing multimedia elements in your documentation. So unless the ocamldoc language evolves we'll have to live with that (which I personally don't find a problem).

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Oct 11, 2016

Contributor

Hi,

With the current state of ocamldoc it is impossible to host documentation for third-party packages in a safe way. Say, if you want to build documentation for all opam packages à la https://readthedocs.io/. Or if you build documentation for pull requests as part of your CI for example, the existing implementation is not suitable.

It is not my decision to determine if the above should be possible or not, but if ocamldoc output can not be trusted this should be documented (and indeed, the fact that there's {%html: } hints that this is not one of ocamldoc's goals).

Contributor

emillon commented Oct 11, 2016

Hi,

With the current state of ocamldoc it is impossible to host documentation for third-party packages in a safe way. Say, if you want to build documentation for all opam packages à la https://readthedocs.io/. Or if you build documentation for pull requests as part of your CI for example, the existing implementation is not suitable.

It is not my decision to determine if the above should be possible or not, but if ocamldoc output can not be trusted this should be documented (and indeed, the fact that there's {%html: } hints that this is not one of ocamldoc's goals).

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Oct 11, 2016

Member

I think the proposed change of the PR, which escape the %s in <a href="%s">, makes sense from a programming point of view (and not only for security), as we can think of it as a change improving correctness, in particular in the face of URLs that would contain quotes. One could argue similarly that escaping makes sense for the -css-style property. On the other hand {%html ...} is designed for raw HTML injection and I would agree about keeping it this way.

Member

gasche commented Oct 11, 2016

I think the proposed change of the PR, which escape the %s in <a href="%s">, makes sense from a programming point of view (and not only for security), as we can think of it as a change improving correctness, in particular in the face of URLs that would contain quotes. One could argue similarly that escaping makes sense for the -css-style property. On the other hand {%html ...} is designed for raw HTML injection and I would agree about keeping it this way.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Oct 11, 2016

Contributor

@gasche Agreed.

With the current state of ocamldoc it is impossible to host documentation for third-party packages in a safe way.

The proper way of handling this is to run a sanitization policy on the program's outputs. In practice package documentation could also provide pre-generated html bits that sport the same kind of problems, malicious images, ping servers via stylesheets possibly triggering exploits, etc. etc.

but if ocamldoc output can not be trusted this should be documented

I don't see the point. It's not that ocamldoc's output that cannot be trusted it's the input files that can't (or shouldn't). Does every markdown processor out there advertise that it's output cannot be trusted because raw html can be written in markdown files ? I think you are looking at the problem from the wrong point of view.

Contributor

dbuenzli commented Oct 11, 2016

@gasche Agreed.

With the current state of ocamldoc it is impossible to host documentation for third-party packages in a safe way.

The proper way of handling this is to run a sanitization policy on the program's outputs. In practice package documentation could also provide pre-generated html bits that sport the same kind of problems, malicious images, ping servers via stylesheets possibly triggering exploits, etc. etc.

but if ocamldoc output can not be trusted this should be documented

I don't see the point. It's not that ocamldoc's output that cannot be trusted it's the input files that can't (or shouldn't). Does every markdown processor out there advertise that it's output cannot be trusted because raw html can be written in markdown files ? I think you are looking at the problem from the wrong point of view.

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Oct 11, 2016

Contributor

@gasche : agreed too, this looks like the correct solution.

@dbuenzli: I think that there are two kinds of documentation processors. Some are like doxygen and "just" render comments in a structured way, and others like pandoc that are helpers to write arbitrary html that happen to contain documentation.

I assumed that ocamldoc was in the first group, but if it is in the second one I agree with you, there is no point in trying to make it output sanitized HTML.

Contributor

emillon commented Oct 11, 2016

@gasche : agreed too, this looks like the correct solution.

@dbuenzli: I think that there are two kinds of documentation processors. Some are like doxygen and "just" render comments in a structured way, and others like pandoc that are helpers to write arbitrary html that happen to contain documentation.

I assumed that ocamldoc was in the first group, but if it is in the second one I agree with you, there is no point in trying to make it output sanitized HTML.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
Contributor

dbuenzli commented Oct 11, 2016

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Oct 11, 2016

Contributor

I stand corrected :)

Contributor

emillon commented Oct 11, 2016

I stand corrected :)

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Oct 11, 2016

Member

So is there consensus that the only things that should be sanitized are links and css-style, and that we don't both about the latter? @emillon, are you reasonably confident that that there don't exist other parts for which escaping would be justified under the same terms? What about module names as @Octachron suggested? We should not spend too much time on it, but it always feels better to close an issue for good.

Member

gasche commented Oct 11, 2016

So is there consensus that the only things that should be sanitized are links and css-style, and that we don't both about the latter? @emillon, are you reasonably confident that that there don't exist other parts for which escaping would be justified under the same terms? What about module names as @Octachron suggested? We should not spend too much time on it, but it always feels better to close an issue for good.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Oct 11, 2016

Contributor

So is there consensus that the only things that should be sanitized are links and css-style

Well it's not the only thing. In general whatever may end up in the html output as data should be properly escaped in order not to generate invalid documents (except what is in {%html: %} of course). So you may also want to check for e.g. the -title option etc.

Contributor

dbuenzli commented Oct 11, 2016

So is there consensus that the only things that should be sanitized are links and css-style

Well it's not the only thing. In general whatever may end up in the html output as data should be properly escaped in order not to generate invalid documents (except what is in {%html: %} of course). So you may also want to check for e.g. the -title option etc.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Oct 11, 2016

Contributor

and -charset.

Contributor

dbuenzli commented Oct 11, 2016

and -charset.

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Oct 11, 2016

Contributor

Yes, -t and -charset work in the same way as -css-style. Assuming that Naming functions return safe identifiers, there does not seem to be any other places that would have the same problem as links (@see would have the same problem but calls the same function as {{: url} }).

Contributor

emillon commented Oct 11, 2016

Yes, -t and -charset work in the same way as -css-style. Assuming that Naming functions return safe identifiers, there does not seem to be any other places that would have the same problem as links (@see would have the same problem but calls the same function as {{: url} }).

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Oct 11, 2016

Member

Should I merge the PR in its current state, or does anyone wish to do something more?

Member

gasche commented Oct 11, 2016

Should I merge the PR in its current state, or does anyone wish to do something more?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Oct 11, 2016

Member

(In think it would be nice to have a Changes entry to reflect the contributions of Daniel and Florian.)

Member

gasche commented Oct 11, 2016

(In think it would be nice to have a Changes entry to reflect the contributions of Daniel and Florian.)

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Oct 12, 2016

Contributor

I just added a changelog entry and squashed the commits.

Contributor

emillon commented Oct 12, 2016

I just added a changelog entry and squashed the commits.

@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Dec 7, 2016

Contributor

I think merging the PR in its current state is fine.
@emillon, would you have the time to update the Changes file to solve the current conflict?

Contributor

Octachron commented Dec 7, 2016

I think merging the PR in its current state is fine.
@emillon, would you have the time to update the Changes file to solve the current conflict?

@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Dec 7, 2016

Contributor

@Octachron absolutely, just did so. Thanks!

Contributor

emillon commented Dec 7, 2016

@Octachron absolutely, just did so. Thanks!

@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Dec 7, 2016

Contributor

Sorry to bother you, but could you move up your change before gasche change clarify ocamldoc text parsing error messages to respect the MPR/GPR/other order in the changelog?

Contributor

Octachron commented Dec 7, 2016

Sorry to bother you, but could you move up your change before gasche change clarify ocamldoc text parsing error messages to respect the MPR/GPR/other order in the changelog?

Fix XSS in ocamldoc
It is possible to craft comments that can expand to unfiltered HTML
fragments.

For example, the following program:

```ocaml
(** {{: "><script>alert(1)</script>"} } *)
let n = 0
```

Would generate a HTML file containing:

```html
<a href=" "><script>alert(1)</script>""> </a>
```

Using this technique it is possible to leak cookies to a third party.

The fix is not perfect since it does not generate usable documentation,
but the generated HTML is harmless.

Note that input like `{{: javascript:alert(1)} }` will still run
arbitrary JS but requires human interaction.
@emillon

This comment has been minimized.

Show comment
Hide comment
@emillon

emillon Dec 7, 2016

Contributor

No problem! CONTRIBUTING.md mentions ordering PR before GPR but does not say anything about other changes. Would you like a doc patch clarifying this in a separate MR?

Contributor

emillon commented Dec 7, 2016

No problem! CONTRIBUTING.md mentions ordering PR before GPR but does not say anything about other changes. Would you like a doc patch clarifying this in a separate MR?

@Octachron Octachron merged commit 545aaef into ocaml:trunk Dec 7, 2016

1 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@Octachron

This comment has been minimized.

Show comment
Hide comment
@Octachron

Octachron Dec 7, 2016

Contributor

Thanks! I went ahead and merged the pull request (since the CI failure are unrelated).

I agree that updating the CONTRIBUTING.md file would be nice.
If you want to submit a PR , you are very welcome; otherwise I would probably do it in the upcoming days.

Contributor

Octachron commented Dec 7, 2016

Thanks! I went ahead and merged the pull request (since the CI failure are unrelated).

I agree that updating the CONTRIBUTING.md file would be nice.
If you want to submit a PR , you are very welcome; otherwise I would probably do it in the upcoming days.

@emillon emillon deleted the emillon:xss-ocamldoc-link branch Dec 8, 2016

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment