formatResult and formatSelection should escape #375

Closed
wants to merge 1 commit into
from

Projects

None yet

5 participants

I have followed a number of issues revolving around escapeMarkup that seem to recur, and I
believe the solutions you are choosing are good, but don't get at the root cause of the problem.
I believe the root of the problem is that it is used in the wrong place.

If formatSelection and formatResult are expected to return HTML, then you should not be calling
any escaping function on their output. Instead, you should expect that the job of "formatting" is to convert text into HTML, which includes escaping.

This means that your impl of formatSelection and formatResult should call escapeMarkup, and that escapeMarkup needs to be robust enough to sanitize any string (it shouldn't expect pre-escaped text).

NOTE: This is not yet tested. Might not even run. I'll take a look in a bit, but unless you're expecting to do a new release really soon, my priority is to fix up the version I have now (3.1).

hitsthings formatResult and formatSelection should escape
Don't call escapeMarkup on the result of formatSelection and formatResult.

Instead, modify the default formatSeleciton and formatResult to escape their text inputs.
That is: if the point of formatSelection and formatResult is to take in text and return HTML,
they should properly escape their text.

This modification makes the escapeMarkup function more robust (actually escapes HTML, not just &) which makes
it work for remote data (which it didn't before because it would have allowed through < and > characters).
1817629
Contributor

If formatSelection and formatResult are expected to return HTML, then you should not be calling
any escaping function on their output. Instead, you should expect that the job of "formatting" is to
convert text into HTML, which includes escaping.

not exactly correct. the job of renderers such as formatResult is to combine server-side data with an html template to produce the final html. you make the assumption that server-side data is always text, which i do not think is safe to make.

This means that your impl of formatSelection and formatResult should call escapeMarkup,
and that escapeMarkup needs to be robust enough to sanitize any string (it shouldn't expect
pre-escaped text).

this puts the burden of performing the escaping on everyone who overrides a renderer. it also introduces a barrier of knowledge: you have to know that you have to escape markup and you have to know how to escape it properly.

what i like about the scheme as it stands right now is that it is flexible:

  • if you do not want to escape the markup replace escapeMarkup with a noop
  • if you want to change how markup is escaped, you can do it in one place
  • if you want to implement your scheme: replace the default with a noop and use your own escaping in renderers
@ivaynberg ivaynberg closed this Sep 5, 2012

not exactly correct. the job of renderers such as formatResult is to combine server-side data with an html template to produce the final html. you make the assumption that server-side data is always text, which i do not think is safe to make.

Actually I think it's the only safe assumption to make. You could assume that servide-side data is always HTML and I would consider that reasonable. But you currently you are making the assumption that server-side data is "HTML, where everything is escaped that needs to be, but magically doesn't have any HTML entities in it". I say this is your assumption because:

  1. You don't actually escape anything before inserting it as HTML. If I pass through "<script>alert(1)</script>" as the data's text, then you happily execute the script. Therefore, you are trusting that the data is HTML.
  2. Then you deliberately destroy HTML entities by turning "&" into "&" This means if I wanted to display the text "<script>alert(1)</script>" it would be impossible. "&lt;script&gt;alert(1)&lt;/script&gt;", which is properly escaped HTML, becomes incorrectly double escaped.

This halfway-there assumption is the worst option.

this puts the burden of performing the escaping on everyone who overrides a renderer. it also introduces a barrier of knowledge: you have to know that you have to escape markup and you have to know how to escape it properly.

As stated in point #1, this knowledge is already required, and the burden is already on the formatX implementer to escape things. <script>alert(1)</script> will still get executed.

if you do not want to escape the markup replace escapeMarkup with a noop

The downside to this is that what you are doing is GOOD for one specific case: when pulling text from elements. Since you are calling .text(), you are implicitly unescaping the HTML that was in the document. That escaping needs to happen. And your current usage is actually inadequate already (try typing <script>alert(1)</script> into the Auto-tokenization box on your demo page). This is why I made escapeMarkup more robust - you need to escape more than just &.

if you want to change how markup is escaped, you can do it in one place

You can still do that with this approach. escapeMarkup is still an option and can be used in formatX functions.

if you want to implement your scheme: replace the default with a noop and use your own escaping in renderers

Yup, I'll have to set a more robust default escapeMarkup impl to handle the data-from-markup case, and then whenever I override formatSelection and formatResult, I'll have to also override escapeMarkup on that instance to just be a noop. Doing the right thing should be simpler than that.

Contributor

the difference in thinking is that i am only trying to sanitize values coming from the server side, while you are trying to cover the full range. this is why, for example, the default escape markup only escapes the ampersand - when calling text() markup is already escaped so there is no reason to escape the less than sign because it is already escaped as ampersand gt.

i need to think about this some more. if an all-encompassing approach to escaping markup is needed then your approach works better. however, it is too big a change to make in the 3.x version because it will break some uses.

polish the code and lets revisit this in the 4.0 timeframe. i want to be done with 3.x soon anyways, maybe one more release.

@ivaynberg ivaynberg reopened this Sep 6, 2012

Please see my response to issue #290, it works for me in production and may be adequate for the remainder of 3.x: #290 (comment)

I am evaluating select2 to migrate from chosen. One issue I found bit tricky is to handle the XSS data.

@dickensian I tried your double encoding solution. Initially it was looking good, but then I found few issues with it;

  1. We cannot search on an encoded item, eg: a single quote ' or <
  2. If I type in &, it shows all the elements which has an encoded value

I cannot add attachments here, so I am pointing to our issue tracking system See issue #12

See attachments,

list with single quote and xss

single quote does NOT match

ampersand does match

I am quite new to select2; So what I understand looking into this XSS handling issue, escapeMarkup is not hooked in the right place.

Eg: if I plugin my escapeMarkup method which encodes html element, I see that the list shows values prefixed by <span class="select2-match"></span>

escapeMarkup:function (markup) {
    return $j("<div/>").text(markup).html();
}

What is available to escapeMarkup function cannot be html encoded or escaped; It should have done few steps before. It looks to me that formatResult could be overridden to achieve the same but then it beats the purpose of escapeMarkup method.

Contributor

this should be fixed in master.

@ivaynberg ivaynberg closed this Feb 10, 2013

Awesome! Thanks a ton! The fix looks good. I had thought formatSelection
would also need to escape the input text, but I probably am missing
something. I'll definitely check this out. Thanks again!
On 10/02/2013 4:46 PM, "Igor Vaynberg" notifications@github.com wrote:

this should be fixed in master.


Reply to this email directly or view it on GitHubhttps://github.com/ivaynberg/select2/pull/375#issuecomment-13344826..

Thanks for the update. Our migration to select2 is on hold; will let you guys know once I get a chance to look at it.

@asifabdul thanks for the reminder!

Also make sure you convert the markup into a string or you will get an error like Object has no html() method
I'm using a custom formatResult method that grabs data- attributes from the <option> elements.

Example:

    var _formatResult = function(nodeData, elem){

        var $po = $(nodeData.element),
            productName = $.trim($po.data('productName')),
            html = '';

        // Escape the 'productName' for JavaScript XSS.
        html += '<span class="product productName">' + $('<div/>').text(productName).html() + '</span>';

        return html;        

    };
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment