Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve escape_html_attr performance #4024

Merged
merged 1 commit into from
Feb 21, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
58 changes: 24 additions & 34 deletions packages/kit/src/utils/escape.js
Original file line number Diff line number Diff line change
Expand Up @@ -24,39 +24,6 @@ export function escape_json_in_html(val) {
);
}

/**
* @param str {string} string to escape
* @param dict {Record<string, string>} dictionary of character replacements
* @param unicode_encoder {function(number): string} encoder to use for high unicode characters
* @returns {string}
*/
function escape(str, dict, unicode_encoder) {
let result = '';

for (let i = 0; i < str.length; i += 1) {
const char = str.charAt(i);
const code = char.charCodeAt(0);

if (char in dict) {
result += dict[char];
} else if (code >= 0xd800 && code <= 0xdfff) {
const next = str.charCodeAt(i + 1);

// If this is the beginning of a [high, low] surrogate pair,
// add the next two characters, otherwise escape
if (code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
result += char + str[++i];
} else {
result += unicode_encoder(code);
}
} else {
result += char;
}
}

return result;
}

/**
* When inside a double-quoted attribute value, only `&` and `"` hold special meaning.
* @see https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state
Expand All @@ -67,6 +34,20 @@ const escape_html_attr_dict = {
'"': '&quot;'
};

const escape_html_attr_regex = new RegExp(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment about why this is needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete < and >
they break HTML if not escaped
we had issue #3773

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we fixed it with #3798 and #3804

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PH4NTOMiki That was about escape_json_in_html, this is about escape_html_attr.

@benmccann Sorry about the comments, the previous code didn't have them and I thought they weren't necessary. I'll add some.

// special characters
`[${Object.keys(escape_html_attr_dict).join('')}]|` +
// high surrogate without paired low surrogate
'[\\ud800-\\udbff](?![\\udc00-\\udfff])|' +
// a valid surrogate pair, the only match with 2 code units
// we match it so that we can match unpaired low surrogates in the same pass
// TODO: use lookbehind assertions once they are widely supported: (?<![\ud800-udbff])[\udc00-\udfff]
'[\\ud800-\\udbff][\\udc00-\\udfff]|' +
// unpaired low surrogate (see previous match)
'[\\udc00-\\udfff]',
'g'
);

/**
* Formats a string to be used as an attribute's value in raw HTML.
*
Expand All @@ -78,5 +59,14 @@ const escape_html_attr_dict = {
* @example const html = `<tag data-value=${escape_html_attr('value')}>...</tag>`;
*/
export function escape_html_attr(str) {
return '"' + escape(str, escape_html_attr_dict, (code) => `&#${code};`) + '"';
const escaped_str = str.replace(escape_html_attr_regex, (match) => {
if (match.length === 2) {
// valid surrogate pair
return match;
}

return escape_html_attr_dict[match] ?? `&#${match.charCodeAt(0)};`;
});

return `"${escaped_str}"`;
}