keithasaurus
Contributor

@keithasaurus keithasaurus commented Oct 11, 2025

On my machine, the benchmarking code below shows approximately a 1.2x speedup in the default case of escape(s, quote=True). The quote=False case remains more or less unchanged. The underlying logic in the function should be unaltered.

Here's the code I ran to benchmark:

import time, dis

def escape_old(s, quote=True):
    """
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true (the default), the quotation mark
    characters, both double quote (") and single quote (') characters are also
    translated.
    """
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace('\'', "&#x27;")
    return s

# identical to the code in the commit
def escape(s, quote=True):
    """
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true (the default), the quotation mark
    characters, both double quote (") and single quote (') characters are also
    translated.
    """
    s = (
        s.replace("&", "&amp;") # Must be done first!
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
    if quote:
        return s.replace('"', "&quot;").replace('\'', "&#x27;")
    return s

test_html_strings = [
    "Hello, world!",
    "1234567890",
    "Simple text with spaces",

    "Hello, &world!",
    "12345<67890",
    "Simple text with spaces&",

    "<b>bold</b>",
    "<i>italic</i>",
    "<u>underline</u>",
    "<p>paragraph</p>",
    "<div>content</div>",

    "<div><script>alert('x')</script></div>",
    "<a href='https://example.com'>link</a>",
    "<img src='x' onerror='alert(1)'>",
    "<b><i>nested</i> bold</b>",
]


for quote in True, False:
    for fn in escape_old, escape:
        start = time.time()
        for _ in range(10000):
            for s in test_html_strings:
                fn(s, quote)  # call the function under test
        end = time.time()
        print(fn.__name__, f"quote={quote}")
        print(end - start)


# uncomment to compare bytecode (dis.dis prints the listing itself)
# dis.dis(escape_old)
#
# dis.dis(escape)

The output on my mac is as follows:
[Screenshot of the benchmark output, 2025-10-10 11:17:54 PM]

The disassembled bytecode after the change is 6 instructions shorter.
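
The instruction-count claim can be checked without reading the full listing. The following is a minimal sketch, assuming the two function bodies from the benchmark above; the helper name instruction_count is hypothetical, and the exact counts depend on the Python version, so none are stated here.

```python
import dis


def escape_old(s, quote=True):
    s = s.replace("&", "&amp;")  # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


def escape_new(s, quote=True):
    s = (
        s.replace("&", "&amp;")  # Must be done first!
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
    if quote:
        return s.replace('"', "&quot;").replace("'", "&#x27;")
    return s


def instruction_count(fn):
    """Hypothetical helper: number of bytecode instructions dis reports for fn."""
    return sum(1 for _ in dis.get_instructions(fn))


if __name__ == "__main__":
    for fn in (escape_old, escape_new):
        print(fn.__name__, instruction_count(fn))
```

The difference comes from the intermediate store/load pairs on s that the chained version avoids; the .replace calls themselves are the same in both.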

@python-cla-bot

python-cla-bot bot commented Oct 11, 2025

All commit authors signed the Contributor License Agreement.

@StanFromIreland
Member

StanFromIreland commented Oct 11, 2025

IIUC your screenshot, the proposed version is actually slower when quote=False.

@StanFromIreland
Member

I ran the benchmark you provided locally and increased the number of runs to 50. The results:

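+-------------+------------+----------+---------------+
|             | escape_old | escape   | diff          |
+=============+============+==========+===============+
| quote=True  | 0.116973   | 0.119703 | 2.33% slower  |
+-------------+------------+----------+---------------+
| quote=False | 0.097689   | 0.085042 | 12.95% faster |
+-------------+------------+----------+---------------+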

@mpkocher
Contributor

It would be useful to add a wider range of string sizes.

Also, is there a reason to not use str.translate?

D = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
DQ = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", '\'': "&#x27;"}
TD = str.maketrans(D)
TDQ = str.maketrans(DQ)

def escape(sx:str, quote=True) -> str:
    t = TDQ if quote else TD
    return sx.translate(t)

For a string of length N, the current escape scans the string 3 to 5 times (3N to 5N work, depending on quote), whereas translate always makes a single pass (N).

For small N, the choice of translate vs replace won't really matter.
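
Before comparing speed, it is worth confirming that a translate-based version returns the same output as the standard library. This is a minimal sketch under that goal only; the names _TABLE, _TABLE_QUOTE, escape_translate and agrees_with_stdlib are hypothetical and nothing here measures performance.

```python
import html

# Translation tables built once at import time.
_TABLE = str.maketrans({"&": "&amp;", "<": "&lt;", ">": "&gt;"})
_TABLE_QUOTE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
})


def escape_translate(s, quote=True):
    """Escape s in a single str.translate pass."""
    return s.translate(_TABLE_QUOTE if quote else _TABLE)


def agrees_with_stdlib(samples):
    """True if escape_translate matches html.escape for every sample and flag."""
    return all(
        escape_translate(s, quote) == html.escape(s, quote)
        for s in samples
        for quote in (True, False)
    )


if __name__ == "__main__":
    print(agrees_with_stdlib(["Hello, &world!", "<a href='x'>\"q\"</a>", ""]))
```

Because translate maps each input character exactly once, the "&" inside an emitted "&lt;" is never re-escaped, so no ordering constraint like "& must be done first" is needed.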

@picnixz
Member

picnixz commented Oct 11, 2025

The micro-benchmarks are less pronounced:

+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| Benchmark                                                   | escape-ref | escape-new            | escape-alt            |
+=============================================================+============+=======================+=======================+
| escape[Hello, world!]                                       | 103 ns     | 98.9 ns: 1.04x faster | 107 ns: 1.03x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Hello, world!, quote=False]                          | 73.5 ns    | 70.5 ns: 1.04x faster | 97.5 ns: 1.33x slower |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[1234567890]                                          | 102 ns     | 97.3 ns: 1.05x faster | not significant       |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[1234567890, quote=False]                             | 72.2 ns    | 70.6 ns: 1.02x faster | 99.4 ns: 1.38x slower |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Simple text with spaces]                             | 116 ns     | 107 ns: 1.09x faster  | 125 ns: 1.07x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Simple text with spaces, quote=False]                | 79.3 ns    | 76.3 ns: 1.04x faster | 124 ns: 1.56x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Hello, &world!]                                      | 130 ns     | 123 ns: 1.05x faster  | 161 ns: 1.24x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Hello, &world!, quote=False]                         | 93.8 ns    | not significant       | 153 ns: 1.63x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[12345<67890]                                         | 124 ns     | 118 ns: 1.05x faster  | 142 ns: 1.14x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[12345<67890, quote=False]                            | 92.6 ns    | 90.3 ns: 1.03x faster | 137 ns: 1.48x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Simple text with spaces&]                            | 132 ns     | 126 ns: 1.05x faster  | 151 ns: 1.15x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[Simple text with spaces&, quote=False]               | 96.9 ns    | not significant       | 142 ns: 1.47x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<b>bold</b>]                                         | 154 ns     | not significant       | 173 ns: 1.13x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<b>bold</b>, quote=False]                            | 119 ns     | not significant       | 168 ns: 1.41x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<i>italic</i>]                                       | 155 ns     | 151 ns: 1.02x faster  | 183 ns: 1.18x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<i>italic</i>, quote=False]                          | 125 ns     | 122 ns: 1.03x faster  | 179 ns: 1.43x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<u>underline</u>]                                    | 158 ns     | 154 ns: 1.03x faster  | 196 ns: 1.24x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<u>underline</u>, quote=False]                       | 123 ns     | not significant       | 198 ns: 1.62x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<p>paragraph</p>]                                    | 156 ns     | 152 ns: 1.03x faster  | 207 ns: 1.32x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<p>paragraph</p>, quote=False]                       | 123 ns     | 120 ns: 1.02x faster  | 201 ns: 1.64x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<div>content</div>]                                  | 161 ns     | not significant       | 208 ns: 1.30x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<div>content</div>, quote=False]                     | 124 ns     | not significant       | 207 ns: 1.67x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<div><script>alert('x')</script></div>]              | 236 ns     | not significant       | 412 ns: 1.75x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<div><script>alert('x')</script></div>, quote=False] | 170 ns     | not significant       | 381 ns: 2.25x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<a href='https://example.com'>link</a>]              | 211 ns     | not significant       | 350 ns: 1.66x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<a href='https://example.com'>link</a>, quote=False] | 142 ns     | not significant       | 304 ns: 2.14x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<img src='x' onerror='alert(1)'>]                    | 197 ns     | not significant       | 343 ns: 1.74x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<img src='x' onerror='alert(1)'>, quote=False]       | 117 ns     | not significant       | 241 ns: 2.07x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<b><i>nested</i> bold</b>]                           | 200 ns     | 196 ns: 1.02x faster  | 311 ns: 1.55x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| escape[<b><i>nested</i> bold</b>, quote=False]              | 157 ns     | not significant       | 301 ns: 1.92x slower  |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+
| Geometric mean                                              | (ref)      | 1.02x faster          | 1.45x slower          |
+-------------------------------------------------------------+------------+-----------------------+-----------------------+

The str.translate alternative is much slower, so it's not worth it. My benchmarks were done on Python 3.14rc3 as provided by uv, but I don't think there will be much change between 3.14rc3 and 3.14.0 performance-wise. Your suggestion is slightly faster but not fast enough to warrant the change IMO.

Benchmark script
import pyperf

def escape_ref(s, quote=True):
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace('\'', "&#x27;")
    return s

def escape_new(s, quote=True):
    s = (
        s.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
    if quote:
        return s.replace('"', "&quot;").replace('\'', "&#x27;")
    return s

D = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
DQ = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"':"&quot;", '\'': "&#x27;"}
TD = str.maketrans(D)
TDQ = str.maketrans(DQ)

def escape_alt(sx, quote=True):
    t = TDQ if quote else TD
    return sx.translate(t)

test_html_strings = [
    "Hello, world!",
    "1234567890",
    "Simple text with spaces",

    "Hello, &world!",
    "12345<67890",
    "Simple text with spaces&",

    "<b>bold</b>",
    "<i>italic</i>",
    "<u>underline</u>",
    "<p>paragraph</p>",
    "<div>content</div>",

    "<div><script>alert('x')</script></div>",
    "<a href='https://example.com'>link</a>",
    "<img src='x' onerror='alert(1)'>",
    "<b><i>nested</i> bold</b>",
]

def add_cmdline_args(cmd, args):
    cmd.append(args.implementation)

if __name__ == "__main__":
    runner = pyperf.Runner(add_cmdline_args=add_cmdline_args)
    runner.argparser.add_argument(
        "implementation", choices=["ref", "new", "alt"]
    )
    args = runner.parse_args()
    if args.implementation == "new":
        func = escape_new
    elif args.implementation == "alt":
        func = escape_alt
    else:
        func = escape_ref

    for case in test_html_strings:
        runner.bench_func(f"escape[{case}]", func, case)
        runner.bench_func(f"escape[{case}, quote=False]", func, case, False)

@picnixz added the "pending" label (The issue will be closed if no feedback is provided) on Oct 11, 2025
@keithasaurus
Contributor Author

keithasaurus commented Oct 11, 2025

Just to add a bit of context to my motivation for this, html.escape is used by django (among others) in its html escaping logic, so this function may be called on the order of billions of times per day. The impact of this is just to remove several bytecode instructions per call, meaning it should be more noticeable when escaping small strings; the work of escaping longer strings is dominated by the actual .replace calls. My hope is that this is simple enough to be an uncontroversial micro-optimization.

@picnixz
Member

picnixz commented Oct 11, 2025

I think we should leave such improvements to the JIT instead. I think this is where it would shine (where we could reuse the same variable, though I don't know if it's already the case [@Fidget-Spinner is it a JIT feature to be able to optimize such calls?]).

> so this function may be called on the order of billions of times per day

Well, the problem is that it's not just this function that could slow down the process. And even if it were called billions of times per day, calls for strings that are already a bit longer don't see any improvements (e.g., "<div><script>alert('x')</script></div>"). For larger texts, I think the bottleneck will be the iteration+replace (possibly no-op) rather than the re-assignment (I should still confirm this tomorrow, or someone can check with a larger input as I gave my benchmark scripts).

It's usually not used as a single instruction, and is likely put inside other more complex logic, which itself may or may not also be slower. I'm sorry but I think this change is not significant enough. What would be interesting is to benchmark a production server to see if this is indeed a bottleneck.
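
For anyone who wants to check the larger-input case, here is a rough sketch using timeit rather than pyperf. The input sizes, the call count and the helper name seconds_per_call are assumptions for illustration, and no results are claimed.

```python
import timeit


def escape_ref(s, quote=True):
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


def escape_new(s, quote=True):
    s = (
        s.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
    if quote:
        return s.replace('"', "&quot;").replace("'", "&#x27;")
    return s


def seconds_per_call(fn, text, number=100):
    """Hypothetical helper: average wall-clock seconds for one fn(text) call."""
    return timeit.timeit(lambda: fn(text), number=number) / number


if __name__ == "__main__":
    snippet = "<a href='https://example.com'>link</a>"  # one of the test strings
    for repeat in (1, 100, 10_000):  # assumed sizes, chosen only for illustration
        text = snippet * repeat
        ref = seconds_per_call(escape_ref, text)
        new = seconds_per_call(escape_new, text)
        print(f"len={len(text):>7}  ref={ref:.3e}s  new={new:.3e}s")
```

timeit gives only a coarse picture; pyperf, as used in the benchmark script above, is the better tool for deciding whether a difference is significant.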

@cmaloney
Contributor

cmaloney commented Oct 11, 2025

For speeding this up: looking at the str.replace implementation, I think each call ends up in the stringlib_replace_single_character case (going from a single character to a multi-character replacement each time). That internally counts the number of times the search character appears and, if the count is > 0, allocates a new string and then copies the bytes with the replacement applied.

That means the best case (no escapes found, quote=True) scans the string 5 times and never copies, and the worst case scans the string 5 times and allocates a new string + copies 5 times (each .replace() makes its own copy). Some performance could definitely be gained if the best case stays the same or gets slightly better (scan the string once, counting the characters to escape) and the worst case becomes scan the string 1 time + copy 1 time.

Unfortunately I don't think this is worth moving to a C implementation that does that specifically (it'll be a lot of one-off code in a security-conscious path). .translate tries to do some of those optimizations; I'm not sure why it's so much slower (optimizing it and then moving to it would be interesting to me, but it would need to get notably better performance). In pure Python I don't see a straightforward way to get faster than the str.replace / stringlib implementation while keeping readability/maintainability at the moment, though.

@picnixz
Member

picnixz commented Oct 11, 2025

An alternative would be to rebuild the string from scratch and perform a ''.join() (or use io.StringIO()). You'll only scan the string once. For large texts I think there could be some improvements but I don't know if it's worth it.
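
A minimal sketch of the rebuild-and-join idea, assuming a plain dict lookup per character; the names _REPLACEMENTS, _QUOTE_REPLACEMENTS and escape_join are hypothetical. It scans the input once, but the loop runs at Python level, so whether it beats the chained str.replace calls for any input size is not established here.

```python
import html

_REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
_QUOTE_REPLACEMENTS = {**_REPLACEMENTS, '"': "&quot;", "'": "&#x27;"}


def escape_join(s, quote=True):
    """Escape s by scanning it once and joining the pieces."""
    table = _QUOTE_REPLACEMENTS if quote else _REPLACEMENTS
    # Characters without an entry in the table are passed through unchanged.
    return "".join([table.get(ch, ch) for ch in s])


if __name__ == "__main__":
    text = "<img src='x' onerror='alert(1)'>"
    print(escape_join(text))
    print(escape_join(text) == html.escape(text))
```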

@Fidget-Spinner
Member

> I think we should leave such improvements to the JIT instead. I think this is where it would shine (where we could reuse the same variable, though I don't know if it's already the case [@Fidget-Spinner is it a JIT feature to be able to optimize such calls?]).

Not at the moment, but with TOS caching (I forgot the issue/PR, but it's by Mark Shannon) it should become just register moves, which are nearly free, and some refcounting, which is the expensive part.

I'm not on a computer right now, but can you check whether the original bytecode says LOAD_FAST_BORROW instead of LOAD_FAST? If so, there will be no refcounting at all in the JIT in the future, and it will just be register-to-register moves, which are basically free on modern CPUs.

@Fidget-Spinner
Member

Yeah, so I checked and it is indeed LOAD_FAST_BORROW on Python 3.14-3.15. The JIT will be able to optimize all the refcounting away. With the register allocation, this should be free if the JIT does its job in the future.
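
The same check can be reproduced locally with dis. This sketch lists which LOAD_FAST-family opcodes the original function compiles to on the running interpreter; the helper name fast_load_opnames is hypothetical, and the result depends on the Python version (LOAD_FAST_BORROW is only expected on 3.14 and later, per the comment above).

```python
import dis
import sys


def escape_old(s, quote=True):
    s = s.replace("&", "&amp;")  # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


def fast_load_opnames(fn):
    """Hypothetical helper: sorted names of the LOAD_FAST* opcodes used by fn."""
    return sorted(
        {
            ins.opname
            for ins in dis.get_instructions(fn)
            if ins.opname.startswith("LOAD_FAST")
        }
    )


if __name__ == "__main__":
    print(sys.version_info[:2], fast_load_opnames(escape_old))
```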

@Fidget-Spinner
Member

One minor correction: when I say in the future for the JIT, I don't know how far into the future that may be. So if you really need this to be optimized right now, I'd recommend proceeding with it.
