Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template cache, {% extends %} and globals - design issue? #295

Closed
miracle2k opened this issue Jan 30, 2014 · 15 comments · Fixed by #1244
Closed

Template cache, {% extends %} and globals - design issue? #295

miracle2k opened this issue Jan 30, 2014 · 15 comments · Fixed by #1244
Milestone

Comments

@miracle2k
Copy link

Say I have this template with an extends call:

foo.html:

{% extends 'base.html' %}

I load it:

jinja2_env.get_template('foo.html', globals=my_globals}

This causes two calls to Environment._load_template: First, for foo.html, the for base.html.

Environment._load_template caches both templates.

The crucial point is that for the second call, the "globals" parameter to Environment._load_template is missing my_globals.

If I now attempt to load the base template directly:

jinja2_env.get_template('base.html', globals=my_globals}

The cache will be used, and my globals will not be available.

Using 2.7.2

@mitsuhiko
Copy link
Contributor

Yes. This is currently the case. This is related to the general problems with scoping in Jinja2 that need to be tackled. Unfortunately that is a very big change and I am not sure how to fix this without spending a lot of time on it.

@dsedivec
Copy link

Here are a couple observations for anyone who runs into this in the future, as I have just this evening.

If you do @miracle2k's steps in the opposite order, that could be equally problematic: If you first instantiate base.html with a particular set of globals, but then subsequently instantiate foo.html, base.html will still have its original set of globals, which might be somewhere between unexpected and dangerous.

Of course, I'm pretty sure this breaks as well. Is it documented somewhere that you shouldn't call get_template for the same template with different globals? It's very possible I missed it.

globals_1 = {"x": 42}
globals_2 = {"y": 101}
jinja2_env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
instance_1 = jinja2_env.get_template("foo.html", globals=globals_1)
instance_2 = jinja2_env.get_template("foo.html", globals=globals_2)
# This fails because instance_1.globals is instance_2.globals.
assert "y" in instance_2.globals

I think you can get around these problems with template globals in exchange for a ~10x performance hit by using a byte code cache and disabling cache_size=0 on your Environment. Example:

class MemoryBytecodeCache(jinja2.BytecodeCache):
    def __init__(self):
        self._cache = {}

    def load_bytecode(self, bucket):
        try:
            bucket.code = self._cache[bucket.key]
        except KeyError:
            return

    def dump_bytecode(self, bucket):
        self._cache[bucket.key] = bucket.code

jinja2_env = jinja2.Environment(
    loader=jinja2.FileSystemLoader("."),
    cache_size=0,
    bytecode_cache=MemoryBytecodeCache(),
)
instance_1 = jinja2_env.get_template("foo.html", globals=globals_1)
instance_2 = jinja2_env.get_template("foo.html", globals=globals_2)
# This now tests true, like I originally expected.
assert "y" in instance_2.globals

Here's a hasty benchmark showing the cost of disabling the template cache (macOS, 2020 MacBook Pro, Python 3.8):

Testing with template cache, without byte code cache
Time per call for  50000 loops:     5.40 μs     5.49 μs     5.55 μs     6.65 μs     7.52 μs
Testing without template cache, without byte code cache
Time per call for    200 loops:   969.33 μs   979.95 μs   999.58 μs  1036.58 μs  1142.10 μs
Testing with template cache, with byte code cache
Time per call for  50000 loops:     5.28 μs     5.41 μs     5.43 μs     5.62 μs     6.68 μs
Testing without template cache, with byte code cache
Time per call for   5000 loops:    44.04 μs    44.31 μs    45.37 μs    46.86 μs    51.69 μs

Code that produced that:

def run_silly_benchmark(*timer_args, repeat=5, **timer_kwargs):
    timer = timeit.Timer(*timer_args, **timer_kwargs)
    num_times, time_taken = timer.autorange()
    times = [time_taken]
    for _ in range(1, repeat):
        times.append(timer.timeit(num_times))
    print(
        f"Time per call for {num_times:6d} loops: ",
        "  ".join(
            f"{(time * 10**6) / num_times:7.2f} μs" for time in sorted(times)
        ),
    )


loader = jinja2.FileSystemLoader(".")
for env in (
    jinja2.Environment(loader=loader),
    jinja2.Environment(loader=loader, cache_size=0),
    jinja2.Environment(loader=loader, bytecode_cache=MemoryBytecodeCache()),
    jinja2.Environment(
        loader=loader, cache_size=0, bytecode_cache=MemoryBytecodeCache()
    ),
):
    print(
        "Testing %s template cache, %s byte code cache"
        % (
            "without" if env.cache is None else "with",
            "without" if env.bytecode_cache is None else "with",
        )
    )
    run_silly_benchmark("env.get_template('foo.html')", globals={"env": env})

foo.html contained the first example template from https://jinja.palletsprojects.com/en/2.11.x/templates/#synopsis.

@davidism
Copy link
Member

I'm not sure if you saw, but there's a PR linked above your comment that addresses this.

@dsedivec
Copy link

I'm not sure if you saw, but there's a PR linked above your comment that addresses this.

I did notice that, thank you. I noticed the PR is a couple months old on an issue that's six years old, and I can't even concretely say that it will be merged, so since I didn't know how much longer this issue might be outstanding I thought my comments might help someone else passing this way.

And please don't anyone on the development team take this as a criticism—I absolutely understand how valuable developer time is, and I am very grateful for all the work that's gone into this project.

@davidism
Copy link
Member

It will be merged, along with all the other PRs we've gotten from the MLH program over the last 3 months. There have been so many I haven't been able to get to them all yet.

@sophiebits-humu
Copy link

This issue is now closed due to #1244, but for what it's worth – the behavior after this PR is still very surprising to me as a user. The new behavior is to merge the new globals in with any globals previously passed but to retain existing globals from the cached template.

This means that data can still leak between calls to get_template (even possibly resulting in leaking data between users, depending on how jinja is used – an incredibly severe bug). It also means the semantic behavior depends on if caching is enabled, rather than caching being solely a performance optimization.

I hope this behavior can be reconsidered before final release.

@davidism
Copy link
Member

The fact that different globals can be passed to get_template for the same template seems to be a slight oversight. The docs don't mention globals much to begin with, but they are pretty clear that the globals, both for envs and templates, shouldn't be modified by the user after a template is created. The options are:

  1. Document more strongly that's it's undefined/unsafe to pass different globals to the same template.
  2. Raise an error if the globals passed don't match the globals already present. This might not be feasible, and could be a very noisy breaking change if people are relying on this undefined behavior.
  3. Somehow account for whatever globals are passed in the cache key. This will be difficult since dicts (and the arbitrary values they might have) are not hashable and we want to do all loading as fast as possible.

I'm inclined to go with option 1. I don't believe it's particularly common to load the same template with different globals. Flask, for example, doesn't expose the fact that templates can have distinct globals at all, and systems such as Ansible don't expose this either.

@sophiebits-humu
Copy link

If I understand correctly, the primary motivation for the cache is to avoid the cost of re-parsing the template HTML multiple times. My hope would be that Jinja could cache that parsing but return a new Template object, which should require only a single small allocation.

If Flask and Ansible don't expose the ability to pass different globals in, then they wouldn't be affected by a minor performance regression as long as the caching of templates without globals was not affected (I agree that that seems to be the more common usage pattern).

I would argue that even with a documented warning, this is a very large footgun (even if someone is not too likely to run into this, it can be an incredibly bad bug if it does crop up) and so would prefer a solution 2 or 3 – but a warning in the documentation would be still a significant improvement over the status quo.

@davidism
Copy link
Member

If I understand correctly, the primary motivation for the cache is to avoid the cost of re-parsing the template HTML multiple times.

The cache stores the loaded template object, keyed on the loader and name. Jinja parses Jinja syntax, not HTML. The parsing is cached by the bytecode cache, not the template cache. The template cache is only to avoid loading the same object more than once, it's not a guarantee that the context the object is rendered with, such as globals, will be unchanged.

I'll add docs, but both of you have identified how to avoid the cache issue: by disabling the cache if you want to change template globals every time in a multi call / thread safe way. Given that the docs already discourage this, I'm classifying this as undefined behavior that you shouldn't be using. I don't agree it's an "incredibly severe bug", there's nothing technically wrong with the current behavior, it's just not the behavior you want.

@sophiebits-humu
Copy link

Jinja parses Jinja syntax, not HTML.

Sorry, sloppy writing on my part.

In case it's not clear, I'm thinking of code something like the following:

env = jinja2.Environment()

@app.route('/sendpasswordreset')
def send_password_reset():
    resetUrl = generate_password_reset_url(g.logged_in_user_id)
    template = env.get_template('password_reset_email.html', globals={'RESET_URL': resetUrl})
    html = template.module.password_reset_email()
    send_email(g.logged_in_user_id, html)

Whereas the very similar-looking

    template = env.get_template('password_reset_email.html')
    html = template.module.password_reset_email(RESET_URL=resetUrl)

is correct (I believe).

If the globals leak between requests, then you can send one person's password reset link to another person. This is an incredibly severe bug (and I indeed had a serious production issue resulting from this behavior). Obviously not all instances of this bug will be equally severe.

@mitsuhiko
Copy link
Contributor

Globals as they were created are not for per-render instantiations. This used to be a much stronger point made originally when they were the only thing passed between imports.

I think Flask might have negatively contributed to people having different expectations in globals because it registers thread locals there (request etc.).

Globals inherently have to be cached. It's not a small performance impact, but has some rather fundamental implications about how Jinja2 operates. I do think that this should be more explicitly called out in the docs though if this causes confusion.

@davidism
Copy link
Member

davidism commented Mar 30, 2021

The correct way to pass data into a template is to use template.render(**kwargs) (or Flask's render_template(name, **kwargs), etc.) to pass context for a render. Globals are intended for data that is global to all renders of the template, or to all templates loaded by an environment.

@sophiebits-humu
Copy link

sophiebits-humu commented Mar 30, 2021

In case you're misunderstanding me: My confusion isn't with expecting global objects (like my env above) to behave as thread locals. It's that I don't expect when calling get_template with different arguments that data is shared between those calls. When a function uses a cache internally my expectation is that all arguments used for computing the return value are also used as part of the cache key.

My mental model is that get_template return a new template each time it is called with different arguments, not that it reuses a single template and mutates it. (I generally don't expect functions named "get" to have any effects.) So I wasn't thinking about this as "changing the globals for an existing template", instead as getting a new template with the right globals. Perhaps this is the wrong mental model but I think it is one that is easy to have. This behavior to me is just as unexpected as if multiple instantiations of Environment shared globals, or as if the variables passed to the .render() call were shared between calls. Let me know if I'm misunderstanding something.

Certainly I realize now that globals were not intended for this. But I don't think it is at all obvious from the API itself that there is such a meaningful difference in the behavior between passing variables as globals= vs passing to .render() and that's why I say it is a footgun. In my mind this is dangerous enough that I would recommend to anyone using Jinja that they lint against using the globals= argument to get_template in their own codebases or monkeypatch it to throw an error. (I don't think docs are a full solution because people don't reliably read docs.)

@davidism
Copy link
Member

davidism commented Mar 31, 2021

The op's issue was that the template got loaded incidentally as part of extends, so it got the environment globals only. Then when it was loaded directly it was already cached and get_template ignored the template-specific globals that were given.

On the other hand, your issue is that it wasn't clear that globals are not for render-specific data. I understand how you could arrive at that, but the only thing I can do about it is update the docs and hope that people can't misinterpret them again.

#1244 at least allows updating the template globals like the op wanted without clobbering the environment globals. Yes, they're still globals and not local to each render of the template, but that's by definition.

If we disallow updating template globals, we're back at the op's issue and your issue. We could raise an error so you wouldn't have been able to do what you tried, but that doesn't change the op's issue. If we try to cache based on the globals as well, that allows what you were trying to do, but we want to discourage that not enable it.

@sophiebits-humu
Copy link

My proposal given my understanding of your constraints would be to either (a) include the globals in the cache key (even just by dict object identity) – even if that isn't something you want to encourage – or (b) not cache invocations to get_template that pass any globals. That said, I haven't necessarily meant to suggest any particular solution so far – my intent right now is more to make sure I'm properly communicating why I think this is a pitfall large enough to make it worth seriously considering making a code change for.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants