
add request body based caching #76

Closed

Conversation

buckley-w-david
Contributor

This PR is a stab at #71, caching based on request body.

I implemented it because it seemed like a fairly easy thing to do, and I was looking for an excuse to dig through this project's source a little to get a better understanding. But I would like to raise the point that this functionality is very dangerous.

The POST method is used to submit an entity to the specified resource, often causing a change in state or side effects on the server.

The difference between PUT and POST is that PUT is idempotent: calling it once or several times successively has the same effect (that is no side effect), where successive identical POST may have additional effects, like passing an order several times.

POST requests are typically for operations like resource creation on the server, and I worry that implementing this feature will lead some users astray: either by caching routes and thus essentially dropping requests, since they never go through the server logic, or by misleading less experienced users into flawed usage of POST requests.

What do you think @sh4nks?

resolves #71

@coveralls

coveralls commented Jul 20, 2018

Coverage Status

Coverage increased (+0.03%) to 68.467% when pulling d617ca1 on buckley-w-david:cache-request-body into 5d1856b on sh4nks:master.

@sh4nks
Collaborator

sh4nks commented Jul 24, 2018

First of all, thanks for your contribution!

Hmm. Now the question is whether we want to have something dangerous in Flask-Caching or not. Chances are that dangerous things will get abused, especially by less experienced people, as you have already said. Those users will probably create issues when something doesn't work as they expect, which in turn means more work for me. I am leaning towards -1 on this one.

If there is nevertheless a big demand for this feature I might be willing to accept the PR.

@buckley-w-david
Contributor Author

I would agree that it's probably best to consciously exclude this functionality.

@buckley-w-david
Contributor Author

@sh4nks I'm going to close this, as I think we've agreed it's a bad idea.

@diogobaeder

Hi guys,

Is there any chance this change can be introduced in the library? I'm using it in a project and I really need the caching to work based on request bodies, because the API I'm implementing uses request bodies to receive payloads that describe complex queries made by the client-side, and flask-caching just ignores different bodies and returns wrong results when they change (since it bases on the URL only).

While I understand that inexperienced devs might misuse it (like when caching login views), I don't think that's a strong enough argument to disallow genuine cases like mine from being covered. It could even require a special option, like CACHING_ALLOW_POST or something, but it's important for me (and maybe for other people out there - like when using GraphQL, for example) to have this implemented.

Thanks a lot, guys :-)
Diogo

@bzed

bzed commented Aug 21, 2019

@sh4nks I think not implementing this feature just because people might do stupid things with it is not helpful - there are people like @diogobaeder and myself out there, who know what they are doing and would find this super useful. Not all POST requests do bad stuff; sometimes POST is just used because GET requests have a limited length....

@gergelypolonkai
Collaborator

Hiding it behind a config flag sounds good to me, however…

When cached() is invoked, the app may or may not be initialised (it’s definitely not initialised if you are using an app factory), hence the config is not yet available. Raising an exception from within functions decorated with @cached(request_body=True) sounds rude to me, but how else do you prevent people from using this feature? Raising a warning (is “raising” the correct term here?) might be an option, but you won’t see that warning until your cached endpoint is invoked by a request.
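The deferred-check problem described above can be sketched as follows. This is a hypothetical illustration, not Flask-Caching's actual API: the option name CACHE_ALLOW_BODY and the get_config callable are assumptions made for the sketch. The idea is that the decorator defers reading the config until a request actually arrives, and warns (rather than raises) if the feature is used without the flag:

```python
import warnings

# Hypothetical sketch only. CACHE_ALLOW_BODY is an invented config key;
# get_config is a callable returning the app config at request time,
# which sidesteps the app-factory problem: the config is not read until
# the decorated view is first invoked.
def cached_with_body(get_config):
    def decorator(fn):
        def wrapper(*args, **kwargs):
            config = get_config()
            if not config.get("CACHE_ALLOW_BODY", False):
                # Warn instead of raising from inside the view.
                warnings.warn(
                    "request-body caching used without CACHE_ALLOW_BODY=True; "
                    "falling back to an uncached call",
                    stacklevel=2,
                )
                return fn(*args, **kwargs)
            # ...body-aware caching would happen here...
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

As the comment notes, the warning only fires once a request hits the endpoint, which is exactly the drawback raised above.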

@buckley-w-david
Contributor Author

buckley-w-david commented Aug 21, 2019

The RFC for caching http is pretty explicit,

https://tools.ietf.org/html/rfc2616#section-13.10

All methods that might be expected to cause modifications to the origin server's resources MUST be written through to the origin server. This currently includes all methods except for GET and HEAD. A cache MUST NOT reply to such a request from a client before having transmitted the request to the inbound server, and having received a corresponding response from the inbound server. This does not prevent a proxy cache from sending a 100 (Continue) response before the inbound server has sent its final reply.

The spec seems to be a little self-contradictory, but the interpretation is still not what you're asking for; this is the best summary that I can find. (Also, the unexpected feature on iOS that the article mentioned was confirmed as a bug.)

After doing some research, it seems those who need this functionality typically go to a highly configurable caching proxy like Varnish or Nginx, or build their own.

Now I know that regardless of how many sources I throw at you, you're not going to be happy, since it still leaves you without the functionality that you actually need. I am still against adding this functionality to the library, since standards compliance is something that I value. However, after more reading, I do have a suggestion for a way to achieve this if you're still committed to using this library to cache POST requests.

A technique suggested for client-side caching of GraphQL queries could be used here as well. If you calculate a digest of the post body on the client-side, and attach it to the POST in a query string, then it doesn't matter that the library itself doesn't inspect and take into account the request body, since you would be taking advantage of the existing functionality of caching based on the path + query string.
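A minimal sketch of that technique, assuming a JSON request body; the parameter name bodyhash and the helper name are arbitrary choices for illustration, not part of any library:

```python
import hashlib
import json

def url_with_body_digest(url, body):
    """Append a digest of the POST body as a query-string parameter.

    A URL+query based cache then sees distinct keys for distinct
    bodies, without ever inspecting the body itself. Serializing with
    sort_keys=True makes the digest stable across dict orderings.
    """
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}bodyhash={digest}"
```

The client would POST to the returned URL instead of the bare one; identical bodies map to identical URLs, so the existing path+query caching works unchanged.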

Ultimately, @sh4nks and @gergelypolonkai are the maintainers, the decision is theirs. However if you feel strongly about this functionality there is nothing stopping you from forking the repository and applying my patch that adds the functionality, that is what I would suggest as a last resort.

@diogobaeder

Thanks for the explanation and suggestion, @buckley-w-david !

For my case specifically, where I use POST not to modify any resource (in that one particular case) but to allow a well-structured JSON body as the specification of a request for data, that kind of caching would help. In the end I just went ahead and implemented the caching myself. I understand that caching POST would break the standard in any case - in my case it would make sense in practice, but since the standard is rigid (in that it denies caching for all POST requests just because they "usually" make changes), I'm in no position to judge the library as doing anything wrong. It's not; it's following the standard, and personally I prefer that, since it sticks to the standard and isn't bound to create negatively surprising results.

As for the suggestion to put a digest as query-string, I'm not sure that's a good idea. First, because the query-string might not be passed along, thus caching the "generic" endpoint (that doesn't contain the query-string), and second because it breaks the standard as per the RFC above. But, well, in practice it might be helpful for other people than me, I reckon that.

Cheers!
Diogo

@bzed

bzed commented Aug 21, 2019

@buckley-w-david well, this is not a reverse proxy server! So RFC 2616 does not apply here.
We are talking about caching POST data. If I have an API which always delivers the same response when I post the same data to it, I can cache it - where is the problem?

@buckley-w-david
Contributor Author

buckley-w-david commented Aug 21, 2019

RFC 2616 is not a spec for reverse proxy servers; it's the spec for the entirety of HTTP/1.1.

EDIT: I realize the excerpt that I quoted may make it seem like that part of the spec only applies to reverse proxy servers, but that is only due to how the RFC defines server.

server
An application program that accepts connections in order to
service requests by sending back responses. Any given program may
be capable of being both a client and a server; our use of these
terms refers only to the role being performed by the program for a
particular connection, rather than to the program's capabilities
in general. Likewise, any server may act as an origin server,
proxy, gateway, or tunnel, switching behavior based on the nature
of each request.

origin server
The server on which a given resource resides or is to be created.

Just because it separates out the "cache" and the "origin server" does not mean that they are not the same machine, or even the same application; it is just a matter of which role is being played.

@bzed

bzed commented Aug 21, 2019

@buckley-w-david but we are in an application here. RFC 2616 defines what happens in the HTTP protocol, like when a resource that was actually delivered from a server is cached and for how long - not what the application that delivers the resource does. That is completely out of the scope of RFC 2616.

@buckley-w-david
Contributor Author

buckley-w-david commented Aug 21, 2019

I see what you're saying, but I'm not sure I agree. I think we have different ideas of the purpose of this library, I see it primarily as a cache in the HTTP sense of the word, that is easy to integrate into your application, but distinctly separate from it.

I am open to the possibility that that assumption on my part is wrong

@bzed

bzed commented Aug 21, 2019

@buckley-w-david well, it basically implements more or less nothing you'd want from an HTTP cache. For example, it does not handle Vary headers. It ignores Accept-*. It doesn't know how to answer If-Modified-Since requests. I really fail to understand how this library is even close to an HTTP cache...

@bzed

bzed commented Aug 21, 2019

@buckley-w-david this library caches the return value of functions, which might (or might not) have a route decorator. Maybe they have even more decorators. Maybe there is a middleware modifying the request to, or the response from, Flask (which is actually an interesting point - I think in its current form this library wouldn't even be able to handle session data or similar things from beaker and others...). And then you usually have some WSGI server, which actually delivers HTTP and speaks HTTP to clients - and WSGI to Flask. HTTP caching is done in front of that server, because even the server might modify the request before passing it on...

tl;dr: The caching within flask is far far away from the resulting HTTP....

@buckley-w-david
Contributor Author

buckley-w-david commented Aug 21, 2019

Alright, that is by far the most convincing argument I've heard on the subject, you've convinced me that my initial view of the library was wrong.

Given that, I don't see the issue in question as really being about POST requests specifically at all (as you said, we shouldn't really be concerned with HTTP semantics). If we're going to extend the API for more flexible use, wouldn't it make more sense to go all the way and allow caching on arbitrary properties of the incoming request?

Instead of a request_body flag to grab the request body from the request, would it make more sense to let the user supply their own make_cache_key function, which would accept the request, pull whatever data the user needs, and return the digest? (Maybe the POST request code could also still be there for convenience, but you've convinced me that that's insufficient to cover the actual use case here.)
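A rough sketch of what such an API could look like. This is an assumption-laden illustration, not Flask-Caching's actual implementation: the plain-dict cache and the dict-based request stand-in replace the real cache backend and Flask's request object:

```python
import hashlib

# Sketch of the proposed API: the caller supplies a make_cache_key
# function that inspects whatever parts of the request it cares about.
# The dict cache and dict "request" are simplifications for illustration.
def cached(cache, make_cache_key):
    def decorator(fn):
        def wrapper(request):
            key = make_cache_key(request)
            if key not in cache:
                cache[key] = fn(request)
            return cache[key]
        return wrapper
    return decorator

def body_key(request):
    # Example user-supplied key function: path plus a digest of the body.
    digest = hashlib.sha256(request["body"]).hexdigest()
    return f"{request['path']}:{digest}"
```

With body_key plugged in, two POSTs to the same path with different bodies get distinct cache entries, while any other key function (headers, selected args, etc.) would work the same way.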

@bzed

bzed commented Aug 21, 2019

Given that, I don't see the issue in question as really being about POST requests specifically at all (as you said, we shouldn't really be concerned with HTTP semantics). If we're going to extend the API for more flexible use, wouldn't it make more sense to go all the way and allow caching on arbitrary properties of the incoming request?

Absolutely.

Instead of a request_body flag to grab the request body from the request, would it make more sense to let the user supply their own make_cache_key function, which would accept the request, pull whatever data the user needs, and return the digest? (Maybe the POST request code could also still be there for convenience, but you've convinced me that that's insufficient to cover the actual use case here.)

Yes, that makes a lot of sense! This also makes sense even for GET requests, as you don't always want all args in the cache key. A common example is javascript/tracking magic, or bots adding stupid things to a request. On the other hand, I might want to require args for caching, but not necessarily let the request fail just because they are missing.
This goes even further with memoize - it does not always make sense to have all parameters of a function in the cache key. Sometimes it makes sense to pass handles to external database connections into a function, without getting new connections from the app context within the function, but you don't want the cache key to depend on them.
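The memoize point above can be sketched too. This is a hypothetical illustration, not Flask-Caching's memoize: for simplicity it only handles keyword arguments, and the names it ignores (e.g. a db connection handle) are chosen by the caller:

```python
import functools

# Hypothetical sketch: memoize while excluding named parameters
# (e.g. connection handles) from the cache key. Keyword-only for
# simplicity; a real implementation would also map positional args.
def memoize_ignoring(*ignore):
    def decorator(fn):
        cache = {}

        @functools.wraps(fn)
        def wrapper(**kwargs):
            # Build the key from every kwarg except the ignored ones.
            key = tuple(sorted(
                (k, v) for k, v in kwargs.items() if k not in ignore
            ))
            if key not in cache:
                cache[key] = fn(**kwargs)
            return cache[key]
        return wrapper
    return decorator
```

Calling the decorated function with a different connection handle but the same remaining arguments then hits the cache instead of re-running the query.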

@oblek

oblek commented Dec 11, 2019

Thank you @buckley-w-david, I implemented your patch and it worked exactly as I needed, which is, from what I understood, the same as @diogobaeder.
I still think it would be a great idea to include it in the native functionality of the library.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2021
Successfully merging this pull request may close these issues.

Caching based on POST body