Card data cache in NGINX #3966

Closed
anki-code opened this issue Dec 15, 2016 · 12 comments

@anki-code

anki-code commented Dec 15, 2016

Hi guys! Thank you for a great product!

I have many cards whose data covers yesterday and earlier. I was looking for a way to cache card data once a day, and I found one: using an NGINX reverse proxy cache as a frontend for Metabase.

Warning! This solution has a security issue. Read the posts below.

// For metabase 0.22-snapshot
# apt-get install nginx-extras && mkdir /var/cache/nginx
# cat /etc/nginx/nginx.conf | grep cache
       proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cache_all:32m inactive=8h;

# cat /etc/nginx/sites-enabled/metabase
server {
    listen 80;
    listen [::]:80;
    server_name 127.0.0.1;

    location / {
        proxy_pass http://127.0.0.1:3000; # Metabase here

        location ~ /api/card((?!/42/|/41/)/[0-9]*/)query {
            # cache data for all cards except card 42 and card 41 (they have realtime data)

            set $no_cache 0;                             # default: use the cache
            if ($http_referer !~ /dashboard/) {
                # cache only cards shown on a dashboard
                set $no_cache 1;
            }
            proxy_no_cache $no_cache;
            proxy_cache_bypass $no_cache;

            proxy_pass http://127.0.0.1:3000;
            proxy_cache_methods POST;
            proxy_cache_valid 8h;                        # cache lifetime
            proxy_ignore_headers Cache-Control Expires;  # ignore Metabase cache controls
            proxy_cache cache_all;
            proxy_cache_key "$request_uri|$request_body";
            proxy_buffers 8 32k;
            proxy_buffer_size 64k;
            add_header X-MBCache $upstream_cache_status;
        }

        location ~ /api/card/\d+ {
            proxy_pass http://127.0.0.1:3000;

            if ($request_method ~ PUT) {
                # when a card is edited, reset the cache for this card
                # caution: request_uri is passed to the shell unescaped
                access_by_lua 'os.execute("find /var/cache/nginx -type f -exec grep -q \\"".. ngx.var.request_uri .."/\\"  {} \\\; -delete ")';
                add_header X-MBCache REMOVED;
            }
        }
    }
}

It works perfectly: all card data is cached and charts load crazy fast.
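For anyone adapting the location pattern, a quick way to see which paths it catches is to run the same regex outside NGINX. This is only a sketch: it assumes Python's `re` lookahead behaves like the PCRE that NGINX uses for this pattern, `re.search` stands in for the unanchored match of `location ~`, and the sample paths are made up for illustration.

```python
import re

# Same pattern as the caching location in the config above.
CACHED_LOCATION = re.compile(r"/api/card((?!/42/|/41/)/[0-9]*/)query")


def is_cached_location(path: str) -> bool:
    """Return True if the path would be handled by the caching location."""
    # `location ~` is an unanchored, case-sensitive match, like re.search.
    return CACHED_LOCATION.search(path) is not None


if __name__ == "__main__":
    samples = (
        "/api/card/7/query",    # ordinary card
        "/api/card/42/query",   # excluded by the lookahead
        "/api/card/41/query",   # excluded by the lookahead
        "/api/card/420/query",  # starts with 42 but is a different id
        "/api/card/7",          # card metadata, not a query
    )
    for path in samples:
        print(path, is_cached_location(path))
```

Note that card 420 is still cached: the lookahead only excludes the exact ids 41 and 42.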

⬇️ Please click "like" instead of "+1" comment

@anki-code anki-code changed the title Caching card data with Nginx: need to fix Last-Modified header Caching card data with Nginx: new header with card editing last time Dec 15, 2016
@anki-code
Author

anki-code commented Dec 15, 2016

Hello guys from #1666 😈

@anki-code
Author

anki-code commented Dec 15, 2016

Guys, I've made a complete solution. To remove the cache when a card is edited, we can use Lua:

location ~ /api/card/\d+ {
	proxy_pass http://127.0.0.1:3000;

	if ($request_method ~ PUT) {
		access_by_lua 'os.execute("find /var/cache/nginx -type f -exec grep -q \\"".. ngx.var.request_uri .."/\\"  {} \\\; -delete ")';
		add_header X-MBCache REMOVED;
	}
}

The code in the first message has been updated. It's a complete solution for caching cards now. I don't need any headers from Metabase.

Thanks for watching :)

@anki-code anki-code changed the title Caching card data with Nginx: new header with card editing last time Caching card data with Nginx Dec 15, 2016
@anki-code anki-code changed the title Caching card data with Nginx Card data cache in NGINX Dec 15, 2016
@tlrobinson
Contributor

@anki-code Just be aware that you might be leaking cached data, e.g. curl -X POST "http://NGINX_HOSTNAME/api/card/[1-100]/query"

I like the idea of using an HTTP cache, as long as we could figure out the authentication issue. It might be possible with some careful combination of HTTP cache headers.

@anki-code
Author

anki-code commented Dec 16, 2016

@tlrobinson I don't clearly understand what you mean by "leaking". Could you please give me more details?

If you mean that unauthorized requests to /api/card/*/query can replace the cache with wrong data, they can't, because proxy_cache_valid with only a time caches only 200, 301, and 302 responses. If the user is not authorized, the API returns 404.

@tlrobinson
Contributor

@anki-code Metabase's authentication/authorization checks won't be run for cached requests because it will be served directly from nginx, so an unauthenticated user could just try brute force every card's URL until they find ones that are cached.

Try loading a card in Metabase (to ensure it's cached), then run the same request (using curl or another HTTP client) with the Cookie header removed entirely:

$ curl 'http://localhost:3001/api/card/1/query' \
  -H 'Cookie: metabase.SESSION_ID=01ceb2b2-8ed0-48f7-bf6d-d20ff401210d' \
  -H 'Content-Type: application/json' \
  -H 'Referer: http://localhost:3001/dash/1' \
  --data-binary '{"parameters":[]}' \
  --compressed
{"finished_at":"2016-12-16T01:23:54.912Z","started_at":"2016-12-16T01:23:54.825Z","json_query":{"database":1,"type":"query","query":{"source_table":1,"aggregation":["count"],"breakout":[],"filter":[]},"parameters":[],"constraints":{"max-results":10000,"max-results-bare-rows":2000}},"status":"completed","id":3,"uuid":"00906475-64c3-43bf-b293-6170803a3c55","row_count":1,"running_time":87,"data":{"cols":[{"description":null,"table_id":null,"special_type":"type/Number","name":"count","source":"aggregation","extra_info":{},"id":null,"target":null,"display_name":"count","base_type":"type/Integer"}],"columns":["count"],"rows":[[17624]],"native_form":{"query":"SELECT count(*) AS \"count\" FROM \"PUBLIC\".\"ORDERS\"","params":null}}}%

$ curl 'http://localhost:3001/api/card/1/query' \
  -H 'Content-Type: application/json' \
  -H 'Referer: http://localhost:3001/dash/1' \
  --data-binary '{"parameters":[]}' \
  --compressed
{"finished_at":"2016-12-16T01:23:54.912Z","started_at":"2016-12-16T01:23:54.825Z","json_query":{"database":1,"type":"query","query":{"source_table":1,"aggregation":["count"],"breakout":[],"filter":[]},"parameters":[],"constraints":{"max-results":10000,"max-results-bare-rows":2000}},"status":"completed","id":3,"uuid":"00906475-64c3-43bf-b293-6170803a3c55","row_count":1,"running_time":87,"data":{"cols":[{"description":null,"table_id":null,"special_type":"type/Number","name":"count","source":"aggregation","extra_info":{},"id":null,"target":null,"display_name":"count","base_type":"type/Integer"}],"columns":["count"],"rows":[[17624]],"native_form":{"query":"SELECT count(*) AS \"count\" FROM \"PUBLIC\".\"ORDERS\"","params":null}}}%

As you can see I was able to get the cached data without being authenticated with a Cookie. Not good, especially if your server is on the public internet.

@tlrobinson
Contributor

tlrobinson commented Dec 16, 2016

It seems it's possible to have an intermediate HTTP proxy cache a response but still validate the request is authorized by sending it to the origin server.

Basically Cache-Control: public, max-age=0 says that the resource can be cached and shared among users but must be revalidated by the origin server on every request. If the user is authorized the backend would return a 304 response code and the cache would return the response.

This would require changes to Metabase itself.
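A rough sketch of that flow, to make the idea concrete. Everything in it is hypothetical: the `Origin` and `RevalidatingCache` classes, the session set, and the ETag value are stand-ins for illustration, not Metabase or NGINX APIs. The point is only that the cache asks the origin on every request, and the origin answers 304 (serve the cached copy) or 401 (don't).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    status: int
    body: Optional[str] = None
    etag: Optional[str] = None


@dataclass
class Origin:
    """Hypothetical backend: checks auth first, then the validator."""
    valid_sessions: set
    etag: str = '"v1"'
    queries_run: int = 0

    def handle(self, session: Optional[str], if_none_match: Optional[str]) -> Response:
        if session not in self.valid_sessions:
            return Response(401)           # auth is checked on every request
        if if_none_match == self.etag:
            return Response(304)           # authorized, cached copy still good
        self.queries_run += 1              # only now run the expensive query
        return Response(200, body="rows", etag=self.etag)


@dataclass
class RevalidatingCache:
    """Shared cache that revalidates with the origin on every request."""
    origin: Origin
    store: dict = field(default_factory=dict)  # url -> cached 200 Response

    def get(self, url: str, session: Optional[str]) -> Response:
        cached = self.store.get(url)
        validator = cached.etag if cached else None
        reply = self.origin.handle(session, validator)
        if reply.status == 304 and cached is not None:
            return Response(200, cached.body, cached.etag)  # served from cache
        if reply.status == 200:
            self.store[url] = reply
        return reply


if __name__ == "__main__":
    origin = Origin(valid_sessions={"alice"})
    cache = RevalidatingCache(origin)
    print(cache.get("/api/card/1/query", "alice").status)  # runs the query
    print(cache.get("/api/card/1/query", "alice").status)  # origin answers 304
    print(cache.get("/api/card/1/query", None).status)     # no session
    print(origin.queries_run)
```

The expensive query runs once, while the unauthenticated request never sees the cached body because the origin rejects it before the cache is allowed to answer.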

@anki-code
Author

Another solution: Metabase and the proxy share a secret key. Metabase encrypts the user's group id and saves it in a cookie. When the user opens a dashboard, the proxy decrypts the cookie and includes the user's group id in proxy_cache_key.

Pros: no additional request to the Metabase server is needed
Cons: the cache could hold many copies of one chart (when different groups have access to the same chart)

But this solution requires changes to Metabase too.
And we haven't talked about the client-side browser cache.
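A minimal sketch of the proxy side of this idea. It swaps in an HMAC signature for the encryption described above (the group id stays readable but cannot be forged without the key), and the cookie format, the `SECRET` value, and the function names are all assumptions for illustration; nothing like this exists in Metabase.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"shared-secret-known-to-metabase-and-proxy"  # hypothetical key


def sign_group(group_id: str, secret: bytes = SECRET) -> str:
    """Backend side: build the cookie value '<group_id>.<hex signature>'."""
    sig = hmac.new(secret, group_id.encode(), hashlib.sha256).hexdigest()
    return f"{group_id}.{sig}"


def verified_group(cookie: Optional[str], secret: bytes = SECRET) -> Optional[str]:
    """Proxy side: return the group id if the signature checks out, else None."""
    if not cookie or "." not in cookie:
        return None
    group_id, sig = cookie.rsplit(".", 1)
    expected = hmac.new(secret, group_id.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time.
    if hmac.compare_digest(sig.encode(), expected.encode()):
        return group_id
    return None


def cache_key(cookie: Optional[str], uri: str, body: str) -> Optional[str]:
    """Per-group cache key; None means 'skip the cache and ask the backend'."""
    group_id = verified_group(cookie)
    if group_id is None:
        return None
    return f"{group_id}|{uri}|{body}"


if __name__ == "__main__":
    cookie = sign_group("7")
    print(cache_key(cookie, "/api/card/1/query", '{"parameters":[]}'))
    print(cache_key(None, "/api/card/1/query", '{"parameters":[]}'))
```

Because the group id is part of the key, two groups that can see the same card get two cache entries, which is the con listed above. A real version would presumably also need an expiry inside the signed value.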

Yep, you're right. The solution above works for me only in a private network.

@tlrobinson
Contributor

We'd want to keep all authentication/authorization logic in Metabase itself, especially as we add new permissions features like the upcoming collections. It would be preferable if the proxy didn't require special configuration.

Proxying the request through to the backend won't add much latency, since nginx and Metabase will likely be on the same server/network.

@j005u

j005u commented Dec 19, 2016

In certain cases, it could be enough to just check the JWT auth tokens already in the request (at least for Google-authenticated users; I'm not sure if they are present for others) directly in nginx: https://github.com/auth0/nginx-jwt

You can also make nginx do a separate request to the backend with all the request headers intact, which then expects a 2xx response for access granted or a 401/403 response for access denied. The request can then be served from cache, sent to another backend service, etc. This way Metabase would only need to check the current user's permissions and leave response serving to nginx. See: http://nginx.org/en/docs/http/ngx_http_auth_request_module.html
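The auth_request flow boils down to "ask the backend first, then serve". Here is a small sketch of that decision logic, with the caveat that `check_auth` and `fetch` are hypothetical callables standing in for the auth subrequest and the cached upstream. The status handling follows the module documentation: 2xx allows, 401/403 denies, and anything else is treated as an error.

```python
from typing import Callable, Dict, Tuple

Reply = Tuple[int, str]  # (status code, body)


def serve(uri: str,
          headers: Dict[str, str],
          check_auth: Callable[[Dict[str, str]], int],
          cache: Dict[str, str],
          fetch: Callable[[str], str]) -> Reply:
    """Gate every request on an auth check, then answer from cache if possible."""
    status = check_auth(headers)      # subrequest carries the original headers
    if status in (401, 403):
        return status, ""             # denied: the cache is never consulted
    if not 200 <= status < 300:
        return 500, ""                # any other code counts as an error
    if uri not in cache:
        cache[uri] = fetch(uri)       # miss: go to the backend once
    return 200, cache[uri]


if __name__ == "__main__":
    cache: Dict[str, str] = {}

    def auth(headers: Dict[str, str]) -> int:
        # Hypothetical check: a single hard-coded session cookie is valid.
        return 200 if headers.get("Cookie") == "session=ok" else 401

    print(serve("/api/card/1/query", {"Cookie": "session=ok"}, auth, cache, lambda u: "rows"))
    print(serve("/api/card/1/query", {}, auth, cache, lambda u: "rows"))
```

The second call hits a warm cache entry but still gets a 401, because the auth check runs before the cache lookup.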

Possibly could be combined well with @tlrobinson's answer.

I have a little bit of experience with this, let me know if I could help.

@dakanji

dakanji commented Dec 30, 2016

@tlrobinson, in response to #3966 (comment), if you test against the amendment to @anki-code's configuration given in response to your query on Stack Overflow, there will be no leakage.

Each applicable request will be authenticated first before the request is served ... whether from cache or not as per standard Nginx configuration.

@anki-code
Author

For v0.24.0, /dash/ should be changed to /dashboard/ in the config from the first message:

		if ($http_referer !~ /dash/){ 
                    #cache only cards on dashboard

@salsakran
Contributor

closing since we have a cache now and it works inside MB rather than via nginx.
