Dashboards not loading when using https #8312

Closed
MrZeroo00 opened this issue Aug 14, 2018 · 15 comments

MrZeroo00 commented Aug 14, 2018

Hi there,

I set up the latest version of Metabase with Nginx as a reverse proxy and PostgreSQL as the database engine in a Dockerized environment. Everything works fine with HTTP.

I recently switched to HTTPS (Let's Encrypt + Nginx in a Dockerized environment; docker-compose.yaml available here: https://dsc.cloud/badr/note-GzzLARseTc.txt). I’m not using Jetty's HTTPS features at all since Metabase is served through Nginx.
Most features work fine; my only problem is that dashboards are not loading. I get this error indefinitely:

> Still Waiting...
> This usually takes an average of 0 seconds.
> (This is a bit long for a dashboard)

Screenshot here.
All the other features that were working before (questions, raw data, etc.) stop working after trying to load a dashboard.

Here's my Nginx conf:

server {
listen      80;
listen [::]:80;
server_name metabase.xxx.com;

location / {
    return 301 https://metabase.xxx.com$request_uri;
}

# for certbot challenges (renewal process)
location ~ /.well-known/acme-challenge {
    allow all;
    root /data/letsencrypt;
}
}

server {
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name metabase.xxx.com;

server_tokens off;

ssl on;

ssl_certificate /etc/letsencrypt/live/metabase.xxx.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/metabase.xxx.com/privkey.pem;

ssl_buffer_size 8k;

ssl_dhparam /etc/ssl/certs/dhparam-2048.pem;

ssl_protocols TLSv1.2 TLSv1.1 TLSv1;
ssl_prefer_server_ciphers on;

ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;

ssl_ecdh_curve secp384r1;
ssl_session_tickets off;

# OCSP stapling
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4;

location / {
    proxy_pass http://metabase:3000;
    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;
}
}

Any ideas? This seems related to the SSL/TLS configuration, since the very same setup works without SSL/TLS.
Thanks!

Configuration details

  • Your browser and the version: Chrome 68.0.3440.106
  • Your operating system: macOS 10.13.6
  • Your databases: Postgres & H2 (sample dataset)
  • Metabase version: 0.30.0
  • Metabase hosting environment: Docker
  • Metabase internal database: Postgres

MrZeroo00 commented Aug 14, 2018

Thanks to jornh for pointing to this thread on Discourse; it helped me fix the issue.

Somehow, switching from enforcing HTTP/2 to HTTP/1.1 fixed the issue:

listen 443 ssl http2;
listen [::]:443 ssl http2;

to

listen 443 ssl;
listen [::]:443 ssl;

I honestly have no idea why this works... I won't close this issue in case you want to get to the bottom of it.

rjurado01 commented Aug 27, 2018

We are using HTTP/1 but the problem persists.
If we remove SSL it works, which is strange.

Metabase conf:

  • Your browser and the version: Chrome 68.0
  • Your operating system: Ubuntu
  • Your databases: MongoDB
  • Metabase version: 0.30.1
  • Metabase hosting environment: Docker
  • Metabase internal database: Postgres

Nginx conf:

server {
  listen 80;
  server_name domain.com;
  return 301 https://domain.com$request_uri;
}

server {
    listen       443 ssl;
    server_name  domain.com;

    ssl on;
    ssl_certificate     /etc/nginx/ssl/domain.com.certs/domain.crt;
    ssl_certificate_key /etc/nginx/ssl/domain.com.certs/domain.key;

    access_log  /var/log/nginx/metabase-access;
    error_log   /var/log/nginx/metabase-error;

    location / {
        proxy_pass http://127.0.0.1:9009;
        proxy_set_header    Host                $host;
        proxy_set_header    X-Real-IP            $remote_addr;
        proxy_set_header    X-Forwarded-For        $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }
}

Docker run command used:

docker run -d --name metabase  \
  -p 127.0.0.1:9009:3000 \
  -v `pwd`/metabase-data:/metabase-data \
  -e "MB_DB_FILE=/metabase-data/metabase.db" \
  metabase/metabase

jornh (Contributor) commented Aug 27, 2018

@rjurado01 please share additional details on your config. Otherwise it's next to impossible to try to help out now that MrZeroo00 got it working.

Does your setup differ in any way from what MrZeroo00 shared above? Can you get closer to that setup?

What are the steps for others to reproduce the problem you experience?

ianmartorell commented Aug 27, 2018

We were having this issue, and disabling HTTP/2 seems to have fixed it; no more hangs for now. This doesn't seem like a great solution, though.

jornh (Contributor) commented Aug 27, 2018

@ianmartorell Fully agree; it's desirable to get it working with HTTP/2 as well. (I also think that's why @MrZeroo00 left the issue open 👍)

Any additional input you can bring to the table with v0.30.1 is helpful in getting to the bottom of the issue and is highly appreciated: browser console errors like #8388 (comment) (which may or may not be related), other debug/logging, or Wireshark dumps highlighting the difference between running with and without HTTP/2.

rjurado01 commented Aug 28, 2018

@jornh I have updated my comment with more info.

Steps to reproduce are the same as @MrZeroo00 explained.
When we load a dashboard or an X-ray, I get this error indefinitely:

> Still Waiting...
> This usually takes an average of 0 seconds.
> (This is a bit long for a dashboard)

And all the other features that were working before (questions, raw data, etc.) stop working after that (we need to restart Metabase).

Maybe it has to do with making several requests at the same time.

oliverxchen commented Aug 29, 2018

We're getting similar behaviour. We'll try disabling HTTP/2, but disabling SSL is not an option. If X-rays are really the root cause of the connection issues, would #8411 be easier to address?

mpiacenza commented Aug 29, 2018

I'm seeing the same issue. I disabled HTTP/2 but still see a problem. It seems to be hanging Jetty threads within the JVM; each call to the X-ray dashboard hangs a few more. I was able to get a thread dump, and believe the hanging threads look like this:

"qtp49705985-24" #24 prio=5 os_prio=0 tid=0x0000557e6df44800 nid=0x2d waiting on condition [0x00007f5307b45000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000007a8589c68> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at clojure.core$promise$reify__8144.deref(core.clj:7029)
	at clojure.core$deref.invokeStatic(core.clj:2312)
	at clojure.core$deref.invoke(core.clj:2298)
	at clojure.core.async$_LT__BANG__BANG_.invokeStatic(async.clj:115)
	at clojure.core.async$_LT__BANG__BANG_.invoke(async.clj:107)
	at metabase.api.common$cancellable_json_response.invokeStatic(common.clj:472)
	at metabase.api.common$cancellable_json_response.invoke(common.clj:463)
	at metabase.api.dataset$fn__43466.invokeStatic(dataset.clj:43)
	at metabase.api.dataset$fn__43466.invoke(dataset.clj:34)
	at compojure.core$make_route$fn__14528.invoke(core.clj:135)
	at compojure.core$wrap_route_middleware$fn__14521.invoke(core.clj:122)
	at compojure.core$wrap_route_info$fn__14525.invoke(core.clj:126)
	at compojure.core$if_route$fn__14477.invoke(core.clj:45)
	at compojure.core$if_method$fn__14467.invoke(core.clj:27)
	at compojure.core$routing$fn__14535.invoke(core.clj:151)
	at clojure.core$some.invokeStatic(core.clj:2693)
	at clojure.core$some.invoke(core.clj:2684)
	at compojure.core$routing.invokeStatic(core.clj:151)
	at compojure.core$routing.doInvoke(core.clj:148)
	at clojure.lang.RestFn.applyTo(RestFn.java:139)
	at clojure.core$apply.invokeStatic(core.clj:659)
	at clojure.core$apply.invoke(core.clj:652)
	at compojure.core$routes$fn__14539.invoke(core.clj:156)
	at metabase.middleware$enforce_authentication$fn__55250.invoke(middleware.clj:116)
	at compojure.core$routing$fn__14535.invoke(core.clj:151)
	at clojure.core$some.invokeStatic(core.clj:2693)
	at clojure.core$some.invoke(core.clj:2684)
	at compojure.core$routing.invokeStatic(core.clj:151)
	at compojure.core$routing.doInvoke(core.clj:148)
	at clojure.lang.RestFn.invoke(RestFn.java:423)
	at metabase.api.routes$fn__55398.invokeStatic(routes.clj:65)
	at metabase.api.routes$fn__55398.invoke(routes.clj:65)
	at compojure.core$if_context$fn__14559.invoke(core.clj:220)
	at compojure.core$routing$fn__14535.invoke(core.clj:151)
	at clojure.core$some.invokeStatic(core.clj:2693)
	at clojure.core$some.invoke(core.clj:2684)
	at compojure.core$routing.invokeStatic(core.clj:151)
	at compojure.core$routing.doInvoke(core.clj:148)
	at clojure.lang.RestFn.applyTo(RestFn.java:139)
	at clojure.core$apply.invokeStatic(core.clj:659)
	at clojure.core$apply.invoke(core.clj:652)
	at compojure.core$routes$fn__14539.invoke(core.clj:156)
	at clojure.lang.AFn.applyToHelper(AFn.java:154)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:657)
	at clojure.core$apply.invoke(core.clj:652)
	at metabase.routes$fn__55477$fn__55478.doInvoke(routes.clj:108)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at compojure.core$routing$fn__14535.invoke(core.clj:151)
	at clojure.core$some.invokeStatic(core.clj:2693)
	at clojure.core$some.invoke(core.clj:2684)
	at compojure.core$routing.invokeStatic(core.clj:151)
	at compojure.core$routing.doInvoke(core.clj:148)
	at clojure.lang.RestFn.invoke(RestFn.java:423)
	at metabase.routes$fn__55477.invokeStatic(routes.clj:103)
	at metabase.routes$fn__55477.invoke(routes.clj:103)
	at compojure.core$if_context$fn__14559.invoke(core.clj:220)
	at compojure.core$routing$fn__14535.invoke(core.clj:151)
	at clojure.core$some.invokeStatic(core.clj:2693)
	at clojure.core$some.invoke(core.clj:2684)
	at compojure.core$routing.invokeStatic(core.clj:151)
	at compojure.core$routing.doInvoke(core.clj:148)
	at clojure.lang.RestFn.applyTo(RestFn.java:139)
	at clojure.core$apply.invokeStatic(core.clj:659)
	at clojure.core$apply.invoke(core.clj:652)
	at compojure.core$routes$fn__14539.invoke(core.clj:156)
	at clojure.lang.Var.invoke(Var.java:381)
	at metabase.middleware$catch_api_exceptions$fn__55379.invoke(middleware.clj:424)
	at metabase.middleware$log_api_call$fn__55356$fn__55358.invoke(middleware.clj:351)
	at toucan.db$_do_with_call_counting.invokeStatic(db.clj:203)
	at toucan.db$_do_with_call_counting.invoke(db.clj:196)
	at metabase.middleware$log_api_call$fn__55356.invoke(middleware.clj:350)
	at metabase.middleware$add_security_headers$fn__55304.invoke(middleware.clj:253)
	at ring.middleware.json$wrap_json_body$fn__60455.invoke(json.clj:44)
	at metabase.core$wrap_streamed_json_response$fn__60666.invoke(core.clj:67)
	at ring.middleware.keyword_params$wrap_keyword_params$fn__60512.invoke(keyword_params.clj:36)
	at ring.middleware.params$wrap_params$fn__60560.invoke(params.clj:67)
	at metabase.middleware$bind_current_user$fn__55255.invoke(middleware.clj:140)
	at clojure.core$comp$fn__5529.invoke(core.clj:2561)
	at clojure.core$comp$fn__5529.invoke(core.clj:2561)
	at clojure.core$comp$fn__5529.invoke(core.clj:2561)
	at metabase.middleware$maybe_set_site_url$fn__55308.invoke(middleware.clj:277)
	at puppetlabs.i18n.core$locale_negotiator$fn__124.invoke(core.clj:357)
	at ring.middleware.cookies$wrap_cookies$fn__60394.invoke(cookies.clj:175)
	at ring.middleware.session$wrap_session$fn__60651.invoke(session.clj:108)
	at ring.middleware.gzip$wrap_gzip$fn__60425.invoke(gzip.clj:65)
	at ring.adapter.jetty$proxy_handler$fn__60253.invoke(jetty.clj:25)
	at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$ff19274a.handle(Unknown Source)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:499)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
	at java.lang.Thread.run(Thread.java:745)

bmariusz commented Aug 29, 2018

I have the same issue using Kubernetes and an Ingress with TLS.
Metabase v0.30.1.

Do you know which previous version will work with HTTPS?

jornh (Contributor) commented Aug 30, 2018

> Do you know which previous version will work with HTTPS?

Not 100% sure, but most people I’ve seen reporting this problem refer to v0.30.x as where they see the problems. So I’ll assume you only need to step back to v0.29.3...

bmariusz commented Aug 30, 2018

Thank you @jornh. It worked like a charm ;)

senior (Member) commented Aug 30, 2018

I've been able to confirm what is being observed here. I don't have a fix yet, but I have narrowed this down quite a bit. I have found that the following three things are needed to reproduce this:

  1. NGINX with HTTP/2 and SSL termination, reverse proxying to Metabase
  2. A dashboard with a lot of cards (11 in my example)
  3. The HTTP "keepalive" code that Metabase is using to notice when a user has abandoned a query (so we can cancel it)

When those three things are true, it hangs. You are more likely to hit this issue with newer versions (e.g. 0.30.1), as the new X-Ray features create dashboards with more cards. I believe you would hit this in versions prior to 0.30.x if you had a dashboard with enough cards, because the query cancellation code has been in place since 0.28.2.

I'll report back here when I have a fix, but if you remove the HTTP/2 connector, the issue goes away. For testing I removed the query cancellation code (and kept the HTTP/2 connector in NGINX), and the issue goes away as well. Dashboards with smaller numbers of cards are fine too. The problem seems to be an interaction between the HTTP/2 connector, the proxy, and how we're figuring out whether the client is still connected, or maybe how those results are being streamed back.

senior self-assigned this Aug 30, 2018

senior added a commit that referenced this issue Aug 30, 2018

core.async take/put ops in `go` blocks use single `!` functions
When using the core.async `alts` and `>` functions in `go` blocks, they should use
the single `!` versions (i.e. `alts!` and `>!`) instead of the `!!`
versions. When using HTTP/2 and a reverse proxy, the multiplexed HTTP/2
requests get issued to Metabase not as a single connection, but as
multiple requests at the same time. The combination of those
simultaneous requests and the incorrect usage of the `core.async`
functions leads to the query responses not being delivered.

Fixes #8312
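
For readers unfamiliar with the distinction the commit message draws, the following is a minimal core.async sketch of the failure mode; it is an illustration only, not Metabase's actual code, and the handler names are made up. `go` bodies run on a small fixed dispatch pool (8 threads by default), so a blocking take (`<!!`) inside a `go` block ties up a whole pool thread until a value arrives, while a parking take (`<!`) releases the thread. With enough simultaneous requests, which is exactly what HTTP/2 multiplexing through the proxy produces, the blocking variants can exhaust the pool, so the blocks that would deliver query responses never get to run.

(require '[clojure.core.async :refer [go <! <!! timeout]])

;; BAD: a blocking take inside a go block occupies one of the fixed
;; dispatch-pool threads until the value arrives. Enough simultaneous
;; requests exhaust the pool, so the go blocks that would deliver results
;; never get scheduled and callers wait forever.
(defn handle-query-blocking [result-chan]
  (go (<!! result-chan)))

;; GOOD: the parking take only suspends this go block; the dispatch thread
;; is released, so any number of in-flight requests can wait concurrently.
(defn handle-query-parking [result-chan]
  (go (<! result-chan)))

(comment
  ;; Simulate ~30 "dashboard card" requests whose query results arrive after
  ;; one second. The parking version completes promptly; the blocking version
  ;; stalls once the dispatch pool is saturated.
  (doseq [_ (range 30)]
    (handle-query-parking (timeout 1000))))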

senior added the Bug label Aug 30, 2018

senior added this to the 0.30.2 milestone Aug 30, 2018

senior (Member) commented Aug 30, 2018

I just opened #8428, which fixed the issue for me. It would be great to get confirmation from those who hit this issue that the fix works. The goal is to get the fix into our next release (0.30.2).

oliverxchen commented Sep 3, 2018

@senior, we tried this and it seems to have fixed the issue. X-rays are displaying, and Metabase is quicker and hasn't frozen up. Thank you very much!

senior (Member) commented Sep 4, 2018

@oliverxchen thanks for testing out the fix. I just merged it and it will be included in the 0.30.2 release.
