InfluxDB 0.9 behind ELB results in 503s #3102

Closed
pilt opened this issue Jun 23, 2015 · 4 comments

pilt commented Jun 23, 2015

We just upgraded from InfluxDB 0.8 to 0.9 and are now seeing many "503 Service Unavailable: Back-end server is at capacity" responses from AWS load balancers.

Below is a graph from CloudWatch. 0.9 was deployed around 08:00:

image

CPU load on the InfluxDB instance is higher:

image

The health check is working. My guess is that it has to do with instances refusing socket connections.

The 503s happen for both writes and reads. We use Grafana for monitoring. When Grafana makes a batch of requests to update graphs, usually one or two requests succeed and the rest fail with 503s.

Is this a bug or should we simply avoid using ELB? We use it primarily because of Route53's aliasing and for SSL termination.

pilt commented Jun 26, 2015

I built influxd from source and had it save pprof data (pprof.StartCPUProfile(f)).
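
For readers who want to capture a similar profile, here is a minimal sketch of wrapping some work in `pprof.StartCPUProfile` from Go's standard `runtime/pprof` package. Where exactly the call was placed inside influxd is not stated above, so the helper names `profileCPU` and `busyWork` and the output path are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime/pprof"
)

// profileCPU records a CPU profile to path while work runs.
// Hypothetical helper for illustration; it is not part of influxd.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run last-in first-out, so the profile is
	// stopped and flushed before the file is closed.
	defer pprof.StopCPUProfile()

	work()
	return nil
}

// busyWork is a CPU-bound stand-in for real request handling.
func busyWork(n int) uint64 {
	var sum uint64
	for i := 0; i < n; i++ {
		sum += uint64(i) * uint64(i)
	}
	return sum
}

func main() {
	path := filepath.Join(os.TempDir(), "cpu.prof")
	var result uint64
	err := profileCPU(path, func() { result = busyWork(50000000) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote CPU profile to", path, "work result:", result)
}
```

The resulting file can then be opened with `go tool pprof <binary> <profile>`, where commands such as `top30` and `top30 -cum` produce listings like the ones below.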

top30:

(pprof) top30
206.07s of 207.25s total (99.43%)
Dropped 48 nodes (cum <= 1.04s)
      flat  flat%   sum%        cum   cum%
   194.29s 93.75% 93.75%    194.29s 93.75%  golang.org/x/crypto/blowfish.encryptBlock
     9.71s  4.69% 98.43%    203.72s 98.30%  golang.org/x/crypto/blowfish.ExpandKey
     2.04s  0.98% 99.42%      2.04s  0.98%  runtime.lostProfileData
     0.03s 0.014% 99.43%    204.12s 98.49%  golang.org/x/crypto/bcrypt.expensiveBlowfishSetup
         0     0% 99.43%    204.16s 98.51%  github.com/bmizerany/pat.(*PatternServeMux).ServeHTTP
         0     0% 99.43%    204.13s 98.49%  github.com/influxdb/influxdb/meta.(*Store).Authenticate
         0     0% 99.43%    204.13s 98.49%  github.com/influxdb/influxdb/meta.(*Store).read
         0     0% 99.43%    204.13s 98.49%  github.com/influxdb/influxdb/meta.func·021
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.(*Handler).ServeHTTP
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·001
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·002
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·003
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·004
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·005
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·006
         0     0% 99.43%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·007
         0     0% 99.43%    204.13s 98.49%  golang.org/x/crypto/bcrypt.CompareHashAndPassword
         0     0% 99.43%    204.13s 98.49%  golang.org/x/crypto/bcrypt.bcrypt
         0     0% 99.43%    204.16s 98.51%  net/http.(*conn).serve
         0     0% 99.43%    204.16s 98.51%  net/http.HandlerFunc.ServeHTTP
         0     0% 99.43%    204.16s 98.51%  net/http.serverHandler.ServeHTTP
         0     0% 99.43%    204.16s 98.51%  runtime.goexit

top30 -cum:

(pprof) top30 -cum
206.07s of 207.25s total (99.43%)
Dropped 48 nodes (cum <= 1.04s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%    204.16s 98.51%  github.com/bmizerany/pat.(*PatternServeMux).ServeHTTP
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.(*Handler).ServeHTTP
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·001
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·002
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·003
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·004
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·005
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·006
         0     0%     0%    204.16s 98.51%  github.com/influxdb/influxdb/services/httpd.func·007
         0     0%     0%    204.16s 98.51%  net/http.(*conn).serve
         0     0%     0%    204.16s 98.51%  net/http.HandlerFunc.ServeHTTP
         0     0%     0%    204.16s 98.51%  net/http.serverHandler.ServeHTTP
         0     0%     0%    204.16s 98.51%  runtime.goexit
         0     0%     0%    204.13s 98.49%  github.com/influxdb/influxdb/meta.(*Store).Authenticate
         0     0%     0%    204.13s 98.49%  github.com/influxdb/influxdb/meta.(*Store).read
         0     0%     0%    204.13s 98.49%  github.com/influxdb/influxdb/meta.func·021
         0     0%     0%    204.13s 98.49%  golang.org/x/crypto/bcrypt.CompareHashAndPassword
         0     0%     0%    204.13s 98.49%  golang.org/x/crypto/bcrypt.bcrypt
     0.03s 0.014% 0.014%    204.12s 98.49%  golang.org/x/crypto/bcrypt.expensiveBlowfishSetup
     9.71s  4.69%  4.70%    203.72s 98.30%  golang.org/x/crypto/blowfish.ExpandKey
   194.29s 93.75% 98.45%    194.29s 93.75%  golang.org/x/crypto/blowfish.encryptBlock
     2.04s  0.98% 99.43%      2.04s  0.98%  runtime.lostProfileData

[gif: out]

pauldix commented Jun 26, 2015

Looks like we're not caching the authenticated info so it's bcrypting the password on every request, which is killing the CPU. Will prioritize this one.
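
One way to avoid running bcrypt on every request is to remember a cheap, salted digest of a password that has already passed the expensive check. The following is only a sketch of that idea under stated assumptions, not necessarily what the maintainers implemented: `AuthCache`, `CompareFunc`, `ErrMismatch`, and the fake comparison function in `main` are all hypothetical names. `CompareFunc` has the same shape as `bcrypt.CompareHashAndPassword`, so the real function could be passed in directly.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
	"sync"
)

// ErrMismatch is returned by the fake slow comparison in this sketch.
var ErrMismatch = errors.New("password mismatch")

// CompareFunc mirrors the signature of bcrypt.CompareHashAndPassword.
type CompareFunc func(hashedPassword, password []byte) error

// cacheEntry remembers which stored hash was verified and a salted
// SHA-256 digest of the password that passed the slow check.
type cacheEntry struct {
	storedHash []byte
	salt       []byte
	sum        [sha256.Size]byte
}

// AuthCache is a hypothetical per-user cache of successful authentications.
type AuthCache struct {
	mu      sync.RWMutex
	compare CompareFunc
	entries map[string]cacheEntry
}

func NewAuthCache(compare CompareFunc) *AuthCache {
	return &AuthCache{compare: compare, entries: make(map[string]cacheEntry)}
}

func saltedSum(salt []byte, password string) [sha256.Size]byte {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(password))
	var out [sha256.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Authenticate takes the fast path when the same password was already
// verified against the same stored hash; otherwise it runs the slow compare.
func (c *AuthCache) Authenticate(user string, storedHash []byte, password string) error {
	c.mu.RLock()
	e, ok := c.entries[user]
	c.mu.RUnlock()
	// A changed stored hash (for example after a password change) skips the cache.
	if ok && bytes.Equal(e.storedHash, storedHash) {
		sum := saltedSum(e.salt, password)
		if subtle.ConstantTimeCompare(sum[:], e.sum[:]) == 1 {
			return nil
		}
	}
	if err := c.compare(storedHash, []byte(password)); err != nil {
		return err
	}
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return err
	}
	entry := cacheEntry{
		storedHash: append([]byte(nil), storedHash...),
		salt:       salt,
		sum:        saltedSum(salt, password),
	}
	c.mu.Lock()
	c.entries[user] = entry
	c.mu.Unlock()
	return nil
}

func main() {
	slowCalls := 0
	// fakeCompare stands in for bcrypt.CompareHashAndPassword in this sketch.
	fakeCompare := func(hashed, password []byte) error {
		slowCalls++
		if bytes.Equal(hashed, append([]byte("hash:"), password...)) {
			return nil
		}
		return ErrMismatch
	}
	cache := NewAuthCache(fakeCompare)
	stored := []byte("hash:secret")
	for i := 0; i < 5; i++ {
		if err := cache.Authenticate("grafana", stored, "secret"); err != nil {
			fmt.Println("auth failed:", err)
			return
		}
	}
	fmt.Println("requests: 5, slow comparisons:", slowCalls)
}
```

In this demo the slow comparison runs once for five requests. The trade-off is that a salted SHA-256 digest held in memory is much cheaper to brute-force than a bcrypt hash, so a cache like this should stay in process memory only and never be persisted.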

pauldix commented Jun 26, 2015

Thanks for the additional info and investigation btw!

dgnorton self-assigned this Jun 26, 2015

pilt commented Jun 26, 2015

Glad to help make an awesome product even better!

dgnorton added a commit that referenced this issue Jun 26, 2015
dgnorton added a commit that referenced this issue Jun 26, 2015
dgnorton added a commit that referenced this issue Jun 29, 2015
dgnorton added a commit that referenced this issue Jun 29, 2015
dgnorton added a commit that referenced this issue Jun 29, 2015
dgnorton added a commit that referenced this issue Jun 30, 2015
dgnorton added a commit that referenced this issue Jun 30, 2015
otoolep added a commit that referenced this issue Jun 30, 2015
dgnorton removed the review label Jun 30, 2015
dgnorton added a commit that referenced this issue Jun 30, 2015
dgnorton added a commit that referenced this issue Jun 30, 2015
dgnorton added a commit that referenced this issue Jun 30, 2015