Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request responses getting swapped #2001

Closed
vdhpieter opened this issue Sep 27, 2019 · 15 comments
Closed

Request responses getting swapped #2001

vdhpieter opened this issue Sep 27, 2019 · 15 comments

Comments

@vdhpieter
Copy link

Describe the bug
We have a really weird bug that only happens very rarely.

When 2 requests happen really close to each other (we saw occurrences with only 0,001s in between) that are handled on the same container we noticed that request response get swapped. A small example:
Request A happens at 12:00:00.000
Request B happens at 12:00:00.001

Request A should get response 1 and Request B should get response 2.
In our logs we see that it happens very rarely that Request A receives response 2 and request B receives response 1.

We have not more debug info then this at the moment but we'll keep monitoring and update this issue. The issue has a very high priority for us because we already saw the case where 1 request was a public endpoint and the other request was protected. This can potentially leak personal data to unauthorized users.

Puma config:

# config/puma.rb in our rails app
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }.to_i
threads threads_count, threads_count

port        ENV.fetch("PORT") { 3000 }

environment ENV.fetch("RAILS_ENV") { "development" }

(['test', 'development'].include?(ENV.fetch("RAILS_ENV", "development"))) ? (workers 0) : (workers ENV.fetch("WEB_CONCURRENCY") { 2 }.to_i)

preload_app!

before_fork do
  Rails.configuration.split_factory.stop!
end

on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)

  Rails.configuration.split_factory.resume!
end

plugin :tmp_restart


We start our rails server using this command:

bundle exec puma

To Reproduce

Not possible at the moment.

Expected behavior

Request A gets response 1 and Request B gets response 2.

Desktop (please complete the following information):

  • OS: Docker container with the ruby:2.6.2 image running on a AWS ecs cluster
  • Puma Version 4.1.1 (looks like it started there, but not entirely sure)
@dentarg
Copy link
Member

dentarg commented Sep 27, 2019

Is your only observation of this your logs? Or have you done the request and received a response that you shouldn't get for that type of request?

@nateberkopec
Copy link
Member

You didn't mention - what's the Rack app? Is this Rails?

It really sounds like you have a threading bug in your application, which is much more likely if you're not using Rails.

I'm also interested in how you know who's receiving which response. We'll need more details there. You're obviously not logging the entire response, so how do you know who is receiving what? Does Request A receive all of Response B or just some parts of Response B?

@vdhpieter
Copy link
Author

@dentarg no not only logs, we noticed it the first time when a colleague noticed that he saw the response of a server-side request on the site instead of the async rendered view that should be there

@nateberkopec It is a rails app. we're indeed not logging the response but from what we saw the entire responses where swapped

I know it's not a lot of information but the bug is also very rare. We have millions of requests every hour en we only saw proof of the bug 3 times in 2 days...

We reverted to puma 4.1.0 and haven't seen any occurrences since

@nateberkopec
Copy link
Member

I’ll take another look at what’s changed lately but I don’t think much of the Reactor/Client code that governs this has changed in a very long time. If I can’t find anything obvious from the last few minor versions I’ll have to close, as it’s more likely something weird is happening in your end than ours. Could even be a threading bug somewhere else in your request chain - a proxy for example.

Sent with GitHawk

@p8
Copy link
Contributor

p8 commented Sep 30, 2019

@vdhpieter Are you using Thread.current? What are you using for the async rendering?

@nateberkopec
Copy link
Member

So I just checked - we've not changed anything in the response pipeline since 4.1.0. I think you've got a bug in the app or somewhere else in the chain. Good luck!

@vdhpieter
Copy link
Author

@nateberkopec thanks for checking anyway. We rolled back to version 4.1 and haven't seen the issue since.

@p8 The async rendering is just the frontend calling another API endpoint after the initial load. We don't use Thread.current

@nateberkopec
Copy link
Member

We're making more and more test classes use parallelize_me!, which starts each test case in a new thread. That will help us a bit to find weird threading bugs. Unfortunately there's just not much to go on in this particular instance 😞

@henningms
Copy link

Hi there!

We recently discovered this issue in production, and it was quite a severe issue so we had to turn off our system due to privacy concerns, and have sadly switched to Unicorn for the time being.

We were able to reproduce this in our production environment on demand by having two clients refresh the same endpoint until the request/response time matched within milliseconds. Everything seems fine in the API until the response is sent, then it is somehow swapped. In our case, one user got the profile information of another user.

Our API is also running in AWS EKS clusters. Coincidence?

I've been trying to create a repro project this weekend but I have not been successful in reproducing it locally yet.

@nateberkopec
Copy link
Member

A repro project would be amazing. Puma’s response pipeline is pretty battle tested so I’d be surprised if it’s our fault but you never know.

Did you try Puma in production with a thread setting of 1?

@henningms
Copy link

henningms commented Feb 16, 2020

It could be our code, could be Puma, could be something else interfering, but we noticed the exact same thing as the OP. I logged everything that happened up until the rendering of the response and it all seemed normal, but one client ended up with a response not intended for them. We're also running on AWS EKS, dunno if that would be a factor.

Will stress test the same code with Unicorn come Monday, and I will try to test in production with thread setting of 1.

@henningms
Copy link

After digging through everything it might seem to be a custom middleware that I was unaware of that leaked. More testing follows tomorrow. Will post my findings here.

@henningms
Copy link

Hi again,

Conclusion: We had a leaky middleware in our system, and everything is good again. Thanks!

@luispcosta
Copy link

luispcosta commented May 28, 2021

@henningms I'm having this issue as well. We've replicated with puma and thin. Can you give any insights into what middleware you had that was causing the issue?

@synth
Copy link

synth commented Sep 19, 2022

For those future visitors who stumble upon this issue (and may you have stumbled upon this without too much hair pulling):

Instance variables in rack middleware are bad. Don't do them. Freeze your middleware or dup your ivars. I've been coding Rails for 18 years (since the beginning, yo) and I never knew (or needed to know) Middleware should be frozen. Maybe I'm still a newb?

Rack middleware is initialized only on the first request of the process. So any instance variables acts like class variables, and modifying them in call() isn’t thread-safe.

Reference:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants