MongoDB Atlas reports too many connections to the database #8024

Open
psonnera opened this issue May 28, 2023 · 10 comments

Comments

@psonnera

If you need support for Nightscout, PLEASE DO NOT FILE A TICKET HERE
For support, please post a question to the "CGM in The Cloud" group in Facebook
(https://www.facebook.com/groups/cgminthecloud) or visit the WeAreNotWaiting Discord at https://discord.gg/zg7CvCQ

Describe the bug
We have several reports of AAPS/Loop users receiving an email from MongoDB saying they are using more than 500 connections, which the M0 tier doesn't support. As a result, Nightscout will crash.

To Reproduce
Not known yet. Users confirm they didn't change the number of uploaders/downloaders.

Expected behavior
500 connections seem impossible for Nightscout

Screenshots

image
image

Your setup information

  • Nightscout 14.2.x
  • AAPS/Loop

Additional context
Not known yet

(Will edit and update with new information. I'm not using Atlas, so I haven't seen the issue myself.)

@psonnera
Author

The same issue has been seen with Railway
image

@sulkaharo
Member

This is a pretty bizarre issue. Nightscout doesn't do any connection management, on the assumption that the MongoDB driver pools connections and would never need more than a few concurrent ones, given that we reuse the connection object. This makes me wonder whether there's been a driver update that requires code changes, a bug in the MongoDB driver, or a bug in Nightscout that causes runaway connections / connection leaks. The relevant code in NS hasn't changed, and I have never observed that many connections being used. @bewest ideas?
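
To make the "reuse the connection object" assumption concrete, here is a minimal sketch of the pattern. It is not Nightscout's actual code and it does not use the MongoDB driver: `FakeClient`, `shareConnection` and `getClient` are hypothetical names, and the connect function is a stand-in. The point is only that every caller gets the same client, so the number of connect calls stays at one no matter how many requests arrive.

```typescript
// Hypothetical stand-in for a database client; not the MongoDB driver's type.
interface FakeClient {
  id: number;
}

// Wraps a connect function so it runs at most once per process.
// Every caller receives the same promise, and therefore the same client.
function shareConnection<T>(connect: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending !== undefined) {
      return pending;
    }
    const attempt = connect().catch((err) => {
      // Forget a failed attempt so a later call can retry.
      pending = undefined;
      throw err;
    });
    pending = attempt;
    return attempt;
  };
}

// Count how many times the (fake) connect function is actually invoked.
let connectCalls = 0;
const getClient = shareConnection<FakeClient>(() => {
  connectCalls += 1;
  return Promise.resolve({ id: connectCalls });
});

// Two callers, one underlying connect call.
const first = getClient();
const second = getClient();
```

A leak would look like the opposite of this: some code path that calls the connect function per request or per event instead of going through the shared accessor.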

@bewest
Member

bewest commented May 28, 2023

Similar outlook, @sulkaharo. Maybe the MongoDB version? Are they using Mongo 6 or some version that breaks backwards compatibility?

@bewest
Member

bewest commented May 28, 2023

What is the time period for these connections and how does Atlas measure them? It's possible that if the server is shedding connections, the client will have to reconnect. It's also possible that if Nightscout is crashing, perhaps due to an AAPS data issue, restarting many times will burn through more connections. It's worth looking carefully to see whether the crashes are causing the connections to increase or the other way around. In general, in the case of a DB error, Nightscout should show a detailed error page, not crash. However, crashing will generate 8 new connections when the hosting provider restarts the process. What's in the crash logs?

@jamesthurlow

@bewest version 6 I think. Just trying to figure out how to get access to crash logs.

image

@sulkaharo
Member

Isn't this concurrent connections though? If so, Nightscout crashing can't be the culprit, as crashing should release the connections. I recently observed my instance getting into a state where some event that triggers the data reload was firing very often. If that happens in a runaway way, it could cause a lot of connections to be used. I'll add a constraint on how often that can happen.

@sulkaharo
Member

Right, so bootevent already implements debounce, but it only caps the data updates at once every 5 seconds, which I suspect is too fast for Atlas. Depending on how they defer the execution of queries, I guess this could cause 500 connections to be consumed if something was causing the data update event to be triggered frequently. I suggest we raise that DB load debounce to 15 seconds and see if that helps. With the current server implementation, this load is theoretically not needed at all.
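
As a rough illustration of what raising the cap from 5 to 15 seconds means, here is a minimal sketch of a minimum-interval gate. This is an assumption-laden simplification, not the bootevent code: it is a leading-edge rate cap rather than a trailing debounce, and `capRate`, `reloadData` and the simulated clock are hypothetical names used only for this example.

```typescript
// Returns a wrapper that runs `action` at most once per `minIntervalMs`.
// Calls arriving too soon are dropped, and the wrapper reports false.
function capRate(
  action: () => void,
  minIntervalMs: number,
  now: () => number = Date.now
): () => boolean {
  let lastRun = Number.NEGATIVE_INFINITY;
  return () => {
    const t = now();
    if (t - lastRun < minIntervalMs) {
      return false; // too soon since the last run: skip this trigger
    }
    lastRun = t;
    action();
    return true;
  };
}

// Simulated clock so the example runs instantly and deterministically.
let fakeNow = 0;
let reloads = 0;
const reloadData = capRate(() => { reloads += 1; }, 15000, () => fakeNow);

// A runaway trigger: one data update event per second for a minute.
for (let s = 0; s < 60; s++) {
  fakeNow = s * 1000;
  reloadData();
}
// reloads is now 4 (the action ran at 0 s, 15 s, 30 s and 45 s)
```

With a 5000 ms interval the same loop would run the action 12 times, so the change cuts the worst-case reload rate to a third.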

The other potential culprit is the ddata_at REST endpoint, which has no cap on how frequently it can be called, so it would be very easy to take an instance down by calling it at a rapid rate.
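
One common way to cap how often an endpoint can be called is a token bucket. The sketch below is only an illustration of that technique under assumed numbers (a burst of 5, refilling 1 request per second); it is not an existing Nightscout feature, and `TokenBucket` and `tryTake` are hypothetical names. A request that gets `false` would typically be answered with HTTP 429 instead of hitting the database.

```typescript
// Token bucket: allows short bursts up to `capacity`, then limits the
// sustained rate to `refillPerSecond` requests per second.
class TokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so the first burst is allowed
    this.lastRefill = now();
  }

  // Returns true if the request may proceed, false if it should be rejected.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSeconds = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = t;
    if (this.tokens < 1) {
      return false;
    }
    this.tokens -= 1;
    return true;
  }
}

// Simulated clock so the example is deterministic.
let bucketClock = 0;
const bucket = new TokenBucket(5, 1, () => bucketClock);

// 20 requests arriving at the same instant: only the burst of 5 gets through.
let allowed = 0;
for (let i = 0; i < 20; i++) {
  if (bucket.tryTake()) {
    allowed += 1;
  }
}
```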

@bewest
Member

bewest commented May 31, 2023

@jamesthurlow, the crash logs will be with whichever provider hosts your Nightscout: Heroku, Railway, Fly.io, etc.
Before this issue, had you hit the Atlas size quotas and then deleted some data?
Maybe we can add some instrumentation to the pool to see if we can monitor the issue ourselves?
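
A minimal sketch of what such instrumentation could look like follows. It assumes the client emits connection pool monitoring events named `connectionCreated`, `connectionClosed`, `connectionCheckedOut` and `connectionCheckedIn`, as recent versions of the MongoDB Node.js driver do; whether the driver version bundled with Nightscout exposes them needs to be verified. `PoolStats`, `PoolEventSource` and `FakeSource` are hypothetical names, and the fake emitter exists only so the sketch runs without a database.

```typescript
// Minimal shape of anything that emits named events. A real driver client
// may need a cast to fit this interface (assumption, not verified here).
interface PoolEventSource {
  on(event: string, listener: () => void): unknown;
}

// Keeps running counts derived from pool events.
class PoolStats {
  open = 0;       // connections currently open
  checkedOut = 0; // connections currently in use
  peakOpen = 0;   // highest value `open` has reached

  attach(source: PoolEventSource): void {
    source.on("connectionCreated", () => {
      this.open += 1;
      this.peakOpen = Math.max(this.peakOpen, this.open);
    });
    source.on("connectionClosed", () => { this.open -= 1; });
    source.on("connectionCheckedOut", () => { this.checkedOut += 1; });
    source.on("connectionCheckedIn", () => { this.checkedOut -= 1; });
  }
}

// Tiny fake emitter standing in for the database client.
class FakeSource implements PoolEventSource {
  private listeners = new Map<string, Array<() => void>>();

  on(event: string, listener: () => void): this {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
    return this;
  }

  emit(event: string): void {
    for (const listener of this.listeners.get(event) ?? []) {
      listener();
    }
  }
}

const poolStats = new PoolStats();
const fakeSource = new FakeSource();
poolStats.attach(fakeSource);

// Two connections open, one is used and returned, one closes.
const simulatedEvents = [
  "connectionCreated",
  "connectionCreated",
  "connectionCheckedOut",
  "connectionCheckedIn",
  "connectionClosed",
];
for (const eventName of simulatedEvents) {
  fakeSource.emit(eventName);
}
```

Logging `peakOpen` periodically would show whether a single Nightscout process ever approaches the number Atlas reports, or whether the connections come from somewhere else.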

@jamesthurlow

Hi @bewest - I have enabled access for @psonnera. I'm not precious about my account, so if someone else needs to dive in, let me know their email address.

Quite some time ago I did hit quota limits. I deleted a whole load of data and the problem went away. That must have been 6 months ago.

Just to add I am using AAPS - not sure if that is a factor or not.

@ninelore

ninelore commented Mar 3, 2024

Hello everyone, this issue recently appeared after I upgraded to 15.0.2 on fly.io.

Connections went down in Atlas after I shut down my Nightscout instance for testing
image

5 participants