-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MongoDB reporting "AuthenticationAbandoned: Overridden by new authentication session" for a PHP client #1506
Comments
I traced the introduction of AuthenticationAbandoned(337) back to mongodb/mongo@5b83d4c and SERVER-52863.
If you'd like to further identify the connections, you could look into using the I'm not familiar with istio, but as a proxy is it possible that the driver-to-proxy connection is terminating while the proxy-to-server connection is not? In that case, we might see the driver reestablishing a connection and sending the initial auth handshake (as it should); however, the proxy only ends up passing the auth handshake over to the server, where it's unexpected due to there already being an existing auth session. I'd also like to confirm that there's nothing in istio that would result in multiple driver connections being multiplexed across a single server connection. In other words, can we safely assume that "conn704" in the logs above corresponds to exactly one instance of the PHP driver? If the logs are split across several minutes, I think the likely suspect would be whatever process is watching a change stream (e.g. If you're willing to debug that process further, you could register an SDAM logger within the process and have that log output (to a stream, file, or logging service, whichever you prefer). Using the sdam_logger.php example script as a base, you can build upon the subscriber class and replace the |
Hi @jmikola, thank you for your quick and helpful advice, it helped me a lot to progress on the issue! I was able to tag all the connections with the
I then deployed a new pod with the same PHP version and configuration (e.g. the same MongoDB library version, using the same MongoDB servers, the same credentials and the same database) and tested with the following script: <?php
require_once(__DIR__ . '/vendor/autoload.php');
$connURI = getenv('APP_MONGO_CONN_URI');
$manager = new MongoDB\Driver\Manager($connURI);
$cmd = new MongoDB\Driver\Command(['ping' => true]);
$res = $manager->executeCommand(getenv('APP_MONGO_DB'), $cmd);
// I tried the same also with the MongoDB library
// $client = new MongoDB\Client($connURI);
// $client->selectDatabase(getenv('APP_MONGO_DB'))->command(['ping' => true]);
echo "OK", PHP_EOL; In this case, all went well, here are the logs from both MongoDB servers. The first server, acting as SECONDARY:
The second server, acting as PRIMARY:
TLDR: the first (SECONDARY) server closed the connection with AuthenticationAbandoned: Authentication session abandoned, client has likely disconnected, while the second (PRIMARY) server closed the connection cleanly. I am not sure whether the way how the first server closes connections is correct, but at least the connections in both MongoDB servers are closed immediately and cannot be found anymore with That's not, however, the case of the production application that for some reason leaves the connection open. The difference between the test application and the production application seems to be somewhere in Symfony or Doctrine, so I will try the SDAM debugging script to identify the difference and report the updates here. Thank you! |
I have some more debugging insights. TLDR is:
I still did not get to deploying the SDAM logger, because I made some new observations in my local environment. Specifically, the issue does not appear locally, despite me having the very same application's Docker image and the very same MongoDB's image, and the database also running as a replica set. Application-wise, the local environment is exactly the same as the production environment, the difference between those two is only that production application runs in a Kubernetes cluster extended by istio. Then I also realised that I had made a mistake with my test script above - because I ran it as a CLI, meaning the PHP process finished immediately and that was the reason for the connection to be closed cleanly. When I triggered the same testing script over a HTTP request in a PHP-FPM process, the connection stayed open and the issue appeared again. So, it seems that the PHP process keeps the connection open, whether it's a long-running process or a PHP-FPM process - because in both cases the process keeps running. Only when the logic is ran as a separate PHP process that finishes immediatelly afterwards, the issues is not observed. To verify that, I reconfigured PHP-FPM with Lastly, when I reported the bug, I stated that the issue appeared every "60-120 seconds". However, that's not accurate. It seems that the issue happens with a first request that comes after 60 seconds since the message appeared last time, independently on whether there was meanwhile silence on the same connection or whether there were other requests. To illustrate that, I let the test script ran every second over HTTP as follows: watch -n 1 -- wget http://localhost/index.php -O - >/dev/null The AuthenticationAbandoned: Overridden by new authentication session error message for the particular connection happened every 60 seconds (plus minus a few milliseconds). The first occurence, however, happened only after 120 seconds since the connection was created. Then I triggered the Lastly, there still seems to be issue mentioned in my previous message that when ran as a one-off CLI script, the connections to the PRIMARY closes cleanly, but the connection to SECONDARY is closed with AuthenticationAbandoned: Authentication session abandoned, client has likely disconnected. Is that on purpose? I will go and try the SDAM logger now. |
I tried the SDAM debugging. It seems that after the initial request, there is a heartbeat request every 60 seconds, and the first one does not produce the aforementioned log message, while all the following ones do. I didn't notice anything particular in the logs. The log that appeared with the first connection:
I kept hitting the script (over HTTP) every second, but no log appeared until 60 seconds elapsed:
Again, no further output until another 60 seconds elapsed:
At this point, the following error appeared in MongoDB's logs: {
"t": {
"$date": "2024-01-05T19:01:44.019+00:00"
},
"s": "I",
"c": "ACCESS",
"id": 5286307,
"ctx": "conn1814",
"msg": "Failed to authenticate",
"attr": {
"client": "127.0.0.6:39925",
"isSpeculative": false,
"isClusterMember": false,
"mechanism": "",
"user": "",
"db": "",
"error": "AuthenticationAbandoned: Overridden by new authentication session",
"result": 337,
"metrics": {
"conversation_duration": {
"micros": 60494130,
"summary": {}
}
},
"extraInfo": {}
}
} |
At this point, I am puzzled as where to look next.
@jmikola It doesn't seem to be this case - since the connection is established from driver-through-proxy-to-server. When I shut down the PHP process, the connection is closed. And even after the errors appear, they are all related to a single MongoDB's connection, I didn't observe new connections created as long as the process lives. Also worth mentioning, I have it configured as TCP proxying, I don't have the experimental MongoDB support enabled in istio. That means, istio should see only TCP traffic and should not try to manipulate it in any way.
AFAIK Envoy multiplexes HTTP traffic and not TCP traffic. But I don't fully understand it, so I need to learn more about it. Even though, in that case, I would expect to see troubles also for the Node.js client. So I have a few more ideas to explore:
If anyone has any hints on where to look next, I'd be greatful for them. Thank you! |
I decided to deploy a new MongoDB testing server into the same K8s cluster but not as a replica set, to better see what is happening. I enabled debugging as follows: test> db.setLogLevel(5, 'accessControl');
test> db.setLogLevel(5, 'network'); I used the same script as above and triggered the script every second over a HTTP request. Here are the logs from the MongoDB server:
Here's the timeline overview:
All the heartbeat requests have the Here are the logs from the SDAM logger:
I don't see anything suspicious there. |
I have redeployed my testing pods without istio and the issue still persists. This IMO rules istio out. I will still try to replicate the bug in the local development environment. |
I was eventually able to replicate the issue locally. I created a small project to demonstrate the behaviour: https://github.com/aleshaczech/mongo-php-driver-issue-1506. It is based on PHP-FPM v8.3.1 and MongoDB v7.0.4. To observe the issue, one should do: docker compose down && \
docker compose rm -f && \
docker compose build --pull && \
docker compose up -d && \
docker compose logs -f ...and wait 120 seconds. Here's the (redacted) output:
|
@aleshaczech: Thanks for the excellent debugging. I'll investigate this further and follow up once I have something to report. The 60-second delay (I can't speak for 120) corresponds to the single-threaded monitoring interval used in libmongoc, so I'm going to start by looking into whether the outgoing monitoring command is being improperly assembled (e.g. re-using a document previously prepared for the initial handshake). |
@aleshaczech: This is caused by libmongoc including a This can likely be addressed for the upcoming libmongoc 1.26 release, which would then be included in the next PHPC minor version that follows. In the meantime, you should be able to work around this by explicitly specifying |
Thank you, @jmikola ! This indeed seems to fix the issue, my logs are now a very quiet place. :-) I guess we can close this ticket, since the issue is now tracked in Jira. If that's not right, please feel free to reopen and/or let me know. Thanks again! |
I am having the same issue on a nodejs application with mongoose. I have a similar setup as @aleshaczech, I have some processes running on background and as well a frontend that makes requests. Looking at the logs I do see the same notification pop in almost every minute:
Adding |
@emilushi this sounds like something we'd want to investigate. Are you able to share a self-contained reproduction that produces this result? If so you could open a new ticket in JIRA under the Node project so that we can dig into this further. |
@alexbevi I will try to create one for you tomorrow. |
Bug Report
Hi all,
I am observing AuthenticationAbandoned: Overridden by new authentication session errors (code 337) with message Failed to authenticate in MongoDB's logs for existing connections from my PHP applications. These errors start to happen after some time for connections that started with Successfully authenticated. However, my PHP application connects to the MongoDB server successfully and reports no errors whatsoever.
Some parts of the application run as long-running PHP processes and use MongoDB's
db.collection.watch()
, so, they are connected to MongoDB indefinitely. The errors are being logged for a few different connections. I am not sure, though, whether these errors are produced only by the long-running processes or if it's also the case of the HTTP requests-related connections.The errors seem to be generated only by a PHP client. I don't have any other PHP application connecting to the server to compare with another PHP application, but I have a Node.js application connecting to the same MongoDB instance and it doesn't produce any such errors.
I can't seem to find any details about the error on Google and the error code is also marked as internal in https://github.com/mongodb/mongo/blob/master/src/mongo/base/error_codes.yml#L421C1-L421C1.
Here's the full log line, please note the empty
user
anddb
fields, even though they are specified in the connection URI:The error message appears in the logs every 60-120 seconds, but only after some time. The connection starts with successful authentication, and only after some time starts producing these errors:
The field
attr.metrics.conversation_duration.micros
seems to match closely the time difference between two Failed to authenticate error messages.The MongoDB is deployed as a replica set, my connection URI is as follows:
The details of the connection from MongoDB's
db.currentOp()
output:My application and the MongoDB server are deployed in a Kubernetes cluster extended with istio service mesh, all hosted by DigitalOcean. The IP address
127.0.0.6
in istio is supposed to be used for "undeclared" services ports (as discussed in istio/istio#29603), so the issue might be related to istio. However, since my Node.js application runs in the same infrastructure and uses the same connection URI (with different user, password and db) and does not produce any such errors, it might be that the PHP client is doing something differently.Environment
I have the following setup:
StatefulSet
)Composer libraries used:
MongoDB's extension configuration:
Expected and Actual Behavior
I would expect the error not to appear in the MongoDB's logs.
The text was updated successfully, but these errors were encountered: