-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v2.3.0 causes mongo transaction issue "Attempted illegal state transition from [TRANSACTION_ABORTED] to [TRANSACTION_ABORTED]" #4350
Comments
Same error here on a fresh install, using mongodb atlas and payload-demo, also reported here: #4072 |
The error message is slightly different than the one in #4072, though.
I was not having the problem after I updated. I'm going to check again though, I've been developing locally since then. I'm using a |
Thanks, I’m going to try those
|
I've tried adding |
I have a little extra info here @DanRibbens upon closer inspection my seed script successfully |
I'm trying so hard to recreate the issues in this thread without success. I was able to get some errors when seeding the public-demo against an atlas connection with transactions. This makes sense as the seeding function is creating and updating documents synchronously in a way that it shouldn't work. That said, I was getting write conflict errors which are expected with how it is currently written. I think there is a documentation gap we need to cover. Ideally when you're doing your document changes like the seeding in public-demo you have two options:
This has been done throughout the core of payload and plugins and the The only known issue I've come across has to do with multiple synchrounous operations that error I don't see what has changed from 2.1 that would cause this looking at the changes released. I have a branch that will allow passing transactionOptions or disabling transaction usage altogether for mongodb. I don't want that to be the fix though. Thanks for your patience while I figure things out. |
I've got a bit more data here from my custom "payload shell", active transaction number -1 doesn't seem good! Also just to confirm I'm not doing any explicit transaction stuff at all. No accessing
|
I've gone through and confirmed every |
Also running into this issue with Payload 2.3.1 running on AWS app runner with DocumentDB. Our app runner instance stops/crash/restart with the following backtrace:
Will update this post if i find more. |
@DanRibbens I believe I've found the culprit (at least in our setup). I've been able to avoid this by disabling |
Could not find a stable version so went the nuclear option and disabled transactions all together on version 2.4.0. Seems to be stable. |
How do you disable transactions altogether? I can't seem to find it in the docs. |
This is not yet an option, but there is a PR (#4467) in the works. I patched @payloadcms/db-mongodb/dist/index.js with the following:
|
I've got a little more data here, I was able to bypass some of the most persistent errors here by disabling some |
A support rep at MongoDB Atlas pointed me to this related issue with the MongoDB Node.js driver. The rep said it was actively being looked into. |
Thanks for sharing that link @jnicewander! Given that information, it makes sense to me that we would just swallow the error with try/catch so that any errors from calling What do we think? |
|
Is there a workaround that doesn't involve removing |
I've still not been pointed to a reproducable repo for these transactions issues. Can somebody share their Reviewing our code for filterOptions, I could see there being a problem if the query is having an error resulting in an abort before the rest of the transaction, but I'm not very eager to make changes here without understand the problem more. |
Hi @DanRibbens Really nothing special with the
|
I don't know about others situations/projects, but it would be pretty difficult for me to create a reproduction. I think this only happens with a certain level of complexity/relationships/hooks that would be very difficult to simulate. I know that isn't helpful, but it's not for lack of trying! I also haven't been able to reproduce in reasonably simple sandboxes, but I'm certain there's an issue 😆 I think you're instinct here @DanRibbens is correct re an error/abort during transaction. I'm hesitant to speak to this because I have very cursory understanding of node process management / workers / tasks (or even the payload hook system) but I think the single biggest development difficulty in Payload is lack of transparency in errors. Again I have no idea if that is fundamental to node async tasks so not necessarily a knock on Payload, but there does seem like there's a huge swatch of errors like this that just get swallowed and if they were easily surfaced could significantly improve debugging. |
thanks @xHomu we have done this and its working, sorry to ask but do you know what the implications of doing this are in production? |
We have been using that flag since it's available on mana.wiki for about a month. Nothing is broken yet! The main problem would be we won't unearth transaction race condition issues in larger production apps for PayloadCMS. |
@iamacup @denolfe @DanRibbens I am seeing the exact same issue, and have confirmed these recommendations. I regularly receive these errors (latest payload) interacting with the UI, which causes the app to crash within minutes of use. These application crashed and errors occur on collections that have
This seems like a race condition of sorts, as collections with more The issues started when switching from a local mongodb instance to Atlas (free tier), during the last round of testing before launching an MVP.
|
Thanks for the extra info @BrianJM. I'll take another look into this tomorrow. |
@DanRibbens i have created a 'minimal' repo and shared it with you directly and am happy for you to share inside Payload - https://github.com/iamacup/payload-transactions/ - please read the readme. I am using the word 'minimal' because while it is simple in terms of collections and payload config (no plugins etc.) it's got a bunch of blocks and the error is coming from there. video here: https://drive.google.com/file/d/1ziwb8tCrkRKgqIk0Zuj-fsQe6AztKNdR/view?usp=sharing it's causing both:
reliably I SUSPECT it is to do with the media collection and uploads but i will do more today to see if i can get this more minimal. EDIT: i have since removed the media collection entirely and the error is still there, in the example there are no relationship fields now.
|
@iamacup What's in
|
Hi! Thank you Dan for your efforts!
And when I make the following request with an ID that doesn't exist:
in one of the two fields then no error is thrown.. This is my db config:
|
This is a really disgusting bug - our product team have spent 2 months building a Payload implementation only for it to fall down at deployment time. Really very worried about picking Payload now - because this error will only ever show when you try to deploy - and when you deploy it will crash your whole stack out - it needs some serious attention because you get to the end of your dev cycle and find out you can't send it live!! feels like a massive bait and switch. Perhaps an update to the docs encouraging people to build and test on replicasets is needed so they don't get too far down and get stuck like we have? We would not have picked Payload if we had found this earlier. looks like quite a few people in this thread are having 'go live' issues with Payload @jmikrut is this a P1 for your team? |
Hi @DanRibbens You still have access to my repo if you want to reproduce easily. The seed errors are definitely not generated by poorly handled async map. All the payload.create are awaited properly. |
I have shared it directly with you now @BrianJM. it ain't pretty though :) |
@sandwit, You can disable all transactions by setting |
Thanks @iamacup. I see you do not have relationships configured, but you do have Lexical (which can create relationships) so that may be what we have in common. I have two collections that reference each other with relationships and virtual fields.
@DanRibbens I can increase the frequency of errors by increasing the number or relationships and/or However, I consistently see two Aside from this having a very negative impact on performance, the race may be occurring there (although not likely the root cause). This code is responsible for duplicate The screenshots below show loading a record from either collection. The last I have const { payload } = req
const resp = await payload.find({
req,
collection: 'stages',
depth: 0,
sort: 'probability',
where: {
pipeline: {
equals: data?.id || null,
},
},
}) |
@BrianJM i just removed lexical and still getting errors both does Slate add relationships? |
@iamacup I'm not sure, but I think so. You can disable the features in Lexical. Link, relationship, and upload. Can you try that? It may not matter since you removed Lexical. |
I've gone and pruned out every single field apart from blocks and i am still getting the error reliably and instantly starting with no data - by creating a single record - my structure as you have seen is a bunch of nested blocks - i am certain somehow its getting messy down in the code when you get to a big nesting. https://github.com/iamacup/payload-transactions/pull/1 I can remove the error entirely if i just chop the block structure off half way down. edit: what i have said there is clearly a lie there is an array as well - brb removing edit2: removing the array did not fix it - it is now just blocks in the project |
Yes, good point. This is consistent with what I see with relationships and hooks. The more relationships in a collection, or the more field hooks (with Local API calls) that are triggered as a result, the faster the error occurs. |
Hey @iamacup!
I pulled your configuration into a new npx create-payload-app blank template. Connected to a mongodb Atlas instance to reproduce the errors. You can read the description, but what was happening was a race condition in the access function where the more fields you have, the more likely it was to error. There are still improvements to be made on transactions overall, as I am looking at feedback posted here: #4078 (comment) Thanks everyone contributing to the conversation! This is high priority for us to get right. |
@DanRibbens I pulled #5156, built and tested today. This fixed the errors I reported using the following config. Thank you! 👍 transactionOptions: {
readConcern: { level: 'majority' },
writeConcern: { w: 'majority' },
maxCommitTimeMS: 10000,
maxTimeMS: 20000,
}, Do you want me to open a new issue for the duplicate network requests?
|
Yes that is a good call! |
Link to reproduction
No response
Describe the Bug
Apologies in advance, I'm not going to be able to provide many details here. I'm experiencing this in a large project I can't share and I have very little insight into what's causing it. What I do know is my unit/integration tests are working and passing on my project on 2.2.1, when I upgrade to 2.3.0 with no other changes, my seeds break with this error during a locale
create
call.This only occurs in the seed script, calling create via local API. Creating a record in this same collection via admin works fine.
And seeing this response on the local API create call:
To Reproduce
Due to the complexity of my project and the relative lack of details in the error, I really have no way to isolate this issue or provide a simplified reproduction. My best guess is this is related to something happening in a hook executing during create. The one other thing that's relevant is that this problem does not occur on the first 10 or so collection seeds which all complete as expected.
Payload Version
2.3.0
Adapters and Plugins
db-mongodb
The text was updated successfully, but these errors were encountered: