New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix writeable CTE assign reader gang to writer slice. #11058
Fix writeable CTE assign reader gang to writer slice. #11058
Conversation
219cf28
to
3c4d533
Compare
In spite of the detailed description, I couldn't understand the problem until stumbling upon /*
* 1. for entrydb, it's never be writer.
* 2. for first QE, it must be a writer.
*/
isWriter = contentId == -1 ? false: (cdbinfo->numIdleQEs == 0 && cdbinfo->numActiveQEs == 0);
segdbDesc = cdbconn_createSegmentDescriptor(cdbinfo, nextQEIdentifer(cdbinfo->cdbs), isWriter); It is strange that the function takes into account the |
Yes, but we have an assumption that the first QE on the free list is the writer. And Pengzhou advised not to break the assumption before. As the workaround is very simple, so I just go with the workaround here. |
@asimrp Actually, as I dig into our implementation, the writer gang must be allocated first. For a reader QE, it'll find the writer QE proc and set the writer QE proc as gpdb/src/backend/storage/lmgr/lock.c Lines 875 to 906 in 12296f0
If reader QE allocated first, the libpq connection will even fail. Since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the current interfaces between executor and dispatcher, I'm convinced that this is the best possible alternative. Thank you Junfeng for your patience!
For the gang allocation, our current implementation required the first allocated gang must be the writer gang. This has several reasons: - For lock holding, Because of our MPP structure, we assign a LockHolder for each segment when executing a query. lockHolder is the gang member that should hold and manage locks for this transaction. On the QEs, it should normally be the Writer gang member. More details please refer to lockHolderProcPtr in lock.c. - For SharedSnapshot among session's gang processes on a particular segment. During initPostgres(), reader QE will try to lookup the shared slot written by writer QE. More details please reger to sharedsnapshot.c. Normally, the writer slice will be assign writer gang first when iterate the slice table. But this is not true for writable CTE (with only one writer gang). For below statement: explain (slicetable) WITH updated AS (update tbl set rank = 6 where id = 5 returning rank) select * from tbl where rank in (select rank from updated); QUERY PLAN -------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) -> Seq Scan on tbl Filter: (hashed SubPlan 1) SubPlan 1 -> Broadcast Motion 1:3 (slice2; segments: 1) -> Update on tbl -> Seq Scan on tbl Filter: (id = 5) Slice 0: Dispatcher; root 0; parent -1; gang size 0 Slice 1: Reader; root 0; parent 0; gang size 3 Slice 2: Primary Writer; root 0; parent 1; gang size 1 If we sill assign writer gang to Slice 1 here, the writer process will execute on reader gang. So, find the writer slice and assign writer gang first.
3c4d533
to
f124d98
Compare
For the gang allocation, we assert the first allocated gang always be the
writer gang. And when recycle, the writer QE aliways be putted at the
top of free list.
Normally, the writer slice will be assign writer gang first when iterate the
slice table. But this is not true for writable CTE (with only one writer gang).
For below statement:
If we sill assigns writer gang to Slice 1 here, the writer process will execute
on reader gang. So, find the writer slice and assign writer gang first.
This fixes the issue #11024
Here are some reminders before you submit the pull request
make installcheck