SEGV when MULTI_PROCESS enabled #27
The AsyncSource mechanism, which uses a separate thread to read data, can sometimes try to access an address that has become invalid because the main thread reset or deleted the memory context it belonged to. Fix that by giving AsyncSource a dedicated memory context that is only manipulated in the reading thread. Fixes #27.
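The fix described above comes down to always allocating the read buffer from a context that only the reader owns. Below is a minimal, single-threaded C sketch of that idea under stated assumptions: `ToyContext`, `toy_alloc`, `toy_reset`, `Reader` and `reader_append` are hypothetical names. The toy context only loosely mimics PostgreSQL memory contexts. It is not the real `MemoryContext` API, and it is not pg_bulkload's actual AsyncSource code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy "memory context": a list of malloc'd chunks that can all be
 * released at once, loosely mimicking MemoryContextReset(). Not PostgreSQL's API. */
typedef struct Chunk {
    struct Chunk *next;
} Chunk;

typedef struct ToyContext {
    Chunk *chunks;
} ToyContext;

static void *toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);
    if (c == NULL)
        abort();
    c->next = cxt->chunks;
    cxt->chunks = c;
    return c + 1;               /* payload starts right after the header */
}

static void toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL) {
        Chunk *next = cxt->chunks->next;
        free(cxt->chunks);
        cxt->chunks = next;
    }
}

/* Hypothetical reader whose buffer always lives in its own context. */
typedef struct Reader {
    ToyContext own;             /* dedicated context, touched only by the reader */
    char *buf;
    size_t len;
    size_t cap;
} Reader;

static void reader_append(Reader *r, const char *data, size_t n)
{
    if (r->len + n > r->cap) {
        size_t newcap = r->cap ? r->cap : 16;
        while (newcap < r->len + n)
            newcap *= 2;
        /* Grow from r->own, regardless of which context the caller switched to. */
        char *nb = toy_alloc(&r->own, newcap);
        if (r->len > 0)
            memcpy(nb, r->buf, r->len);
        r->buf = nb;            /* the old buffer is released when r->own is reset */
        r->cap = newcap;
    }
    memcpy(r->buf + r->len, data, n);
    r->len += n;
}
```

Because every growth goes to `r->own`, resetting the main thread's context cannot invalidate the reader's buffer. The superseded buffers are only reclaimed when the reader's own context is reset.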
Thanks for the report! One solution might be to have a dedicated memory context for Could you review PR #28?
Thank you for the reply!
This is the same as what I had in mind. PR #28 looks good, but because I'm not familiar with the pg_bulkload code, I'm not sure whether the current approach is safe. Is it possible that, after the context has been switched to AsyncSource's, the main thread unexpectedly changes it to another memory context, such as TupleChecker's?
To answer your question: you'll see that we switch back to the Also, as I commented on PR #28, the problem is not caused by the interaction of different threads. It is rather a lack of coordination between different modules of pg_bulkload (such as the parser, reader, and source). Currently, we do have a dedicated context for the parallel writer named I noticed however that,
I now understand exactly what happened. You're right. The source file is read by the child thread, but the read buffer is extended by the main thread. The main thread extends the read buffer after switching the memory context to "ParallelWriter", and that is what causes the SEGV.
ISTM that AsyncSource could have its own context, as your current PR does. Alternatively, could we use repalloc instead?
I assume you mean,
Because locking between threads is involved here, I'm not immediately sure that's going to be straightforward. It could be made to work with enough care, though.
Yes. I guessed that we can use
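The `repalloc` alternative works because, in PostgreSQL, `repalloc` resizes a chunk within the context it was originally allocated in rather than in the current context. The sketch below imitates that behavior with a toy context whose chunk headers record their owner. `ToyContext`, `toy_alloc`, `toy_realloc`, `toy_owner` and `toy_reset` are hypothetical names, not PostgreSQL's API. The sketch is single-threaded, so it does not address the cross-thread locking concern raised above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy context whose chunk headers remember their owner, loosely
 * mimicking how repalloc() finds a chunk's context. Not PostgreSQL's API. */
typedef struct ToyContext ToyContext;

typedef struct Chunk {
    ToyContext   *owner;
    struct Chunk *next;
} Chunk;

struct ToyContext {
    Chunk *chunks;
};

static void *toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);
    if (c == NULL)
        abort();
    c->owner = cxt;
    c->next = cxt->chunks;
    cxt->chunks = c;
    return c + 1;               /* payload starts right after the header */
}

static ToyContext *toy_owner(void *p)
{
    return ((Chunk *) p - 1)->owner;
}

/* Resize p inside the context that owns it; no "current context" is consulted. */
static void *toy_realloc(void *p, size_t size)
{
    Chunk  *c = (Chunk *) p - 1;
    Chunk **link = &c->owner->chunks;

    while (*link != c)          /* find the list slot that points at c */
        link = &(*link)->next;
    Chunk *nc = realloc(c, sizeof(Chunk) + size);
    if (nc == NULL)
        abort();
    *link = nc;                 /* header (owner, next) was copied by realloc */
    return nc + 1;
}

static void toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL) {
        Chunk *next = cxt->chunks->next;
        free(cxt->chunks);
        cxt->chunks = next;
    }
}
```

Since the owner is looked up from the chunk itself, growing the buffer while the main thread is working in another context still keeps the buffer in the reader's context.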
OK, merged the PR for now. Thanks for the review!
Hi,
I got a SEGV error with a large input file that contains malformed data when MULTI_PROCESS is enabled. The input file is a 13 MB CSV file with 500,000 lines, and it contains data that causes a parse error, like "" at line 153249. The control file is as follows:
Also, the error message I got is as follows:
I think the main cause of this problem is that AsyncSource does not always allocate new memory in the appropriate memory context when it expands the read buffer. The main thread can switch the memory context in which AsyncSource allocates the read buffer, and the current pg_bulkload does not account for this. Because pg_bulkload resets the memory context in use when a parse error occurs and then cleans up the read buffer, it can end up trying to free memory that MemoryContextReset() has already released.
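Here is a minimal, single-threaded C sketch of the failure mode described above. It assumes a toy context type rather than PostgreSQL's real one. `current_cxt` is a hypothetical stand-in for `CurrentMemoryContext`, and `grow_read_buffer` is a hypothetical stand-in for the read-buffer expansion. Once the main thread has switched the current context (for example to "ParallelWriter"), an expansion lands in that context. Resetting it on a parse error then frees the buffer the reader still points to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy context: a list of malloc'd chunks released together. */
typedef struct Chunk {
    struct Chunk *next;
} Chunk;

typedef struct ToyContext {
    Chunk *chunks;
} ToyContext;

/* Hypothetical stand-in for CurrentMemoryContext, switched by the main thread. */
static ToyContext *current_cxt;

static void *toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);
    if (c == NULL)
        abort();
    c->next = cxt->chunks;
    cxt->chunks = c;
    return c + 1;               /* payload starts right after the header */
}

static void toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL) {
        Chunk *next = cxt->chunks->next;
        free(cxt->chunks);
        cxt->chunks = next;
    }
}

/* Returns 1 if p is a live allocation of cxt, i.e. resetting cxt would free it. */
static int toy_owns(const ToyContext *cxt, const void *p)
{
    for (const Chunk *c = cxt->chunks; c != NULL; c = c->next)
        if ((const void *) (c + 1) == p)
            return 1;
    return 0;
}

/* BUG pattern: the read buffer is grown in whatever context is current. */
static char *grow_read_buffer(const char *old, size_t oldlen, size_t newcap)
{
    char *nb = toy_alloc(current_cxt, newcap);
    if (oldlen > 0)
        memcpy(nb, old, oldlen);
    return nb;
}
```

The sketch checks ownership before the reset instead of touching the pointer afterwards, because using freed memory is undefined behavior. Per the analysis above, that later attempt to free the already-reset memory is what leads to the SEGV.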