SEGV when MULTI_PROCESS enabled #27

Closed
MasahikoSawada opened this issue Dec 16, 2016 · 7 comments

@MasahikoSawada

Hi,

I got a SEGV with a large input file containing parse-error data when MULTI_PROCESS is enabled. The input file is a 13 MB, 500,000-line CSV file that contains malformed data (an unterminated quote, "") at line 153249. The control file is written as follows:

OUTPUT = wikipedia_title                   # [<schema_name>.]table_name
#INPUT = /path/to/test.csv  # Input data location (absolute path)
TYPE = CSV                            # Input file type
QUOTE = "\""                          # Quoting character
ESCAPE = \                            # Escape character for Quoting
DELIMITER = ","                       # Delimiter
WRITER = BUFFERED
VERBOSE = YES
MULTI_PROCESS = YES

The error message I got is the following:

WARNING:  Parse error Record 1: Input Record 153249: Rejected - column 92. unterminated CSV quoted field
WARNING:  Maximum parse error count exceeded - 1 error(s) found in input file
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: LOG:  server process (PID 88994) was terminated by signal 11: Segmentation fault

I think the main cause of this problem is that AsyncSource does not allocate new memory in an appropriate memory context when expanding the read buffer. The memory context in which AsyncSource allocates the read buffer can be switched by the main thread, and the current pg_bulkload does not account for that. Because pg_bulkload resets the current memory context when a parse error occurs and then cleans up the read buffer, it can end up trying to free memory that has already been released by MemoryContextReset().

amitlan added a commit that referenced this issue Dec 19, 2016
AsyncSource mechanism which uses a separate thread to read data can
sometimes try to manipulate an address that may be invalid because of
memory context reset or delete by the main thread.

Fix that by implementing a dedicated memory context for AsyncSource,
which is only manipulated in the reading thread.

Fixes #27.
@amitlan
Contributor

amitlan commented Dec 19, 2016

Thanks for the report!

One solution might be to have a dedicated memory context for AsyncSource, such that, whenever it needs to allocate memory for its buffer, it allocates from the dedicated context and frees the old memory previously allocated in the same context. AsyncSourceClose() should delete the context at the end, or if an error occurs.

Could you review PR #28?

@MasahikoSawada
Author

Thank you for the reply!

One solution might be to have a dedicated memory context for AsyncSource, such that, whenever it needs to allocate memory for its buffer, it allocates from the dedicated context and frees the old memory previously allocated in the same context. AsyncSourceClose() should delete the context at the end, or if an error occurs.

This matches what I had imagined. PR #28 looks good, but since I'm not familiar with the pg_bulkload code, I'm not sure whether the current approach is safe. Is it possible that the main thread unexpectedly switches the memory context from AsyncSource to another context, such as TupleChecker, after we have switched to AsyncSource?

@amitlan
Contributor

amitlan commented Dec 21, 2016

To answer your question: you'll see that we switch back to the AsyncSource context every time we need to palloc for AsyncSource's needs. We also pfree the previously allocated memory while we are in the same context. When an error occurs, as in this issue, we simply delete the context using MemoryContextDelete(), which we know is our own.

Also, as I commented on PR #28, the problem is not caused by the interaction of different threads. It is rather a lack of coordination between different modules of pg_bulkload (such as parser, reader, and source). We already have a dedicated context for the parallel writer, named "ParallelWriter", which I think was designed to guard against exactly this kind of hazard. With this PR, we do the same for AsyncSource.

I noticed, however, that context is a member of the base class Writer, not of ParallelWriter. Maybe we should do something similar for AsyncSource, that is, make the newly introduced context a member of the base class Source instead of AsyncSource. Then again, only AsyncSource seems to need to palloc; other sources such as FileSource and RemoteSource do not.

@MasahikoSawada
Author

I now understand exactly what happened. You're right: reading the source file is done by the child thread, but extending the read buffer is done by the main thread. The main thread extends the read buffer after the memory context has been switched to "ParallelWriter", which is the cause of the SEGV.

I noticed, however, that context is a member of the base class Writer, not of ParallelWriter. Maybe we should do something similar for AsyncSource, that is, make the newly introduced context a member of the base class Source instead of AsyncSource. Then again, only AsyncSource seems to need to palloc; other sources such as FileSource and RemoteSource do not.

ISTM that AsyncSource can have its own context as your current PR does; alternatively, could we use repalloc instead?

@amitlan
Contributor

amitlan commented Dec 21, 2016

ISTM that AsyncSource can have its own context as your current PR does; alternatively, could we use repalloc instead?

I assume you mean, repalloc(self->buffer, newsize) within the proposed dedicated context, instead of the following stanza:

        newbuf = palloc0(newsize);
        /* copy it in new buffer from old buffer */
        pthread_mutex_lock(&self->lock);
        if (self->begin > self->end)
        {
            memcpy(newbuf, self->buffer + self->begin,
                   self->size - self->begin);
            memcpy(newbuf + self->size - self->begin, self->buffer, self->end);
            self->end = self->size - self->begin + self->end;
        }
        else
        {
            memcpy(newbuf, self->buffer + self->begin, self->end - self->begin);
            self->end = self->end - self->begin;
        }
        pfree(self->buffer);
        self->buffer = newbuf;
        self->size = newsize;
        self->begin = 0;
        pthread_mutex_unlock(&self->lock);

Due to the locking between threads involved here, I'm not immediately sure that would be straightforward. It could probably be made to work with enough care, though.

@MasahikoSawada
Author

I assume you mean, repalloc(self->buffer, newsize) within the proposed dedicated context, instead of the following stanza:

Yes. I guessed that we could use repalloc(self->buffer, newsize) instead of switching the memory context and then doing palloc0, memcpy, and pfree. But ISTM that your original PR's approach is safer.

amitlan added a commit that referenced this issue Dec 27, 2016
AsyncSource mechanism which uses a separate thread to read data can
sometimes try to manipulate an address that may be invalid because of
memory context reset or delete by the main thread.

Fix that by implementing a dedicated memory context for AsyncSource,
which is only manipulated in the reading thread.

Fixes #27.
@amitlan
Contributor

amitlan commented Dec 27, 2016

OK, merged the PR for now. Thanks for the review!
