Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store WAL end pointer in the replication cursor #800

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
3 participants
@grunskis
Copy link
Contributor

commented Oct 30, 2018

We have a couple of logical replication daemons running that are built using the psycopg2 library. They're replicating different databases with very different load patterns. One of these DBs has lots of load 24/7 and another one that has writes only during the office hours. We're closely monitoring the replication lag to avoid running out of the disk space on each of the DB instances.

I observed that the replication lag grows pretty fast during the night hours on the DB that has writes only during the day. I'm not 100% sure why the DB is growing when there are no writes, but I assume that's due to RDS or our Postgres instance configuration. In any case, we started getting replication lag notifications so I started digging into the problem.

I found that the pg_stat_replication view gives us the last LSN that was sent to the logical replication client. If we know this LSN and if we know that there were no changes made to the tables we're interested in we could just confirm from the client using the send_feedback call that the server can recycle the WAL files. Otherwise we have to wait until there's a write to the DB that will trigger a ReplicationMessage that contains the data_start.

But I couldn't find any way to get the current WAL end in the psycopg2 ReplicationCursor. So this PR adds the current WAL end pointer the client has received from the server with this specific replication cursor.

Let me know if this makes sense and/or you would like this to be changed in any way...

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

I see you have changed the docs suggesting to use wal_end. But I understand from your description that using flush_lsn=wal_end is only safe if we know there is no change in the tables. What happens otherwise? Will the WAL files be recycled too early? If that's the case maybe this is a dangerous use and should be documented as a specific one (in wal_end docs) instead of in the generic example?

@@ -44,6 +44,8 @@ typedef struct replicationCursorObject {
struct timeval last_io; /* timestamp of the last exchange with the server */
struct timeval keepalive_interval; /* interval for keepalive messages in replication mode */

XLogRecPtr wal_end; /* WAL end pointer from the last exchange with the server */

This comment has been minimized.

Copy link
@dvarrazzo

dvarrazzo Oct 30, 2018

Member

Usually new members in a data structure should be added to the end

static struct PyMemberDef replicationCursorObject_members[] = {
{"wal_end", T_ULONGLONG, OFFSETOF(wal_end), READONLY,
"LSN position of the current end of WAL on the server."},
{NULL}

This comment has been minimized.

Copy link
@dvarrazzo

dvarrazzo Oct 30, 2018

Member

Would there be value in adding attributes to access the other LSN members too?

This comment has been minimized.

Copy link
@grunskis

grunskis Oct 30, 2018

Author Contributor

I thought about adding them, but decided not to b/c I didn't need them and wanted to keep the PR as small as possible.

This comment has been minimized.

Copy link
@dvarrazzo

dvarrazzo Oct 30, 2018

Member

Makes sense to make it in a different branch, yes.

@grunskis

This comment has been minimized.

Copy link
Contributor Author

commented Oct 30, 2018

I see you have changed the docs suggesting to use wal_end. But I understand from your description that using flush_lsn=wal_end is only safe if we know there is no change in the tables. What happens otherwise? Will the WAL files be recycled too early? If that's the case maybe this is a dangerous use and should be documented as a specific one (in wal_end docs) instead of in the generic example?

I don't have good answers to all your questions which makes it a good reason to leave it out from the example and document it separately. I will do that...

My current understanding is that the server won't send keep-alive messages if there are changes to send and in that case we won't end up in the code that handles the socket timeout cases.

@@ -1735,6 +1739,7 @@ pq_read_replication_message(replicationCursorObject *repl, replicationMessageObj
goto exit;
}

repl->wal_end = wal_end;

This comment has been minimized.

Copy link
@dvarrazzo

dvarrazzo Oct 30, 2018

Member

Doing this here seems inconsistent. The variable may be uninitialized. Shouldn't it be done in the buffer[0] == 'w' block?

This comment has been minimized.

Copy link
@grunskis

grunskis Oct 30, 2018

Author Contributor

I think you get here only from buffer[0] == 'w' block and only if there was no error there. I'll move it inside the block to be more clear.

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

I'm sorry I can't be able to offer better guidance as of the safety of flush_lsn=wal_end: I'm not entirely practical with the replication subtleties. Maybe it's worth a chat to the pgsql-hackers to get better wisdom? I believe in the value of the MR you are offering: I'm just thinking about the better way to present it to the user and suggest them how to use it.

grunskis-bonial
@grunskis

This comment has been minimized.

Copy link
Contributor Author

commented Dec 14, 2018

I'm sorry I can't be able to offer better guidance as of the safety of flush_lsn=wal_end: I'm not entirely practical with the replication subtleties. Maybe it's worth a chat to the pgsql-hackers to get better wisdom? I believe in the value of the MR you are offering: I'm just thinking about the better way to present it to the user and suggest them how to use it.

I've removed the documentation change and added a warning instead.

I've looked into the source code of the "official" demo app https://github.com/postgres/postgres/blob/master/src/bin/pg_basebackup/pg_recvlogical.c

They basically do the same thing there. In the keep-alive message they include either the last LSN client has written to disk or the latest walEnd value.

Sorry for taking too long to address the code review :)

@gclinch

This comment has been minimized.

Copy link

commented Mar 29, 2019

Having access to the latest keep-alive message's wal_end would be a very useful addition for us too - can I help to progress this request?

@grunskis

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

I would love to see this merged ;)

We are running this in production for several months already in couple of projects w/o any issues.

@dvarrazzo Any tips on how to proceed to get closer to including this into one of the next versions?

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

Merging to master to release in 2.8 :)

@dvarrazzo dvarrazzo closed this in b76ff2f Mar 30, 2019

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

Merged! Will be released with Psycopg 2.8 in a few days. Thank you for the feature and the testing, @grunskis!

@gclinch

This comment has been minimized.

Copy link

commented Mar 31, 2019

@dvarrazzo I don't think the documentation changes in commit b8bf6d9 correctly reflect this change.

The NEWS entry mentions ReplicationMessage, but this PR is altering ReplicationCursor.
Similarly extras.rst marks ReplicationMessage.wal_end as new in 2.8, but it's ReplicationCursor.wal_end that's new in 2.8

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

@gclinch Oops, you are right. fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.