
walSegmentFind is expensive #2289

Open
rdunklau opened this issue Feb 23, 2024 · 9 comments

@rdunklau

Hello,

While performing some tests involving pgBackRest, I noticed it performs some expensive operations during WAL archiving.
For every WAL file we archive to a cloud storage provider, we call walSegmentFind, which performs a list operation (trying to find all segments that could match the one we want to archive, using a regexp on the checksum and compression extension), and only then actually store the object.

At least for GCS and S3, this list operation is expensive and accounts for a large share of the costs associated with WAL archiving.
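To illustrate why a LIST is needed at all: a minimal Python sketch (illustrative only, not pgBackRest source) of the kind of pattern the find has to match, assuming the archived name embeds the segment's 40-hex checksum plus an optional compression extension:

```python
import re

# Hypothetical sketch: the exact archived name is unknown in advance because
# it embeds the checksum and compression extension, so a presence check must
# LIST by prefix and match a pattern rather than fetch a single known key.
def wal_segment_pattern(segment):
    return re.compile(rf"^{segment}-[0-9a-f]{{40}}(\.(gz|bz2|lz4|zst))?$")

pattern = wal_segment_pattern("0000000100000001000000FF")
found = [
    name
    for name in (
        "0000000100000001000000FF-" + "a" * 40 + ".gz",  # matches
        "0000000100000001000000FE-" + "b" * 40,          # different segment
    )
    if pattern.match(name)
]
```

The extension list here is just an example of common compression suffixes, not an exhaustive enumeration.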

One way to reduce this could be to take advantage of the fact that these object stores allow metadata to be attached to an object. We could keep encoding the checksum and compression scheme in the archived WAL filename on POSIX storage (or other storage that doesn't support metadata), but rely on a unique name for WAL files stored on object storage instead of encoding the metadata in the filename.

This would allow using the API to fetch only the metadata (for example, a HEAD request on S3) to both check for the file's presence and retrieve its checksum, which is much cheaper on at least those two providers.
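A sketch of the proposed metadata-based lookup (an assumed interface, not a real pgBackRest or cloud SDK API): the segment is stored under its bare name, the checksum and compression type live in object metadata, and presence plus checksum come back from one HEAD-style call instead of a LIST:

```python
# Hypothetical scheme: in-memory stand-in for an object store with metadata.
class FakeObjectStore:
    def __init__(self):
        self.objects = {}  # key -> metadata dict

    def head(self, key):
        return self.objects.get(key)  # None when the object is absent

def wal_segment_head(storage, archive_path, segment):
    # One HEAD-style call: presence check and checksum retrieval together.
    meta = storage.head(f"{archive_path}/{segment}")
    if meta is None:
        return None  # not archived yet
    return meta["checksum"], meta.get("compress-type")

store = FakeObjectStore()
store.objects["archive/13-1/0000000100000001000000FF"] = {
    "checksum": "a" * 40,
    "compress-type": "gz",
}
result = wal_segment_head(store, "archive/13-1", "0000000100000001000000FF")
```

The `archive/13-1` path and metadata key names are made up for illustration.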

I understand this is quite a big change, but what do you think about it?

@dwsteele dwsteele self-assigned this Feb 23, 2024
@dwsteele
Member

What version are you running? This should have been addressed in 2.48 by 9d3a605.

@rdunklau
Author

I was running and reading from 2.50.

I may have misunderstood the code, but don't we end up calling walSegmentFind() for every segment we want to push? That at least matches what I've observed, which is quite a lot of LIST operations on the remote storage.

My point is that the optimization for finding one WAL segment could use something other than a list operation, namely metadata storage and retrieval, to ensure both the uniqueness of a WAL file and a canonical key to archive it under. Right now this is not possible because the checksum and compression type are stored in the filename itself.

My apologies if I misunderstood how it works.

@dwsteele
Member

I may have misunderstood the code, but don't we end up calling walSegmentFind() for every segment we want to push?

Yes, but a cache of prior results is kept, so if we are looking for a lot of segments that are already present, the number of LIST commands should be greatly reduced. But if only one segment is found on each call to walSegmentFind(), then a LIST is required each time. In this case we think the answer is to implement some sort of backoff, but we haven't done that yet.
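A tiny sketch of that caching behaviour (hypothetical names, not the actual walSegmentFind() implementation): repeat lookups are served from the last LIST result, so only a cache miss triggers another LIST:

```python
# Hypothetical cache: keep the most recent LIST result and reuse it for
# subsequent lookups; only re-LIST when the wanted segment is not cached.
class WalFindCache:
    def __init__(self, list_fn):
        self.list_fn = list_fn  # callable performing the expensive LIST
        self.cached = None
        self.list_calls = 0

    def find(self, segment):
        prefix = segment + "-"
        if self.cached is not None and any(n.startswith(prefix) for n in self.cached):
            return True  # cache hit, no LIST needed
        self.cached = self.list_fn()  # cache miss: refresh via one LIST
        self.list_calls += 1
        return any(n.startswith(prefix) for n in self.cached)

repo = ["0000000100000001000000FE-" + "b" * 40,
        "0000000100000001000000FF-" + "a" * 40 + ".gz"]
cache = WalFindCache(lambda: list(repo))
cache.find("0000000100000001000000FE")  # first call issues one LIST
cache.find("0000000100000001000000FF")  # served from the cache
```

Note the flip side described above: a lookup for a segment that is absent always falls through to a fresh LIST.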

My point is that the optimization for finding one wal segment could use something else than a list operation, namely metadata

This is a great idea (and we've thought about it), but it would break the repo format, so it needs to be saved for a new repo format. Also, ultimately, it probably won't be practical to save the WAL segment with its exact name. If we do introduce a new repo format, it will likely be lockless, so some suffixes will be required.

@rdunklau
Author

Yes, by definition, when pushing a WAL we expect it not to be archived already (otherwise it's an error, and it's good that pgBackRest detects it).

This is a great idea (and we've thought about it), but it would break the repo format, so it needs to be saved for a new repo format. Also, ultimately, it probably won't be practical to save the WAL segment with its exact name. If we do introduce a new repo format, it will likely be lockless, so some suffixes will be required.

Glad it's under consideration! I'm not sure I understand the point about the required suffixes, but as long as a simple HEAD HTTP request can be performed to make sure a WAL is present, it will greatly reduce the monetary costs associated with archiving.

Do you think an option to remove the pre-push check could be a sensible thing to add in the meantime?

@dwsteele
Member

Glad it's under consideration! I'm not sure I understand the point about the required suffixes, but as long as a simple HEAD HTTP request can be performed to make sure a WAL is present, it will greatly reduce the monetary costs associated with archiving.

Like I said, this might not be possible, depending on what we need to do to safely allow lockless archive-push. But this is all for future consideration.

Do you think an option to remove the pre-push check could be a sensible thing to add in the meantime?

Not sure what you mean by pre-push? You can certainly use archive-check=n, but I wouldn't recommend it in general.

@rdunklau
Author

Not sure what you mean by pre-push? You can certainly use archive-check=n, but I wouldn't recommend it in general.

This is not what I meant, I was talking about the check done here:

walSegmentFile = walSegmentFindOne(storageRepoIdx(repoData->repoIdx), repoData->archiveId, archiveFile, 0);

This means that in the majority of cases we perform a list command only to learn that we can proceed with the push.
To reduce costs, I suggest an option that leaves that check out. The detection of two different WAL segments in the same repo could then be performed outside the archive-push command itself, validating the whole repo in one pass. This doesn't seem so dangerous, since the checksum is part of the WAL filename; it mostly means some cleanup will have to happen if an error is detected after the fact.
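That whole-repo validation could look something like this sketch (a hypothetical helper, not an existing pgBackRest command): one LIST over the archive, flagging any segment name that appears with more than one checksum:

```python
from collections import defaultdict

# Hypothetical validation pass: given one full listing of archived names of
# the form <segment>-<checksum>[.<ext>], report segments archived with more
# than one distinct checksum (i.e. conflicting copies).
def duplicate_segments(names):
    checksums = defaultdict(set)
    for name in names:
        segment, _, rest = name.partition("-")
        checksums[segment].add(rest.split(".")[0])  # drop compression extension
    return {seg for seg, sums in checksums.items() if len(sums) > 1}

dupes = duplicate_segments([
    "0000000100000001000000FF-" + "a" * 40 + ".gz",
    "0000000100000001000000FF-" + "c" * 40,  # conflicting copy
    "0000000100000001000000FE-" + "b" * 40,
])
```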

@dwsteele
Member

This is not what I meant, I was talking about the check done here:

Ah, yes, you were talking about walSegmentFind() above so I was thinking about backup.

Then the detection of two different wal segment on the same repo can be performed outside the archive-push command itself, validating the whole repo in one go.

This is something we've thought about, but it's a lot of work, and an incompatible repo format change. To do this each segment would need to be tagged with its source (as part of the file name) so the user could figure out how to reconcile the unwanted segments.

But for now, removing this safeguard would allow massive pollution of the repo during a split-brain event, with no way to figure out what came from where. As such, I think it is a pretty bad idea to disable it until we are able to do the reconciliation.

@rdunklau
Author

Thanks for your input.

This is something we've thought about, but it's a lot of work, and an incompatible repo format change. To do this each segment would need to be tagged with its source (as part of the file name) so the user could figure out how to reconcile the unwanted segments.

Can't we do that already by using the repo1-storage-tag configuration key?

@dwsteele
Member

Can't we do that already by using the repo1-storage-tag configuration key?

Not on POSIX. Anyway, the entire point is that you could have multiple writers and they would never overwrite each other. That won't work if the filename is the same. Maybe with versioning, but that presents other issues.
