GSoC 2024: WAL Infrastructure #288
-
Great! Try to do a setup where both compression and encryption are enabled, so that both scenarios are covered when reading the WAL files. Welcome to GSoC 2024!
-
Hi @jesperpedersen, I know that On the other hand, I started reading the Based on this, I see two areas that I could choose to focus on first: In general, I guess the first approach is more of a low-hanging fruit compared to the second one. I would be happy to hear your opinion and corrections where I'm wrong, so we can set the goals and create the needed issues based on the chosen path. Thanks!
-
Hi @jesperpedersen, I wanted to create a main branch for the whole WAL parsing functionality, so finally it will be merged to |
-
I think you are making good progress! Remember that you have to support the version range so,
Also, it would be good to have a work branch where your commits are squashed and rebased against main, as this will make it easier to test; see https://github.com/pgmoneta/pgmoneta/blob/main/doc/DEVELOPERS.md for details. Use license headers, and run uncrustify to get your coding style closer to the official one.
-
@sh-soltanpour How is it going?
-
Intro
Hi everyone,
My name is Shahryar (pronounced shah-ree-yar), and I have been selected as a contributor for GSoC 2024 in the PostgreSQL organization, on the Pgmoneta - WAL Infrastructure project.
In this project, we want to implement an infrastructure in Pgmoneta that can read and parse WAL files into WAL records. Once the infrastructure understands WAL files and records, we can apply the appropriate WAL records to a base backup and reach the desired point in the restored database, which is a key building block for point-in-time recovery (PITR). As a bonus goal, we can even extend this understanding from a physical to a logical one.
By the end of GSoC 2024, I’m aiming to achieve these items:
Milestone 1: Implement an in-house infrastructure for parsing WAL segment files to WAL records.
Milestone 2: Restore the database from a base backup by applying a series of WAL records to reach the required point.
(Bonus) Milestone 3: Extract logical information from WAL records and implement a policy definition framework so developers can define some custom logic for their backup database and filter some tables, rows, etc.
For more details, feel free to check out my initial proposal for this project here.
I would be more than happy to receive any suggestions or ideas here or by sending them to my email: shahryar.soltanpour@gmail.com
I will update the sections below during my progress in this project.
List of the work done
My main work branch for the whole WAL parsing functionality is: parse-wal-records.
Progress Log
23/June/2024
`rm_desc` function for different resource managers; so far, I have implemented it for the following list:
16/June/2024
`rm_desc` for `Btree` records following `Standby`.
`Heap` and `Btree` records. I spent most of my time this week figuring out the issue and fixing it, and was successful in the end :). The problem is that some of the flags, such as `BKPIMAGE_IS_COMPRESSED`, are used in Postgres 14 but were changed in newer versions; I was parsing a record generated by Postgres 14 with the flags from the latest version in here.
`BkpBlock` functionality and compared it to the actual `pg_waldump` and verified that it's working correctly.
`parse-wal-records-squashed` branch here.
09/June/2024
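The flag mismatch described in the 16/June/2024 entry above can be illustrated with a small version-aware check. This is only a sketch, not the code in the work branch: the bit values are taken from my reading of PostgreSQL's `xlogrecord.h` for versions 14 and 15 and should be verified against the headers of every supported release; the `V14_`/`V15_` prefixes and the helper `image_is_compressed` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of bimg_info. The V14_/V15_ prefixes are added for illustration;
 * PostgreSQL itself uses the unprefixed names in xlogrecord.h. */
#define V14_BKPIMAGE_IS_COMPRESSED 0x02 /* PostgreSQL 14 */
#define V14_BKPIMAGE_APPLY         0x04

#define V15_BKPIMAGE_APPLY         0x02 /* PostgreSQL 15 and newer */
#define V15_BKPIMAGE_COMPRESS_PGLZ 0x04
#define V15_BKPIMAGE_COMPRESS_LZ4  0x08
#define V15_BKPIMAGE_COMPRESS_ZSTD 0x10

/* Hypothetical helper: decide whether a full-page image is compressed,
 * using the flag layout of the server version that wrote the record.
 * Assumption: majors older than 14 share the version 14 layout. */
static bool
image_is_compressed(int server_major, uint8_t bimg_info)
{
   assert(server_major > 0);

   if (server_major <= 14)
   {
      return (bimg_info & V14_BKPIMAGE_IS_COMPRESSED) != 0;
   }

   return (bimg_info & (V15_BKPIMAGE_COMPRESS_PGLZ |
                        V15_BKPIMAGE_COMPRESS_LZ4 |
                        V15_BKPIMAGE_COMPRESS_ZSTD)) != 0;
}
```

The same byte, `0x02`, means "compressed" for a version 14 record but "apply this image" for a newer one, which is why reading a version 14 record with the latest definitions gives wrong results.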
`MAXALIGN` function, and this function aligned the mentioned index from 90 to 96. Afterward, the next records were being parsed correctly.
`pg_waldump`, I also compared and verified the record total length.
`rm_desc` function for the `Standby` resource manager. I want to use this to make sure that the data is also parsed correctly. This was successful for the `Standby` resource manager, and next week I'm going to implement it for other resource managers as well. Note that the code still needs a lot of cleaning; I'm planning to refactor it at some point before merging, once we have the basic expected functionality.
02/June/2024
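The alignment fix from the 09/June/2024 entry above, which resolves the offset problem described in this entry, can be sketched as follows. It assumes an 8-byte maximum alignment (typical for 64-bit builds) and ignores records that cross a page boundary; `max_align` and `next_record_offset` are hypothetical helper names standing in for PostgreSQL's `MAXALIGN` macro and the surrounding logic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: maximum alignment of 8 bytes, as on typical 64-bit builds. */
#define ALIGNOF_MAX 8

/* Round an offset up to the next multiple of ALIGNOF_MAX. */
static uint64_t
max_align(uint64_t offset)
{
   return (offset + (ALIGNOF_MAX - 1)) & ~((uint64_t)(ALIGNOF_MAX - 1));
}

/* Start of the next record, given where the current record starts and its
 * total length. Does not handle records that continue on the next page. */
static uint64_t
next_record_offset(uint64_t record_start, uint32_t xl_tot_len)
{
   assert(xl_tot_len > 0);
   return max_align(record_start + xl_tot_len);
}
```

With the numbers from the log, `next_record_offset(40, 50)` rounds 90 up to 96, which is where the second record actually begins.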
`XLogRecord`, but I can't use the `xl_tot_len` field to jump over the first record and get the second `XLogRecord`. For instance, I'm at offset 40 of the first page and I read the first `XLogRecord`, and the `xl_tot_len` is 50, so I expect the next record to start at offset 90 = 40 + 50, but the value parsed at that offset is not correct. I also tried various alternatives such as 40 + 50 - 24 or 40 + 50 + 24 (24 is the header size), or even removing 2 padding bytes, but none of them worked.
`XLogFindNextRecord`, `ReadPageInternal`, and other dependent functions in the `pg_waldump` source code. The difficult part of this process is figuring out what is useful in our case and what should be skipped, as well as navigating through a lot of nested functions.
26/May/2024
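The 24-byte fixed header and the 2 padding bytes mentioned in the 02/June/2024 entry above correspond to the layout of `XLogRecord`. A minimal sketch of reading it field by field from a raw buffer, assuming the host has the same byte order as the server that wrote the WAL; `struct record_header` and `parse_record_header` are hypothetical names, not the ones used in the work branch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RECORD_HEADER_SIZE 24 /* fixed part of every WAL record */

/* Hypothetical struct mirroring the fields of PostgreSQL's XLogRecord. */
struct record_header
{
   uint32_t xl_tot_len; /* total record length, header included */
   uint32_t xl_xid;     /* transaction id */
   uint64_t xl_prev;    /* location of the previous record */
   uint8_t xl_info;     /* flag bits */
   uint8_t xl_rmid;     /* resource manager id */
   uint32_t xl_crc;     /* CRC of the record */
};

/* Copy each field from its byte offset, so the result does not depend on
 * how the compiler pads struct record_header. */
static void
parse_record_header(const uint8_t* buf, struct record_header* out)
{
   assert(buf != NULL && out != NULL);

   memcpy(&out->xl_tot_len, buf + 0, 4);
   memcpy(&out->xl_xid, buf + 4, 4);
   memcpy(&out->xl_prev, buf + 8, 8);
   out->xl_info = buf[16];
   out->xl_rmid = buf[17];
   /* bytes 18 and 19 are padding */
   memcpy(&out->xl_crc, buf + 20, 4);
}
```

The `xl_rmid` field is what selects the resource manager (for example `Standby`, `Heap`, or `Btree`) whose data follows the header.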
`XLogLongPageHeaderData`, and `XLogPageHeaderData`). Verified the result by comparing the `xlp_magic` value against the expected value.
`XLogRecord` fixed header of each record; so far, I was able to parse the fixed header of the first record and figured out how to get the resource manager data from the parsed header.
20/May/2024
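The page header check described in the 26/May/2024 entry above can be sketched as follows. The field layout follows my reading of `XLogPageHeaderData` in PostgreSQL's `xlog_internal.h`; the expected magic value differs between PostgreSQL versions, so it is passed in by the caller rather than hard-coded. The struct and function names are hypothetical, and the host is assumed to have the same byte order as the server that wrote the WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XLP_LONG_HEADER        0x0002 /* flag bit in xlp_info */
#define SHORT_PAGE_HEADER_SIZE 24     /* 20 bytes of fields, aligned up to 8 */
#define LONG_PAGE_HEADER_SIZE  40     /* adds system id, segment and block size */

/* Hypothetical struct mirroring the fields of XLogPageHeaderData. */
struct page_header
{
   uint16_t xlp_magic;    /* version-specific magic value */
   uint16_t xlp_info;     /* flag bits */
   uint32_t xlp_tli;      /* timeline id */
   uint64_t xlp_pageaddr; /* WAL address of this page */
   uint32_t xlp_rem_len;  /* remaining bytes of a continued record */
};

/* Read the common part of a page header from the start of a WAL page. */
static void
parse_page_header(const uint8_t* page, struct page_header* out)
{
   assert(page != NULL && out != NULL);

   memcpy(&out->xlp_magic, page + 0, 2);
   memcpy(&out->xlp_info, page + 2, 2);
   memcpy(&out->xlp_tli, page + 4, 4);
   memcpy(&out->xlp_pageaddr, page + 8, 8);
   memcpy(&out->xlp_rem_len, page + 16, 4);
}

/* The first page of a segment carries the long header. */
static size_t
page_header_size(const struct page_header* hdr)
{
   return (hdr->xlp_info & XLP_LONG_HEADER) ? LONG_PAGE_HEADER_SIZE
                                            : SHORT_PAGE_HEADER_SIZE;
}

/* Compare against the magic expected for the server version in use. */
static int
page_magic_matches(const struct page_header* hdr, uint16_t expected_magic)
{
   return hdr->xlp_magic == expected_magic;
}
```

The 40-byte long header is consistent with the first record being read at offset 40 of the first page, as mentioned in the 02/June/2024 entry.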
Find the longer version here; in summary:
`pg_waldump` source code.
`pg_rewind` and watch some videos on how it can be used in production.
17/May/2024
1- I have been able to run Pgmoneta on my Mac (kudos to @GuChad369 for this PR; it made my life a lot easier, since I no longer need to run Fedora in a virtual machine).
2- As @jesperpedersen suggested, I have enabled compression and encryption to make sure these scenarios are covered.
3- I have gone through the `wal.c` file in the project to get some idea of how the WAL file is streamed.
For the next step, I want to parse the content of a WAL file streamed into Pgmoneta and load it into WAL records (issue: #296).
In general, I think I have a roadmap in mind for the issues that need to be created to complete the project, but I need to organize my thoughts and ideas by May 20th. Then I can create the list of issues here on GitHub so the mentors can comment on it, and we may also be able to get help from the community on some of them.