ERROR history file "00000202.history" contains 1024 lines, pg_autoctl only supports up to 1023 lines #991

Closed
dbamu opened this issue Jun 1, 2023 · 1 comment · Fixed by #995

dbamu commented Jun 1, 2023

I have been running repeated failover tests.

Test versions:
pg_auto_failover (pgf) 2.0
PostgreSQL 13.10

Step 1. Generate load with pgbench on the primary and the secondary.
Step 2. Run the pg_autoctl perform failover command periodically.

After many successful failovers, failover stopped working.
The log is:

#### check current state of the formation
$ pg_autoctl show state --formation test
         Name |  Node |          Host:Port |         TLI: LSN |   Connection |      Reported State |      Assigned State
 -------------+-------+--------------------+------------------+--------------+---------------------+--------------------
dev-pgf200003 |     3 | dev-pgf200003:5432 | 514: E2/710000D8 |   read-write |        wait_primary |        wait_primary
dev-pgf200002 |    21 | dev-pgf200002:5432 |           1: 0/0 |       none ! |        wait_standby |          catchingup


#### after dropping the node, run "pg_autoctl create postgres" on the secondary
$ pg_autoctl create postgres \
 --pgctl $CmdPath \
 --pgdata $PGDATA \
 --pghost `hostname` \
 --name `hostname` \
 --pgport 5432 \
 --hostname `hostname` \
 --formation test --skip-pg-hba --no-ssl --maximum-backup-rate 1024M --monitor postgres://autoctl_node@dev-pgf200001:5432/pg_auto_failover

10:32:59 130213 WARN  PG_REGRESS_SOCK_DIR is set to "$path", and our setup is using "dev-pgf200002"
10:32:59 130213 INFO  Continuing from a previous `pg_autoctl create` failed attempt
10:32:59 130213 INFO  PostgreSQL state at registration time was: PGDATA does not exist
10:32:59 130213 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
10:32:59 130213 INFO  Initialising PostgreSQL as a hot standby
10:32:59 130213 WARN  PG_REGRESS_SOCK_DIR is set to "$path", and our setup is using "dev-pgf200003"
10:32:59 130213 ERROR history file "00000202.history" contains 1024 lines, pg_autoctl only supports up to 1023 lines
10:32:59 130213 ERROR Failed to connect to the primary with a replication connection string. See above for details
10:32:59 130213 ERROR Failed to initialize standby server, see above for details
10:32:59 130213 ERROR Failed to transition from state "wait_standby" to state "catchingup", see above.
10:33:00 130203 ERROR pg_autoctl service node-init exited with exit status 12
10:33:00 130203 FATAL pg_autoctl service node-init has already been restarted 5 times in the last 1 seconds, stopping now
10:33:00 130205 INFO  Postgres controller service received signal SIGTERM, terminating
10:33:00 130203 FATAL Something went wrong in sub-process supervision, stopping now. See above for details.
10:33:00 130203 INFO  Stop pg_autoctl


#### check current state of the formation
$ pg_autoctl show state --formation test
         Name |  Node |          Host:Port |         TLI: LSN |   Connection |      Reported State |      Assigned State
 -------------+-------+--------------------+------------------+--------------+---------------------+--------------------
dev-pgf200003 |     3 | dev-pgf200003:5432 | 514: E2/710000D8 |   read-write |        wait_primary |        wait_primary
dev-pgf200002 |    21 | dev-pgf200002:5432 |           1: 0/0 |       none ! |        wait_standby |          catchingup


#### check timeline history file on primary
$ cat 00000202.history | tail -10

509     D0/3714D748     no recovery target specified

510     D0/B09A8F40     no recovery target specified

511     D0/FC976330     no recovery target specified

512     D1/A50307D8     no recovery target specified

513     D1/F89E3FB8     no recovery target specified

$ cat 00000202.history | wc -l
1025

#### remove empty lines from the history file
$ sed -i '/^$/d' 00000202.history 

#### retry the "pg_autoctl create postgres" command on the secondary
$ pg_autoctl drop node

$ pg_autoctl create postgres \
 --pgctl $CmdPath \
 --pgdata $PGDATA \
 --pghost `hostname` \
 --name `hostname` \
 --pgport 5432 \
 --hostname `hostname` \
 --formation test --skip-pg-hba --no-ssl --maximum-backup-rate 1024M --monitor postgres://autoctl_node@dev-pgf200001:5432/pg_auto_failover

nohup pg_autoctl run >> /home1/postgres/db/pglog/pg_autoctl.log 2>&1 &


$ pg_autoctl show state --formation test
         Name |  Node |          Host:Port |         TLI: LSN |   Connection |      Reported State |      Assigned State
 -------------+-------+--------------------+------------------+--------------+---------------------+--------------------
dev-pgf200003 |     3 | dev-pgf200003:5432 | 514: E2/73000110 |   read-write |             primary |             primary
dev-pgf200002 |    21 | dev-pgf200002:5432 | 514: E2/73000110 |    read-only |           secondary |           secondary


Looking at the source code, the maximum number of lines in the .history file appears to be set to 1024:

#define PG_AUTOCTL_MAX_TIMELINES 1024

typedef struct TimeLineHistoryEntry
{
    uint32_t tli;
    uint64_t begin;     /* inclusive */
    uint64_t end;       /* exclusive, InvalidXLogRecPtr means infinity */
} TimeLineHistoryEntry;

typedef struct TimeLineHistory
{
    int count;
    TimeLineHistoryEntry history[PG_AUTOCTL_MAX_TIMELINES];
} TimeLineHistory;
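
For context, the tli field is the timeline ID, and PostgreSQL names each history file after that ID written as 8 uppercase hexadecimal digits. So 00000202.history belongs to timeline 0x202 = 514, which matches the TLI column in the state output. Here is a minimal sketch of that mapping. The helper names history_file_name and history_file_tli are hypothetical and not part of pg_autoctl.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the history file name for a timeline ID. */
static int
history_file_name(char *dst, size_t size, uint32_t tli)
{
    /* 8 uppercase hex digits followed by ".history" */
    return snprintf(dst, size, "%08" PRIX32 ".history", tli);
}

/*
 * Hypothetical helper: recover the timeline ID from a history file name.
 * Returns 0 when the name does not look like "XXXXXXXX.history".
 */
static uint32_t
history_file_tli(const char *name)
{
    uint32_t tli = 0;

    if (strlen(name) != 16 || strcmp(name + 8, ".history") != 0)
    {
        return 0;
    }

    if (sscanf(name, "%8" SCNx32, &tli) != 1)
    {
        return 0;
    }

    return tli;
}
```

Calling history_file_tli("00000202.history") yields 514, the timeline that the primary reports.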

I would like to know why you set PG_AUTOCTL_MAX_TIMELINES to 1024.

Entries recorded in the timeline history file are never deleted, so the file grows with every failover.
In my test, failover could be performed only up to 513 times.
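
The gap between 513 failovers and a limit of about 1024 lines comes from the blank lines in the file. Here is a rough sketch of the arithmetic. It assumes every entry after the first is preceded by one blank line, as the tail output above suggests. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count newline characters, which is what `wc -l` reports. */
static int
count_lines(const char *text)
{
    int lines = 0;

    assert(text != NULL);

    for (const char *p = text; *p != '\0'; p++)
    {
        if (*p == '\n')
        {
            lines++;
        }
    }
    return lines;
}

/* Count lines that hold at least one non-whitespace character. */
static int
count_entries(const char *text)
{
    int entries = 0;
    int seenContent = 0;

    assert(text != NULL);

    for (const char *p = text; *p != '\0'; p++)
    {
        if (*p == '\n')
        {
            entries += seenContent;
            seenContent = 0;
        }
        else if (!isspace((unsigned char) *p))
        {
            seenContent = 1;
        }
    }

    /* the last line may lack a trailing newline */
    return entries + seenContent;
}

/*
 * Assumption: one blank line before every entry except the first,
 * so n entries occupy 2n - 1 lines.
 */
static int
expected_lines(int entries)
{
    return entries > 0 ? 2 * entries - 1 : 0;
}
```

Under that assumption, 512 entries take 1023 lines and 513 entries take 1025 lines, which agrees with the wc -l output above and is why the history file for timeline 514 is the first one to be rejected.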

If there is no particular reason for setting PG_AUTOCTL_MAX_TIMELINES to 1024, could you raise it to a much larger value (e.g. 1048576, which is 2^20)?

hancci commented Jun 1, 2023

I sincerely hope that this problem will be fixed.

dimitri pushed a commit that referenced this issue Jun 2, 2023
The previous coding used a statically allocated array of pointers to
newlines in that file, limiting our parsing abilities to files of 1024 lines
maximum. Apparently that's not a good limit, so use dynamically allocated
memory instead.

Fixes #991.
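
To illustrate the idea in the commit message, here is a minimal sketch of splitting a buffer into lines with a dynamically grown array of pointers instead of a fixed-size one. This is not the actual patch. The LineArray type and the function names are hypothetical, and the doubling growth strategy is an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a growable array of pointers to line starts. */
typedef struct LineArray
{
    char **lines;       /* pointers into the caller's buffer */
    size_t count;       /* number of lines stored */
    size_t capacity;    /* number of slots allocated */
} LineArray;

/* Append one pointer, doubling the allocation when it is full. */
static bool
line_array_push(LineArray *array, char *line)
{
    if (array->count == array->capacity)
    {
        size_t newCapacity = array->capacity == 0 ? 16 : array->capacity * 2;
        char **grown = realloc(array->lines, newCapacity * sizeof(char *));

        if (grown == NULL)
        {
            return false;   /* array->lines is still valid here */
        }
        array->lines = grown;
        array->capacity = newCapacity;
    }
    array->lines[array->count++] = line;
    return true;
}

/* Release the pointer array; the text buffer belongs to the caller. */
static void
line_array_free(LineArray *array)
{
    free(array->lines);
    array->lines = NULL;
    array->count = 0;
    array->capacity = 0;
}

/*
 * Split buffer in place: each '\n' is overwritten with '\0' so that
 * every stored pointer is a NUL-terminated string. No fixed line limit.
 */
static bool
split_lines(char *buffer, LineArray *array)
{
    char *start = buffer;
    char *newline = NULL;

    array->lines = NULL;
    array->count = 0;
    array->capacity = 0;

    while ((newline = strchr(start, '\n')) != NULL)
    {
        *newline = '\0';
        if (!line_array_push(array, start))
        {
            line_array_free(array);
            return false;
        }
        start = newline + 1;
    }

    /* keep a final line that has no trailing newline */
    if (*start != '\0' && !line_array_push(array, start))
    {
        line_array_free(array);
        return false;
    }
    return true;
}
```

With this shape the number of lines is bounded only by available memory, so blank lines in a long history file no longer count against a compile-time limit.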
dimitri added a commit that referenced this issue Jun 2, 2023