PGReplay query limit ? #4

Closed
mvives-broadsign opened this issue May 4, 2018 · 7 comments

@mvives-broadsign
Contributor

Hi,
I'm currently working with pgreplay in order to evaluate a migration for my company and I am facing this kind of message:
Execution is 3 minutes behind schedule

I'm running pgreplay and postgres on two separate servers and the postgres server does not seem to have any load issue at all (CPU/RAM/I/O are good). The pgreplay server however has one CPU at 100% for a couple of hours now.
The pgreplay file I'm replaying contains around 20M records over a ~20-hour timeframe. It holds only read-only queries (it is traffic from a PostgreSQL hot_standby).
Is it possible that we are hitting an issue where the machine running pgreplay (a c5.large on AWS) is not powerful enough?

P.S.: I am also using the -j option of pgreplay, but since the replay is still running after 12+ hours, I don't think it changes anything in our case :)

@koleo

koleo commented May 4, 2018

Maybe you want to use the "-s" option to speed up the replay on the destination host?

I often use pgreplay -j -s 10000 [...] to replay a SQL load as fast as possible.

@mvives-broadsign
Contributor Author

I'm not that interested in speeding up the replay. I'm more interested in knowing whether the message Execution is 3 minutes behind schedule appears because postgres is overloaded or because pgreplay is overloaded. I tend to think it's pgreplay, but I am trying to confirm that now.
We also expect to test our master server, but that one has 10 times the number of queries, so if pgreplay is already overloaded, it's going to be an issue.

@laurenz
Owner

laurenz commented May 4, 2018

It means one of the following:

  • pgreplay is indeed overloaded and cannot cope with running that many statements simultaneously.

  • Your database is slower than the original database.

It is normal for pgreplay to keep one core busy, since it is constantly polling the database connections for messages from the database server; that is not necessarily a sign that it is not keeping up.

If execution does not fall more than 3 minutes behind schedule, I'd suspect that a couple of queries just took longer than expected. You might use a tool like pg_stat_statements or pgBadger to figure out which queries took a long time.

Usually, if the target system is consistently slower than the original database, you'll see pgreplay falling further and further behind schedule. If pgreplay does not fall more than 3 minutes behind schedule on a 20-hour run, I'd say there is nothing much to worry about.
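
One way to tell a steadily growing lag from a one-off delay is to collect the "behind schedule" messages and compare the first and last reported values. The following is only a rough sketch: the message pattern is taken from the single example quoted in this thread, the other time units are a guess, and the function names and the `tolerance` parameter are made up for illustration.

```python
import re

# Assumed message format, based on the one example in this thread:
#   "Execution is 3 minutes behind schedule"
# The "second" and "hour" units are a guess and may not match real output.
LAG_RE = re.compile(r"Execution is (\d+) (second|minute|hour)s? behind schedule")
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def extract_lags(lines):
    """Return the reported lag in seconds for every line that matches."""
    lags = []
    for line in lines:
        match = LAG_RE.search(line)
        if match:
            lags.append(int(match.group(1)) * UNIT_SECONDS[match.group(2)])
    return lags


def is_falling_behind(lags, tolerance=60):
    """True if the last reported lag exceeds the first by more than `tolerance` seconds."""
    return len(lags) >= 2 and lags[-1] - lags[0] > tolerance


if __name__ == "__main__":
    sample = [
        "Execution is 3 minutes behind schedule",
        "Execution is 3 minutes behind schedule",
        "Execution is 2 minutes behind schedule",
    ]
    lags = extract_lags(sample)
    print(lags, is_falling_behind(lags))
```

A lag that stays flat or shrinks, as in the sample, points to a few slow queries rather than a target that is consistently slower.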

How many statements per second do you have? pgreplay tends to get overloaded if that goes into the thousands.
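
For a first estimate, the average rate follows directly from the figures given in the opening post (about 20M records over about 20 hours). This is a minimal sketch with a hypothetical helper name; it gives only the average, and peak rates during busy periods can be considerably higher.

```python
def average_rate(statements, hours):
    """Average number of statements per second over the whole capture."""
    return statements / (hours * 3600)


if __name__ == "__main__":
    # Figures from the opening post: ~20M records over ~20 hours.
    rate = average_rate(20_000_000, 20)
    print(f"{rate:.0f} statements/second on average")  # prints "278 statements/second on average"
```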

The option -j only speeds up execution if there are times without activity — it just skips these intervals instead of doing nothing.
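
The effect can be illustrated with a toy model. This is not pgreplay's actual logic: the function name and the `idle_threshold` parameter are invented here, and "idle" is simplified to "a gap between consecutive statements longer than the threshold".

```python
def replay_duration(timestamps, jump_idle=False, idle_threshold=1.0):
    """Seconds needed to replay statements at their original timing.

    timestamps: sorted start times of the statements, in seconds.
    With jump_idle=True, gaps longer than idle_threshold are skipped
    instead of being waited out.
    """
    total = 0.0
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if jump_idle and gap > idle_threshold:
            continue  # idle interval: jump ahead instead of waiting
        total += gap
    return total


if __name__ == "__main__":
    with_pause = [0.0, 0.5, 1.0, 601.0, 601.5]  # contains a 10-minute pause
    always_busy = [0.0, 0.5, 1.0, 1.5]          # no idle time at all

    print(replay_duration(with_pause), replay_duration(with_pause, jump_idle=True))
    print(replay_duration(always_busy), replay_duration(always_busy, jump_idle=True))
```

In this model the workload with a pause shrinks from 601.5 to 1.5 seconds, while the always-busy workload takes 1.5 seconds either way, which is why jumping does not help a capture that has no quiet periods.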

Since pgreplay is single-threaded, it won't be faster if you run it on a machine with more cores.

@mvives-broadsign
Contributor Author

Thanks for the explanations. I'll wait for the run to end in order to draw conclusions.
For the moment I'm at ~500 stmts/sec, but when testing the next server (probably in a month), I expect to reach ~5000 stmts/sec. Will that be an issue?

@laurenz
Owner

laurenz commented May 4, 2018

I don't know the limits of pgreplay (never used it on such busy databases), but I have heard reports that it cannot keep up with very many statements per second.

Try it and give me feedback :^)

@mvives-broadsign
Contributor Author

All right, I'll let you know at that point ;)
In the meantime, I'll try to finish the test with my current setup and see how far it drifts.

Thanks for the quick answers

@mvives-broadsign
Contributor Author

The test finished more or less properly (I'll open another issue for that) and did not drift more than 3 minutes after 22 hours. ;)
