Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fedmsg-hub pulls a huge amount of backlog from datagrepper when simply restarted #371

Open
AdamWill opened this issue Apr 15, 2016 · 3 comments

Comments

@AdamWill
Copy link
Contributor

When I restart fedmsg-hub on the openQA server boxes, it seems to pull a huge amount of data from datagrepper:

Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][    fedmsg    INFO] Retrieving datagrepper page 350 of 700
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][    fedmsg    INFO] Retrieving datagrepper page 355 of 583
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][    fedmsg    INFO] Retrieving datagrepper page 351 of 700
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 356 of 583
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 352 of 700
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 357 of 583
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 353 of 700

that's just an extract, it really does apparently do two retrievals of 1283 pages of data (or whatever, the numbers differ each time). This is just after doing a simple 'systemctl restart fedmsg-hub'. I assume this is for the backlog processing stuff, but it seems to be going wrong.

@AdamWill
Copy link
Contributor Author

Note at the end it says:

Apr 15 23:51:33 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:51:33][    fedmsg    INFO] Retrieved 0 messages from datagrepper.

@ralphbean
Copy link
Contributor

Can you tell us anything about the modified time, permissions, and contents of files in /var/run/fedmsg/status/fedmsg-hub/? That's where it leaves itself a note for after restart.

@AdamWill
Copy link
Contributor Author

ls --l /var/[adamwill@openqa01 ~][PROD]$ ls -l /var/run/fedmsg/status/fedmsg-hub/
total 12
-rw-r--r--. 1 fedmsg fedmsg 2850 Apr 18 11:25 CheckComposeProductionConsumer
-rw-r--r--. 1 fedmsg fedmsg 2683 Apr 18 13:20 OpenQAConsumer
-rw-r--r--. 1 fedmsg fedmsg 2850 Apr 18 11:25 OpenQAProductionWikiConsumer
[adamwill@openqa01 ~][PROD]$ date -u
Mon Apr 18 14:17:48 UTC 2016

so those have all been touched quite recently (i.e. they're not just not getting updated, I guess). Each one contains a single fedmsg message that corresponds to the timestamp of the file.

Hmm. All three consumers trigger on fairly infrequent messages: OpenQAConsumer triggers on compose completion, the other two trigger on openQA job completion (which of course is closely tied to compose completion; after each tested compose there's a flood of openQA job completion messages for an hour or so, then nothing till the next compose). Is it just that when the service is restarted some time since the previous compose, it replays every message since the last openQA job complete message? Perhaps this function should log the last received message somehow, not just the last consumed message, if that's the case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants