fedmsg - some messages are not registered by the plugin #206
Comments
@vojtechsokol, this is normal behavior for the plugin. Can I get access to the Jenkins instance to take a look?
@vojtechsokol, no it shouldn't miss messages. I mean that resubscribing after 1 hour of no messages is normal, and those are the messages you are seeing in the log. That behavior and those messages are expected. Let me take a look.
@vojtechsokol, can you give me an example of a message that was handled and one that was missed? I don't see anything obvious yet.
@ggallen I've created a short script that dumps builds of the Jenkins job and messages from datagrepper: https://gist.github.com/vojtechsokol/e427c65d541ac294d3170df7a133ac9c. When you compare the resulting files (
@vojtechsokol, information overload! All I want is a job, a message that triggered it, and a message that didn't. I don't see how to find the actual messages in all the data the script produces.
@vojtechsokol, I just checked your jms-messaging-plugin version and you are running 1.1.9. The current version is 1.1.18. Could you please update to that version and see if things improve? 1.1.9 was released over a year ago, and there have been a lot of changes since then.
@ggallen After updating the plugin to 1.1.18, it still misses some messages. https://apps.fedoraproject.org/datagrepper/id?id=2020-b1d06e1e-1984-4962-b3e7-0a739b3c72ee&size=extra-large https://apps.fedoraproject.org/datagrepper/id?id=2020-2d2003b7-e6be-4815-9829-960f124a533d&size=extra-large missed messages:
@ggallen should we provide more information here pls? This is becoming a blocker for migrating to these new jobs, which use github -> fedmsg as triggers ...
There's nothing obvious I can see here. This is going to require some more in-depth investigation. Can you establish any sort of a pattern for when messages are missed vs. when they are received? @scoheb, any thoughts?
@vojtechsokol contact me offline to discuss.
Actually, can you please check the status of the webhook deliveries at https://github.com/ceph/ceph/settings/hooks for https://apps.fedoraproject.org/github2fedmsg/webhook ? It is possible there may be some 404 errors. There might be some continual data centre migration happening.
It does seem that there may be an issue with your missed messages script. I found that builds 2520, 2519, and 2518 were all triggered but do not show up as matched in the output.
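For anyone reproducing this kind of check, the comparison itself can be kept very small. The following is only a sketch, not the gist's actual code: the function name, the build numbers, and the message IDs are all hypothetical, and it assumes you have already extracted one message ID per published message and one per triggered build.

```python
def split_messages(published_ids, triggered_ids):
    """Split published message IDs into handled and missed.

    published_ids: IDs of every message seen on the bus (hypothetical input).
    triggered_ids: IDs of the messages that actually started a build.
    """
    triggered = set(triggered_ids)
    handled = [m for m in published_ids if m in triggered]
    missed = [m for m in published_ids if m not in triggered]
    return handled, missed


# Hypothetical sample data, not taken from the real job.
published = ["msg-a", "msg-b", "msg-c", "msg-d"]
triggered_builds = {1: "msg-a", 2: "msg-c"}  # build number -> triggering message ID

handled, missed = split_messages(published, triggered_builds.values())
print("handled:", handled)
print("missed:", missed)
```

Matching on the message ID alone, rather than on timestamps or payload fields, avoids the sort of false "missed" result described above, provided the build records the ID of the message that triggered it.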
@scoheb Indeed, there was a bug in the script; it should work properly now. Regarding the status of the webhook: I have access only to oamg/leapp and oamg/leapp-repository, and there are two failures (500 Internal Server Error) in the last 150 deliveries. However, I don't think that relates to our problem, since those events didn't even make it to fedmsg. @ggallen No idea at all if there is any pattern. The only hint I have is that maybe it affects more topics with smaller volume of messages - job
The problem was traced to some proxies behind the Fedora hub that were not working correctly. This caused the issues you were seeing, since everything depended on which of the 10 proxies Jenkins connected to when the job subscribed to a topic. Only 2 of the 10 had been working correctly. This has since been fixed on the Fedora infra side, and all proxies are working well now. Please see https://pagure.io/fedora-infrastructure/issue/9363
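To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each subscription lands on one of the 10 proxies uniformly at random and independently of earlier subscriptions, which is an assumption for illustration only and not something confirmed about how the hub balances connections. The function names are hypothetical.

```python
from fractions import Fraction


def p_working(working, total):
    """Chance that a single subscription lands on a working proxy."""
    return Fraction(working, total)


def p_all_broken(working, total, attempts):
    """Chance that `attempts` independent subscriptions all land on broken proxies."""
    return (1 - Fraction(working, total)) ** attempts


# 2 of 10 proxies were working, per the comment above.
print(p_working(2, 10))        # chance one subscription receives messages
print(p_all_broken(2, 10, 3))  # chance three subscriptions in a row receive nothing
```

Under that assumption a given subscription had only a 1 in 5 chance of receiving messages, which would be consistent with jobs working after some resubscriptions and going quiet after others.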
@scoheb thanks a lot for hunting this down!
@scoheb \o/ Thanks for helping to fix that!
Hi,
our team uses the JMS plugin to consume fedmsg messages with org.fedoraproject.prod.github.* topics (forwarded to fedmsg by the github2fedmsg app). Everything works fine, but after a while no jobs are triggered by the plugin. One hour after the last message was received, the plugin resubscribes the job to the topic and it works again for a while.
When this happens, the Jenkins log contains these entries but no errors: