telegraf not recovering after mosquitto restart. #6232

Closed
vincems opened this issue Aug 9, 2019 · 5 comments · Fixed by #6236

Labels
area/mqtt bug unexpected problem or unintended behavior

Comments

@vincems

vincems commented Aug 9, 2019

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "dbname"
  username = "dbuser"
  password = "dbpass"
  user_agent = "telegraf"
  
[[inputs.mqtt_consumer]]
   servers = ["tcp://myhost:1883"]
   qos = 0
   connection_timeout = "30s"
   topics = [ "topic1" ]
   persistent_session = true
   client_id = "telegraf_things"
   username = "mqttuser"
   password = "mqttpass"
   data_format = "json"

System info:

mosquitto version 1.4.15 (build date 2018-07-24 13:34:50+1200)
InfluxDB v1.7.7 (git: 1.7 f8fdf652f348fc9980997fe1c972e2b79ddd13b0)
Telegraf 1.11.4 (git: HEAD d9ca76e)

maybe not relevant
Telegraf 1.11.4 (git: HEAD d9ca76e)

Steps to reproduce:

  1. Start mosquitto
  2. Start influxd
  3. Start telegraf
  4. Publish some data to the mosquitto topic
  5. Restart mosquitto
  6. Publish more data to the mosquitto topic

Expected behavior:

Watching the mosquitto log you can see the initial start up sequence:

1565310635: Sending CONNACK to telegraf_things (1, 0)
1565310635: Received SUBSCRIBE from telegraf_things
1565310635: Sending SUBACK to telegraf_things
1565310637: Sending PUBLISH to telegraf_things (d0, q0, r0, m0, 'topic1', ... (32 bytes))
...
restart mosquitto
...
1565310691: Sending CONNACK to telegraf_things (0, 0)
1565310635: Received SUBSCRIBE from telegraf_things
1565310635: Sending SUBACK to telegraf_things
1565310637: Sending PUBLISH to telegraf_things (d0, q0, r0, m0, 'topic1', ... (32 bytes))

Actual behavior:

1565310635: Sending CONNACK to telegraf_things (1, 0)
1565310635: Received SUBSCRIBE from telegraf_things
1565310635: Sending SUBACK to telegraf_things
1565310637: Sending PUBLISH to telegraf_things (d0, q0, r0, m0, 'topic1', ... (32 bytes))
...
restart mosquitto
...
1565310691: Sending CONNACK to telegraf_things (0, 0)
1565310751: Received PINGREQ from telegraf_things
1565310751: Sending PINGRESP to telegraf_things

Additional info:

If you wait long enough you will see the PINGs coming through but never see any subscription or published data.
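The symptom can be checked mechanically from the broker log. The following is a minimal sketch, assuming the log line format shown in the excerpts above; `missingResubscribe` is a hypothetical helper, not part of telegraf or mosquitto. It reports whether the most recent CONNACK sent to a client was never followed by a SUBSCRIBE from that client. Note that when the broker still holds the persistent session, a missing SUBSCRIBE is legitimate, so a positive result is only a hint.

```go
package main

import (
	"fmt"
	"strings"
)

// missingResubscribe is a hypothetical diagnostic helper. It returns true
// when the last "Sending CONNACK" line for clientID in a mosquitto log is
// not followed by a "Received SUBSCRIBE" line from the same client.
// Matching is by substring, so a client ID that is a prefix of another
// client ID would also match.
func missingResubscribe(logLines []string, clientID string) bool {
	connack := "Sending CONNACK to " + clientID
	subscribe := "Received SUBSCRIBE from " + clientID

	missing := false
	for _, line := range logLines {
		switch {
		case strings.Contains(line, connack):
			// A new connection was accepted; no SUBSCRIBE seen for it yet.
			missing = true
		case strings.Contains(line, subscribe):
			missing = false
		}
	}
	return missing
}

func main() {
	// Lines copied from the "Actual behavior" excerpt above.
	actual := []string{
		"1565310635: Sending CONNACK to telegraf_things (1, 0)",
		"1565310635: Received SUBSCRIBE from telegraf_things",
		"1565310635: Sending SUBACK to telegraf_things",
		"1565310691: Sending CONNACK to telegraf_things (0, 0)",
		"1565310751: Received PINGREQ from telegraf_things",
		"1565310751: Sending PINGRESP to telegraf_things",
	}
	fmt.Println(missingResubscribe(actual, "telegraf_things"))
}
```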

Excerpt from the telegraf log after restarting mosquitto (timestamps removed for clarity's sake):

E! [inputs.mqtt_consumer]: Error in plugin: connection lost: EOF
D! [inputs.mqtt_consumer] Disconnected [tcp://myhost:1883]
D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics.
D! [inputs.mqtt_consumer] Connecting [tcp://myhost:1883]
I! [inputs.mqtt_consumer] Connected [tcp://myhost:1883]
D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics.

@danielnelson
Contributor

Took a quick peek at the code. Does disabling persistent session help?

danielnelson added the area/mqtt and bug (unexpected problem or unintended behavior) labels on Aug 9, 2019
@danielnelson
Contributor

If so, I think we just need to make use of the enhancement to the paho mqtt library described in eclipse-paho/paho.mqtt.golang#240.

@vincems
Author

vincems commented Aug 9, 2019 via email

@vincems
Author

vincems commented Aug 9, 2019

Despite being far from an adept Go developer, and pretty new to this code, I've been digging into this further. What we're seeing here appears to be an intentional design decision made by the paho.mqtt client developers.

See post eclipse-paho/paho.mqtt.c#221

In which case I believe the approach taken in the application code (mqtt_consumer.go, lines 178-193) needs to be looked at. If the paho library doesn't re-subscribe automatically, then it's up to the client application to do so.

Most of the work is already done: we just need to flag the loss of the session, then re-subscribe once the connection is re-established, and Bob's your auntie.
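To make the idea concrete, here is a minimal sketch of re-subscribing from a connect callback. It is not the code in mqtt_consumer.go and it does not use the real paho.mqtt.golang API: `subscriber`, `consumer`, `onConnect`, and `fakeClient` are hypothetical names introduced for illustration, and the `sessionPresent` argument assumes the client can learn from the CONNACK whether the broker kept the session. If that flag is not available, re-subscribing unconditionally on every connect is the simpler fallback.

```go
package main

import "fmt"

// subscriber is a hypothetical stand-in for the small part of an MQTT
// client this sketch needs. It is NOT the paho.mqtt.golang interface.
type subscriber interface {
	Subscribe(topic string, qos byte) error
}

// consumer holds the subscription settings from the plugin configuration.
type consumer struct {
	topics []string
	qos    byte
}

// onConnect is intended to run after every successful connect or
// reconnect. When the broker reports that it no longer has the session,
// all configured topics are subscribed again.
func (c *consumer) onConnect(client subscriber, sessionPresent bool) error {
	if sessionPresent {
		// The broker kept the session, so the subscriptions are still there.
		return nil
	}
	for _, topic := range c.topics {
		if err := client.Subscribe(topic, c.qos); err != nil {
			return fmt.Errorf("subscribe %q: %w", topic, err)
		}
	}
	return nil
}

// fakeClient records subscriptions so the sketch runs without a broker.
type fakeClient struct {
	subscribed []string
}

func (f *fakeClient) Subscribe(topic string, qos byte) error {
	f.subscribed = append(f.subscribed, topic)
	return nil
}

func main() {
	c := &consumer{topics: []string{"topic1"}, qos: 0}
	client := &fakeClient{}

	// Simulate a reconnect after the broker restarted and lost the session.
	if err := c.onConnect(client, false); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(client.subscribed) // prints [topic1]
}
```

Keeping the subscribe logic in one callback means the initial connect and every later reconnect take the same path, so there is no separate "first start" case to keep in sync.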

@vincems
Author

vincems commented Aug 9, 2019

Unless, of course, said enhancement happens.
