Agents not (re-)established on collector #342

Closed
webjoel opened this issue Mar 28, 2018 · 6 comments

Comments

@webjoel

webjoel commented Mar 28, 2018

I had to disconnect and reconnect the Glowroot collector, as the logs below show, but the agents did not recover. Only the agents "groupB :: server4" and "groupB :: server8" recovered, because I restarted JBoss on those servers. The agents whose JBoss was not restarted re-established the connection but did not send any more monitoring data.

Logs "Central Collector":

2018-03-27 20:22:44.677 INFO  org.glowroot - Glowroot version: 0.10.4, built 2018-03-05 05:02:59 +0000
2018-03-27 20:22:44.685 INFO  org.glowroot - Java version: 1.8.0_161
2018-03-27 20:22:45.165 INFO  c.datastax.driver.core.ClockFactory - Using native clock to generate timestamps.
2018-03-27 20:22:45.209 INFO  c.d.driver.core.GuavaCompatibility - Detected Guava >= 19 in the classpath, using modern compatibility layer
2018-03-27 20:22:45.516 INFO  com.datastax.driver.core.NettyUtil - Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
2018-03-27 20:22:46.232 WARN  org.glowroot.central.CentralModule - waiting for Cassandra (127.0.0.1) ...
2018-03-27 20:22:52.790 INFO  c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
2018-03-27 20:22:52.796 INFO  com.datastax.driver.core.Cluster - New Cassandra host /127.0.0.1:9042 added
2018-03-27 20:22:53.018 INFO  org.glowroot - connected to Cassandra (version 3.11.2), using keyspace 'glowroot' (replication factor 1) and consistency level QUORUM
2018-03-27 20:22:53.531 INFO  org.glowroot - creating glowroot central schema ...
2018-03-27 20:23:16.297 INFO  org.glowroot - glowroot central schema created
2018-03-27 20:23:16.612 INFO  org.glowroot - gRPC listening on 0.0.0.0:8181
2018-03-27 20:23:17.289 INFO  org.glowroot - UI listening on 0.0.0.0:4000
2018-03-27 20:23:17.289 INFO  org.glowroot - startup complete
2018-03-27 20:23:20.812 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupA :: server1
2018-03-27 20:23:21.544 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server2
2018-03-27 20:23:22.892 INFO  com.datastax.driver.core.utils.UUIDs - PID obtained through native call to getpid(): 28
2018-03-27 20:23:25.460 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupA :: server3
2018-03-27 20:23:26.450 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server4
2018-03-27 20:23:26.526 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server5
2018-03-27 20:23:26.598 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupA :: server6
2018-03-27 20:23:32.459 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server7
2018-03-27 20:23:36.489 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupA :: server9
2018-03-27 20:23:38.475 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server8
2018-03-28 01:00:07.138 INFO  o.g.central.DownstreamServiceImpl - downstream connection lost with agent: groupB :: server4
2018-03-28 01:00:07.621 INFO  o.g.central.DownstreamServiceImpl - downstream connection lost with agent: groupB :: server7
2018-03-28 01:00:07.766 INFO  o.g.central.DownstreamServiceImpl - downstream connection lost with agent: groupB :: server8
2018-03-28 06:17:13.515 INFO  o.g.central.DownstreamServiceImpl - downstream connection lost with agent: groupA :: server6
2018-03-28 06:32:44.004 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupA :: server6
2018-03-28 09:00:25.848 INFO  o.g.central.CollectorServiceImpl - agent connected: groupB :: server4, version 0.10.4, built 2018-03-05 05:02:59 +0000
2018-03-28 09:00:25.863 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server4
2018-03-28 09:00:27.287 INFO  o.g.central.CollectorServiceImpl - agent connected: groupB :: server8, version 0.10.4, built 2018-03-05 05:02:59 +0000
2018-03-28 09:00:27.308 INFO  o.g.central.DownstreamServiceImpl - downstream connection (re-)established with agent: groupB :: server8

Logs agent "groupB :: server2":

2018-03-27 20:21:31,893 INFO  [stdout] (Glowroot-GRPC-Executor) 2018-03-27 17:21:31.893 WARN  o.g.agent.central.CentralConnection - unable to send data to the central collector: UNKNOWN (this warning will be logged at most once a minute, 11 warnings were suppressed since it was last logged)
2018-03-27 20:22:18,483 INFO  [stdout] (Glowroot-GRPC-Executor) 2018-03-27 17:22:18.483 WARN  o.g.a.c.DownstreamServiceObserver - lost connection to the central collector (will keep trying to re-establish...): Connection refused
2018-03-27 20:23:21,546 INFO  [stdout] (Glowroot-GRPC-Executor) 2018-03-27 17:23:21.545 INFO  o.g.a.c.DownstreamServiceObserver - re-established connection to the central collector
@trask
Member

trask commented Mar 28, 2018

Hi @webjoel, thanks for reporting this! It looks like you re-created your Cassandra schema?

2018-03-27 20:22:53.531 INFO  org.glowroot - creating glowroot central schema ...
2018-03-27 20:23:16.297 INFO  org.glowroot - glowroot central schema created

That would explain what you are seeing. The agents only send over their config as part of collectInit() when they start up. Without any config stored from an agent (after re-creating the Cassandra schema), the central collector does not record that agent as having data during that time (https://github.com/glowroot/glowroot/blob/v0.10.5/central/src/main/java/org/glowroot/central/repo/AgentDao.java#L143).

The good news is that the agents are still reporting data, and the collector is still storing the data. It's just that the agents are not showing up in the UI agent dropdown.
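As a rough way to spot the affected agents, the collector log can be scanned for agents that re-established the downstream connection but never logged an "agent connected" line. This is only a sketch: the function name `agents_missing_init` is made up for illustration, the patterns are copied from the log lines in this issue, and it assumes the "agent connected" line corresponds to the start-up collectInit() call.

```python
import re

# Patterns taken from the collector log lines quoted in this issue.
# Assumption: "agent connected" is logged when an agent calls collectInit().
INIT = re.compile(r"agent connected: (.+?), version")
DOWNSTREAM = re.compile(
    r"downstream connection \(re-\)established with agent: (.+)$")


def agents_missing_init(log_lines):
    """Return agents that re-established the downstream connection
    but never logged an 'agent connected' line."""
    initialized = set()
    downstream = set()
    for line in log_lines:
        line = line.rstrip()
        match = INIT.search(line)
        if match:
            initialized.add(match.group(1))
            continue
        match = DOWNSTREAM.search(line)
        if match:
            downstream.add(match.group(1))
    return sorted(downstream - initialized)


sample = [
    "2018-03-27 20:23:21.544 INFO  o.g.central.DownstreamServiceImpl - "
    "downstream connection (re-)established with agent: groupB :: server2",
    "2018-03-28 09:00:25.848 INFO  o.g.central.CollectorServiceImpl - "
    "agent connected: groupB :: server4, version 0.10.4, "
    "built 2018-03-05 05:02:59 +0000",
    "2018-03-28 09:00:25.863 INFO  o.g.central.DownstreamServiceImpl - "
    "downstream connection (re-)established with agent: groupB :: server4",
]
print(agents_missing_init(sample))  # ['groupB :: server2']
```

Agents in the returned list are candidates for the "reporting data but missing from the dropdown" situation described above. The log alone does not prove it.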

You should be able to bypass the UI agent dropdown and see the missing data in the UI by manually modifying the agent-id= or agent-rollup-id= portion of your URL when in the UI.
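A small helper can make that URL edit less error-prone. This is a sketch under assumptions: the helper name `with_agent_id`, the example host, path and `transaction-type` parameter, and the id string `groupB::server2` are all hypothetical. Only the `agent-id` and `agent-rollup-id` parameter names come from this thread, so use whatever id your agent is actually configured with.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_agent_id(ui_url, agent_id, rollup=False):
    """Return ui_url with its agent-id / agent-rollup-id parameter set."""
    parts = urlsplit(ui_url)
    key = "agent-rollup-id" if rollup else "agent-id"
    # Drop any existing agent selector, keep the other query parameters.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in ("agent-id", "agent-rollup-id")
    ]
    query.insert(0, (key, agent_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical UI URL and agent id, for illustration only.
url = with_agent_id(
    "http://localhost:4000/transaction/average?transaction-type=Web",
    "groupB::server2",
)
print(url)
```

Passing `rollup=True` sets `agent-rollup-id` instead, which is the parameter to change when viewing a rollup rather than a single agent.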

This behavior is definitely confusing (and has happened to me before when testing). I'll look into improving it.

@webjoel
Author

webjoel commented Mar 29, 2018

Yes, my central collector and Cassandra are in the same Docker container. My disk was full, so I had to restart the container, which erased the data. Is there another way I can clean up this data? For now, I have reconfigured it to store data for fewer days.

You're right, access by URL parameter is working.

@trask
Member

trask commented Mar 30, 2018

Hi @webjoel, check this out for how to delete all the data while retaining the configuration:

#310 (comment)

@webjoel
Author

webjoel commented Apr 12, 2018

Is it possible for you to provide a cleanup script for this? Or a feature that lets me clear this data for a particular period? For example, a Cassandra cleanup that leaves the config tables alone.

Since the collector has re-established the connection and is receiving data, would it be sufficient to re-create the agent configuration with the default data so that the agent appears in the filters?

trask added a commit that referenced this issue Apr 19, 2018
@trask
Member

trask commented Apr 19, 2018

Hi @webjoel, a utility to wipe out all collected data (but preserve the config) is now available in 0.10.8:

java -jar glowroot-central.jar truncate-all-data

@trask trask added the bug label May 16, 2018
@trask trask added this to the v0.10.10 milestone May 16, 2018
@trask trask closed this as completed in 89f0fd8 May 16, 2018
@trask
Member

trask commented May 16, 2018

Hi @webjoel, thanks again for reporting this! In the latest agent and central collector snapshots, the central collector will ask the agent to re-send the AgentConfig and Environment data if it doesn't already have them (e.g. after completely wiping the Cassandra data instead of using java -jar glowroot-central.jar truncate-all-data).


2 participants