network-instance on host in Reconnecting state gets picked up for performing health checks #2622

Closed
alena1108 opened this issue Nov 9, 2015 · 4 comments
Assignees
Labels
kind/bug Issues that are defects reported by users or that we know have reached a real release

Comments

@alena1108

Steps to reproduce:

  • Have 2 hosts in the system.
  • Bring down host2. Start an instance with health check enabled on host1. Bug: the network agent on host2 gets picked up as a health checker for the instance, even though host2 is currently reconnecting.
@alena1108 alena1108 added the kind/bug Issues that are defects reported by users or that we know have reached a real release label Nov 9, 2015
@alena1108 alena1108 self-assigned this Nov 9, 2015
@alena1108 alena1108 added this to the Milestone 11/18/2015 milestone Nov 9, 2015
@alena1108
Author

Related issues:

#1173
#2516

@sangeethah
Contributor

Tested with build from master:

Scenario 1:

  1. Had 2 hosts in the system - host1 and host2.
  2. Brought down host2 so that the host is in "reconnecting" state.
  3. Started an instance with health check enabled, with scale 2.
  4. Both instances get marked as healthy.
  5. Health check is assigned to host1 (not host2, which is in "reconnecting" state).
mysql> select id,state,name,agent_state from host;
+----+--------+-------------------------------------------+--------------+
| id | state  | name                                      | agent_state  |
+----+--------+-------------------------------------------+--------------+
|  1 | purged | sangeemaster-10acre-1                     | purged       |
|  2 | active | sangeemaster-10acre-3                     | NULL         |
|  3 | active | sangeemaster-10acre-2                     | reconnecting |
|  4 | purged | ip-172-31-9-39.us-west-1.compute.internal | purged       |
+----+--------+-------------------------------------------+--------------+
4 rows in set (0.01 sec)

mysql> select health_state, host_id from healthcheck_instance_host_map where healthcheck_instance_id=(select id from healthcheck_instance where instance_id=206);
+--------------+---------+
| health_state | host_id |
+--------------+---------+
| healthy      |       2 |
+--------------+---------+
1 row in set (0.00 sec)

mysql> select health_state, host_id from healthcheck_instance_host_map where healthcheck_instance_id=(select id from healthcheck_instance where instance_id=207);
+--------------+---------+
| health_state | host_id |
+--------------+---------+
| healthy      |       2 |
+--------------+---------+
1 row in set (0.00 sec)

Scenario 2:

  1. Had 2 hosts in the system - host1 and host2.
  2. Brought down host2 so that the host is in "reconnecting" state.
  3. Started a service with the global flag enabled.
  4. Only 1 instance is spawned on host1 and the service gets to "Active" state.

@sangeethah
Contributor

In the following use case, I see that a host in "reconnecting" state is being picked for monitoring health checks:

Had a service with scale 2 and health check enabled. Both instances were running on host id 3.
Shut down host 3 so that it gets to "Reconnecting" state.

Health check marked both instances as "unhealthy".
One of the instances gets redeployed on the same host id 3, which is in "reconnecting" state; that is the bug tracked in #1173.
The other instance, 222, gets deployed on host id 2, which is active. But host id 3 gets picked for monitoring this instance's health, so the health_state of this instance is stuck in "Initializing".

mysql> select host_id,instance_id from instance_host_map where instance_id in (218,219,222,223);
+---------+-------------+
| host_id | instance_id |
+---------+-------------+
|       3 |         218 |
|       3 |         219 |
|       2 |         222 |
|       3 |         223 |
+---------+-------------+
4 rows in set (0.00 sec)

mysql> select id, name, state, agent_state from host;
+----+-------------------------------------------+--------+--------------+
| id | name                                      | state  | agent_state  |
+----+-------------------------------------------+--------+--------------+
|  1 | sangeemaster-10acre-1                     | purged | purged       |
|  2 | sangeemaster-10acre-3                     | active | active       |
|  3 | sangeemaster-10acre-2                     | active | reconnecting |
|  4 | ip-172-31-9-39.us-west-1.compute.internal | purged | purged       |
+----+-------------------------------------------+--------+--------------+
4 rows in set (0.00 sec)

mysql>  select health_state, host_id from healthcheck_instance_host_map where healthcheck_instance_id=(select id from healthcheck_instance where instance_id=222);
+--------------+---------+
| health_state | host_id |
+--------------+---------+
| healthy      |       3 |
+--------------+---------+
1 row in set (0.00 sec)

mysql> select id,name,state,health_state from instance where id in (218,219,222,223);
+-----+-----------------------------+----------+--------------+
| id  | name                        | state    | health_state |
+-----+-----------------------------+----------+--------------+
| 218 | test494388_newhealthcheck_1 | running  | unhealthy    |
| 219 | test494388_newhealthcheck_2 | stopping | unhealthy    |
| 222 | test494388_newhealthcheck_3 | running  | initializing |
| 223 | test494388_newhealthcheck_2 | starting | initializing |
+-----+-----------------------------+----------+--------------+
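
The stuck assignment shown in the queries above can be spotted with a single join. This is only a minimal sketch: it models just the columns visible in the query output in this comment, uses an in-memory SQLite database as a stand-in for the real MySQL schema, and the healthcheck_instance id of 10 is a made-up value for illustration.

```python
import sqlite3

# In-memory stand-in for the real database. Only the columns seen in the
# query output in this issue are modelled; the real schema has more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT, state TEXT, agent_state TEXT);
CREATE TABLE healthcheck_instance (id INTEGER PRIMARY KEY, instance_id INTEGER);
CREATE TABLE healthcheck_instance_host_map (
    healthcheck_instance_id INTEGER, host_id INTEGER, health_state TEXT);
""")
conn.executemany("INSERT INTO host VALUES (?, ?, ?, ?)", [
    (2, "sangeemaster-10acre-3", "active", "active"),
    (3, "sangeemaster-10acre-2", "active", "reconnecting"),
])
# healthcheck_instance id 10 is a hypothetical value, not taken from the issue.
conn.execute("INSERT INTO healthcheck_instance VALUES (10, 222)")
conn.execute("INSERT INTO healthcheck_instance_host_map VALUES (10, 3, 'healthy')")

# List every instance whose health checker lives on a reconnecting host.
STALE_CHECKERS = """
SELECT hi.instance_id, m.host_id, h.agent_state
FROM healthcheck_instance_host_map m
JOIN healthcheck_instance hi ON hi.id = m.healthcheck_instance_id
JOIN host h ON h.id = m.host_id
WHERE h.agent_state = 'reconnecting'
"""
stale = conn.execute(STALE_CHECKERS).fetchall()
print(stale)
```

The SELECT statement itself should also run unchanged in the mysql client, provided the real tables carry the columns assumed here.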

@sangeethah
Contributor

Tested with latest build from master:
A host in "reconnecting" state is no longer picked for monitoring health checks. If a host was monitoring existing instances, then once it gets to "reconnecting" state their entries are removed from healthcheck_instance_host_map and a new host is assigned for monitoring.
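
The verified behavior can be sketched roughly as follows. This is an illustration under assumptions, not the actual fix: the `Host` class and the `is_eligible` and `reassign_health_checkers` names are hypothetical, the eligibility rule (host state "active" and agent state not "reconnecting") is inferred from the query output in this thread, and picking the lowest eligible host id is an arbitrary stand-in for whatever selection policy the real code uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    # Hypothetical model of the host rows shown in this issue.
    id: int
    state: str
    agent_state: Optional[str]


def is_eligible(host: Host) -> bool:
    # Assumption: an active host whose agent is not reconnecting may monitor.
    # agent_state can be NULL (None) on a working host, as in Scenario 1.
    return host.state == "active" and host.agent_state != "reconnecting"


def reassign_health_checkers(
    checker_map: Dict[int, int], hosts: List[Host]
) -> Dict[int, int]:
    """Return a new instance_id -> host_id map with ineligible checkers replaced."""
    eligible_ids = sorted(h.id for h in hosts if is_eligible(h))
    result: Dict[int, int] = {}
    for instance_id, host_id in checker_map.items():
        if host_id in eligible_ids:
            result[instance_id] = host_id          # keep a working checker
        elif eligible_ids:
            result[instance_id] = eligible_ids[0]  # arbitrary policy: lowest id
        # With no eligible host the entry is dropped and nothing is assigned.
    return result


hosts = [
    Host(1, "purged", "purged"),
    Host(2, "active", None),
    Host(3, "active", "reconnecting"),
]
new_map = reassign_health_checkers({222: 3, 206: 2}, hosts)
print(new_map)
```

With the hosts above, instance 222 moves from host 3 to host 2, while instance 206 keeps host 2 as its checker.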
