Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster join never happens for manual clustering #380

Merged
merged 2 commits into from Sep 15, 2016

Conversation

Projects
None yet
7 participants
@akadoya
Copy link
Contributor

commented Jun 11, 2016

When using manual clustering in this cookbook, rabbitmq(e.g. rabbit1-ubuntu-1404) starts as a member of a single node cluster named as its own hostname.
And the current join action will never happen by just skipping with the warning message below.

[rabbitmq_cluster] Node is already member of rabbit@rabbit1-ubuntu-1404. Joining cluster will be skipped.

This is because the current join action process doesn't consider the current cluster name and only checks if the node is a part of whatever cluster or not.

This PR is to check if the node is a part of the desired cluster defined through the attribute ( node['rabbitmq']['clustering']['cluster_name'] ) so that the join action will happen if the node is a part of the wrong/(in my case, default=hostname) cluster.

@satyabhan

This comment has been minimized.

Copy link

commented Jun 16, 2016

I ran into the same issue today while testing. Since the node is already part of its own cluster at the start joined_cluster always returns true and doesn't actually attempt to join the desired cluster. The proposed fix should work.

@satyabhan

This comment has been minimized.

Copy link

commented Jun 17, 2016

Added proof that the fix is working and creating the manual cluster correctly

image

@jjasghar

This comment has been minimized.

Copy link
Collaborator

commented Jul 28, 2016

Can you add some unit tests around this? I know my future-self will probably break this if it's not protected via a test.

@akadoya

This comment has been minimized.

Copy link
Contributor Author

commented Sep 1, 2016

Hi @jjasghar
I've been thinking about the best way to create a unit test for this logic but actually it's really difficult to make it.
when you test this recipe by building a single node cluster, it'd not hit join action and it's what it's supposed to be. (because this node tries to join itself)

The issue of the current logic was because it checks only if the node is a part of some cluster or not and skipped join process - so the missing test point here is to verify if the node is a part of the cluster with the target peers as well as being a member of the cluster named after the attribute.
What I could think of to test this point is ;

  1. add a new kitchen configuration to test cluster recipe with multi-nodes <- this is how I noticed this issue by the way
  2. add an option attribute in rabbitmq_cluster provider to verify if node became a part of the desired cluster or not after join action and raise an error if not.

By both ways, you could notice in case clustering logic is not working as it's expected.
What would you think?

@richardnaik

This comment has been minimized.

Copy link

commented Sep 15, 2016

Is there any word on when this will be merged into master? I am still encountering this problem as of 4.9.0.

@michaelklishin

This comment has been minimized.

Copy link
Member

commented Sep 15, 2016

@jjasghar I can see how testing this can be a real pain. Can we consider merging this as is?

@jjasghar

This comment has been minimized.

Copy link
Collaborator

commented Sep 15, 2016

Yeah i'm good with this. I'll merge now, and get a release pushed to the supermarket here in the next day or so.

@jjasghar jjasghar merged commit a60b7a8 into rabbitmq:master Sep 15, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@richardnaik

This comment has been minimized.

Copy link

commented Sep 16, 2016

Thanks!

@mikeprigodich

This comment has been minimized.

Copy link

commented May 18, 2017

There still seems to be a bug in this fix. I had to make the following change to get the second node to join the first node because the second node is already in a cluster with itself:

<   elsif joined_cluster?(var_node_name, var_cluster_status) && current_cluster_name(var_cluster_status) == var_cluster_name
<     Chef::Log.warn("[rabbitmq_cluster] Node is already member of your desired cluster #{current_cluster_name(var_cluster_status)}. Joining cluster will be skipped.")
---
>   elsif joined_cluster?(var_node_name, var_cluster_status) &&
>         joined_cluster?(var_node_name_to_join, var_cluster_status) &&
>         current_cluster_name(var_cluster_status) == var_cluster_name
>     Chef::Log.warn('[rabbitmq_cluster] Node is already member of your desired cluster ' \
>                    "#{current_cluster_name(var_cluster_status)}. Joining cluster will be skipped.")```
@scalp42

This comment has been minimized.

Copy link
Contributor

commented May 26, 2017

@mikeprigodich could you open a PR by any chance?

@akadoya

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2017

@michaelklishin Maybe you're setting cluster name to the second node before joining the node to the first node?

jjasghar pushed a commit that referenced this pull request Jun 16, 2017

Cluster join never happens for manual clustering (#380)
* adding cluster name check for cluster join action

* fix typo and rubocop offences
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.