Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rabbitmq::cluster resource type doesn't match the docs #498

Closed
mengesb opened this issue Aug 8, 2018 · 29 comments
Closed

rabbitmq::cluster resource type doesn't match the docs #498

mengesb opened this issue Aug 8, 2018 · 29 comments

Comments

@mengesb
Copy link

mengesb commented Aug 8, 2018

I'm trying to operate the cluster.rb recipe I'm having difficulty getting teh ndoes to cluster. This appears to be a recipe defect, or a defect in what I'm providing however it looks like I'm following available documentation.

RMQ: 3.6.9-1.el7
OS: RedHat 7.4
ERLANG: 19.3.4-1.el7
Cookbook version: 5.6.1
Chef: 13.9.4

Rendered node json attributes (bootstrap):

{
  "rabbitmq": {
    "clustering": {
      "cluster_nodes": [
        {
          "name": "rabbit@us1a-rmq1-kamehameha.domain.tld",
          "type": "disc"
        },
        {
          "name": "rabbit@us1a-rmq2-kamehameha.domain.tld",
          "type": "disc"
        },
        {
          "name": "rabbit@us1a-rmq3-kamehameha.domain.tld",
          "type": "disc"
        }
      ]
    }
  }
}

My wrapped recipe segment:

include_recipe 'rabbitmq::default'
include_recipe 'rabbitmq::cluster' if node['ap_rabbitmq']['service']['clustering']['auto']
include_recipe 'ap_rabbitmq::user'

rabbitmq_policy 'ha-all' do
  pattern    '^ha\.'
  parameters 'ha-mode' => 'all'
  action     :set
end if node['ap_rabbitmq']['service']['clustering']['enable']

include_recipe 'rabbitmq::plugin_management'
include_recipe 'rabbitmq::mgmt_console'

chef client run:

Recipe: rabbitmq::cluster
  * rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]] action set_cluster_name (up to date)
  * rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.andomain.tld","type":"disc"}]] action change_cluster_node_type
    
    ================================================================================
    Error executing action `change_cluster_node_type` on resource 'rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]]'
    ================================================================================
    
    NoMethodError
    -------------
    undefined method `[]' for nil:NilClass
    
    Cookbook Trace:
    ---------------
    /var/chef/cache/cookbooks/rabbitmq/providers/cluster.rb:259:in `block in class_from_file'
    
    Resource Declaration:
    ---------------------
    # In /var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb
    
     41: rabbitmq_cluster static_cluster_nodes do
     42:   cluster_name node['rabbitmq']['clustering']['cluster_name']
     43:   action [:set_cluster_name, :change_cluster_node_type]
     44: end
    
    Compiled Resource:
    ------------------
    # Declared in /var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb:41:in `from_file'
    
    rabbitmq_cluster("[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]") do
      action [:set_cluster_name, :change_cluster_node_type]
      default_guard_interpreter :default
      declared_type :rabbitmq_cluster
      cookbook_name "rabbitmq"
      recipe_name "cluster"
      cluster_name "kamehamehaus1a"
      cluster_nodes "[{\"name\":\"rabbit@us1a-rmq1-kamehameha.domain.tld\",\"type\":\"disc\"},{\"name\":\"rabbit@us1a-rmq2-kamehameha.domain.tld\",\"type\":\"disc\"},{\"name\":\"rabbit@us1a-rmq3-kamehameha.domain.tld\",\"type\":\"disc\"}]"
    end
    
    System Info:
    ------------
    chef_version=13.9.4
    platform=redhat
    platform_version=7.4
    ruby=ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux]
    program_name=chef-client worker: ppid=1916;start=18:23:21;
    executable=/opt/chef/bin/chef-client
@michaelklishin
Copy link
Member

I'm afraid there isn't enough information to work with here. I don't see any evidence that this is "a cookbook defect". The resource is question is this one. How does the right hand side of the comparison end up being nil, I'm not sure. The left side cannot be since compact is called on the filtered collection.

On an unrelated note, mirroring to all nodes is excessive and usually unnecessary.

@michaelklishin
Copy link
Member

Highly relevant (but without any feedback that would help us reproduce or reach a conclusion): #476.

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

@michaelklishin I could certainly provide information.... I just am curious what to provide.

So in digging around the resources, I see this
https://github.com/rabbitmq/chef-cookbook/blob/v5.x/resources/cluster.rb#L26

I'm providing the cluster_nodes as an array, not a string, so it appears to be a data type issue but your example clearly is an array of hashes in your README.md

Could this be the problem? I'm unsure of the debugging required in order to provide more context around what you'd require - if you could tell me I'd appreciate it.

@michaelklishin
Copy link
Member

Yes, that must be it: an array of hashes would be iterable still but each element of such collection would be a character and not a map. So invoking a [] on it is expected to fail.

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

so what is the correct string to provide to the variable that will allow parsing? a string of an array of hashses? or is this a provider type issue that needs changing?

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

OK so the template clearly doesn't want a string:

Chef::Mixin::Template::TemplateError (undefined method `map' for #<String:0x0000000003f6af10>) on line #46:

 44: <% if node['rabbitmq']['clustering']['enable'] -%>
 45:     <%if node['rabbitmq']['clustering']['use_auto_clustering'] -%>
 46:     {cluster_nodes, {[<%= node['rabbitmq']['clustering']['cluster_nodes'].map { |n| "'#{n['name']}'"}.join(',') %>], disc}},
 47:     <% end %>
 48:     {cluster_partition_handling,<%= node['rabbitmq']['clustering']['cluster_partition_handling'] %>},

@michaelklishin
Copy link
Member

An array of hashes as demonstrated in the docs. I will correct the provided type.

@michaelklishin michaelklishin reopened this Aug 8, 2018
@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

My syntax matches the docs; so if there's a fix that I can publish and test with that syntax i can validate it quickly

@michaelklishin michaelklishin changed the title rabbitmq::cluster recipe issue rabbitmq::cluster resource type doesn't match the docs Aug 8, 2018
@michaelklishin
Copy link
Member

michaelklishin commented Aug 8, 2018

@mengesb please give the tip of master a try? (5204cc1)

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

trying now

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

looks like something in the provider converts that to a string

Recipe: rabbitmq::cluster
  * rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]] action set_cluster_name
    
    ================================================================================
    Error executing action `set_cluster_name` on resource 'rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]]'
    ================================================================================
    
    Chef::Exceptions::ValidationFailed
    ----------------------------------
    Option cluster_nodes must be a kind of [Array]!  You passed "[{\"name\":\"rabbit@us1a-rmq1-kamehameha.domain.tld\",\"type\":\"disc\"},{\"name\":\"rabbit@us1a-rmq2-kamehameha.domain.tld\",\"type\":\"disc\"},{\"name\":\"rabbit@us1a-rmq3-kamehameha.domain.tld\",\"type\":\"disc\"}]".
    
    Cookbook Trace:
    ---------------
    /var/chef/cache/cookbooks/rabbitmq/providers/cluster.rb:235:in `block in class_from_file'
    
    Resource Declaration:
    ---------------------
    # In /var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb
    
     41: rabbitmq_cluster static_cluster_nodes do
     42:   cluster_name node['rabbitmq']['clustering']['cluster_name']
     43:   action [:set_cluster_name, :change_cluster_node_type]
     44: end
    
    Compiled Resource:
    ------------------
    # Declared in /var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb:41:in `from_file'
    
    rabbitmq_cluster("[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]") do
      action [:set_cluster_name, :change_cluster_node_type]
      default_guard_interpreter :default
      declared_type :rabbitmq_cluster
      cookbook_name "rabbitmq"
      recipe_name "cluster"
      cluster_name "kamehamehaus1a"
    end
    
    System Info:
    ------------
    chef_version=13.9.4
    platform=redhat
    platform_version=7.4
    ruby=ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux]
    program_name=chef-client worker: ppid=1924;start=21:10:04;
    executable=/opt/chef/bin/chef-client

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

I think name attributes must be a string, not sure they can be an array

@michaelklishin
Copy link
Member

set_cluster_name should do the right thing and convert its value to something more reasonable.

@michaelklishin
Copy link
Member

RabbitMQ cluster will use first node's name if no name is configured explicitly. set_cluster_name should do the same.

@michaelklishin
Copy link
Member

I pushed a couple more updates.

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

Alrighty

@michaelklishin
Copy link
Member

Please pull.

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

OK i pulled in your latest changes; and the cluster recipe never fired.

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

I see that the config included the list of cluster nodes

@michaelklishin
Copy link
Member

Do you have node['rabbitmq']['clustering']['use_auto_clustering'] enabled? All recent changes are in the cluster name area.

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

these are the attributes from the node object:

      "clustering": {
        "enable": true,
        "cluster_partition_handling": "autoheal",
        "use_auto_clustering": true,
        "cluster_name": "kamehamehaus1a",
        "cluster_nodes": [
          {
            "name": "rabbit@us1a-rmq1-kamehameha.domain.tld",
            "type": "disc"
          },
          {
            "name": "rabbit@us1a-rmq2-kamehameha.domain.tld",
            "type": "disc"
          },
          {
            "name": "rabbit@us1a-rmq3-kamehameha.domain.tld",
            "type": "disc"
          }
        ],

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

unless node['rabbitmq']['clustering']['cluster_name']
  target_name = if cluster_nodes.any?
                  # this is what RabbitMQ would do anyway
                  cluster_nodes.first.name
                else
                  'rabbitmq-cluster'
                end

  # Set cluster name to the first node's name, if any
  rabbitmq_cluster "rabbitmq-cluster-#{target_name}" do
    cluster_name target_name
    action [:set_cluster_name, :change_cluster_node_type]
  end
end

I'm setting a cluster_name so that will not fire

irb(main):009:0> cluster_name = "unnamed_cluster"
=> "unnamed_cluster"
irb(main):010:0> unless cluster_name
irb(main):011:1>   puts "this message is inside the unless statement"
irb(main):012:1> end
=> nil
irb(main):013:0> cluster_name = nil
=> nil
irb(main):014:0> unless cluster_name
irb(main):015:1>   puts "this message is inside the unless statement"
irb(main):016:1> end
this message is inside the unless statement
=> nil

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

Unsetting that exposes:

================================================================================
Recipe Compile Error in /var/chef/cache/cookbooks/ap_rabbitmq/recipes/default.rb
================================================================================

NoMethodError
-------------
undefined method `name' for #<Chef::Node::ImmutableMash:0x00000000040b9768>

Cookbook Trace:
---------------
  /var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb:38:in `from_file'
  /var/chef/cache/cookbooks/ap_rabbitmq/recipes/service.rb:25:in `from_file'
  /var/chef/cache/cookbooks/ap_rabbitmq/recipes/default.rb:22:in `from_file'

Relevant File Content:
----------------------
/var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb:

 31:      action :join
 32:    end
 33:  end
 34:
 35:  unless node['rabbitmq']['clustering']['cluster_name']
 36:    target_name = if cluster_nodes.any?
 37:                    # this is what RabbitMQ would do anyway
 38>>                   cluster_nodes.first.name
 39:                  else
 40:                    'rabbitmq-cluster'
 41:                  end
 42:
 43:    # Set cluster name to the first node's name, if any
 44:    rabbitmq_cluster "rabbitmq-cluster-#{target_name}" do
 45:      cluster_name target_name
 46:      action [:set_cluster_name, :change_cluster_node_type]
 47:    end

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

Looking like the root might be a confusiong between use_auto_clustering and just enabling cluster

@mengesb
Copy link
Author

mengesb commented Aug 8, 2018

So I went back to your 5.x branch; deployed 5.6.2; turned use_auto_clustering to false , but forgot to set the cluster_name from debugging 6.x series...

Recipe: rabbitmq::cluster
  * rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]]action set_cluster_name[2018-08-08T23:21:30+00:00] INFO: Processing rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]] action set_cluster_name (rabbitmq::cluster line 41)
[2018-08-08T23:21:31+00:00] INFO: var_cluster_status: [{nodes,[{disc,['rabbit@us1a-rmq1-kamehameha']}]},{running_nodes,['rabbit@us1a-rmq1-kamehameha']},{cluster_name,<<"rabbit@us1a-rmq1-kamehameha.anaplan.io">>},{partitions,[]},{alarms,[{'rabbit@us1a-rmq1-kamehameha',[]}]}]
[2018-08-08T23:21:31+00:00] INFO: var_cluster_name:


    ================================================================================
    Error executing action `set_cluster_name` on resource 'rabbitmq_cluster[[{"name":"rabbit@us1a-rmq1-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq2-kamehameha.domain.tld","type":"disc"},{"name":"rabbit@us1a-rmq3-kamehameha.domain.tld","type":"disc"}]]'
    ================================================================================

    NoMethodError
    -------------
    undefined method `empty?' for nil:NilClass

    Cookbook Trace:
    ---------------
    /var/chef/cache/cookbooks/rabbitmq/providers/cluster.rb:244:in `block in class_from_file'

    Resource Declaration:
    ---------------------
    # In /var/chef/cache/cookbooks/rabbitmq/recipes/cluster.rb

     41: rabbitmq_cluster static_cluster_nodes do
     42:   cluster_name node['rabbitmq']['clustering']['cluster_name']
     43:   action [:set_cluster_name, :change_cluster_node_type]
     44: end

I had some debugging added there and noticed this:

# Action for set cluster name
action :set_cluster_name do
  Chef::Application.fatal!('rabbitmq_cluster with action :join requires a non-nil/empty cluster_nodes.') if new_resource.cluster_nodes.nil? || new_resource.cluster_nodes.empty?
  var_cluster_status = cluster_status
  var_cluster_name = new_resource.cluster_name
  Chef::Log.info("var_cluster_status: #{var_cluster_status}")
  Chef::Log.info("var_cluster_name: #{var_cluster_name}")
  if current_cluster_name(var_cluster_status).nil?
    Chef::Log.warn('[rabbitmq_cluster] Currently not a cluster. Set cluster name will be skipped.')
  else
    unless current_cluster_name(var_cluster_status) == var_cluster_name
244: >>>      unless var_cluster_name.empty? <<<
        run_rabbitmqctl("set_cluster_name #{var_cluster_name}")
        Chef::Log.info("[rabbitmq_cluster] Cluster name has been set : #{current_cluster_name(cluster_status)}")
      end
    end
  end
end

Seems that it caught empty here on var_cluster_name. Not sure why but after setting a cluster_name once more I'm able to see the clustering attempt on 5.6.2

@michaelklishin
Copy link
Member

I think this has been addressed in master but I'm not yet sure if it can be backported. As you correctly pointed out, name attributes cannot be anything other than a String, at least in Chef 13.

What that means for us is that if we change the type to be what's actually expected, it can be potentially breaking. Cluster name doesn't bear any significant meaning in RabbitMQ itself but plugins and operator tools can use them as identifiers :( I need to at least research what tier 1 (core) plugins do and we'll have to provide a detailed change log entry.

@michaelklishin
Copy link
Member

This was a great proof to me that the cluster provider is insanely complicated and dependent on existing cluster state in some ways. So in 6.0 I think it should only support peer discovery. This way RabbitMQ can (and already does) handle some of the complications and how to retry/reconcile cluster state.

@mengesb
Copy link
Author

mengesb commented Aug 10, 2018

@michaelklishin turns out this is largely a documentation issue and order of operations issue.

If you wouldn't mind I can make some updates to the README.md in a PR so that hopefully this red-herring is no longer experienced

@mengesb
Copy link
Author

mengesb commented Aug 10, 2018

FYI - i did get to a working state with 5.6.2 - would you mind publishing that to the supermarket please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants