After successful initial run, subsequent runs blow up at server_extend #55

Open
donovanmuller opened this issue Feb 9, 2016 · 16 comments

@donovanmuller

The initial run of gluster::server is successful: the volume is created and started.
When gluster::server runs again, the following error is thrown:

NoMethodError
-------------
private method `select' called for nil:NilClass

...

Relevant File Content:
----------------------
/var/chef/cache/cookbooks/gluster/recipes/server_extend.rb:

   17:        next
   18:      end
   19:
   20:      unless node.default['gluster']['server']['volumes'][volume_name].attribute?('bricks_waiting_to_join')
   21:        node.default['gluster']['server']['volumes'][volume_name]['bricks_waiting_to_join'] = ''
   22:      end
   23:
   24>>     peer_bricks = chef_node['gluster']['server']['volumes'][volume_name]['bricks'].select { |brick| brick.include? volume_name }
   25:      brick_count += (peer_bricks.count || 0)
   26:      peer_bricks.each do |brick|
   27:        Chef::Log.info("Checking #{peer}:#{brick}")
   28:        unless brick_in_volume?(peer, brick, volume_name)
   29:          node.default['gluster']['server']['volumes'][volume_name]['bricks_waiting_to_join'] << " #{peer}:#{brick}"
   30:        end
   31:      end
   32:    end
   33:
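
The failure is on the line marked 24>>: the peer's bricks attribute came back nil, and nil.select raises this NoMethodError (Kernel#select exists as a private method, hence the wording). That happens when the peer's attributes were never saved to the Chef server. A minimal defensive sketch (hypothetical, not the cookbook's actual fix) that would skip such peers instead of crashing:

# Hypothetical guard: skip peers whose gluster attributes were never
# saved to the Chef server instead of calling .select on nil.
peer_volume = chef_node['gluster'] &&
              chef_node['gluster']['server'] &&
              chef_node['gluster']['server']['volumes'] &&
              chef_node['gluster']['server']['volumes'][volume_name]
if peer_volume.nil? || peer_volume['bricks'].nil?
  Chef::Log.warn("No brick data for #{volume_name} on #{peer}; skipping")
  next
end
peer_bricks = peer_volume['bricks'].select { |brick| brick.include? volume_name }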
@donovanmuller (Author)

[screenshot: node attributes]

@shortdudey123 (Owner)

It is set here: https://github.com/shortdudey123/chef-gluster/blob/master/recipes/server_setup.rb#L52

Can you verify node['gluster']['server']['volumes']['ose3-vol']['peers'] contains the FQDN or hostname of the node?
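
For example (assuming knife is configured against your Chef server; the node and volume names here are taken from this thread), you can inspect the saved attribute with:

knife node show master01.bison.pi.b -a gluster.server.volumes.ose3-vol.peers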

@donovanmuller (Author)

It does, I left it unexpanded for the screenshot but it was definitely populated.

@shortdudey123 (Owner)

Can you post the context of the failed run? (not just the exception)

@andyrepton (Collaborator)

Hi @donovanmuller!

Sorry to hear you are having issues. It appears that your node is trying to load attributes from another Chef node that doesn't have that attribute set. Could you please confirm that the same cookbook was run on all nodes in your peer list, and that each Chef node name is the same as the peer name that Gluster is using? (When the Chef node name is an FQDN and the peer name is a hostname, or vice versa, it can cause a problem like this.)

What would really help is the output of your node['gluster']['server']['volumes'] entry in your cookbook attributes file, and the attribute node['gluster']['server']['volumes']['ose3-vol'] from each of your peers.

Thanks in advance!

Andy

@donovanmuller (Author)

@Seth-Karlo Below are my complete gluster attributes:

default['gluster']['version'] = '3.7'
default['gluster']['server']['brick_mount_path'] = '/data'
default['gluster']['server']['disks'] = []
default['gluster']['server']['volumes'] = {
  'ose3' => {
  'peers' => ['master01.bison.pi.b', 'node01.bison.pi.b'],
    'replica_count' => 2,
    'volume_type' => 'replicated',
    'disks' => ['/dev/sda4'],
    'size' => '10G'
  }
}

[screenshot: master01 node attributes]

[screenshot: node02 node attributes]

Is there anything else you need?

@andyrepton (Collaborator)

Thank you for your report; I apologise for taking so long to respond. I'll see if I can reproduce at this end and get back to you.

@alez007 commented Apr 12, 2016

Any news about this? I'm experiencing the same problem on OpsWorks.

@shortdudey123 (Owner)

@alez007 can you verify the cookbook version you are using so that we make sure we are looking at the same thing?

@andyrepton (Collaborator)

I'm pretty confident this is caused by chef_node not being set. I've been a bit distracted lately, but I'll try and look into this.

@laurencepettitt

I am using OpsWorks and experiencing this problem. I am wondering whether it could be OpsWorks' fault, given the way it updates the cookbooks on each node: could it be that every time the "custom cookbooks" are updated, the node's attributes are wiped?

@shortdudey123 (Owner)

@LorenzoPetite Possibly? I don't use OpsWorks and am not too familiar with it.
@Seth-Karlo you use OpsWorks at all and might be able to shed light here?

@andyrepton (Collaborator)

@shortdudey123 @LorenzoPetite Sorry, no, I've never used OpsWorks before. We could possibly test this by adding some echo statements to the cookbook to print out those attributes at compile time. If they report as empty, we can then start looking into whether or not they are set properly.
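
For instance, a quick debug line of that sort (a sketch; Chef::Log statements outside of resources run during the compile phase) might look like:

# Prints the attribute hash while the recipe is being compiled.
Chef::Log.warn("gluster volumes: #{node['gluster']['server']['volumes'].inspect}")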

@laurencepettitt commented Jul 18, 2016

Following @Seth-Karlo's suggestion, I tested with some echo statements. In the server_setup recipe, I found that:
node['gluster']['server']['volumes'][volume_name]['bricks']
produces: ["/gluster/servu/brick"]

However, in the server_extend recipe, chef_node['gluster']['server']['volumes'][volume_name]['bricks'] causes the error undefined method `[]' for nil:NilClass, because chef_node['gluster'] is somehow nil. Strangely, echoing chef_node produces node[gluster1].

I realise now this is actually a slightly different error from @donovanmuller's, but in both cases there seems to be a problem with attribute persistence.

How could this be possible?

@theundefined (Contributor)

chef_node comes from iterating over all nodes in the cluster, so the loop blows up whenever any node's bricks attribute is empty. I have the same problem in one of my test environments. I'm not sure, but I think it can be connected to a Chef error during cluster setup, when the bricks attribute isn't propagated to the Chef server.
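
For context, server_extend gets chef_node by querying the Chef server for peer nodes, roughly like this sketch (the query string and structure are illustrative, not the cookbook's exact code):

# Illustrative only: search returns each peer's last *saved* node object,
# so a peer that never completed a successful run has no gluster
# attributes to read, and chef_node['gluster'] comes back nil.
search(:node, "name:#{peer}").each do |chef_node|
  gluster = chef_node['gluster']  # nil if the peer never saved its attributes
  next if gluster.nil?
  bricks = gluster['server']['volumes'][volume_name]['bricks']
  # ... compare bricks against the running volume ...
end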

@wndhydrnt

I stumbled upon this today too.
On the initial run of the chef-client, the cookbook failed due to an error in the configuration on my side, although the chef-client was able to create the volume on that first run. Executing knife node show <NODE NAME> -a gluster confirmed that ['gluster']['server']['volumes']['myvolume']['bricks'] was empty.
Subsequent runs of chef-client failed with the error stated in the first comment of this issue.
As far as I know, a chef-client persists its attributes to the Chef server only after a successful run. Since no run of the chef-client ever completed successfully, the bricks attribute could never be saved.
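
That matches Chef's behaviour: node data is uploaded at the end of a converge unless something saves it explicitly mid-run, e.g. (a sketch, not a recommendation for this cookbook):

# Persist the node's current attributes to the Chef server immediately
# (skipped under chef-solo, which has no server to save to).
node.save unless Chef::Config[:solo]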

My workaround was to set ['gluster']['server']['server_extend_enabled'] to false, trigger a run of the chef-client (which succeeded) and set ['gluster']['server']['server_extend_enabled'] back to true.
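
In attribute form (e.g. in a role or wrapper cookbook; the attribute name is taken from the comment above), the temporary override would be:

# Temporarily disable the extend recipe until one run has saved the
# bricks attribute, then flip this back to true.
default['gluster']['server']['server_extend_enabled'] = false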
