
Nuage Installation breaks during serviceaccount creation #3583

Closed
vishpat opened this issue Mar 7, 2017 · 22 comments

@vishpat
Contributor

vishpat commented Mar 7, 2017


Description

Installing OpenShift with Nuage is currently broken because of the service account changes.

Version

```
ansible 2.2.0.0
  config file = /home/vpati011/projects/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides

[vpati011@radcnode007 openshift-ansible]$ git describe
openshift-ansible-3.4.17-1-1169-gf017f5a
```

Steps To Reproduce

Perform an HA install of OpenShift with Nuage as the SDN provider.

Observed Results

```
u'cmd': u'/usr/bin/oc create -f /tmp/nuage-CulDIi -n default', u'returncode': 1, u'results': {}, u'stderr': u'error: error when creating "/tmp/nuage-CulDIi": Post https://dns.nuageopenshift.com:8443/api/v1/namespaces/default/serviceaccounts: EOF\n', u'stdout': u''}

2017-03-07 11:27:48,250 p=2634 u=vpati011 | fatal: [master1.nuageopenshift.com]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "debug": false,
            "image_pull_secrets": null,
            "kubeconfig": "/etc/origin/master/admin.kubeconfig",
            "name": "nuage",
            "namespace": "default",
            "secrets": null,
            "state": "present"
        },
        "module_name": "oc_serviceaccount"
    }
}
```

Example command and output or error messages

[ansible.log.txt](https://github.com/openshift/openshift-ansible/files/825108/ansible.log.txt)
[nodes.ha.vagrant.txt](https://github.com/openshift/openshift-ansible/files/825109/nodes.ha.vagrant.txt)

@vishpat
Contributor Author

vishpat commented Mar 7, 2017

vishpat changed the title from "Nuage Installation is broken" to "Nuage Installation breaks during serviceaccount creation" on Mar 7, 2017
@vishpat
Contributor Author

vishpat commented Mar 7, 2017

@ashcrow Any idea how I can debug this further?

@ashcrow
Member

ashcrow commented Mar 7, 2017

@vishpat I'm taking a look at the output now.

@ashcrow
Member

ashcrow commented Mar 7, 2017

@vishpat the following stands out to me:

'error: error when creating "/tmp/nuage-CulDIi

This seems to indicate that Ansible was unable to write the temporary file.

Here is the code reference. Search for `create_tmpfile`.

Can you verify that your user has access to the kubeconfig referenced in the output and that it can create files in /tmp/?
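For reference, both checks can be scripted as Ansible tasks run against the master. This is a minimal sketch, not part of openshift-ansible, and it assumes Ansible 2.2 or later, where the `stat` module reports `readable` and `writeable` for the user running the task. The kubeconfig path is the one from the failed `module_args`.

```yaml
# Illustrative pre-flight checks; not part of openshift-ansible.
# Path taken from the failed oc_serviceaccount module_args.
- name: Stat the admin kubeconfig used by oc_serviceaccount
  stat:
    path: /etc/origin/master/admin.kubeconfig
  register: kubeconfig_stat

- name: Stat /tmp, where the module writes its temporary manifest
  stat:
    path: /tmp
  register: tmp_stat

# assert stops at the first false condition, so a missing kubeconfig
# is reported before its "readable" attribute is checked.
- name: Fail early if the kubeconfig is unreadable or /tmp is not writable
  assert:
    that:
      - kubeconfig_stat.stat.exists
      - kubeconfig_stat.stat.readable
      - tmp_stat.stat.writeable
```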

@ashcrow
Member

ashcrow commented Mar 7, 2017

/cc @kwoodson

@vishpat
Contributor Author

vishpat commented Mar 8, 2017

@ashcrow I tried an install with an older release (release-1.4) on the same setup and it went through fine. I don't think there are any permission issues on the /tmp directory.

@ashcrow
Member

ashcrow commented Mar 8, 2017

@vishpat After more review, `u'error: error when creating "/tmp/nuage-CulDI` is coming back from OpenShift itself. Generally the server provides more detail, for instance noting that the user cannot create a new secret. Can you verify that the temporary file it created is valid (just in case)?

The following will produce more debug output for your instance:

```diff
diff --git a/roles/nuage_master/tasks/serviceaccount.yml b/roles/nuage_master/tasks/serviceaccount.yml
index 16ea082..54e6dfe 100644
--- a/roles/nuage_master/tasks/serviceaccount.yml
+++ b/roles/nuage_master/tasks/serviceaccount.yml
@@ -18,6 +18,7 @@
     name: nuage
     namespace: default
     state: present
+    debug: True
 
 - name: Configure role/user permissions
   command: >
```

@kwoodson any ideas?

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

@ashcrow I have attached the log file with debug set to True:

ansible.log.txt

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

I think the issue is this failure:

"Post https://dns.nuageopenshift.com:8443/api/v1/namespaces/default/serviceaccounts: EOF"

I reverted to creating the service account the old way, and I hit the same error.

@ashcrow
Member

ashcrow commented Mar 9, 2017

@vishpat I agree. Just to verify, when you say 'the old way' you mean using the nuage ansible code from before the change to oc_serviceaccount, correct?

Can you provide the log from atomic-openshift-master-api.service? @kwoodson suggested the error may be more descriptive there, since it does look like an API-side problem.
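If shell access to each master is inconvenient, the same log can be gathered through Ansible. A minimal sketch, assuming the unit is named `atomic-openshift-master-api.service` as above (Origin installs may use a different unit name) and that reading the journal requires root:

```yaml
# Illustrative only: fetch the last 200 API server log lines per master.
- name: Collect recent atomic-openshift-master-api logs
  command: journalctl -u atomic-openshift-master-api.service --no-pager -n 200
  become: true
  register: master_api_log
  changed_when: false   # read-only command, never reports a change

- name: Show the collected log lines
  debug:
    var: master_api_log.stdout_lines
```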

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

@ashcrow @abutcher I think I know what the issue is.

The nuage components are being brought up before the openshift api server is ready. I think with change d113f03 the order in which things are brought up has changed. I am trying out a fix where we bring the nuage components up after the API server is ready.
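One way to express "wait until the API server is ready" is a retrying `uri` check placed before any Nuage task talks to the API. This is a sketch under assumptions: `master_api_url` is a hypothetical variable holding the API URL (in this report it would be `https://dns.nuageopenshift.com:8443`), and the master is assumed to serve the `/healthz/ready` endpoint. Certificate validation is disabled only to keep the sketch short; a real check would verify against the cluster CA.

```yaml
# Sketch only: master_api_url is a hypothetical variable.
- name: Wait for the OpenShift API server to report ready
  uri:
    url: "{{ master_api_url }}/healthz/ready"
    validate_certs: no
  register: api_ready
  # Keep retrying while the endpoint is unreachable (e.g. EOF or
  # connection refused) or returns anything other than HTTP 200.
  until: api_ready.status | default(0) == 200
  retries: 30
  delay: 10
```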

@ashcrow
Member

ashcrow commented Mar 9, 2017

@vishpat interesting. Keep us posted!

If you are on Freenode, feel free to reach out to us there.

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

@ashcrow Unfortunately IRC is blocked at my workplace.

BTW, the change in ordering is indeed the cause of the issue. I tried moving the Nuage component install into playbooks/common/openshift-cluster/additional_config.yml. However, in an HA configuration the Nuage components then get installed on only one of the master nodes instead of on all of them.

My changes in additional_config.yml were as follows:

```yaml
- role: nuage_master
  when: openshift.common.use_nuage | bool
```

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

I think I need to use "oo_masters_to_config" as the hosts in my case
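A sketch of what that could look like as its own play, assuming the `oo_masters_to_config` group has already been populated earlier in the cluster playbooks. Because the play targets the whole group rather than a single host, the role would run on every master in an HA setup.

```yaml
# Sketch only: assumes oo_masters_to_config is populated before this play.
- name: Additional master configuration for Nuage
  hosts: oo_masters_to_config
  roles:
  - role: nuage_master
    when: openshift.common.use_nuage | bool
```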

@ashcrow
Member

ashcrow commented Mar 9, 2017

@vishpat I spoke with @abutcher and it sounds like we can pull the nuage-related items from d113f03 back to the playbook level. I'll put a PR together for that. Would you mind testing once it's ready?

ashcrow added a commit to ashcrow/openshift-ansible that referenced this issue Mar 9, 2017
d113f03 moved role dependencies out of playbooks. However, this ended up
causing the masters to not be configured before the nuage steps required
configured masters. This change moves the nuage specific change in
d113f03 back to the config.

Resolves openshift#3583
@ashcrow
Member

ashcrow commented Mar 9, 2017

@vishpat see #3615

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

@ashcrow I will test this out and let you know. Many thanks for this.

@ashcrow
Member

ashcrow commented Mar 9, 2017

@vishpat my pleasure! 😄

@vishpat
Contributor Author

vishpat commented Mar 9, 2017

@ashcrow the install failed, but not because of your changes. I know what the issue is: I need to run the service account commands on only one master node. I will make those changes and re-run with your PR.

ansible.log.txt
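Running a task on only one master while the play targets all of them is what Ansible's `run_once` keyword is for. A minimal sketch, reusing the `oc_serviceaccount` arguments from the failed invocation in the report; whether the Nuage role adopted exactly this is not stated here.

```yaml
# Sketch only: create the nuage service account a single time even
# though the surrounding play runs on every master.
- name: Create the nuage service account
  oc_serviceaccount:
    kubeconfig: /etc/origin/master/admin.kubeconfig
    name: nuage
    namespace: default
    state: present
  run_once: true
```

`run_once` executes the task on the first host of the current batch; if the play uses `serial`, it runs once per batch instead, in which case delegating the task to one designated master is the safer choice.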

@vishpat
Contributor Author

vishpat commented Mar 10, 2017

@ashcrow The fix works! I need to make some changes to the nuage roles and will create a PR for them once you merge these changes. Really appreciate your help with this.

@ashcrow
Member

ashcrow commented Mar 10, 2017

Great! I'll update the PR for merging.

Kudos to @kwoodson and @abutcher for debugging and brainstorming!

ashcrow added a commit to ashcrow/openshift-ansible that referenced this issue Mar 10, 2017
@ashcrow
Member

ashcrow commented Mar 10, 2017

@vishpat PR merged.
