
ASB deployment fails #585

Closed
matzew opened this issue Dec 9, 2017 · 13 comments
Labels
3.10 | release-1.2 Kubernetes 1.10 | Openshift 3.10 | Broker release-1.2

Comments

@matzew
Member

matzew commented Dec 9, 2017

Bug:

What happened:

The instructions for 3.7.0 are broken, since the latest tag now maps to the v3.9.0-alpha images.

Running the suggested snippet:

wget https://raw.githubusercontent.com/openshift/ansible-service-broker/master/scripts/run_latest_build.sh
chmod +x run_latest_build.sh
./run_latest_build.sh

This provisions the system, but the server it brings up is v3.9.0-alpha, which is broken:

➜  ~ oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.9.0-alpha.0+892ae5d-78
kubernetes v1.8.1+0d5291c

Logs from the asb-etcd-1-z6zrc pod:

2017-12-09 09:00:05.754924 I | etcdmain: etcd Version: 3.2.11
2017-12-09 09:00:05.755031 I | etcdmain: Git SHA: 1e1dbb2
2017-12-09 09:00:05.755040 I | etcdmain: Go Version: go1.8.5
2017-12-09 09:00:05.755043 I | etcdmain: Go OS/Arch: linux/amd64
2017-12-09 09:00:05.755046 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2017-12-09 09:00:05.755694 I | embed: listening for peers on http://localhost:2380
2017-12-09 09:00:05.755738 I | embed: listening for client requests on 0.0.0.0:2379
2017-12-09 09:00:05.758439 I | etcdserver: name = default
2017-12-09 09:00:05.758454 I | etcdserver: data dir = /data
2017-12-09 09:00:05.758460 I | etcdserver: member dir = /data/member
2017-12-09 09:00:05.758465 I | etcdserver: heartbeat = 100ms
2017-12-09 09:00:05.758471 I | etcdserver: election = 1000ms
2017-12-09 09:00:05.758474 I | etcdserver: snapshot count = 100000
2017-12-09 09:00:05.758482 I | etcdserver: advertise client URLs = https://0.0.0.0:2379
2017-12-09 09:00:05.758493 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2017-12-09 09:00:05.758500 I | etcdserver: initial cluster = default=http://localhost:2380
2017-12-09 09:00:05.764514 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2017-12-09 09:00:05.764542 I | raft: 8e9e05c52164694d became follower at term 0
2017-12-09 09:00:05.764558 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-12-09 09:00:05.764562 I | raft: 8e9e05c52164694d became follower at term 1
2017-12-09 09:00:05.773799 W | auth: simple token is not cryptographically signed
2017-12-09 09:00:05.776735 I | etcdserver: starting server... [version: 3.2.11, cluster version: to_be_decided]
2017-12-09 09:00:05.776767 I | embed: ClientTLS: cert = /etc/tls/private/tls.crt, key = /etc/tls/private/tls.key, ca = , trusted-ca = /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt, client-cert-auth = true
2017-12-09 09:00:05.777162 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2017-12-09 09:00:06.665003 I | raft: 8e9e05c52164694d is starting a new election at term 1
2017-12-09 09:00:06.665060 I | raft: 8e9e05c52164694d became candidate at term 2
2017-12-09 09:00:06.665071 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2017-12-09 09:00:06.665081 I | raft: 8e9e05c52164694d became leader at term 2
2017-12-09 09:00:06.665086 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2017-12-09 09:00:06.665238 I | etcdserver: setting up the initial cluster version to 3.2
2017-12-09 09:00:06.666143 N | etcdserver/membership: set the initial cluster version to 3.2
2017-12-09 09:00:06.666176 I | etcdserver/api: enabled capabilities for version 3.2
2017-12-09 09:00:06.666197 I | etcdserver: published {Name:default ClientURLs:[https://0.0.0.0:2379]} to cluster cdf818194e3a8c32
2017-12-09 09:00:06.666207 I | embed: ready to serve client requests
2017-12-09 09:00:06.670495 I | etcdserver/api/v3rpc: dialing to target with scheme: ""
2017-12-09 09:00:06.670513 I | etcdserver/api/v3rpc: could not get resolver for scheme: ""
2017-12-09 09:00:06.670661 I | embed: serving client requests on [::]:2379
2017-12-09 09:00:06.680088 I | etcdserver/api/v3rpc: Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

Logs from the asb-1-deploy pod:

error: update acceptor rejected asb-1: pods for rc 'ansible-service-broker/asb-1' took longer than 600 seconds to become available

Using ORIGIN_VERSION=v3.7.0

Running the script with the v3.7.0 tag:

ORIGIN_VERSION=v3.7.0 ./run_latest_build.sh

I get these errors on provision:

Starting OpenShift using docker.io/openshift/origin:v3.7.0 ...
Pulling image docker.io/openshift/origin:v3.7.0
Pulled 2/4 layers, 53% complete
Pulled 3/4 layers, 96% complete
Pulled 4/4 layers, 100% complete
Extracting
Image pull complete
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for docker.io/openshift/origin:v3.7.0 image ... 
   Pulling image docker.io/openshift/origin:v3.7.0
   Pulled 2/4 layers, 53% complete
   Pulled 3/4 layers, 96% complete
   Pulled 4/4 layers, 100% complete
   Extracting
   Image pull complete
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... 
   WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ... 
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 127.0.0.1 as the server IP
-- Checking service catalog version requirements ... OK
-- Starting OpenShift container ... 
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
   OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... 
   scc "privileged" added to: ["system:serviceaccount:default:registry"]
-- Installing router ... OK
-- Importing image streams ... OK
-- Importing templates ... OK
-- Importing service catalog templates ... OK
-- Installing service catalog ... FAIL
   Error: failed to start the service catalog apiserver: timed out waiting for the condition
Logged into "https://127.0.0.1:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-public
    kube-service-catalog
    kube-system
    openshift
    openshift-infra
    openshift-node

Using project "default".
Now using project "ansible-service-broker" on server "https://127.0.0.1:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
service "asb" created
service "asb-etcd" created
serviceaccount "asb" created
clusterrolebinding "asb" created
clusterrole "asb-auth" created
clusterrolebinding "asb-auth-bind" created
clusterrole "access-asb-role" created
persistentvolumeclaim "etcd" created
deploymentconfig "asb" created
deploymentconfig "asb-etcd" created
secret "asb-auth-secret" created
secret "registry-auth-secret" created
secret "etcd-auth-secret" created
secret "broker-etcd-auth-secret" created
configmap "broker-config" created
serviceaccount "ansibleservicebroker-client" created
clusterrolebinding "ansibleservicebroker-client" created
secret "ansibleservicebroker-client" created
route "asb-1338" created
error: unable to recognize servicecatalog.k8s.io/v1beta1, Kind=Broker: no matches for servicecatalog.k8s.io/, Kind=Broker
Error processing template and creating deployment

What you expected to happen:

a working ASB 😄

How to reproduce it:

Run the scripts as above.

@jmontleon
Contributor

The script deploys 3.7 or 3.9 fine for me. I see similar etcd messages with either version.

2017-12-11 16:29:05.486190 I | etcdserver/api: enabled capabilities for version 3.2
2017-12-11 16:29:05.486214 I | embed: ready to serve client requests
2017-12-11 16:29:05.486381 I | etcdserver: published {Name:default ClientURLs:[https://0.0.0.0:2379]} to cluster cdf818194e3a8c32
2017-12-11 16:29:05.766329 I | etcdserver/api/v3rpc: dialing to target with scheme: ""
2017-12-11 16:29:05.766385 I | etcdserver/api/v3rpc: could not get resolver for scheme: ""
2017-12-11 16:29:05.767332 I | embed: serving client requests on [::]:2379
2017-12-11 16:29:05.778078 I | etcdserver/api/v3rpc: Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
$ oc cluster down
$ docker cp $(docker create docker.io/openshift/origin:v3.7.0):/bin/oc ~/bin
$ ORIGIN_VERSION=v3.7.0 ./run_latest_build.sh 
Starting OpenShift using docker.io/openshift/origin:v3.7.0 ...
OpenShift server started.

The server is accessible via web console at:
    https://172.18.0.1.nip.io:8443

You are logged in as:
    User:     developer
    Password: <any value>

To login as administrator:
    oc login -u system:admin

Logged into "https://127.0.0.1:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    default
    kube-public
    kube-service-catalog
    kube-system
  * myproject
    openshift
    openshift-infra
    openshift-node
    openshift-template-service-broker

Using project "myproject".
Now using project "ansible-service-broker" on server "https://127.0.0.1:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
Generating a 4096 bit RSA private key
......................................................++
...........++
writing new private key to '/tmp/etcd-cert/key.pem'
-----
Generating RSA private key, 2048 bit long modulus
.............................................................................................................................+++
................................................................................................................+++
e is 65537 (0x010001)
Signature ok
subject=CN = client
Getting CA Private Key
service "asb" created
service "asb-etcd" created
serviceaccount "asb" created
clusterrolebinding "asb" created
clusterrole "asb-auth" created
clusterrolebinding "asb-auth-bind" created
clusterrole "access-asb-role" created
persistentvolumeclaim "etcd" created
deploymentconfig "asb" created
deploymentconfig "asb-etcd" created
secret "asb-auth-secret" created
secret "registry-auth-secret" created
secret "etcd-auth-secret" created
secret "broker-etcd-auth-secret" created
configmap "broker-config" created
serviceaccount "ansibleservicebroker-client" created
clusterrolebinding "ansibleservicebroker-client" created
secret "ansibleservicebroker-client" created
route "asb-1338" created
clusterservicebroker "ansible-service-broker" created

@jmontleon
Contributor

I see the same etcd error using catasb, but it's not breaking anything. I think we may have a misconfiguration with the CA certs (still trying to figure out what), but it doesn't seem to be affecting functionality.

@matzew matzew closed this as completed Dec 11, 2017
@matzew
Member Author

matzew commented Dec 11, 2017

I don't know why or how, but after I deleted the contents of my /var/lib/origin/ folder, it worked again ...
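
For anyone else who hits this, the sequence was roughly the following (destructive, it wipes all local cluster state, so only do it against a throwaway oc cluster up instance):

$ oc cluster down
$ sudo rm -rf /var/lib/origin/*
$ ORIGIN_VERSION=v3.7.0 ./run_latest_build.sh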

@slaterx

slaterx commented Jan 22, 2018

Reproducible in a 3.7 environment when the broker is created using the openshift-ansible playbook:

[root@example-master ~]# oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift.example.com
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62

Looking at the TLS secrets, I can confirm that both the asb and asb-etcd certificates are issued by the same CA.
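
For reference, the check was roughly along these lines (secret names taken from the broker template output earlier in this issue; the namespace and the data key names are assumptions, so adjust them to what oc describe secret shows in your deployment):

$ oc get secret etcd-auth-secret -n ansible-service-broker -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -issuer -subject
$ oc get secret broker-etcd-auth-secret -n ansible-service-broker -o jsonpath='{.data.client\.crt}' | base64 -d | openssl x509 -noout -issuer -subject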

@eriknelson eriknelson reopened this Jan 23, 2018
@eriknelson
Contributor

@jmontleon sounds like this may still be an issue we need to look into.

@eriknelson eriknelson self-assigned this Jan 23, 2018
@rthallisey rthallisey added bug 3.9 | release-1.1 Kubernetes 1.9 | Openshift 3.9 | Broker release-1.1 3.10 | release-1.2 Kubernetes 1.10 | Openshift 3.10 | Broker release-1.2 and removed 3.9 | release-1.1 Kubernetes 1.9 | Openshift 3.9 | Broker release-1.1 labels Jan 23, 2018
@shawn-hurley
Contributor

I think this is related to etcd-io/etcd#8603

@djzager djzager closed this as completed Mar 22, 2018
@djzager djzager reopened this Mar 22, 2018
@leifmadsen
Member

I'm running into the same thing now with a new OpenShift Origin 3.9 deploy. The asb-etcd pod is working fine (the PVC is bound in my glusterfs cluster), but for some reason asb-1 won't start up correctly, which results in the controller-manager failing.

@leifmadsen
Member

OK, in my case this was a configuration issue. I had openshift_service_catalog_image_version set to latest instead of v3.9, which caused the v3.10 alpha image to be pulled. No issue on my side once I fixed it.
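
For anyone else hitting this, the fix was a single inventory variable (assuming it lives in the usual [OSEv3:vars] section of the openshift-ansible inventory):

# was: openshift_service_catalog_image_version=latest
openshift_service_catalog_image_version=v3.9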

@jmrodri
Contributor

jmrodri commented May 17, 2018

Closing; this is fixed in the current release. Please re-open if it happens again in a future release.

@jmrodri jmrodri closed this as completed May 17, 2018
@ron1

ron1 commented May 31, 2018

I encountered this same problem with OpenShift Origin 3.9. As matzew reported above, the problem disappeared after I deleted the contents of my /var/lib/origin/ folder and rebooted.

@wenchma

wenchma commented Jul 10, 2018

I encountered the same issue (deploy fails) as above with Origin 3.9.0:

# oc logs asb-1-deploy
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available
# oc logs asb-etcd-1-deploy
--> Scaling asb-etcd-1 to 1
error: update acceptor rejected asb-etcd-1: pods for rc 'openshift-ansible-service-broker/asb-etcd-1' took longer than 600 seconds to become available

@djzager
Member

djzager commented Jul 10, 2018

@wenchma We no longer support run_latest_build.sh. Here is me bringing up the broker on 3.9:

➜  ~ oc version
oc v3.9.0+191fece
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.17.0.1:8443
openshift v3.9.0+71543b2-33
kubernetes v1.9.1+a0ce1bc657

➜  ~ curl https://raw.githubusercontent.com/openshift/ansible-service-broker/master/scripts/run_latest_build.sh
#!/bin/bash

echo "========================================================================"
echo "                RUN LATEST BUILD IS NO LONGER SUPPORTED"
echo "========================================================================"
echo ""
echo "To install the broker, please use our apb/install.yaml. For example:"
echo "   curl https://raw.githubusercontent.com/openshift/ansible-service-broker/master/apb/install.yaml | kubectl create -f -"

➜  ~ curl https://raw.githubusercontent.com/openshift/ansible-service-broker/master/apb/install.yaml | kubectl create -f -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   866  100   866    0     0   3918      0 --:--:-- --:--:-- --:--:--  3918
namespace "automation-broker-apb" created
serviceaccount "automation-broker-apb" created
clusterrolebinding "automation-broker-apb" created
pod "automation-broker-apb" created

➜  ~ kubectl logs -n automation-broker-apb automation-broker-apb -f
PLAY [automation-broker-apb provision] *****************************************
TASK [automation-broker-apb : Set facts] ***************************************
ok: [localhost]
TASK [automation-broker-apb : Debug important facts] ***************************
ok: [localhost] => {
    "msg": [
        "Cluster: openshift",
        "broker_auto_escalate False",
        "broker_local_openshift_enabled True"
    ]
}
TASK [automation-broker-apb : Set broker namespace state=present] **************
changed: [localhost]
TASK [automation-broker-apb : Verify preconditions] ****************************
ok: [localhost] => {
    "changed": false,
    "msg": "All assertions passed"
}
TASK [automation-broker-apb : include_tasks] ***********************************
included: /opt/ansible/roles/automation-broker-apb/tasks/dao_crd.yaml for localhost
TASK [automation-broker-apb : Set broker clusterresourcedefinitions state=present] ***
changed: [localhost] => (item=bundle.crd.yaml)
changed: [localhost] => (item=bundlebindings.crd.yaml)
changed: [localhost] => (item=bundleinstances.crd.yaml)
TASK [automation-broker-apb : include_tasks] ***********************************
skipping: [localhost]
TASK [automation-broker-apb : Set broker objects state=present] ****************
changed: [localhost] => (item={u'name': u'broker.service.yaml'})
changed: [localhost] => (item={u'apply': True, u'name': u'broker.route.yaml'})
changed: [localhost] => (item={u'name': u'broker.serviceaccount.yaml'})
changed: [localhost] => (item={u'name': u'broker.clusterrolebinding.yaml'})
changed: [localhost] => (item={u'name': u'broker.configmap.yaml'})
changed: [localhost] => (item={u'name': u'broker-auth.clusterrole.yaml'})
changed: [localhost] => (item={u'name': u'broker-auth.clusterrolebinding.yaml'})
changed: [localhost] => (item={u'name': u'broker-client.serviceaccount.yaml'})
changed: [localhost] => (item={u'name': u'broker-client.secret.yaml'})
changed: [localhost] => (item={u'name': u'broker-client.clusterrolebinding.yaml'})
changed: [localhost] => (item={u'name': u'broker-access.clusterrole.yaml'})
skipping: [localhost] => (item={u'apply': False, u'name': u'broker-auth.secret.yaml'})
changed: [localhost] => (item={u'name': u'broker.deployment.yaml'})
changed: [localhost] => (item={u'name': u'broker.servicecatalog.yaml'})
TASK [automation-broker-apb : Wait for clusterservicebroker to become ready] ***
skipping: [localhost]
PLAY RECAP *********************************************************************
localhost                  : ok=7    changed=3    unreachable=0    failed=0

➜  ~ oc get all -n automation-broker
NAME                                  REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/automation-broker   1          1         1         config

NAME         HOST/PORT                                 PATH      SERVICES            PORT        TERMINATION   WILDCARD
routes/asb   asb-automation-broker.172.17.0.1.nip.io             automation-broker   port-1338   reencrypt     None

NAME                           READY     STATUS    RESTARTS   AGE
po/automation-broker-1-wjqmz   1/1       Running   0          11m

NAME                     DESIRED   CURRENT   READY     AGE
rc/automation-broker-1   1         1         1         11m

NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
svc/automation-broker   ClusterIP   172.30.138.190   <none>        1338/TCP   12m

@tech-mint

https://raw.githubusercontent.com/openshift/ansible-service-broker/master/apb/install.yaml

The link is no longer valid.

Is there a better guide for getting the ASB installed on OKD 3.9?
