
a new DeploymentConfig with replicas=1 creates a ReplicationController with replicas=0 #9216

Closed
jstrachan opened this issue Jun 8, 2016 · 28 comments

Comments

@jstrachan
Contributor

How do I create a DeploymentConfig with replicas=1 so that it actually creates a ReplicationController with replicas > 0?

This one had me confused for a while; I figured OpenShift was broken ;)

Version
# oc version
oc v1.3.0-alpha.1
kubernetes v1.3.0-alpha.1-331-g0522e63
Steps To Reproduce

Here's the YAML I'm using to create a DC

kind: DeploymentConfig
apiVersion: v1
metadata:
  name: funkything
  namespace: default-staging
  selfLink: /oapi/v1/namespaces/default-staging/deploymentconfigs/funkything
  uid: d4f6f9f4-2d48-11e6-9cc4-080027b5c2f4
  resourceVersion: '1616'
  generation: 2
  creationTimestamp: '2016-06-08T07:15:51Z'
  labels:
    group: io.fabric8.funktion.quickstart
    project: funkything
    provider: fabric8
    version: 1.0.3
  annotations:
    fabric8.io/build-url: 'http://jenkins.vagrant.f8/job/funky1/3'
    fabric8.io/git-branch: funky1-1.0.3
    fabric8.io/git-commit: 317304e59ce4fcac045c0b47ed5613196e36748d
    fabric8.io/git-url: >-
      http://gogs.vagrant.f8/gogsadmin/funky1/commit/317304e59ce4fcac045c0b47ed5613196e36748d
    fabric8.io/iconUrl: img/icons/funktion.png
spec:
  strategy:
    type: Rolling
    rollingParams:
      updatePeriodSeconds: 1
      intervalSeconds: 1
      timeoutSeconds: 600
      maxUnavailable: 25%
      maxSurge: 25%
    resources: {}
  triggers:
    - type: ConfigChange
  replicas: 1
  test: false
  selector:
    group: io.fabric8.funktion.quickstart
    project: funkything
    provider: fabric8
  template:
    metadata:
      creationTimestamp: null
      labels:
        group: io.fabric8.funktion.quickstart
        project: funkything
        provider: fabric8
        version: 1.0.3
    spec:
      containers:
        - name: quickstart-funkything
          image: 'quickstart/funkything:1.0.3'
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          resources: {}
          livenessProbe:
            httpGet:
              path: /health
              port: 8081
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 8081
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: false
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
status:
  latestVersion: 1
  details:
    causes:
      - type: ConfigChange
  observedGeneration: 2
Current Result

Here's the DC

$ oc get dc
NAME         REVISION   REPLICAS   TRIGGERED BY
funkything   1          1          config
$ oc get rc
NAME           DESIRED   CURRENT   AGE
funkything-1   0         0         11m
Expected Result
$ oc get dc
NAME         REVISION   REPLICAS   TRIGGERED BY
funkything   1          1          config
$ oc get rc
NAME           DESIRED   CURRENT   AGE
funkything-1   1         1         11m
Additional Information

I don't see any warnings/errors/events in OpenShift itself, the DC, the RC, or the deploy pod to indicate why it's not deciding to scale up the RC.

@jstrachan
Contributor Author

BTW, if I manually scale the RC it works fine and scales up a new pod. I'm just not sure of the magic to make the DC create an RC with replicas=1 to start with?
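
For reference, a minimal sketch of that manual workaround, assuming the RC name from the output above:

$ oc scale rc funkything-1 --replicas=1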

@jstrachan
Contributor Author

I tried test: false and test: true based on the docs of that field, just in case, but it made no difference.

@jstrachan
Contributor Author

jstrachan commented Jun 8, 2016

If it's any help, here's the actual YAML used to create the DC, without the status fields etc.

---
apiVersion: "v1"
items:
- apiVersion: "v1"
  kind: "Service"
  metadata:
    annotations:
      fabric8.io/iconUrl: "img/icons/funktion.png"
    labels:
      project: "funky1"
      provider: "fabric8"
      version: "1.0.2"
      group: "io.fabric8.funktion.quickstart"
    name: "funky1"
  spec:
    ports:
    - port: 8080
      protocol: "TCP"
      targetPort: 8080
    selector:
      project: "funky1"
      provider: "fabric8"
      group: "io.fabric8.funktion.quickstart"
    type: "LoadBalancer"
- apiVersion: "v1"
  kind: "DeploymentConfig"
  metadata:
    annotations:
      fabric8.io/iconUrl: "img/icons/funktion.png"
    labels:
      project: "funky1"
      provider: "fabric8"
      version: "1.0.2"
      group: "io.fabric8.funktion.quickstart"
    name: "funky1"
  spec:
    replicas: 1
    selector:
      project: "funky1"
      provider: "fabric8"
      group: "io.fabric8.funktion.quickstart"
    template:
      metadata:
        labels:
          project: "funky1"
          provider: "fabric8"
          version: "1.0.2"
          group: "io.fabric8.funktion.quickstart"
      spec:
        containers:
        - env:
          - name: "KUBERNETES_NAMESPACE"
            valueFrom:
              fieldRef:
                fieldPath: "metadata.namespace"
          image: "quickstart/funky1:1.0.2"
          imagePullPolicy: "IfNotPresent"
          livenessProbe:
            httpGet:
              path: "/health"
              port: 8081
          name: "quickstart-funky1"
          ports:
          - containerPort: 8080
            protocol: "TCP"
          readinessProbe:
            httpGet:
              path: "/health"
              port: 8081
          securityContext:
            privileged: false
    triggers:
    - type: "ConfigChange"
kind: "List"

@jstrachan
Contributor Author

jstrachan commented Jun 8, 2016

Here's the log of the deploy pod:

--> Scaling funkything-1 to 1
error: couldn't scale funkything-1 to 1: timed out waiting for the condition

In case this helps:

$ oc describe dc funkything
Name:       funkything
Created:    4 minutes ago
Labels:     group=io.fabric8.funktion.quickstart,project=funkything,provider=fabric8,version=1.0.4
Annotations:    fabric8.io/build-url=http://jenkins.vagrant.f8/job/funky1/4
        fabric8.io/git-branch=funky1-1.0.4
        fabric8.io/git-commit=317304e59ce4fcac045c0b47ed5613196e36748d
        fabric8.io/git-url=http://gogs.vagrant.f8/gogsadmin/funky1/commit/317304e59ce4fcac045c0b47ed5613196e36748d
        fabric8.io/iconUrl=img/icons/funktion.png
Latest Version: 1
Selector:   group=io.fabric8.funktion.quickstart,project=funkything,provider=fabric8
Replicas:   1
Triggers:   Config
Strategy:   Rolling
Template:
  Labels:   group=io.fabric8.funktion.quickstart,project=funkything,provider=fabric8,version=1.0.4
  Containers:
  quickstart-funkything:
    Image:  quickstart/funkything:1.0.4
    Port:   8080/TCP
    QoS Tier:
      memory:   BestEffort
      cpu:  BestEffort
    Liveness:   http-get http://:8081/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8081/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      KUBERNETES_NAMESPACE:  (v1:metadata.namespace)
  No volumes.

Deployment #1 (latest):
    Name:       funkything-1
    Created:    4 minutes ago
    Status:     Failed
    Replicas:   0 current / 0 desired
    Selector:   deployment=funkything-1,deploymentconfig=funkything,group=io.fabric8.funktion.quickstart,project=funkything,provider=fabric8
    Labels:     group=io.fabric8.funktion.quickstart,openshift.io/deployment-config.name=funkything,project=funkything,provider=fabric8,version=1.0.4
    Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen LastSeen    Count   From                SubobjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  4m        4m      1   {deploymentconfig-controller }          Normal      DeploymentCreated   Created new deployment "funkything-1" for version 1


$ oc describe rc funkything-1
Name:       funkything-1
Namespace:  default-staging
Image(s):   quickstart/funkything:1.0.4
Selector:   deployment=funkything-1,deploymentconfig=funkything,group=io.fabric8.funktion.quickstart,project=funkything,provider=fabric8
Labels:     group=io.fabric8.funktion.quickstart,openshift.io/deployment-config.name=funkything,project=funkything,provider=fabric8,version=1.0.4
Replicas:   0 current / 0 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
No events.

@0xmichalis
Contributor

There is nothing special about your deployment config. I ran the template you provided and managed to get that pod up, although I don't have the image you are pointing to, so I get ImagePullBackOff.

[vagrant@localhost sample-app]$ oc create -f d.yaml 
service "funky1" created
deploymentconfig "funky1" created
[vagrant@localhost sample-app]$ oc get dc
NAME      REVISION   REPLICAS   TRIGGERED BY
funky1    1          1          config
[vagrant@localhost sample-app]$ oc get po
NAME              READY     STATUS              RESTARTS   AGE
funky1-1-deploy   0/1       ContainerCreating   0          5s
[vagrant@localhost sample-app]$ oc get rc
NAME       DESIRED   CURRENT   AGE
funky1-1   1         1         8s
[vagrant@localhost sample-app]$ oc get po
NAME              READY     STATUS              RESTARTS   AGE
funky1-1-deploy   1/1       Running             0          12s
funky1-1-jmzxi    0/1       ContainerCreating   0          4s
[vagrant@localhost sample-app]$ oc status
In project test on server https://10.0.2.15:8443

svc/funky1 - 172.30.68.38:8080
  dc/funky1 deploys docker.io/quickstart/funky1:1.0.2 
    deployment #1 running for 20 seconds - 1 pod

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.
[vagrant@localhost sample-app]$ oc logs -f dc/funky1
--> Scaling funky1-1 to 1
--> Waiting up to 10m0s for pods in deployment funky1-1 to become ready

The rc is always created with zero replicas and then handed off to the deployer pod, which is responsible for scaling it up. There are a couple of cases where we can create an rc with replicas (#8315) but I don't think it's an issue atm. Are you able to run other pods in your environment? Can you post the output of oc get events and oc get pods?
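
Concretely, something like the following should do, assuming the namespace from your report and the usual <deployment>-deploy naming for the deployer pod:

$ oc get events -n default-staging
$ oc get pods -n default-staging
$ oc logs funkything-1-deploy -n default-staging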

@0xmichalis
Contributor

Are you able to run other pods in your environment?

Never mind, you can run pods. You can also manually scale, as you said. The deployer pod should be able to scale your rc for you. Are you running anything else in the same namespace that may be interfering with the deployer?

@jstrachan
Contributor Author

No, it's in a namespace by itself with nothing else running at all.

FWIW others have managed to get it to scale up too; I've just no clue at all why the deployer times out for me and does nothing

@jstrachan
Contributor Author

I suspect there's some issue - but the error message is totally hidden?

@jimmidyson
Contributor

timed out waiting for the condition - so fix THE condition & all will be fine. Now the question is: what condition?

@0xmichalis
Contributor

The timeout message comes from the scaler - agreed that it's super-cryptic and needs fixing.

@jstrachan
Contributor Author

Yeah, if I knew why it's timing out and what THE condition is, it'd really help ;)

@0xmichalis
Contributor

@jstrachan I opened kubernetes/kubernetes#27048 upstream for this and backported it in Origin in #9228. Can you pull my branch, rebuild both OpenShift and the deployer image and retest in your environment? It should help you debug your issue. Or you could wait for the upstream pull to merge (not anytime soon due to the 1.3 code freeze upstream).

@jstrachan
Contributor Author

@Kargakis thanks!

So I tried building your branch. It failed after making the binaries (though I've never tried building OpenShift from source before).

I replaced the binaries in the Vagrant VM (where I'm running Origin from a binary distro) and tried again.

Here's the logs of the deployer:

I0613 08:01:37.860803       1 deployer.go:200] Deploying default-staging/myfunk-1 for the first time (replicas: 1)
I0613 08:01:37.879229       1 recreate.go:126] Scaling default-staging/myfunk-1 to 1 before performing acceptance check
F0613 08:03:37.893218       1 deployer.go:70] couldn't scale default-staging/myfunk-1 to 1: timed out waiting for the condition

I'm guessing, though, that my issue is that the origin-deployer Docker image didn't get rebuilt, right?
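
One quick way to confirm which image the deployer pod actually ran (assuming it follows the usual <deployment>-deploy naming) is something like:

$ oc get pod myfunk-1-deploy -n default-staging -o yaml | grep image: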

@jstrachan
Contributor Author

Aha - I forgot to use:

export OS_OUTPUT_GOPATH=1

The build is working much better now ;)...

@jstrachan
Contributor Author

@Kargakis I've replaced the binaries and have local Docker images built from your branch, but it seems that if I restart OpenShift and create pods it's still using the previous versions, e.g. it's using openshift/origin-pod:v1.3.0-alpha.0 for pod containers.

Is there some way to make the new OpenShift build use the locally built images of things like the pod and deployer?

@0xmichalis
Contributor

@jstrachan I usually run make release in Vagrant and get all images rebuilt inside.
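
Roughly, from the Origin source checkout inside the Vagrant VM, something like:

$ export OS_OUTPUT_GOPATH=1              # as you found above
$ make release                           # rebuilds the binaries and release images
$ docker images | grep openshift/origin  # check the freshly built/tagged images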

@jstrachan
Contributor Author

I've got the images; I just couldn't figure out how to make the new binaries use the newly built Docker images (which are tagged 0625c5d). I tried stopping OpenShift, zapping all containers, and manually retagging them like this...

docker tag  -f openshift/node:0625c5d openshift/node:v1.3.0-alpha.0
docker tag  -f openshift/origin-deployer:0625c5d openshift/origin-deployer:v1.3.0-alpha.0
docker tag  -f openshift/origin-pod:0625c5d openshift/origin-pod:v1.3.0-alpha.0
docker tag  -f openshift/origin-recycler:0625c5d openshift/origin-recycler:v1.3.0-alpha.0
docker tag  -f openshift/origin:0625c5d openshift/origin:v1.3.0-alpha.0
docker tag  -f openshift/origin-keepalived-ipfailover:0625c5d openshift/origin-keepalived-ipfailover:v1.3.0-alpha.0

Let's see if that helps...

@jstrachan
Contributor Author

@Kargakis yay! Your branch gave me a reason why it didn't work; here are the logs from the deployer:

--> Scaling myfunk2-1 to 1
error: couldn't scale myfunk2-1 to 1: Scaling the resource failed with: User "system:serviceaccount:default-staging:deployer" cannot update replicationcontrollers in project "default-staging"; Current resource version 9403

@0xmichalis
Contributor

Cool! The scaler has been ignoring all errors except invalid errors when it should ignore only update conflicts.
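
If it helps, something like the following (run as a cluster admin) should list who is allowed to perform that update in the namespace:

$ oadm policy who-can update replicationcontrollers -n default-staging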

@jstrachan
Contributor Author

@Kargakis thanks for your help!

@jstrachan
Contributor Author

Thanks to @Kargakis and @jimmidyson we've figured out what went wrong. It turns out the namespace I was trying to use the DeploymentConfig in had been created via the Kubernetes Namespace REST API rather than the OpenShift Project REST API, so the necessary deployer RoleBinding wasn't created - hence the issue!

If I zapped the project and recreated it via oc new-project, the DC worked like a charm!
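
For anyone else hitting this who doesn't want to recreate the project, something like the following should add the missing binding by hand (assuming the default deployer service account and the system:deployer role):

$ oc policy add-role-to-user system:deployer -z deployer -n default-staging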

@0xmichalis
Contributor

Great! @deads2k do we need to start warning about missing rolebindings in oc status?

@jstrachan
Contributor Author

jstrachan commented Jun 13, 2016

FWIW I've added lazy creation of the deployer RoleBinding to our code that was using the Project REST API to create new projects, so it's all working now for projects created via our CD pipeline tooling. It turns out that oc new-project creates a Project plus some RoleBindings too.
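
For the record, such a deployer RoleBinding looks roughly like this in the v1 API (a sketch only - the exact fields our code sets may differ slightly):

# binds the system:deployer role to the project's deployer service account
apiVersion: v1
kind: RoleBinding
metadata:
  name: system:deployers
  namespace: default-staging
roleRef:
  name: system:deployer
subjects:
- kind: ServiceAccount
  name: deployer
  namespace: default-staging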

@jstrachan
Contributor Author

@jimmidyson suggested a nicer fix: don't create a Project via the Project REST API, but use a ProjectRequest instead, which works much better; I don't have to manually add any RoleBindings any more.

Kinda confusing REST API, mind you! :) It'd be less confusing to return a 404 on creating a Namespace or Project (with a note pointing to ProjectRequest), as they generally don't work too well if folks want to use a DeploymentConfig or S2I.
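
For completeness, the ProjectRequest route looks roughly like this - the server URL and token below are placeholders:

$ oc new-project default-staging

# or via the REST API directly:
$ curl -k -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    https://openshift.example.com:8443/oapi/v1/projectrequests \
    -d '{"kind": "ProjectRequest", "apiVersion": "v1", "metadata": {"name": "default-staging"}}'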

@jimmidyson
Contributor

@jstrachan "Normal" (non-cluster-admin) users don't have access to the namespaces endpoint or the create-project endpoint - they can only create projects via the projectrequests endpoint AFAIK - so this shouldn't be a big problem (most users shouldn't be cluster-admins).

@deads2k
Contributor

deads2k commented Jun 13, 2016

Great! @deads2k do we need to start warning about missing rolebindings in oc status?

Most users can't see rolebindings. I'm fine with checking them as long as we don't display any messages if they don't have the power to see them.

Could you key off of an annotation in the namespace instead? Everyone could see that.

@0xmichalis
Contributor

#9228 is merged and I opened #9479 to add a warning in oc status that could help. Closing this.

@jstrachan
Contributor Author

thanks @Kargakis!
