Can not deploy Origin 3.10 with packages from CBS repository #8550

Closed
cynepco3hahue opened this issue May 28, 2018 · 29 comments

@cynepco3hahue

Description

I am trying to deploy OpenShift Origin with packages from the CBS repository (https://cbs.centos.org/repos/paas7-openshift-origin310-candidate/x86_64/os/), but the deployment fails at this task:

TASK [openshift_service_catalog : wait for api server to be ready] *************
FAILED - RETRYING: wait for api server to be ready (60 retries left).
FAILED - RETRYING: wait for api server to be ready (59 retries left).
...

I checked the cluster and can see that the apiserver-5t2n4 pod fails to run (controller-manager-psjmp is also in CrashLoopBackOff):

# oc get pods --all-namespaces
NAMESPACE               NAME                          READY     STATUS             RESTARTS   AGE
default                 docker-registry-1-lg6x6       1/1       Running            0          4m
default                 registry-console-1-m2xms      1/1       Running            0          3m
default                 router-1-5hmdq                1/1       Running            0          4m
kube-service-catalog    apiserver-5t2n4               0/1       CrashLoopBackOff   4          2m
kube-service-catalog    controller-manager-psjmp      0/1       CrashLoopBackOff   4          2m
openshift-web-console   webconsole-5b4d568df4-dd4dd   1/1       Running            0          3m

In the apiserver-5t2n4 log I can see:

# oc -n kube-service-catalog logs apiserver-5t2n4
Error: unknown flag: --admission-control
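The two manual checks above (listing pods, then reading the log of each failing one) can be scripted. The following is a minimal sketch, not part of the original report: `failing_pods` is a hypothetical helper name, and it assumes the default column layout of `oc get pods --all-namespaces` shown above, with STATUS in the fourth column.

```shell
# Hypothetical helper: print "namespace name status" for every pod whose
# STATUS column (4th field) is neither Running nor Completed.
# Reads the table printed by `oc get pods --all-namespaces` on stdin.
failing_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

# Usage on a live cluster (needs a logged-in oc client), left commented out
# so the snippet can be sourced without touching a cluster:
#   oc get pods --all-namespaces | failing_pods | while read -r ns pod status; do
#     echo "== $ns/$pod ($status) =="
#     oc -n "$ns" logs "$pod"
#   done
```

Applied to the pod table above, the filter would report only the two kube-service-catalog pods.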

Version

[root@node01 ~]# rpm -qa | grep origin
centos-release-openshift-origin-1-1.el7.centos.noarch
origin-docker-excluder-3.10.0-0.alpha.0.el7.git.0.42168f5.noarch
origin-excluder-3.10.0-0.alpha.0.el7.git.0.42168f5.noarch
origin-3.10.0-0.alpha.0.el7.git.0.42168f5.x86_64
origin-master-3.10.0-0.alpha.0.el7.git.0.42168f5.x86_64
origin-node-3.10.0-0.alpha.0.el7.git.0.42168f5.x86_64
origin-sdn-ovs-3.10.0-0.alpha.0.el7.git.0.42168f5.x86_64
origin-clients-3.10.0-0.alpha.0.el7.git.0.42168f5.x86_64

[root@node01 ~]# rpm -qa | grep ansible
openshift-ansible-3.10.0-0.1.0.git.1.0b52cf9.el7.noarch
openshift-ansible-playbooks-3.10.0-0.1.0.git.1.0b52cf9.el7.noarch
openshift-ansible-docs-3.10.0-0.1.0.git.1.0b52cf9.el7.noarch
ansible-2.4.2.0-2.el7.noarch
openshift-ansible-roles-3.10.0-0.1.0.git.1.0b52cf9.el7.noarch


# oc version
oc v3.10.0-alpha.0+42168f5-81
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://node01:8443
openshift v3.10.0-alpha.0+42168f5-81
kubernetes v1.9.1+a0ce1bc657

Steps To Reproduce

  1. Install the relevant packages from the repository above
  2. Use this inventory:
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_pass=vagrant
deployment_type=origin
openshift_deployment_type=origin
openshift_clock_enabled=true
openshift_master_identity_providers=[{'name': 'allow_all_auth', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_disable_check=memory_availability,disk_availability,docker_storage,package_availability,docker_image_availability
openshift_image_tag=v3.10.0
ansible_service_broker_registry_whitelist=['.*-apb$']
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/etcd-vol
openshift_hosted_etcd_storage_volume_name=etcd-vol
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
openshift_node_kubelet_args={'max-pods': ['40'], 'pods-per-core': ['40']}
openshift_master_admission_plugin_config={"ValidatingAdmissionWebhook":{"configuration":{"kind": "DefaultAdmissionConfig","apiVersion": "v1","disable": false}},"MutatingAdmissionWebhook":{"configuration":{"kind": "DefaultAdmissionConfig","apiVersion": "v1","disable": false}}}

[nfs]
node01 openshift_ip=node01_ip

[masters]
node01 openshift_ip=node01_ip

[etcd]
node01 openshift_ip=node01_ip

[nodes]
node01 openshift_node_labels="{'region': 'infra','zone': 'default'}" openshift_schedulable=true openshift_ip=node01_ip
  3. Run the commands:
ansible-playbook -i $inventory_file /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
ansible-playbook -i $inventory_file /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Expected Results

Deployment succeeds without any errors

Observed Results

Deployment fails

TASK [openshift_service_catalog : wait for api server to be ready] *************
FAILED - RETRYING: wait for api server to be ready (60 retries left).
FAILED - RETRYING: wait for api server to be ready (59 retries left).
FAILED - RETRYING: wait for api server to be ready (58 retries left).

# oc -n kube-service-catalog logs apiserver-5t2n4
Error: unknown flag: --admission-control

Additional Information

I checked, and the problem I am hitting was fixed in commit 2fba651#diff-f5c4b4675369f72d180a86be3772fe87

So putting updated packages in the repository may be enough to solve the issue.
I think it would be a good approach to run nightly builds and put the generated packages in the relevant repository.

@cynepco3hahue
Author

I apologize if I opened the issue under the wrong repository; I am just not sure who is responsible for the CBS repository.

@DanyC97
Contributor

DanyC97 commented May 29, 2018

@cynepco3hahue I and a few others look after building the Origin RPMs, which are then pushed into the CentOS repos.

That said, RPM creation currently doesn't work for 3.10 (master branch). I'll look at it next week once I'm back from holiday. Sorry for the delay and inconvenience.

@cynepco3hahue
Author

@DanyC97 No worries, I will wait patiently 😄

@cynepco3hahue
Author

@DanyC97 Hi Doni, do you have any updates?

@alexxa

alexxa commented Jun 14, 2018

@cynepco3hahue The same error appears with a different oc command: openshift/origin#19590. Maybe it's connected.

@DanyC97
Contributor

DanyC97 commented Jun 15, 2018

@cynepco3hahue Not sure if you are subscribed to the openshift dev mailing list or centos-devel (where the PaaS meeting logs are shared), but this week I announced that we've built a 3.10 Origin RPM from the Origin master branch (note there is no release candidate for Origin 3.10 yet) so we can experiment a bit and get ready, from the PaaS point of view, for when a release is cut.

The RPMs are located at https://cbs.centos.org/koji/taskinfo?taskID=449606. They have not been pushed to any other repos (such as the CentOS mirror or release repos).

Let me know how you get on with bringing 3.10 up, and thank you for your patience.

@cynepco3hahue
Author

@DanyC97 Thanks for the update, I will try to deploy 3.10 with packages from the link.

@cynepco3hahue
Author

@DanyC97 I tried to deploy Origin 3.10 with packages from https://cbs.centos.org/koji/taskinfo?taskID=449606 and with the openshift-ansible branch v3.10.0-rc.0, but origin-node.service failed to start with this error message:

Jun 17 07:41:07 node01 origin-node[4145]: /usr/local/bin/openshift-node: line 17: /usr/bin/openshift-node-config: No such
Jun 17 07:41:07 node01 systemd[1]: origin-node.service: main process exited, code=exited, status=1/FAILURE

Must this file be provided by some package, or is it created dynamically during the Ansible run?
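One way to narrow this down on the affected node is to ask RPM whether any installed package ships the path. This is a minimal sketch, assuming the `origin-node` package name from the `rpm -qa` output earlier in this thread; `ships_file` is a hypothetical helper, not something provided by openshift-ansible.

```shell
# Hypothetical helper: succeed if the file list on stdin (one path per line,
# as printed by `rpm -ql PACKAGE`) contains exactly the given path.
ships_file() {
  grep -qxF -- "$1"
}

# Usage on the node (commented out so the snippet has no side effects):
#   rpm -ql origin-node | ships_file /usr/bin/openshift-node-config \
#     && echo "shipped by origin-node" || echo "not in origin-node"
#   rpm -qf /usr/bin/openshift-node-config   # which package owns it, if any
```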

@DanyC97
Contributor

DanyC97 commented Jun 19, 2018

@cynepco3hahue That is a very good question, and I'm afraid I don't know the answer.

@vrutkovs @sdodson Do you have any insight into this? Is build-rpms.sh not building an RPM that we need?

@vrutkovs
Member

@DanyC97 This seems to be provided by origin-node in CI builds; I'm not sure why build-rpms would not create it.

@DanyC97
Contributor

DanyC97 commented Jun 20, 2018

Hmm, that is interesting, @vrutkovs. I'll need to dig in to see what is going on.

@sdodson
Member

sdodson commented Jun 20, 2018

openshift-node-config is a new binary; building 3.10-rc.0 should make this problem go away.

@cynepco3hahue
Author

@sdodson Thanks, I can confirm that deployment of 3.10-rc.0 with Docker succeeds. With CRI-O we hit a cri-tools dependency issue.

@DanyC97 Where can I check for Origin release dates?

@sdodson
Member

sdodson commented Jun 21, 2018

@cynepco3hahue Can you post the dependency issue? Is it failing on cri-tools itself or on a dependency of cri-tools?

@cynepco3hahue
Author

@sdodson Sure

TASK [container_runtime : Install cri-o] ***************************************
FAILED - RETRYING: Install cri-o (3 retries left).
FAILED - RETRYING: Install cri-o (2 retries left).
FAILED - RETRYING: Install cri-o (1 retries left).
fatal: [node01]: FAILED! => {"attempts": 3, "changed": false, "msg": "No package matching 'cri-o' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o' found available, installed or updated"]}
        to retry, use: --limit @/root/openshift-ansible/playbooks/prerequisites.retry

@DanyC97
Contributor

DanyC97 commented Jun 21, 2018

@sdodson I think we need to tag cri-o into the CentOS release repo too. I'll try to sort it out soon.

@cynepco3hahue If the question is how long you need to wait for the CentOS Origin RPMs once a new Origin release is cut, the answer is: we are doing all we can to shorten that time and iron out a few things as we go, in parallel with our day jobs.

@cynepco3hahue
Author

@DanyC97 😄 I just want to know the exact dates because our testing is tightly bound to OpenShift, and I want to be sure that our code works correctly on both k8s and OpenShift.

@tsikorski

It seems our ability to test new releases of OpenShift is hampered by the lack of documentation on how to test an upgrade. I have a 13-machine cluster that I have designated as my PoC environment. It would be nice if, once a release candidate is announced, there were a way to test upgrading this cluster using Ansible. The same thing happened with 3.9, only there 3.9 went gold with no RPMs in sight for a number of weeks. So while we can test a single-node system using the oc command, there is no way to test a multi-machine cluster. If I am wrong, can someone please point me to the documentation on what I would have to do to test a release candidate using Ansible? Is the Ansible method of installation an afterthought? It doesn't seem to be on the same release cycle as the main product. That is, the lack of RPMs makes Ansible unusable from a testing perspective.

Ted

@debianmaster

openshift-node-config is still missing! Any idea where I can get these RPMs?

@cynepco3hahue
Author

cynepco3hahue commented Jul 3, 2018

@debianmaster It was missing from the alpha build. I created rc.0 RPMs for our project and placed them under https://plain.resources.ovirt.org/repos/origin/3.10/, so you can use this repo as a temporary solution (but it has no CRI-O packages).
For openshift-ansible you will need to use the GitHub repository at the relevant tag, something like:
git clone https://github.com/openshift/openshift-ansible.git -b v3.10.0-rc.0
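Combined with the playbook commands from the issue description, a run from a tagged checkout might look like the sketch below. It assumes the playbooks live under `playbooks/` in the checkout (the retry path in the CRI-O failure log above suggests they do). `deploy_from_tag`, `run`, and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and `/root/inventory` is only an example path.

```shell
# Hypothetical wrapper: with DRY_RUN set, print each command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone openshift-ansible at the given tag and run both playbooks from the
# checkout against the given inventory file (pass an absolute path, since the
# working directory changes).
deploy_from_tag() {
  tag=$1
  inventory_file=$2
  run git clone https://github.com/openshift/openshift-ansible.git -b "$tag" &&
  run cd openshift-ansible &&
  run ansible-playbook -i "$inventory_file" playbooks/prerequisites.yml &&
  run ansible-playbook -i "$inventory_file" playbooks/deploy_cluster.yml
}

# Preview the commands without executing anything:
#   DRY_RUN=1 deploy_from_tag v3.10.0-rc.0 /root/inventory
```

The dry-run switch is there so the command sequence can be reviewed before it touches any host.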

@debianmaster

@cynepco3hahue Can you share your inventory file, if possible, with sensitive info removed? Thanks for your help.

@cynepco3hahue
Author

@debianmaster Sure,

[OSEv3:children]
masters
nodes

[OSEv3:vars]
deployment_type=origin
openshift_deployment_type=origin
openshift_clock_enabled=true
openshift_master_identity_providers=[{'name': 'allow_all_auth', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_disable_check=memory_availability,disk_availability,docker_storage,package_availability,docker_image_availability
openshift_image_tag=v3.10.0-rc.0
ansible_service_broker_registry_whitelist=['.*-apb$']
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/etcd-vol
openshift_hosted_etcd_storage_volume_name=etcd-vol
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
openshift_node_kubelet_args={'max-pods': ['40'], 'pods-per-core': ['40']}
openshift_master_admission_plugin_config={"ValidatingAdmissionWebhook":{"configuration":{"kind": "DefaultAdmissionConfig","apiVersion": "v1","disable": false}},"MutatingAdmissionWebhook":{"configuration":{"kind": "DefaultAdmissionConfig","apiVersion": "v1","disable": false}}}

[nfs]
node01 openshift_ip=$node01_ip

[masters]
node01 openshift_ip=$node01_ip

[etcd]
node01 openshift_ip=$node01_ip

[nodes]
node01 openshift_schedulable=true openshift_ip=$node01_ip openshift_node_group_name="node-config-master-infra"
node02 openshift_node_group_name="node-config-compute" openshift_schedulable=true openshift_ip=$node02_ip

@debianmaster

Thanks, I will give it a try.

@debianmaster

debianmaster commented Jul 5, 2018

Not much luck here. I guess I will wait for the release instead of wasting time.
Thanks for the help!

@DanyC97
Contributor

DanyC97 commented Aug 3, 2018

@cynepco3hahue Can we close this issue in favour of #8399? Too many duplicated issues...

@DanyC97
Contributor

DanyC97 commented Aug 3, 2018

@cynepco3hahue Until you close this issue as mentioned in my previous comment, have a look here for my latest update.

Looking forward to getting some feedback.

@sdodson
Member

sdodson commented Aug 3, 2018

Dupe of #8399

sdodson closed this as completed Aug 3, 2018

@cynepco3hahue
Author

@DanyC97 Thanks, I will monitor the issue you specified.

@abhiroopghatak

Any update here? I see the issue persists for Origin 3.11 as well:

TASK [openshift_node : Install node, clients, and conntrack packages] ******************************************************************************************
Tuesday 29 January 2019 14:42:09 +0000 (0:00:08.279) 0:04:59.844 *******
FAILED - RETRYING: Install node, clients, and conntrack packages (3 retries left).
FAILED - RETRYING: Install node, clients, and conntrack packages (2 retries left).
FAILED - RETRYING: Install node, clients, and conntrack packages (1 retries left).
fatal: [okd-m1]: FAILED! => {"attempts": 3, "changed": false, "msg": "No package matching 'origin-node-3.11*' found available, installed or updated", "rc": 126, "results": ["No package matching 'origin-node-3.11*' found available, installed or updated"]}
