Masters and Workers can't pivot behind the proxy #19

Closed
jomeier opened this issue Nov 30, 2019 · 9 comments

@jomeier
Contributor

jomeier commented Nov 30, 2019

Hi,

I am trying to install OKD4 on vSphere by following this guide:

https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html#installing-vsphere

Of course, I had to change the version in append-bootstrap.ign from 2.x.x to 3.0.0 to get it running on the bootstrap server.

I set the platform to 'none' because I create the ignition files and manually create the VMs in vSphere with them.

I'm behind a corporate proxy. Because of that I used the proxy settings in the install-config.yaml.

If I ssh into the bootstrap server, the proxy env vars are set. The bootstrap server initialization runs successfully, and it serves the ignition files for masters and workers on port 22623.

If I ssh into the master server, I see this in the journal:

Nov 30 09:32:02 localhost sh[1108]: Error: unable to pull quay.io/openshift/okd-content@sha256:6625bb97a35604080af348340f0788df36455352b1a039f073cb6894c548fb78: unable to pull image: Error initializing source docker://quay.io/openshift/okd-content@sha256:6625bb97a35604080af348340f0788df36455352b1a039f073cb6894c548fb78: pinging docker registry returned: Get https://quay.io/v2/: dial tcp 3.230.48.144:443: i/o timeout

This also doesn't work:
sudo podman run hello-world

It seems as if the proxy.sh is missing in /etc/profile.d on the master.
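
To check which proxy variables a shell actually sees, something like the following sketch could be run in the SSH session on each node. The helper name `report_proxy_env` is hypothetical and not part of OKD. Note that scripts in /etc/profile.d are read by login shells, not by systemd units, so this only describes the interactive environment.

```shell
# Print, for each common proxy variable, whether it is set in this shell.
# report_proxy_env is a hypothetical helper name, not part of OKD.
report_proxy_env() {
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy; do
    # Indirect lookup of a fixed variable name (POSIX-compatible).
    eval "value=\${$name-}"
    if [ -n "$value" ]; then
      echo "$name=set"
    else
      echo "$name=unset"
    fi
  done
}

# Usage (on the node):
#   report_proxy_env
```

The environment of a service can be inspected separately, for example with `sudo systemctl show machine-config-daemon-pull -p Environment`.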

Greetings,

Josef

@jomeier jomeier changed the title from "No /etc/profile.d/proxy.sh in master.ign and worker.ign" to "No proxy env vars set on master server" Nov 30, 2019
@jomeier
Contributor Author

jomeier commented Nov 30, 2019

I saw this when I ssh'd into the server:

Fedora 30.20191014.1 (CoreOS preview)
Tracker: https://github.com/coreos/fedora-coreos-tracker
Preview release: breaking changes may occur

Last login: Sat Nov 30 10:07:06 2019 from 172.23.240.46
[systemd]
Failed Units: 3
  machine-config-daemon-host.service
  machine-config-daemon-pull.service
  rpc-statd.service
[core@localhost ~]$

@jomeier
Contributor Author

jomeier commented Nov 30, 2019

systemctl status machine-config-daemon-host shows this:

[core@localhost ~]$ sudo systemctl status machine-config-daemon-host
● machine-config-daemon-host.service - Machine Config Daemon Initial
   Loaded: loaded (/etc/systemd/system/machine-config-daemon-host.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/machine-config-daemon-host.service.d
           └─10-default-env.conf
   Active: failed (Result: exit-code) since Sat 2019-11-30 09:32:02 UTC; 40min ago
  Process: 2078 ExecStart=/usr/local/bin/machine-config-daemon pivot (code=exited, status=203/EXEC)
 Main PID: 2078 (code=exited, status=203/EXEC)
      CPU: 954us

Nov 30 09:32:02 localhost systemd[1]: Starting Machine Config Daemon Initial...
Nov 30 09:32:02 localhost systemd[2078]: machine-config-daemon-host.service: Failed to execute command: Permission denied
Nov 30 09:32:02 localhost systemd[2078]: machine-config-daemon-host.service: Failed at step EXEC spawning /usr/local/bin/machine-config-daemon: Permission denied
Nov 30 09:32:02 localhost systemd[1]: machine-config-daemon-host.service: Main process exited, code=exited, status=203/EXEC
Nov 30 09:32:02 localhost systemd[1]: machine-config-daemon-host.service: Failed with result 'exit-code'.
Nov 30 09:32:02 localhost systemd[1]: Failed to start Machine Config Daemon Initial.
Nov 30 09:32:02 localhost systemd[1]: machine-config-daemon-host.service: Consumed 954us CPU time.

@jomeier
Contributor Author

jomeier commented Nov 30, 2019

It seems the machine-config-daemon binary is an empty file (0 bytes, not executable):

[core@localhost ~]$ ls -alh /usr/local/bin
total 40K
drwxr-xr-x.  2 root root  270 Nov 30 10:44 .
drwxr-xr-x. 11 root root  114 Nov 30 10:44 ..
-rwxr-xr-x.  1 root root 1.4K Nov 30 10:44 etcd-member-add.sh
-rwxr-xr-x.  1 root root 2.5K Nov 30 10:44 etcd-member-recover.sh
-rwxr-xr-x.  1 root root  642 Nov 30 10:44 etcd-member-remove.sh
-rwxr-xr-x.  1 root root  931 Nov 30 10:44 etcd-snapshot-backup.sh
-rwxr-xr-x.  1 root root 1.7K Nov 30 10:44 etcd-snapshot-restore.sh
-rw-r--r--.  1 root root    0 Nov 30 10:44 machine-config-daemon
-rw-r--r--.  1 root root  12K Nov 30 10:44 openshift-recovery-tools
-rwxr-xr-x.  1 root root 1.2K Nov 30 10:44 recover-kubeconfig.sh
-rwxr-xr-x.  1 root root 1.1K Nov 30 10:44 tokenize-signer.sh
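
The listing shows machine-config-daemon at 0 bytes and without the execute bit, which is consistent with the 203/EXEC "Permission denied" failure. A minimal sketch of a check that distinguishes these cases follows; `check_executable` is a hypothetical helper name, not an existing tool.

```shell
# Classify why a path may not be runnable by a systemd ExecStart= line.
# check_executable is a hypothetical helper name for illustration.
check_executable() {
  path=$1
  if [ ! -e "$path" ]; then
    echo "missing"
  elif [ ! -s "$path" ]; then
    echo "empty"            # file exists but has zero bytes
  elif [ ! -x "$path" ]; then
    echo "not-executable"   # file has content but no execute permission
  else
    echo "ok"
  fi
}

# Usage (on the node):
#   check_executable /usr/local/bin/machine-config-daemon
```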

@jomeier
Contributor Author

jomeier commented Nov 30, 2019

journalctl --no-pager | grep machine-config-daemon shows:

ine-config-daemon-host.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(2d): [finished] processing unit "machine-config-daemon-host.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(2f): [started]  enabling unit "machine-config-daemon-host.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(2f): [finished] enabling unit "machine-config-daemon-host.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(30): [started]  processing unit "machine-config-daemon-pull.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(30): op(31): [started]  writing unit "machine-config-daemon-pull.service" at "/sysroot/etc/systemd/system/machine-config-daemon-pull.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(30): op(31): [finished] writing unit "machine-config-daemon-pull.service" at "/sysroot/etc/systemd/system/machine-config-daemon-pull.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(30): [finished] processing unit "machine-config-daemon-pull.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(32): [started]  enabling unit "machine-config-daemon-pull.service"
Nov 30 10:51:08 localhost ignition[818]: INFO     : files: op(32): [finished] enabling unit "machine-config-daemon-pull.service"
Nov 30 10:51:10 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=machine-config-daemon-pull comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Nov 30 10:51:10 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=machine-config-daemon-host comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Nov 30 10:51:10 localhost systemd[1]: machine-config-daemon-pull.service: Main process exited, code=exited, status=125/n/a
Nov 30 10:51:10 localhost systemd[1]: machine-config-daemon-pull.service: Failed with result 'exit-code'.
Nov 30 10:51:10 localhost systemd[1]: machine-config-daemon-pull.service: Consumed 178ms CPU time.
Nov 30 10:51:10 localhost systemd[1263]: machine-config-daemon-host.service: Failed to execute command: Permission denied
Nov 30 10:51:10 localhost systemd[1263]: machine-config-daemon-host.service: Failed at step EXEC spawning /usr/local/bin/machine-config-daemon: Permission denied
Nov 30 10:51:10 localhost systemd[1]: machine-config-daemon-host.service: Main process exited, code=exited, status=203/EXEC
Nov 30 10:51:10 localhost systemd[1]: machine-config-daemon-host.service: Failed with result 'exit-code'.
Nov 30 10:51:10 localhost systemd[1]: machine-config-daemon-host.service: Consumed 767us CPU time.

The machine-config-daemon-pull service fails because it can't download the initial image.

If I set the proxy manually in its systemd unit file and restart both services:

sudo systemctl restart machine-config-daemon-pull
sudo systemctl restart machine-config-daemon-host

the downloads work until the master restarts.

So in my opinion the proxy settings should be rendered into the ignition files that the bootstrap server serves to the masters and workers.
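
As an illustration of setting the proxy manually for a service, here is a sketch that writes a systemd drop-in with `Environment=` lines. The helper name `write_proxy_dropin`, the file name 10-proxy.conf, and the proxy URL in the usage comment are assumptions for this example, not taken from the cluster. It does not address the reset after the master restarts.

```shell
# Write a systemd drop-in that sets proxy variables for one unit.
# Arguments: drop-in directory, HTTP proxy, HTTPS proxy, no-proxy list.
# write_proxy_dropin and 10-proxy.conf are hypothetical names.
write_proxy_dropin() {
  dropin_dir=$1
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/10-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$2"
Environment="HTTPS_PROXY=$3"
Environment="NO_PROXY=$4"
EOF
}

# Usage (as root; the proxy URL is a placeholder):
#   write_proxy_dropin /etc/systemd/system/machine-config-daemon-pull.service.d \
#     http://proxy.example.com:3128 http://proxy.example.com:3128 localhost,127.0.0.1
#   systemctl daemon-reload
#   systemctl restart machine-config-daemon-pull
```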

@jomeier jomeier changed the title from "No proxy env vars set on master server" to "No proxy env vars set for master.ign and worker.ign" Dec 1, 2019
@vrutkovs
Member

vrutkovs commented Dec 5, 2019

Dupe of #14?

@jomeier
Contributor Author

jomeier commented Dec 5, 2019

@vrutkovs:
It's not related to VMware. If a proxy is configured in the install-config.yaml, it should also be rendered in the master.ign and worker.ign served by the machine-config-server.

@vrutkovs
Member

vrutkovs commented Dec 5, 2019

Ah, I see now. Please rename to "Master can't pivot behind the proxy". The OKD-specific machine-config-daemon-pull service doesn't use proxy env vars yet.

@jomeier jomeier changed the title from "No proxy env vars set for master.ign and worker.ign" to "Masters and Workers can't pivot behind the proxy" Dec 6, 2019
@vrutkovs vrutkovs added this to the Release milestone Feb 11, 2020
@evertmulder

A workaround I used to get the installation (OKD beta5) working behind a firewall is to run the following on every master and worker at first boot:

sudo -i
mkdir /etc/systemd/system/machine-config-daemon-firstboot.service.d
cp /etc/systemd/system/machine-config-daemon-host.service.d/10-default-env.conf /etc/systemd/system/machine-config-daemon-firstboot.service.d
systemctl daemon-reload
systemctl stop machine-config-daemon-firstboot
systemctl start machine-config-daemon-firstboot
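
The copy step above could also be wrapped so that it is safe to re-run and fails clearly when the source drop-in is absent. This is only a sketch: `copy_env_dropin` is a hypothetical helper name, and the `root` prefix argument exists so the function can be tried against a scratch directory before touching a node.

```shell
# Copy the proxy drop-in from the -host unit to the -firstboot unit.
# Argument: filesystem root prefix ("" for the real system).
# copy_env_dropin is a hypothetical helper name for illustration.
copy_env_dropin() {
  root=$1
  src="$root/etc/systemd/system/machine-config-daemon-host.service.d/10-default-env.conf"
  dst_dir="$root/etc/systemd/system/machine-config-daemon-firstboot.service.d"
  if [ ! -f "$src" ]; then
    echo "source drop-in not found: $src" >&2
    return 1
  fi
  # mkdir -p does not fail if the directory already exists.
  mkdir -p "$dst_dir" && cp "$src" "$dst_dir/"
}

# Usage (as root on the node):
#   copy_env_dropin "" && systemctl daemon-reload \
#     && systemctl restart machine-config-daemon-firstboot
```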

@vrutkovs
Member

vrutkovs commented Jun 3, 2020

Fix merged in 4.4.0-0.okd-2020-06-02-224917 (thanks @evertmulder!), also backported to fcos-4.5 branch
