Feature request: Support airgapped setup on vSphere, make pause image location configurable/prepull pause image #579

Closed
jomeier opened this issue Aug 11, 2021 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jomeier

jomeier commented Aug 11, 2021

Hi,

According to #141 (comment), the only reason air-gapped OCP installations with Windows containers are not supported seems to be that the pause image must be pulled from the internet.

Because we must provide our customers with support for Windows containers in production, we are very interested in getting support for this from RH.

An idea to mitigate this problem (as we did it in our own OCP Windows container test setup):

  • we prepared a golden Windows VM as described in the documentation (with modifications we had to reverse engineer because the docs are not complete) in an environment where we had internet access
  • we prepulled the pause image from the internet to the golden Windows VM
  • we sysprepped the golden Windows VM
  • we created the MachineSet for vSphere using the previously configured golden Windows VM in our airgapped cluster

In our setup everything works great with Windows containers, no problems in our airgapped environment.
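
The pre-pull step from the list above can be sketched as follows. This is only an illustration under stated assumptions: the thread names neither the pause image reference nor the container runtime, so `PAUSE_IMAGE` is a placeholder and the Docker-compatible CLI is an assumption. The helper builds the commands rather than running them, so they can be reviewed before being executed on the golden VM prior to sysprep.

```python
# Hypothetical placeholder: the thread does not name the pause image,
# so substitute the reference your WMCO version actually pulls.
PAUSE_IMAGE = "registry.example.com/pause:TAG"


def prepull_commands(image, runtime_cli="docker"):
    """Return argv lists that pull an image and then confirm it is cached.

    Assumes a Docker-compatible CLI on the golden VM (an assumption,
    not something the thread states).
    """
    return [
        [runtime_cli, "pull", image],              # fetch while online
        [runtime_cli, "image", "inspect", image],  # fails if not cached
    ]


def render(commands):
    """Join each argv list into one printable command line."""
    return [" ".join(argv) for argv in commands]


if __name__ == "__main__":
    for line in render(prepull_commands(PAUSE_IMAGE)):
        print(line)
```

Running the second command again on a machine cloned from the golden image, without network access, is a quick way to check that the image survived sysprep.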

The second option would be to make the hard-coded pause image configurable (perhaps via an environment variable of the WMCO?), so OCP users can store the pause image in an on-premises registry inside the air-gapped environment. This should also be rather simple.
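
A minimal sketch of what such a configurable lookup could look like. The WMCO is written in Go and this is not its code; the variable name `PAUSE_IMAGE`, the default reference, and the mirror host used in the test are all hypothetical and only illustrate the proposal.

```python
import os

# Hypothetical names: nothing in the thread defines this variable or default.
PAUSE_IMAGE_ENV = "PAUSE_IMAGE"
DEFAULT_PAUSE_IMAGE = "registry.example.com/pause:TAG"  # placeholder


def resolve_pause_image(environ=None):
    """Return the override from the environment, else the built-in default."""
    environ = os.environ if environ is None else environ
    override = environ.get(PAUSE_IMAGE_ENV, "").strip()
    return override or DEFAULT_PAUSE_IMAGE


def mirror_reference(image, mirror_host):
    """Point an image reference at an on-premises registry host.

    The first path component is treated as a registry host when it contains
    "." or ":" or equals "localhost" (the usual image-reference convention);
    otherwise the mirror host is simply prepended.
    """
    first, sep, rest = image.partition("/")
    has_host = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_host else image
    return f"{mirror_host}/{path}"
```

With an approach like this, a connected cluster keeps the default, while an air-gapped cluster sets the variable to the mirrored reference.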

In our opinion, airgapped clusters are very convenient with OCP installed on vSphere. So this feature should definitely be supported, or at least the workaround with prepulled pause images on the golden Windows VM should be.

Could you have a look at that, please? It's very important for us.

Thanks and greetings,

Josef

@saifshaikh48
Contributor

I do not see any problems with the workaround you tried and outlined. Glad to hear Windows containers ran without issue.

Air-gapped environments will not be officially supported until we fully test them and add documentation ourselves. I have created a few stories (WINC-662 & WINC-663) to put this work on our radar; the team will work to prioritize these soon.

@aravindhp
Contributor

@jomeier, given that you are using vSphere and need to create a golden image anyway, would it be sufficient to update the golden image creation docs to add a step to pre-pull the pause image? BTW, we are aware of the issues and are in the process of fixing them. Please see 48fbd2e

@jomeier
Author

jomeier commented Aug 11, 2021

@saifshaikh48
@aravindhp
Thank you !

@philipp1992
Could you provide the unattend.xml we used to create the initial setup of the workers and the steps we provided to create the golden image, please? I could provide a PR for the OpenShift docs with that information.

@philipp1992

philipp1992 commented Aug 11, 2021

> @philipp1992
> Could you provide the unattend.xml we used to create the initial setup of the workers and the steps we provided to create the golden image, please?

We used the unattend.xml provided in the docs, minus the product-key section:

https://docs.openshift.com/container-platform/4.8/windows_containers/creating_windows_machinesets/creating-windows-machineset-vsphere.html#creating-the-vsphere-windows-vm-golden-image_creating-windows-machineset-vsphere

By the way, since we are talking about the documentation and the golden image: we found many errors in this documentation.

1: Install-Module -Force OpenSSHUtils -> this module is deprecated and can't be installed as described. Also, it's not needed anymore.

2: The way the SSH authorized_keys file is supposed to be placed in the Administrator's home directory does not work at all, because the user's home directory is wiped by sysprep. The docs mention that this should be prevented by the provided unattend.xml, but there is no such mechanism in the XML. So we had to put the authorized_keys file outside of the home directory and modify the sshd_config accordingly.
This behaviour was also reported months ago in this issue: #484

3: Another thing that bothers me is that the provided unattend.xml enables auto-logon. This means anyone with console access to the VM can access the operating system without logging in first. Useful for troubleshooting, but not so nice security-wise. Can you please explain whether this is really needed?

4: Also, the SSH key pair didn't work when I created it on Linux with ssh-keygen, so I had to create it on the Windows machine with the same command and use the private key from the Windows machine for the operator secret. I didn't investigate this further.
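
The sshd_config change described in point 2 could be sketched like this. The thread does not say which path was used, so `KEYS_PATH` is a hypothetical location outside any user profile. The helper rewrites the first `AuthorizedKeysFile` line it finds (commented out or not) and, if there is none, prepends one so the directive stays in the global section rather than landing inside a trailing `Match` block. A `Match` block that sets its own `AuthorizedKeysFile` would still take precedence for the users it matches, so check for one.

```python
import re

# Hypothetical location outside any user profile; the thread does not say
# which path was actually used.
KEYS_PATH = "C:/ProgramData/ssh/authorized_keys"

# Matches one AuthorizedKeysFile line, commented out or not.
_DIRECTIVE = re.compile(
    r"^[ \t]*#?[ \t]*AuthorizedKeysFile\b[^\r\n]*", re.MULTILINE
)


def set_authorized_keys_file(config_text, keys_path=KEYS_PATH):
    """Return config_text with AuthorizedKeysFile pointing at keys_path."""
    line = f"AuthorizedKeysFile {keys_path}"
    if _DIRECTIVE.search(config_text):
        # Replace only the first occurrence; a callable avoids any
        # backslash interpretation in the replacement text.
        return _DIRECTIVE.sub(lambda _match: line, config_text, count=1)
    # No directive present: put it at the top, before any Match block.
    return line + "\n" + config_text
```

The sshd service has to be restarted after the file is changed for the new location to take effect.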

@aravindhp
Contributor

@jrvaldes please address @philipp1992's questions regarding the golden image creation process.

@jrvaldes
Contributor

jrvaldes commented Aug 11, 2021

Dear @philipp1992, thanks for sharing your notes; there is an ongoing effort to correct the documentation around the golden image creation process, please refer to c8ff886 and 48fbd2e.

> 1: Install-Module -Force OpenSSHUtils -> this module is deprecated and can't be installed as described. Also, it's not needed anymore.

Corrected in the Set up SSH section. The OpenShift documentation will be updated soon.

> 2: The way the SSH authorized_keys file is supposed to be placed in the Administrator's home directory does not work at all, because the user's home directory is wiped by sysprep. The docs mention that this should be prevented by the provided unattend.xml, but there is no such mechanism in the XML. So we had to put the authorized_keys file outside of the home directory and modify the sshd_config accordingly.
> This behaviour was also reported months ago in this issue: #484

Corrected in the Deploying the public key section. The OpenShift documentation will be updated soon.

> 3: Another thing that bothers me is that the provided unattend.xml enables auto-logon. This means anyone with console access to the VM can access the operating system without logging in first. Useful for troubleshooting, but not so nice security-wise. Can you please explain whether this is really needed?

That's a good question, and yes, it's a security risk to leave an Administrator's terminal open. I need to run a few more tests to identify whether it is "really needed". However, to mitigate this, you can tune the LogonCount value, which specifies the number of times that you can log on to the computer by using AutoLogon. Be aware of the LogonCount known issue.
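
As an illustration of tuning that value, here is a small sketch that sets `LogonCount` inside every `AutoLogon` element of an answer file. The `SAMPLE` document is a simplified, hypothetical fragment (a real unattend.xml component carries more attributes and settings); only the element names and the `urn:schemas-microsoft-com:unattend` namespace are taken from the answer-file format.

```python
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:unattend"
# Keep the unattend namespace as the default one when serializing.
ET.register_namespace("", NS)

# Simplified, hypothetical fragment for illustration only.
SAMPLE = f"""<unattend xmlns="{NS}">
  <settings pass="oobeSystem">
    <component name="Microsoft-Windows-Shell-Setup">
      <AutoLogon>
        <Enabled>true</Enabled>
        <LogonCount>5</LogonCount>
        <Username>Administrator</Username>
      </AutoLogon>
    </component>
  </settings>
</unattend>"""


def set_logon_count(unattend_xml, count):
    """Return the answer file with LogonCount set in every AutoLogon element."""
    root = ET.fromstring(unattend_xml)
    for auto_logon in root.iter(f"{{{NS}}}AutoLogon"):
        node = auto_logon.find(f"{{{NS}}}LogonCount")
        if node is None:
            # Add the element when the answer file omits it.
            node = ET.SubElement(auto_logon, f"{{{NS}}}LogonCount")
        node.text = str(count)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_logon_count(SAMPLE, 1))
```

Because of the known issue mentioned above, verify the effective number of automatic logons on a test VM rather than relying on the configured number alone.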

> 4: Also, the SSH key pair didn't work when I created it on Linux with ssh-keygen, so I had to create it on the Windows machine with the same command and use the private key from the Windows machine for the operator secret. I didn't investigate this further.

Glad you found a workaround for this issue. Can you provide a set of instructions to replicate it? It looks like this needs further investigation in a separate issue, since it is not related to the golden image creation process.

@jrvaldes
Contributor

@jomeier, @philipp1992

> 3: Another thing that bothers me is that the provided unattend.xml enables auto-logon. This means anyone with console access to the VM can access the operating system without logging in first. Useful for troubleshooting, but not so nice security-wise. Can you please explain whether this is really needed?

As a follow-up, see 90d680d. You're right, the AutoLogon feature is not needed. Thanks for calling it out.

jrvaldes added a commit to jrvaldes/windows-machine-config-operator that referenced this issue Aug 16, 2021
Include instructions to pre-pull the Pause Image into the golden
image during the creation process, to support disconnected network
environments.

Follow-up for openshift#579
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 10, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Jan 9, 2022
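
The timeline above follows from the bot's stated intervals. As a small worked check, assuming the bot acts exactly when each 30-day window elapses (the constant and function names are illustrative, not part of the bot):

```python
from datetime import date, timedelta

# Intervals as stated in the bot comments above.
ROT_AFTER = timedelta(days=30)    # stale -> rotten
CLOSE_AFTER = timedelta(days=30)  # rotten -> closed


def lifecycle_dates(stale_on):
    """Return (rotten_on, closed_on) for an issue marked stale on stale_on."""
    rotten_on = stale_on + ROT_AFTER
    closed_on = rotten_on + CLOSE_AFTER
    return rotten_on, closed_on


if __name__ == "__main__":
    rotten, closed = lifecycle_dates(date(2021, 11, 10))
    print(rotten.isoformat(), closed.isoformat())
```

Starting from the stale label added on Nov 10, 2021, this yields the rotten and close dates recorded in this thread.
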
@openshift-ci
Contributor

openshift-ci bot commented Jan 9, 2022

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

6 participants