nginx fails after downgrade from 2.7 to 2.6 #3564

Closed
joseph-reynolds opened this issue Jun 24, 2019 · 9 comments

@joseph-reynolds

The nginx service fails to start on 2.6 and earlier systems in the following scenario:

  1. Install a firmware image based on OpenBMC 2.7.
  2. Perform a code update to install a firmware image based on OpenBMC 2.6 or earlier. (Terminology: downgrade = code update to an older release.)
  3. Boot the BMC.

Note that OpenBMC 2.6 and earlier used the nginx web server; OpenBMC 2.7 is the first release that used the BMCWeb web server.

The nginx service attempts to create certificates in the /etc/ssl/certs/nginx directory, but only /etc/ssl/certs exists, so the service fails, and the web server is not available.

The workaround and recovery is to ssh to the BMC, create the directory (for example, via mkdir -p /etc/ssl/certs/nginx), and start nginx (systemctl start nginx). This only needs to be performed once at the time of downgrade, and it can be performed either before or after the downgrade.
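The one-time workaround can be captured as a small shell function. This is a sketch: the function name ensure_nginx_cert_dir and its optional root-prefix argument are hypothetical, added only so the step can be tried outside a BMC.

```shell
#!/bin/sh
# Sketch of the one-time downgrade workaround: recreate the directory nginx
# expects. The optional first argument is a hypothetical root prefix for
# trying this outside a BMC; on the BMC itself it is left empty, so the
# directory created is /etc/ssl/certs/nginx.
ensure_nginx_cert_dir() {
    root="${1:-}"
    # mkdir -p creates missing parents and succeeds if the directory already
    # exists, so repeating the workaround is harmless.
    mkdir -p "${root}/etc/ssl/certs/nginx"
}
```

On the BMC, the function would be called with no argument and followed by starting the service, for example: ensure_nginx_cert_dir && systemctl start nginx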

@joseph-reynolds
Author

The underlying problem seems to be that the /etc/ssl/certs/nginx directory is only installed when the 2.6 image is installed from scratch (not via code update).

Two approaches to fix the problem are:

  1. Have the 2.6 and earlier systems create the directory before attempting to use it.
  2. Have the 2.7 and later systems create the directory in case the system is downgraded to 2.6 or earlier.

@mdmillerii
Contributor

This seems to be related to our use of overlayfs over the /etc directory.
Normally, however, this doesn't come up, because directories are left transparent (i.e., unmodified entries in the lower layer are passed up), unless something like the whole certs directory was removed and re-added. Unfortunately, the distinguishing marker is a hidden extended attribute (xattr) on the directory in the copy-on-write (upper) layer.
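That marker can be inspected from the shell. A minimal sketch, assuming the getfattr tool from the attr package is installed on the BMC and the command runs as root (trusted.* attributes are not readable otherwise); overlayfs records an opaque directory with the trusted.overlay.opaque attribute set to y. The function name is_opaque is hypothetical.

```shell
#!/bin/sh
# Return success if the given directory (in an overlayfs upper layer) is
# marked opaque, i.e. it hides the matching directory in the lower layer.
# Assumes getfattr (attr package) is available and we are running as root;
# if the attribute is absent or unreadable, the directory is reported as
# not opaque.
is_opaque() {
    val=$(getfattr --absolute-names --only-values \
        -n trusted.overlay.opaque "$1" 2>/dev/null)
    [ "$val" = "y" ]
}
```

For example, is_opaque /var/persist/etc/ssl/certs/nginx && echo opaque would show whether that directory in the persistent layer hides its lower-layer counterpart.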

@joseph-reynolds were you able to recreate this behavior on your own downgrade, or could there be hidden state here?

Another approach would be to factory-reset and/or reset the read-write filesystem (rwfs) on systems that use the initramfs.

@joseph-reynolds
Author

I re-created this issue by installing an image based on the OpenBMC 2.7 development branch, then using code update to downgrade to the older release (exactly the scenario above). I tested the workaround of creating the directory after the downgrade, and it worked. I don't fully understand everything that's going on.

@joseph-reynolds
Author

The direction I am getting is to fix this in the 2.7 release. That is, have the 2.7 firmware create the /etc/ssl/certs/nginx directory (which is needed in case the system is downgraded to 2.6). I think the right way to do that is a new service, something like "nginx-prep-downgrade", which will create the directory. This way, the service can be cleanly deleted when it is no longer needed, for example, when a direct downgrade to 2.6 is no longer supported.

I believe the right place to do this is https://github.com/openbmc/meta-ibm, in a new recipe at meta-ibm/meta-ibm/recipes-httpd/nginx/nginx-prep-downgrade.bb
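A hypothetical sketch of such a service follows, written as a shell function that generates the unit file. Only the name nginx-prep-downgrade comes from the proposal; the unit contents, the /bin/mkdir path, and the multi-user.target hook are assumptions, not the contents of any submitted change. The idea relies on /etc being an overlay with a persistent writable layer (the /proc/mounts output in this thread lists /var/persist/etc as one of its layers), so a directory created under 2.7 should still be present after a downgrade.

```shell
#!/bin/sh
# Hypothetical generator for the proposed nginx-prep-downgrade unit.
# Writes a oneshot systemd service into the given directory; at boot the
# service creates /etc/ssl/certs/nginx so that an nginx-based (2.6 or
# earlier) image finds it after a downgrade.
write_prep_unit() {
    outdir="$1"
    mkdir -p "$outdir"
    # Quoted heredoc delimiter: the unit text is written literally.
    cat > "$outdir/nginx-prep-downgrade.service" <<'EOF'
[Unit]
Description=Create /etc/ssl/certs/nginx in case of downgrade to an nginx-based release

[Service]
Type=oneshot
ExecStart=/bin/mkdir -p /etc/ssl/certs/nginx

[Install]
WantedBy=multi-user.target
EOF
}
```

In a Yocto recipe, the unit would be listed in the systemd class's SYSTEMD_SERVICE variable so that it is enabled at install time; deleting the recipe later removes the service cleanly, as intended above.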

joseph-reynolds referenced this issue in openbmc/meta-ibm Jun 25, 2019
Nginx on OpenBMC has a number of issues that matter to openbmc.

1. It increases the binary size.  This is an issue given that OpenBMC
targets a relatively minimal flash footprint.
2. It increases the runtime overhead.  Running nginx as a reverse proxy
to the application servers causes a runtime overhead, and context switch
for every single page load, as well as an extra socket.
3. nginx doesn't implement any kind of authentication, so auth needs to
be implemented in every application server.  This removes a lot of the
advantages of the reverse proxy, and duplicates a lot of code amongst
multiple application servers
4. A number of nginx parameters run from the nginx config file.  Some of
these parameters (like cipher suite support) are desired to be changed
at runtime, rather than fixed at compile time.

Related to commit here to move system to bmcweb:
https://gerrit.openbmc-project.xyz/#/c/openbmc/meta-phosphor/+/12933/

Change-Id: I988fce8dae565808bd0eeacd8b7a71f3cc06d98f
Signed-off-by: Ed Tanous <ed.tanous@intel.com>
(cherry picked from commit 699e296)
@anoo1
Contributor

anoo1 commented Jun 26, 2019

It seems the nginx directory is marked as opaque for some reason and only the upper dir of the overlay is shown. If the overlay is mounted read-only, the nginx dir shows up:

root@witherspoon:~# ls -l /var/persist/etc/ssl/certs/
-rw-r--r--    1 root     root          2879 Jun 22 09:56 Root-CA.pem
drwxr-xr-x    2 root     root           376 Jun 25 14:04 https
drwxr-xr-x    2 root     root           232 Jun 26 17:30 nginx

root@witherspoon:~# cat /proc/mounts
/dev/root / squashfs ro,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=197864k,nr_inodes=49466,mode=755 0 0
ubi0:rwfs /var ubifs rw,relatime,assert=read-only,ubi=0,vol=2 0 0
overlay /etc overlay ro,relatime,lowerdir=/etc:/var/persist/etc 0 0

It'd be interesting to check whether non-witherspoon systems using the overlay from the initramfs have the same issue.
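To make that comparison, a small helper can pull the /etc overlay options out of /proc/mounts on each system. The function names overlay_options and overlay_layers are hypothetical; they rely only on the standard /proc/mounts field layout (device, mount point, type, options, dump, pass) shown above.

```shell
#!/bin/sh
# Print the mount options of an overlay mounted at the given mount point.
# The second argument defaults to /proc/mounts; a saved copy from another
# BMC can be passed instead.
overlay_options() {
    mnt="$1"
    mounts="${2:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the type, field 4 the options.
    awk -v mnt="$mnt" '$2 == mnt && $3 == "overlay" { print $4 }' "$mounts"
}

# Print only the layer directories (lowerdir, upperdir, workdir), one per line.
overlay_layers() {
    overlay_options "$@" | tr ',' '\n' | grep -E '^(lowerdir|upperdir|workdir)='
}
```

On the witherspoon system above, overlay_options /etc would print ro,relatime,lowerdir=/etc:/var/persist/etc, and overlay_layers /etc would print lowerdir=/etc:/var/persist/etc.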

anoo1 self-assigned this Jun 27, 2019
@joseph-reynolds
Author

Pushed a mitigation for this: https://gerrit.openbmc-project.xyz/c/openbmc/meta-ibm/+/23203

@gtmills
Member

gtmills commented Nov 16, 2019

@joseph-reynolds That commit was abandoned. Should we leave this issue open in case someone else hits it, and close it after some time has passed? Or close it now?

@stale

stale bot commented May 17, 2020

This issue has been automatically marked as stale because no activity has occurred in the last 6 months. It will be closed if no activity occurs in the next 30 days. If this issue should not be closed please add a comment. Thank you for your understanding and contributions.

stale bot added the stale label May 17, 2020
@stale

stale bot commented Jun 16, 2020

This issue has been closed because no activity has occurred in the last 7 months. Please reopen if this issue should not have been closed. Thank you for your contributions.

stale bot closed this as completed Jun 16, 2020