Reports of updater hanging without observable activity #668

Closed
eloquence opened this issue Mar 18, 2021 · 5 comments · Fixed by #686

@eloquence
Member

eloquence commented Mar 18, 2021

We've received several reports of the preflight updater hanging and not recovering. I've personally seen it happen with the following STR:

  1. Uninstall
  2. Install 0.5.3 prod RPM and ensure config is set to prod
  3. Run sdw-admin --apply
  4. Run preflight updater (preferably in terminal so you see full output)

Expected behavior

Updater will trigger full migration due to postinst logic added in 6cf625c, but will eventually run to completion

Actual behavior

Updater gets stuck at 35% with no observable activity. The issue is resolved on reboot, which is expected because /tmp will be empty and the migration will not be applied.

@eloquence
Member Author

As part of investigating this issue, one immediate improvement we could make to the updater is to ensure that all output (not just the launcher's own log file) is written to disk for easier debugging.
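A minimal sketch of what "write all output to disk" could look like, assuming the updater shells out to its commands via `subprocess`. The helper name `run_and_log` and the log path are hypothetical and not taken from the launcher code:

```python
import subprocess
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd and append its combined stdout/stderr to log_path.

    Hypothetical helper for illustration; returns the child's exit code.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("ab") as log:
        # Record which command produced the output that follows.
        log.write(("$ " + " ".join(cmd) + "\n").encode())
        log.flush()
        # Merge stderr into stdout so the file preserves message ordering.
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

With something like this, a failed task such as an `update-xfce-settings` error would end up in the same on-disk log as the rest of the run, instead of only being visible in a terminal.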

@emkll
Contributor

emkll commented Mar 23, 2021

I have observed this issue locally while testing #666: the updater tries to run a full apply run, but hangs at 35% after applying dom0 state. A reboot did resolve it. I also observed that a task failed while initially applying dom0 state, prior to running the full provisioning run:

Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]: ----------
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:           ID: dom0-adjust-desktop-icon-size-xfce
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:     Function: cmd.script
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:         Name: salt://update-xfce-settings
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:       Result: False
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:      Comment: Command 'salt://update-xfce-settings' run
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:      Started: 11:23:14.221777
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:     Duration: 118.96 ms
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:      Changes:
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:               ----------
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:               pid:
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:                   16356
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:               retcode:
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:                   1
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:               stderr:
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:                   Failed to init libxfconf: Failed to connect to socket /run/user/0/bus: Permission denied.
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:               stdout:
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]:                   update-xfce-settings: Adjusting icon size for user m to 64 px
Mar 23 11:23:32 dom0 org.xfce.FileManager[5810]: ----------

Did you also observe this error? If so, we should:

  1. Investigate the underlying issue with the XFCE settings that are not applied correctly
  2. Make the salt commands more resilient to failures (and consider exiting before applying dom0 state if it fails in https://github.com/freedomofpress/securedrop-workstation/blob/main/launcher/sdw_updater_gui/UpdaterApp.py#L190)
  3. Better surface errors to the logs. The error logs above are already available in dom0 through journalctl.
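As a rough illustration of point 2, here is a sketch of running the provisioning steps in order and stopping at the first one that fails or exceeds a timeout, rather than continuing (or hanging) silently. It assumes each step signals failure through a non-zero exit code; `run_steps` and its return shape are hypothetical and not part of `UpdaterApp.py`:

```python
import subprocess


def run_steps(steps, timeout_s):
    """Run each (name, cmd) step in order; stop at the first failure or timeout.

    Hypothetical helper. Returns (ok, failed_step_name, detail).
    """
    for name, cmd in steps:
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                timeout=timeout_s,  # the child is killed if it exceeds this
            )
        except subprocess.TimeoutExpired:
            return False, name, "timed out after {}s".format(timeout_s)
        if proc.returncode != 0:
            # Surface the step's own output so it can be logged or shown.
            return False, name, proc.stdout.decode(errors="replace").strip()
    return True, None, ""
```

The caller could then show the failed step name and detail in the GUI instead of leaving the progress bar stuck. Whether the Salt invocations actually return a non-zero exit code when a single state fails would need to be checked separately.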

@emkll emkll added this to Near Term - SD Workstation in SecureDrop Team Board Mar 23, 2021
@eloquence eloquence moved this from Near Term - SD Workstation to Next sprint candidates in SecureDrop Team Board Mar 23, 2021
@eloquence eloquence self-assigned this Mar 24, 2021
@eloquence eloquence moved this from Next sprint candidates to Sprint #68 (3/24-4/7) in SecureDrop Team Board Mar 24, 2021
@eloquence
Member Author

eloquence commented Mar 31, 2021

> Investigate the underlying issue with the XFCE settings that are not applied correctly

Some initial observations:

  1. I've so far been unable to get a repro of the specific error you observed by forcing an updater migration run. To test, I created a file called /tmp/sdw-migrations/testing, and then re-ran the updater, which indeed did force a migration. The update-xfce-settings actions completed successfully; the updater did not halt.

  2. I've also been unable to reproduce the error by manually applying the dom0 Salt states.

  3. The output you pasted suggests that the script is correctly run as the GUI user (Adjusting icon size for user m to 64 px - the script reads that username m from $USER). However, DBUS_SESSION_BUS_ADDRESS apparently points to a DBUS address for root (user 0). See this logic:

https://github.com/freedomofpress/securedrop-workstation/blob/main/dom0/update-xfce-settings#L26-L28

Either id -u sometimes evaluates to 0 (even though we invoke the script with Salt's runas and $USER evaluates correctly), or DBUS_SESSION_BUS_ADDRESS already points to the root user's DBUS session. Either way, that incorrect value is sufficient to cause the error, as the script does not have permission to access root's socket (/run/user/0/bus) when run as the GUI user.
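To make the mismatch concrete, here is a small sketch of a consistency check between a username and `DBUS_SESSION_BUS_ADDRESS`. It assumes the conventional `/run/user/<uid>/bus` socket layout seen in the error message above. The function names are hypothetical, and the actual script is shell, so this only illustrates the logic:

```python
import pwd


def session_bus_address(uid):
    """Build the /run/user/<uid>/bus session bus address for a numeric uid."""
    return "unix:path=/run/user/{}/bus".format(uid)


def check_bus_address(username, env):
    """Return a description of the mismatch, or None if nothing looks wrong.

    The uid is looked up from the username rather than from the current
    process, so a stale or inherited environment value is detected.
    An unset variable is not treated as a mismatch. Exact string equality
    is a simplifying assumption (real addresses may carry extra fields).
    """
    expected = session_bus_address(pwd.getpwnam(username).pw_uid)
    actual = env.get("DBUS_SESSION_BUS_ADDRESS")
    if actual is not None and actual != expected:
        return "expected {!r}, got {!r}".format(expected, actual)
    return None
```

For example, `check_bus_address("root", {"DBUS_SESSION_BUS_ADDRESS": "unix:path=/run/user/1000/bus"})` returns a message naming both addresses. In the failure above, the equivalent check for user m would have flagged the `/run/user/0/bus` value.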

Before modifying the script, I'd like to understand why it's (sometimes?) attempting to access the wrong DBUS session.

I will add some debugging lines and see if I can reproduce the error you observed:

  • During a clean 0.5.3 install
  • During a clean install from latest main
  • In forced migrations (possibly sporadically)

Please let me know if you have other pointers or debugging suggestions. :)

@eloquence
Member Author

eloquence commented Apr 1, 2021

> During a clean 0.5.3 install

So far no dice on a repro. Here's what I did:

  • Uninstall
  • Install 0.5.3 prod RPM (which drops the postinst flag forcing a migration)
  • Run sdw-admin --apply with env set to staging
  • Add debugging code to update-xfce-settings
  • Downgrade a package in dom0
  • Rerun updater

Saw a successful sdw-admin --apply run triggered by the updater, and the DBUS_SESSION_BUS_ADDRESS variable was set to the expected value. Reboot prompt also worked as expected, and I did not see #663 after the reboot. Will continue to poke.

@eloquence
Member Author

I still haven't been able to reproduce the state reported by @emkll in #668 (comment) but #684 should make it impossible to get into that specific failure mode and should have no negative side effects. Next up: more graceful error handling for the updater itself.
