system-helper: set IO class to idle #2071

Closed
wants to merge 1 commit into from


@ramcq
Contributor

commented Sep 3, 2018

Our benchmarks show this significantly reduces the interactivity impact of
ongoing Flatpak operations while the user is continuing other tasks on the
system. The effect is very pronounced with the default CFQ scheduler, and in
combination with BFQ, using the idle class improves the worst case to nearly
the same as an unloaded system.
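For readers unfamiliar with IO classes, here is a minimal sketch of how a process can put itself into the idle class on Linux. It is not the code from this PR. The constants follow the kernel's `linux/ioprio.h`; the helper names `ioprio_value` and `set_own_io_class` are hypothetical, and the syscall-number table is an assumption that covers only x86_64 and aarch64.

```python
import ctypes
import os
import platform

# Constants from the Linux ioprio interface (linux/ioprio.h).
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_BE = 2    # best-effort, the default class
IOPRIO_CLASS_IDLE = 3  # serviced only when nothing else wants the disk
IOPRIO_WHO_PROCESS = 1

# ioprio_set syscall numbers. Assumption: only these two architectures
# are covered; other machines raise NotImplementedError below.
_SYS_IOPRIO_SET = {"x86_64": 251, "aarch64": 30}


def ioprio_value(io_class: int, data: int = 0) -> int:
    """Pack a class and a per-class priority level into one ioprio value."""
    return (io_class << IOPRIO_CLASS_SHIFT) | data


def set_own_io_class(io_class: int, data: int = 0) -> None:
    """Set the IO class of the calling thread (hypothetical helper)."""
    nr = _SYS_IOPRIO_SET.get(platform.machine())
    if nr is None:
        raise NotImplementedError("unknown ioprio_set syscall number")
    libc = ctypes.CDLL(None, use_errno=True)
    # who=0 means "the calling thread" for IOPRIO_WHO_PROCESS.
    rc = libc.syscall(
        ctypes.c_long(nr),
        ctypes.c_int(IOPRIO_WHO_PROCESS),
        ctypes.c_int(0),
        ctypes.c_int(ioprio_value(io_class, data)),
    )
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling `set_own_io_class(IOPRIO_CLASS_IDLE)` early in a long-running helper has roughly the effect that `ionice -c3` arranges for the commands used in the benchmark below.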

Thanks to @wjt for the benchmarking. Copy/pasting his results:

I ran some benchmarks to time all combinations of:

  • Mission (HDD root device), Yoga (SSD root device)
  • BFQ, CFQ (both with default settings, i.e. BFQ is in low-latency mode)
  • the following five background jobs:
    • nothing
    • sudo -i sh -c 'while true; do flatpak install org.gnome.Lollipop.flatpak; flatpak uninstall app/org.gnome.Lollipop//stable; done'
    • sudo -i iogenic org.gnome.Lollipop.flatpak (an implementation of that same loop, using the Flatpak API rather than repeatedly spawning the flatpak CLI tool)
    • sudo -i ionice -c3 sh -c ...
    • sudo -i ionice -c3 iogenic ...
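
The test matrix described above can be enumerated with a short sketch. The labels are shorthand invented here for the list items and are not names used in the benchmark scripts.

```python
from itertools import product

# Shorthand labels for the three dimensions listed above.
machines = ["Mission (HDD)", "Yoga (SSD)"]
schedulers = ["BFQ", "CFQ"]
background_jobs = [
    "nothing",
    "sh loop",
    "iogenic",
    "ionice -c3 sh loop",
    "ionice -c3 iogenic",
]

# Every (machine, scheduler, background job) combination to be timed.
combinations = list(product(machines, schedulers, background_jobs))

for machine, scheduler, job in combinations:
    print(f"{machine:14} {scheduler:4} {job}")
```

That gives 2 × 2 × 5 = 20 combinations in total.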

The benchmark is to launch LibreOffice 15 times on Yoga and 30 times on Mission (because the outliers there were … much more outlying!). I ran the trials and collected data with:

multitime -r 'echo 3 | sudo tee /proc/sys/vm/drop_caches' -n 15 flatpak run org.libreoffice.LibreOffice --terminate_after_init
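
As a rough illustration of what that command does, the sketch below runs a preparation command before each trial and records the wall-clock time of each run. It is not multitime itself (which also reports user and system time); the function name `time_runs` and its parameters are hypothetical.

```python
import subprocess
import time


def time_runs(cmd, runs, before_each=None):
    """Run `cmd` `runs` times and return the wall-clock duration of each run.

    `before_each` is an optional shell command string executed before every
    run, outside the timed region (for example, to drop the page cache).
    """
    durations = []
    for _ in range(runs):
        if before_each is not None:
            subprocess.run(before_each, shell=True, check=True)
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return durations


# Example call mirroring the command above (not executed here):
# time_runs(
#     ["flatpak", "run", "org.libreoffice.LibreOffice", "--terminate_after_init"],
#     runs=15,
#     before_each="echo 3 | sudo tee /proc/sys/vm/drop_caches",
# )
```

Dropping the caches before each run matters because otherwise every launch after the first would be served largely from memory and would not exercise the IO scheduler.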

https://github.com/wjt/ionogenic/blob/master/results/Plot.ipynb has the mean/min/max/median/std-dev for each combination of the above (there's 1 missing, sorry) and plots of it sliced in various ways. I think this plot below is the clearest. The coloured bars show the median launch time for LibreOffice, and the error bars show the min and max launch time for the trials:

[Plot: median LibreOffice launch time for each combination, with error bars showing the min and max]
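
For reference, the per-combination statistics and the error-bar extents described above can be computed as in this sketch. The function names are hypothetical, and using the sample standard deviation is an assumption; the notebook may compute it differently.

```python
import statistics


def summarize(times):
    """Summary statistics for one combination's launch times (in seconds)."""
    return {
        "mean": statistics.mean(times),
        "min": min(times),
        "max": max(times),
        "median": statistics.median(times),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std-dev": statistics.stdev(times),
    }


def error_bars(times):
    """Asymmetric error-bar lengths around the median: (down to min, up to max)."""
    median = statistics.median(times)
    return median - min(times), max(times) - median
```

Plotting the median with min/max bars, rather than the mean with the standard deviation, keeps a single slow outlier visible without letting it move the bar itself.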

I had a theory that the "while true; do flatpak install; flatpak uninstall; done" loop is mis-detected by BFQ as a series of "soft real-time" processes being started and then dying off, which is why I wrote the long-lived process implementation. I don't think this data supports my theory.

Rob has a theory that because fsyncs are global barriers on the filesystem, the scheduler can't do much about this, and my guess is that the big outliers pushing up the max in the loaded cases may be caused by that. I guess we could test this by sticking eatmydata around the install/uninstall loop.

Anyway, my conclusion is:

  • On a slow spinning disk, changing the scheduler and setting idle IO priority on Flatpak jobs will both independently help.
  • On SSD, BFQ makes the worst case worse if we don't set idle IO priority (I guess it is still assigning too high a weight to the flatpak job), but if we do, it's a wash. We also saw above that under sustained IO workloads BFQ seems to perform better.
@alexlarsson

Member

commented Sep 4, 2018

@rh-atomic-bot

Collaborator

commented Sep 4, 2018

📌 Commit b4f0e77 has been approved by alexlarsson

@rh-atomic-bot

Collaborator

commented Sep 4, 2018

⌛️ Testing commit b4f0e77 with merge 060322b...

@rh-atomic-bot

Collaborator

commented Sep 4, 2018

☀️ Test successful - status-papr
Approved by: alexlarsson
Pushing 060322b to master...

@hadess

Contributor

commented Nov 17, 2018

Would there be a way for the install process itself to say whether it's interactive or not?

I could see gnome-software wanting to change the class of the I/O at runtime, for example, starting with the non-idle class when somebody clicks on the "Install" button in the UI, and switching it to idle when the app goes unfocused, or the user switches to another page.

If that's implementable, I'll file a separate issue.

@ramcq

Contributor Author

commented Nov 17, 2018

@hadess It would indeed be possible: per the code in GNOME Software, a process is at liberty to demote its own IO priority should it wish. We thought about this in Endless and weren't sure that the complexity of propagating interactivity through the ops in G-S, tracking the idle / normal threads in the pool (or having two pools), then tracking the same across D-Bus etc. was worthwhile. Unless your system is already under IO load from a competing background process, setting the app installs/upgrades to idle priority isn't going to be that big a deal. Flatpak / OSTree operations are so catastrophically slow (because sync() is such a blunt instrument) that I didn't expect anyone to perform them interactively anyway. :P

There is much more low-hanging fruit to speed up the process, e.g. halve the amount of IO we do (see ostreedev/ostree#1723), bunch transactions together across the system helper API so that triggers are only run once at the end of a batch of operations, etc. (Not sure if GNOME Software does or doesn't batch the ops together in one transaction in an upgrade run, but that is lost when the transaction meets the system helper woodchipper...)

@hadess

Contributor

commented Nov 17, 2018

To put it another way: if nothing else is going on on the machine, what's the difference in speed between an idle install and a non-idle one? As in, are users going to say that it's slow, rather than say that it locks the whole machine up ;)
