Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Startup Times via Threading #4951

Merged
merged 6 commits into from Jun 9, 2021

Conversation

Web-eWorks
Copy link
Member

@Web-eWorks Web-eWorks commented Sep 2, 2020

Absolutely not finished, this is a horrendous hack that just so happens to work perfectly on my end.

  • I've added profiling to worker threads started by the job system, so now they show up in the profiler output and we can see just how long they stay idle.

  • I've also made Sound::Init() do it's sound and music loading in a background thread using the Job system, which has a few ramifications: FileSystem will need to be reviewed to ensure it's thread-safe, and if we want to play music at the main menu (or even earlier) then we need some way for the main thread to block until a certain group of jobs are finished (which sounds like a task for JobSet to me!).

At the moment there's absolutely no synchronization here, so if the background thread were to theoretically take an entire minute to finish then the game would start with no or almost no music available to MusicPlayer. Loading the music should be thread-safe, but the game would be rather silent until the player started a new game.

I'd love to get @fluffyfreak's feedback on main-thread synchronization in particular - the main thread needs to keep calling FinishJobs() due to the way the job system is set up, but ideally there's a way for it to do that but also yield most of it's time if it's still blocked on async jobs.

EDIT: for the interested, this shaves off ~1000mcycles (or around 1/5th) from our warm startup times in a release configuration. I'd prefer to not load the entire 160-file music and sound effect library on startup and instead stream in the music as needed, but this is a step in the right direction!

@fluffyfreak
Copy link
Contributor

Wow that's brave! I tried threading this mess once before but I went too far and tried to do the graphics loading too, leading too it's failure.

If you wanted to you could sent a flag to wait for the sounds to be loaded before the loading can "complete" and then progress to the main menu? Otherwise I'll check it out asap

@Web-eWorks
Copy link
Member Author

Yeah, I'm thinking I'll add some sort of semaphore to the AsyncJobQueue to allow the main thread to block until the next job is finished - combined with a new method in JobSet to basically call FinishJobs in a loop until the jobset is empty, I can just attach a JobSet to the loader hierarchy and block on it as the last step in the load process.

Please feel free to skim the PR - I mostly need your expertise on an efficient way for the main thread to yield until the next job has completed. Once I have that synchronization finished, I'd appreciate a full review.

@Web-eWorks
Copy link
Member Author

Wound up going with an entirely different - albeit still flexible - method of handling async tasks by simply exposing JobSets for the loader and looping until they are empty before proceeding to the next step and the main menu. The only issue I can think of is that this might cause resource contention issues with vsync disabled and a small number of cores (< 3), so I might revisit it and add an SDL delay call during the idle loop.

@Web-eWorks Web-eWorks marked this pull request as ready for review May 26, 2021 19:27
FileSystem appears to be threadsafe, but could probably use some more
investigation to ensure that we're not inadvertently causing a data race.
Add async overall and per-step JobSet queues to Pi::App for startup code.
Load steps now wait for async jobs to complete before progressing.
The loading screen now waits for all async jobs to transition to menu.
This takes the FaceParts init step (with prewarmed fs cache) to ~220ms from 440ms.

Run FacePart initialization on the background queue (not needed by further steps).
Remove further unnecessary profiling points.
This cleans up ~1/4th of the profile events during star system cache generation.
Instead of finishing one job per thread per frame, now copy job pointers to an on-thread array and run their finalizers in one pass.

This means we can queue potentially 1000s of jobs in a single frame / load step without then needing to finalize them over the next 250 frames.

Potentially investigate an upper limit on the number of finalized jobs; e.g. 100/thread/frame to avoid frame hitches.
@Web-eWorks
Copy link
Member Author

Overall, after moving sound and FaceParts loading to background threads, warm startup times dropped from 2.3s to 1.4s, which is a 40% speedup for development iteration! Cold start times (after clearing the vm cache) stayed about the same for me (~12s) which is unsurprising because Pioneer is throughput bound by dint of residing on my hard drive.

I think this PR is about ready for final review and merge; there's nothing horribly special going on here other than doing some loading in background jobs, and I think I've avoided any synchronization issues. I have some deeper refactoring I want to do to the job system (to allow composable jobs and more granular job waiting) but that's a matter for a different PR, else I'll never get this one landed.

@impaktor
Copy link
Member

I think this PR is about ready for final review and merge

Ping @fluffyfreak

Copy link
Contributor

@fluffyfreak fluffyfreak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow loading does seem much faster! (I have an SSD)

@fluffyfreak
Copy link
Contributor

Sorry the review took so long.
I've tested it out, read through the code, all of it seems good. No problems with anything 👍

@Web-eWorks
Copy link
Member Author

Sorry the review took so long.
I've tested it out, read through the code, all of it seems good. No problems with anything +1

No worries, I've been doing some cleanup and dev on the renderer rewrite branch!
Thanks for reviewing, I tried to make this one as bullet-proof as possible!

@Web-eWorks Web-eWorks merged commit 3f84018 into pioneerspacesim:master Jun 9, 2021
@Web-eWorks Web-eWorks deleted the threaded-improvements branch August 13, 2021 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants