Servo TSC Meeting May 2024 #88

Closed
mrego opened this issue May 15, 2024 · 13 comments
Labels
tsc-meeting TSC Meeting

Comments

@mrego
Member

mrego commented May 15, 2024

Agenda

  • Status update
  • Crowdfunding
    • Proposals/ideas to use the money
  • ReadableStream
  • Vello integration update
  • Crate ownership discussion
  • Criteria for publishing stylo
  • Changelogs
  • Speedometer performance test suite
  • Outreach
  • AOB
@mrego mrego added the tsc-meeting TSC Meeting label May 15, 2024
@mrego
Member Author

mrego commented May 15, 2024

As usual feel free to propose topics for the agenda in the comments here.

@wusyong
Member

wusyong commented May 24, 2024

Can we add one for ReadableStream? The next major mozjs bump is coming, and I think this is becoming a hot topic again. I would like to sort out the steps required to push it forward.

And if there's time, I would like to mention my recent side project Verso in AOB. It doesn't have much in it yet, but I want to clarify what it is in case anyone is confused.

@gterzian
Member

Please add the following topics:

  • Vello integration update: discussion to try to achieve consensus on continuing to explore it (not a technical discussion about any integration itself).
  • Follow-up on crate ownership discussion: proposal to add a table in the Servo book with the main committers per crate (to be filled in over time).
  • Follow-up on running the Speedometer performance test suite with Servo (potentially as an Outreachy project?)

@delan
Member

delan commented May 28, 2024

  • Crowdfunding
    • Proposals/ideas to use the money

I propose that we spend 184.21 EUR/month (~201 USD/month) on a dedicated server for CI runners, to cut our build times by 32%. We can do this in a way that enhances our existing CI setup without locking us in.

As of April 2024, our main workflow takes around 61 min across four platforms, where Linux and Windows are the critical paths at 61 min each:

  • Linux path (build + wpt), mean 61 min (41 min – 139 min)
    • Linux build job, mean 29 min (19 min – 41 min)
  • Windows path, mean 61 min (56 min – 77 min)
  • macOS path, mean 41 min (32 min – 68 min)
  • Android path, mean 19 min (16 min – 24 min)

GitHub-hosted runners are limited to 4 cores and 16 GiB RAM, they have to install deps and restore caches at runtime, and their Windows runners are notoriously slow. Self-hosted runners would allow us to prebake deps, caches, repo checkout, and even prebuild main for incremental builds. This is especially useful for Windows builds, where file I/O performance is poor.

Some prototyping of a libvirt- and ZFS-based runner system, using hardware very similar to a Hetzner AX102, shows that we can cut the Windows build job from 61 min down to under 13 min and the Linux build job from 29 min down to under 8 min, making macOS the critical path.
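As a rough cross-check of the 32% figure: the workflow finishes when its slowest platform path does, so the saving is the drop in the maximum path time. This sketch uses the mean times listed above and assumes that the Windows path is essentially its build job and that only the build-job portion of the Linux path gets faster, with the WPT portion unchanged. Because it works on means rather than per-run data, treat it as an estimate only.

```python
# Mean path durations in minutes, April 2024, taken from the list above.
current = {"Linux": 61, "Windows": 61, "macOS": 41, "Android": 19}

# Assumption: only the build jobs speed up in the prototype.
# Windows path ~= Windows build job (61 -> 13 min);
# Linux build job 29 -> 8 min, rest of the Linux path (WPT) unchanged.
projected = dict(current)
projected["Windows"] = 13
projected["Linux"] = current["Linux"] - (29 - 8)


def critical_path(paths):
    """Return (platform, minutes) for the slowest path, which bounds the workflow time."""
    name = max(paths, key=paths.get)
    return name, paths[name]


cur_name, cur_min = critical_path(current)
new_name, new_min = critical_path(projected)
saving = 1 - new_min / cur_min
print(f"{cur_min} min -> {new_min} min, now bounded by {new_name} ({saving:.1%} shorter)")
# prints: 61 min -> 41 min, now bounded by macOS (32.8% shorter)
```

The result lands just above the quoted 32% and agrees that macOS becomes the critical path once the Windows and Linux build jobs are faster.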

Alternatives considered include:

  • GitHub-hosted larger runners — much easier, much more expensive (1032 USD/month)
    • note that these numbers are based on build times that would only be possible with self-hosted runners, so the actual amount would be higher
  • EC2 dedicated instances — potentially easier, more expensive (565 USD/month)
  • EC2 shared instances — less predictable perf, similar cost (192 USD/month)
  • Azure shared instances — less predictable perf, cheaper (131 USD/month)
  • third-party runner providers — none support Windows yet
Raw data (based on April 2024 usage)
  • instance hours, 186.28 hours/month
    • Windows, 381 runs x 13/60 hours = 82.55 hours (analysis: < 2024-04.json jq -c '.[].job_runs[] | select(.labels == ["windows-2022"]) | .name' | jq -sc unique | tr \\n \\0 | xargs -0I{} ./compute-critical-path.sh 2024-04.json {} {})
    • Linux, 778 runs x 8/60 hours = 103.73 hours (analysis: < 2024-04.json jq -c '.[].job_runs[] | select(.labels != [] and (.labels[0] | startswith("ubuntu-")) and (.name | endswith("/ Linux Build"))) | .name' | jq -sc unique | tr \\n \\0 | xargs -0I{} ./compute-critical-path.sh 2024-04.json {} {})
  • egress traffic, 125.4 GB/month
    • Windows, 381 runs x 125MB of artifacts = 47.6 GB upload
    • Linux, 778 runs x 100MB of artifacts = 77.8 GB upload
  • GitHub larger runners, $1032.30/month
    • Windows 16-core = $633.98/month at $7.680/hour ($0.128/min)
    • Linux 16-core = $398.32/month at $3.840/hour ($0.064/min)
  • EC2, us-east-2, $192.53/month + $372.56/month for dedicated instances
    • Windows, c5ad.4xlarge (16c, 32G, NVMe) = $117.55/month at $1.424/hour
    • Linux, c6gd.4xlarge (16c, 32G, NVMe) = $63.73/month at $0.6144/hour
    • 125 GB egress = $11.25/month at $0.09/GB
    • dedicated instance fee, 186.28 hours = $372.56/month at $2/hour
  • Azure, East US, $131.34/month
    • Windows, B16ms (16c, 64G, 128G storage) = $60.26/month at $0.730/hour
    • Ubuntu, B16ms (16c, 64G, 128G storage) = $69.08/month at $0.666/hour
    • 25 GB paid egress = $2.00/month at $0.08/GB
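The monthly totals in the raw data can be re-derived from the per-run and per-hour figures. This sketch only reproduces that arithmetic from the rates listed above; it does not re-check them against current provider pricing. Like the list, it rounds hours and each line item to two decimal places before summing.

```python
def usd(x):
    """Round to two decimal places, as the itemised figures above are."""
    return round(x, 2)


# Instance hours per month: runs x per-build minutes on the prototype hardware.
win_hours = usd(381 * 13 / 60)    # Windows build job, under 13 min
linux_hours = usd(778 * 8 / 60)   # Linux build job, under 8 min
total_hours = usd(win_hours + linux_hours)

# Hourly rates and egress prices as listed in the raw data above.
options = {
    "GitHub larger runners": [
        win_hours * 7.680,       # Windows 16-core
        linux_hours * 3.840,     # Linux 16-core
    ],
    "EC2 shared": [
        win_hours * 1.424,       # c5ad.4xlarge, Windows
        linux_hours * 0.6144,    # c6gd.4xlarge, Linux
        125 * 0.09,              # egress, GB
    ],
    "Azure": [
        win_hours * 0.730,       # B16ms, Windows
        linux_hours * 0.666,     # B16ms, Ubuntu
        25 * 0.08,               # paid egress, GB
    ],
}
monthly = {name: usd(sum(usd(item) for item in items)) for name, items in options.items()}

# Dedicated EC2 instances add a flat $2/hour fee on top of the shared price.
monthly["EC2 dedicated"] = usd(monthly["EC2 shared"] + usd(total_hours * 2.0))

# Proposed Hetzner AX102 plus the Windows Server Standard licence, in EUR.
hetzner_eur = usd(123.76 + 60.45)

for name, cost in sorted(monthly.items(), key=lambda kv: kv[1]):
    print(f"{name:>22}: ${cost:,.2f}/month")
print(f"{'Hetzner AX102':>22}: EUR {hetzner_eur:,.2f}/month")
```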

@sagudev
Member

sagudev commented May 28, 2024

> Self-hosted runners would allow us to prebake deps, caches, repo checkout, and even prebuild main for incremental builds

I think all of this could be achieved using Docker images that can be loaded onto provided runners: https://docs.github.com/en/actions/using-jobs/running-jobs-in-a-container, something I've been planning to try out ever since setting up Docker images for cross-compiling mozjs: https://github.com/servo/servo-build-deps/tree/main/docker.

> to cut our build times

I think the predominant reason to have self-hosted runners is so we can bring back WPT runs on Windows and macOS; just cutting build times isn't worth the price IMO (given the number of contributors the project currently has).

> GitHub-hosted larger runners — much easier, much more expensive (1032 USD/month)

GitHub already gives us more free runners, so it's possible that they give us better pricing too.

@delan
Member

delan commented May 28, 2024

> I think all of this could be achieved using Docker images that can be loaded onto provided runners: https://docs.github.com/en/actions/using-jobs/running-jobs-in-a-container, something I've been planning to try out ever since setting up Docker images for cross-compiling mozjs: https://github.com/servo/servo-build-deps/tree/main/docker.

That sounds promising! But from what I can see, we can only use this for build jobs that run on Linux, whereas Windows and macOS could benefit from that just as much or even more, no?

> I think the predominant reason to have self-hosted runners is so we can bring back WPT runs on Windows and macOS; just cutting build times isn't worth the price IMO (given the number of contributors the project currently has).

I think that could be a worthwhile goal too, yeah, it would be great for Servo to run tests on all platforms (and someday other kinds of tests, like perf tests). That said, I wouldn’t discount the effect of faster builds on developer productivity, even when the GitHub-hosted runners have enough capacity.

> GitHub already gives us more free runners, so it's possible that they give us better pricing too.

That’s true, maybe we can ask them.

@nicoburns

CI build times are definitely a drain on my productivity when working on the Servo project, so I would be in favour of self-hosted runners (and FWIW, my experience with self-hosted GA runners in the past has been positive). Some questions:

  • What exactly is it proposed that we buy? Because the linked AX20 is only €55/month.
  • Are we losing anything in terms of concurrency? While this will complete individual runs faster, presumably it will have less capacity for concurrent runs?
  • Might we consider doing the same for macOS in future? Scaleway offers reasonably priced instances (although the benefit might not be as big now that GitHub has Apple silicon runners)

Some further thoughts:

  • If we wanted a cheaper option, then perhaps we could consider a self-hosted cache machine that was mostly disk and network IO. GitHub's cache infra is slow and too small for Servo to use, but we could probably build our own cheaply. Still, IMO fully dedicated runners probably make sense.
  • Much of my reliance on CI is due to cross-platform differences in WPT results. I've mostly been working on layout, where the results really shouldn't be platform-dependent. If I could get results that are more reliable across platforms, then I could rely on running the tests locally, which would greatly improve my iteration speed.
  • Having to ensure that both layout_2013 and layout_2020 tests pass is doubling the time it takes me to run the tests locally. As layout_2013 is not being actively worked on, this is only really relevant when enabling new style properties, but I am planning to do quite a bit of that.

@nicoburns

nicoburns commented May 28, 2024

I would like to add the following to the agenda:

Update link in selectors repository

The old selectors repo (https://github.com/servo/rust-selectors) currently links to components/selectors directory in the servo/servo repo. We should update it to link to the servo/stylo repo.

This will require temporarily unarchiving the repository which I do not have permissions for. (the repository link on crates.io also needs updating, but this should happen as part of publishing stylo).

(this has now been done)

Criteria for publishing stylo

I would be keen to publish sooner rather than later, with regular ongoing releases on at least a quarterly basis (can of course be more frequent). Publishing does not necessarily need to coincide with an announcement of its availability.

Tasks I personally consider to be blocking this:

  • Add more people as owners of the crate on crates.io (I can do this once we determine who should be added)
  • Crate Cargo.toml updates
    • Rename crate from style to stylo (we should probably also rename subcrates like style_traits and style_config)
    • Update the version numbers (probably to 0.1.0 for stylo? already published crates like selectors will need to be bumped if they have changed)
    • Update the repository links
    • Anything else that makes sense (a pass over the metadata makes sense)
  • Removing git and path dependencies from https://github.com/servo/stylo/blob/main/Cargo.toml (can this be permanent or does this need to be temporary as part of publishing?)
  • Add a little more documentation to the lib.rs (see: https://doc.servo.org/style/)

Q: Do these changes need to be upstreamed?

Changelogs

I think it would be great if we could start maintaining changelogs, perhaps starting with stylo and its dependencies (including cssparser). This is IMO a critical step in enabling community adoption of Servo libraries. Could this become part of the task of:

  • Submitting PRs to stylo
  • Syncing with upstream stylo

?

@mrego
Member Author

mrego commented May 28, 2024

> Update link in selectors repository

> The old selectors repo (https://github.com/servo/rust-selectors) currently links to components/selectors directory in the servo/servo repo. We should update it to link to the servo/stylo repo.

> This will require temporarily unarchiving the repository which I do not have permissions for. (the repository link on crates.io also needs updating, but this should happen as part of publishing stylo).

Can you report an issue about this? I don't think this needs to be discussed live at the meeting. The issue can be reported in the general Servo projects.

@delan
Copy link
Member

delan commented May 28, 2024

> What exactly is it proposed that we buy? Because the linked AX20 is only €55/month.

Sorry, that was a typo, it should read “Hetzner AX102”, which is 123.76 EUR/month plus 60.45 EUR/month for the necessary 16-core Windows Server Standard licence.

@nicoburns

> Can you report an issue about this? I don't think this needs to be discussed live at the meeting

Done: servo/servo#32387 (I agree that this doesn't require discussion).

@delan
Member

delan commented May 28, 2024

> Are we losing anything in terms of concurrency? While this will complete individual runs faster, presumably it will have less capacity for concurrent runs?

This would have finite capacity, but we should be able to design our workflows to fall back to GitHub-hosted runners.

That said, I built the prototype with the assumption that we would allow at least two builds to run concurrently, giving runners up to half the CPU (8c16t out of 16c32t) and 24 GiB RAM. We would be limited to two concurrent Windows builds on this server due to the Standard licence (Datacenter is unlimited but $$$), but given how much of the Servo build and unit-test work is not actually 16-way parallel, we could almost certainly run a bunch of Linux builds plus two Windows builds at the same time by oversubscribing the CPU.
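The fallback idea can be illustrated with a toy model of runner placement: each job takes a self-hosted slot if one is free for its platform, and otherwise goes to a GitHub-hosted runner. This is a hypothetical sketch, not Servo's CI: `SelfHostedServer`, `pick_runner` and the Linux cap of four concurrent builds are made up for illustration (only the two-concurrent-Windows-builds limit comes from the comment above), and it does not show how a workflow would actually detect a free slot in GitHub Actions.

```python
from dataclasses import dataclass, field

# Hypothetical per-platform concurrency limits for one self-hosted server.
# Windows: two concurrent builds, per the Windows Server Standard licence.
# Linux: an assumed oversubscription cap, not a measured figure.
# Platforms not listed (e.g. macOS) get no self-hosted slots.
DEFAULT_LIMITS = {"windows": 2, "linux": 4}


@dataclass
class SelfHostedServer:
    limits: dict = field(default_factory=lambda: dict(DEFAULT_LIMITS))
    busy: dict = field(default_factory=dict)

    def try_acquire(self, platform):
        """Take a slot for `platform` if one is free; return whether it succeeded."""
        in_use = self.busy.get(platform, 0)
        if in_use < self.limits.get(platform, 0):
            self.busy[platform] = in_use + 1
            return True
        return False

    def release(self, platform):
        """Free a slot when a self-hosted job finishes."""
        self.busy[platform] -= 1


def pick_runner(server, platform):
    """Prefer a self-hosted slot; otherwise fall back to a GitHub-hosted runner."""
    return "self-hosted" if server.try_acquire(platform) else "github-hosted"


server = SelfHostedServer()
jobs = ["windows", "windows", "windows", "linux", "macos"]
print([pick_runner(server, job) for job in jobs])
# prints: ['self-hosted', 'self-hosted', 'github-hosted', 'self-hosted', 'github-hosted']
```

Capping each platform separately reflects the point above that the Windows licence, rather than CPU, is the hard limit, while Linux builds can oversubscribe the CPU.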

> Might we consider doing the same for macOS in future? Scaleway offers reasonably priced instances (although the benefit might not be as big now that GitHub has Apple silicon runners)

That would be a good next step imo.

@nicoburns

> Update link in selectors repository

This has now been done.

mrego added a commit that referenced this issue May 29, 2024
Signed-off-by: Manuel Rego Casasnovas <rego@igalia.com>
@mrego mrego closed this as completed in 376a2f5 May 29, 2024
6 participants