
Create a minimal CI that validates tree before merge #56674

Closed
rengolin opened this issue Jul 22, 2022 · 23 comments

@rengolin
Member

This isn't necessarily about passing tests or repeating the buildbot infrastructure, but about making sure the merge works across all projects.

The LLVM monorepo is a collection of sub-projects that not everyone works on together, and most of us don't build all projects even during the more thorough testing we do. Having a simple build of all components and some sanity checks on the CMake structure (e.g. a mock project using LLVM, like a standalone MLIR dialect, or a clang tool using the CMake infrastructure) would go a long way towards ensuring that at least we didn't break anything obvious for other people with our commits.
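
As a concrete illustration of such a mock project, a smoke test along these lines could catch obvious breakage in the exported CMake config (a minimal sketch, assuming LLVM has already been built or installed and LLVM_DIR points at its CMake package directory; the project and target names are made up):

```bash
#!/usr/bin/env bash
# Sketch of a CMake "mock project" sanity check against a pre-built LLVM.
set -euo pipefail

mkdir -p mock-tool && cd mock-tool

cat > CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.20)
project(MockTool CXX)

# Fails early if the exported LLVM CMake package is broken.
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION} in ${LLVM_DIR}")

include_directories(${LLVM_INCLUDE_DIRS})
add_definitions(${LLVM_DEFINITIONS})

add_executable(mock-tool main.cpp)
llvm_map_components_to_libnames(llvm_libs support core)
target_link_libraries(mock-tool ${llvm_libs})
EOF

cat > main.cpp <<'EOF'
#include "llvm/Support/raw_ostream.h"
int main() { llvm::outs() << "LLVM CMake config looks sane\n"; }
EOF

# LLVM_DIR below is a hypothetical location of the just-built package config.
cmake -G Ninja -S . -B build -DLLVM_DIR="$PWD/../llvm-install/lib/cmake/llvm"
ninja -C build mock-tool
```

The value isn't in the tool itself but in exercising find_package(LLVM) and the exported targets the same way an out-of-tree user would.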

We should also add checks for license headers, contribution agreements, or whatever legal details are too easy to forget about and easy enough to automate as a pre-commit CI on GitHub.

@EugeneZelenko EugeneZelenko added infrastructure Bugs about LLVM infrastructure and removed new issue labels Jul 22, 2022
@llvmbot
Collaborator

llvmbot commented Jul 22, 2022

@llvm/issue-subscribers-infrastructure

@joker-eph
Collaborator

Would porting the existing pre-merge CI be enough? Or are these things missing from what it does right now?

@rengolin
Member Author

rengolin commented Aug 2, 2022

I'm not sure. The current CI fails more often than it should and it still doesn't cover much. If we're going to have a pre-commit CI that blocks merges, we need it to be stable, fast, and have good enough coverage.

I don't know what the coverage of the current tests is, but we should try to at least build all projects (probably across multiple bots).

I think the first step would be to copy the infra and see what it does; then, once we use PRs, we can make the checks mandatory when they're good enough.

@joker-eph
Collaborator

Right now the pre-merge CI should build everything in the monorepo; what missing coverage are you thinking of? Things like the external test-suite repo?

@rengolin
Member Author

rengolin commented Aug 2, 2022

Does it build all projects in the most common configurations? Do we care about different build flags? Do we want to run at least some simple testing? Do we want to separate "simple" from "non-simple" in the tests to allow that? If we run tests, do we add more architectures?

The point here isn't just "how do we keep doing what we already do" but "what do we actually want to achieve, and how do we make it stable enough that it doesn't break our reviews all the time".

Most projects can be built at the same time (like we do in release testing), but others need some special sauce (like the runtime libraries). If we just care about building, we shouldn't care about asserts or optimised builds, though we also don't want debug symbols, so I imagine it's just -O1 or something.
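
A build-only configuration along those lines might look roughly like this (a sketch: the project/runtime lists and the -O1 override are assumptions about what such a job would want, not an agreed configuration):

```bash
# Hypothetical "does everything still build?" job: no assertions, no debug
# info, mild optimisation to keep compile times down, and no check-* targets.
cmake -G Ninja -S llvm -B build \
  -DLLVM_ENABLE_PROJECTS="clang;mlir;lld;clang-tools-extra" \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;compiler-rt" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=OFF \
  -DCMAKE_C_FLAGS_RELEASE="-O1" \
  -DCMAKE_CXX_FLAGS_RELEASE="-O1"
ninja -C build
```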

Building the test-suite makes no sense unless we do it with the compiler that we just produced. If we're going to do that, then we'll either need a long build or chained bots, artifacts, etc. Neither is simple to have as pre-merge CI.

I think my point is simply that, if we're going to have a pre-merge CI, we need to make sure developers don't have to keep hitting the build button hoping that "this time" it'll work. The current CI has too many spurious failures for that to work at scale.

@tstellar
Collaborator

Maybe we should start by choosing how long we want the minimal CI test to run and then figure out how much we can do in that amount of time. How long does the current pre-merge CI take?

@joker-eph
Collaborator

joker-eph commented Oct 26, 2022

What is the criterion: clock time or CPU time? Because clock time depends on how large a machine you can afford :)

Also, the current pre-merge CI does not always build/test the same thing: it decides what to do depending on the paths changed (so the time will vary vastly if you touch LLVM itself or only MLIR, for example).
Time will also heavily depend on how much cache reuse there is...

@tstellar
Collaborator

I was thinking clock time.

@metaflow
Contributor

metaflow commented Oct 26, 2022 via email

@rengolin
Member Author

The free GitHub builders can take many hours to build and test LLVM+MLIR+Clang, so I'm going to assume we won't even be trying those. But overall, I wouldn't worry about setting a particular time constraint.

It strongly depends on the number of requests (PR changes) and the level of parallelism. In a nutshell, the pre-commit CI has to "cope with" (i.e. not pile up) every change in every PR. This is a standard "buildbot scaling issue", bundled with review and approval issues. If reviews take longer than the pre-merge tests do, we're good. If not, we may want to enable "merge-after-CI-passing" in GitHub, which is very convenient.

I would also strongly recommend we do some kind of pruning, for example only adding a project to the build list if the commit touches its sub-directories in the monorepo, including documentation (i.e. avoid building LLVM at all on a docs-only change). Not only does this reduce the load on testing, it also stops unrelated test instability from harming irrelevant commits.
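
A rough sketch of that kind of pruning, where the directory-to-project mapping and the dependency expansion are invented for illustration rather than taken from the actual pre-merge scripts:

```bash
#!/usr/bin/env bash
# Sketch: derive the list of projects to build from the paths a PR touches.
set -euo pipefail

changed=$(git diff --name-only origin/main...HEAD)

projects=""
for path in $changed; do
  case "$path" in
    *.md|docs/*|*/docs/*) ;;                          # docs-only paths build nothing
    llvm/*)  projects="$projects llvm clang mlir" ;;  # LLVM changes affect everyone
    clang/*) projects="$projects clang" ;;
    mlir/*)  projects="$projects mlir flang" ;;       # hypothetical: rebuild MLIR users too
    *)       projects="$projects llvm" ;;             # fall back to a core build
  esac
done

projects=$(echo $projects | tr ' ' '\n' | sort -u | paste -sd';' -)
if [ -z "$projects" ]; then
  echo "Docs-only change; skipping the build."
  exit 0
fi
echo "Projects to build: $projects"
```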

@ldionne
Member

ldionne commented Aug 24, 2023

I am currently setting up pre-commit CI for Clang and libc++ on top of GitHub pull requests (using the existing BuildKite infrastructure for the time being). I would suggest that each sub-project might want to handle setting up the pre-commit CI that makes sense for it -- in the past, I believe this has been one reason why the Phabricator pre-commit CI was so flaky: it wasn't owned by the specific sub-projects.

@joker-eph
Collaborator

Phab pre-commit is/was frequently partially broken because there is no post-commit CI on it, so there are no reverts of commits breaking it when the breakage is not covered by a fast build bot. The only way to keep a config "green" is a post-merge bot tracking the exact same config, with folks paying attention to revert on breakage.

@rengolin
Member Author

Phab pre-commit is/was frequently partially broken because there is no post-commit CI on it, so there are no reverts of commits breaking it when the breakage is not covered by a fast build bot. The only way to keep a config "green" is a post-merge bot tracking the exact same config, with folks paying attention to revert on breakage.

That's a weird state. I assumed those builds were always a fresh main clone plus the changes on a fresh new build.

The simplest way to track pre-commit changes is to build with main + changes, regardless of what breaks upstream. If upstream is broken, so is CI.

The slightly less simple way is to build main + changes and, if there are any breakages, also build main and see if they break on main in the exact same way. That wouldn't be marked as green, but at least orange or yellow.
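
A sketch of that second scheme (build-and-test.sh and the PR_HEAD_SHA variable are placeholders, and the green/yellow/red wording is just how this comment describes it, not an existing status in the CI):

```bash
#!/usr/bin/env bash
# Sketch: build main + changes; on failure, also build plain main so the PR
# isn't blamed for breakage that already exists upstream.
set -uo pipefail

git fetch origin main
git checkout -B premerge origin/main
git merge --no-edit "$PR_HEAD_SHA"       # PR_HEAD_SHA: hypothetical env var with the PR head

if ./build-and-test.sh; then             # build-and-test.sh: placeholder build script
  echo "green: main + changes pass"
  exit 0
fi

git checkout --detach origin/main
if ./build-and-test.sh; then
  echo "red: this change introduces the breakage"
  exit 1
else
  echo "yellow: main is already broken in this configuration"
  exit 0                                 # or report a distinct warning status
fi
```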

@metaflow
Contributor

The only way to keep a config "green" is a post-merge bot tracking the exact same config with folks paying attention to revert on breakage.

I like this idea and will keep it in mind when adding checks for PRs! I would not agree that it's "the only way", though, as we can ask people to always use PRs / a submit queue to make sure that very basic things keep working.

@metaflow
Contributor

That's a weird state. I assumed those builds were always a fresh main clone plus the changes on a fresh new build.

yes, that was the case. The issue @joker-eph is talking about is that if someone has broken a test in the CI configuration, then they will not be bothered by emails about it.

@rengolin
Member Author

yes, that was the case. The issue @joker-eph is talking about is that if someone has broken a test in the CI configuration, then they will not be bothered by emails about it.

That should be fine once we move to GitHub PRs and make passing CI a requirement for merging.

@metaflow
Contributor

metaflow commented Aug 24, 2023

re @ldionne

I would suggest that each sub-project might want to handle setting up the pre-commit CI that makes sense for it -- in the past I believe this has been one reason why the Phabricator pre-commit CI was so flaky: it wasn't owned by the specific sub-projects.

yes, that would be great! Though I have only received requests for a custom setup from the libc++ folks (you); other projects were silent. I think a more practical way would be to provide some "default" (e.g. llvm;clang;mlir... + ninja check-all) and instructions for individual projects on how they can tailor the CI for their specific needs, plus some guidelines on resource usage / time constraints.
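
That default could be as simple as something like the following (a sketch; the exact project list, build type, and whether a single check-all is affordable are open questions from this thread, not decisions):

```bash
# Hypothetical default pre-merge job for sub-projects that don't define their own.
cmake -G Ninja -S llvm -B build \
  -DLLVM_ENABLE_PROJECTS="clang;mlir" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON
ninja -C build check-all
```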

@rengolin
Member Author

I think that even if all projects have their own separate CI, we still need wider-scope ones (shared amongst all projects), in case we miss something that may affect other projects: LLVM to anything, libs to Clang, etc. MLIR is still reasonably separate, but will soon have a strong coupling with Flang/Clang once they start using MLIR as their post-AST IR.

@joker-eph
Collaborator

That's a weird state. I assumed those builds were always a fresh main clone plus the changes on a fresh new build.

That's what they were, but the combination of CMake flags + host compiler + OSes doesn't have an equivalent in buildbot. So "main" was broken, but no one could know where the breakage came from, as it only showed up during pre-merge.

make passing CI a requirement for merging.

As long as we don't take away direct git push :)

@ldionne
Member

ldionne commented Aug 30, 2023

I've now set up a "minimal" CI that mirrors the pre-merge checks we had in Phabricator. They work the same as they used to: basically, they figure out the set of changed projects, compute some dependency-related stuff (e.g. modifying MLIR will cause Flang to be tested as well), and then run the Debian + Windows CI.

IMO figuring this out implicitly should be a temporary state as individual projects figure out what testing makes sense for them, but clearly this can be discussed given that not everyone agrees.

I'd be tempted to close this issue as resolved since I think this qualifies as "minimal CI". Any objections?

@ldionne ldionne self-assigned this Aug 30, 2023
@ldionne
Member

ldionne commented Aug 30, 2023

Closing, we can re-open if there is discussion!

@ldionne ldionne closed this as completed Aug 30, 2023
@joker-eph
Collaborator

Can you link to it?

metaflow added a commit to metaflow/llvm-zorg that referenced this issue Sep 7, 2023
As was suggested in llvm/llvm-project#56674 (comment), this adds a configuration that should match the build run by GitHub pull requests.

We are initially going to set up one worker per Windows / Linux architecture. It's unlikely that the Windows one will be able to keep up with every commit, but we will monitor it and decide whether e.g. adding one more worker might actually be fast enough to build every change.
metaflow added a commit to llvm/llvm-zorg that referenced this issue Sep 22, 2023
As was suggested in llvm/llvm-project#56674 (comment), this adds a configuration that should match the build run by GitHub pull requests.

We are initially going to set up one worker per Windows / Linux architecture. It's unlikely that the Windows one will be able to keep up with every commit, but we will monitor it and decide whether e.g. adding one more worker might actually be fast enough to build every change.