
Triggering rebuilds of dependency chains #44

Closed · jakirkham opened this issue Mar 3, 2018 · 17 comments
Labels: GSOC Google Summer of Code

@jakirkham (Contributor)

Admittedly this may not always make sense, but when it does, this would be very useful. Namely, it would be good to trigger rebuilds of downstream packages when one of their upstream dependencies is rebuilt. As a simple example, oniguruma and jq.

@CJ-Wright (Member)

What is the best way to trigger a rebuild?
This sounds like it would need its own script. We could have a list of nodes whose downstreams need rebuilds.

@jakirkham (Contributor · Author)

Presumably this would send a PR with a build number bump. It might also update the pinning, but I don't think that will matter for much longer (i.e. cb3 would make this step irrelevant).

Right, I was figuring we could already use the graph you have to solve this problem.

Yeah, it would probably be a new script. Admittedly, we might only care about doing this for things that require dependencies from our central pinning file at build time. In fact, it might be best to only look at changes in the central pinning file to determine when a rebuild needs to occur.

@CJ-Wright (Member)

Hmm ok, I think I get it.

We could have a flag on some nodes which notes that, when the node is version bumped, all downstream nodes need a build number bump (we could pull this set of nodes from the central pinning).
The flag defaults to false and is flipped to true when the node is version bumped (and the version bump is accepted).
Another script follows behind, finds all the nodes with a central pin where the flag is true, issues build number bumps to all downstream packages, and then flips the flag back to false.

We might want some way to note that a downstream package doesn't need a build number bump if it is getting a version bump itself (since that should trigger a rebuild anyway).
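A minimal sketch of that flag mechanism, assuming the bot's graph is a networkx DiGraph with edges pointing from a package to its dependents; the attribute names and the open_build_bump_pr helper here are hypothetical, not actual bot code:

```python
import networkx as nx

def mark_version_bumped(graph: nx.DiGraph, node: str) -> None:
    # Flip the flag once a version bump PR for `node` has been merged.
    graph.nodes[node]["rebuild_downstream"] = True

def issue_build_number_bumps(graph: nx.DiGraph) -> None:
    # Follow-up pass: for every flagged centrally pinned package, bump
    # the build number of all downstream packages, then reset the flag.
    for node, data in graph.nodes(data=True):
        if not data.get("rebuild_downstream"):
            continue
        for child in nx.descendants(graph, node):
            # Skip packages about to get a version bump of their own,
            # since that already triggers a rebuild.
            if not graph.nodes[child].get("pending_version_bump"):
                open_build_bump_pr(child)  # hypothetical PR-issuing helper
        data["rebuild_downstream"] = False
```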

@jakirkham (Contributor · Author)

Right. Something like that sounds good.

Yeah, this starts to overlap with issue #16.

I should add that re-rendering will be needed as part of this process, as that is how we update the dependencies.

cc @isuruf

@CJ-Wright (Member)

How often do we want to do something like this? Every pinning update? If we change the gcc pinning, how much of the graph are we prepared to rebuild?

@jakirkham (Contributor · Author)

My suspicion is that this is going to be painful in the beginning, partly because some things weren't using the latest recommended pinnings (i.e. from the old pinning script) and partly because the latest recommended pinnings are themselves a bit outdated.

That said, I think once we get into the swing of things (much like version updates), this will become more manageable, mainly because we won't be so badly out of date. Also, thanks to conda-build 3, there are some things whose updates we don't need to be as sensitive to. There are of course things that are vital to our stack but have weak ABI guarantees (if any), which will be painful to update. Though I think we will get a better sense of what these are, and how often they are a concern, as we proceed.

One of the things @pelson did a great job of in the old bot was holding off on such dependency update PRs until evenings and weekends, and doing them in small batches. The net result was that we could stay on top of pinning updates without suffering too badly from build worker backlog.

As we proceed here, I would expect we can reuse that strategy with some tweaks. Namely, we can process one level of the topological order for a given pinning as a batch. We may discover we need to constrain that a bit further (e.g. some upper limit on PRs per batch that keeps things sane; see the sketch below). I would add that doing this continuously should help keep the backlog from getting too extreme. Overall this is probably a good first approximation of what will need to happen.
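One way that batching could look, as a rough sketch: assume a precomputed `levels` mapping from feedstock to topological level (discussed further below); both helpers are hypothetical stand-ins and the cap of 20 is an arbitrary placeholder, not a number from this thread.

```python
MAX_PRS_PER_BATCH = 20  # assumed cap to keep the CI backlog sane

def rebuild_one_level(levels: dict, target: int) -> None:
    # Issue rebuild PRs for one topological level in capped chunks.
    batch = sorted(pkg for pkg, lvl in levels.items() if lvl == target)
    for start in range(0, len(batch), MAX_PRS_PER_BATCH):
        for pkg in batch[start:start + MAX_PRS_PER_BATCH]:
            open_rebuild_pr(pkg)   # hypothetical: build number bump PR
        wait_for_ci_backlog()      # hypothetical throttle between chunks
```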

Does this align with the sorts of things you were already thinking? Other thoughts/concerns?

@justcalamari (Contributor)

@jakirkham This is similar to what we were thinking. The plan is to identify the levels in a topological sort and PR to all the feedstocks in a level simultaneously. The difficulty here is that our graph is not acyclic, so we cannot actually do a topological sort. One way around this is to remove certain edges to break the cycles, but it is not obvious to me which edges should be removed, and removing edges may also get rid of paths that should still be in the graph.

The idea instead is to find the length of the longest path from the root node (the package in the central pinning) to every other node and treat that length as the node's level, since feedstocks with the same longest-path length cannot depend on each other. My approach to finding the longest path is to do a pseudo-topological sort using DFS, starting from a node in the central pinning, where the leaves are nodes with either no outgoing edges or only outgoing edges that complete a cycle. This gives a topological ordering of the graph in which all nodes are reachable from the root node. Using this ordering we can find the length of the longest path to each node. I think that, using this method, all nodes that are not part of a cycle will be rebuilt in the correct order.
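A rough sketch of that level computation, assuming a networkx DiGraph whose edges point from a package to its dependents and a single root taken from the central pinning (recursive DFS kept for brevity):

```python
import networkx as nx

def rebuild_levels(graph: nx.DiGraph, root: str) -> dict:
    order, seen, on_stack = [], set(), set()

    def dfs(node):
        seen.add(node)
        on_stack.add(node)
        for child in graph.successors(node):
            if child in on_stack:
                continue  # this edge would complete a cycle: skip it
            if child not in seen:
                dfs(child)
        on_stack.discard(node)
        order.append(node)  # postorder

    dfs(root)
    order.reverse()  # pseudo-topological order of nodes reachable from root
    position = {node: i for i, node in enumerate(order)}

    # Longest path from the root, relaxed along the ordering; edges that
    # point "backwards" in the order belong to cycles and are ignored.
    level = {node: 0 for node in order}
    for node in order:
        for child in graph.successors(node):
            if position.get(child, -1) > position[node]:
                level[child] = max(level[child], level[node] + 1)
    return level
```

Nodes sharing a level can then be rebuilt in the same batch, since no node can depend on another node with the same longest-path length.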

@justcalamari (Contributor)

Here is the number of feedstocks at each level in the subgraph of descendants of toolchain:
 0: 1      1: 57     2: 13     3: 29     4: 7      5: 5
 6: 3      7: 1      8: 4      9: 170   10: 87    11: 447
12: 335   13: 305   14: 253   15: 378   16: 282   17: 171
18: 762   19: 428   20: 254   21: 197   22: 109   23: 85
24: 36    25: 34    26: 27    27: 23    28: 8     29: 4

I'm not sure how many packages we can rebuild at a time, but it looks like we will need to split some of these levels into smaller batches.

@scopatz (Contributor) commented May 30, 2018

Wow! I wonder what is going on at level 7.

@CJ-Wright (Member)

It would be great to also have some sort of human-readable output of the transition. We can't do everything at once, so having something that tells maintainers which packages have been migrated, which are currently being migrated (open PR), and which are still pending would be helpful.
Inspiration? https://release.debian.org/transitions/

@CJ-Wright (Member)

@justcalamari would it be possible to recompute #44 (comment), but (a) only include direct inheritors and (b) include packages with the new syntax in the listing?

@justcalamari (Contributor)

Packages that have migrated to the new syntax are no longer descendants of toolchain, since the requirement is now *_compiler_stub. It's hard to look at descendants of compiler_stub since those aren't feedstocks and thus aren't in the graph. We may want to give the bot some special behavior when it finds compiler_stub in the requirements.

@CJ-Wright (Member) commented Aug 6, 2018

As an update for this:

  • We are able to rebuild dependency chains (see the compiler migrations that will happen soon).
  • The process is still a bit manual since the bot needs to take in the full subgraph to properly run the rebuilds in order.
  • Ideally the next step would be to have a system which watches conda-forge-pinning and knows
    1. What package(s) changed during a pinning update
    2. What subgraph(s) need to get processed

This way the bot will automatically know what to do when the pinnings get updated.
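A hedged sketch of what such a watcher might look like: diff two revisions of the pinning file to find changed pins, then map each change to the subgraph of its descendants. The file paths, graph orientation (edges from package to dependents), and function names are all assumptions:

```python
import networkx as nx
import yaml

def changed_pins(old_cbc_path: str, new_cbc_path: str) -> set:
    # Diff two revisions of conda_build_config.yaml to find changed pins.
    with open(old_cbc_path) as f:
        old = yaml.safe_load(f)
    with open(new_cbc_path) as f:
        new = yaml.safe_load(f)
    return {key for key in new if old.get(key) != new.get(key)}

def subgraphs_to_process(graph: nx.DiGraph, changed: set):
    # Each changed pin maps to the subgraph of its descendants,
    # which is what the bot would need to rebuild in order.
    for package in sorted(changed & set(graph.nodes)):
        nodes = {package} | nx.descendants(graph, package)
        yield package, graph.subgraph(nodes)
```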

@CJ-Wright (Member)

From discussion with @mariusvniekerk:

Write something into smithy which writes a file into the feedstock upon rerender. The file will contain the pinned packages and the versions they were pinned at. This will allow us to gather the files and the data needed to determine whether a rerender PR needs to be issued.

With that data in hand we can then produce the map between the pinned packages and their children, allowing for easy PR issuing.
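A minimal sketch of what smithy might write at rerender time; the file name, location, and format here are assumptions, not actual smithy behavior:

```python
import json
import os

def record_pins(feedstock_dir: str, pins: dict) -> None:
    # `pins` maps each centrally pinned package this feedstock uses to
    # the version it was pinned at during this rerender, so a later pass
    # can diff it against the current central pinning.
    path = os.path.join(feedstock_dir, ".ci_support", "used_pinnings.json")
    with open(path, "w") as f:
        json.dump(pins, f, indent=2, sort_keys=True)
```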

@CJ-Wright (Member)

It seems that we already have the packages with pins: https://github.com/regro/libcfgraph/blob/master/artifacts/arrow-cpp/conda-forge/linux-64/arrow-cpp-0.11.1-py27h0e61e49_1004.json#L115

We can then compare the pins in that file against our current pinnings repo, truncate the pins by the pin precision, and then check the min and max pins to see if the package needs to be rebuilt.
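A rough sketch of that comparison, assuming the artifact JSON yields a mapping from pinned package to the version it was built against, and that "pin precision" means how many version fields the pin expresses; the default precision of two fields is an assumption:

```python
def truncate(version: str, precision: int) -> tuple:
    # Keep only as many version fields as the pin expresses,
    # e.g. precision 2 turns "1.2.3" into ("1", "2").
    return tuple(version.split(".")[:precision])

def needs_rebuild(artifact_pins: dict, current_pins: dict,
                  precision: dict) -> bool:
    for package, built_with in artifact_pins.items():
        current = current_pins.get(package)
        if current is None:
            continue  # no longer centrally pinned
        fields = precision.get(package, 2)  # assumed default precision
        if truncate(built_with, fields) != truncate(current, fields):
            return True  # pin drifted beyond its precision: rebuild
    return False
```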

@mariusvniekerk (Contributor)

So the general idea is to use these stored pinnings that are inherent in the artifacts and compare them with what our pinnings file contains at present. If there is a diff, we issue a PR. This has some difficult bits around dealing with arch variants, but is probably close enough to usable.

We also only have the conda_build_config section for things that were built relatively recently.

@CJ-Wright (Member)

This exists now (mostly).
