Triggering rebuilds of dependency chains #44
What is the best way to trigger a rebuild?
Presumably this would send a PR with a build number bump. It might also update the pinning, but I don't think this will matter for much longer (i.e. cb3 would make this step irrelevant). Right, I was figuring we could already use the graph you have to solve this problem. Yeah, it would probably be a new script. Admittedly we might only care about doing this for things that require dependencies from our central pinning file at build time. In fact, it might be best to only look at changes in the central pinning file to determine when a rebuild needs to occur.
Hmm ok, I think I get it. We could have a flag in some nodes which notes that when the node is version bumped then all downstream nodes need a build bump (we could pull this from central pinning). We might want something to note that we don't need a build number bump if a downstream package is getting a version bump itself (since that should trigger a rebuild).
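A minimal sketch of the bookkeeping described above: when a flagged node gets a version bump, walk its downstream subgraph and collect the nodes that need a build-number bump, skipping any node that is already getting a version bump of its own (since that rebuilds it anyway). The graph layout, function name, and package names are illustrative assumptions, not the bot's actual API.

```python
# Sketch: propagate a build-number bump to downstream nodes, except those
# that are already getting their own version bump. All names here are
# hypothetical; the real bot's data structures may differ.
from collections import deque

def nodes_needing_build_bump(graph, bumped, version_bumped):
    """graph: dict mapping node -> set of downstream (dependent) nodes.
    bumped: the flagged node whose version just changed.
    version_bumped: nodes already receiving their own version bump."""
    need, seen = set(), {bumped}
    queue = deque(graph.get(bumped, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # 'seen' also guards against cycles in the graph
        seen.add(node)
        if node not in version_bumped:
            need.add(node)  # rebuild via build-number bump
        # keep traversing: descendants of a version-bumped node may
        # still need build bumps themselves
        queue.extend(graph.get(node, ()))
    return need
```

Note that the traversal continues past a version-bumped node, since its own descendants may still need a plain build-number bump.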
How often do we want to do something like this? Every pinning update? If we change the
My suspicion is that this is going to be painful in the beginning. Partly because some things weren't using the latest recommended pinnings (i.e. in the old pinning script) and partly because the latest recommended pinnings are a bit outdated. That said, I think once we get in the swing of things (much like version updates), this will get more manageable. Mainly because we won't be so badly out-of-date. Also because there are some things we don't need to be as sensitive to when they update thanks to

One of the things that @pelson did a great job of in the old bot was making sure to hold off on doing such dependency update PRs until evenings and weekends. He also made sure to do these in small batches. The net result is we could stay on top of pinning updates without suffering too badly from build worker backlog. As we proceed here, I would expect that we can reuse that strategy with some tweaks. Namely, we can do a pass through one level of the topological order for a given pinning in a batch. We may discover we need to constrain that a bit further (e.g. some magic upper limit on PRs in a batch that keeps things sane). I would add that doing this continuously should help avoid the backlog getting too extreme.

Overall this is probably a good first approximation of what will need to happen. Does this align with the sorts of things you were already thinking? Other thoughts/concerns?
@jakirkham This is similar to what we were thinking. The plan is to identify the levels in a topological sort and PR to the feedstocks in a level simultaneously.

The difficulty here is that our graph is not acyclic, so we cannot actually do a topological sort. One way to get around this is to remove certain edges that break cycles, but it is not obvious to me which edges should be removed, and it is possible that removing edges gets rid of paths that should still be in the graph.

The idea instead is to find the length of the longest path from the root node (the package in central pinning) to every other node and treat that length as the level of a node, since feedstocks with the same longest-path length cannot depend on each other. My solution to find the longest path is to do a pseudo-topological sort using DFS starting from a node in central pinning, where the leaf nodes are nodes with either no outgoing edges or all of whose outgoing edges complete a cycle. This gives a topological sort of the graph in which all nodes are reachable from the root node. Using this ordering we can find the length of the longest path to each node. I think that with this method, all nodes that are not part of a cycle will be rebuilt in the correct order.
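The leveling scheme described above can be sketched roughly as follows: run a DFS from the root, skip any edge that would close a cycle (a "back edge" to a node still on the DFS stack), take the reverse postorder as a pseudo-topological order, and then relax the surviving edges in that order to get longest-path lengths. The graph shape and package names are made up for illustration; this is a sketch of the idea, not the bot's implementation.

```python
# Sketch: assign each node a "level" = length of the longest path from the
# root, ignoring back edges so the algorithm works on a graph with cycles.

def longest_path_levels(graph, root):
    """graph: dict mapping node -> iterable of dependent (child) nodes."""
    order = []   # reverse postorder = pseudo-topological order
    state = {}   # node -> "active" (on DFS stack) or "done"
    kept = set() # edges that survive after dropping back edges

    def dfs(node):
        state[node] = "active"
        for child in graph.get(node, ()):
            if state.get(child) == "active":
                continue  # back edge: skipping it breaks the cycle
            kept.add((node, child))
            if child not in state:
                dfs(child)
        state[node] = "done"
        order.append(node)

    dfs(root)
    order.reverse()

    # For every kept edge (u, v), u precedes v in this order, so a single
    # relaxation pass yields longest-path lengths from the root.
    level = {root: 0}
    for node in order:
        for child in graph.get(node, ()):
            if (node, child) in kept:
                level[child] = max(level.get(child, 0), level[node] + 1)
    return level
```

Feedstocks sharing a level cannot depend on each other, so each level can be PRed as one batch. For a hypothetical chain where `hdf5` depends on `zlib`, `openssl`, and `curl` (and `curl` on `openssl`), `hdf5` lands at level 3 even though it is only one hop from `zlib`, because the longest path through `curl` dominates.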
Here is a list with the number of feedstocks at each level in the subgraph with descendants of

I'm not sure how many packages we can rebuild at a time, but it looks like we will need to split some of these levels into smaller batches.
Wow! I wonder what is going on at level 7. |
It would be great to also have some sort of human readable output of the transition. We can't do everything at once and so having something that tells the maintainers which packages have been migrated, which are currently being migrated (open PR), and which are still pending would be helpful. |
@justcalamari would it be possible to recompute #44 (comment) but a) only have direct inheritors, b) include packages with the new syntax in the listing? |
Packages that have migrated to the new syntax are no longer descendants of toolchain, since the requirement is now |
As an update for this:
This way the bot will automatically know what to do when the pinnings get updated. |
From discussion with @mariusvniekerk: write something into smithy which writes a file into the feedstock upon rerender. The file will contain the pinned packages and the pinned versions. This will allow us to gather the files and the data needed to determine whether a rerender PR needs to be issued. With that data in hand we can then produce the map between the pinned packages and their children, allowing for easy PR issuing.
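A rough sketch of the record-and-compare flow described above, assuming a simple JSON record written into the feedstock at rerender time. The file name, record layout, and function names are assumptions for illustration; they are not the format smithy ended up using.

```python
# Sketch: write a pin record at rerender time, then later compare it
# against the current central pinnings to decide whether to rerender.
# File layout and names here are hypothetical.
import json

def write_pin_record(path, applied_pins):
    """applied_pins: dict of package -> pinned version, e.g.
    {"zlib": "1.2", "openssl": "1.0.2"}."""
    with open(path, "w") as f:
        json.dump({"pins": applied_pins}, f, indent=2, sort_keys=True)

def needs_rerender(path, current_pins):
    """True if any pin the feedstock recorded differs from the
    current central pinnings."""
    with open(path) as f:
        stored = json.load(f)["pins"]
    # Only the pins this feedstock actually uses matter; extra entries
    # in the central file are ignored.
    return any(current_pins.get(pkg) != ver for pkg, ver in stored.items())
```

The bot side would then collect these records across feedstocks to build the map from pinned packages to their children.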
It seems that we have the packages with pins: https://github.com/regro/libcfgraph/blob/master/artifacts/arrow-cpp/conda-forge/linux-64/arrow-cpp-0.11.1-py27h0e61e49_1004.json#L115

We can then compare the pins in that file against our current pinnings repo, truncate the pins by the pin precision, and then check min and max pins to see if the package needs to be rebuilt.
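The truncate-by-precision check described above might look roughly like this: take the concrete version recorded in the artifact, cut it down to the number of version components the central pin uses, and flag a rebuild if the truncated value no longer matches. The function names and the precision convention are assumptions for illustration.

```python
# Sketch: decide whether an artifact's recorded pin is stale relative to
# the current central pinning, by truncating to the pin's precision.
# Names and conventions here are hypothetical.

def truncate(version, precision):
    """'1.2.11' truncated to precision 2 -> '1.2'."""
    return ".".join(version.split(".")[:precision])

def pin_outdated(artifact_version, current_pin):
    """current_pin: the pin from the pinnings repo, e.g. '1.2'.
    The artifact needs a rebuild if its recorded version, truncated to
    the pin's precision, no longer equals the current pin."""
    precision = len(current_pin.split("."))
    return truncate(artifact_version, precision) != current_pin
```

So an artifact built against zlib `1.2.11` stays valid while the central pin is `1.2`, and becomes a rebuild candidate if the pin moves to `1.3`.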
So the general idea is to use these stored pinnings that are inherent in the artifacts and compare them with what our pinnings file is at present. If there is a diff, we issue a PR. This has some difficult bits around dealing with arch variants but is probably close enough to usable. We also only have the
this exists now (mostly) |
Admittedly this may not always make sense. However, in case that it does, this would be very useful. Namely it would be good to trigger rebuilds of downstream dependencies when an upstream dependency is rebuilt. As a simple example, `oniguruma` and `jq`.