Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nix-store: implement --query --unsubstitutable #7526

Closed
wants to merge 2 commits into from

Conversation

nrdxp
Copy link
Contributor

@nrdxp nrdxp commented Dec 29, 2022

Resolves #3946 by providing an efficient means of querying cache status of the passed paths.

For every path passed, it will be returned back on standard output only if it is not available in any of the configured substituters.

A threadpool is used to make this as quick as possible.

Included in the documentation is a sample command showing how users can now trivially capture any and all uncached build & runtime dependencies of a given derivation. This information is highly useful in CI environments, for example.

Only caveat is that we might hit the OS arg limit until #7437 is fixed.

TODO

  • implement nix-store query
  • write tests

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or bug fix: updated release notes

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-correctly-cache-build-time-dependencies-using-nix-haskellforall-com/22726/5

@mlyxshi
Copy link

mlyxshi commented Dec 29, 2022

Are you interested in this relative issue?
#7527

@nrdxp
Copy link
Contributor Author

nrdxp commented Dec 29, 2022

Are you interested in this relative issue? #7527

You could use this PR to accomplish your goal by simply processing your derivations through nix-store -qU (see the example I added in the doc for how to get build & runtime dependencies) before sending them to nix copy. As Sandro rightly pointed out though, it will still result in a few redundant dependencies, since if the entire dependency closure does not exist in a single cache, Nix throws an error. To improve the situation we just have to allow Nix to continue instead of erroring in this situation, and you could then just pass a --no-recursive to nix copy and cache only those that don't exist elsewhere.

In general, my recent experience writing a Nix-centric CI has kind of changed my perspective somewhat, and I'm starting to feel like we should take a more "git-like" approach to our cli, rather than replacing the older, more verbose commands outright, we should stabalize them as much as possible and use them as the "plumbing" to the higher level commands.

@nrdxp
Copy link
Contributor Author

nrdxp commented Dec 30, 2022

Because there isn't really a nice standalone solution for this atm, I went ahead and extracted the logic as a standalone tool for now until this is accepted:
https://github.com/divnix/nix-uncached

@fricklerhandwerk fricklerhandwerk added feature Feature request or proposal cli The old and/or new command line interface labels Dec 30, 2022
@edolstra
Copy link
Member

I think functionality like this should be added to the new CLI rather than the legacy CLI. It seems to me that what we really want here is a CLI interface to Store::queryMissing() (ideally with JSON output).

Also "uncached" is a vague term. What does it mean for a path to be uncached? Maybe "unsubstitutable" is better.

@nrdxp
Copy link
Contributor Author

nrdxp commented Dec 30, 2022

I think functionality like this should be added to the new CLI rather than the legacy CLI. It seems to me that what we really want here is a CLI interface to Store::queryMissing() (ideally with JSON output).

I can add it to nix path-info I think, fairly easily. Are these "legacy" commands intended to be completely removed at some point? My first pass I did use JSON, but it seemed a bit superflous. Since nix path-info already has a json flag though, I can probably include this information as an extra boolean in that output when passed.

Even so, nix path-info doesn't seem to have an equivalent atm for nix-store -qR --include-outputs, so I would still need that command to forward to the list to check in nix path-info.

Also, my first implementation did just use Store::queryMissing(), but that isn't exactly what I wanted, since I don't care if the path exists locally or not, I just want to know if it exists in a substituter (so it can be sent to nix copy to add it there, for example). If you want to include the local store as part of the equation you can add --extra-substituters auto to the invocation to make that explicit.

Also "uncached" is a vague term. What does it mean for a path to be uncached? Maybe "unsubstitutable" is better.

Yeah I was thinking that myself, for now I just used the terminology that has been used elsewhere in the community so far, but I was also thinking of using unsubstitutable instead just last night, only thing is that it's quite long, although I guess with a short flag it would still be fine.

@nrdxp
Copy link
Contributor Author

nrdxp commented Dec 31, 2022

@edolstra, I implemented the same logic for nix path-info, including allowing it the --json to add a new boolean output showing whether a path is cached or not.

However there are a few downsides to the nix path-info implementation compared to the nix-store one so far. In order to explain that, understand that my main usecase for this is a novel CI system which querys information about the paths before they have been built, to decide if they need to be built, etc. I acquire the paths I need from a flake using nix eval which produces a special json we then want to manipulate based on information we are going to query, like the cache status.

Importantly, I only want to run evaluation once, and only once, for a given CI run, so passing flake refs around isn't desirable as they may induce additional evaluation. Instead we query information on the derivations directly. This just doesn't work well with nix path-info.

With the nix-store implementation it works perfectly, I can just query the derivation with nix-store -qR --include-outputs and then filter that through my new nix-store -qU and I've got the information I need without having to build a thing, minus some additional post-processing which can be easily done with jq and nix show-derivation (to remove locally built derivations from the equation, etc).

Specifically these are the quircks I've encountered with the nix path-info implementation so far:

  • as already mentioned, there is no equivalent to the --include-outputs flag, so its either querying buildtime or runtime deps, not both.
  • If I pass a derivation (which is all I have), then the command will not return information about it's outpath. I have to explicitly pass the outpath for that, but that is a non-starter for my usecase since I don't have them built yet (and don't want to necessarily). This means there is no way to mimic the behavior of --include-outputs by simply calling nix path-info twice (with and without --derivation) since passing an outpath that doesn't exist just induces an error.

For these reasons I propose we keep both implementations until both are more in-line, and as mentioned in my previous comment, we may want to consider a more "plumbing" and "porceline" approach for flakes 2.0, to keep drift between the old and new commands from becoming a big issue.

I like the low-level control the legacy commands give me when I need that. Say in a complex CI system that focuses on doing as little work as possible, but I like the higher level commands for ease of use when working on my own system. I think there is an argument to be made for keeping both around and taking implementations from the lower-level commands and reusing them for their flake counterparts.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/what-would-you-like-to-see-improved-in-nix-cli-experience/24012/58

clone["substitute"] = !infos.empty();

auto jsonList(jsonList_.lock());
jsonList->emplace_back(clone);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to really emplace it at the back without unneeded copies i suggest this:

Suggested change
jsonList->emplace_back(clone);
jsonList->emplace_back(std::move(clone));

}

std::function<void(nlohmann::json::basic_json::value_type)> checkSubs;
checkSubs = [&](const nlohmann::json::basic_json::value_type & value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same suggestion with the nested lambda expression here. all the type info is relatively noisy and can be reduced.

auto clone = value;
std::string storePathS = value["path"];
auto info = store->queryPathInfo(store->followLinksToStorePath(storePathS));
auto storePath = info->path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like some const ... & could help the reader understand what really needs to be a (mutable) copy and what not, as well as reduce some redundant copying.

Sync<std::set<std::string>> results_;

std::function<void(StorePath)> printPath;
printPath = [&](const StorePath & storePath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the parameter type from the std::function and the actual assignment differ by having and not having const ... & qualifiers. i suggest using the same nested lambda expression approach here, too.

}

std::cout << std::endl;
auto results(results_.lock());
results->emplace(storePathS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using std::move will help avoiding a needless copy.

Comment on lines 187 to 190
auto results(results_.lock());
for (auto & str : *results) {
std::cout << std::endl << str;
}
Copy link
Contributor

@tfc tfc Dec 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::endl means to emit a new line character and flush the terminal. But this code suggests that you mean to separate all strings by newlines and then flush only once at the end, so i suggest this:

Suggested change
auto results(results_.lock());
for (auto & str : *results) {
std::cout << std::endl << str;
}
for (const auto & str : *results_.lock()) {
std::cout << '\n' << str;
}
std::cout << std::endl;

@nrdxp nrdxp marked this pull request as draft January 1, 2023 00:31
@nrdxp
Copy link
Contributor Author

nrdxp commented Jan 1, 2023

Thanks for the review, I'm moving to draft for now as the implementation needs work. After some initial benchmarks, the threadpool didn't actually help anything and is about the same as using a single thread, even when setting the threadpool limit to something super high like 1000.

@nrdxp nrdxp marked this pull request as ready for review January 10, 2023 17:33
@nrdxp
Copy link
Contributor Author

nrdxp commented Jan 10, 2023

Taking out of draft as after some testing it seems the performance issues already existed and were not introduced by me: #7575

I will go ahead and continue my effort here, address the review comments, and possibly simplify the implementation by just removing the threadpool unless/until we find a solution to the performance issues. That should probably be handled in a separate PR though anyway.

@nrdxp nrdxp changed the title nix-store: implement --query --uncached nix-store: implement --query --unsubstitutable Jan 11, 2023
@nrdxp nrdxp force-pushed the uncached branch 2 times, most recently from 6d692d5 to 65e57dd Compare January 11, 2023 22:36
Resolves NixOS#3946 by providing an efficient means of querying cache status
of the passed paths.

For every path passed, it will be returned on standard output only if it
is not available in any of the configured substituters.

StorePathSet storePaths;
for (auto const & str : opArgs) {
storePaths.emplace(store->followLinksToStorePath(str));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation here and below is wrong (should be 4 characters per level).

}

for ( auto & path : storePaths) {
if (substitutablePaths.find(path) == substitutablePaths.end())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (substitutablePaths.find(path) == substitutablePaths.end())
if (!substitutablePaths.count(path))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not short-circuit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way? On an std::map it should be equally efficient.

But since we're fine with C++20 now, you can also use substitutablePaths.contains(path).

@tomberek
Copy link
Contributor

adding to make the cross-link easier to find: #7587

@nrdxp
Copy link
Contributor Author

nrdxp commented Feb 11, 2023

closing in favour of the aforementioned PR

@nrdxp nrdxp closed this Feb 11, 2023
@tomberek tomberek reopened this Feb 11, 2023
@tomberek
Copy link
Contributor

Miss click

@tomberek tomberek closed this Feb 11, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-01-20-nix-team-meeting-minutes-25/25432/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli The old and/or new command line interface feature Feature request or proposal
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

support building only drvs that lack substitutes (aka nix-build-uncached)
8 participants