Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring: move hex and hasher modules from util module to util crate #11429

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nrc
Copy link
Member

@nrc nrc commented Nov 27, 2022

This is a trivial code-motion refactoring, moving the hex and hasher modules from the util module to the util crate. This is preparatory work for some other refactoring I've been looking at to move part of fingerprinting into its own crate. I've kept this to its own PR since I think it is easier to review and is a change that there is no reason not to land (other than churn, I guess) even if we don't want the larger refactoring.

r? @arlosi

@rustbot
Copy link
Collaborator

rustbot commented Nov 27, 2022

Failed to set assignee to arlosi: invalid assignee

Note: Only org members, users with write permissions, or people who have commented on the PR may be assigned.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 27, 2022
@nrc nrc force-pushed the utils branch 2 times, most recently from 93734ea to 23c5ae5 Compare November 27, 2022 13:42
@ehuss
Copy link
Contributor

ehuss commented Nov 27, 2022

Can you say more about what your long term goals are? The fingerprint stuff is all pretty deeply internal to cargo, and not something that should expect any kind of stability. Also, parts of it will likely be radically changed in the coming year.

If we are to land something like this, the version of cargo-util needs to be bumped.

@nrc
Copy link
Member Author

nrc commented Nov 28, 2022

Can you say more about what your long term goals are? The fingerprint stuff is all pretty deeply internal to cargo, and not something that should expect any kind of stability. Also, parts of it will likely be radically changed in the coming year.

Long-long-term I'd like to make parts of Cargo more pluggable, so for example a distributed or shared cache could provide built components rather than having Cargo always provide them from its cache or by building. But that is a long way off and has a lot of questions. My current concrete long-term plan is to put fingerprinting in its own crate so that we can at least experiment with other programs using it. I imagine that to do so we'd have to match versions between Cargo and the fingerprint crate, so stability is not really an issue. And the API exposed would be pretty much 'build description' -> 'hash' so the internals of fingerprinting would not be exposed nor would they need to be stable, so I think this could all be done without really affecting Cargo's obligations (modulo whatever the radical changes are).

Signed-off-by: Nick Cameron <nrc@ncameron.org>
@bors
Copy link
Collaborator

bors commented Dec 2, 2022

☔ The latest upstream changes (presumably #11445) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss
Copy link
Contributor

ehuss commented Dec 10, 2022

Long-long-term I'd like to make parts of Cargo more pluggable, so for example a distributed or shared cache could provide built components rather than having Cargo always provide them from its cache or by building.

Although I agree that it would be nice to make Cargo more extensible, I'm unclear on this particular example. sccache already exists to provide a shared cache. Or you can use a shared TARGET directory. There are also other efforts already planned for a built-in shared cache (but which require several other steps to get there).

And the API exposed would be pretty much 'build description' -> 'hash' so the internals of fingerprinting would not be exposed nor would they need to be stable,

My concern here is that the change to fingerprint might not use hashes at all. Or if it did, it may change significantly. Also, the "build description" is currently a Unit, which requires including the majority of cargo. At that point, you may as well just link against the cargo crate. Can you say why linking with cargo isn't sufficient for doing experiments?

Or, perhaps another question, would you be interested in helping with the existing plans for having a shared cache?

so I think this could all be done without really affecting Cargo's obligations (modulo whatever the radical changes are).

It's not always a case of obligations, but the experience that users will have with third-party packages. If third-party packages frequently break, then users may not associate their frustration with that package, but instead with cargo, or with Rust overall. If we foster an environment where people are encouraged to use unstable interfaces, it can result in a more brittle ecosystem overall.

@weihanglo weihanglo added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 26, 2022
@nrc
Copy link
Member Author

nrc commented Feb 6, 2023

Sorry for the late reply. My hope was that this PR is harmless enough that we could merge it to better facilitate local experimentation and we can discuss the controversial stuff later. But diving in ...

Although I agree that it would be nice to make Cargo more extensible, I'm unclear on this particular example.

The motivation here is to integrate with other build systems which already have their own caching implemented. Having something built-in to Cargo or using sccache aren't alternatives because it is the integration with an existing cache rather than caching itself which is the requirement.

My concern here is that the change to fingerprint might not use hashes at all. Or if it did, it may change significantly. Also, the "build description" is currently a Unit, which requires including the majority of cargo. At that point, you may as well just link against the cargo crate. Can you say why linking with cargo isn't sufficient for doing experiments?

So this is all very experimental, so perhaps linking with Cargo will work, but my thinking is that external software would use the fingerprint crate to be able to make fingerprints which are compatible with Cargo's. Linking against all of Cargo might work, but I guess we'd need to make the API available at the least.

In terms of the Unit, my hope is that we could define some plain old data structure which contains all the info to be hashed into a fingerprint which could be created easily from a Unit or by another build system, and that would be the input to the fingerprint crate for generating a fingerprint.

Or, perhaps another question, would you be interested in helping with the existing plans for having a shared cache?

I'd love to find out more about them and if it we can make it work with what we want to do, then I'd be keen. Is there a design doc or similar I could look at?

@ehuss
Copy link
Contributor

ehuss commented Feb 7, 2023

I'd love to find out more about them and if it we can make it work with what we want to do, then I'd be keen. Is there a design doc or similar I could look at?

I don't think there is anything explicitly written down. There is a bit of a long journey:

  • Add support for last-use tracking. Having a cache that grows endlessly I think would be a bad experience. I have a branch with some experiments for first introducing sqlite for a database for tracking the registry cache to start getting some experience with it and to see if users have problems with it. This will introduce a new subcommand (something like cargo gc or cargo cache, or maybe just extend cargo clean, I'm not sure).
  • Extend the use of sqlite to the registry cache data itself (weihang did some experiments on this) to gather more experience and stress testing.
  • Add last-use tracking to target directory artifacts.
  • Rewrite the fingerprint tracking to use sqlite. This will allow more precise tracking of reasons for rebuilds among other things.
  • Figure out what to do about Reconsider RUSTFLAGS artifact caching. #8716. Changing RUSTFLAGS breaks the cache, and we don't have a solution.
  • Refactor the code to support multiple target directories.
  • Introduce a shared target directory that would keep dependencies that can be shared across projects.

The first step has been pretty high on my list to resurrect.

There are also be some big questions about whether or not this is the right direction, versus something more sophisticated (such as what sccache does, or other build tools). For example, to your point about using another system's cache, it may be appropriate to have some sort of pluggable system. But it's too vague for me to comment on, since I'm not really clear what that means. Like, there is some mechanism for storing files? And you would create some kind of conversion script that copies those into and out of the target directory? I think it would help to have the vision clearly written down with examples of actual build systems and how it could integrate.

@weihanglo
Copy link
Member

Convert this into draft at this moment. Will look into how it is useful in the long-term.

Please feel free to close it or do experiments on it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants