Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash.apply provides the same hash for all directories #339

Open
Baccata opened this issue Oct 6, 2022 · 2 comments
Open

Hash.apply provides the same hash for all directories #339

Baccata opened this issue Oct 6, 2022 · 2 comments

Comments

@Baccata
Copy link

Baccata commented Oct 6, 2022

See here.

When a file is a directory, you get a FileNotFoundException when instantiating a FileInputStream with it. This makes it harder to write SBT tasks that need to invalidate some cache upon a file being added/removed from a directory.

@eed3si9n
Copy link
Member

eed3si9n commented Oct 6, 2022

Not sure if I have the full context on what situation would require directory hash, but as a reference Bazel also behaves in a similar way wherein if you use directory as input or output you get a warning saying "we're not gonna invalidate correctly for directory" or something along the lines.

See also https://www.scala-sbt.org/1.x/docs/Caching.html#Tracking+file+attributes and other example regarding caching.

@Baccata
Copy link
Author

Baccata commented Oct 20, 2022

Sorry for the late reply.

Not sure if I have the full context on what situation would require directory hash.

There's various reasons why you'd want it. Smithy4s, a library I maintain, abstracts the business logic away from SBT, as we also provide a CLI and a Mill plugin. Therefore, we essentially have a CodegenArgs => Seq[GeneratedFiles] function that build tools can delegate to.

Now the thing is that we want to use whole CodegenArgs datatype as a a cache key, because a change in any of the data it contains may impact the code rendering, and therefore any change should invalidate the cache. This CodegenArgs datatype contains a bunch of things, including reference to files that may be indiscriminately files or folders. Because it doesn't JUST contain files, FileFunction doesn't cut it for us.

We naively thought files via the functions provided by SBT would hash folders, like it happens in Mill's PathRef, turns out it wasn't the case and we had to take action, after a couple hours of debugging what was very puzzling behaviour.

Now whether Mill's way of hashing folders is actually desirable in SBT is not my place to say, but silently accepting folders and recovering by hashing an empty string is definitely not great, as it very much gives a false-positive behaviour. I think throwing an exception would be better, as at least the implementor could be made aware (via testing) of the problem, and take action. Logging a warning would be also acceptable, though easier to miss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants