Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchSources is redundant and not precise #3718

Closed
OlegYch opened this issue Nov 6, 2017 · 21 comments
Closed

watchSources is redundant and not precise #3718

OlegYch opened this issue Nov 6, 2017 · 21 comments
Labels
area/watch Issues around ~ (triggered execution, watch) uncategorized Used for Waffle integration
Milestone

Comments

@OlegYch
Copy link
Contributor

OlegYch commented Nov 6, 2017

This is more of an idea than an issue.

Recent issues with ~ like #3695 and #3501, and continuing performance problems with change detection (although slightly alleviated by NIO watching and upcoming enhancements in hashing) have led me to think that watchSources needs to go.

Instead ~ has to deduce the set of files consumed by each task and feed the changes to the task.
One way to do that might be to track files accessed by each task in a bit of global mutable in-memory state - ie each task would be given a (cache: (File => (File => T) => T), modified: Set[File]). This state intentionally would not survive sbt restart, but imho this is ok at the moment, though it would be nice to unify that with existing caching facilities.

Perhaps there are better ways, but this was the first thing i could imagine and i'd like to receive some feedback on this

@ShaneDelmore
Copy link

I use watchSources to depend on more than what is referenced during compilation. For example, I may have test oracles files I want to compare output too, zipped files, json files, things like that, and when my code changes I want to rerun my tests, but also if any of my test oracles change I want to rerun my tests. Isn't this what watchSources is designed for?

@OlegYch
Copy link
Contributor Author

OlegYch commented Nov 6, 2017

@ShaneDelmore good point
run and test depend on products which reads config files, and you could make it depend on other files which are only read at runtime, just like you do with watchSources, so that usecase should be well covered

@ShaneDelmore
Copy link

Just to make sure I am understanding this, is the proposed change to add test oracles to products instead of watchSources to enable ~test to catch changes to test resources?

@OlegYch
Copy link
Contributor Author

OlegYch commented Nov 7, 2017

@ShaneDelmore any of the tasks that all of test testQuick and testOnly depend on, Test / products might be a good option
the proposed change is not limited to tests though, but rather to any task that reads files
currently watchSources has no idea what task it's used for, and therefore is either too eager or misses stuff and cannot be used for incremental change detection

@ShaneDelmore
Copy link

In that case, I see your point but fear the outcome. I imagine conversations like these:

NewSbtUser: I want to make ~test rerun tests when I change a file, how do I do it?
SbtExpert: Add the files to watchSources.
NewSbtUser: That makes sense, thanks!

Vs
NewSbtUser: I want to make ~test rerun tests when I change a file, how do I do it?
SbtExpert: Add the files to any that test, testQuick, and testOnly depend on.
NewSbtUser: Huh?

I’m trying to see how this conversation ends with a happy user. I’m not saying veto it for that reason alone, but hopefully we at least have a good answer for this.

@eed3si9n eed3si9n added the area/watch Issues around ~ (triggered execution, watch) label Jun 29, 2018
@eed3si9n eed3si9n added the uncategorized Used for Waffle integration label Sep 18, 2018
@eed3si9n eed3si9n removed the x/waffle label Nov 8, 2018
@eatkins
Copy link
Contributor

eatkins commented Nov 9, 2018

@OlegYch I am not sure that your suggestion is really practical. I can't imagine an implementation that wouldn't require adding a wrapper for every task that reads a file and much of that file consumption may be downstream from sbt. I do, however, think that watchSources should be better scoped than they currently are. For example, suppose that project foo depends on project bar but only for the compile configuration, if you touch a bar test source file, it will trigger a rebuild in ~foo/compile which doesn't make sense.

In general, this is really hard to get right without manual intervention. For example, as a simple strawman, suppose that we reworked things so that there was a single task:
val allSourceFiles = taskKey[Seq[Source]]("All of the compilation source files")
that was defined as

allSourceFiles := Seq(
  Source(unmanagedSourceDirectories.value, "*.scala" | "*.java", NothingFilter),
  Source(managedSourceDirectories.value, "*.scala" | "*.java", NothingFilter)
)

We still couldn't reuse this task for watchSources because the managed source files are updated during compilation, which would cause them to trigger a build, leading to an infinite loop. I believe that the cache of modified files would also suffer from this problem. The only way to fix that would be to add a bunch of exclusion rules, at which point we may as well just whitelist the files that should trigger anyway.

Where we can do better though is with documentation. I don't think that the documentation of ~ is especially helpful. I also have added some functionality that is completely undocumented and am hoping to update that before the 1.3.0 release.

@OlegYch
Copy link
Contributor Author

OlegYch commented Nov 9, 2018

@eatkins yep this definitely needs more thought
maybe we could take a look at what bazel, mill etc do with their pervasive caching

@eatkins
Copy link
Contributor

eatkins commented Nov 9, 2018

Yeah, there's no doubt that there are things we can learn from other tools. That being said, last time I looked at Mill back in May, it was clear to me that if big projects start using it, they are going to have to rewrite the file watching portion. It currently uses a dead simple polling implementation that has terrible performance once the number of files it's monitoring exceeds O(1000).

@OlegYch
Copy link
Contributor Author

OlegYch commented Nov 9, 2018

@eatkins that's kinda beside the point, this issue is more about accuracy of watching changed sources (or task inputs in general) which is pretty low atm in sbt
better scoping of watchSources might be a start, but accurately tracking all inputs should be the goal imo

@eatkins
Copy link
Contributor

eatkins commented Nov 10, 2018

Fair enough, though I still think it's worth pointing out that mill is not definitively better than sbt at watching for source changes. In my opinion, the main difference is that mill is much more opinionated about how the build is structured than sbt is. I don't think that it is more accurate per se. If you do not override the default source directories and you don't try to do anything other than compile scala and java sources, then I think sbt is just as accurate as mill.

Sbt makes it really easy to add any file as a watch source regardless of whether or not it is actually a compilation source. Granted, this functionality is not well documented so probably not readily available to most end users. That is more of a documentation problem than a technical problem. To me, the big technical challenge is that practically everything is override-able in sbt and also that with a large user base, there are a ton of nasty edge cases, e.g. symlinked directories or source files. If a project deviates from standard layout, or if it adds extra build steps, like running make in a jni project, then it is on the user to manually update the watchSources. In general, this is quite easy though, because, take the jni case, you can just do something like

watchSources += Source(jniDir, "*.c" | "*.h", NothingFilter)

and now it will trigger on any changes to all of those files. I'm not sure how easy it is to do something like that with other build tools.

Out of curiosity, what are your biggest issues with ~ today? In my projects, I find that it works pretty consistently well in sbt > 1.1.5, and especially in sbt 1.3.x (which is available as a nightly build). More specifically, in which situation do you find it to be inaccurate?

@eatkins
Copy link
Contributor

eatkins commented Nov 10, 2018

I should add that I've put an enormous amount of work into improving ~ (much of which is still pending in the 1.3.0 release) and I want to ensure that it is meeting the needs of end users. For the most part, I believe that we've solved the hard problem: reliably watch any directory or file for changes even when there are symlinks involved. Watching the right files should be relatively easy in comparison.

@OlegYch
Copy link
Contributor Author

OlegYch commented Nov 10, 2018

@eatkins atm the immediately visible problem is that almost no plugins that use custom source dirs (eg sbt-protoc) adjust watchSources
i don't think that's an easy problem to solve, and will probably require changes in sbt api
otherwise the watching subsystem seems to work pretty good after your recent changes (in 1.2.x)

@eatkins
Copy link
Contributor

eatkins commented Nov 10, 2018

Aah, yes, that makes a ton of sense. I think that if the watchSources were scoped by task (and by this I really mean that the ~ command should correctly apply scopes to the watchSources key given the tasks that it's going to run), then it wouldn't be too bad to get plugins to work with ~ out of the box. If this functionality existed, then we could update the docs to suggest that plugin authors add a $PLUGIN_TASK / watchSources := ... in their plugin definition and reach out to the more popular plugins and try and get them to add watchSources.

@OlegYch
Copy link
Contributor Author

OlegYch commented Nov 14, 2018

this sounds like a good start (though i imagined more pervasive inputs tracking)
not sure how this will work though, eg if you run compile and it depends on protoc (via generatedSources), how would ~compile know about protoc / watchSources ?

@eatkins
Copy link
Contributor

eatkins commented Nov 14, 2018

I could be mistaken, but I think it's possible to get all of the dependent scopes for a task. If that's the case, then if compile depends on protoc, then we can append protoc / watchSources to the compile / watchSources list.

@eatkins
Copy link
Contributor

eatkins commented Nov 16, 2018

I'm actually keen to address this because I've run into some frustrations with the scoping myself recently.

@eatkins
Copy link
Contributor

eatkins commented Nov 21, 2018

@OlegYch my hunch was correct. I will have a PR up soon that works exactly the way I described in the comment.

@eatkins
Copy link
Contributor

eatkins commented Nov 23, 2018

@OlegYch I've been thinking about it some more and I don't think I quite understood what you were saying. Upon further reflection, I think that you're right that we should deprecate and remove watchSources. My current idea is that we should add a new key (I'm not committed to this name):

val taskSources = settingKey[Seq[sbt.io.internal.Source]]("The sources for a particular task")

The idea is that taskSources are all of the files that specify the unmanaged inputs to a task. Note, however, that because they are specified with sbt.io.internal.Source rather than Seq[File] the taskSources key can be a SettingKey rather than a TaskKey, which is better for performance.

To reimplement the compile task, we'd have something like

Compile / compile / taskSources := unmanagedSourceDirectories.value.map {
  d => Source(d.value, includeFilter in unmanagedSources, excludeFilter in unmanagedSources)
}
Compile / compile / sources := taskSources.value.flatMap {
  s => IO.listFiles(s.base, s.includeFilter -- s.excludeFilter)
} ++ managedSources.value  

With these definitions, we could use the taskSources for watchSources. The improvement is that any changes to taskSources automatically propagate to watchSources. For many (most?) tasks, there would be no distinction between sources and taskSources.

I think the syntax for all of this can be made pretty nice. For example, in the protobuf plugin use case, we should be able to do something like:

protoc / taskSources := protocSourceDirectory.value / ** / *.proto

and then the protoc task (and any task that transitively depens on it) will automatically watch all "*.proto" files.

If we had this infrastructure, we could also build out caching/persistence of setting/task output similar to what mill does: http://www.lihaoyi.com/mill/page/tasks.html. This would be nice because there are certain tasks that don't neatly fit into the task/setting dichotomy because they should only run if their inputs have actually changed. For compile, we almost get this for free with zinc, but most other tasks will run even if there is no chance that their output will change.

@eatkins
Copy link
Contributor

eatkins commented Nov 23, 2018

@eed3si9n @dwijnand I'm curious what you think about ^^

@dwijnand
Copy link
Member

Sounds reasonable to me.

@dwijnand
Copy link
Member

According to #4512 (comment), fixed in #4512.

@dwijnand dwijnand added this to the 1.3.0 milestone May 31, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/watch Issues around ~ (triggered execution, watch) uncategorized Used for Waffle integration
Projects
None yet
Development

No branches or pull requests

5 participants