-
Notifications
You must be signed in to change notification settings - Fork 1.3k
index/stage: move repo collection logic to index #8782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| if callable(callback): | ||
| callback() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When self.index is called, it might go on and collect stages. So the callback is there to signify that collection was completed, and that we are moving to check graph.
This is used in dvc add to show Checking graph status output.
Codecov ReportBase: 93.63% // Head: 93.53% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #8782 +/- ##
==========================================
- Coverage 93.63% 93.53% -0.10%
==========================================
Files 456 457 +1
Lines 36174 36171 -3
Branches 5241 5245 +4
==========================================
- Hits 33871 33834 -37
- Misses 1805 1832 +27
- Partials 498 505 +7
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
| return self.url or self.root_dir | ||
|
|
||
| @cached_property | ||
| def index(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other suggestion that I have is, as we discussed repo.index is a full index, so an API to get a partial index probably should not be included inside Index, but should be at Repo level, something like repo.index_view().
Since Index now is more like a dataclass, we could, in theory, create an Index with partially filled data.
def index_view(self, directory):
return Index(self, stages_from_dir, metrics_from_dir, plots_from_dir, params_from_dir)But the collection happens eagerly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense 👍
2e790eb to
1504739
Compare
|
@efiop, any thoughts on this? |
|
@skshetry Sorry for the delay, trying to get to this. I'll take a look asap. |
Haven't made many changes here, but now the Index collection happens eagerly (although index is still created lazily) when
Index.from_repo()orIndex.from_file()is called.Indexrather than as part ofRepo.stage.collect_repo.repo._skip_graph_checkswas broken, it has been fixed by introducingRepo.ensure_graph_correctness_with(stages=stages)API that honors that attribute.Index.from_repo().Repo/StageLoad/Index.Motivation
The Index had
stagescached property, which was lazily loaded. We extended this and addedmetrics/params/plotsproperty which was also acached_property. It was not possible to collect and fill all of these properties at the same time, but for performance reasons, we wanted to load everything at once.So I changed the implementation of all these properties to invoke
Index._collect(), which would collect and fill all of the properties at the same time (even if you ask for just one).https://github.com/iterative/dvc/blob/1d5de9c5ba5909f6eaa0911c3b2a691bbf4ea254/dvc/repo/index.py#L117-L127
But that felt hackish, the performance issue is solved but the way we collect feels odd.
It does seem like we are not collecting a unit of things, but multiple things. At least, the cached property makes me think in that way.
So not only in implementation, but to make it look like a single unit of thing to load, the interface also has to match that. So we needed to unify this collection logic.
With this PR, the Index is loaded eagerly when
Index.from_repo()is invoked, and it returns a single unitIndexthat has allstages/metrics/params/plotsin it.The other reason was the realization that the collection logic belonged in
Indexrather than inStageLoad, but that's more obvious.