cli: evolog: show associated operation for recent commits (PoC) #6405

yuja · 2025-04-24T12:23:15Z

This patch isn't intended for merging. The idea is that we'll record operation ids where the commits were introduced in index, and jj evolog will use this information to extract the operation object. Since commits should be added only once to the index, this should be compatible with the current (append-only, immutable) index model. (If we were to index newly visible/hidden commits per operation, we would need separate index structure.)

I think the predecessors can be stored in the operation object, but I don't have strong feeling about that. Even if we decide to remove the predecessors information at all, we can still benefit from the "commit_id: operation_id" index. jj evolog would first look up the originating operation, then compute diffs of visible commits to deduce the predecessor commit. In this implementation, visibility changes (e.g. abandoned) wouldn't be included (unlike the operation log.)

#963

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

martinvonz

The idea is that we'll record operation
ids where the commits were introduced in index, and jj evolog will use this
information to extract the operation object.

I'm not sure how we're going to make this work at Google. The index there is separate from operation log. You can create a commit without creating any operation and it will still be in the index. So it's undefined which operation introduced a given commit.

Thinking about what information we need for the feature, and in particular assuming we don't have predecessor pointers, we need to first find the operations that modified a given change (i.e. added or removed commits with the given change id). That seems like something that would fit into an "opset". Maybe we'd want a function on OpStore trait for evaluating an opset. It can still be expensive to load the repo at each operation to find the relevant commit ids. I don't know if that means that we should have something more specific for this use case than an opset evaluator.

yuja · 2025-05-03T04:00:44Z

I'm not sure how we're going to make this work at Google. The index there is separate from operation log. You can create a commit without creating any operation and it will still be in the index. So it's undefined which operation introduced a given commit.

Thinking about what information we need for the feature, and in particular assuming we don't have predecessor pointers, we need to first find the operations that modified a given change (i.e. added or removed commits with the given change id). That seems like something that would fit into an "opset". Maybe we'd want a function on OpStore trait for evaluating an opset. It can still be expensive to load the repo at each operation to find the relevant commit ids. I don't know if that means that we should have something more specific for this use case than an opset evaluator.

Something like index.resolve(op_store, commit_id|change_id|op_set) -> graph_iter|commit_operation_pair or op_store.resolve(index, ..) -> .. seems good if that works for Google.

FWIW, I think evolution log should be keyed by commit_id, not by change_id. change_id-based history might work if the forge can guarantee that there are no duplicated changes, but that's no the case for opensource Git repos.

PhilipMetzger · 2025-05-03T13:05:57Z

FWIW, I think evolution log should be keyed by commit_id, not by change_id. change_id-based history might work if the forge can guarantee that there are no duplicated changes, but that's no the case for opensource Git repos.

I pretty much think this should be backend-dependent since some them will probably align with their forges.

Thinking about what information we need for the feature, and in particular assuming we don't have predecessor pointers, we need to first find the operations that modified a given change (i.e. added or removed commits with the given change id). That seems like something that would fit into an "opset". Maybe we'd want a function on OpStore trait for evaluating an opset. It can still be expensive to load the repo at each operation to find the relevant commit ids. I don't know if that means that we should have something more specific for this use case than an opset evaluator.

SGTM

This patch isn't intended for merging. The idea is that we'll record operation ids where the commits were introduced in index, and `jj evolog` will use this information to extract the operation object. Since commits should be added only once to the index, this should be compatible with the current (append-only, immutable) index model. (If we were to index newly visible/hidden commits per operation, we would need separate index structure.) I think the predecessors can be stored in the operation object, but I don't have strong feeling about that. Even if we decide to remove the predecessors information at all, we can still benefit from the "commit_id: operation_id" index. `jj evolog` would first look up the originating operation, then compute diffs of visible commits to deduce the predecessor commit. In this implementation, visibility changes (e.g. abandoned) wouldn't be included (unlike the operation log.) jj-vcs#963

There are two major goals: 1. garbage-collect predecessor commits referenced by immutable commits. 2. show operations alongside predecessors in "jj evolog". The predecessors field will be removed from the Commit object. Maybe we can also remove (writer side of) the extras table from GitBackend. "jj evolog" will traverse the operation history to build an evolution graph. This will be slower than the current implementation, but it seems tolerable for mid-size repository stored in a local disk. (when the page cache is warm) % time jj op log -T'"."' --no-graph | wc -c 50459 jj op log -T'"."' --no-graph 0.12s user 0.31s system 103% cpu 0.418 total Suppose we're interested in recent modifications, the traversal will often terminate early. I also have an idea for indexing originating operations. jj-vcs#6405 For old operations which didn't record predecessors, "jj evolog" will fall back to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.

There are two major goals: 1. garbage-collect predecessor commits referenced by immutable commits. 2. show operations alongside predecessors in "jj evolog". The predecessors field will be removed from the Commit object. Maybe we can also remove (writer side of) the extras table from GitBackend. "jj evolog" will traverse the operation history to build an evolution graph. This will be slower than the current implementation, but it seems tolerable for mid-size repository stored in a local disk. (when the page cache is warm) % time jj op log -T'"."' --no-graph | wc -c 50459 jj op log -T'"."' --no-graph 0.12s user 0.31s system 103% cpu 0.418 total Suppose we're interested in recent modifications, the traversal can often be terminate early. I also have an idea for indexing originating operations. jj-vcs#6405 For old operations which didn't record predecessors, "jj evolog" will fall back to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.

There are two major goals: 1. garbage-collect predecessor commits referenced by immutable commits. 2. show operations alongside predecessors in "jj evolog". The predecessors field will be removed from the Commit object. Maybe we can also remove (writer side of) the extras table from GitBackend. "jj evolog" will traverse the operation history to build an evolution graph. This will be slower than the current implementation, but it seems tolerable for mid-size repository stored in a local disk. (when the page cache is warm) % time jj op log -T'"."' --no-graph | wc -c 50459 jj op log -T'"."' --no-graph 0.12s user 0.31s system 103% cpu 0.418 total Suppose we're interested in recent modifications, the traversal can often be terminate early. I also have an idea for indexing originating operations. #6405 For old operations which didn't record predecessors, "jj evolog" will fall back to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.

yuja · 2025-05-26T12:58:45Z

Superseded by #6628.

yuja force-pushed the push-pumxqzrrptxw branch 3 times, most recently from c495a99 to f1585c4 Compare May 1, 2025 01:56

martinvonz reviewed May 2, 2025

View reviewed changes

yuja force-pushed the push-pumxqzrrptxw branch 2 times, most recently from a403a85 to 3e633ae Compare May 15, 2025 11:48

yuja force-pushed the push-pumxqzrrptxw branch from 3e633ae to 9339b1d Compare May 15, 2025 12:06

yuja mentioned this pull request May 21, 2025

cli: util: add clear-predecessors command #5715

Closed

4 tasks

yuja closed this May 26, 2025

yuja deleted the push-pumxqzrrptxw branch May 26, 2025 12:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cli: evolog: show associated operation for recent commits (PoC) #6405

cli: evolog: show associated operation for recent commits (PoC) #6405

Uh oh!

yuja commented Apr 24, 2025

Uh oh!

martinvonz left a comment

Uh oh!

yuja commented May 3, 2025

Uh oh!

PhilipMetzger commented May 3, 2025

Uh oh!

yuja commented May 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cli: evolog: show associated operation for recent commits (PoC) #6405

cli: evolog: show associated operation for recent commits (PoC) #6405

Uh oh!

Conversation

yuja commented Apr 24, 2025

Checklist

Uh oh!

martinvonz left a comment

Choose a reason for hiding this comment

Uh oh!

yuja commented May 3, 2025

Uh oh!

PhilipMetzger commented May 3, 2025

Uh oh!

yuja commented May 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants