Skip to content

Conversation

@yuja
Copy link
Contributor

@yuja yuja commented Apr 24, 2025

This patch isn't intended for merging. The idea is that we'll record operation ids where the commits were introduced in index, and jj evolog will use this information to extract the operation object. Since commits should be added only once to the index, this should be compatible with the current (append-only, immutable) index model. (If we were to index newly visible/hidden commits per operation, we would need separate index structure.)

I think the predecessors can be stored in the operation object, but I don't have strong feeling about that. Even if we decide to remove the predecessors information at all, we can still benefit from the "commit_id: operation_id" index. jj evolog would first look up the originating operation, then compute diffs of visible commits to deduce the predecessor commit. In this implementation, visibility changes (e.g. abandoned) wouldn't be included (unlike the operation log.)

#963

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@yuja yuja force-pushed the push-pumxqzrrptxw branch 3 times, most recently from c495a99 to f1585c4 Compare May 1, 2025 01:56
Copy link
Member

@martinvonz martinvonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that we'll record operation
ids where the commits were introduced in index, and jj evolog will use this
information to extract the operation object.

I'm not sure how we're going to make this work at Google. The index there is separate from operation log. You can create a commit without creating any operation and it will still be in the index. So it's undefined which operation introduced a given commit.

Thinking about what information we need for the feature, and in particular assuming we don't have predecessor pointers, we need to first find the operations that modified a given change (i.e. added or removed commits with the given change id). That seems like something that would fit into an "opset". Maybe we'd want a function on OpStore trait for evaluating an opset. It can still be expensive to load the repo at each operation to find the relevant commit ids. I don't know if that means that we should have something more specific for this use case than an opset evaluator.

@yuja
Copy link
Contributor Author

yuja commented May 3, 2025

I'm not sure how we're going to make this work at Google. The index there is separate from operation log. You can create a commit without creating any operation and it will still be in the index. So it's undefined which operation introduced a given commit.

Thinking about what information we need for the feature, and in particular assuming we don't have predecessor pointers, we need to first find the operations that modified a given change (i.e. added or removed commits with the given change id). That seems like something that would fit into an "opset". Maybe we'd want a function on OpStore trait for evaluating an opset. It can still be expensive to load the repo at each operation to find the relevant commit ids. I don't know if that means that we should have something more specific for this use case than an opset evaluator.

Something like index.resolve(op_store, commit_id|change_id|op_set) -> graph_iter|commit_operation_pair or op_store.resolve(index, ..) -> .. seems good if that works for Google.

FWIW, I think evolution log should be keyed by commit_id, not by change_id. change_id-based history might work if the forge can guarantee that there are no duplicated changes, but that's no the case for opensource Git repos.

@PhilipMetzger
Copy link
Contributor

FWIW, I think evolution log should be keyed by commit_id, not by change_id. change_id-based history might work if the forge can guarantee that there are no duplicated changes, but that's no the case for opensource Git repos.

I pretty much think this should be backend-dependent since some them will probably align with their forges.

Thinking about what information we need for the feature, and in particular assuming we don't have predecessor pointers, we need to first find the operations that modified a given change (i.e. added or removed commits with the given change id). That seems like something that would fit into an "opset". Maybe we'd want a function on OpStore trait for evaluating an opset. It can still be expensive to load the repo at each operation to find the relevant commit ids. I don't know if that means that we should have something more specific for this use case than an opset evaluator.

SGTM

@yuja yuja force-pushed the push-pumxqzrrptxw branch 2 times, most recently from a403a85 to 3e633ae Compare May 15, 2025 11:48
This patch isn't intended for merging. The idea is that we'll record operation
ids where the commits were introduced in index, and `jj evolog` will use this
information to extract the operation object. Since commits should be added only
once to the index, this should be compatible with the current (append-only,
immutable) index model. (If we were to index newly visible/hidden commits per
operation, we would need separate index structure.)

I think the predecessors can be stored in the operation object, but I don't have
strong feeling about that. Even if we decide to remove the predecessors
information at all, we can still benefit from the "commit_id: operation_id"
index. `jj evolog` would first look up the originating operation, then compute
diffs of visible commits to deduce the predecessor commit. In this
implementation, visibility changes (e.g. abandoned) wouldn't be included (unlike
the operation log.)

jj-vcs#963
@yuja yuja force-pushed the push-pumxqzrrptxw branch from 3e633ae to 9339b1d Compare May 15, 2025 12:06
yuja added a commit to yuja/jj that referenced this pull request May 24, 2025
There are two major goals:
1. garbage-collect predecessor commits referenced by immutable commits.
2. show operations alongside predecessors in "jj evolog".

The predecessors field will be removed from the Commit object. Maybe we can
also remove (writer side of) the extras table from GitBackend.

"jj evolog" will traverse the operation history to build an evolution
graph. This will be slower than the current implementation, but it seems
tolerable for mid-size repository stored in a local disk.

    (when the page cache is warm)
    % time jj op log -T'"."' --no-graph | wc -c
    50459
    jj op log -T'"."' --no-graph  0.12s user 0.31s system 103% cpu 0.418 total

Suppose we're interested in recent modifications, the traversal will often
terminate early. I also have an idea for indexing originating operations.
jj-vcs#6405

For old operations which didn't record predecessors, "jj evolog" will fall back
to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.
yuja added a commit to yuja/jj that referenced this pull request May 24, 2025
There are two major goals:
1. garbage-collect predecessor commits referenced by immutable commits.
2. show operations alongside predecessors in "jj evolog".

The predecessors field will be removed from the Commit object. Maybe we can
also remove (writer side of) the extras table from GitBackend.

"jj evolog" will traverse the operation history to build an evolution
graph. This will be slower than the current implementation, but it seems
tolerable for mid-size repository stored in a local disk.

    (when the page cache is warm)
    % time jj op log -T'"."' --no-graph | wc -c
    50459
    jj op log -T'"."' --no-graph  0.12s user 0.31s system 103% cpu 0.418 total

Suppose we're interested in recent modifications, the traversal will often
terminate early. I also have an idea for indexing originating operations.
jj-vcs#6405

For old operations which didn't record predecessors, "jj evolog" will fall back
to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.
yuja added a commit to yuja/jj that referenced this pull request May 26, 2025
There are two major goals:
1. garbage-collect predecessor commits referenced by immutable commits.
2. show operations alongside predecessors in "jj evolog".

The predecessors field will be removed from the Commit object. Maybe we can
also remove (writer side of) the extras table from GitBackend.

"jj evolog" will traverse the operation history to build an evolution
graph. This will be slower than the current implementation, but it seems
tolerable for mid-size repository stored in a local disk.

    (when the page cache is warm)
    % time jj op log -T'"."' --no-graph | wc -c
    50459
    jj op log -T'"."' --no-graph  0.12s user 0.31s system 103% cpu 0.418 total

Suppose we're interested in recent modifications, the traversal will often
terminate early. I also have an idea for indexing originating operations.
jj-vcs#6405

For old operations which didn't record predecessors, "jj evolog" will fall back
to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.
yuja added a commit to yuja/jj that referenced this pull request May 26, 2025
There are two major goals:
1. garbage-collect predecessor commits referenced by immutable commits.
2. show operations alongside predecessors in "jj evolog".

The predecessors field will be removed from the Commit object. Maybe we can
also remove (writer side of) the extras table from GitBackend.

"jj evolog" will traverse the operation history to build an evolution
graph. This will be slower than the current implementation, but it seems
tolerable for mid-size repository stored in a local disk.

    (when the page cache is warm)
    % time jj op log -T'"."' --no-graph | wc -c
    50459
    jj op log -T'"."' --no-graph  0.12s user 0.31s system 103% cpu 0.418 total

Suppose we're interested in recent modifications, the traversal can often be
terminate early. I also have an idea for indexing originating operations.
jj-vcs#6405

For old operations which didn't record predecessors, "jj evolog" will fall back
to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.
github-merge-queue bot pushed a commit that referenced this pull request May 26, 2025
There are two major goals:
1. garbage-collect predecessor commits referenced by immutable commits.
2. show operations alongside predecessors in "jj evolog".

The predecessors field will be removed from the Commit object. Maybe we can
also remove (writer side of) the extras table from GitBackend.

"jj evolog" will traverse the operation history to build an evolution
graph. This will be slower than the current implementation, but it seems
tolerable for mid-size repository stored in a local disk.

    (when the page cache is warm)
    % time jj op log -T'"."' --no-graph | wc -c
    50459
    jj op log -T'"."' --no-graph  0.12s user 0.31s system 103% cpu 0.418 total

Suppose we're interested in recent modifications, the traversal can often be
terminate early. I also have an idea for indexing originating operations.
#6405

For old operations which didn't record predecessors, "jj evolog" will fall back
to commit.predecessor_ids(). That's why commit_predecessors is Option<_>.
@yuja
Copy link
Contributor Author

yuja commented May 26, 2025

Superseded by #6628.

@yuja yuja closed this May 26, 2025
@yuja yuja deleted the push-pumxqzrrptxw branch May 26, 2025 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants