Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: add jj fix proof of concept #3647

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
153 changes: 153 additions & 0 deletions cli/src/commands/fix.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
// Copyright 2024 The Jujutsu Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use std::collections::{HashMap, HashSet};

use futures::StreamExt;
use itertools::Itertools;
use jj_lib::backend::{BackendError, BackendResult, CommitId, FileId, TreeValue};
use jj_lib::commit::{Commit, CommitIteratorExt};
use jj_lib::matchers::EverythingMatcher;
use jj_lib::merged_tree::MergedTreeBuilder;
use jj_lib::repo::Repo;
use jj_lib::repo_path::RepoPathBuf;
use jj_lib::revset::{RevsetExpression, RevsetIteratorExt};
use jj_lib::store::Store;
use pollster::FutureExt;
use tracing::instrument;

use crate::cli_util::{CommandHelper, RevisionArg};
use crate::command_error::CommandError;
use crate::ui::Ui;

/// Format files
#[derive(clap::Args, Clone, Debug)]
#[command(verbatim_doc_comment)]
pub(crate) struct FixArgs {
/// Fix these commits and all their descendants
#[arg(long, short)]
sources: Vec<RevisionArg>,
}

#[instrument(skip_all)]
pub(crate) fn cmd_fix(
ui: &mut Ui,
command: &CommandHelper,
args: &FixArgs,
) -> Result<(), CommandError> {
let mut workspace_command = command.workspace_helper(ui)?;
let root_commits: Vec<Commit> = workspace_command
.parse_union_revsets(&args.sources)?
.evaluate_to_commits()?
.try_collect()?;
workspace_command.check_rewritable(root_commits.iter().ids())?;

let mut tx = workspace_command.start_transaction();

// Collect all FileIds we're going to format and which commits they appear in
let commits: Vec<_> = RevsetExpression::commits(root_commits.iter().ids().cloned().collect())
.descendants()
.evaluate_programmatic(tx.base_repo().as_ref())?
.iter()
.commits(tx.repo().store())
.try_collect()?;
let mut file_ids = HashSet::new();
let mut commit_paths: HashMap<CommitId, Vec<RepoPathBuf>> = HashMap::new();
for commit in commits.iter().rev() {
let mut paths = vec![];
// Paths modified in parent commits in the set should also be updated in this
// commit
for parent_id in commit.parent_ids() {
if let Some(parent_paths) = commit_paths.get(parent_id) {
paths.extend_from_slice(parent_paths);
}
}
let parent_tree = commit.parent_tree(tx.repo())?;
let tree = commit.tree()?;
let mut diff_stream = parent_tree.diff_stream(&tree, &EverythingMatcher);
async {
while let Some((repo_path, diff)) = diff_stream.next().await {
let (_before, after) = diff?;
for term in after.into_iter().flatten() {
if let TreeValue::File { id, executable: _ } = term {
file_ids.insert((repo_path.clone(), id));
paths.push(repo_path.clone());
}
}
}
Ok::<(), BackendError>(())
}
.block_on()?;
commit_paths.insert(commit.id().clone(), paths);
}

let formatted = format_files(tx.repo().store().as_ref(), &file_ids)?;

tx.mut_repo().transform_descendants(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: One thing to consider is how to recover from failures. My initial expectation was that I would want the entire process to fail if processing any commit failed. But, so far in my experience running git test fix, if processing one or more commits fails —

  • I've still actually always wanted to apply any fixes for commits whose ancestors did not fail.
  • In some but not all cases, I've wanted to apply fixes for commits that passed, even if some ancestor failed.
  • I don't think I've ever wanted to apply a fix if the command for it returned a non-zero exit status, even if it did successfully format some files on disk.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I suspect @hooper has thought about that but I don't think I have. I think what you say makes sense. I suppose the effect is that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach that has worked well for us in hg fix is to only use the output if the exit code is zero, and to print any stderr lines with a prefix that indicates which commit/file caused it. Part of this problem is just meeting the formatting tools where they are instead of trying to negotiate some new contract with them.

command.settings(),
root_commits.iter().ids().cloned().collect_vec(),
|mut rewriter| {
let paths = commit_paths.get(rewriter.old_commit().id()).unwrap();
let old_tree = rewriter.old_commit().tree()?;
let mut tree_builder = MergedTreeBuilder::new(old_tree.id().clone());
for path in paths {
let old_value = old_tree.path_value(path);
let new_value = old_value.map(|old_term| {
if let Some(TreeValue::File { id, executable }) = old_term {
if let Some(new_id) = formatted.get(&(path, id)) {
Some(TreeValue::File {
id: new_id.clone(),
executable: *executable,
})
} else {
old_term.clone()
}
} else {
old_term.clone()
}
});
if new_value != old_value {
tree_builder.set_or_remove(path.clone(), new_value);
}
}
let new_tree = tree_builder.write_tree(rewriter.mut_repo().store())?;
let builder = rewriter.reparent(command.settings())?;
builder.set_tree_id(new_tree).write()?;
Ok(())
},
)?;

tx.finish(ui, format!("fixed {} commits", root_commits.len()))
}

fn format_files<'a>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: Ultimately, we presumably want to program against an interface dealing with commits as an intermediary between jj fix and the formatter, which I guess would eventually arrive as part of jj run?

For reference, here's what the git-branchless interface currently looks like (it's used in git test but also in git submit for the Phabricator backend): https://github.com/arxanas/git-branchless/blob/e87ab50603fe40e1285bba574d9d58e7b9c88baa/git-branchless-test/src/lib.rs#L1204-L1215.

(One thing I would change about the interface is to not necessarily pass in an entire slice of Commits, and use only a lazily-evaluated revset of some sort, for the case of searching commits. It's also actually relevant for formatting commits, if your objective is to find the first badly-formatted commit and apply the formatting as a patch, rather than formatting all commits in a stateless manner — both are meaningful strategies.)

That interface only supports external commands, but I assume that jj also wants to support in-process commands/functions. In either case, I think there's not too much information that we need to pass to the command/function, which would form the interface of a "command" that can be run as part of jj run/jj fix:

  • The ID of the commit being processed.
  • The IDs of ancestors of that commit which are also roots of the commit set being formatted.
  • If literally running a command in a workspace, then a handle to the workspace.
  • For convenience, perhaps also the following, even though they can be computed directly from the IDs of all the involved commits?
    • The set of files changed compared to parents.
    • The set of files changed compared to roots.

The approach as written in this PR deduplicates requests to format the same file, but I would encourage to not make that the direct interface, as it's not general enough to handle certain classes of fix. If the formatter function is actually running in-process, then it shouldn't be too hard to recover the caching at a different level of abstraction instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions

(One thing I would change about the interface is to not necessarily pass in an entire slice of Commits, and use only a lazily-evaluated revset of some sort, for the case of searching commits. It's also actually relevant for formatting commits, if your objective is to find the first badly-formatted commit and apply the formatting as a patch, rather than formatting all commits in a stateless manner — both are meaningful strategies.)

Can you explain your idea a bit more, as with run we try to cache commits somewhere, either on disk or depending on backend somewhere else.

That interface only supports external commands, but I assume that jj also wants to support in-process commands/functions. In either case, I think there's not too much information that we need to pass to the command/function, which would form the interface of a "command" that can be run as part of jj run/jj fix:

  • The ID of the commit being processed.

  • The IDs of ancestors of that commit which are also roots of the commit set being formatted.

  • If literally running a command in a workspace, then a handle to the workspace.

  • For convenience, perhaps also the following, even though they can be computed directly from the IDs of all the involved commits?

    • The set of files changed compared to parents.
    • The set of files changed compared to roots.

This looks like a really good idea, but how should that interface look? Should there be some Context which is passed into the inner run impl?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach as written in this PR deduplicates requests to format the same file, but I would encourage to not make that the direct interface, as it's not general enough to handle certain classes of fix. If the formatter function is actually running in-process, then it shouldn't be too hard to recover the caching at a different level of abstraction instead?

jj fix is specialized for formatting individual files (does not support formatters that need to inspect multiple files) and can therefore avoid materializing working copies. Maybe we can rewrite it using the jj run infrastructure once we have some virtual file system support so we don't need to materialize the working copies. Does that sound okay?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about working copies specifically, but rather about organizing around commits rather than files:

  • organizing around commits will be better for the eventual jj run integration
  • there are purely in-memory reasons to operate on commits for jj fix
    • some formatters are designed to operate specifically on changed sections of code (particularly useful for transition periods between not having and having a formatter, or upgrading the formatter style)
    • there are some meaningful operations to do on commit messages as well, like inserting a change ID or word wrapping (I guess?)
  • error reporting is almost certainly going to want to be per-commit, not per-file — you want to know which commit to go to in order to fix a syntax error

Therefore I recommend that even the current iteration be designed around commits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot to unpack here, but I can say much of it has come up already in our internal use of hg fix, and my plans for jj fix aren't much different from what we converged on there. I don't yet see a way of totally unifying fix and run that seems worth it to me, but I do want jj run --fix -r <rev> <command> to work as an alternative to jj fix for commands that need more context than formatters.

The point about error reporting is interesting, because there can be multiple root commits where a syntax error would need to be fixed. So you can't just pick a good one to associate the error message with. hg fix just prints the error once for every commit where it occurs. Once we add support for formatting changed line ranges, there will be less possible de-duplication of work across commits, but it will still be quite beneficial to some users if we can do O(changes) work instead of O(changes*commits) for chains of commits with disjoint sets of modified files.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • some formatters are designed to operate specifically on changed sections of code (particularly useful for transition periods between not having and having a formatter, or upgrading the formatter style)

@hooper and I talked a bit about this the other day. I think we can do it by still fully formatting both sides of the diff and then keep only hunks that overlap with hunks in the diff. Consequences:

  • Works independent of formatter (no need for formatter to support partial formatting, no need for jj fix to provide a way to pass in line ranges)
  • Wastefully formats entire files
  • If formatting results in a large chunk of the file changing (e.g. reflowing), then this algorithm will make the reformatting apply to a large chunk too, for better or worse (I don't know how formatters that support partial formatting treat this case)

store: &Store,
file_ids: &'a HashSet<(RepoPathBuf, FileId)>,
) -> BackendResult<HashMap<(&'a RepoPathBuf, &'a FileId), FileId>> {
let mut result = HashMap::new();
for (path, id) in file_ids {
// TODO: read asynchronously
let mut read = store.read_file(path, id)?;
let mut buf = vec![];
read.read_to_end(&mut buf).unwrap();
// TODO: Call a formatter instead of just uppercasing
for b in &mut buf {
b.make_ascii_uppercase();
}
// TODO: Don't write it if it didn't change
let formatted_id = store.write_file(path, &mut buf.as_slice())?;
result.insert((path, id), formatted_id);
}
Ok(result)
}
4 changes: 4 additions & 0 deletions cli/src/commands/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ mod diffedit;
mod duplicate;
mod edit;
mod files;
mod fix;
mod git;
mod init;
mod interdiff;
Expand Down Expand Up @@ -92,6 +93,8 @@ enum Command {
Duplicate(duplicate::DuplicateArgs),
Edit(edit::EditArgs),
Files(files::FilesArgs),
#[command(hide = true)]
Fix(fix::FixArgs),
#[command(subcommand)]
Git(git::GitCommand),
Init(init::InitArgs),
Expand Down Expand Up @@ -199,6 +202,7 @@ pub fn run_command(ui: &mut Ui, command_helper: &CommandHelper) -> Result<(), Co
Command::Merge(sub_args) => merge::cmd_merge(ui, command_helper, sub_args),
Command::Rebase(sub_args) => rebase::cmd_rebase(ui, command_helper, sub_args),
Command::Backout(sub_args) => backout::cmd_backout(ui, command_helper, sub_args),
Command::Fix(sub_args) => fix::cmd_fix(ui, command_helper, sub_args),
Command::Resolve(sub_args) => resolve::cmd_resolve(ui, command_helper, sub_args),
Command::Branch(sub_args) => branch::cmd_branch(ui, command_helper, sub_args),
Command::Undo(sub_args) => operation::cmd_op_undo(ui, command_helper, sub_args),
Expand Down
9 changes: 9 additions & 0 deletions lib/src/rewrite.rs
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,15 @@ impl<'repo> CommitRewriter<'repo> {
let builder = self.rebase_with_empty_behavior(settings, EmptyBehaviour::Keep)?;
Ok(builder.unwrap())
}

/// Rewrite the old commit onto the new parents without changing its
/// contents. Returns a `CommitBuilder` for the new commit.
pub fn reparent(self, settings: &UserSettings) -> BackendResult<CommitBuilder<'repo>> {
Ok(self
.mut_repo
.rewrite_commit(settings, &self.old_commit)
.set_parents(self.new_parents))
}
}

pub enum RebasedCommit {
Expand Down