Skip to content

feat(ast-engine): support TAB indentation parsing#100

Merged
bashandbone merged 9 commits intomainfrom
fix-indent-tabs-13144396554011743547
Mar 10, 2026
Merged

feat(ast-engine): support TAB indentation parsing#100
bashandbone merged 9 commits intomainfrom
fix-indent-tabs-13144396554011743547

Conversation

@bashandbone
Copy link
Contributor

@bashandbone bashandbone commented Mar 9, 2026

The indent.rs parsing logic only supported whitespace/space characters. This change handles checking and preserving literal TAB characters (\t) during extraction and replacing code snippets with multi-line string segments.


PR created automatically by Jules for task 13144396554011743547 started by @bashandbone

Summary by Sourcery

Add support for preserving and reapplying TAB-based indentation in AST replacement and templating logic.

New Features:

  • Allow indentation detection and reconstruction to distinguish between space and TAB indentation and preserve the original style when formatting replacements and template insertions.

Tests:

  • Extend replacer tests to cover TAB-indented sources and replacements to ensure correct deindent and reindent behavior.

* Added `get_tab` utility function for parsing tab characters from content.
* Refactored `get_indent_at_offset` to handle tabs by returning `is_tab` along with indent offset.
* Handled the stripping and insertion of mixed-tabs vs space indent characters properly inside `remove_indent` and `indent_lines_impl`.
* Plumbed the `is_tab` boolean down through `formatted_slice` and `indent_lines`.
* Updated tests in `indent.rs` to exercise proper TAB character extraction and re-indentation rules.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 9, 2026 22:01
@google-labs-jules
Copy link
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Mar 9, 2026

Reviewer's Guide

Adds full support for tab-indented code in the ast-engine replacer by propagating an is_tab flag through indentation helpers, updating indent detection and de-indentation to treat tabs as indentation units, and extending template replacement logic and tests accordingly.

Class diagram for updated indentation and template replacement logic

classDiagram
    class Content {
    }

    class DeindentedExtract {
        <<enum>>
        MultiLine(lines, original_indent)
        SingleLine(line)
    }

    class IndentModule {
        +formatted_slice(content, start, slice) [C::Underlying[]]
        +indent_lines(indent, extract, is_tab) Cow~Underlying[]~
        +get_indent_at_offset(src) usize
        +get_indent_at_offset_with_tab(src) (usize, bool)
        -indent_lines_impl(indent, lines, is_tab) Underlying[]
        -remove_indent(indent, src) Underlying[]
        -get_new_line() Underlying
        -get_space() Underlying
        -get_tab() Underlying
    }

    class TemplateFix {
        +generate_replacement(nm) Underlying
    }

    class Template {
        -fragments Vec~String~
        -vars Vec~(MetaVarExtract, Indent, bool)~
        +create_template(tmpl, pattern) Template
    }

    class MetaVarExtract {
        <<enum>>
        Range(name)
        Single(name)
        Transformed(name)
    }

    class MetaVarEnv {
        +get_range(name) Range
        +get_source(name) Source
        +get_transformed(name) Underlying[]
    }

    class Doc {
        <<trait>>
        +get_source() Source
    }

    class Replacer {
        <<trait>>
        +generate_replacement(nm) Underlying
    }

    class NodeMatch {
        +get_doc() Doc
        +get_env() MetaVarEnv
        +range() Range
    }

    class Source {
        +get_range(range) Underlying[]
        +decode_str(s) Underlying[]
    }

    class Range {
        +start usize
        +end usize
    }

    %% Relationships
    IndentModule ..> Content : uses
    IndentModule ..> DeindentedExtract : uses

    TemplateFix ..|> Replacer
    TemplateFix ..> Template : uses
    TemplateFix ..> MetaVarEnv : uses
    TemplateFix ..> NodeMatch : uses
    TemplateFix ..> IndentModule : uses

    Template ..> MetaVarExtract : contains

    MetaVarEnv ..> MetaVarExtract : keyed_by

    NodeMatch ..> Doc : returns
    Doc ..> Source : returns

    Source ..> Content : implements

    %% Helper functions in template module
    class TemplateModule {
        +replace_fixer(fixer, env) Underlying[]
        +maybe_get_var(env, var, indent, is_tab) Cow~Underlying[]~
    }

    TemplateModule ..> TemplateFix : uses
    TemplateModule ..> MetaVarEnv : uses
    TemplateModule ..> MetaVarExtract : uses
    TemplateModule ..> IndentModule : uses
    TemplateModule ..> DeindentedExtract : uses
Loading

Flow diagram for tab-aware indentation detection and application

flowchart TD
    A["Start replacement
    (NodeMatch or Template)"] --> B["Get leading source
    before insertion point"]
    B --> C["Call get_indent_at_offset_with_tab
    -> (indent, is_tab)"]

    C --> D{indent == 0?}
    D -- "Yes" --> E["No additional indent
    pass through or deindent only"]
    D -- "No" --> F["Prepare DeindentedExtract
    from source or template bytes"]

    F --> G["Call indent_lines(indent, extract, is_tab)"]

    subgraph Indent_lines_logic
        G --> H{extract variant}
        H -- "SingleLine" --> I["Return original line
        unchanged"]
        H -- "MultiLine" --> J["Compare requested indent
        with original_indent"]
        J --> K{indent >= original_indent?}
        K -- "Greater" --> L["Call indent_lines_impl
        with indent - original_indent
        and is_tab"]
        K -- "Equal" --> M["Return lines unchanged"]
        K -- "Less" --> N["Call indent_lines_impl
        with original_indent - indent
        and is_tab"]
    end

    subgraph Indent_lines_impl
        L --> O["Choose indent_char:
        if is_tab then get_tab
        else get_space"]
        N --> O
        O --> P["Build leading prefix by
        repeating indent_char indent times"]
        P --> Q["Do not prefix first line"]
        Q --> R["For each subsequent line:
        append new_line, then leading,
        then line content"]
    end

    C --> S["remove_indent used by
    extract_with_deindent"]
    subgraph Remove_indent_logic
        S --> T["Split src by new_line"]
        T --> U["For each line, strip up to indent
        leading characters where each
        char is space OR tab"]
        U --> V["Join lines with new_line"]
    end

    R --> W["Indented bytes returned
    to caller"]
    E --> W
    V --> F
Loading

File-Level Changes

Change Details Files
Propagate indentation style (space vs tab) through the indenting pipeline so replacements preserve original indentation characters.
  • Extended formatted_slice and indent_lines to compute and accept an is_tab flag alongside the numeric indent level.
  • Updated all indent_lines call sites in indent and template replacers to pass through the detected is_tab value.
  • Adjusted indent_lines_impl to choose between space and tab characters when constructing leading indentation based on is_tab.
crates/ast-engine/src/replacer/indent.rs
crates/ast-engine/src/replacer/template.rs
Enhance indent detection and de-indentation to recognize and strip both spaces and tabs while tracking whether tabs are present.
  • Introduced get_tab helper analogous to get_space for the Content abstraction.
  • Replaced get_indent_at_offset’s implementation with a new get_indent_at_offset_with_tab that returns both indent width and a boolean flag indicating if tabs were seen, and kept get_indent_at_offset as a thin wrapper for backward compatibility.
  • Modified remove_indent to strip up to indent leading units that may be either spaces or tabs on each line instead of assuming spaces only.
crates/ast-engine/src/replacer/indent.rs
Update template-based replacements to carry and respect indentation style per metavariable occurrence.
  • Expanded Template.vars to store an (MetaVarExtract, Indent, bool) triple where the boolean encodes tab usage at that position in the template source.
  • Adjusted create_template to use tab-aware indent detection and populate the is_tab flag for each variable site.
  • Changed replace_fixer and maybe_get_var to thread the is_tab flag through and call indent_lines with the appropriate indentation style when reconstructing variable substitutions.
crates/ast-engine/src/replacer/template.rs
Add regression tests covering tab-indented extraction and replacement to validate the new behavior.
  • Updated existing tests to call the new indent_lines signature and pass tab style where appropriate.
  • Added tests for de-indenting and re-indenting tab-indented code slices, and for replacements into tab-indented contexts to ensure correct tab preservation and alignment.
crates/ast-engine/src/replacer/indent.rs

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Threading a single is_tab: bool through indent_lines and tests (e.g. source.contains('\t')) loses information when indentation is mixed or differs between lines; consider propagating the actual indent character (or per-line indent style) instead of a global flag so replacement preserves existing indentation more faithfully.
  • get_indent_at_offset_with_tab currently sets is_tab = true if any tab is seen in the scanned prefix, not necessarily in the leading indentation sequence itself; if the intent is to mirror the indentation of the current line, it might be less surprising to only mark is_tab based on the contiguous indentation run immediately preceding the content.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Threading a single `is_tab: bool` through `indent_lines` and tests (e.g. `source.contains('\t')`) loses information when indentation is mixed or differs between lines; consider propagating the actual indent character (or per-line indent style) instead of a global flag so replacement preserves existing indentation more faithfully.
- `get_indent_at_offset_with_tab` currently sets `is_tab = true` if *any* tab is seen in the scanned prefix, not necessarily in the leading indentation sequence itself; if the intent is to mirror the indentation of the current line, it might be less surprising to only mark `is_tab` based on the contiguous indentation run immediately preceding the content.

## Individual Comments

### Comment 1
<location path="crates/ast-engine/src/replacer/indent.rs" line_range="255-264" />
<code_context>
+}
+
+/// returns (indent, is_tab)
+pub fn get_indent_at_offset_with_tab<C: Content>(src: &[C::Underlying]) -> (usize, bool) {
     let lookahead = src.len().max(MAX_LOOK_AHEAD) - MAX_LOOK_AHEAD;

     let mut indent = 0;
+    let mut is_tab = false;
     let new_line = get_new_line::<C>();
     let space = get_space::<C>();
-    // TODO: support TAB. only whitespace is supported now
+    let tab = get_tab::<C>();
     for c in src[lookahead..].iter().rev() {
         if *c == new_line {
-            return indent;
+            return (indent, is_tab);
         }
         if *c == space {
             indent += 1;
+        } else if *c == tab {
+            indent += 1;
+            is_tab = true;
</code_context>
<issue_to_address>
**issue (bug_risk):** The `is_tab` flag becomes true if any tab appears in the indent run, which can flip mixed indentation into all-tabs when re-indenting.

Because `is_tab` is set when *any* tab appears, a prefix like `'  	'` becomes `(indent = 3, is_tab = true)`, and `indent_lines_impl` will re-indent as three tabs instead of preserving the original mixed whitespace. This can silently change indentation style and visual alignment. Consider instead deriving `is_tab` from:
- The first indent character,
- Separate space/tab counts (choosing the majority), or
- Requiring that all indent characters are tabs before setting `is_tab`.
This would avoid unintended indentation changes when re-indenting or expanding templates.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Fixed formatting issue in `crates/ast-engine/src/replacer/indent.rs` and `crates/ast-engine/src/replacer/template.rs` that was flagged by `cargo fmt --all -- --config-path ./rustfmt.toml --check` in the CI pipeline.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds TAB-aware indentation handling to the ast-engine replacer so multi-line replacements can preserve tab-indented formatting instead of assuming spaces.

Changes:

  • Introduces tab detection alongside indent width calculation (get_indent_at_offset_with_tab) and threads that through re-indentation.
  • Extends template replacement to carry indentation style information (is_tab) per metavariable insertion point.
  • Updates indentation/deindentation logic and adds tests covering tab-indented cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
crates/ast-engine/src/replacer/template.rs Propagates (indent, is_tab) through template metavariable replacement so inserted content is reindented using tabs when appropriate.
crates/ast-engine/src/replacer/indent.rs Adds tab token support, introduces get_indent_at_offset_with_tab, updates indenting/deindenting to recognize \t, and adds tab-focused tests.
Comments suppressed due to low confidence (1)

crates/ast-engine/src/replacer/indent.rs:310

  • remove_indent ends with lines.join(&new_line).clone(). join already returns an owned Vec, so the extra .clone() does an unnecessary second allocation/copy. Returning the join result directly avoids that overhead.
        .collect();
    lines.join(&new_line).clone()
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

for ((var, indent), frag) in vars.zip(frags) {
if let Some(bytes) = maybe_get_var(env, var, indent.to_owned()) {
for ((var, indent, is_tab), frag) in vars.zip(frags) {
if let Some(bytes) = maybe_get_var(env, var, indent.to_owned(), is_tab.to_owned()) {
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent.to_owned() / is_tab.to_owned() are unnecessary here since usize and bool are Copy. Passing *indent / *is_tab (or destructuring by value earlier) would be simpler and avoids the extra trait calls.

Suggested change
if let Some(bytes) = maybe_get_var(env, var, indent.to_owned(), is_tab.to_owned()) {
if let Some(bytes) = maybe_get_var(env, var, *indent, *is_tab) {

Copilot uses AI. Check for mistakes.
Comment on lines 267 to +271
if *c == space {
indent += 1;
} else if *c == tab {
indent += 1;
is_tab = true;
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_indent_at_offset_with_tab collapses indentation style into a single is_tab flag (set to true if any tab appears). For mixed indentation (e.g. " \t"), this will re-indent using only tabs and will not preserve the original prefix. Consider returning the actual indentation prefix (or an enum like Space/Tab/ Mixed with the concrete prefix) so re-indentation can reproduce mixed whitespace correctly.

Copilot uses AI. Check for mistakes.
let end = source.chars().count() - trailing_white;
let extracted = extract_with_deindent(&source, start..end);
let result_bytes = indent_lines::<String>(0, &extracted);
let result_bytes = indent_lines::<String>(0, &extracted, source.contains('\t'));
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test_deindent, using source.contains('\t') to decide is_tab can mis-detect indentation style if the source contains a tab anywhere (e.g. in a string literal) but the indentation at start is spaces. It’d be more accurate to compute (indent, is_tab) from the prefix up to start via get_indent_at_offset_with_tab and pass that is_tab value.

Suggested change
let result_bytes = indent_lines::<String>(0, &extracted, source.contains('\t'));
let (_, is_tab) = get_indent_at_offset_with_tab::<String>(&source, start);
let result_bytes = indent_lines::<String>(0, &extracted, is_tab);

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +106
fn get_tab<C: Content>() -> C::Underlying {
C::decode_str("\t")[0].clone()
}
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After adding get_tab and tab-aware indentation handling, several doc comments in this module still describe indentation strictly in terms of “spaces” / “space-based indentation” (e.g., DeindentedExtract::MultiLine docs and the module-level limitations). Updating those docs would keep the documentation consistent with the new behavior.

Copilot uses AI. Check for mistakes.
google-labs-jules bot and others added 2 commits March 9, 2026 22:09
Fixed missing backticks in doc comments and replaced slice allocations (`&[var.clone()]`) with `std::slice::from_ref` inside `crates/ast-engine/src/replacer/indent.rs` to satisfy clippy linters.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
@bashandbone
Copy link
Contributor Author

@copilot open a new pull request to apply changes based on the comments in this thread and this one

Copy link
Contributor

Copilot AI commented Mar 9, 2026

@bashandbone I've opened a new pull request, #101, to work on those changes. Once the pull request is ready, I'll request review from you.

@google-labs-jules
Copy link
Contributor

@copilot open a new pull request to apply changes based on the comments in this thread and this one

Acknowledged.

google-labs-jules bot and others added 3 commits March 9, 2026 23:29
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
… behavior (#101)

* Initial plan

* fix(ast-engine): address review comments on TAB indentation support

- template.rs:119: use *indent/*is_tab (Copy types) instead of .to_owned()
- indent.rs: fix get_indent_at_offset_with_tab to only set is_tab=true
  for pure-tab indentation; mixed indentation falls back to spaces
- indent.rs:331: use get_indent_at_offset_with_tab in test_deindent
  for accurate is_tab detection instead of source.contains('\t')
- indent.rs:104-106: update doc comments to reflect tab/mixed support

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(ast-engine): use byte indices in test_deindent helper

Replace .chars().count() with str::trim_start/trim_end length
arithmetic so start/end are byte offsets throughout, making the
helper correct for non-ASCII / multi-byte UTF-8 input.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

👋 Hey @bashandbone,

Thanks for your contribution to thread! 🧵

You need to agree to the CLA first... 🖊️

Before we can accept your contribution, you need to agree to our Contributor License Agreement (CLA).

To agree to the CLA, please comment:

I read the contributors license agreement and I agree to it.

Those exact words are important1, so please don't change them. 😉

You can read the full CLA here: Contributor License Agreement


@bashandbone has signed the CLA.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Footnotes

  1. Our bot needs those exact words to recognize that you agree to the CLA.

google-labs-jules bot and others added 2 commits March 10, 2026 00:05
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
@bashandbone bashandbone merged commit 58e59d6 into main Mar 10, 2026
23 of 26 checks passed
@bashandbone bashandbone deleted the fix-indent-tabs-13144396554011743547 branch March 10, 2026 00:08
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants