feat(ast-engine): support TAB indentation parsing#100
Conversation
* Added `get_tab` utility function for parsing tab characters from content. * Refactored `get_indent_at_offset` to handle tabs by returning `is_tab` along with indent offset. * Handled the stripping and insertion of mixed-tabs vs space indent characters properly inside `remove_indent` and `indent_lines_impl`. * Plumbed the `is_tab` boolean down through `formatted_slice` and `indent_lines`. * Updated tests in `indent.rs` to exercise proper TAB character extraction and re-indentation rules. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideAdds full support for tab-indented code in the ast-engine replacer by propagating an Class diagram for updated indentation and template replacement logicclassDiagram
class Content {
}
class DeindentedExtract {
<<enum>>
MultiLine(lines, original_indent)
SingleLine(line)
}
class IndentModule {
+formatted_slice(content, start, slice) [C::Underlying[]]
+indent_lines(indent, extract, is_tab) Cow~Underlying[]~
+get_indent_at_offset(src) usize
+get_indent_at_offset_with_tab(src) (usize, bool)
-indent_lines_impl(indent, lines, is_tab) Underlying[]
-remove_indent(indent, src) Underlying[]
-get_new_line() Underlying
-get_space() Underlying
-get_tab() Underlying
}
class TemplateFix {
+generate_replacement(nm) Underlying
}
class Template {
-fragments Vec~String~
-vars Vec~(MetaVarExtract, Indent, bool)~
+create_template(tmpl, pattern) Template
}
class MetaVarExtract {
<<enum>>
Range(name)
Single(name)
Transformed(name)
}
class MetaVarEnv {
+get_range(name) Range
+get_source(name) Source
+get_transformed(name) Underlying[]
}
class Doc {
<<trait>>
+get_source() Source
}
class Replacer {
<<trait>>
+generate_replacement(nm) Underlying
}
class NodeMatch {
+get_doc() Doc
+get_env() MetaVarEnv
+range() Range
}
class Source {
+get_range(range) Underlying[]
+decode_str(s) Underlying[]
}
class Range {
+start usize
+end usize
}
%% Relationships
IndentModule ..> Content : uses
IndentModule ..> DeindentedExtract : uses
TemplateFix ..|> Replacer
TemplateFix ..> Template : uses
TemplateFix ..> MetaVarEnv : uses
TemplateFix ..> NodeMatch : uses
TemplateFix ..> IndentModule : uses
Template ..> MetaVarExtract : contains
MetaVarEnv ..> MetaVarExtract : keyed_by
NodeMatch ..> Doc : returns
Doc ..> Source : returns
Source ..> Content : implements
%% Helper functions in template module
class TemplateModule {
+replace_fixer(fixer, env) Underlying[]
+maybe_get_var(env, var, indent, is_tab) Cow~Underlying[]~
}
TemplateModule ..> TemplateFix : uses
TemplateModule ..> MetaVarEnv : uses
TemplateModule ..> MetaVarExtract : uses
TemplateModule ..> IndentModule : uses
TemplateModule ..> DeindentedExtract : uses
Flow diagram for tab-aware indentation detection and applicationflowchart TD
A["Start replacement
(NodeMatch or Template)"] --> B["Get leading source
before insertion point"]
B --> C["Call get_indent_at_offset_with_tab
-> (indent, is_tab)"]
C --> D{indent == 0?}
D -- "Yes" --> E["No additional indent
pass through or deindent only"]
D -- "No" --> F["Prepare DeindentedExtract
from source or template bytes"]
F --> G["Call indent_lines(indent, extract, is_tab)"]
subgraph Indent_lines_logic
G --> H{extract variant}
H -- "SingleLine" --> I["Return original line
unchanged"]
H -- "MultiLine" --> J["Compare requested indent
with original_indent"]
J --> K{indent >= original_indent?}
K -- "Greater" --> L["Call indent_lines_impl
with indent - original_indent
and is_tab"]
K -- "Equal" --> M["Return lines unchanged"]
K -- "Less" --> N["Call indent_lines_impl
with original_indent - indent
and is_tab"]
end
subgraph Indent_lines_impl
L --> O["Choose indent_char:
if is_tab then get_tab
else get_space"]
N --> O
O --> P["Build leading prefix by
repeating indent_char indent times"]
P --> Q["Do not prefix first line"]
Q --> R["For each subsequent line:
append new_line, then leading,
then line content"]
end
C --> S["remove_indent used by
extract_with_deindent"]
subgraph Remove_indent_logic
S --> T["Split src by new_line"]
T --> U["For each line, strip up to indent
leading characters where each
char is space OR tab"]
U --> V["Join lines with new_line"]
end
R --> W["Indented bytes returned
to caller"]
E --> W
V --> F
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Threading a single
is_tab: boolthroughindent_linesand tests (e.g.source.contains('\t')) loses information when indentation is mixed or differs between lines; consider propagating the actual indent character (or per-line indent style) instead of a global flag so replacement preserves existing indentation more faithfully. get_indent_at_offset_with_tabcurrently setsis_tab = trueif any tab is seen in the scanned prefix, not necessarily in the leading indentation sequence itself; if the intent is to mirror the indentation of the current line, it might be less surprising to only markis_tabbased on the contiguous indentation run immediately preceding the content.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Threading a single `is_tab: bool` through `indent_lines` and tests (e.g. `source.contains('\t')`) loses information when indentation is mixed or differs between lines; consider propagating the actual indent character (or per-line indent style) instead of a global flag so replacement preserves existing indentation more faithfully.
- `get_indent_at_offset_with_tab` currently sets `is_tab = true` if *any* tab is seen in the scanned prefix, not necessarily in the leading indentation sequence itself; if the intent is to mirror the indentation of the current line, it might be less surprising to only mark `is_tab` based on the contiguous indentation run immediately preceding the content.
## Individual Comments
### Comment 1
<location path="crates/ast-engine/src/replacer/indent.rs" line_range="255-264" />
<code_context>
+}
+
+/// returns (indent, is_tab)
+pub fn get_indent_at_offset_with_tab<C: Content>(src: &[C::Underlying]) -> (usize, bool) {
let lookahead = src.len().max(MAX_LOOK_AHEAD) - MAX_LOOK_AHEAD;
let mut indent = 0;
+ let mut is_tab = false;
let new_line = get_new_line::<C>();
let space = get_space::<C>();
- // TODO: support TAB. only whitespace is supported now
+ let tab = get_tab::<C>();
for c in src[lookahead..].iter().rev() {
if *c == new_line {
- return indent;
+ return (indent, is_tab);
}
if *c == space {
indent += 1;
+ } else if *c == tab {
+ indent += 1;
+ is_tab = true;
</code_context>
<issue_to_address>
**issue (bug_risk):** The `is_tab` flag becomes true if any tab appears in the indent run, which can flip mixed indentation into all-tabs when re-indenting.
Because `is_tab` is set when *any* tab appears, a prefix like `' '` becomes `(indent = 3, is_tab = true)`, and `indent_lines_impl` will re-indent as three tabs instead of preserving the original mixed whitespace. This can silently change indentation style and visual alignment. Consider instead deriving `is_tab` from:
- The first indent character,
- Separate space/tab counts (choosing the majority), or
- Requiring that all indent characters are tabs before setting `is_tab`.
This would avoid unintended indentation changes when re-indenting or expanding templates.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Fixed formatting issue in `crates/ast-engine/src/replacer/indent.rs` and `crates/ast-engine/src/replacer/template.rs` that was flagged by `cargo fmt --all -- --config-path ./rustfmt.toml --check` in the CI pipeline. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds TAB-aware indentation handling to the ast-engine replacer so multi-line replacements can preserve tab-indented formatting instead of assuming spaces.
Changes:
- Introduces tab detection alongside indent width calculation (
get_indent_at_offset_with_tab) and threads that through re-indentation. - Extends template replacement to carry indentation style information (
is_tab) per metavariable insertion point. - Updates indentation/deindentation logic and adds tests covering tab-indented cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/ast-engine/src/replacer/template.rs | Propagates (indent, is_tab) through template metavariable replacement so inserted content is reindented using tabs when appropriate. |
| crates/ast-engine/src/replacer/indent.rs | Adds tab token support, introduces get_indent_at_offset_with_tab, updates indenting/deindenting to recognize \t, and adds tab-focused tests. |
Comments suppressed due to low confidence (1)
crates/ast-engine/src/replacer/indent.rs:310
remove_indentends withlines.join(&new_line).clone().joinalready returns an ownedVec, so the extra.clone()does an unnecessary second allocation/copy. Returning thejoinresult directly avoids that overhead.
.collect();
lines.join(&new_line).clone()
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for ((var, indent), frag) in vars.zip(frags) { | ||
| if let Some(bytes) = maybe_get_var(env, var, indent.to_owned()) { | ||
| for ((var, indent, is_tab), frag) in vars.zip(frags) { | ||
| if let Some(bytes) = maybe_get_var(env, var, indent.to_owned(), is_tab.to_owned()) { |
There was a problem hiding this comment.
indent.to_owned() / is_tab.to_owned() are unnecessary here since usize and bool are Copy. Passing *indent / *is_tab (or destructuring by value earlier) would be simpler and avoids the extra trait calls.
| if let Some(bytes) = maybe_get_var(env, var, indent.to_owned(), is_tab.to_owned()) { | |
| if let Some(bytes) = maybe_get_var(env, var, *indent, *is_tab) { |
| if *c == space { | ||
| indent += 1; | ||
| } else if *c == tab { | ||
| indent += 1; | ||
| is_tab = true; |
There was a problem hiding this comment.
get_indent_at_offset_with_tab collapses indentation style into a single is_tab flag (set to true if any tab appears). For mixed indentation (e.g. " \t"), this will re-indent using only tabs and will not preserve the original prefix. Consider returning the actual indentation prefix (or an enum like Space/Tab/ Mixed with the concrete prefix) so re-indentation can reproduce mixed whitespace correctly.
| let end = source.chars().count() - trailing_white; | ||
| let extracted = extract_with_deindent(&source, start..end); | ||
| let result_bytes = indent_lines::<String>(0, &extracted); | ||
| let result_bytes = indent_lines::<String>(0, &extracted, source.contains('\t')); |
There was a problem hiding this comment.
In test_deindent, using source.contains('\t') to decide is_tab can mis-detect indentation style if the source contains a tab anywhere (e.g. in a string literal) but the indentation at start is spaces. It’d be more accurate to compute (indent, is_tab) from the prefix up to start via get_indent_at_offset_with_tab and pass that is_tab value.
| let result_bytes = indent_lines::<String>(0, &extracted, source.contains('\t')); | |
| let (_, is_tab) = get_indent_at_offset_with_tab::<String>(&source, start); | |
| let result_bytes = indent_lines::<String>(0, &extracted, is_tab); |
| fn get_tab<C: Content>() -> C::Underlying { | ||
| C::decode_str("\t")[0].clone() | ||
| } |
There was a problem hiding this comment.
After adding get_tab and tab-aware indentation handling, several doc comments in this module still describe indentation strictly in terms of “spaces” / “space-based indentation” (e.g., DeindentedExtract::MultiLine docs and the module-level limitations). Updating those docs would keep the documentation consistent with the new behavior.
Fixed missing backticks in doc comments and replaced slice allocations (`&[var.clone()]`) with `std::slice::from_ref` inside `crates/ast-engine/src/replacer/indent.rs` to satisfy clippy linters. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
@copilot open a new pull request to apply changes based on the comments in this thread and this one |
|
@bashandbone I've opened a new pull request, #101, to work on those changes. Once the pull request is ready, I'll request review from you. |
Acknowledged. |
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
… behavior (#101) * Initial plan * fix(ast-engine): address review comments on TAB indentation support - template.rs:119: use *indent/*is_tab (Copy types) instead of .to_owned() - indent.rs: fix get_indent_at_offset_with_tab to only set is_tab=true for pure-tab indentation; mixed indentation falls back to spaces - indent.rs:331: use get_indent_at_offset_with_tab in test_deindent for accurate is_tab detection instead of source.contains('\t') - indent.rs:104-106: update doc comments to reflect tab/mixed support Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(ast-engine): use byte indices in test_deindent helper Replace .chars().count() with str::trim_start/trim_end length arithmetic so start/end are byte offsets throughout, making the helper correct for non-ASCII / multi-byte UTF-8 input. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
|
👋 Hey @bashandbone, Thanks for your contribution to thread! 🧵You need to agree to the CLA first... 🖊️Before we can accept your contribution, you need to agree to our Contributor License Agreement (CLA). To agree to the CLA, please comment:
Those exact words are important1, so please don't change them. 😉 You can read the full CLA here: Contributor License Agreement ✅ @bashandbone has signed the CLA. You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. Footnotes
|
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Fixed an unused variable warning in `crates/language/src/lib.rs` and collapsed an if-let statement in `crates/flow/src/incremental/analyzer.rs` as mandated by clippy. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
The
indent.rsparsing logic only supported whitespace/space characters. This change handles checking and preserving literal TAB characters (\t) during extraction and replacing code snippets with multi-line string segments.PR created automatically by Jules for task 13144396554011743547 started by @bashandbone
Summary by Sourcery
Add support for preserving and reapplying TAB-based indentation in AST replacement and templating logic.
New Features:
Tests: