Merged
24 changes: 14 additions & 10 deletions README.md
@@ -100,6 +100,7 @@ fuori [OPTIONS]
| `--diff <range>` | Export files changed in a diff range |
| `--tree` / `--no-tree` | Include/omit project tree (default: on) |
| `--tree-depth <n>` | Limit tree render depth |
| `--line-numbers` | Prefix exported code lines with line numbers |
| `-s <size_kb>` | Max file size in KB (default: 100) |
| `--warn-tokens <n>` | Warn above token threshold (default: 200k) |
| `--max-tokens <n>` | Hard-fail above token threshold |
@@ -117,6 +118,7 @@ fuori --diff main...HEAD # Changes since branching from main
fuori -o - > codebase.md # Pipe to stdout
fuori --no-tree # Skip the project tree section
fuori --tree-depth 2 # Shallow tree
fuori --line-numbers --staged # Add line numbers for review-oriented exports
fuori -s 50 # 50 KB file size cap
fuori --warn-tokens 100000 # Earlier token warning
fuori --max-tokens 270000 # Hard token budget
@@ -183,7 +185,7 @@ Use `--no-git` to force the filesystem walker explicitly.
Additional semantics:

- The default Git-backed mode and explicit Git file-selection modes are scoped to the current working directory subtree when run from a Git subdirectory
- Git-selected files bypass bypass ignore rules at selection time
- Git-selected files bypass ignore rules at selection time
- Git-selected files still go through normal export-time checks such as regular-file validation, symlink skipping, binary detection, size limits, sensitive-file protection, and output-file self-exclusion
- `--unstaged` does not include untracked files
- Renamed files are exported under the current path reported by Git
@@ -246,13 +248,14 @@ Sensitive files are skipped by default unless `--allow-sensitive` is set.

The output markdown file will contain:

1. A preamble with repository, mode, and generation timestamp metadata plus a short mode description
1. A preamble with repository, mode, and generation timestamp metadata, plus `Line numbers: on` when enabled, and a short mode description
2. A `Change Context` section for `--staged`, `--unstaged`, and `--diff` exports
3. A project tree section that reflects the exported artifact (enabled by default)
4. A header with the file path
5. A code block with the file content
6. Appropriate language identifiers for syntax highlighting
7. A `stderr` summary of files, bytes, and estimated tokens after successful completion
6. Optional line-number prefixes inside code blocks when `--line-numbers` is set
7. Appropriate language identifiers for syntax highlighting
8. A `stderr` summary of files, bytes, and estimated tokens after successful completion

Example file contents excerpt (the `Makefile` section is omitted for brevity):
````markdown
@@ -261,6 +264,7 @@ Example file contents excerpt (the `Makefile` section is omitted for brevity):
Repository: my-project
Mode: recursive
Generated: 2026-03-16T12:34:56Z
Line numbers: on

This document contains all the source code files from the current directory subtree.

@@ -275,11 +279,11 @@ This document contains all the source code files from the current directory subt
## src/main.c

```c
#include <stdio.h>

int main() {
printf("Hello, World!\n");
return 0;
}
1 | #include <stdio.h>
2 |
3 | int main() {
4 | printf("Hello, World!\n");
5 | return 0;
6 | }
```
````
138 changes: 49 additions & 89 deletions docs/design.md
@@ -1,127 +1,87 @@
# Design Notes

This document captures the main architectural decisions behind `fuori`.
It is intentionally short and focused on decisions contributors are likely to touch.
This document records the design boundaries of `fuori`.
It is intentionally short. If a detail belongs in user-facing docs or tests, it should not be repeated here unless it reflects a stable architectural choice.

## Primary Goal

`fuori` is a small, dependency-light Unix CLI that exports a codebase into a single Markdown artifact for LLM and review workflows.
`fuori` is a small, dependency-light Unix CLI that produces a single Markdown artifact for LLM context packing and code review.

Design bias:

- predictable CLI behavior
- easy-to-review C99/POSIX implementation
- minimal dependencies and simple build flow
- practical behavior for real repositories over perfect abstraction purity
- practical, trustworthy behavior over abstraction purity

## Design Strengths
## Non-Goals

These implementation choices are worth preserving because they match the tool's scope well:
`fuori` is not intended to become a context platform.

- Git integration is pragmatic: `fuori` relies on the system `git` binary instead of a heavier embedded Git library, while still falling back to the filesystem walker outside repositories.
- Subprocess handling is defensive: Git commands use careful fork/exec handling so `execvp` failures can be reported reliably rather than inferred indirectly.
- Content filtering is intentionally strict: the collector is biased toward exporting UTF-8-like source text and skipping inputs that are likely to pollute LLM context.
- Secret protection should stay simple and default-on: block obviously sensitive filenames and a short list of high-signal content patterns, warn generically, and allow an explicit per-run override.
- Token estimation is artifact-based: warnings and hard limits are derived from the final rendered Markdown structure rather than just raw source bytes.
- Output handling is atomic: token-limit refusal happens before destination mutation, file output uses `mkstemp` plus `rename`, and temp/final output files are excluded from collection via inode/device checks.
- Git-backed and filesystem-backed selection stay cleanly separated: auto mode prefers Git and falls back quietly, while explicit Git modes remain hard Git-dependent and preserve subtree scoping.
- Rendering and metrics are kept consistent: fence sizing is precomputed once, metrics are based on the actual rendered structure, and tree bytes are counted separately but within the same accounting model.
- Hardening remains pragmatic rather than elaborate: `O_NOFOLLOW` is used when available, opened files are verified by device/inode, Git-selected paths are deduplicated, and output ordering is deterministic.
The project should continue to avoid:

## File Selection Model
- template engines and multiple output formats
- remote cloning or repository discovery features
- TUIs, MCP layers, or service-style integrations
- embedded parser/tokenizer stacks unless the project scope changes materially
- full Git ignore parity beyond what the current filesystem walker needs

`fuori` distinguishes between requested selection mode and resolved selection mode.
## Selection Model

- Default behavior starts in auto mode.
- Inside a Git repository, auto mode prefers Git's view of the current subtree.
- Outside Git, or when Git is unavailable, auto mode falls back silently to the recursive filesystem walker.
- `--no-git` forces the filesystem walker.
- `--from-stdin` accepts caller-supplied paths from standard input and resolves directly to selected-path mode.
- Explicit Git modes such as `--staged`, `--unstaged`, and `--diff` remain hard Git-dependent modes.
- `-0` / `--null` switches stdin record parsing from newline to NUL for safe round-tripping of arbitrary filenames.
Selection and rendering are kept separate on purpose.

Why:

- Git-backed default mode gives correct repository-aware behavior for most real projects.
- Filesystem fallback preserves portability for unpacked archives, non-Git directories, and simple local use.
- Stdin mode is the Unix escape hatch for external path producers without introducing another export pipeline.

## Ignore Behavior

There are two ignore paths by design:

- Git-backed selection uses Git as the source of truth for tracked files and untracked non-ignored files.
- Filesystem recursion uses the local ignore engine in `src/ignore.c`.
- Stdin-backed selection bypasses selection-time ignore matching, just like Git-selected paths.

The local ignore engine exists to support:

- non-Git directories
- `--no-git`
- automatic fallback outside repositories
- Auto mode prefers Git's view of the current subtree and falls back quietly to recursive filesystem walking when Git is unavailable.
- `--no-git` forces filesystem selection.
- `--staged`, `--unstaged`, and `--diff` stay explicitly Git-dependent.
- `--from-stdin` only supplies candidate paths; it does not bypass normal export-time checks.
- Output ordering is deterministic rather than preserving caller input order.

It supports common `.gitignore`-style matching, including recursive `**` globs, but it is not intended to reimplement Git's full layered ignore model.

For stdin-selected paths, the caller chooses the candidate set and the normal export-time gate still applies afterward: regular-file validation, symlink skipping, UTF-8/binary filtering, sensitive-file protection, size limits, output self-exclusion, and deterministic final ordering.
Why:

## Stdin Selection Semantics
- Git-backed default behavior is usually the most correct repository view.
- Filesystem fallback preserves portability and keeps the tool useful outside Git repos.
- Stdin remains the Unix escape hatch without introducing another export pipeline.
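
Reviewer note: the deterministic-ordering rule above can be illustrated with a minimal sketch. `sort_unique_paths` is a hypothetical helper, not a function from this PR, and bytewise `strcmp` ordering is an assumption about what "deterministic" means here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C-string pointers (bytewise order). */
static int cmp_path(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper (not part of fuori): sorts `paths` in place and
 * drops duplicates, so the result no longer depends on caller input
 * order. Returns the number of unique paths kept at the front. */
size_t sort_unique_paths(const char **paths, size_t count) {
    if (count == 0) {
        return 0;
    }
    qsort(paths, count, sizeof *paths, cmp_path);

    size_t kept = 1;
    for (size_t i = 1; i < count; i++) {
        /* After sorting, duplicates are adjacent. */
        if (strcmp(paths[i], paths[kept - 1]) != 0) {
            paths[kept++] = paths[i];
        }
    }
    return kept;
}
```

Because the sort happens after candidate paths are gathered, two callers piping the same set of paths in different orders would get the same ordering under this sketch.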

Stdin selection is intentionally narrow:
## Filtering and Safety

- `--from-stdin` only changes where paths come from.
- Newline is the default record delimiter for convenience.
- `-0` / `--null` switches parsing to NUL and is the safe choice for arbitrary path bytes.
- Empty records are ignored.
- EOF without a final delimiter still yields a valid final record.
- Stdin-selected paths are sorted and deduplicated before export rather than emitted in pipe order.
The collector should be conservative.

This preserves the project's determinism contract: output order is a property of the selected content, not the caller's input order.
- Exportable files are biased toward UTF-8-like source text.
- Symlinks, binary or invalid-text files, oversized files, and self-output paths are skipped.
- Secret protection stays simple and default-on: high-signal filename and content checks, generic warnings, explicit `--allow-sensitive` override.
- File output remains atomic via temporary-file write plus `rename`.

## In-Memory Export Plan
The goal is not perfect classification. The goal is to avoid obviously bad exports while keeping the implementation understandable.
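
Reviewer note: the atomic-output bullet can be read as the usual temporary-file-plus-`rename` pattern. The sketch below is one way to write it with POSIX calls; `write_atomic`, the `.XXXXXX` suffix placement, and the fixed 4096-byte path buffer are assumptions for illustration, not code from this PR.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of fuori): writes `len` bytes of `data`
 * to a temporary file next to `dest`, then renames it over `dest`.
 * Readers see either the old file or the complete new one.
 * Returns 0 on success, -1 on failure. */
int write_atomic(const char *dest, const char *data, size_t len) {
    char tmp[4096]; /* assumed upper bound on path length for this sketch */
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", dest);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        return -1;
    }

    /* Same directory as dest, so rename() stays on one filesystem. */
    int fd = mkstemp(tmp);
    if (fd < 0) {
        return -1;
    }

    size_t off = 0;
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    if (close(fd) != 0 || rename(tmp, dest) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Any precondition that can refuse the export, such as a token limit, would be checked before calling a helper like this, so the destination is never touched on refusal.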

Accepted files are read into memory and stored in the export plan before rendering.
## Artifact Shape

Why this is intentional:
The Markdown artifact should explain itself with a small amount of grounded metadata.

- the tool enforces a per-file size cap
- rendering and byte/token estimation both need accepted file contents
- caching avoids rereading files during later phases
- the implementation stays simple and deterministic
- The header carries factual export context such as repository, mode, generation time, and conditional review metadata like `Line numbers: on`.
- `Change Context` is reserved for Git delta modes and reflects the subset of selected files that actually survive export-time filtering.
- Line numbers are opt-in because they are useful for review but noisy for pure context packing.
- The project tree is optional and should reflect the final exported artifact, not the raw filesystem.

This is a good tradeoff for the current scope. Streaming should only be revisited if the project explicitly targets much larger exports or lower peak memory usage.
These are not presentation flourishes. They exist to help a reader interpret what the artifact represents.

## Rendering and Metrics

The renderer emits Markdown, while tree helpers and renderer-local metadata preparation support that flow.

Important constraints:

- shared export structures stay format-neutral
- Markdown-specific metadata must not leak into shared collection types
- estimation and emission should consume the same prepared renderer data where possible

This is why fence-length preparation is renderer-owned rather than stored on `ExportEntry`.

## Output Safety Guarantees

The tool is designed to avoid mutating the destination unless export preconditions pass.
Markdown-specific concerns belong in the renderer, not in shared collection structures.

Notable guarantees:
- Shared export data stays format-neutral.
- Renderer-owned preparation is acceptable when it prevents leaking Markdown details into collection types.
- Byte counting and output writing should share the same formatting path where practical so estimates do not drift from emitted output.
- Token budgeting is based on the final Markdown artifact, not raw source bytes.

- token-limit refusal happens before destination mutation
- `--no-clobber` preserves existing files
- final output and temporary output files are excluded from export collection
- normal file output writes via a temporary file and rename path
This keeps size checks honest and makes review-oriented formatting features, such as line numbers, a rendering concern instead of a collection concern.
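
Reviewer note: a minimal sketch of how one formatting path can serve both byte counting and emission for line-numbered output. `number_lines` is a hypothetical helper, not the renderer in this PR; the `N | ` prefix follows the README example, while the unpadded width, the bare `N |` on empty lines, and the added newline on an unterminated final line are assumptions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of fuori): formats `src` with each line
 * prefixed by "<n> | " into `dst` (capacity `cap`). Like snprintf, it
 * returns the number of bytes the full output needs, excluding the NUL,
 * so calling it with dst == NULL and cap == 0 only measures. */
size_t number_lines(const char *src, char *dst, size_t cap) {
    size_t used = 0;
    size_t line_no = 1;

    if (cap > 0) {
        dst[0] = '\0';
    }

    while (*src != '\0') {
        const char *nl = strchr(src, '\n');
        size_t len = nl ? (size_t)(nl - src) : strlen(src);
        /* Once the buffer is full, keep counting without writing. */
        char *out = used < cap ? dst + used : NULL;
        size_t room = used < cap ? cap - used : 0;
        int n;

        if (len == 0) {
            /* Empty source line: prefix only, no trailing space. */
            n = snprintf(out, room, "%zu |\n", line_no);
        } else {
            n = snprintf(out, room, "%zu | %.*s\n", line_no, (int)len, src);
        }
        if (n < 0) {
            return 0;
        }

        used += (size_t)n;
        line_no++;
        src += len + (nl ? 1 : 0);
    }
    return used;
}
```

Measuring and writing through the same function means the size used for a token estimate cannot drift from the bytes actually emitted, which is the property the bullets above ask for.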

The CLI integration tests cover these guarantees and should be extended whenever output lifecycle behavior changes.
## Maintenance Rule

## Repository Layout
Keep this document brief and opinionated.

The repository uses a small-app layout:
Update it when:

- `src/` for production code
- `tests/` for test assets
- `docs/` for contributor-facing design notes
- root for project-level files and GitHub metadata
- a new feature changes the project's design boundaries
- a contributor might reasonably ask "why is it shaped this way?"

This keeps the GitHub root readable without introducing library-style structure that the project does not need.
Do not update it for routine feature inventory, CLI examples, or behavior that is already obvious from tests and the README.
5 changes: 3 additions & 2 deletions src/main.c
@@ -279,7 +279,7 @@ int main(int argc, char* argv[]) {
if (options.resolved_mode == FILE_SELECTION_RECURSIVE) {
if (load_ignore_patterns(IGNORE_FILE, &ctx.ignore_patterns, &ctx.ignore_count) != 0) {
fprintf(stderr, "Error: Failed to initialize ignore patterns.\n");
return 1;
goto cleanup;
}
}

@@ -336,6 +336,7 @@ int main(int argc, char* argv[]) {
render_ctx.selected_paths = selected_paths;
render_ctx.selected_count = selected_count;
render_ctx.diff_range = options.diff_range;
render_ctx.show_line_numbers = options.show_line_numbers;
render_ctx.show_tree = ctx.show_tree;
render_ctx.tree_depth = ctx.tree_depth;

@@ -429,7 +430,7 @@ int main(int argc, char* argv[]) {
}

errno = 0;
if (render_export_plan(output_file, &plan, &render_info, ctx.verbose) != 0) {
if (render_export_plan(output_file, &plan, &render_info, &render_ctx, ctx.verbose) != 0) {
if (errno != 0) {
perror("Error processing export files");
} else {
3 changes: 3 additions & 0 deletions src/options.c
@@ -73,6 +73,7 @@ void print_usage(const char* argv0) {
printf(" --tree Include a directory tree section (default)\n");
printf(" --no-tree Omit the directory tree section\n");
printf(" --tree-depth Limit tree rendering depth to N levels\n");
printf(" --line-numbers Prefix exported code lines with line numbers\n");
printf(" -s <size_kb> Set maximum file size limit in KB (default: 100)\n");
printf(" --warn-tokens Warn if estimated tokens exceed N (default: %d)\n",
DEFAULT_WARN_TOKENS);
@@ -145,6 +146,8 @@ int parse_cli_options(int argc, char* argv[], CliOptions* options) {
options->show_tree = 1;
} else if (strcmp(argv[i], "--no-tree") == 0) {
options->show_tree = 0;
} else if (strcmp(argv[i], "--line-numbers") == 0) {
options->show_line_numbers = 1;
} else if (strcmp(argv[i], "--tree-depth") == 0) {
if (i + 1 < argc) {
if (parse_positive_size_value(argv[++i], "tree depth", SIZE_MAX, &options->tree_depth) != 0) {
1 change: 1 addition & 0 deletions src/options.h
@@ -12,6 +12,7 @@ typedef struct {
int output_is_stdout;
int stdin_null_delim;
int show_tree;
int show_line_numbers;
int allow_sensitive;
size_t max_file_size;
size_t tree_depth;