JuPipe

A file-driven, multi-step LLM pipeline runner. Built by JUWY — Julian Weyer.

JuPipe executes LLM workflows defined in plain-text .flowdef files. It supports any LLM provider through LiteLLM — OpenAI, Anthropic, Ollama (local), Azure, Mistral, and others.

The entire workflow definition lives in text files. No framework code, no GUI, no vendor lock-in.

Personal project. JuPipe is actively used for real workflows, but it is maintained by one person and has no formal test suite; no support is provided. Use at your own risk.


How it differs from other tools

There are several established tools for orchestrating LLM calls. JuPipe takes a different approach.

LangChain is a Python framework for building LLM-powered applications. It provides abstractions for chains, agents, memory, and retrieval — a good fit for complex applications built around these primitives. JuPipe requires no framework code: workflows are plain text files, and provider switching is a one-line config change.

n8n / Windmill are visual workflow automation platforms with LLM integration. They are designed for GUI-driven workflows and run as hosted or self-hosted services; their workflow definitions are stored as JSON. JuPipe is a local CLI tool whose .flowdef files are plain text and integrate naturally with version control.

Direct API scripts (e.g. a Python script calling the OpenAI SDK) are a reasonable starting point for simple, single-call workflows. JuPipe covers the multi-step case — chained calls, loops, conditionals — without requiring code.

JuPipe is suited for power users, scripters, and automators who want to define and version-control LLM pipelines with minimal overhead.


Features

  • Multi-step pipelines — chain LLM calls where each step can reference results from previous steps
  • Any LLM provider — via LiteLLM: OpenAI, Anthropic, Ollama (local), Azure, Mistral, Groq, and more
  • Flexible model configuration — global defaults in config.yml, per-file defaults in [DEFAULTS], per-step overrides
  • Foreach loops — iterate a step over JSON arrays from previous results
  • Conditional steps — execute steps only when conditions on previous results are met
  • Inline Python scripts — embed Python expressions and blocks directly in templates
  • Multiple outputs — generate several output files from a single flow, with configurable filenames and overwrite behaviour
  • JSON dot-notation — access nested JSON fields with step["id"].response.key[0].name
  • Streaming output — see thinking and response tokens in the terminal as they arrive
  • Debug mode — save all intermediate results, resolved prompts, and metadata as YAML
  • PDF input support — use .pdf files as input alongside .txt files; text is extracted automatically
  • Wildcard execution — run all combinations of *.flowdef × *.pdf × *.txt in one command

Requirements

Python 3.10 or later.

pip install litellm pyyaml

For PDF input support (optional):

pip install pdfplumber

Editor Support

A VS Code extension for .flowdef syntax highlighting is included in the vscode-jupipe/ directory.

It highlights:

  • Section headers: [STEP], [OUTPUT], [SCRIPT], [DEFAULTS]
  • Header attributes: id=, condition=, foreach=, filename=, overwrite=
  • Configuration keys: model, temperature, format, max_tokens, …
  • The prompt = keyword
  • Placeholders: {input}, {step["id"].response.field}, {item}, …
  • Inline Python: {{= expr }} and {{% block %}} with full Python highlighting
  • Comments: # …

Installation — VS Code native (Windows/macOS/Linux):

cp -r vscode-jupipe ~/.vscode/extensions/jupipe-flowdef

Installation — VS Code Remote WSL:

cp -r vscode-jupipe ~/.vscode-server/extensions/jupipe-flowdef

Reload VS Code (Developer: Reload Window) after copying.


Quick Start

1. Configure models

Create config.yml in the JuPipe package directory:

default_model: local_qwen

models:
  local_qwen:
    model: ollama/qwen3.5:9b
    api_base: http://localhost:11434
    defaults:
      temperature: 0.7
      max_tokens: 4096

  cloud_api:
    model: openai/gpt-4o
    api_base: ${MYAPI_BASE_URL}
    api_key: ${MYAPI_KEY}
    defaults:
      temperature: 0.5

API keys and base URLs can reference environment variables with ${VAR_NAME}. These are resolved in the following order:

  1. <package_dir>/.env (highest priority)
  2. ~/.secrets/.env (global fallback)
  3. Process environment variables

Example .env file:

# Place in jupipe/ or ~/.secrets/
MYAPI_BASE_URL=https://api.example.com/v1
MYAPI_KEY=sk-your-key-here

2. Write a flow definition

Create summarize.flowdef:

[DEFAULTS]
model = local_qwen

[STEP id="summary"]
temperature = 0.3
prompt =
Summarize the following text in 3 bullet points:
{input}

[OUTPUT filename="summary_{input_name}.md"]
# Summary
{step["summary"].response}

3. Run it

python -m jupipe summarize.flowdef my_document.txt

With debug output:

python -m jupipe summarize.flowdef my_document.txt --debug

With a PDF input file:

python -m jupipe summarize.flowdef report.pdf

With wildcards (processes all combinations):

python -m jupipe *.flowdef data/*.txt data/*.pdf
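
Every .flowdef file is run against every matching input file: two .flowdef files and three input files, for example, result in six runs.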

Configuration

config.yml

The config.yml file defines available models and a global default. It is located in the JuPipe package directory.

Each model entry has:

  • model — the full LiteLLM model string
  • api_base — API endpoint (optional, supports ${ENV_VAR})
  • api_key — API key (optional, supports ${ENV_VAR})
  • defaults — default generation parameters (temperature, max_tokens, etc.)

The default_model key specifies which model alias is used when no model is configured in the .flowdef file.

Model resolution order

When a step is executed, the model is determined by the first match in this chain:

  1. model = ... on the [STEP] itself
  2. model = ... in the [DEFAULTS] block of the .flowdef file
  3. default_model in config.yml

If the alias is not found in config.yml, it is passed through to LiteLLM as a raw model string.
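
For example, in the following file the draft step uses local_qwen from [DEFAULTS], while the polish step overrides it with cloud_api:

[DEFAULTS]
model = local_qwen

[STEP id="draft"]
prompt =
Write a first draft about: {input}

[STEP id="polish"]
model = cloud_api
prompt =
Polish this draft: {step["draft"].response}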


Flow Definition Reference

A .flowdef file is a plain text file with three types of sections.

Lines starting with # are treated as comments only between a [STEP] (or [DEFAULTS]) header and its prompt = line. Inside prompt bodies and output bodies, # is literal content and will be included in the prompt or output as-is.
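
For example:

[STEP id="demo"]
# This comment is ignored by the parser.
temperature = 0.2
prompt =
# This line is part of the prompt and is sent to the model verbatim.
Summarize: {input}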

[DEFAULTS]

Optional. At most one per file. Sets default configuration for all steps.

[DEFAULTS]
model = local_qwen
temperature = 0.5

[STEP id="..."]

Defines a single LLM call. Each step needs a unique ID. Steps are executed in the order they appear.

[STEP id="analyse"]
model = cloud_api
temperature = 0.3
format = json
prompt =
Analyse this text and return JSON:
{input}

Configuration keys before prompt = are passed as generation parameters. Reserved keys:

  • model — model alias from config.yml
  • format — set to json to force structured output
  • think — set to true to enable chain-of-thought (model must support it)
  • reasoning_effort — set to low, medium, or high to control reasoning depth (supported by Langdock, OpenAI)
  • stream — set to false to disable streaming for models or endpoints that do not support it

All other keys (temperature, max_tokens, top_p, etc.) are passed directly to LiteLLM.
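
A step combining several of these (a sketch; whether think is honoured depends on the model):

[STEP id="classify"]
model = cloud_api
format = json
think = true
temperature = 0.1
prompt =
Classify the following text and return JSON with a "category" field:
{input}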

[OUTPUT]

Defines an output file. You can have multiple OUTPUT blocks per .flowdef.

Attributes:

  • filename="..." — output filename, supports placeholders (see below). If omitted, defaults to {flow_name}_{input_name}.txt.
  • overwrite=true|false — if true, overwrites existing files. If false (default), appends a counter (_1, _2, ...).

[OUTPUT filename="report_{input_name}.md" overwrite=true]
# Report
{step["summary"].response}

[OUTPUT filename="data_{date}.json" overwrite=false]
{step["analyse"].response}

[OUTPUT]
Default filename, no overwrite.
{step["result"].response}

Filename placeholders: {input_name}, {flow_name}, {date}, and any step reference.


Placeholders

Placeholders can be used in prompts, output bodies, conditions, and filenames.

Input content

{input} — the full content of the input file.

{input_name} — the stem of the input filename (e.g. document for document.txt).

{flow_name} — the stem of the flowdef filename (e.g. summarize for summarize.flowdef).

{date} — the current date in YYYY-MM-DD format (UTC).

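For example, running summarize.flowdef on notes.txt on 12 June 2025 resolves {input_name} to notes, {flow_name} to summarize, and {date} to 2025-06-12.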

Step results

{step["id"].response} — the full response text of a previous step.

{step["id"].thinking} — the thinking/reasoning text (if the model produced one).

JSON dot-notation

If a step returned JSON, you can access nested fields:

{step["analyse"].response.title}
{step["analyse"].response.items[0].name}
{step["analyse"].response.metadata.author}

Foreach iteration access

For steps executed with foreach, the result is an array. Access individual iterations or all of them:

{step["loop"][0].response}          # first iteration
{step["loop"][2].response.score}    # nested JSON in third iteration
{step["loop"][].response}           # JSON array of all responses

Automatic JSON Cleanup

LLMs sometimes wrap JSON responses in Markdown code fences even when not instructed to do so. JuPipe automatically strips these fences before parsing JSON. This means dot-notation access ({step["id"].response.field}) and the step accessor in inline scripts work correctly regardless of whether the model wraps its output in code fences or not.
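
For example, if a model answers with

```json
{"risk_level": 4}
```

the fences are stripped and {step["analyse"].response.risk_level} resolves to 4.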


Foreach Loops

A step can be repeated for each element in a JSON array from a previous step.

[STEP id="split"]
format = json
prompt =
Split this text into chapters. Return a JSON array of chapter titles.
{input}

[STEP id="process" foreach={step["split"].response[]}]
prompt =
Summarize this chapter: {item}
Chapter index: {item_index}

Inside a foreach step, two additional placeholders are available:

  • {item} — the current element from the array
  • {item_index} — the zero-based index of the current iteration

The result of a foreach step is stored as a list. See "Foreach iteration access" above for how to reference it in later steps.
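
The collected results can be referenced from later steps and outputs. For example, to write all chapter summaries into one JSON file:

[OUTPUT filename="chapters_{input_name}.json"]
{step["process"][].response}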


Conditional Steps

A step can be made conditional on the result of a previous step.

[STEP id="check" condition='{step["analyse"].response.risk_level} > 3']
prompt =
This requires escalation. Explain why the risk level is high.
{input}

Supported operators:

  • ==, != — equality / inequality
  • >, <, >=, <= — numeric comparison (falls back to string comparison for non-numeric values)
  • is empty — true if the value is null, "", [], or {}
  • is not empty — true if the value is not empty
  • contains — substring or list membership ({...} contains "warn")
  • in — value is a member of a list ("admin" in {step["x"].response.roles})

If the condition is not met, the step is skipped. Skipped steps return an empty response.
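
Two more examples (the warnings and tags fields are illustrative):

[STEP id="mitigate" condition='{step["analyse"].response.warnings} is not empty']
prompt =
List the warnings and suggest mitigations.

[STEP id="escalate" condition='{step["analyse"].response.tags} contains "urgent"']
prompt =
Draft an escalation notice for: {input}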


Inline Python Scripts

You can embed Python code directly in prompts and output bodies. This is useful for transformations, filtering, or formatting that go beyond simple placeholders.

Inline expressions

Single-line expressions that are replaced by their return value:

Number of topics: {{= len(step["analyse"].response.topics) }}
First topic:      {{= step["analyse"].response.topics[0] }}

Script blocks

Multi-line Python code. All output produced via print() replaces the block:

{{%
for item in step["overview"].response.results:
    if item.get("score", 0) > 0.8:
        print(f"- {item['name']}: {item['score']}")
%}}

Available variables in scripts

  • input — input file content (str)
  • step — dict-like accessor for step results; supports step["id"].response, step["id"].thinking, and attribute access into parsed JSON
  • json, re, math, datetime — standard library modules
  • resolve(data, path) — dot-notation traversal helper
  • step_raw — raw dictionary of all step results, useful for direct dictionary access
  • now_str — current date and time as a fully formatted string (YYYY-MM-DD HH:MM)
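
A minimal block using some of these (the topics field is illustrative):

{{%
# now_str and the step accessor are provided by JuPipe;
# "topics" is an illustrative field of the "analyse" step's JSON response
print(f"Generated {now_str}")
for topic in step["analyse"].response.topics:
    print(f"- {topic}")
%}}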

⚠️👮 Scripts run with full Python privileges. Do not run untrusted .flowdef files.


Debug Mode

Pass --debug to save a YAML file alongside each output containing:

  • Flow run metadata (paths, timestamps, success/failure)
  • The [DEFAULTS] configuration
  • Per-step details: resolved prompt, raw response, thinking, timestamps, token usage
  • For foreach steps: all iterations with their individual results
  • Per-output: template and resolved content

The debug file is named <flow>_<input>_debug.yaml and never overwrites existing debug files.
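
For example, running summarize.flowdef on my_document.txt produces summarize_my_document_debug.yaml.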


Rate Limiting & Retry

When an API returns a 429 Too Many Requests error, JuPipe automatically retries the request with a waiting period extracted from the error message (e.g. "Please try again in 30 seconds"). If no wait time is given, a minimum of 35 seconds is used.

JuPipe retries up to 5 times before aborting the step with an error.

The terminal shows the wait time and attempt count:

⏳ Rate limit hit. Waiting 35s before retry (attempt 1/5) ...

File Structure

jupipe/
  __init__.py         # Package metadata
  __main__.py         # CLI entry point
  config.py           # config.yml + .env loading, model resolution
  parser.py           # .flowdef file parser
  placeholders.py     # Placeholder and dot-notation resolution
  conditions.py       # Condition expression evaluation
  scripting.py        # Inline Python script processing
  engine.py           # Flow execution orchestrator
  output.py           # Output file management + debug YAML export
  config.yml          # Model configuration (user-created)
  .env                # API keys (user-created, optional, not in git)

vscode-jupipe/        # VS Code extension for .flowdef syntax highlighting

~/.secrets/
  .env                # Global API keys (optional, shared across projects)
