Olostep CLI

Command-line interface for the Olostep API — search, map, answer, scrape, crawl, and batch workflows from your terminal. Outputs are structured JSON (pretty-printed) so you can pipe them into jq, agents, and CI without writing a custom client first.

The same CLI is available as a standalone binary (no Python required) via npm, or from source with Python.


Installation

npm (recommended — standalone binary)

Installs a platform-specific binary on postinstall (macOS arm64/x64, Linux x64, Windows x64). Node.js 16+ is required only for install; the olostep command does not use Python.

npm install -g olostep-cli

Run without a global install:

npx -y olostep-cli@latest --help

If the binary failed to download, reinstall or check that a GitHub release exists for your package version and platform.

Python (from this repository)

For development or when you want to run the Typer app directly:

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -U pip
pip install -e .

The console script is olostep. You can also invoke the same app with python main.py <command>.

Package metadata: see pyproject.toml (olostep-api-cli).


Authentication

Set one of these variables, either in the process environment or in a .env file in the working directory or next to the binary:

| Variable | Description |
| --- | --- |
| OLOSTEP_API_KEY | API key (preferred) |
| OLOSTEP_API_TOKEN | Alternative token name |

Create keys in the Olostep API Keys dashboard.

Batch commands resolve the token via resolve_api_key(); map, answer, scrape, and crawl use the same credentials through config/config.py. That file also holds the defaults, including API_BASE_URL (https://api.olostep.com/v1) and the batch base URL; edit them there if you need to target another environment.
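
A wrapper script can check for a credential before it shells out to the CLI. The sketch below is not the project's resolve_api_key(); it is a hypothetical helper (pick_api_key) that assumes OLOSTEP_API_KEY is checked first because the table marks it as preferred, and it does not read .env files.

```python
import os
from typing import Mapping, Optional

# Variable names come from the table above. The lookup order is an assumption
# based on OLOSTEP_API_KEY being marked "preferred".
ENV_NAMES = ("OLOSTEP_API_KEY", "OLOSTEP_API_TOKEN")


def pick_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty credential found in env, or None."""
    for name in ENV_NAMES:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


# Hypothetical pre-flight check for a wrapper script.
if pick_api_key() is None:
    print("No Olostep credential found; set OLOSTEP_API_KEY first.")
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.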


Quick start

export OLOSTEP_API_KEY=your_key_here

olostep --help
olostep map "https://example.com" --top-n 20
olostep scrape "https://example.com" --formats markdown,html

Output, stdout, and agents

  • --out <path> — Write JSON results to a file. Parent directories are created automatically.
  • --out - — Write only the JSON result to stdout (UTF-8, indented). Use this for pipelines, subprocess capture in agents, and tools that expect machine-readable output on stdout.
  • Logs (e.g. logger.info, progress) go to stderr, so you can redirect or ignore them while keeping clean JSON on stdout.

Examples:

olostep map "https://example.com" --top-n 50 --out - | jq '.urls[:10]'
olostep answer "What is Olostep?" --out - | jq .result
olostep scrape "https://example.com" --out - | jq .result.markdown_content
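
For agents written in Python, the same stdout/stderr split can be used through subprocess. This is a minimal sketch: run_json is a hypothetical helper, and it assumes the CLI exits with a non-zero status on failure, which this README does not state.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(command: Sequence[str]) -> Any:
    """Run a command with `--out -` appended and parse its stdout as JSON."""
    completed = subprocess.run(
        [*command, "--out", "-"],
        stdout=subprocess.PIPE,  # capture only the JSON result
        stderr=None,             # logs stay on the parent's stderr
        check=True,              # raise CalledProcessError on a non-zero exit
        text=True,
        encoding="utf-8",
    )
    return json.loads(completed.stdout)
```

With the CLI installed and a key set, run_json(["olostep", "map", "https://example.com", "--top-n", "50"]) would return the parsed result that the jq examples above read from.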

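CI-style usage (a GitHub Actions run step, where the runner expands the ${{ ... }} expression before the shell executes the line):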

export OLOSTEP_API_KEY="${{ secrets.OLOSTEP_API_KEY }}"
olostep scrape "https://docs.example.com" --out result.json

Commands

Run olostep <command> --help for the full option text. Most API-backed commands accept --timeout to set the HTTP timeout in seconds.

map — discover URLs

Map a site to discover URLs (Olostep Maps).

| Option | Description |
| --- | --- |
| --out | Output path or - for stdout |
| --top-n | Max URLs to return |
| --search-query | Optional query to guide discovery |
| --include-subdomain / --no-include-subdomain | Include subdomains |
| --include-url | Repeatable URL patterns to include |
| --exclude-url | Repeatable URL patterns to exclude |
| --cursor | Pagination cursor |
| --timeout | HTTP timeout (s) |

olostep map "https://example.com" --top-n 100 --search-query "blog"
olostep map "https://example.com" --include-subdomain --out - | jq '.urls[:5]'

Compatibility: --limit was removed — use --top-n.
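
The jq examples read a top-level urls array from the map result. A Python equivalent might look like the sketch below; first_urls is a hypothetical helper, and it assumes each entry in urls is a plain string, which this README implies but does not spell out. The sample data is made up.

```python
from typing import List


def first_urls(map_result: dict, limit: int, contains: str = "") -> List[str]:
    """Return up to `limit` URLs, keeping only those that contain `contains`."""
    urls = map_result.get("urls", [])
    matching = [u for u in urls if contains in u]
    return matching[:limit]


# Inline sample standing in for the CLI output (shape assumed, values made up).
sample = {
    "urls": [
        "https://example.com/",
        "https://example.com/blog/a",
        "https://example.com/blog/b",
    ]
}
print(first_urls(sample, limit=5, contains="/blog/"))
```

To use it on a real run, load the JSON file written by olostep map (output/map.json by default) with json.load and pass the resulting dict in place of sample.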


answer — researched answers

Ask a question; the CLI polls until the answer job completes.

| Option | Description |
| --- | --- |
| --out | Output path or - |
| --json-format | Optional JSON shape / schema hint (string or JSON object string) |
| --poll-interval | Polling interval (seconds) |
| --poll-timeout | Max time to wait (seconds) |
| --timeout | HTTP timeout (s) |

olostep answer "Summarize this company's product" --out output/answer.json
olostep answer "Extract company facts" --json-format '{"company":"","country":""}' --out -

Compatibility: --model was removed — use --json-format.
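
Because --json-format takes a JSON object as a single string, shell quoting is easy to get wrong when the shape is built dynamically. One way around that, sketched here with a hypothetical answer_command helper, is to serialize the shape with json.dumps and pass the arguments as a list so that no shell is involved.

```python
import json
from typing import List, Mapping


def answer_command(question: str, shape: Mapping[str, object]) -> List[str]:
    """Build an argument list for `olostep answer` with a JSON shape hint."""
    return [
        "olostep", "answer", question,
        # Compact separators keep the hint on one short argument.
        "--json-format", json.dumps(shape, separators=(",", ":")),
        "--out", "-",
    ]


print(answer_command("Extract company facts", {"company": "", "country": ""}))
```

The resulting list can be handed directly to subprocess.run, which passes each element to the CLI as one argument without further quoting.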


scrape — single URL

Scrape one URL in one or more formats.

Formats (comma-separated): html, markdown, text, json, raw_pdf, screenshot.

| Option | Description |
| --- | --- |
| --out | Output path or - |
| --formats | Comma-separated formats (default: markdown) |
| --country | Optional country code |
| --wait-before-scraping | Wait before scrape (milliseconds) |
| --payload-json | Advanced scrape options as a JSON object string |
| --payload-file | Same as above, from a JSON file (mutually exclusive with --payload-json) |
| --timeout | HTTP timeout (s) |

olostep scrape "https://example.com/article" --formats markdown,text
olostep scrape "https://example.com" --country US --wait-before-scraping 2000
olostep scrape "https://example.com" --payload-file advanced.json --out - | jq .
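
When scrape commands are generated by a script, it can help to reject unknown format names before calling the CLI. The following is a sketch only: scrape_command is a hypothetical helper, the format names are copied from the list above, and the CLI's own validation may differ.

```python
from typing import List, Optional, Sequence

# Format names as listed in this README for the scrape command.
SCRAPE_FORMATS = {"html", "markdown", "text", "json", "raw_pdf", "screenshot"}


def scrape_command(
    url: str,
    formats: Sequence[str] = ("markdown",),
    country: Optional[str] = None,
    wait_ms: Optional[int] = None,
) -> List[str]:
    """Build an `olostep scrape` argument list, rejecting unknown formats."""
    unknown = [f for f in formats if f not in SCRAPE_FORMATS]
    if unknown:
        raise ValueError(f"unsupported scrape format(s): {', '.join(unknown)}")
    args = ["olostep", "scrape", url, "--formats", ",".join(formats)]
    if country is not None:
        args += ["--country", country]
    if wait_ms is not None:
        # --wait-before-scraping is documented in milliseconds.
        args += ["--wait-before-scraping", str(wait_ms)]
    return args


print(scrape_command("https://example.com", ["markdown", "text"], country="US"))
```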

scrape-get — fetch by ID

Retrieve a previous scrape by ID.

olostep scrape-get "scrape_abc123" --out output/scrape_get.json
olostep scrape-get "scrape_abc123" --out - | jq .result.markdown_content

crawl — multi-page

Start a crawl, poll until finished, then retrieve page contents.

Retrieve formats (comma-separated): markdown, html, json.

| Option | Description |
| --- | --- |
| --out | Output path or - |
| --max-pages | Maximum pages to crawl |
| --max-depth | Optional max depth |
| --include-subdomain / --no-include-subdomain | Subdomains |
| --include-external / --no-include-external | External domains |
| --include-url / --exclude-url | Repeatable path/URL patterns |
| --search-query / --top-n | Optional discovery filter and cap |
| --webhook | Optional webhook URL |
| --crawl-timeout | Crawl timeout (seconds) |
| --follow-robots-txt / --ignore-robots-txt | robots.txt |
| --formats | Retrieve formats |
| --pages-limit | Page size for crawl pages API |
| --pages-search-query | Filter when listing pages |
| --poll-seconds / --poll-timeout | Polling |
| --timeout | HTTP timeout (s) |
| --dry-run | Print API payload JSON and exit (no request) |

olostep crawl "https://docs.example.com" --max-pages 50 --formats markdown,html
olostep crawl "https://example.com" --max-pages 10 --dry-run

batch-scrape — CSV

Submit many URLs from a CSV file that has an ID column (named custom_id or id) and a url column. The command polls until the batch completes.

| Option | Description |
| --- | --- |
| --out | Output path or - |
| --formats | markdown, html, json (comma-separated) |
| --country | Optional country code |
| --parser-id | Optional parser for structured extraction |
| --poll-seconds | Poll interval |
| --log-every | Log every N polls |
| --items-limit | Batch items page size (API often suggests 10–50) |
| --dry-run | Print payload JSON and exit |

olostep batch-scrape urls.csv --formats markdown,html --country US
olostep batch-scrape urls.csv --parser-id "<PARSER_ID>" --out results.json
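
The input CSV can be generated with Python's csv module. This sketch writes the custom_id and url columns described above; write_batch_csv and the item-N ID scheme are illustrative choices, not something the CLI requires.

```python
import csv
import io
from typing import Iterable, TextIO


def write_batch_csv(urls: Iterable[str], out: TextIO) -> int:
    """Write custom_id,url rows to `out` and return the number of URLs written."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["custom_id", "url"])
    count = 0
    for count, url in enumerate(urls, start=1):
        # Hypothetical ID scheme; any stable identifier would do.
        writer.writerow([f"item-{count}", url])
    return count


buffer = io.StringIO()
write_batch_csv(["https://example.com/a", "https://example.com/b"], buffer)
print(buffer.getvalue(), end="")
```

To produce a file for batch-scrape, open urls.csv with open("urls.csv", "w", newline="", encoding="utf-8") and pass the file object instead of the StringIO buffer.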

batch-update — metadata

Update metadata on an existing batch. One of --metadata-json or --metadata-file is required (JSON object).

olostep batch-update "batch_abc123" --metadata-json '{"team":"growth"}'
olostep batch-update "batch_abc123" --metadata-file meta.json --out -

Default output paths

If you omit --out, JSON is written under output/:

| Command | Default file |
| --- | --- |
| map | output/map.json |
| answer | output/answer.json |
| scrape | output/scrape.json |
| scrape-get | output/scrape_get.json |
| crawl | output/crawl_results.json |
| batch-scrape | output/batch_results.json |
| batch-update | output/batch_update.json |
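
A script that runs commands without --out needs to know where the result landed. The mapping below simply mirrors the table above; default_output_path is a hypothetical helper, and the paths are assumed to be relative to the directory the CLI was run from.

```python
from pathlib import Path

# Copied from the default output table in this README.
DEFAULT_OUTPUTS = {
    "map": "output/map.json",
    "answer": "output/answer.json",
    "scrape": "output/scrape.json",
    "scrape-get": "output/scrape_get.json",
    "crawl": "output/crawl_results.json",
    "batch-scrape": "output/batch_results.json",
    "batch-update": "output/batch_update.json",
}


def default_output_path(command: str) -> Path:
    """Return the documented default output file for a command."""
    try:
        return Path(DEFAULT_OUTPUTS[command])
    except KeyError:
        raise ValueError(f"unknown olostep command: {command}") from None


print(default_output_path("crawl"))
```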

Global options

| Option | Description |
| --- | --- |
| -V, --version | Print CLI version and exit |
| -h, --help | Help (Typer / Rich) |

Project structure

.
├── main.py                 # Typer CLI entrypoint
├── pyproject.toml          # Python package + `olostep` script
├── olostep.spec            # PyInstaller spec for release binaries
├── npm/
│   ├── package.json        # olostep-cli on npm
│   ├── bin/olostep.js      # Node shim → native binary
│   └── scripts/postinstall.js
├── config/
│   └── config.py           # Env, defaults, base URLs
├── src/
│   ├── api_client.py
│   ├── map_api.py
│   ├── answer_api.py
│   ├── scrape_api.py
│   ├── crawl_api.py
│   ├── batch_api.py
│   └── batch_scraper.py
└── utils/
    └── utils.py            # JSON output, stdout `-`, polling helpers

Development

python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[test]"
pytest
olostep --help

Release binaries are built with PyInstaller (pip install -e ".[build]"); see .github/workflows/release.yml.


Security

  • Do not commit .env or API keys.
  • Rotate keys if they are exposed.

License

MIT — see LICENSE.
