Flow is a local AI-assisted browser-flow capture and extraction harness. It captures browser or fixture evidence, analyzes saved artifacts, creates an extraction plan, requires human approval, generates a reusable extractor, and validates that extractor against the saved HTML.
python -m pip install -e ".[test]"
Flow uses filesystem storage only. Runs are written under runs/; generated code is written under generated/.
For authenticated captures, start Chrome with remote debugging on the default Flow port:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=19825
Flow calls bb-browser through:
npx -y bb-browser ...
Unit tests do not require Chrome, CDP, or bb-browser.
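Before an authenticated capture, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below is not part of Flow; `cdp_version` is a hypothetical helper name. It relies only on the `/json/version` HTTP endpoint that Chrome exposes when started with `--remote-debugging-port`, and assumes the default Flow port 19825 on the local machine.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_version(port: int = 19825, host: str = "127.0.0.1", timeout: float = 2.0):
    """Return Chrome's /json/version payload as a dict, or None if unreachable.

    Hypothetical helper for illustration; Flow itself may check the port differently.
    """
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed HTTP, or a non-JSON reply:
        # in every case, treat the DevTools endpoint as unavailable.
        return None
```

Calling `cdp_version()` returns a dictionary describing the browser when Chrome is running with remote debugging enabled, and `None` otherwise, so a script can stop early instead of failing partway through a capture.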
flow capture \
--name demo \
--from-html tests/fixtures/jsonld_job.html \
--goal "Extract job" \
--fields title,company,location,description,apply_url
flow analyze runs/demo/latest
flow review runs/demo/latest
flow approve runs/demo/latest --notes "Correct"
flow generate runs/demo/latest --target python-extractor
flow validate generated/demo
During development, python -m flow.cli ... works the same as flow ... does.
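The demo sequence above can also be assembled from a script, for example to print the exact commands for a new fixture. This is a minimal sketch, not part of Flow: `workflow_commands` and `run_workflow` are hypothetical names, and the `runs/<name>/latest` and `generated/<name>` paths are inferred from the demo commands rather than from a documented layout. Because approval is meant to be a human decision, the sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def workflow_commands(name: str, fixture: str, goal: str, fields: list[str]) -> list[list[str]]:
    """Build the capture-to-validate command sequence shown in the demo."""
    run_dir = f"runs/{name}/latest"  # assumed layout, inferred from the demo
    return [
        ["flow", "capture", "--name", name, "--from-html", fixture,
         "--goal", goal, "--fields", ",".join(fields)],
        ["flow", "analyze", run_dir],
        ["flow", "review", run_dir],
        ["flow", "approve", run_dir, "--notes", "Correct"],
        ["flow", "generate", run_dir, "--target", "python-extractor"],
        ["flow", "validate", f"generated/{name}"],
    ]


def run_workflow(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(command, check=True)
```

For example, `run_workflow(workflow_commands("demo", "tests/fixtures/jsonld_job.html", "Extract job", ["title", "company"]))` prints the six commands without running them, leaving the review and approval steps to a person.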
LinkedIn authenticated browser capture:
flow capture \
--name linkedin_recommended \
--url "https://www.linkedin.com/jobs/collections/recommended" \
--goal "Extract recommended LinkedIn jobs" \
--browser bb \
--port 19825 \
--fields title,company,location,detail_url,description,apply_url
Goldman Sachs fetch capture:
flow capture \
--name goldman_sachs_jobs \
--url "https://higher.gs.com/results?LOCATION=New%20York&page=1&search=software%20engineer&sort=RELEVANCE" \
--goal "Extract Goldman Sachs software engineering jobs in New York" \
--browser fetch \
--fields title,location,division,description,compensation,apply_url
Flow is read-only by default. It captures, analyzes, extracts, and validates. It does not submit applications, purchase anything, send messages, change account settings, delete data, modify profiles, bypass CAPTCHAs, or store credentials.
python -m compileall flow
pytest -q
python -m flow.cli --help