InsightFlow


InsightFlow is an AI-native web app for career-content search and grounded summarization. It ingests structured job-market content, retrieves evidence with hybrid search, and returns structured summaries with citations, evidence snippets, and source cards.

The project is intentionally narrow: it focuses on retrieval and summarization for job, internship, and career research, rather than being a general-purpose chat product or social platform.

What It Does

  • Imports CSV and JSON datasets into a local SQLite store
  • Preserves source metadata across normalization, chunking, indexing, and retrieval
  • Builds hybrid retrieval with SQLite keyword search and a FAISS vector index
  • Produces evidence-backed structured summaries through a FastAPI backend
  • Shows summary sections, source cards, and evidence snippets in a React frontend
  • Supports fake providers for local development and OpenAI-compatible providers for real inference

Stack

  • Backend: FastAPI, SQLAlchemy, SQLite
  • Retrieval: SQLite keyword search, FAISS vector index, deterministic merge and rerank pipeline
  • Frontend: React, TypeScript, Vite
  • Testing: pytest, Vitest
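The deterministic merge-and-rerank step can be illustrated with a small sketch. This is not the project's actual pipeline code; the reciprocal-rank-fusion weighting, function name, and result shape below are assumptions:

```python
# Hypothetical sketch of deterministically merging keyword and vector hits.
# The real pipeline lives in backend/; names and weights here are assumptions.

def merge_and_rerank(keyword_hits: list[str], vector_hits: list[str], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: score = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, chunk_id in enumerate(hits):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by chunk id so output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))

merged = merge_and_rerank(["a", "b", "c"], ["b", "d"])
print(merged[0])  # "b" appears in both lists, so it ranks first
```

Because ties are broken by chunk id, the same inputs always produce the same ordering, which keeps retrieval results reproducible across runs.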

Repository Structure

insightflow/
  backend/      FastAPI app, retrieval pipeline, ingestion logic, tests, and scripts
  frontend/     React client for search and grounded summary display
  data/         Local raw and processed data directories
  docs/         Validation notes, experiments, and iteration history
  .planning/    GSD planning artifacts and project state

Current Scope

Implemented in the current MVP:

  • Content import
  • Chunking with parent-document traceability
  • Hybrid retrieval
  • Structured summary API
  • Citation linking and evidence snippet extraction
  • Source-card rendering in the UI
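As a rough illustration of chunking with parent-document traceability, each chunk can carry its parent document id and character offsets. The field names and sizes below are illustrative assumptions, not InsightFlow's actual schema:

```python
# Hypothetical sketch of chunking with parent-document traceability.
# Field names (doc_id, start, end) are illustrative, not the project's schema.

def chunk_document(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping chunks, each pointing back to its parent doc."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append({"doc_id": doc_id, "start": start, "end": end,
                       "text": text[start:end]})
        if end == len(text):
            break
    return chunks

chunks = chunk_document("job-001", "x" * 450)
# Every chunk keeps its parent id and offsets, so a citation can resolve
# back to the exact span of the source document.
assert all(c["doc_id"] == "job-001" for c in chunks)
```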

Explicitly out of scope for v1:

  • User accounts
  • Social/community features
  • Personalized recommendations
  • Multi-turn memory chat
  • Mobile app

Quick Start

1. Bootstrap the backend

.\bootstrap.ps1

This script prepares the backend virtual environment and installs dependencies.

2. Start the API

.\bootstrap.ps1 -RunApp

Core backend endpoints:

  • GET /healthz
  • POST /imports
  • POST /retrieval/index/build
  • POST /retrieval/search
  • POST /summary

3. Start the frontend

Set-Location frontend
npm install
$env:VITE_API_BASE_URL = "http://127.0.0.1:8001"
npm run dev -- --host 127.0.0.1 --port 5173

Recommended local pairing:

  • Backend: 127.0.0.1:8001
  • Frontend: 127.0.0.1:5173

Example Local Workflow

Import sample data:

& .\backend\.venv\Scripts\python.exe -m app.cli.import_batch `
  --input backend\tests\fixtures\sample_job_dataset.csv `
  --source sample_dataset `
  --source-type dataset `
  --db data\processed\insightflow.db

Build retrieval indexes:

Set-Location backend
$env:CHAT_MODEL_PROVIDER = "fake"
$env:FORCE_REBUILD = "true"
& .\.venv\Scripts\python.exe scripts\build_retrieval_index.py `
  --db sqlite:///../data/processed/insightflow.db `
  --vector-index-dir ../data/processed/vector_index

Run the backend with fake providers:

Set-Location backend
$env:CHAT_MODEL_PROVIDER = "fake"
$env:EMBEDDING_MODEL_PROVIDER = "fake"
$env:FORCE_REBUILD = "true"
& .\.venv\Scripts\python.exe -m uvicorn app.main:app --host 127.0.0.1 --port 8001

Provider Configuration

InsightFlow supports fake providers for local testing and OpenAI-compatible providers for real inference.

Example real-provider setup:

$env:CHAT_MODEL_PROVIDER = "openai_compatible"
$env:CHAT_API_BASE_URL = "https://api.example.com/v1"
$env:CHAT_API_KEY = "your-chat-key"
$env:CHAT_MODEL_NAME = "your-chat-model"
$env:EMBEDDING_MODEL_PROVIDER = "openai_compatible"
$env:EMBEDDING_API_BASE_URL = "https://api.example.com/v1"
$env:EMBEDDING_API_KEY = "your-embedding-key"
$env:EMBEDDING_MODEL_NAME = "your-embedding-model"
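These variables would typically be read into a single settings object at backend startup. A minimal sketch of that pattern (the class and its defaults are assumptions, only the environment variable names come from the block above):

```python
import os
from dataclasses import dataclass

# Hypothetical settings loader; the env var names match the README,
# but this class and its defaults are illustrative assumptions.
@dataclass
class ProviderSettings:
    chat_provider: str
    chat_base_url: str
    embedding_provider: str

    @classmethod
    def from_env(cls) -> "ProviderSettings":
        # Fall back to the fake providers so local runs need no configuration.
        return cls(
            chat_provider=os.environ.get("CHAT_MODEL_PROVIDER", "fake"),
            chat_base_url=os.environ.get("CHAT_API_BASE_URL", ""),
            embedding_provider=os.environ.get("EMBEDDING_MODEL_PROVIDER", "fake"),
        )
```

Defaulting to `"fake"` keeps the zero-configuration local workflow working while real deployments override via `$env:` variables.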

Important default paths:

  • Database: data/processed/insightflow.db
  • Vector index: data/processed/vector_index

Testing

Backend:

& .\backend\.venv\Scripts\python.exe -m pytest backend\tests -q

Frontend:

Set-Location frontend
npm test

Build frontend:

Set-Location frontend
npm run build

Design Principles

  • Evidence first: summaries should stay within retrieved evidence boundaries
  • Source transparency: each claim should be traceable to citations and snippets
  • Modular backend: API, services, retrieval, db, schemas, and prompts remain clearly separated
  • Narrow scope: solve career-information retrieval and synthesis well before expanding
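"Evidence first" can be enforced mechanically: reject any summary claim whose citations point outside the retrieved evidence. A hypothetical sketch of such a check (the claim/citation structure is an assumption, not InsightFlow's schema):

```python
# Hypothetical grounding check: every cited snippet id must come from retrieval.
# The claim/citation structure here is illustrative, not the project's schema.

def ungrounded_claims(claims: list[dict], evidence_ids: set[str]) -> list[dict]:
    """Return claims with no citations, or citations outside the retrieved evidence."""
    return [c for c in claims
            if not c["citations"] or not set(c["citations"]) <= evidence_ids]

claims = [
    {"text": "Remote roles grew in 2023.", "citations": ["chunk-1"]},
    {"text": "Unsupported claim.", "citations": ["chunk-9"]},
]
bad = ungrounded_claims(claims, evidence_ids={"chunk-1", "chunk-2"})
print(len(bad))  # 1: only the claim citing chunk-9 is flagged
```

Running a check like this after summarization turns the design principle into a testable invariant rather than a prompt-level hope.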

Documentation

Project Status

The initial MVP implementation is complete; current work focuses on verification, retrieval-quality tuning, and release readiness.

License

This project is currently released under the MIT License.

Release Notes

The first public release notes are available at docs/release_notes_v0.1.0.md.
