A modular, investigation-focused OSINT platform for collecting, structuring, and analyzing open-source intelligence using graph-based link analysis.
This repository is designed for:
- local-first development on limited hardware
- clean separation between research, processing, and graph analysis
- scalable architecture without requiring a rewrite
This system is not a search tool. It is a data pipeline + investigation workspace.
Separation of concerns:
- Research → collect and extract data
- Processing → normalize and structure data
- Graph → store entities and relationships
- Analysis → query graph + run algorithms
- LLM → assist, summarize, explain (not source of truth)
Frontend (React)
↓
API (FastAPI)
↓
Queue (Redis)
↓
Worker (RQ)
↓
Processing + Extraction
↓
Storage:
- App DB (SQLite/Postgres)
- Graph DB (Neo4j)
↓
Graph Query + Analysis
↓
LLM (Ollama / API)
Backend:
- FastAPI
- RQ (background jobs)
- Redis (queue + cache)
- SQLAlchemy
- Neo4j Python Driver

Frontend:
- React + TypeScript
- Vite
- Zustand (state)
- React Query
- React Flow (graph UI)

Storage:
- SQLite (local) → PostgreSQL (later)
- Neo4j (AuraDB Free or local)
- Filesystem (optional raw artifacts)

LLM:
- Ollama (local)
- Optional paid API (for higher-quality reasoning)
osint-platform/
  docs/
  frontend/
  backend/
    app/
      api/
      connectors/
      extraction/
      graph/
      services/
      models/
      schemas/
      worker/
connectors/ handles external data sources. Each connector:
- builds queries
- fetches data
- normalizes output
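A minimal sketch of that contract, assuming a `Connector` protocol and a `SearchResult` shape that are illustrative rather than the repository's actual classes:

```python
# Illustrative connector contract -- Connector and SearchResult are
# assumptions, not the repository's actual classes.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    source: str        # e.g. "web", "whois"
    url: str
    title: str
    raw_text: str


class Connector(Protocol):
    name: str

    def build_query(self, terms: list[str]) -> str:
        """Turn investigation terms into a source-specific query."""
        ...

    def fetch(self, query: str) -> list[SearchResult]:
        """Call the external source and return raw hits."""
        ...

    def normalize(self, results: list[SearchResult]) -> list[SearchResult]:
        """Deduplicate and clean results before extraction."""
        ...
```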
extraction/ converts raw data into structured intelligence:
- entities
- relationships
- evidence
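As a sketch, that output could be typed with Pydantic (already pulled in by FastAPI); the model names and fields below are assumptions:

```python
# Illustrative extraction output -- model names and fields are assumptions.
from pydantic import BaseModel


class EntityCandidate(BaseModel):
    label: str           # e.g. "Person", "Domain"
    value: str           # e.g. "example.com"
    confidence: float    # extractor's own score, 0.0-1.0


class RelationshipCandidate(BaseModel):
    source: str
    target: str
    rel_type: str        # e.g. "USES_EMAIL"
    evidence_id: int     # every relationship points at evidence


class ExtractionResult(BaseModel):
    entities: list[EntityCandidate]
    relationships: list[RelationshipCandidate]
```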
graph/ handles Neo4j:
- Cypher templates
- ingestion logic
- graph queries
- algorithms
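A minimal ingestion sketch with the official Neo4j Python driver; the connection details and node shape are placeholders:

```python
# Illustrative ingestion helper using the official neo4j driver.
# MERGE keeps ingestion idempotent: re-running a job never duplicates nodes.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))


def ingest_person(tx, name: str):
    tx.run("MERGE (p:Person {name: $name})", name=name)


with driver.session() as session:
    session.execute_write(ingest_person, "Jane Doe")
```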
worker/ runs background jobs:
- research
- extraction
- graph ingestion
- enrichment
api/ stays a thin layer only:
- creates jobs
- returns results
- never runs heavy logic
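A sketch of what such a route could look like with FastAPI and RQ; the route path, queue name, and `run_research` import are assumptions:

```python
# Illustrative thin API route: it only enqueues work and returns a job id.
from fastapi import FastAPI
from redis import Redis
from rq import Queue

from app.worker.jobs import run_research  # hypothetical worker function

app = FastAPI()
queue = Queue("research", connection=Redis())


@app.post("/investigations/{investigation_id}/research")
def start_research(investigation_id: int, query: str):
    # Heavy work happens in the worker; the request returns immediately.
    job = queue.enqueue(run_research, investigation_id, query)
    return {"job_id": job.id, "status": "queued"}
```

Because the route only enqueues, the request returns immediately and the frontend can poll for the job result.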
App DB models:
- Investigation
- Job
- SourceDocument
- Evidence
- Annotation
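For illustration, one of these models in SQLAlchemy 2.0 style; the column set is an assumption, not the actual schema:

```python
# Illustrative SQLAlchemy 2.0-style model -- the columns are assumptions.
from datetime import datetime

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Investigation(Base):
    __tablename__ = "investigations"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]
    description: Mapped[str | None]
    archived: Mapped[bool] = mapped_column(default=False)
    created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
```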
Graph DB nodes:
- Person
- Organization
- Email
- Phone
- Address
- Domain
- Account
- Event
- Evidence
Graph DB relationships:
- ASSOCIATED_WITH
- USES_EMAIL
- USES_PHONE
- LOCATED_AT
- OWNS_DOMAIN
- HAS_ACCOUNT
- MENTIONS
- SUPPORTS
Rule:
All relationships must be backed by evidence.
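One way to enforce that rule in a Cypher template: write the edge and its evidence link in the same statement, so no unbacked edge can exist. Labels and properties here are illustrative:

```python
# Illustrative Cypher template: the USES_EMAIL edge carries the id of the
# Evidence node that backs it, and the Evidence node is linked via SUPPORTS.
EVIDENCE_BACKED_EDGE = """
MATCH (ev:Evidence {id: $evidence_id})
MERGE (p:Person {name: $person})
MERGE (m:Email {address: $email})
MERGE (p)-[r:USES_EMAIL]->(m)
SET r.evidence_id = $evidence_id
MERGE (ev)-[:SUPPORTS]->(m)
"""
```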
Research jobs:
- search sources
- fetch documents
- extract entities + relationships
- store evidence
Graph analysis:
- query Neo4j
- run graph algorithms
- explore connections
- explain relationships
These are intentionally separate systems.
User request
→ API creates job
→ Worker processes job
→ Results stored
→ Frontend retrieves results
Benefits:
- no blocking requests
- retry support
- scalable execution
- consistent processing
1. User runs search
2. Worker collects data
3. Extract entities + evidence
4. Store candidates
5. Ingest into Neo4j
6. Run graph queries
7. Summarize results
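Sketched as a single worker job; every imported helper below is hypothetical, standing in for the real connectors/extraction/graph modules:

```python
# Illustrative end-to-end job; all imports below are hypothetical names
# standing in for the real connectors/extraction/graph/services modules.
from app.connectors import collect                       # steps 1-2
from app.extraction import extract                       # step 3
from app.graph import ingest, query_graph                # steps 5-6
from app.services import store_candidates, summarize     # steps 4 and 7


def run_research(investigation_id: int, query: str) -> dict:
    documents = collect(query)                    # search + fetch
    result = extract(documents)                   # entities + evidence
    store_candidates(investigation_id, result)    # app DB
    ingest(result)                                # Neo4j
    findings = query_graph(investigation_id)      # deterministic queries
    return {"summary": summarize(findings)}       # LLM-assisted summary
```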
Start with deterministic queries:
- find node by name
- 1-hop / 2-hop neighbors
- shortest path
- evidence lookup
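As Cypher strings for the graph layer, these starting queries might look like the following; node labels and properties are illustrative:

```python
# Illustrative Cypher for the deterministic starting queries.
FIND_BY_NAME = "MATCH (p:Person {name: $name}) RETURN p"

TWO_HOP_NEIGHBORS = """
MATCH (p:Person {name: $name})-[*1..2]-(n)
RETURN DISTINCT n
"""

SHORTEST_PATH = """
MATCH (a:Person {name: $a}), (b:Person {name: $b}),
      path = shortestPath((a)-[*..6]-(b))
RETURN path
"""
```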
Then add graph algorithms:
- centrality
- similarity
- clustering
- link prediction
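If the Graph Data Science plugin is installed, centrality can start as small as a projection plus streamed PageRank; the graph name and labels below are assumptions:

```python
# Illustrative GDS calls (require the Graph Data Science plugin).
PROJECT = "CALL gds.graph.project('people', 'Person', 'ASSOCIATED_WITH')"

PAGERANK = """
CALL gds.pageRank.stream('people')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10
"""
```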
LLMs are used for:
- extraction (structured output)
- summarization
- explanation
LLMs are NOT used for:
- defining truth
- modifying graph directly
- replacing structured data
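A sketch of extraction as structured output against Ollama's local REST API; the model name and prompt are assumptions:

```python
# Illustrative call to Ollama's local REST API asking for JSON-only output.
# The model name and prompt are assumptions.
import json

import requests


def extract_entities(text: str) -> dict:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1",
            "prompt": f"Extract entities as JSON with keys "
                      f"'people' and 'domains':\n\n{text}",
            "format": "json",   # ask Ollama to constrain output to JSON
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()
    # Ollama returns the model's text under "response"; parse it ourselves.
    return json.loads(response.json()["response"])
```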
Requirements:
- Python 3.12
- Node.js
- Redis
- Ollama
- Neo4j (AuraDB Free recommended)
The backend is the first working slice of the app right now. Start there before trying to run Redis workers, Neo4j ingestion, or the frontend.
- Create or use the project virtual environment

cd backend
python3 -m venv ../venv
source ../venv/bin/activate

If you already created /Users/hadeelmusallam/Mosaic/venv, reuse it:

source /Users/hadeelmusallam/Mosaic/venv/bin/activate
cd /Users/hadeelmusallam/Mosaic/backend

- Install backend dependencies

python3 -m pip install --upgrade pip
python3 -m pip install -e .

- Run the FastAPI server

uvicorn app.main:app --reload

- Verify the backend is running

Open these in the browser or call them with curl:

http://127.0.0.1:8000/health
http://127.0.0.1:8000/docs
The backend currently creates the local SQLite database automatically on startup.
The database file lives at backend/mosaic.db.
Available endpoints:
- GET /health
- GET /investigations
- POST /investigations
- PATCH /investigations/{investigation_id}/archive
Example create request:
curl -X POST http://127.0.0.1:8000/investigations \
-H "Content-Type: application/json" \
-d '{
"title": "Test Investigation",
"description": "First DB-backed investigation"
}'

Once the rest of the local stack is wired up, the intended run commands are:
redis-server
cd backend && uvicorn app.main:app --reload
rq worker
npm run dev
Planned workflow:
- create investigation
- run research job
- extract entities + evidence
- store candidates
- ingest into Neo4j
- query graph
- visualize results
Future additions:
- Graph Data Science algorithms
- GraphRAG integration
- multi-user investigations
- screenshot service
- export/reporting
- confidence scoring
Anti-patterns:
- do not store every search permanently
- do not mix scraping inside API routes
- do not allow unrestricted Cypher from LLMs
- do not over-engineer infrastructure early
- do not rely on LLMs for truth
Design goals:
- modular
- explainable
- traceable
- scalable
- lightweight for local development
Engineering rules:
- connectors, extraction, and graph are separate layers
- all data must flow through normalization
- graph queries live only in graph/
- avoid tight coupling between frontend and sources
- use typed schemas everywhere
- treat evidence as first-class data
- keep research and analysis separate
This platform evolves from a search tool into a structured intelligence system, with:
- reusable data
- explainable relationships
- scalable architecture