logicluna-dev/Mosaic

Mosaic

OSINT Graph Platform

A modular, investigation-focused OSINT platform for collecting, structuring, and analyzing open-source intelligence using graph-based link analysis.

This repository is designed for:

  • local-first development on limited hardware
  • clean separation between research, processing, and graph analysis
  • scalable architecture without requiring a rewrite

Core Concept

This system is not a search tool. It is a data pipeline + investigation workspace.

Separation of concerns:

  • Research → collect and extract data
  • Processing → normalize and structure data
  • Graph → store entities and relationships
  • Analysis → query graph + run algorithms
  • LLM → assist, summarize, explain (not source of truth)

Architecture Overview

Frontend (React)
    ↓
API (FastAPI)
    ↓
Queue (Redis)
    ↓
Worker (RQ)
    ↓
Processing + Extraction
    ↓
Storage:
  - App DB (SQLite/Postgres)
  - Graph DB (Neo4j)
    ↓
Graph Query + Analysis
    ↓
LLM (Ollama / API)

Tech Stack

Backend

  • FastAPI
  • RQ (background jobs)
  • Redis (queue + cache)
  • SQLAlchemy
  • Neo4j Python Driver

Frontend

  • React + TypeScript
  • Vite
  • Zustand (state)
  • React Query
  • React Flow (graph UI)

Storage

  • SQLite (local) → PostgreSQL (later)
  • Neo4j (AuraDB Free or local)
  • Filesystem (optional raw artifacts)

LLM

  • Ollama (local)
  • Optional paid API (for higher-quality reasoning)

Project Structure

osint-platform/
  docs/
  frontend/
  backend/
    app/
      api/
      connectors/
      extraction/
      graph/
      services/
      models/
      schemas/
    worker/

Key System Layers

1. Connectors (connectors/)

Handles external data sources.

Each connector:

  • builds queries
  • fetches data
  • normalizes output
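As a sketch of that contract (the `Connector` protocol and `DummyConnector` below are hypothetical names, not the repository's actual classes):

```python
# Illustrative connector interface: build a query, fetch raw results,
# normalize them into a common document shape.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class NormalizedDocument:
    source: str   # connector name, e.g. "web_search"
    url: str
    title: str
    content: str


class Connector(Protocol):
    """Every connector builds queries, fetches data, and normalizes output."""
    name: str

    def build_query(self, terms: str) -> str: ...
    def fetch(self, query: str) -> list[dict]: ...
    def normalize(self, raw: list[dict]) -> list[NormalizedDocument]: ...


class DummyConnector:
    """Minimal stand-in showing the shape every connector follows."""
    name = "dummy"

    def build_query(self, terms: str) -> str:
        return terms.strip().lower()

    def fetch(self, query: str) -> list[dict]:
        # A real connector would call an external source here.
        return [{"link": "https://example.com", "heading": "Example", "body": query}]

    def normalize(self, raw: list[dict]) -> list[NormalizedDocument]:
        return [
            NormalizedDocument(self.name, r["link"], r["heading"], r["body"])
            for r in raw
        ]
```

Because every connector returns the same `NormalizedDocument` shape, the extraction layer never needs to know which source produced a document.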

2. Extraction (extraction/)

Converts raw data into structured intelligence:

  • entities
  • relationships
  • evidence
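A minimal sketch of those output types (the real schemas live under `backend/app/schemas` and may differ; the regex extractor is purely illustrative):

```python
# Extraction turns raw text into typed entities, each carrying its evidence.
import re
from dataclasses import dataclass


@dataclass
class Evidence:
    source_url: str
    snippet: str


@dataclass
class Entity:
    label: str   # e.g. "Email", "Person"
    value: str
    evidence: Evidence


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def extract_emails(text: str, source_url: str) -> list[Entity]:
    """Produce Email entities from raw text, with a snippet as evidence."""
    return [
        Entity(
            label="Email",
            value=m.group(0),
            evidence=Evidence(source_url, text[max(0, m.start() - 20):m.end() + 20]),
        )
        for m in EMAIL_RE.finditer(text)
    ]
```

Note that the entity is never created without its `Evidence`; this enforces the evidence-backed rule at the type level.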

3. Graph Layer (graph/)

Handles Neo4j:

  • Cypher templates
  • ingestion logic
  • graph queries
  • algorithms

4. Worker Layer (worker/)

Runs background jobs:

  • research
  • extraction
  • graph ingestion
  • enrichment

5. API Layer (api/)

Thin layer only:

  • creates jobs
  • returns results
  • never runs heavy logic
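The thin-route idea can be sketched framework-free (in the real app this would be a FastAPI POST route and `enqueue` would be rq's `queue.enqueue`; the names here are assumptions):

```python
# A "thin" API handler: validate input, hand off to the queue, return at once.
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobTicket:
    job_id: str
    status: str = "queued"


def create_research_job(query: str, enqueue: Callable[[str, dict], None]) -> JobTicket:
    """API-layer logic only: no scraping, no extraction, no graph writes."""
    if not query.strip():
        raise ValueError("query must not be empty")
    job_id = str(uuid.uuid4())
    # All heavy work happens in the worker layer, not here.
    enqueue("research.run_search", {"job_id": job_id, "query": query})
    return JobTicket(job_id=job_id)
```

Injecting `enqueue` keeps the route testable without a live Redis.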

Data Model

App Database

  • Investigation
  • Job
  • SourceDocument
  • Evidence
  • Annotation

Graph Database (Neo4j)

Nodes

  • Person
  • Organization
  • Email
  • Phone
  • Address
  • Domain
  • Account
  • Event
  • Evidence

Relationships

  • ASSOCIATED_WITH
  • USES_EMAIL
  • USES_PHONE
  • LOCATED_AT
  • OWNS_DOMAIN
  • HAS_ACCOUNT
  • MENTIONS
  • SUPPORTS

Rule:
All relationships must be backed by evidence.


Research vs Graph Analysis

Research (Data Collection)

  • search sources
  • fetch documents
  • extract entities + relationships
  • store evidence

Graph Analysis (Link Analysis)

  • query Neo4j
  • run graph algorithms
  • explore connections
  • explain relationships

These are intentionally separate systems.


Job-Based Workflow

User request
→ API creates job
→ Worker processes job
→ Results stored
→ Frontend retrieves results

Benefits:

  • no blocking requests
  • retry support
  • scalable execution
  • consistent processing
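The lifecycle above can be sketched framework-free (in the real stack the queue is Redis and the worker is an rq process, but the flow is identical):

```python
# In-memory stand-ins for the app DB's Job table and the Redis queue.
import uuid
from collections import deque

jobs: dict[str, dict] = {}
queue: deque[str] = deque()


def create_job(kind: str, payload: dict) -> str:
    """API side: record the job, enqueue it, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"kind": kind, "payload": payload,
                    "status": "queued", "result": None}
    queue.append(job_id)
    return job_id


def run_worker_once() -> None:
    """Worker side: pull one job, process it, store the result."""
    job_id = queue.popleft()
    job = jobs[job_id]
    job["result"] = f"processed {job['kind']}"   # real work happens here
    job["status"] = "done"
```

The frontend then polls the job record until `status` flips to `done` and reads `result`.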

Example Workflow

1. User runs search
2. Worker collects data
3. Extract entities + evidence
4. Store candidates
5. Ingest into Neo4j
6. Run graph queries
7. Summarize results

Graph Query Strategy

Start with deterministic queries:

  • find node by name
  • 1-hop / 2-hop neighbors
  • shortest path
  • evidence lookup
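Those deterministic queries map naturally onto a small set of vetted Cypher templates (property names like `name` and `id` are assumptions about the graph schema):

```python
# Vetted Cypher templates; callers choose a template and pass parameters,
# they never pass raw Cypher.
CYPHER = {
    "find_by_name": "MATCH (n {name: $name}) RETURN n",
    "one_hop":      "MATCH (n {name: $name})--(m) RETURN m",
    "two_hop":      "MATCH (n {name: $name})-[*1..2]-(m) RETURN DISTINCT m",
    "shortest_path":
        "MATCH p = shortestPath((a {name: $a})-[*..6]-(b {name: $b})) RETURN p",
    "evidence_lookup":
        "MATCH (n {id: $id})-[r]-() RETURN type(r), r.evidence_id",
}


def query(name: str, **params):
    """Look up a vetted template and pair it with its parameters."""
    return CYPHER[name], params
```

Keeping templates in one module (under `graph/`) also gives the later algorithm work a single place to plug in.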

Then add graph algorithms:

  • centrality
  • similarity
  • clustering
  • link prediction

LLM Usage

LLMs are used for:

  • extraction (structured output)
  • summarization
  • explanation

LLMs are NOT used for:

  • defining truth
  • modifying graph directly
  • replacing structured data
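The "assist, not decide" boundary can be enforced with a validation gate: LLM output is parsed as JSON and filtered against an allowlist before anything touches storage (a sketch; the schema is illustrative):

```python
# Guard between the LLM and the rest of the system: malformed output raises,
# unknown labels and anything resembling raw Cypher are silently dropped.
import json

ALLOWED_LABELS = {"Person", "Organization", "Email", "Phone", "Address",
                  "Domain", "Account", "Event"}


def validate_llm_entities(raw: str) -> list[dict]:
    """Accept only well-formed, allowlisted entities from LLM output."""
    data = json.loads(raw)  # malformed JSON raises here
    accepted = []
    for item in data:
        if item.get("label") in ALLOWED_LABELS and item.get("value"):
            accepted.append({"label": item["label"], "value": item["value"]})
    return accepted
```

Only entities that pass this gate are eligible to become graph candidates; the LLM never writes to Neo4j directly.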

Local Development Setup

Requirements

  • Python 3.12
  • Node.js
  • Redis
  • Ollama
  • Neo4j (AuraDB Free recommended)

Backend Setup

The backend is the first working slice of the app right now. Start there before trying to run Redis workers, Neo4j ingestion, or the frontend.

  1. Create or use the project virtual environment

cd backend
python3 -m venv ../venv
source ../venv/bin/activate

If you already created /Users/hadeelmusallam/Mosaic/venv, reuse it:

source /Users/hadeelmusallam/Mosaic/venv/bin/activate
cd /Users/hadeelmusallam/Mosaic/backend

  2. Install backend dependencies

python3 -m pip install --upgrade pip
python3 -m pip install -e .

  3. Run the FastAPI server

uvicorn app.main:app --reload

  4. Verify the backend is running

Open these in the browser or call them with curl:

  • http://127.0.0.1:8000/health
  • http://127.0.0.1:8000/docs

The backend currently creates the local SQLite database automatically on startup. The database file lives at backend/mosaic.db.

Current Backend Endpoints

  • GET /health
  • GET /investigations
  • POST /investigations
  • PATCH /investigations/{investigation_id}/archive

Example create request:

curl -X POST http://127.0.0.1:8000/investigations \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Test Investigation",
    "description": "First DB-backed investigation"
  }'

Full Stack Run Later

Once the rest of the local stack is wired up, the intended run commands are:

redis-server
cd backend && uvicorn app.main:app --reload
rq worker
npm run dev

Minimal Viable Features

  • create investigation
  • run research job
  • extract entities + evidence
  • store candidates
  • ingest into Neo4j
  • query graph
  • visualize results

Future Enhancements

  • Graph Data Science algorithms
  • GraphRAG integration
  • multi-user investigations
  • screenshot service
  • export/reporting
  • confidence scoring

What Not To Do

  • do not store every search permanently
  • do not mix scraping inside API routes
  • do not allow unrestricted Cypher from LLMs
  • do not over-engineer infrastructure early
  • do not rely on LLMs for truth

Design Philosophy

  • modular
  • explainable
  • traceable
  • scalable
  • lightweight for local development

Notes for Codex / Cursor

  • connectors, extraction, and graph are separate layers
  • all data must flow through normalization
  • graph queries live only in graph/
  • avoid tight coupling between frontend and sources
  • use typed schemas everywhere
  • treat evidence as first-class data
  • keep research and analysis separate

Summary

This platform evolves from a search tool into a structured intelligence system, with:

  • reusable data
  • explainable relationships
  • scalable architecture
