A knowledge graph generator for personal knowledge management systems. Transform a disorganized collection of markdown notes into a structured knowledge base with automatic entity extraction, topic clustering, and wiki-link generation.
second-brain-kg analyzes your markdown notes using AI to extract entities (people, projects, concepts), identify themes, build a knowledge graph, and generate an organized output with:
- Structured frontmatter with extracted metadata
- Automatic wiki-links to entities and related notes
- Topic-based folder organization from community detection
- Maps of Content (MOC) pages for navigation
- Entity pages for frequently mentioned topics
- Graph exports (JSON, GEXF) for visualization
Key Features:
- 🚀 Fast, low-cost processing with Gemini 2.0 Flash (3,200 notes in ~12 min for ≈ $0.47)
- 💾 Smart caching system (resume from any stage)
- 🔍 Fuzzy entity resolution (merge duplicates)
- 📊 Graph algorithms (PageRank, community detection)
- 🎯 Obsidian-compatible output
- 🔒 Non-destructive (never modifies source files)
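To give a feel for the fuzzy entity resolution listed above, here is a minimal sketch that groups near-duplicate names by normalized edit distance. The function names, the greedy grouping strategy, and the 0.85 threshold are assumptions made for illustration; the tool's actual matching method may differ.

```typescript
// Illustrative sketch only: not the tool's actual resolver.

// Classic dynamic-programming edit distance between two strings.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Lowercase and drop non-alphanumerics so "le robot" and "LeRobot" compare equal.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Similarity in [0, 1]; 1 means identical after normalization.
function similarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Greedy grouping: each name joins the first group whose first member is
// similar enough, otherwise it starts a new group. Threshold is hypothetical.
function resolveEntities(names: string[], threshold = 0.85): string[][] {
  const groups: string[][] = [];
  for (const name of names) {
    const group = groups.find((g) => similarity(g[0], name) >= threshold);
    if (group) group.push(name);
    else groups.push([name]);
  }
  return groups;
}

const groups = resolveEntities(["LeRobot", "le robot", "LeRobots", "Arduino"]);
console.log(groups);
// [ [ 'LeRobot', 'le robot', 'LeRobots' ], [ 'Arduino' ] ]
```

A lower threshold merges more aggressively and risks joining unrelated names, so any real threshold would need tuning against actual notes.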
Install globally:
npm install -g second-brain-kg
Or run without installing, using npx:
npx second-brain-kg --input ./notes --output ./reorganized
Or build from source:
git clone <repository-url>
cd second-brain-kg
npm install
npm run build
Get your free API key from: https://makersuite.google.com/app/apikey
Gemini 2.0 Flash offers generous free tier limits:
- Free tier: 1,500 requests/day, 1M tokens/min
- Cost (paid): ~$0.075/1M input tokens, ~$0.30/1M output tokens
Create a .env file in your project directory:
GEMINI_API_KEY=your-api-key-here
Or export it as an environment variable:
export GEMINI_API_KEY=your-api-key-here
Then run the tool:
second-brain-kg --input ~/notes --output ~/reorganized
That's it! The tool will:
- Scan your notes directory
- Extract entities and themes using Gemini
- Build a knowledge graph
- Organize notes into topic-based folders
- Generate wiki-links and MOC pages
- Export the reorganized notes
second-brain-kg -i <input-dir> -o <output-dir>

| Option | Description | Default |
|---|---|---|
| `-i, --input <dir>` | Source notes directory (required) | - |
| `-o, --output <dir>` | Output directory (required) | - |
| `-m, --model <name>` | Gemini model name | `gemini-2.0-flash-exp` |
| `-c, --concurrency <n>` | Parallel API requests | `10` |
| `--cache-dir <dir>` | Cache directory | `<output>/.cache` |
| `--export-graph` | Export graph as JSON and GEXF | `false` |
| `--dry-run` | Preview without writing files | `false` |
| `--force` | Overwrite existing output directory | `false` |
| `-q, --quiet` | Suppress progress output | `false` |
| `-v, --verbose` | Verbose logging | `false` |

Preview before processing:
second-brain-kg -i ~/notes -o ~/reorganized --dry-run
Resume from cache (skip re-extraction):
# First run creates cache
second-brain-kg -i ~/notes -o ~/reorganized
# Modify config, then resume without re-calling Gemini.
# The cache defaults to <output>/.cache, so point the new run at the first run's cache:
second-brain-kg -i ~/notes -o ~/reorganized-v2 --cache-dir ~/reorganized/.cache
Export graph for visualization:
second-brain-kg -i ~/notes -o ~/reorganized --export-graph
# Creates: reorganized/_graph.gexf (import into Gephi)
#          reorganized/_graph.json
Increase concurrency (faster processing):
second-brain-kg -i ~/notes -o ~/reorganized -c 20
Force overwrite existing output:
second-brain-kg -i ~/notes -o ~/reorganized --force
For persistent configuration, create .second-brain-kg.json in your notes directory or any parent directory:
{
  "model": "gemini-2.0-flash-exp",
  "geminiApiKey": "your-api-key-here-or-use-env",
  "concurrency": 10,
  "input": "./notes",
  "output": "./output",
  "cacheDir": "./output/.cache",
  "quiet": false,
  "exportGraph": false,
  "dryRun": false
}
Configuration priority (highest to lowest):
- CLI arguments
- Config file (`.second-brain-kg.json`)
- Environment variables (`GEMINI_API_KEY`)
- Defaults
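The priority order above can be pictured as a layered merge in which each higher-priority source overrides the ones below it. The following is a minimal sketch under that reading; the `Options` type, `resolveOptions`, and `defined` are hypothetical names, and only a subset of the documented settings is modeled.

```typescript
// Illustrative sketch only: field names mirror the JSON config shown above.
type Options = {
  model: string;
  geminiApiKey?: string;
  concurrency: number;
  quiet: boolean;
  exportGraph: boolean;
  dryRun: boolean;
};

const DEFAULTS: Options = {
  model: "gemini-2.0-flash-exp",
  concurrency: 10,
  quiet: false,
  exportGraph: false,
  dryRun: false,
};

// Drop keys whose value is undefined so an unset option in a higher layer
// never masks a value supplied by a lower layer.
function defined(layer: Partial<Options>): Partial<Options> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(layer)) {
    const value = (layer as Record<string, unknown>)[key];
    if (value !== undefined) out[key] = value;
  }
  return out as Partial<Options>;
}

// Later spreads win, so layers are listed from lowest to highest priority:
// defaults, then environment, then config file, then CLI arguments.
function resolveOptions(
  env: Partial<Options>,
  configFile: Partial<Options>,
  cli: Partial<Options>,
): Options {
  return { ...DEFAULTS, ...defined(env), ...defined(configFile), ...defined(cli) };
}

const resolved = resolveOptions(
  { geminiApiKey: "key-from-env" },
  { concurrency: 5, geminiApiKey: "key-from-config" },
  { concurrency: 20, model: undefined },
);
console.log(resolved.concurrency, resolved.geminiApiKey, resolved.model);
// 20 key-from-config gemini-2.0-flash-exp
```

Note that under this order a `geminiApiKey` in the config file takes precedence over the `GEMINI_API_KEY` environment variable.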
Example: use .second-brain-kg.json.example as a template:
cp .second-brain-kg.json.example .second-brain-kg.json
# Edit .second-brain-kg.json with your settings
Output structure:
output/
├── _manifest.json          # Generation metadata
├── _graph.gexf             # Graph export (if --export-graph)
├── _graph.json             # Graph export (if --export-graph)
├── _entities/              # Entity pages (3+ mentions)
│   ├── LeRobot.md
│   ├── Arduino.md
│   └── ...
├── robotics/               # Topic folder (from community)
│   ├── _index.md           # MOC for this topic
│   ├── servo-control.md
│   ├── esp32-setup.md
│   └── ...
├── machine-learning/
│   ├── _index.md
│   ├── transformer-notes.md
│   └── ...
└── journal/                # Journal entries (by date)
    ├── 2025-Q1/
    │   ├── 2025-01-15.md
    │   └── ...
    └── 2025-Q2/
        └── ...
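The journal layout above groups dated entries into per-quarter folders. A minimal sketch of that mapping follows, assuming calendar quarters (January to March is Q1) and ISO-formatted dates; `journalPath` is a hypothetical helper, not part of the tool's documented API.

```typescript
// Illustrative sketch only: maps an ISO date (YYYY-MM-DD) to a path such as
// journal/2025-Q1/2025-01-15.md, assuming calendar quarters.
function journalPath(isoDate: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) throw new Error(`Not an ISO date: ${isoDate}`);
  const year = match[1];
  const month = Number(match[2]);
  if (month < 1 || month > 12) throw new Error(`Invalid month in: ${isoDate}`);
  const quarter = Math.ceil(month / 3); // months 1-3 -> 1, 4-6 -> 2, and so on
  return `journal/${year}-Q${quarter}/${isoDate}.md`;
}

console.log(journalPath("2025-01-15")); // journal/2025-Q1/2025-01-15.md
console.log(journalPath("2025-04-02")); // journal/2025-Q2/2025-04-02.md
```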
Note files: Original content with enhanced frontmatter and wiki-links:
---
title: Servo Control Notes
tags: [robotics, embedded-systems, servo-control]
type: reference
created: 2024-12-01T10:30:00Z
modified: 2024-12-15T14:20:00Z
summary: Notes on PWM-based servo control using ESP32
original_path: random-notes/servo-stuff.md
people: []
projects: [[LeRobot]], [[Arduino]]
---
Working with the [[LeRobot]] arm today. Used [[Arduino]] for initial testing...
Entity pages (_entities/LeRobot.md):
---
title: LeRobot
type: entity
entity_type: project
aliases: [LeRobot, LeRobot arm, le robot]
generated: true
---
# LeRobot
Mentioned in 12 notes.
## Robotics
- [[servo-control]] (2024-12-01) — Notes on PWM-based servo control
- [[esp32-setup]] (2024-11-28) — Setting up ESP32 for robotics
## Machine Learning
- [[training-notes]] (2024-12-10) — Training models for robot control
MOC pages (robotics/_index.md):
---
title: Robotics
type: moc
generated: true
---
# Robotics
## Hardware
- [[servo-control]] — Notes on PWM-based servo control
- [[esp32-setup]] — Setting up ESP32 for robotics
## Projects
- [[lerobot-arm]] — LeRobot arm assembly notes
Typical performance with Gemini 2.0 Flash:
| Notes | Concurrency | Time | Cost (est.) |
|---|---|---|---|
| 100 | 10 | ~30s | $0.01 |
| 500 | 10 | ~2m | $0.07 |
| 1,000 | 10 | ~4m | $0.15 |
| 3,200 | 10 | ~12m | $0.47 |
| 5,000 | 20 | ~15m | $0.73 |
Note: First run extracts all notes. Subsequent runs use cache and only process changed notes.
Using Gemini 2.0 Flash pricing (as of Feb 2025):
- Input: ~$0.075 per 1M tokens
- Output: ~$0.30 per 1M tokens
For 3,200 notes (~4.0M tokens total):
- Input tokens: ~3.2M → $0.24
- Output tokens: ~0.8M → $0.24
- Total: ~$0.47
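The estimate above is a straightforward product of token counts and per-million prices. The sketch below reproduces it using the prices quoted in this section; `estimateCostUsd` is a hypothetical helper. With the rounded token counts shown above the sum comes to $0.48, one cent above the quoted ~$0.47, which is likely a rounding difference in the token figures.

```typescript
// Illustrative sketch only. Prices are USD per one million tokens,
// taken from the figures quoted above.
const INPUT_PRICE_PER_M = 0.075;
const OUTPUT_PRICE_PER_M = 0.3;

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1e6) * INPUT_PRICE_PER_M;
  const outputCost = (outputTokens / 1e6) * OUTPUT_PRICE_PER_M;
  return inputCost + outputCost;
}

// Rounded token counts from the 3,200-note example above.
const cost = estimateCostUsd(3.2e6, 0.8e6);
console.log(cost.toFixed(2)); // 0.48
```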
Tips for reducing cost and run time:
- Use caching: First run caches extractions, subsequent runs are nearly free
- Increase concurrency: If within API rate limits, use `-c 20` or higher
- Process incrementally: Add new notes to an existing cache directory
- Use dry-run: Preview structure before committing to full run
The output is fully compatible with Obsidian:
- Open the output folder in Obsidian:
  - File → Open folder → Select `output/` directory
- Explore the graph:
  - Open Graph View (Ctrl/Cmd + G)
  - See entity connections and topic clusters
- Navigate with MOCs:
  - Open any `_index.md` file
  - Follow wiki-links to related notes
- Use entity pages:
  - Search for an entity (e.g., "LeRobot")
  - See all notes mentioning it
- Tags work automatically:
  - Use the tag pane to filter by theme
  - Tags from frontmatter are indexed
See docs/troubleshooting.md for common issues and solutions.
Quick fixes:
- "GEMINI_API_KEY is required": Set the API key in `.env` or export it as an environment variable
- "Output directory already exists": Use `--force` to overwrite or choose a different output directory
- API rate limits: Reduce `--concurrency` (try `-c 5`)
- Out of memory: Process in batches or increase Node.js heap size
- Build errors: Run `npm run build` and check TypeScript errors
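To illustrate why lowering `--concurrency` helps with rate limits, here is a minimal sketch of a worker-pool mapper that never has more than `limit` requests in flight. `mapWithConcurrency` and `fakeRequest` are hypothetical names for this example; how the tool actually schedules its Gemini calls is not documented here.

```typescript
// Illustrative sketch only: map over items with at most `limit` tasks running at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each worker repeatedly claims the next item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `items`
}

// Stand-in for an API call that records how many calls overlap.
let inFlight = 0;
let peak = 0;
async function fakeRequest(n: number): Promise<number> {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  inFlight--;
  return n * 2;
}

const demo = mapWithConcurrency([1, 2, 3, 4, 5, 6], 2, fakeRequest);
demo.then((doubled) => console.log(doubled, "peak in flight:", peak));
```

With a limit of 2, the recorded peak never exceeds 2, which is the property that keeps a run under a per-minute request quota.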
The pipeline consists of 7 stages:
- Ingest — Parse markdown files, extract frontmatter, build note records
- Extract — Call Gemini to extract entities, themes, note types
- Resolve — Deduplicate entities using fuzzy matching
- Graph — Build knowledge graph, run community detection
- Plan — Map communities to folders, plan MOC pages
- Emit — Write organized markdown with wiki-links
- Manifest — Generate metadata manifest
See docs/architecture.md for detailed design documentation.
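As a rough picture of how a staged pipeline can resume from cache, the sketch below runs stages in order and reuses any stage result already present in a cache. The stage names follow the list above; the `Stage` shape, the in-memory `Map` cache, and the toy stage bodies are assumptions for illustration and do not reflect the tool's internal design.

```typescript
// Illustrative sketch only: not the tool's actual pipeline code.
type StageName = "ingest" | "extract" | "resolve" | "graph" | "plan" | "emit" | "manifest";

interface Stage {
  name: StageName;
  run: (input: unknown) => unknown; // receives the previous stage's output
}

// Run stages in order. A stage whose result is already cached is skipped and
// its cached value is passed on. This simple version does not invalidate
// later entries when an earlier stage is re-run.
function runPipeline(stages: Stage[], cache: Map<StageName, unknown>, log: string[]): unknown {
  let value: unknown = undefined;
  for (const stage of stages) {
    if (cache.has(stage.name)) {
      value = cache.get(stage.name);
      log.push(`${stage.name}: cached`);
    } else {
      value = stage.run(value);
      cache.set(stage.name, value);
      log.push(`${stage.name}: ran`);
    }
  }
  return value;
}

// Toy stages standing in for the first three real ones.
const stages: Stage[] = [
  { name: "ingest", run: () => ["a.md", "b.md"] },
  { name: "extract", run: (notes) => (notes as string[]).map((n) => n.toUpperCase()) },
  { name: "resolve", run: (names) => (names as string[]).length },
];

const cache = new Map<StageName, unknown>();
const firstLog: string[] = [];
runPipeline(stages, cache, firstLog);
const secondLog: string[] = [];
const result = runPipeline(stages, cache, secondLog);

console.log(firstLog.join(", "));  // ingest: ran, extract: ran, resolve: ran
console.log(secondLog.join(", ")); // ingest: cached, extract: cached, resolve: cached
```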
Development setup:
npm install
cp .env.example .env
# Edit .env with your GEMINI_API_KEY
Run the tests:
# Run all tests
npm test
# Run with UI
npm run test:ui
# Run with coverage
npm test -- --coverage
Build:
npm run build
Run locally:
npm run dev -- -i ./test-notes -o ./test-output
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Write tests for new functionality
- Ensure all tests pass (`npm test`)
- Build successfully (`npm run build`)
- Submit a pull request
Code style:
- TypeScript strict mode enabled
- 2-space indentation
- Zod for runtime validation
- Vitest for testing
Testing requirements:
- Unit tests for all utilities and stages
- Integration tests for full pipeline
- Coverage target: >80% (currently 92%)
MIT License - see LICENSE file for details
- Documentation: docs/
- Gemini API: https://makersuite.google.com/app/apikey
- Obsidian: https://obsidian.md
- Issues: GitHub Issues
- Built with Gemini API
- Graph algorithms from graphology
- Markdown parsing with unified
- Inspired by Building a Second Brain