Second Brain Knowledge Graph

A knowledge graph generator for personal knowledge management systems. Transform a disorganized collection of markdown notes into a structured knowledge base with automatic entity extraction, topic clustering, and wiki-link generation.

Overview

second-brain-kg analyzes your markdown notes using AI to extract entities (people, projects, concepts), identify themes, build a knowledge graph, and generate an organized output with:

Structured frontmatter with extracted metadata
Automatic wiki-links to entities and related notes
Topic-based folder organization from community detection
Maps of Content (MOC) pages for navigation
Entity pages for frequently mentioned topics
Graph exports (JSON, GEXF) for visualization

Key Features:

🚀 Fast processing with Gemini 2.0 Flash (3,200 notes ≈ $0.47)
💾 Smart caching system (resume from any stage)
🔍 Fuzzy entity resolution (merge duplicates)
📊 Graph algorithms (PageRank, community detection)
🎯 Obsidian-compatible output
🔒 Non-destructive (never modifies source files)

Installation

Global Installation (Recommended)

npm install -g second-brain-kg

NPX Usage (No Installation)

npx second-brain-kg --input ./notes --output ./reorganized

Local Development

git clone <repository-url>
cd second-brain-kg
npm install
npm run build

Quick Start

1. Get a Gemini API Key

Get your free API key from: https://makersuite.google.com/app/apikey

Gemini 2.0 Flash offers generous free tier limits:

Free tier: 1,500 requests/day, 1M tokens/min
Cost (paid): ~$0.075/1M input tokens, ~$0.30/1M output tokens

2. Set Your API Key

Create a .env file in your project directory:

GEMINI_API_KEY=your-api-key-here

Or export it as an environment variable:

export GEMINI_API_KEY=your-api-key-here
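
As a rough illustration of what a missing key looks like from the program's side, here is a minimal sketch that reads the key from an environment-like map. `resolveApiKey` and `EnvMap` are hypothetical names, not part of the tool's documented API; the error text mirrors the message listed under Troubleshooting.

```typescript
// Hypothetical helper: look up the Gemini API key in an environment-like map.
// Pass `process.env` in a Node.js program; a plain object works in tests.
type EnvMap = Record<string, string | undefined>;

function resolveApiKey(env: EnvMap): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  if (!key) {
    // Same wording as the error listed under Troubleshooting.
    throw new Error("GEMINI_API_KEY is required");
  }
  return key;
}
```

Taking the environment as a parameter keeps the function easy to test without touching real environment variables.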

3. Run the Tool

second-brain-kg --input ~/notes --output ~/reorganized

That's it! The tool will:

1. Scan your notes directory
2. Extract entities and themes using Gemini
3. Build a knowledge graph
4. Organize notes into topic-based folders
5. Generate wiki-links and MOC pages
6. Export the reorganized notes

Usage

Basic Command

second-brain-kg -i <input-dir> -o <output-dir>

CLI Options

Option	Description	Default
`-i, --input <dir>`	Source notes directory (required)	-
`-o, --output <dir>`	Output directory (required)	-
`-m, --model <name>`	Gemini model name	`gemini-2.0-flash-exp`
`-c, --concurrency <n>`	Parallel API requests	`10`
`--cache-dir <dir>`	Cache directory	`<output>/.cache`
`--export-graph`	Export graph as JSON and GEXF	`false`
`--dry-run`	Preview without writing files	`false`
`--force`	Overwrite existing output directory	`false`
`-q, --quiet`	Suppress progress output	`false`
`-v, --verbose`	Verbose logging	`false`

Common Workflows

Preview before processing:

second-brain-kg -i ~/notes -o ~/reorganized --dry-run

Resume from cache (skip re-extraction):

# First run creates cache
second-brain-kg -i ~/notes -o ~/reorganized

# Modify config, then resume without re-calling Gemini by pointing
# --cache-dir at the first run's cache (the default is <output>/.cache)
second-brain-kg -i ~/notes -o ~/reorganized-v2 --cache-dir ~/reorganized/.cache

Export graph for visualization:

second-brain-kg -i ~/notes -o ~/reorganized --export-graph
# Creates: reorganized/_graph.gexf (import into Gephi)
#          reorganized/_graph.json

Increase concurrency (faster processing):

second-brain-kg -i ~/notes -o ~/reorganized -c 20
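
The -c option caps how many API requests are in flight at once. One common way to implement such a cap is a small pool of workers that each pull the next item; the sketch below is illustrative only, and `mapWithConcurrency` is a hypothetical name rather than the tool's actual code.

```typescript
// Run `worker` over `items`, keeping at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Raising the limit shortens wall-clock time only until the API's rate limit becomes the bottleneck, which is why a lower value is suggested when rate-limit errors appear.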

Force overwrite existing output:

second-brain-kg -i ~/notes -o ~/reorganized --force

Configuration File

For persistent configuration, create .second-brain-kg.json in your notes directory or any parent directory:

{
  "model": "gemini-2.0-flash-exp",
  "geminiApiKey": "your-api-key-here-or-use-env",
  "concurrency": 10,
  "input": "./notes",
  "output": "./output",
  "cacheDir": "./output/.cache",
  "quiet": false,
  "exportGraph": false,
  "dryRun": false
}

Configuration Priority (highest to lowest):

1. CLI arguments
2. Config file (.second-brain-kg.json)
3. Environment variables (GEMINI_API_KEY)
4. Defaults
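
The precedence above can be sketched as a layered merge in which a higher-priority layer wins only where it actually defines a value. The `Settings` shape and the `resolveSettings` and `defined` functions are assumptions for illustration; only the option names and default values come from this README.

```typescript
// Illustrative subset of the options listed in this README.
interface Settings {
  model: string;
  concurrency: number;
  geminiApiKey?: string;
}

const DEFAULTS: Settings = { model: "gemini-2.0-flash-exp", concurrency: 10 };

// Copy only the keys that are actually set, so `undefined` never
// overwrites a value from a lower-priority layer.
function defined(layer: Partial<Settings>): Partial<Settings> {
  const out: Partial<Settings> = {};
  if (layer.model !== undefined) out.model = layer.model;
  if (layer.concurrency !== undefined) out.concurrency = layer.concurrency;
  if (layer.geminiApiKey !== undefined) out.geminiApiKey = layer.geminiApiKey;
  return out;
}

// Lowest priority first: defaults, environment, config file, CLI arguments.
function resolveSettings(
  env: Partial<Settings>,
  file: Partial<Settings>,
  cli: Partial<Settings>,
): Settings {
  return { ...DEFAULTS, ...defined(env), ...defined(file), ...defined(cli) };
}
```

For example, a key set in both the environment and the config file resolves to the config file's value, while `-c 20` on the command line overrides a `concurrency` of 5 in the file.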

Example: Copy .second-brain-kg.json.example and use it as a template:

cp .second-brain-kg.json.example .second-brain-kg.json
# Edit .second-brain-kg.json with your settings

Output Structure

output/
├── _manifest.json              # Generation metadata
├── _graph.gexf                 # Graph export (if --export-graph)
├── _graph.json                 # Graph export (if --export-graph)
├── _entities/                  # Entity pages (3+ mentions)
│   ├── LeRobot.md
│   ├── Arduino.md
│   └── ...
├── robotics/                   # Topic folder (from community)
│   ├── _index.md               # MOC for this topic
│   ├── servo-control.md
│   ├── esp32-setup.md
│   └── ...
├── machine-learning/
│   ├── _index.md
│   ├── transformer-notes.md
│   └── ...
└── journal/                    # Journal entries (by date)
    ├── 2025-Q1/
    │   ├── 2025-01-15.md
    │   └── ...
    └── 2025-Q2/
        └── ...
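
The journal/ layout in the tree groups dated entries by calendar quarter. A minimal sketch of that mapping, assuming journal file names start with an ISO YYYY-MM-DD date; `quarterFolder` is a hypothetical helper, not the tool's documented API.

```typescript
// Map "2025-01-15" to the folder name "2025-Q1".
// Returns null when the string does not start with YYYY-MM-DD
// or the month is outside 01-12.
function quarterFolder(isoDate: string): string | null {
  const match = /^(\d{4})-(\d{2})-\d{2}/.exec(isoDate);
  if (!match) return null;
  const month = Number(match[2]);
  if (month < 1 || month > 12) return null;
  const quarter = Math.ceil(month / 3); // Jan-Mar -> 1, ..., Oct-Dec -> 4
  return `${match[1]}-Q${quarter}`;
}
```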

Generated Files

Note files: Original content with enhanced frontmatter and wiki-links:

---
title: Servo Control Notes
tags: [robotics, embedded-systems, servo-control]
type: reference
created: 2024-12-01T10:30:00Z
modified: 2024-12-15T14:20:00Z
summary: Notes on PWM-based servo control using ESP32
original_path: random-notes/servo-stuff.md
people: []
projects: ["[[LeRobot]]", "[[Arduino]]"]
---

Working with the [[LeRobot]] arm today. Used [[Arduino]] for initial testing...
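
One way wiki-links like those in the note body could be produced is by wrapping known entity names in double brackets while skipping text that is already linked. This is a simplified sketch under that assumption; `linkEntities` is a hypothetical function, and a real implementation would also need to handle aliases, code spans, and names that begin or end with punctuation.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap whole-word mentions of each entity in [[...]], leaving existing
// wiki-links untouched.
function linkEntities(body: string, entities: readonly string[]): string {
  if (entities.length === 0) return body;
  // Longest names first so a longer entity name wins over its prefix.
  const names = [...entities].sort((a, b) => b.length - a.length).map(escapeRegExp);
  // Alternative 1 matches an existing [[link]]; alternative 2 captures an entity.
  const pattern = new RegExp(`\\[\\[[^\\]]*\\]\\]|\\b(${names.join("|")})\\b`, "g");
  return body.replace(pattern, (whole: string, entity?: string) =>
    entity === undefined ? whole : `[[${entity}]]`,
  );
}
```

Matching existing links in the same pattern means the function can be run repeatedly on its own output without producing nested brackets.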

Entity pages (_entities/LeRobot.md):

---
title: LeRobot
type: entity
entity_type: project
aliases: [LeRobot, LeRobot arm, le robot]
generated: true
---

# LeRobot

Mentioned in 12 notes.

## Robotics
- [[servo-control]] (2024-12-01) — Notes on PWM-based servo control
- [[esp32-setup]] (2024-11-28) — Setting up ESP32 for robotics

## Machine Learning
- [[training-notes]] (2024-12-10) — Training models for robot control
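
The output tree notes that entity pages are created for entities with 3+ mentions. A minimal sketch of that selection, assuming the threshold counts distinct notes rather than raw occurrences; `entitiesNeedingPages` and the shape of `mentionsByNote` are hypothetical.

```typescript
// Count how many notes mention each entity and keep those at or above
// the threshold. `mentionsByNote` maps a note path to the entities it mentions.
function entitiesNeedingPages(
  mentionsByNote: ReadonlyMap<string, readonly string[]>,
  minNotes = 3,
): string[] {
  const counts = new Map<string, number>();
  for (const entities of mentionsByNote.values()) {
    // De-duplicate within a note so repeated mentions count once.
    for (const entity of new Set(entities)) {
      counts.set(entity, (counts.get(entity) ?? 0) + 1);
    }
  }
  return [...counts]
    .filter(([, count]) => count >= minNotes)
    .map(([entity]) => entity)
    .sort();
}
```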

MOC pages (robotics/_index.md):

---
title: Robotics
type: moc
generated: true
---

# Robotics

## Hardware
- [[servo-control]] — Notes on PWM-based servo control
- [[esp32-setup]] — Setting up ESP32 for robotics

## Projects
- [[lerobot-arm]] — LeRobot arm assembly notes

Performance & Cost

Processing Time

Typical performance with Gemini 2.0 Flash:

Notes	Concurrency	Time	Cost (est.)
100	10	~30s	$0.01
500	10	~2m	$0.07
1,000	10	~4m	$0.15
3,200	10	~12m	$0.47
5,000	20	~15m	$0.73

Note: First run extracts all notes. Subsequent runs use cache and only process changed notes.
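
A common way to detect which notes changed is to key cached extractions by a hash of each note's content. The sketch below assumes that approach and keeps everything in memory; `ExtractionCache` and `fnv1a` are hypothetical names, the FNV-1a hash is used only to stay dependency-free, and a real implementation would more likely persist entries to the cache directory and use a cryptographic hash such as SHA-256.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units, as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

class ExtractionCache<T> {
  private readonly entries = new Map<string, { hash: string; value: T }>();

  // Return the cached value only if the note's content is unchanged.
  get(path: string, content: string): T | undefined {
    const entry = this.entries.get(path);
    return entry !== undefined && entry.hash === fnv1a(content) ? entry.value : undefined;
  }

  // Store the extraction result together with the content hash.
  set(path: string, content: string, value: T): void {
    this.entries.set(path, { hash: fnv1a(content), value });
  }
}
```

A cache miss (a return value of `undefined`) is the signal to call the model again for that note.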

Cost Estimate

Using Gemini 2.0 Flash pricing (as of Feb 2025):

Input: ~$0.075 per 1M tokens
Output: ~$0.30 per 1M tokens

For 3,200 notes (~3.7M tokens total; the rounded figures below sum to slightly more):

Input tokens: ~3.2M → $0.24
Output tokens: ~0.8M → $0.24
Total: ~$0.47
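
The arithmetic above can be reproduced with a small helper. The rates are the approximate figures quoted in this section, and `estimateCostUsd` is a hypothetical function written for this example. With the rounded token counts it yields about $0.48, a cent above the quoted total.

```typescript
// Approximate Gemini 2.0 Flash rates quoted above, in USD per 1M tokens.
const INPUT_RATE_PER_MILLION = 0.075;
const OUTPUT_RATE_PER_MILLION = 0.3;

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_RATE_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_RATE_PER_MILLION
  );
}

// Rounded token counts from the 3,200-note example.
const exampleCost = estimateCostUsd(3_200_000, 800_000);
console.log(exampleCost.toFixed(2));
```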

Optimization Tips

Use caching: First run caches extractions, subsequent runs are nearly free
Increase concurrency: If within API rate limits, use -c 20 or higher
Process incrementally: Add new notes to an existing cache directory
Use dry-run: Preview structure before committing to full run

Obsidian Integration

The output is fully compatible with Obsidian:

1. Open the output folder in Obsidian:
   - File → Open folder → Select the output/ directory
2. Explore the graph:
   - Open Graph View (Ctrl/Cmd + G)
   - See entity connections and topic clusters
3. Navigate with MOCs:
   - Open any _index.md file
   - Follow wiki-links to related notes
4. Use entity pages:
   - Search for an entity (e.g., "LeRobot")
   - See all notes mentioning it
5. Tags work automatically:
   - Use the tag pane to filter by theme
   - Tags from frontmatter are indexed

Troubleshooting

See docs/troubleshooting.md for common issues and solutions.

Quick fixes:

"GEMINI_API_KEY is required": Set API key in .env or export as environment variable
"Output directory already exists": Use --force to overwrite or choose a different output directory
API rate limits: Reduce --concurrency (try -c 5)
Out of memory: Process in batches or increase the Node.js heap size (for example, NODE_OPTIONS=--max-old-space-size=4096)
Build errors: Run npm run build and check TypeScript errors

Architecture

The pipeline consists of 7 stages:

1. Ingest: Parse markdown files, extract frontmatter, build note records
2. Extract: Call Gemini to extract entities, themes, note types
3. Resolve: Deduplicate entities using fuzzy matching
4. Graph: Build knowledge graph, run community detection
5. Plan: Map communities to folders, plan MOC pages
6. Emit: Write organized markdown with wiki-links
7. Manifest: Generate metadata manifest
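
The Resolve stage merges near-duplicate entity names. This README does not say which similarity measure is used, so the sketch below assumes a normalized Levenshtein distance with a hypothetical 0.85 threshold and greedy clustering; `levenshtein`, `similarity`, and `resolveEntities` are illustrative names only.

```typescript
// Classic dynamic-programming Levenshtein edit distance (two-row variant).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        (prev[j] as number) + 1, // deletion
        (curr[j - 1] as number) + 1, // insertion
        (prev[j - 1] as number) + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length] as number;
}

// Lowercase and drop everything except ASCII letters and digits (for brevity).
function normalize(entityName: string): string {
  return entityName.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Similarity in [0, 1]: 1 means identical after normalization.
function similarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Greedy clustering: each name joins the first cluster whose first-seen
// name is similar enough, otherwise it starts a new cluster.
function resolveEntities(names: readonly string[], threshold = 0.85): string[][] {
  const clusters: string[][] = [];
  for (const candidate of names) {
    const home = clusters.find((c) => similarity(c[0] as string, candidate) >= threshold);
    if (home) home.push(candidate);
    else clusters.push([candidate]);
  }
  return clusters;
}
```

With these assumptions "LeRobot" and "le robot" fall into one cluster, while a longer alias such as "LeRobot arm" would need a lower threshold or an explicit alias list to be merged.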

See docs/architecture.md for detailed design documentation.
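
PageRank, listed under Key Features, scores nodes by how much link weight flows into them. The Acknowledgments credit graphology for the graph algorithms; the standalone power-iteration sketch below does not use or reproduce that library's API and is meant only to illustrate the idea. The function name, damping factor of 0.85, and fixed iteration count are assumptions.

```typescript
type Edge = readonly [string, string]; // [source, target]

// PageRank by power iteration on a directed graph.
// Dangling nodes (no outgoing edges) spread their rank evenly over all nodes.
function pageRank(
  nodes: readonly string[],
  edges: readonly Edge[],
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const n = nodes.length;
  const outDegree = new Map<string, number>(nodes.map((id): [string, number] => [id, 0]));
  for (const [source] of edges) {
    outDegree.set(source, (outDegree.get(source) ?? 0) + 1);
  }

  let rank = new Map<string, number>(nodes.map((id): [string, number] => [id, 1 / n]));
  for (let step = 0; step < iterations; step++) {
    // Rank held by dangling nodes is redistributed uniformly.
    let dangling = 0;
    for (const id of nodes) {
      if (outDegree.get(id) === 0) dangling += rank.get(id) ?? 0;
    }
    const base = (1 - damping) / n + (damping * dangling) / n;
    const next = new Map<string, number>(nodes.map((id): [string, number] => [id, base]));
    // Each node passes an equal share of its rank along every outgoing edge.
    for (const [source, target] of edges) {
      const share = (rank.get(source) ?? 0) / (outDegree.get(source) ?? 1);
      next.set(target, (next.get(target) ?? 0) + damping * share);
    }
    rank = next;
  }
  return rank;
}
```

In a notes graph where edges point from a note to the entities it mentions, a heavily referenced entity ends up with a higher score than the notes that mention it.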

Development

Setup

npm install
cp .env.example .env
# Edit .env with your GEMINI_API_KEY

Running Tests

# Run all tests
npm test

# Run with UI
npm run test:ui

# Run with coverage
npm test -- --coverage

Building

npm run build

Running Locally

npm run dev -- -i ./test-notes -o ./test-output

Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/my-feature)
3. Write tests for new functionality
4. Ensure all tests pass (npm test)
5. Build successfully (npm run build)
6. Submit a pull request

Code Style

TypeScript strict mode enabled
2-space indentation
Zod for runtime validation
Vitest for testing

Testing Guidelines

Unit tests for all utilities and stages
Integration tests for full pipeline
Coverage target: >80% (currently 92%)

License

MIT License - see LICENSE file for details

Links

Documentation: docs/
Gemini API: https://makersuite.google.com/app/apikey
Obsidian: https://obsidian.md
Issues: GitHub Issues

Acknowledgments

Built with Gemini API
Graph algorithms from graphology
Markdown parsing with unified
Inspired by Building a Second Brain
