ozten/second_brain_reorg

Second Brain Knowledge Graph

A knowledge graph generator for personal knowledge management systems. Transform a disorganized collection of markdown notes into a structured knowledge base with automatic entity extraction, topic clustering, and wiki-link generation.

Overview

second-brain-kg analyzes your markdown notes using AI to extract entities (people, projects, concepts), identify themes, build a knowledge graph, and generate an organized output with:

  • Structured frontmatter with extracted metadata
  • Automatic wiki-links to entities and related notes
  • Topic-based folder organization from community detection
  • Maps of Content (MOC) pages for navigation
  • Entity pages for frequently mentioned topics
  • Graph exports (JSON, GEXF) for visualization

Key Features:

  • 🚀 Fast processing with Gemini 2.0 Flash (3,200 notes ≈ $0.47)
  • 💾 Smart caching system (resume from any stage)
  • 🔍 Fuzzy entity resolution (merge duplicates)
  • 📊 Graph algorithms (PageRank, community detection)
  • 🎯 Obsidian-compatible output
  • 🔒 Non-destructive (never modifies source files)

Installation

Global Installation (Recommended)

npm install -g second-brain-kg

NPX Usage (No Installation)

npx second-brain-kg --input ./notes --output ./reorganized

Local Development

git clone <repository-url>
cd second-brain-kg
npm install
npm run build

Quick Start

1. Get a Gemini API Key

Get your free API key from: https://makersuite.google.com/app/apikey

Gemini 2.0 Flash has a generous free tier, and paid usage is inexpensive:

  • Free tier: 1,500 requests/day, 1M tokens/min
  • Cost (paid): ~$0.075/1M input tokens, ~$0.30/1M output tokens

2. Set Your API Key

Create a .env file in your project directory:

GEMINI_API_KEY=your-api-key-here

Or export it as an environment variable:

export GEMINI_API_KEY=your-api-key-here

3. Run the Tool

second-brain-kg --input ~/notes --output ~/reorganized

That's it! The tool will:

  1. Scan your notes directory
  2. Extract entities and themes using Gemini
  3. Build a knowledge graph
  4. Organize notes into topic-based folders
  5. Generate wiki-links and MOC pages
  6. Export the reorganized notes

Usage

Basic Command

second-brain-kg -i <input-dir> -o <output-dir>

CLI Options

| Option | Description | Default |
| --- | --- | --- |
| -i, --input <dir> | Source notes directory (required) | - |
| -o, --output <dir> | Output directory (required) | - |
| -m, --model <name> | Gemini model name | gemini-2.0-flash-exp |
| -c, --concurrency <n> | Parallel API requests | 10 |
| --cache-dir <dir> | Cache directory | <output>/.cache |
| --export-graph | Export graph as JSON and GEXF | false |
| --dry-run | Preview without writing files | false |
| --force | Overwrite existing output directory | false |
| -q, --quiet | Suppress progress output | false |
| -v, --verbose | Verbose logging | false |

Common Workflows

Preview before processing:

second-brain-kg -i ~/notes -o ~/reorganized --dry-run

Resume from cache (skip re-extraction):

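# First run creates the cache in ~/reorganized/.cache
second-brain-kg -i ~/notes -o ~/reorganized

# Modify config, then rerun without re-calling Gemini.
# The cache defaults to <output>/.cache, so point the new run at the first run's cache.
second-brain-kg -i ~/notes -o ~/reorganized-v2 --cache-dir ~/reorganized/.cache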

Export graph for visualization:

second-brain-kg -i ~/notes -o ~/reorganized --export-graph
# Creates: reorganized/_graph.gexf (import into Gephi)
#          reorganized/_graph.json

Increase concurrency (faster processing):

second-brain-kg -i ~/notes -o ~/reorganized -c 20
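
The -c option caps how many API requests run in parallel. As a rough illustration of what such a cap means, here is a minimal sketch of a bounded worker pool. The name mapWithConcurrency is hypothetical and this is not taken from the tool's source; it only shows the general technique.

```typescript
// Run `task` over `items` with at most `limit` promises in flight at once.
// Results keep the order of `items`, regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting
      results[index] = await task(items[index]);
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

Raising the limit shortens wall-clock time only until the API's rate limit becomes the bottleneck, which is why lowering it is the suggested fix for rate-limit errors.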

Force overwrite existing output:

second-brain-kg -i ~/notes -o ~/reorganized --force

Configuration File

For persistent configuration, create .second-brain-kg.json in your notes directory or any parent directory:

{
  "model": "gemini-2.0-flash-exp",
  "geminiApiKey": "your-api-key-here-or-use-env",
  "concurrency": 10,
  "input": "./notes",
  "output": "./output",
  "cacheDir": "./output/.cache",
  "quiet": false,
  "exportGraph": false,
  "dryRun": false
}

Configuration Priority (highest to lowest):

  1. CLI arguments
  2. Config file (.second-brain-kg.json)
  3. Environment variables (GEMINI_API_KEY)
  4. Defaults
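
The priority order above can be sketched as a chain of fallbacks. This is an illustration under assumptions, not the tool's implementation: the Options shape and resolveOptions are hypothetical names, and only three settings are shown.

```typescript
// Hypothetical option shape; field names mirror the config file keys shown above.
interface Options {
  model: string;
  concurrency: number;
  geminiApiKey: string | undefined;
}

type Overrides = Partial<Options>;

const DEFAULTS: Options = {
  model: "gemini-2.0-flash-exp",
  concurrency: 10,
  geminiApiKey: undefined,
};

// Each setting takes the first defined value, checked from highest to lowest priority:
// CLI arguments, then the config file, then the environment, then defaults.
function resolveOptions(
  cli: Overrides,
  file: Overrides,
  env: Record<string, string | undefined>,
): Options {
  return {
    model: cli.model ?? file.model ?? DEFAULTS.model,
    concurrency: cli.concurrency ?? file.concurrency ?? DEFAULTS.concurrency,
    // No CLI flag for the key is documented, so only the file and environment are consulted.
    geminiApiKey: file.geminiApiKey ?? env["GEMINI_API_KEY"] ?? DEFAULTS.geminiApiKey,
  };
}

const resolved = resolveOptions(
  { concurrency: 20 },
  { concurrency: 5, geminiApiKey: "from-file" },
  { GEMINI_API_KEY: "from-env" },
);
console.log(resolved);
```

In this example the CLI value for concurrency wins over the file, and the key in the config file wins over GEMINI_API_KEY, matching the order listed above.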

Example: Copy .second-brain-kg.json.example to use as a template:

cp .second-brain-kg.json.example .second-brain-kg.json
# Edit .second-brain-kg.json with your settings

Output Structure

output/
├── _manifest.json              # Generation metadata
├── _graph.gexf                 # Graph export (if --export-graph)
├── _graph.json                 # Graph export (if --export-graph)
├── _entities/                  # Entity pages (3+ mentions)
│   ├── LeRobot.md
│   ├── Arduino.md
│   └── ...
├── robotics/                   # Topic folder (from community)
│   ├── _index.md               # MOC for this topic
│   ├── servo-control.md
│   ├── esp32-setup.md
│   └── ...
├── machine-learning/
│   ├── _index.md
│   ├── transformer-notes.md
│   └── ...
└── journal/                    # Journal entries (by date)
    ├── 2025-Q1/
    │   ├── 2025-01-15.md
    │   └── ...
    └── 2025-Q2/
        └── ...
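
The journal/ layout above groups entries by calendar quarter. A minimal sketch of that mapping, assuming journal notes are identified by an ISO date (quarterFolder is a hypothetical helper, not part of the tool's API):

```typescript
// Map an ISO date (YYYY-MM-DD) to a quarter folder name such as "2025-Q1".
function quarterFolder(isoDate: string): string {
  const match = /^(\d{4})-(\d{2})-\d{2}$/.exec(isoDate);
  if (!match) {
    throw new Error(`Not an ISO date: ${isoDate}`);
  }
  const month = Number(match[2]);
  if (month < 1 || month > 12) {
    throw new Error(`Month out of range: ${isoDate}`);
  }
  // Months 1-3 fall in Q1, 4-6 in Q2, and so on.
  const quarter = Math.ceil(month / 3);
  return `${match[1]}-Q${quarter}`;
}

console.log(quarterFolder("2025-01-15")); // "2025-Q1"
```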

Generated Files

Note files: Original content with enhanced frontmatter and wiki-links:

---
title: Servo Control Notes
tags: [robotics, embedded-systems, servo-control]
type: reference
created: 2024-12-01T10:30:00Z
modified: 2024-12-15T14:20:00Z
summary: Notes on PWM-based servo control using ESP32
original_path: random-notes/servo-stuff.md
people: []
projects: ["[[LeRobot]]", "[[Arduino]]"]
---

Working with the [[LeRobot]] arm today. Used [[Arduino]] for initial testing...
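
One plausible way to produce the wiki-links in the note body above is a single regular-expression pass over the text. The sketch below is an assumption about the technique, not the tool's code: linkEntities and escapeRegExp are hypothetical names, and matching is case-sensitive and whole-word only.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap each whole-word mention of a known entity in [[...]],
// leaving text that is already inside a wiki-link untouched.
function linkEntities(body: string, entities: string[]): string {
  if (entities.length === 0) return body;
  // Longest names first so a longer entity name is preferred over a shorter prefix.
  const names = entities
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  // First alternative consumes existing links; second captures a bare entity mention.
  const pattern = new RegExp(`\\[\\[[^\\]]*\\]\\]|\\b(${names.join("|")})\\b`, "g");
  return body.replace(pattern, (match: string, name: string | undefined) =>
    name ? `[[${name}]]` : match,
  );
}

console.log(linkEntities("Working with the LeRobot arm today.", ["LeRobot", "Arduino"]));
// "Working with the [[LeRobot]] arm today."
```

Matching existing links first keeps the pass idempotent, so running it twice does not produce nested brackets.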

Entity pages (_entities/LeRobot.md):

---
title: LeRobot
type: entity
entity_type: project
aliases: [LeRobot, LeRobot arm, le robot]
generated: true
---

# LeRobot

Mentioned in 12 notes.

## Robotics
- [[servo-control]] (2024-12-01) — Notes on PWM-based servo control
- [[esp32-setup]] (2024-11-28) — Setting up ESP32 for robotics

## Machine Learning
- [[training-notes]] (2024-12-10) — Training models for robot control

MOC pages (robotics/_index.md):

---
title: Robotics
type: moc
generated: true
---

# Robotics

## Hardware
- [[servo-control]] — Notes on PWM-based servo control
- [[esp32-setup]] — Setting up ESP32 for robotics

## Projects
- [[lerobot-arm]] — LeRobot arm assembly notes

Performance & Cost

Processing Time

Typical performance with Gemini 2.0 Flash:

| Notes | Concurrency | Time | Cost (est.) |
| --- | --- | --- | --- |
| 100 | 10 | ~30s | $0.01 |
| 500 | 10 | ~2m | $0.07 |
| 1,000 | 10 | ~4m | $0.15 |
| 3,200 | 10 | ~12m | $0.47 |
| 5,000 | 20 | ~15m | $0.73 |

Note: The first run extracts every note. Subsequent runs reuse the cache and process only notes that have changed.
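
How the tool decides that a note has changed is not documented here. A common approach, shown below purely as an assumption, is to store a hash of each note's content and re-extract only when the hash differs. The Note and ExtractionCache types and both function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface Note {
  path: string;
  content: string;
}

// Hypothetical cache shape: note path -> hash of the content that was last extracted.
type ExtractionCache = Map<string, string>;

// SHA-256 of the note content, hex-encoded.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Keep only notes that are new or whose content no longer matches the cached hash.
function notesNeedingExtraction(notes: Note[], cache: ExtractionCache): Note[] {
  return notes.filter((note) => cache.get(note.path) !== contentHash(note.content));
}

const cache: ExtractionCache = new Map([["a.md", contentHash("alpha")]]);
const pending = notesNeedingExtraction(
  [
    { path: "a.md", content: "alpha" }, // unchanged, skipped
    { path: "b.md", content: "beta" }, // new, needs extraction
  ],
  cache,
);
console.log(pending.map((note) => note.path));
```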

Cost Estimate

Using Gemini 2.0 Flash pricing (as of Feb 2025):

  • Input: ~$0.075 per 1M tokens
  • Output: ~$0.30 per 1M tokens

For 3,200 notes (~4.0M tokens total):

  • Input tokens: ~3.2M → $0.24
  • Output tokens: ~0.8M → $0.24
  • Total: ~$0.47 (the rounded figures above sum to $0.48)
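
The estimate is a straightforward rate calculation. The sketch below reproduces it with the prices and approximate token counts quoted above; estimateCostUsd is a hypothetical helper. With these rounded inputs it evaluates to about $0.48, within a cent of the quoted total, and the difference is presumably rounding in the token counts.

```typescript
// Prices in USD per 1M tokens, as quoted above.
const INPUT_PRICE_PER_M = 0.075;
const OUTPUT_PRICE_PER_M = 0.3;

// Cost = input tokens at the input rate plus output tokens at the output rate.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1e6) * INPUT_PRICE_PER_M +
    (outputTokens / 1e6) * OUTPUT_PRICE_PER_M
  );
}

const cost = estimateCostUsd(3.2e6, 0.8e6);
console.log(cost.toFixed(2)); // "0.48"
```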

Optimization Tips

  1. Use caching: The first run caches extractions, so subsequent runs are nearly free
  2. Increase concurrency: If within API rate limits, use -c 20 or higher
  3. Process incrementally: Add new notes to an existing cache directory
  4. Use dry-run: Preview structure before committing to full run

Obsidian Integration

The output is fully compatible with Obsidian:

  1. Open the output folder in Obsidian:
    • File → Open folder → Select output/ directory
  2. Explore the graph:
    • Open Graph View (Ctrl/Cmd + G)
    • See entity connections, topic clusters
  3. Navigate with MOCs:
    • Open any _index.md file
    • Follow wiki-links to related notes
  4. Use entity pages:
    • Search for an entity (e.g., "LeRobot")
    • See all notes mentioning it
  5. Tags work automatically:
    • Use tag pane to filter by theme
    • Tags from frontmatter are indexed

Troubleshooting

See docs/troubleshooting.md for common issues and solutions.

Quick fixes:

  • "GEMINI_API_KEY is required": Set API key in .env or export as environment variable
  • "Output directory already exists": Use --force to overwrite or choose a different output directory
  • API rate limits: Reduce --concurrency (try -c 5)
  • Out of memory: Process in batches or increase the Node.js heap size (for example, NODE_OPTIONS=--max-old-space-size=8192)
  • Build errors: Run npm run build and check TypeScript errors

Architecture

The pipeline consists of 7 stages:

  1. Ingest — Parse markdown files, extract frontmatter, build note records
  2. Extract — Call Gemini to extract entities, themes, note types
  3. Resolve — Deduplicate entities using fuzzy matching
  4. Graph — Build knowledge graph, run community detection
  5. Plan — Map communities to folders, plan MOC pages
  6. Emit — Write organized markdown with wiki-links
  7. Manifest — Generate metadata manifest
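
To make the Resolve stage more concrete, here is a minimal sketch of fuzzy deduplication using normalized edit distance. It is an illustration only: the actual matching algorithm and threshold used by the tool are not stated here, and normalize, levenshtein, similarity, and groupEntities are hypothetical names.

```typescript
// Lowercase and drop punctuation and whitespace so "le robot" and "LeRobot" compare equal.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Dynamic-programming edit distance, keeping only the previous row.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical after normalization.
function similarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Greedy grouping: each name joins the first group whose canonical (first) member is close enough.
function groupEntities(names: string[], threshold: number): string[][] {
  const groups: string[][] = [];
  for (const name of names) {
    const match = groups.find((group) => similarity(group[0], name) >= threshold);
    if (match) match.push(name);
    else groups.push([name]);
  }
  return groups;
}

console.log(groupEntities(["LeRobot", "le robot", "LeRobots", "Arduino"], 0.8));
// [["LeRobot", "le robot", "LeRobots"], ["Arduino"]]
```

The first member of each group can serve as the canonical entity name, with the rest recorded as aliases. This normalization only handles ASCII names; a real implementation would need Unicode-aware handling.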

See docs/architecture.md for detailed design documentation.

Development

Setup

npm install
cp .env.example .env
# Edit .env with your GEMINI_API_KEY

Running Tests

# Run all tests
npm test

# Run with UI
npm run test:ui

# Run with coverage
npm test -- --coverage

Building

npm run build

Running Locally

npm run dev -- -i ./test-notes -o ./test-output

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Write tests for new functionality
  4. Ensure all tests pass (npm test)
  5. Build successfully (npm run build)
  6. Submit a pull request

Code Style

  • TypeScript strict mode enabled
  • 2-space indentation
  • Zod for runtime validation
  • Vitest for testing

Testing Guidelines

  • Unit tests for all utilities and stages
  • Integration tests for full pipeline
  • Coverage target: >80% (currently 92%)

License

MIT License - see LICENSE file for details
