# 🧠 Second Brain Database - Repository Cleanup & Refactoring Analysis

This notebook analyzes your production codebase from its file index and generates a migration script that reorganizes the repository by moving files only, so nothing is deleted and institutional knowledge is preserved.

## 📋 Executive Summary

Your Second Brain Database repository shows:
- **492+ tracked files** across 7 major categories
- **Rich integration ecosystem** (MCP, LangGraph, N8N, Voice, Auth)
- **Comprehensive test suite** (100+ test files)
- **Extensive documentation** (80+ markdown files)
- **Production-ready infrastructure** with Docker, scripts, and monitoring

## 🎯 Goals
1. **Preserve everything** - no data loss, only reorganization
2. **Improve maintainability** - clear folder structure and documentation
3. **Enhance developer experience** - easy navigation and contribution
4. **Prepare for scale** - production-ready organization

---

## 1️⃣ Repository Structure Analysis

Let's start by analyzing your current repository structure and understanding the complexity we're dealing with.
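
The cells in this section read a `file_index.txt` at the repository root, one path per line with an optional `./` prefix. The notebook does not show how that file was produced, so the following is only a minimal sketch of one way to build it, assuming the index should list every regular file outside `.git`. The names `build_file_index` and `write_file_index` and the skip list are illustrative assumptions, not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List


def build_file_index(root: Path, skip_dirs: Iterable[str] = (".git",)) -> List[str]:
    """Return sorted './'-prefixed POSIX paths of regular files under root."""
    skip = set(skip_dirs)
    entries = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip anything that lives inside an excluded directory such as .git
        if any(part in skip for part in rel.parts):
            continue
        if path.is_file():
            entries.append("./" + rel.as_posix())
    return sorted(entries)


def write_file_index(root: Path, index_name: str = "file_index.txt") -> int:
    """Write the index into root (excluding the index itself) and return the entry count."""
    entries = [e for e in build_file_index(root) if e != "./" + index_name]
    (root / index_name).write_text("\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```

If the index is meant to cover only Git-tracked files, the output of `git ls-files` would be a closer match than a directory walk.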

In [None]:
import os
import json
import re
from pathlib import Path
from collections import defaultdict, Counter
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

# Repository root path
REPO_ROOT = Path('/Users/rohan/Documents/repos/second_brain_database')

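# Load the file index (one path per line, optionally prefixed with './')
with open(REPO_ROOT / 'file_index.txt', 'r', encoding='utf-8') as f:
    # removeprefix (Python 3.9+) drops only a literal './'; lstrip('./') would also
    # strip the leading dot of hidden paths such as '.kiro/' or '.gitignore'
    file_lines = [line.strip().removeprefix('./') for line in f if line.strip()]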

print("📊 Repository Analysis")
print(f"Total files tracked: {len(file_lines)}")
print(f"Repository root: {REPO_ROOT}")
print(f"\nFirst 10 files:")
for i, file in enumerate(file_lines[:10]):
    print(f"  {i+1:2d}. {file}")

# Basic file extension analysis; Path.suffix looks only at the file name, so dots in
# directory names such as '.kiro/' are not mistaken for extensions
extensions = Counter()
for file in file_lines:
    ext = Path(file).suffix
    extensions[ext or 'no_extension'] += 1

print("\n📈 File types distribution:")
for ext, count in extensions.most_common(10):
    print(f"  {ext:<15} {count:>3d} files")

In [None]:
# Analyze directory structure depth and organization
def analyze_directory_structure(files: List[str]) -> Tuple[Dict[str, List[str]], Dict[int, int]]:
    """Group files by top-level directory and count files at each directory depth"""
    structure = defaultdict(list)
    depth_analysis = defaultdict(int)
    
    for file in files:
        parts = file.split('/')
        depth = len(parts) - 1
        depth_analysis[depth] += 1
        
        if depth > 0:
            top_dir = parts[0]
            structure[top_dir].append(file)
    
    return dict(structure), dict(depth_analysis)

directories, depth_dist = analyze_directory_structure(file_lines)

# Create visualization of directory structure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Directory distribution
dir_counts = {k: len(v) for k, v in directories.items()}
top_dirs = dict(sorted(dir_counts.items(), key=lambda x: x[1], reverse=True)[:10])

ax1.bar(top_dirs.keys(), top_dirs.values(), alpha=0.7)
ax1.set_title('Files by Top-Level Directory')
ax1.set_ylabel('Number of Files')
ax1.tick_params(axis='x', rotation=45)

# Depth distribution
ax2.bar(depth_dist.keys(), depth_dist.values(), alpha=0.7, color='orange')
ax2.set_title('Files by Directory Depth')
ax2.set_xlabel('Directory Depth')
ax2.set_ylabel('Number of Files')

plt.tight_layout()
plt.show()

print("📁 Top-level directories analysis:")
for dir_name, file_count in sorted(dir_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"  {dir_name:<25} {file_count:>3d} files")

print("\n📏 Directory depth distribution:")
for depth, count in sorted(depth_dist.items()):
    indent = "  " + "  " * depth
    print(f"{indent}Depth {depth}: {count} files")

## 2️⃣ Code Organization Assessment

Now let's categorize files by their purpose and identify organizational patterns.
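
Pattern-based categorization of this kind tries regular expressions in order and stops at the first hit, so both rule order and anchoring affect the result. The sketch below uses a shortened, hypothetical rule table (`RULES`) and a hypothetical helper (`categorize`) to illustrate two effects: `re.match` only matches at the start of the path, while `re.search` matches anywhere, and a general rule listed first shadows a more specific rule listed later.

```python
import re
from typing import List, Tuple

# Hypothetical, shortened rule table for illustration; order matters
RULES: List[Tuple[str, List[str]]] = [
    ("Tests", [r"test_.*\.py$"]),
    ("Docs", [r".*\.md$"]),
    ("Specs", [r".*spec.*\.md$"]),
]


def categorize(path: str, use_search: bool = False) -> str:
    """Return the first category whose pattern matches, else 'Other'."""
    matcher = re.search if use_search else re.match
    for category, patterns in RULES:
        if any(matcher(p, path, re.IGNORECASE) for p in patterns):
            return category
    return "Other"


# re.match anchors at the start of the path, so a nested test file is missed
print(categorize("tests/unit/test_auth.py"))                   # Other
print(categorize("tests/unit/test_auth.py", use_search=True))  # Tests
# 'Docs' is listed before 'Specs', so the more specific rule is never reached
print(categorize("docs/api_spec.md"))                          # Docs
```

When a specific category should win, listing it before the general one is usually the simplest fix.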

In [None]:
# File categorization rules based on patterns and locations
categorization_rules = {
    'Core Application Code': {
        'patterns': [r'^src/second_brain_database/.*\.py$'],
        'description': 'Production FastAPI application code'
    },
    'Test Files': {
        'patterns': [r'^tests/.*\.py$', r'test_.*\.py$', r'.*test.*\.py$'],
        'description': 'Unit and integration tests'
    },
    'Documentation': {
        'patterns': [r'.*\.md$', r'README.*', r'.*\.rst$'],
        'description': 'Markdown and documentation files'
    },
    'Scripts & Tools': {
        'patterns': [r'^scripts/.*\.py$', r'.*_script\.py$'],
        'description': 'Development and automation scripts'
    },
    'Configuration': {
        'patterns': [r'.*\.yml$', r'.*\.yaml$', r'.*\.toml$', r'.*\.json$', r'.*\.ini$', r'.*\.env$'],
        'description': 'Configuration files'
    },
    'Infrastructure': {
        'patterns': [r'Dockerfile.*', r'docker-compose.*', r'.*\.dockerfile$'],
        'description': 'Docker and deployment files'
    },
    'Maintenance Scripts': {
        'patterns': [r'^(fix_|verify_|clear_|install_|update_|check_).*\.py$'],
        'description': 'One-off maintenance and verification scripts'
    },
    'Specifications': {
        'patterns': [r'^\.kiro/.*', r'.*spec.*\.md$', r'.*requirements.*\.md$'],
        'description': 'Product specs and requirements'
    },
    'Workflows': {
        'patterns': [r'^n8n_workflows/.*', r'.*workflow.*\.md$'],
        'description': 'N8N and automation workflows'
    },
    'Planning Documents': {
        'patterns': [r'^TODOS/.*', r'.*_plan.*\.md$'],
        'description': 'Project planning and TODO documents'
    }
}

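def categorize_file(filename: str) -> str:
    """Return the first category with a matching pattern (rule order matters)"""
    for category, rules in categorization_rules.items():
        for pattern in rules['patterns']:
            # re.search lets unanchored patterns such as 'test_.*\.py$' match files
            # inside subdirectories; patterns starting with '^' still anchor at the root
            if re.search(pattern, filename, re.IGNORECASE):
                return category
    return 'Other'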

# Categorize all files
file_categories = defaultdict(list)
for file in file_lines:
    category = categorize_file(file)
    file_categories[category].append(file)

# Create summary
category_summary = {cat: len(files) for cat, files in file_categories.items()}

print("📂 File Categorization Summary:")
print("=" * 50)
total_categorized = 0
for category, count in sorted(category_summary.items(), key=lambda x: x[1], reverse=True):
    description = categorization_rules.get(category, {}).get('description', 'Miscellaneous files')
    print(f"{category:<25} {count:>3d} files - {description}")
    total_categorized += count

print(f"\nTotal files categorized: {total_categorized}/{len(file_lines)}")

# Visualize categories
plt.figure(figsize=(12, 8))
categories = list(category_summary.keys())
counts = list(category_summary.values())

plt.pie(counts, labels=categories, autopct='%1.1f%%', startangle=90)
plt.title('File Distribution by Category')
plt.axis('equal')
plt.show()

In [None]:
# Detailed analysis of specific problem areas
print("\n🔍 Detailed Category Analysis:")
print("=" * 50)

# Analyze maintenance scripts (potential cleanup candidates)
maintenance_files = file_categories.get('Maintenance Scripts', [])
print(f"\n🔧 Maintenance Scripts ({len(maintenance_files)} files):")
for file in sorted(maintenance_files)[:10]:  # Show first 10
    print(f"  • {file}")
if len(maintenance_files) > 10:
    print(f"  ... and {len(maintenance_files) - 10} more")

# Analyze root-level files (should be minimal)
root_level_files = [f for f in file_lines if '/' not in f]
print(f"\n📁 Root-level files ({len(root_level_files)} files):")
for file in sorted(root_level_files)[:15]:
    print(f"  • {file}")

# Analyze documentation spread
doc_files = file_categories.get('Documentation', [])
doc_by_location = defaultdict(list)
for doc in doc_files:
    if '/' in doc:
        location = doc.split('/')[0]
    else:
        location = 'root'
    doc_by_location[location].append(doc)

print(f"\n📚 Documentation distribution ({len(doc_files)} files):")
for location, docs in sorted(doc_by_location.items(), key=lambda x: len(x[1]), reverse=True):
    print(f"  {location:<20} {len(docs):>3d} files")

# Look for potential duplicates or similar files
def find_similar_files(files: List[str]) -> List[Tuple[str, str]]:
    """Find files with similar names that might be duplicates"""
    similar = []
    for i, file1 in enumerate(files):
        name1 = os.path.basename(file1).lower()
        for file2 in files[i+1:]:
            name2 = os.path.basename(file2).lower()
            if name1 == name2 and file1 != file2:
                similar.append((file1, file2))
    return similar

similar_files = find_similar_files(file_lines)
if similar_files:
    print(f"\n⚠️ Potential duplicate names ({len(similar_files)} pairs):")
    for file1, file2 in similar_files[:5]:
        print(f"  • {file1} ↔ {file2}")
    if len(similar_files) > 5:
        print(f"  ... and {len(similar_files) - 5} more pairs")

## 3️⃣ Documentation Audit and Consolidation

Let's analyze the documentation structure and identify consolidation opportunities.

In [None]:
# Documentation analysis and consolidation opportunities
doc_files = file_categories.get('Documentation', [])

# Categorize documentation by topic/theme
doc_themes = {
    'MCP Integration': [f for f in doc_files if 'mcp' in f.lower()],
    'Family Management': [f for f in doc_files if 'family' in f.lower()],
    'Production/Deployment': [f for f in doc_files if any(kw in f.lower() for kw in ['production', 'deployment', 'setup'])],
    'Authentication': [f for f in doc_files if any(kw in f.lower() for kw in ['auth', 'webauthn', 'token'])],
    'Flutter Integration': [f for f in doc_files if 'flutter' in f.lower()],
    # Match 'ai' only when it is not part of a longer word, so 'main' or 'email' are not counted
    'Voice/AI': [f for f in doc_files if re.search(r'voice|langgraph|ollama|(?<![a-z])ai(?![a-z])', f.lower())],
    'Workflows': [f for f in doc_files if any(kw in f.lower() for kw in ['n8n', 'workflow'])],
    'Testing': [f for f in doc_files if 'test' in f.lower()],
}

print("📚 Documentation Themes Analysis:")
print("=" * 50)

total_themed = 0
for theme, files in doc_themes.items():
    if files:
        print(f"\n{theme} ({len(files)} files):")
        total_themed += len(files)
        for file in sorted(files)[:5]:  # Show first 5
            print(f"  • {file}")
        if len(files) > 5:
            print(f"  ... and {len(files) - 5} more")

unthemed = [f for f in doc_files if not any(f in theme_files for theme_files in doc_themes.values())]
print(f"\nUnthemed documentation ({len(unthemed)} files):")
for file in sorted(unthemed)[:10]:
    print(f"  • {file}")

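# A file can match several themes, so count each themed file once for the total
unique_themed = {f for files in doc_themes.values() for f in files}
print(f"\nTotal: {len(unique_themed)} themed + {len(unthemed)} unthemed = {len(unique_themed) + len(unthemed)} docs")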

# Identify potential consolidation opportunities
consolidation_opportunities = []

# Look for similar documentation files that could be merged
def find_consolidation_candidates(files: List[str], theme: str) -> List[Dict]:
    """Find files that could potentially be consolidated"""
    candidates = []
    if len(files) > 3:  # Only suggest consolidation if there are multiple files
        candidates.append({
            'theme': theme,
            'files': files,
            'suggestion': f"Consider consolidating {len(files)} {theme.lower()} docs into a comprehensive guide"
        })
    return candidates

print("\n🔄 Consolidation Opportunities:")
print("=" * 50)
for theme, files in doc_themes.items():
    candidates = find_consolidation_candidates(files, theme)
    consolidation_opportunities.extend(candidates)

for opp in consolidation_opportunities:
    print(f"\n{opp['theme']}:")
    print(f"  {opp['suggestion']}")
    print(f"  Files to consider: {len(opp['files'])}")

# Visualize documentation distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Theme distribution
theme_counts = {k: len(v) for k, v in doc_themes.items() if v}
ax1.barh(list(theme_counts.keys()), list(theme_counts.values()))
ax1.set_title('Documentation by Theme')
ax1.set_xlabel('Number of Files')

# Location distribution
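# Plot the number of docs per location (the dict values are lists of file names, not counts)
ax2.pie([len(docs) for docs in doc_by_location.values()], labels=list(doc_by_location.keys()), autopct='%1.1f%%')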
ax2.set_title('Documentation by Location')

plt.tight_layout()
plt.show()

## 4️⃣ Test Coverage and Quality Review

Now let's analyze the test suite structure and identify any gaps or organizational issues.

In [None]:
# Test file analysis
test_files = file_categories.get('Test Files', [])

# Categorize tests by component/feature
test_categories = {
    'WebAuthn Tests': [f for f in test_files if 'webauthn' in f.lower()],
    'Family Management Tests': [f for f in test_files if 'family' in f.lower()],
    'MCP Integration Tests': [f for f in test_files if 'mcp' in f.lower()],
    'Authentication Tests': [f for f in test_files if any(kw in f.lower() for kw in ['auth', 'token', 'login'])],
    'Database Tests': [f for f in test_files if any(kw in f.lower() for kw in ['database', 'db'])],
    'Integration Tests': [f for f in test_files if 'integration' in f.lower()],
    'Performance Tests': [f for f in test_files if 'performance' in f.lower()],
    # Match 'ai' only when it is not part of a longer word, so 'main' or 'email' are not counted
    'Voice/AI Tests': [f for f in test_files if re.search(r'voice|(?<![a-z])ai(?![a-z])', f.lower())],
}

print("🧪 Test Suite Analysis:")
print("=" * 50)
print(f"Total test files: {len(test_files)}")

total_categorized_tests = 0
for category, tests in test_categories.items():
    if tests:
        print(f"\n{category} ({len(tests)} files):")
        total_categorized_tests += len(tests)
        for test in sorted(tests)[:3]:  # Show first 3
            print(f"  • {test}")
        if len(tests) > 3:
            print(f"  ... and {len(tests) - 3} more")

uncategorized_tests = [f for f in test_files if not any(f in cat_tests for cat_tests in test_categories.values())]
print(f"\nUncategorized tests ({len(uncategorized_tests)} files):")
for test in sorted(uncategorized_tests)[:5]:
    print(f"  • {test}")

# Test quality indicators
print("\n📊 Test Quality Indicators:")
print("=" * 30)

# Look for comprehensive test patterns
comprehensive_tests = [f for f in test_files if any(kw in f.lower() for kw in ['comprehensive', 'complete', 'end_to_end', 'e2e'])]
print(f"Comprehensive tests: {len(comprehensive_tests)}")

# Look for unit vs integration split
unit_tests = [f for f in test_files if 'unit' in f.lower()]
integration_tests = [f for f in test_files if 'integration' in f.lower()]
print(f"Unit tests: {len(unit_tests)}")
print(f"Integration tests: {len(integration_tests)}")

# Look for test utilities
test_utils = [f for f in test_files if any(kw in f.lower() for kw in ['conftest', 'utils', 'helper', 'fixture'])]
print(f"Test utilities: {len(test_utils)}")

# Analyze test naming patterns
test_naming_patterns = {
    'test_': len([f for f in test_files if os.path.basename(f).startswith('test_')]),
    'Test classes': len([f for f in test_files if 'Test' in os.path.basename(f)]),
    'Interactive': len([f for f in test_files if 'interactive' in f.lower()]),
}

print(f"\nTest naming patterns:")
for pattern, count in test_naming_patterns.items():
    print(f"  {pattern:<20} {count} files")

# Visualize test distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Test categories
test_cat_counts = {k: len(v) for k, v in test_categories.items() if v}
ax1.barh(list(test_cat_counts.keys()), list(test_cat_counts.values()))
ax1.set_title('Test Files by Category')
ax1.set_xlabel('Number of Files')

# Test types
# A file can match several keywords, so derive 'Other' from the union to avoid a negative slice
typed_tests = set(unit_tests) | set(integration_tests) | set(comprehensive_tests)
test_types = {
    'Unit Tests': len(unit_tests),
    'Integration Tests': len(integration_tests),
    'Comprehensive Tests': len(comprehensive_tests),
    'Other Tests': len(test_files) - len(typed_tests)
}
ax2.pie(list(test_types.values()), labels=list(test_types.keys()), autopct='%1.1f%%')
ax2.set_title('Test Types Distribution')

plt.tight_layout()
plt.show()

## 5️⃣ Production Readiness Evaluation

Let's evaluate the production deployment setup and infrastructure components.

In [None]:
# Production infrastructure analysis
infra_files = file_categories.get('Infrastructure', [])
config_files = file_categories.get('Configuration', [])
scripts_files = file_categories.get('Scripts & Tools', [])

print("🏭 Production Infrastructure Analysis:")
print("=" * 50)

# Docker and containerization
docker_files = [f for f in file_lines if any(kw in f.lower() for kw in ['dockerfile', 'docker-compose'])]
print(f"Docker files: {len(docker_files)}")
for f in docker_files:
    print(f"  • {f}")

# Configuration management
config_analysis = {
    'JSON configs': [f for f in config_files if f.endswith('.json')],
    'YAML configs': [f for f in config_files if f.endswith(('.yml', '.yaml'))],
    'TOML configs': [f for f in config_files if f.endswith('.toml')],
    'Environment files': [f for f in config_files if '.env' in f],
}

print(f"\n⚙️ Configuration Files ({len(config_files)} total):")
for config_type, files in config_analysis.items():
    if files:
        print(f"  {config_type}: {len(files)} files")
        for f in files[:2]:  # Show first 2
            print(f"    - {f}")
        if len(files) > 2:
            print(f"    ... and {len(files) - 2} more")

# Production scripts and automation
production_scripts = [f for f in file_lines if any(kw in f.lower() for kw in ['production', 'deploy', 'startup'])]
print(f"\n🚀 Production Scripts ({len(production_scripts)} files):")
for script in sorted(production_scripts)[:10]:
    print(f"  • {script}")

# Manual operation scripts
manual_scripts = [f for f in file_lines if 'manual/' in f]
print(f"\n🔧 Manual Operation Scripts ({len(manual_scripts)} files):")
for script in sorted(manual_scripts):
    print(f"  • {script}")

# Health check and monitoring
monitoring_files = [f for f in file_lines if any(kw in f.lower() for kw in ['health', 'monitor', 'check', 'status'])]
print(f"\n📊 Monitoring & Health Check Files ({len(monitoring_files)} files):")
for f in sorted(monitoring_files)[:8]:
    print(f"  • {f}")

# Production readiness checklist
production_readiness = {
    'Docker Setup': len(docker_files) > 0,
    'Environment Configuration': len([f for f in config_files if 'env' in f]) > 0,
    'Production Scripts': len(production_scripts) > 0,
    'Health Monitoring': len([f for f in monitoring_files if 'health' in f.lower()]) > 0,
    'Deployment Guides': len([f for f in doc_files if 'deploy' in f.lower()]) > 0,
    'Setup Documentation': len([f for f in doc_files if 'setup' in f.lower()]) > 0,
}

print("\n✅ Production Readiness Checklist:")
print("=" * 40)
for item, status in production_readiness.items():
    status_icon = "✅" if status else "❌"
    print(f"  {status_icon} {item}")

# Identify infrastructure improvements needed
improvements = []
if len(docker_files) == 1:
    improvements.append("Consider multi-stage Dockerfile or docker-compose for different environments")
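# Look for actual CI configuration; a bare 'ci' substring would also match names like 'specification'
if not any('github/workflows/' in f.lower() or 'gitlab-ci' in f.lower() for f in file_lines):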
    improvements.append("Add CI/CD pipeline configuration")
if not any('makefile' in f.lower() for f in file_lines):
    improvements.append("Consider adding Makefile for common operations")

print("\n🔧 Suggested Infrastructure Improvements:")
for i, improvement in enumerate(improvements, 1):
    print(f"  {i}. {improvement}")

# Visualize production components
prod_components = {
    'Docker Files': len(docker_files),
    'Config Files': len(config_files),
    'Production Scripts': len(production_scripts),
    'Manual Scripts': len(manual_scripts),
    'Monitoring Files': len(monitoring_files),
}

plt.figure(figsize=(10, 6))
plt.bar(prod_components.keys(), prod_components.values(), alpha=0.7, color='skyblue')
plt.title('Production Infrastructure Components')
plt.ylabel('Number of Files')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 6️⃣ Refactoring Strategy Development

Based on our analysis, let's create a comprehensive refactoring and cleanup strategy.

In [None]:
# Create comprehensive refactoring strategy
refactoring_strategy = {
    'infra/': {
        'description': 'Infrastructure and deployment files',
        'files': (
            docker_files +
            [f for f in file_lines if f in ['production_app.py', 'verify_platform_config.py']] +
            production_scripts
        ),
        'priority': 'High'
    },
    'scripts/maintenance/': {
        'description': 'One-off maintenance and fix scripts',
        'files': maintenance_files + [
            f for f in file_lines if any(pattern in f for pattern in [
                'fix_all_indentation.py', 'clear_rate_limits.py', 'install_deepseek.py',
                'update_mcp_tools.py', 'check_mcp_health.py', 'verify_task3_implementation.py'
            ])
        ],
        'priority': 'Medium'
    },
    'scripts/tools/': {
        'description': 'Development and utility scripts',
        'files': [f for f in scripts_files if 'manual/' not in f],
        'priority': 'Medium'
    },
    'automation/': {
        'description': 'N8N workflows and automation',
        'files': [f for f in file_lines if f.startswith('n8n_workflows/')],
        'priority': 'Medium'
    },
    'docs/production/': {
        'description': 'Production deployment and setup guides',
        'files': [f for f in doc_files if any(kw in f.lower() for kw in [
            'production', 'deployment', 'setup', 'startup'
        ])],
        'priority': 'High'
    },
    'docs/integrations/mcp/': {
        'description': 'MCP integration documentation',
        'files': doc_themes.get('MCP Integration', []),
        'priority': 'High'
    },
    'docs/integrations/family/': {
        'description': 'Family management system documentation',
        'files': doc_themes.get('Family Management', []),
        'priority': 'High'
    },
    'docs/integrations/auth/': {
        'description': 'Authentication and security documentation',
        'files': doc_themes.get('Authentication', []),
        'priority': 'High'
    },
    'docs/integrations/voice/': {
        'description': 'Voice and AI integration documentation',
        'files': doc_themes.get('Voice/AI', []),
        'priority': 'Medium'
    },
    'docs/specs/': {
        'description': 'Product specifications and requirements',
        'files': [f for f in file_lines if f.startswith('.kiro/')],
        'priority': 'Medium'
    },
    'docs/plans/': {
        'description': 'Project planning and TODO documents',
        'files': [f for f in file_lines if f.startswith('TODOS/')],
        'priority': 'Low'
    },
    'legacy/': {
        'description': 'Legacy and experimental files',
        'files': [
            f for f in file_lines if any(kw in f.lower() for kw in [
                'unused', 'old', 'deprecated', 'backup'
            ])
        ] + [
            f for f in root_level_files if f not in [
                'README.md', 'Dockerfile', 'requirements.txt', 'pyproject.toml',
                'docker-compose.yml', 'QUICKSTART.md', 'SETUP_GUIDE.md'
            ]
        ],
        'priority': 'Low'
    }
}

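# A file can match several rules above; keep only its first destination so that
# no file is scheduled twice and the totals below are not inflated
assigned = set()
for info in refactoring_strategy.values():
    unique_files = []
    for f in info['files']:
        if f not in assigned:
            assigned.add(f)
            unique_files.append(f)
    info['files'] = unique_files

# Calculate migration impact
print("📋 Refactoring Strategy Summary:")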
print("=" * 50)

total_files_to_move = 0
for destination, info in refactoring_strategy.items():
    files_count = len(info['files'])
    total_files_to_move += files_count
    priority_icon = {"High": "🔴", "Medium": "🟡", "Low": "🟢"}.get(info['priority'], "⚪")
    
    print(f"\n{priority_icon} {destination} ({files_count} files)")
    print(f"  Priority: {info['priority']}")
    print(f"  Description: {info['description']}")
    
    # Show sample files
    sample_files = info['files'][:3]
    for f in sample_files:
        print(f"    • {f}")
    if len(info['files']) > 3:
        print(f"    ... and {len(info['files']) - 3} more")

print(f"\nTotal files to relocate: {total_files_to_move}/{len(file_lines)} ({total_files_to_move/len(file_lines)*100:.1f}%)")
print(f"Files staying in place: {len(file_lines) - total_files_to_move}")

# Create priority-based migration phases
migration_phases = {
    'Phase 1 - Critical Infrastructure': [dest for dest, info in refactoring_strategy.items() if info['priority'] == 'High'],
    'Phase 2 - Development Tools': [dest for dest, info in refactoring_strategy.items() if info['priority'] == 'Medium'],
    'Phase 3 - Cleanup & Archive': [dest for dest, info in refactoring_strategy.items() if info['priority'] == 'Low']
}

print("\n🚀 Migration Phases:")
print("=" * 30)
for phase, destinations in migration_phases.items():
    files_in_phase = sum(len(refactoring_strategy[dest]['files']) for dest in destinations)
    print(f"\n{phase} ({files_in_phase} files):")
    for dest in destinations:
        print(f"  • {dest}")

# Visualize refactoring impact
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Files by destination
dest_counts = {dest: len(info['files']) for dest, info in refactoring_strategy.items() if info['files']}
ax1.barh(list(dest_counts.keys()), list(dest_counts.values()))
ax1.set_title('Files by Destination Directory')
ax1.set_xlabel('Number of Files')

# Priority distribution
priority_counts = defaultdict(int)
for info in refactoring_strategy.values():
    priority_counts[info['priority']] += len(info['files'])

colors = {'High': 'red', 'Medium': 'orange', 'Low': 'green'}
ax2.pie(list(priority_counts.values()), labels=list(priority_counts.keys()),
        colors=[colors[p] for p in priority_counts.keys()], autopct='%1.1f%%')
ax2.set_title('Migration Priority Distribution')

plt.tight_layout()
plt.show()

## 7️⃣ Cleanup Automation Implementation

Now let's create automated tools to safely execute the refactoring plan.
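
The `move_file_safely` function in the generated script places each file at `<destination>/<basename>`, so two sources with the same file name that map to the same destination cannot both be moved; the second is skipped with a warning. A pre-flight check can surface such cases before anything is touched. The following is a minimal sketch under that assumption; `find_basename_collisions` and the example map are hypothetical and not part of the generated script.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def find_basename_collisions(migration_map: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each colliding target path to the sources that would land on it."""
    targets = defaultdict(list)
    for destination, sources in migration_map.items():
        for src in sources:
            # Mirror the script's flattening: destination directory plus file name only
            target = posixpath.join(destination, posixpath.basename(src))
            targets[target].append(src)
    return {t: srcs for t, srcs in targets.items() if len(srcs) > 1}


# Hypothetical migration map for illustration
example_map = {
    "docs/specs/": [".kiro/specs/auth/requirements.md", ".kiro/specs/family/requirements.md"],
    "infra/": ["Dockerfile"],
}
for target, sources in find_basename_collisions(example_map).items():
    print(f"{target} <- {sources}")
```

Collisions found this way could be resolved by keeping part of the original directory path in the destination or by renaming one of the files before the migration.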

In [None]:
import shutil
from datetime import datetime

# Generate automated cleanup script
def generate_cleanup_script():
    """Generate a comprehensive cleanup script"""
    
    script_content = f'''#!/usr/bin/env python3
"""
Second Brain Database Repository Cleanup Script
Generated on: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}

This script safely reorganizes the repository structure while preserving all files.
Run with --dry-run first to see what changes will be made.
"""

import os
import shutil
import argparse
from pathlib import Path
from datetime import datetime

# Repository root
REPO_ROOT = Path(__file__).parent

def create_directory_structure():
    """Create the new directory structure"""
    directories = [
        "infra",
        "scripts/maintenance",
        "scripts/tools", 
        "automation",
        "docs/production",
        "docs/integrations/mcp",
        "docs/integrations/family",
        "docs/integrations/auth", 
        "docs/integrations/voice",
        "docs/specs",
        "docs/plans",
        "legacy"
    ]
    
    for dir_path in directories:
        full_path = REPO_ROOT / dir_path
        full_path.mkdir(parents=True, exist_ok=True)
        print(f"✅ Created directory: {{dir_path}}")

def move_file_safely(src: str, dst_dir: str, dry_run: bool = False):
    """Safely move a file to destination directory"""
    src_path = REPO_ROOT / src
    dst_path = REPO_ROOT / dst_dir / os.path.basename(src)
    
    if not src_path.exists():
        print(f"⚠️ Source file does not exist: {{src}}")
        return False
        
    if dst_path.exists():
        print(f"⚠️ Destination already exists: {{dst_path}}")
        return False
    
    if dry_run:
        print(f"🔄 Would move: {{src}} → {{dst_dir}}/{{os.path.basename(src)}}")
        return True
    
    try:
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src_path), str(dst_path))
        print(f"✅ Moved: {{src}} → {{dst_dir}}/{{os.path.basename(src)}}")
        return True
    except Exception as e:
        print(f"❌ Error moving {{src}}: {{e}}")
        return False

# File migration mappings
MIGRATION_MAP = {{
'''

    # Add the migration mappings
    for destination, info in refactoring_strategy.items():
        if info['files']:
            script_content += f'    {destination!r}: [\n'
            for file_path in info['files']:
                # repr() escapes any quotes or backslashes a file name may contain
                script_content += f'        {file_path!r},\n'
            script_content += '    ],\n'
    
    script_content += '''
}

def create_cleanup_log(moves_made):
    """Create a log of all moves made"""
    log_content = f"""# Repository Cleanup Log
Generated on: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}

## Summary
- Total files moved: {len(moves_made)}
- New directory structure created

## File Moves
"""
    
    for src, dst in moves_made:
        log_content += f"- `{src}` → `{dst}`\\n"

    # Write as UTF-8 explicitly because the log contains non-ASCII characters
    with open(REPO_ROOT / "CLEANUP_LOG.md", "w", encoding="utf-8") as f:
        f.write(log_content)

    print("📋 Created cleanup log: CLEANUP_LOG.md")

def main():
    parser = argparse.ArgumentParser(description="Cleanup Second Brain Database repository")
    parser.add_argument("--dry-run", action="store_true", help="Show what would be done without making changes")
    parser.add_argument("--phase", choices=["1", "2", "3", "all"], default="all", help="Run specific migration phase")
    
    args = parser.parse_args()
    
    print("🧠 Second Brain Database Repository Cleanup")
    print("=" * 50)

    if args.dry_run:
        print("🔍 DRY RUN MODE - No files will be moved")
    
    # Create directory structure
    if not args.dry_run:
        create_directory_structure()
    
    # Execute migrations
    moves_made = []
    total_moves = 0
    successful_moves = 0
    
    phase_map = {
        "1": ["docs/production/", "docs/integrations/mcp/", "docs/integrations/family/", "docs/integrations/auth/", "infra/"],
        "2": ["scripts/maintenance/", "scripts/tools/", "automation/", "docs/integrations/voice/", "docs/specs/"],
        "3": ["docs/plans/", "legacy/"]
    }
    
    phases_to_run = phase_map.get(args.phase, []) if args.phase != "all" else list(MIGRATION_MAP.keys())
    
    for destination in phases_to_run:
        if destination in MIGRATION_MAP:
            print(f"\\n📂 Processing: {destination}")
            for file_path in MIGRATION_MAP[destination]:
                total_moves += 1
                if move_file_safely(file_path, destination, args.dry_run):
                    successful_moves += 1
                    if not args.dry_run:
                        moves_made.append((file_path, destination))
    
    print("\\n📊 Summary:")
    print(f"Total files processed: {total_moves}")
    print(f"Successful moves: {successful_moves}")
    print(f"Failed moves: {total_moves - successful_moves}")
    
    if not args.dry_run and moves_made:
        create_cleanup_log(moves_made)
        print("\\n🎉 Repository cleanup completed!")
        print("Review CLEANUP_LOG.md for details of all changes made.")
    elif args.dry_run:
        print("\\n🔍 Dry run completed. Re-run without --dry-run to execute, optionally with --phase 1, 2, or 3.")

if __name__ == "__main__":
    main()
'''
    
    return script_content

# Create the cleanup script
cleanup_script = generate_cleanup_script()

# Save the script
script_path = REPO_ROOT / "cleanup_repository.py"
# Write as UTF-8 explicitly because the generated script contains emoji
with open(script_path, 'w', encoding='utf-8') as f:
    f.write(cleanup_script)

# Make it executable
os.chmod(script_path, 0o755)

print("🛠️ Generated cleanup automation script:")
print(f"Script location: {script_path}")
print("\n📋 Usage Instructions:")
print("1. First run with dry-run to preview changes:")
print("   python cleanup_repository.py --dry-run")
print("\n2. Run by phases for safety:")
print("   python cleanup_repository.py --phase 1  # Critical infrastructure")
print("   python cleanup_repository.py --phase 2  # Development tools") 
print("   python cleanup_repository.py --phase 3  # Cleanup & archive")
print("\n3. Or run all phases at once:")
print("   python cleanup_repository.py")

print(f"\n📊 Script will move {sum(len(info['files']) for info in refactoring_strategy.values())} files")

## 🎯 Next Steps & Execution Plan

### Phase 1: Analysis & Validation (Safe)
1. **Run this notebook** to generate comprehensive repository analysis
2. **Review refactoring strategy** and file categorizations 
3. **Test cleanup script** with `--dry-run` flag
4. **Validate documentation** consolidation plans

### Phase 2: Gradual Migration (Cautious)
1. **Back up the repository** (commit your work, then create a branch that stays at the current state: `git branch pre-cleanup-backup`)
2. **Run Phase 1** migrations (infrastructure & documentation)
3. **Test builds** to ensure nothing breaks
4. **Run Phase 2** migrations (development tools & scripts)

### Phase 3: Finalization (Confident)
1. **Run Phase 3** migrations (cleanup & archive)
2. **Update CI/CD** references to moved files
3. **Update documentation** with new structure
4. **Create team announcement** with migration guide
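
The "Update CI/CD references to moved files" step can be partly automated by scanning text files for the old paths. The sketch below is one possible approach and rests on several assumptions: `find_stale_references`, the `moved` mapping of old path to new path, and the list of file name endings are all hypothetical, and plain substring matching can report false positives for short or common file names.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def find_stale_references(
    root: Path,
    moved: Dict[str, str],
    name_endings: Iterable[str] = (".yml", ".yaml", ".md", ".toml", ".sh", ".py", "Dockerfile"),
) -> List[Tuple[str, int, str, str]]:
    """Return (file, line number, old path, new path) for lines still using an old path."""
    endings = tuple(name_endings)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file() or not path.name.endswith(endings):
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            for old_path, new_path in moved.items():
                # Ignore lines that already mention the new location
                if old_path in line and new_path not in line:
                    hits.append((rel.as_posix(), number, old_path, new_path))
    return hits
```

The `moved` mapping could be built from the migration map or from the entries in `CLEANUP_LOG.md`; each reported hit still needs a manual review before editing.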

---

## 🔧 Generated Automation Tools

This notebook produces one script and four supporting analyses:

1. **`cleanup_repository.py`** - Main migration script with dry-run capability
2. **File categorization analysis** - Automated classification of all repository files  
3. **Documentation consolidation strategy** - Merge similar docs, preserve all content
4. **Test coverage analysis** - Identify gaps and reorganization opportunities
5. **Production readiness assessment** - Validate critical deployment files

---

## ⚠️ Safety Features

- **Dry-run mode**: Preview all changes before execution
- **Phased migration**: Run in 3 safe phases with validation points
- **Complete logging**: Every successful move is recorded in `CLEANUP_LOG.md`
- **Rollback capability**: Commit or branch before running so Git can restore the original layout if issues arise
- **Zero data loss**: All files preserved, only reorganized
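
The zero-data-loss claim can be checked independently of the move log by comparing file contents before and after a migration. The following is a minimal sketch, assuming that a file counts as preserved when a file with identical content still exists somewhere in the repository; `content_inventory` and `missing_after` are hypothetical helper names.

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import Iterable


def content_inventory(root: Path, skip_dirs: Iterable[str] = (".git",)) -> Counter:
    """Count files by the SHA-256 of their content, ignoring where they live."""
    skip = set(skip_dirs)
    inventory = Counter()
    for path in root.rglob("*"):
        if any(part in skip for part in path.relative_to(root).parts):
            continue
        if path.is_file():
            inventory[hashlib.sha256(path.read_bytes()).hexdigest()] += 1
    return inventory


def missing_after(before: Counter, after: Counter) -> Counter:
    """Return content hashes that occur fewer times after the migration than before."""
    # Counter subtraction keeps only positive differences
    return before - after
```

Taking one inventory before the script runs and one afterwards, an empty `missing_after(before, after)` suggests that every file's content survived the move; new files such as `CLEANUP_LOG.md` do not affect the result.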

---

## 📞 Team Communication

**Before running cleanup:**
1. Share this analysis with your team
2. Get approval for the refactoring strategy
3. Schedule the migration during low-activity periods
4. Ensure all team members have backed up local work

**After cleanup:**
1. Update team documentation with new structure
2. Share CLEANUP_LOG.md for transparency  
3. Update development workflows and tooling
4. Celebrate your clean, modern repository! 🎉