# Archive Taxonomy

## Why this notebook

This archive is not designed for perfect retrieval. It is designed to shape attention.

Search works well enough at the scale of my files (~10 TB); the real problem is not finding files but remembering what is worth looking for. Flat abundance collapses curiosity. When everything is equally accessible, habit wins, and familiar choices dominate. This taxonomy exists to reintroduce texture, constraint, and adjacency so that discovery becomes possible again.

Folders answer the question “what kind of thing is this?” They do not answer what the thing means, why it matters, or how it relates to other things. Meaning lives in metadata, memory, and context. The folder structure is deliberately boring at the top and deliberately opinionated where browsing matters.

This system distinguishes between taxonomy and ontology. Ontology asks what something truly is; taxonomy asks where it should live so it can be encountered. The archive does not attempt to be a perfect model of reality. It attempts to be a usable map. Predictable misclassification is acceptable and even desirable. Search exists as a safety net, not as the primary interface.

Different media demand different depths of structure. Photos, documents, data, and projects benefit from shallow hierarchies and strong metadata because retrieval is goal-directed. Books, music, and films benefit from deeper, semantic hierarchies because browsing, recognition, and serendipity are the primary modes of engagement. In these domains, deeper folders are not bureaucracy; they are shelves.

Classification is based on how one wants to arrive at an artifact, not on an abstract definition of what it is. An album that spans multiple genres is placed in the location that matches the listening intention it best serves. Each artifact has one canonical home. That home reflects a choice about behavior, not truth.

Projects are explicitly separated from the archive. Projects are allowed to be messy, temporary, and mutable. When a project ends, its outputs are promoted into the archive and reclassified according to artifact type. This separation prevents entropy from spreading.

The archive is meant to be walked, not queried. Each directory should be small enough to browse without fatigue and rich enough to invite exploration. Constraint is not a limitation; it is the mechanism by which curiosity reappears.
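
One way to make "small enough to browse" checkable is to count the immediate entries of each directory and flag the ones above a limit. The sketch below is only an illustration: the function name `crowded_dirs` and the default limit of 40 entries are assumptions, not rules defined by this taxonomy.

```python
from pathlib import Path


def crowded_dirs(root: Path, limit: int = 40) -> list[tuple[Path, int]]:
    """Return (directory, entry_count) pairs whose entry count exceeds `limit`.

    `limit` is a hypothetical browsing threshold, not a rule from the taxonomy.
    """
    crowded = []
    # Check the root itself plus every directory beneath it.
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for directory in candidates:
        count = sum(1 for _ in directory.iterdir())
        if count > limit:
            crowded.append((directory, count))
    # Most crowded first, so the worst offenders surface at the top.
    return sorted(crowded, key=lambda pair: pair[1], reverse=True)
```

A crowded directory is a hint that a shelf may want subdividing, not an error; the decision stays with the owner.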

This taxonomy is expected to evolve. Changes should be deliberate and documented, not reactive. The structure should age alongside its owner. Consistency matters more than correctness, and memory is reinforced through use, not optimization.

The success criterion of this system is not whether a file can be found instantly. The success criterion is whether the archive invites engagement, supports discovery, and gently resists the pull of the familiar.


## What this notebook does

This notebook defines and enforces the physical structure of the archive. It creates the directory taxonomy that all automated and manual classification must target. The notebook does not move files, interpret content, or decide meaning; it establishes the stable destinations that give those future decisions somewhere to land.

The structure produced here is intentionally explicit and repeatable. By materializing the taxonomy as directories, the notebook turns an abstract organizational philosophy into a concrete constraint. Any AI agent operating on the archive is required to place artifacts into one and only one canonical location within this structure.

The notebook serves as the contract between human intention and automated action. It makes the rules visible, inspectable, and versionable. An AI agent may misclassify, but it may not invent categories. Corrections happen by relocating artifacts within the existing structure, not by reshaping the structure itself.
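
The "may not invent categories" rule can be expressed as a check that a proposed destination is a directory that already exists inside the archive. The following is a minimal sketch under that reading; `is_valid_destination` is a hypothetical helper, not something this notebook defines or an agent is known to call.

```python
from pathlib import Path


def is_valid_destination(archive_root: Path, proposed: Path) -> bool:
    """Return True only if `proposed` is an existing directory inside the archive."""
    root = archive_root.resolve()
    target = proposed.resolve()
    # Reject the root itself and anything outside the archive.
    if target == root or root not in target.parents:
        return False
    # The directory must already exist: agents may not invent categories.
    return target.is_dir()
```

An agent could run this check before every move and fall back to `Intake` when it fails, which keeps misclassification possible but category invention impossible.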

This separation is deliberate. The notebook defines the “where.” The AI agent decides the “which.” Search remains the safety net, but browsing is the primary interface. Together, they form a system in which automation accelerates organization without erasing human judgment.


## Libraries Used

In [1]:
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

# ------------------------------------------------------------
# Archive root
# ------------------------------------------------------------
ARCHIVE_ROOT = Path.cwd() / "Archive"
ARCHIVE_ROOT.mkdir(parents=True, exist_ok=True)

# ------------------------------------------------------------
# Level 1 taxonomy
# ------------------------------------------------------------
LEVEL_1 = [
    "Applications",
    "Audio",
    "Books",
    "Code",
    "Compressed",
    "Data",
    "Docs",
    "Images",
    "Intake",
    "Journal",
    "Movies",
    "Music",
    "Notes",
    "Projects",
    "Systems",
    "Video",
]

# ------------------------------------------------------------
# README explanations (contract text, not marketing)
# ------------------------------------------------------------
LEVEL_1_EXPLAINERS = {
    "Applications": (
        "Large, mode-shifting creative or technical environments.\n"
        "These are tool ecosystems, not source code or projects.\n"
        "Examples include game engines, DAWs, notation tools, and IDEs."
    ),
    "Audio": (
        "Non-music audio artifacts and audio work-in-progress.\n"
        "Recordings, experiments, stems, sound design, and cooked audio."
    ),
    "Books": (
        "Long-form written works intended to be read as books.\n"
        "PDF, EPUB, MOBI, and similar formats. Mostly read-only."
    ),
    "Code": (
        "Source code and scripts.\n"
        "Repositories, utilities, experiments, and executable logic."
    ),
    "Compressed": (
        "Archive containers such as zip, tar, and 7z files.\n"
        "This is a staging area, not a permanent home."
    ),
    "Data": (
        "Structured data artifacts.\n"
        "CSV, JSON, Parquet, logs, generated datasets, and analysis outputs."
    ),
    "Docs": (
        "Documents that are not books.\n"
        "Manuals, contracts, receipts, reference PDFs, and administrative files."
    ),
    "Images": (
        "Still images.\n"
        "Photography, scans, artwork, screenshots, and visual assets."
    ),
    "Intake": (
        "Temporary landing zone for unclassified material.\n"
        "Nothing here is considered organized or permanent."
    ),
    "Journal": (
        "Chronological personal writing.\n"
        "Reflections, logs, and time-ordered narrative entries."
    ),
    "Movies": (
        "Cinematic works.\n"
        "Feature films, documentaries, and shorts treated as cinema."
    ),
    "Music": (
        "Music intended for listening and browsing.\n"
        "Organized for discovery, adjacency, and strolling."
    ),
    "Notes": (
        "Short-form thinking artifacts.\n"
        "Scratch notes, research fragments, ideas, and provisional text."
    ),
    "Projects": (
        "Active and evolving workspaces.\n"
        "Messy by design. Completed outputs should be promoted elsewhere."
    ),
    "Systems": (
        "System-level artifacts.\n"
        "Backups, disk images, installers, and configuration snapshots."
    ),
    "Video": (
        "Non-cinematic moving images.\n"
        "Home videos, lectures, screen recordings, and clips."
    ),
}

# ------------------------------------------------------------
# Helper: create directory + README.md (idempotent)
# ------------------------------------------------------------
def ensure_dir_with_readme(path: Path, text: str):
    path.mkdir(exist_ok=True)
    readme = path / "README.md"
    if not readme.exists():
        readme.write_text(text.strip() + "\n")

# ------------------------------------------------------------
# Build Level 1 + READMEs
# ------------------------------------------------------------
for name in LEVEL_1:
    ensure_dir_with_readme(
        ARCHIVE_ROOT / name,
        LEVEL_1_EXPLAINERS.get(
            name,
            "No description provided. This directory exists by design."
        ),
    )

ARCHIVE_ROOT


PosixPath('/home/mcruggs/gitwork/Archive/Archive')
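
Because the helper above never overwrites an existing README, a quick read-only check can confirm that every Level 1 directory and its README are actually present. This is a sketch; `missing_readmes` is a hypothetical name and is not part of the notebook's cells.

```python
from pathlib import Path


def missing_readmes(root: Path, names: list[str]) -> list[str]:
    """Return the names whose directory or README.md is absent under `root`."""
    missing = []
    for name in names:
        directory = root / name
        # A name is reported if either the directory or its README is missing.
        if not directory.is_dir() or not (directory / "README.md").is_file():
            missing.append(name)
    return missing
```

After the cell above has run, `missing_readmes(ARCHIVE_ROOT, LEVEL_1)` would be expected to return an empty list.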

### Applications

In [2]:
from pathlib import Path

applications_root = ARCHIVE_ROOT / "Applications"
applications_root.mkdir(exist_ok=True)

# ------------------------------------------------------------
# Applications taxonomy
# ------------------------------------------------------------
APPLICATIONS = {
    "3D": ["Blender"],
    "Game_Engines": ["Unreal", "Godot"],
    "Audio": ["Audacity", "Reaper"],
    "Music_Theory": ["MuseScore"],
    "Visual_Design": ["GIMP"],
    "Video": ["DaVinci_Resolve"],
    "Writing": ["Obsidian", "LaTeX_Toolchain"],
    "Electronics": ["Arduino_IDE", "PlatformIO"],
    "IDEs": ["VSCode", "Atom"],
}

# ------------------------------------------------------------
# Task-level explanations
# ------------------------------------------------------------
APPLICATION_TASK_EXPLAINERS = {
    "3D": (
        "Three-dimensional modeling and procedural world-building environments.\n"
        "Tools for spatial thinking, geometry, and form."
    ),
    "Game_Engines": (
        "Interactive simulation and game development environments.\n"
        "Tools for building real-time systems and virtual worlds."
    ),
    "Audio": (
        "Audio production and manipulation environments.\n"
        "Recording, editing, mixing, and sound experimentation."
    ),
    "Music_Theory": (
        "Music notation and theory-focused tools.\n"
        "Used for analysis, composition, and score-based thinking."
    ),
    "Visual_Design": (
        "Visual composition and design tools.\n"
        "Raster and vector-based image creation and editing."
    ),
    "Video": (
        "Video editing and post-production environments.\n"
        "Used for assembling, grading, and rendering moving images."
    ),
    "Writing": (
        "Writing and thinking environments.\n"
        "Tools that shape how long-form text is produced and organized."
    ),
    "Electronics": (
        "Embedded systems and hardware programming environments.\n"
        "Tools tied to physical devices and microcontrollers."
    ),
    "IDEs": (
        "Integrated development environments.\n"
        "General-purpose coding workspaces, kept intentionally minimal."
    ),
}

# ------------------------------------------------------------
# App-level explanations
# ------------------------------------------------------------
APPLICATION_EXPLAINERS = {
    "Blender": "3D creation suite for modeling, animation, rendering, and procedural workflows.",
    "Unreal": "High-performance game engine for real-time simulation and interactive worlds.",
    "Godot": "Lightweight, open-source game engine emphasizing simplicity and rapid iteration.",
    "Audacity": "Audio editor for recording, trimming, and basic sound manipulation.",
    "Reaper": "Highly configurable digital audio workstation focused on precision and scripting.",
    "MuseScore": "Music notation software for score writing, analysis, and playback.",
    "GIMP": "Raster-based image editor for photo manipulation and visual design.",
    "DaVinci_Resolve": "Professional video editing and color grading environment.",
    "Obsidian": "Markdown-based writing environment for linked notes and long-form thinking.",
    "LaTeX_Toolchain": "Document preparation environment for structured, typeset writing.",
    "Arduino_IDE": "Integrated environment for programming Arduino-compatible microcontrollers.",
    "PlatformIO": "Embedded development ecosystem supporting multiple boards and frameworks.",
    "VSCode": "Extensible code editor and development environment.",
    "Atom": "Hackable text editor used as a lightweight IDE.",
}

# ------------------------------------------------------------
# Helper
# ------------------------------------------------------------
def ensure_dir_with_readme(path: Path, text: str):
    path.mkdir(exist_ok=True)
    readme = path / "README.md"
    if not readme.exists():
        readme.write_text(text.strip() + "\n")

# ------------------------------------------------------------
# Build Applications tree with READMEs
# ------------------------------------------------------------
for task, apps in APPLICATIONS.items():
    task_dir = applications_root / task

    ensure_dir_with_readme(
        task_dir,
        APPLICATION_TASK_EXPLAINERS.get(
            task,
            "Application task category."
        ),
    )

    for app in apps:
        ensure_dir_with_readme(
            task_dir / app,
            APPLICATION_EXPLAINERS.get(
                app,
                "Application environment."
            ),
        )

applications_root


PosixPath('/home/mcruggs/gitwork/Archive/Archive/Applications')
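
The Applications cell relies on three dictionaries staying in sync; when a name has no explainer, the build loop silently writes the generic fallback text instead. A small consistency check, sketched below with the hypothetical name `missing_explainers`, can surface such gaps before any README is written.

```python
def missing_explainers(
    taxonomy: dict[str, list[str]],
    task_explainers: dict[str, str],
    app_explainers: dict[str, str],
) -> dict[str, list[str]]:
    """Report taxonomy names that have no explainer text."""
    # Task categories (dictionary keys) without a task-level explainer.
    tasks = [task for task in taxonomy if task not in task_explainers]
    # Individual applications without an app-level explainer.
    apps = [
        app
        for app_list in taxonomy.values()
        for app in app_list
        if app not in app_explainers
    ]
    return {"tasks": tasks, "apps": apps}
```

For the dictionaries as written above, `missing_explainers(APPLICATIONS, APPLICATION_TASK_EXPLAINERS, APPLICATION_EXPLAINERS)` should report no gaps in either list.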

### Audio

In [3]:
from pathlib import Path

audio_root = ARCHIVE_ROOT / "Audio"
audio_root.mkdir(exist_ok=True)

# ------------------------------------------------------------
# Audio Level 2 taxonomy (Music lives at Level 1, not here)
# ------------------------------------------------------------
AUDIO_LEVEL_2 = [
    "Recordings",
    "Experiments",
    "Programmatic",
    "AI",
]

# ------------------------------------------------------------
# Audio explainers
# ------------------------------------------------------------
AUDIO_EXPLAINERS = {
    "Recordings": (
        "Captured audio from the physical world.\n"
        "Live recordings, voice takes, instrument captures, and raw source material."
    ),
    "Experiments": (
        "Exploratory and provisional audio work.\n"
        "Half-formed ideas, sound tests, sketches, and throwaway explorations."
    ),
    "Programmatic": (
        "Audio generated or manipulated through code.\n"
        "Algorithmic composition, DSP experiments, scripts, and computational sound."
    ),
    "AI": (
        "Audio created or transformed using machine learning systems.\n"
        "Model outputs, training artifacts, prompts, and AI-assisted sound work."
    ),
}

# ------------------------------------------------------------
# Helper
# ------------------------------------------------------------
def ensure_dir_with_readme(path: Path, text: str):
    path.mkdir(exist_ok=True)
    readme = path / "README.md"
    if not readme.exists():
        readme.write_text(text.strip() + "\n")

# ------------------------------------------------------------
# Build Audio tree with READMEs
# ------------------------------------------------------------
for name in AUDIO_LEVEL_2:
    ensure_dir_with_readme(
        audio_root / name,
        AUDIO_EXPLAINERS.get(name, "Audio category."),
    )

audio_root


PosixPath('/home/mcruggs/gitwork/Archive/Archive/Audio')
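
Since categories are meant to be added only through this notebook, it can be useful to compare what is on disk with what the taxonomy lists. The sketch below assumes a flat list of expected names such as `AUDIO_LEVEL_2`; the function name `taxonomy_drift` is hypothetical.

```python
from pathlib import Path


def taxonomy_drift(root: Path, expected: list[str]) -> tuple[list[str], list[str]]:
    """Compare the subdirectories of `root` with the expected names.

    Returns (missing, unexpected), each sorted alphabetically.
    """
    on_disk = {p.name for p in root.iterdir() if p.is_dir()}
    wanted = set(expected)
    missing = sorted(wanted - on_disk)      # listed in the taxonomy, absent on disk
    unexpected = sorted(on_disk - wanted)   # present on disk, not in the taxonomy
    return missing, unexpected
```

A non-empty `unexpected` list under `audio_root` would suggest that a directory was created outside the notebook.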

### Books

In [6]:
from pathlib import Path

books_root = ARCHIVE_ROOT / "Books"
books_root.mkdir(exist_ok=True)

# ------------------------------------------------------------
# Books taxonomy: Level 2 + Level 3 (finalized pass)
# ------------------------------------------------------------
BOOKS_TAXONOMY = {
    "Mathematics": [
        "Algebra",
        "Analysis",
        "Topology",
        "Graph_Theory",
        "Probability",
        "Number_Theory",
        "Logic",
        "Discrete_Math",
        "Geometry",
    ],
    "Science": [
        "Physics",
        "Biology",
        "Chemistry",
        "Complexity",
        "Systems",
        "Earth_Science",
    ],
    "Philosophy": [
        "Metaphysics",
        "Epistemology",
        "Ethics",
        "Daoism",
        "Philosophy_of_Science",
        "Logic",
        "Aesthetics",
    ],
    "Literature": [
        "Classicism",
        "Modernism",
        "Postmodernism",
        "Magical_Realism",
        "Mythic",
        "Speculative",
        "Essays",
        "Literary_Criticism",
        "Drama",
        "Poetry",
    ],
    "History": [
        "Ancient",
        "Medieval",
        "Early_Modern",
        "Modern",
        "History_of_Science",
        "History_of_Philosophy",
    ],
    "Computing": [
        "Programming",
        "Algorithms",
        "Systems",
        "AI",
        "Graphics",
        "Game_Development",
        "Software_Engineering",
    ],
    "Psychology": [
        "Cognitive",
        "Behavioral",
        "Neuroscience",
        "Developmental",
    ],
    "Art": [
        "Art_History",
        "Theory",
        "Visual_Arts",
        "Architecture",
    ],
    "Music": [
        "Music_Theory",
        "History",
        "Composition",
        "Ethnomusicology",
    ],
    "Misc": [
        "Unsorted",
    ],
}

BOOKS_TAXONOMY["Literature"].extend([
    "Fantasy",
    "Science_Fiction",
    "Historical_Fiction",
    "Mystery",
    "Thriller",
    "Horror",
    "Romance",
])

# ------------------------------------------------------------
# README content for Books taxonomy
# ------------------------------------------------------------

BOOKS_READMES = {

    # -------------------------
    # Literature, narrative-intent shelves (Level 3)
    # -------------------------
    "Fantasy": (
        "Narratives built around imagined worlds and mythic structures.\n"
        "Used here as a browsing shelf, not a promise of escapism."
    ),
    "Science_Fiction": (
        "Stories driven by technological, scientific, or speculative premises.\n"
        "Focused on consequences and ideas rather than genre tropes."
    ),
    "Historical_Fiction": (
        "Narratives set within real historical periods.\n"
        "Used for immersive context rather than strict historical analysis."
    ),
    "Mystery": (
        "Narratives structured around investigation, secrecy, and revelation.\n"
        "Read for tension, structure, and puzzle-solving."
    ),
    "Thriller": (
        "Fast-paced narratives built around urgency and high stakes.\n"
        "Selected for momentum and narrative drive."
    ),
    "Horror": (
        "Works that explore fear, unease, and the unknown.\n"
        "Includes psychological, cosmic, and existential horror."
    ),
    "Romance": (
        "Narratives centered on relationships and emotional development.\n"
        "Organized separately to support intentional browsing."
    ),

    # -------------------------
    # Level 2 domains
    # -------------------------
    "Books": (
        "Long-form works intended for slow reading and sustained thought.\n"
        "This collection is organized to support browsing, wandering, and curiosity,\n"
        "not obligation or completion tracking."
    ),

    "Mathematics": (
        "Mathematical texts organized by domain and mode of thinking.\n"
        "These shelves are meant to be browsed, revisited, and explored non-linearly."
    ),

    "Science": (
        "Scientific works exploring the natural world.\n"
        "Organized by discipline and conceptual focus rather than textbook sequence."
    ),

    "Philosophy": (
        "Philosophical works organized by questions, traditions, and modes of inquiry.\n"
        "This space supports reflection rather than systematic study."
    ),

    "Literature": (
        "Literary works organized by aesthetic tradition and narrative mode.\n"
        "These shelves invite wandering based on tone, voice, and imaginative posture."
    ),

    "History": (
        "Historical works organized by era and thematic focus.\n"
        "Intended for contextual understanding rather than chronological completeness."
    ),

    "Computing": (
        "Books about computation, software, and digital systems.\n"
        "Focused on ideas, architecture, and practice rather than specific tools."
    ),

    "Psychology": (
        "Works exploring mind, behavior, and cognition.\n"
        "Organized by perspective rather than clinical or academic taxonomy."
    ),

    "Art": (
        "Books about visual art, aesthetics, and built environments.\n"
        "These shelves support visual thinking and historical context."
    ),

    "Music": (
        "Books about music as structure, history, and cultural practice.\n"
        "Separate from listening collections; focused on understanding and reflection."
    ),

    "Misc": (
        "Books that do not yet have a clear home.\n"
        "This is a temporary holding space, not a permanent category."
    ),

    # -------------------------
    # Literature (Level 3)
    # -------------------------
    "Classicism": (
        "Canonical works that shaped literary tradition.\n"
        "Often slow, foundational, and historically influential."
    ),

    "Modernism": (
        "Works marked by formal experimentation and interiority.\n"
        "Literature responding to rupture, fragmentation, and new ways of seeing."
    ),

    "Postmodernism": (
        "Literature that interrogates narrative, authority, and meaning itself.\n"
        "Often playful, ironic, recursive, or self-aware."
    ),

    "Magical_Realism": (
        "Narratives where the extraordinary is woven seamlessly into the ordinary.\n"
        "Neither fantasy nor realism, but a deliberate suspension between them."
    ),

    "Mythic": (
        "Stories rooted in archetype, legend, and inherited narrative forms.\n"
        "Includes epics, retellings, and works operating at symbolic scale."
    ),

    "Speculative": (
        "Literature that explores alternate realities and hypothetical worlds.\n"
        "Focused on ideas and consequences rather than genre conventions."
    ),

    "Essays": (
        "Short-form literary and philosophical reflections.\n"
        "Well-suited to digital reading and casual exploration."
    ),

    "Literary_Criticism": (
        "Works analyzing literature, narrative form, and aesthetic theory.\n"
        "Intended to deepen engagement rather than prescribe interpretation."
    ),

    "Drama": (
        "Plays and dramatic texts.\n"
        "Written for performance, dialogue, and embodied speech."
    ),

    "Poetry": (
        "Poetic works organized for slow reading and rereading.\n"
        "These texts reward attention, rhythm, and pause."
    ),
}

# ------------------------------------------------------------
# Helper: ensure directory and README (non-destructive)
# ------------------------------------------------------------
def ensure_dir_with_readme(path: Path, readme_text: str | None = None):
    path.mkdir(parents=True, exist_ok=True)
    if readme_text:
        readme_path = path / "README.md"
        if not readme_path.exists():
            readme_path.write_text(readme_text.strip() + "\n")

# ------------------------------------------------------------
# Root README
# ------------------------------------------------------------
ensure_dir_with_readme(
    books_root,
    BOOKS_READMES.get("Books"),
)

# ------------------------------------------------------------
# Build Books taxonomy + READMEs
# ------------------------------------------------------------
for level2, level3_list in BOOKS_TAXONOMY.items():
    level2_path = books_root / level2

    # Level 2 directory + README
    ensure_dir_with_readme(
        level2_path,
        BOOKS_READMES.get(level2),
    )

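    # Level 3 directories + READMEs (if defined)
    for level3 in level3_list:
        level3_path = level2_path / level3
        # BOOKS_READMES is keyed by bare name, so a Level 3 shelf that shares
        # its name with a Level 2 domain (e.g. Music/History) must not inherit
        # that domain's README text.
        readme_text = None if level3 in BOOKS_TAXONOMY else BOOKS_READMES.get(level3)
        ensure_dir_with_readme(level3_path, readme_text)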

books_root

PosixPath('/home/mcruggs/gitwork/Archive/Archive/Books')
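
Because `BOOKS_READMES` is keyed by bare name, a name reused in more than one place shares a single README entry. In `BOOKS_TAXONOMY`, for example, `Logic` appears under both `Mathematics` and `Philosophy`, `Systems` under both `Science` and `Computing`, and `History` is both a domain and a shelf under `Music`. The sketch below lists such reuses; the function name `shelf_name_collisions` is hypothetical.

```python
from collections import defaultdict


def shelf_name_collisions(taxonomy: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each reused name to the relative locations where it appears."""
    locations = defaultdict(list)
    for domain, shelves in taxonomy.items():
        locations[domain].append(domain)  # Level 2 name
        for shelf in shelves:
            locations[shelf].append(f"{domain}/{shelf}")  # Level 3 name
    # Keep only names that occur in more than one location.
    return {name: paths for name, paths in locations.items() if len(paths) > 1}
```

Reused names are not wrong in themselves, since the same word can be a good shelf label in two domains, but any README text for them would need a key that includes the parent domain.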

### Code

### Compressed

### Docs

### Images

### Intake

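In [None]:
# Scratch checklist of Level 1 names. The blanked entries line up with the
# sections stubbed out above (Code, Compressed, Docs, Images, Intake).
# The variable name is a placeholder; the original fragment had none.
LEVEL_1_CHECKLIST = [
    "Applications",
    "Audio",
    "Books",
    "",
    "",
    "Data",
    "",
    "",
    "",
    "Journal",
    "Movies",
    "Music",
    "Notes",
    "Projects",
    "Systems",
    "Video",
]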