rbmathis/Animalia


⚠️ CRITICAL: Disable C# Extensions in VS Code

Before opening this repository, disable all C# extensions (C# Dev Kit, OmniSharp, etc.). With 161K files and 3.8M lines of code, active C# extensions will attempt to load the entire symbol tree and run language analysis, which will make your machine very unhappy.

To disable: Extensions → Search "C#" → Click ⚙️ → Disable


🦁 Animal Kingdom

A C# taxonomy codebase with 160,000+ species files — built to demonstrate GitHub Copilot on large repositories.

Why This Exists

Large repos challenge AI coding assistants. This one tests how well Copilot handles:

  • Massive file counts (161K files!)
  • Deep folder hierarchies (taxonomic ranks)
  • Consistent but varied code patterns

The key insight: Copilot customizations matter more at scale. See .github/copilot-instructions.md for the repo-specific context that makes Copilot effective here.

📊 Repository Stats

Language Files Lines Code Comments Blanks
C# 146,082 3,587,144 1,390,952 1,457,176 739,016
MSBuild 15,579 218,103 218,103 0 0
Markdown 2 162 122 0 40
Total 161,663 3,805,409 1,609,177 1,457,176 739,056

📦 123.6 MB of source code

💰 Estimated Cost to Develop: $62.9M • ⏱️ Schedule: 66 months • 👥 Team: 84 people

Metrics generated with scc using the COCOMO model.


Structure

root/Metazoa/Chordata/Mammalia/Carnivora/Canidae/Canis/
├── Canis.cs          # Abstract genus class
├── ICanis.cs         # Genus interface
├── Canis_lupus.cs    # Species (Wolf)
└── Canis_latrans.cs  # Species (Coyote)

DRY Taxonomy: Folder Structure as Navigation

The folder hierarchy mirrors the biological taxonomic hierarchy exactly, so the rank of every taxon is encoded in the filesystem itself. Copilot doesn't have to guess or look up where a taxon sits in the hierarchy.

Given any file path, Copilot can deterministically derive:

  • The taxonomic rank (folder depth: Kingdom → Phylum → Class → Order → Family → Genus → Species)
  • The namespace (path converted to dotted notation)
  • The parent class (folder one level up)
  • Sibling genera and families (folder neighbors)

For example, from root/Metazoa/Chordata/Mammalia/Carnivora/Canidae/Canis/Canis_lupus.cs:

Knowledge | Derivation
Rank | 7 levels deep = Species
Parent class | folder ../ = Canis
Family | folder ../../ = Canidae
Order | folder ../../../ = Carnivora
Namespace | folder path with every / replaced by . = AnimalKingdom.root.Metazoa.Chordata.Mammalia.Carnivora.Canidae.Canis

This DRY principle eliminates ambiguity: no metadata lookup is needed to understand hierarchical relationships, only filesystem traversal. It makes navigation predictable and lets Copilot reliably construct paths, derive inheritance, and understand scope without broad, unguided searches.
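Because every row of the derivation table is a string operation on the path, the idea can be sketched in a few lines of POSIX shell. This is an illustration under stated assumptions, not the repository's tooling: derive_taxon_info is a hypothetical helper, the path is assumed to be relative to the repo root, and each folder below root/ is assumed to be exactly one rank (Kingdom through Genus, with the file itself as the Species).

```shell
#!/bin/sh
# Sketch: derive taxonomy facts from a species file path using string
# operations only (no file reads). derive_taxon_info is a hypothetical helper.
# Assumes the path starts with "root/" and each level below it is one rank.

derive_taxon_info() (
  path=$1
  dir=${path%/*}                  # folder containing the file (../)
  rel=${path#root/}               # path below root/

  # Depth = number of segments below root/, counting the file itself.
  depth=$(printf '%s\n' "$rel" | awk -F/ '{ print NF }')
  case $depth in
    1) rank=Kingdom ;; 2) rank=Phylum ;; 3) rank=Class ;; 4) rank=Order ;;
    5) rank=Family ;;  6) rank=Genus ;;  7) rank=Species ;; *) rank=unknown ;;
  esac

  parent=${dir##*/}               # ../        -> parent (genus) class
  family_dir=${dir%/*}
  family=${family_dir##*/}        # ../../     -> family
  order_dir=${family_dir%/*}
  order=${order_dir##*/}          # ../../../  -> order

  # Namespace = "AnimalKingdom." + folder path with every "/" replaced by ".".
  namespace="AnimalKingdom.$(printf '%s' "$dir" | tr '/' '.')"

  printf 'depth=%s\nrank=%s\nparent=%s\nfamily=%s\norder=%s\nnamespace=%s\n' \
    "$depth" "$rank" "$parent" "$family" "$order" "$namespace"
)

derive_taxon_info root/Metazoa/Chordata/Mammalia/Carnivora/Canidae/Canis/Canis_lupus.cs
```

Since every value comes from the path string, nothing on disk is read; the same approach carries over to any language or tool that can split a path.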

Copilot Customizations Used

Customization | Purpose
copilot-instructions.md | Explains structure, file patterns, key fields
Consistent naming | Genus_species.cs pattern aids completion
XML doc comments | Rich context for Copilot to reference
IsEnriched field | Distinguishes stubs from real data

Instructions for Working with Code

Instructions provide file-type-specific guidance for editing and maintaining the codebase.

Instruction | Applies To | Purpose
Breadcrumb Instructions | **/breadcrumb.md | YAML metadata structure, navigation patterns, taxonomy-level field conventions
C# File Instructions | **/*.cs | Namespace conventions, file type patterns, inheritance hierarchy, common patterns
Interface Instructions | **/I[A-Z]*.cs | Interface contracts, behavioral patterns, genus/family interface design
Species Instructions | **/*_*.cs | Species file structure, property definitions, enrichment flags, conservation status

Usage: When editing a file, check every instruction whose pattern matches it for conventions, patterns, and required fields. A file can match more than one: Canis_lupus.cs matches both **/*.cs and **/*_*.cs.
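To see which instructions apply to a given file, the globs from the table can be checked mechanically. The sketch below is illustrative only: applicable_instructions is a hypothetical helper, and it relies on the fact that every pattern starts with **/, so matching the file's basename with POSIX case patterns gives the same answer.

```shell
#!/bin/sh
# Sketch: print the name of every instruction whose glob matches a path.
# applicable_instructions is a hypothetical helper. All globs start with **/,
# so matching on the basename alone is equivalent.

applicable_instructions() (
  name=${1##*/}                        # basename, e.g. Canis_lupus.cs
  case $name in breadcrumb.md) echo "Breadcrumb Instructions" ;; esac
  case $name in *.cs) echo "C# File Instructions" ;; esac
  # Same as I[A-Z]*.cs; [[:upper:]] avoids locale-dependent A-Z ranges.
  case $name in I[[:upper:]]*.cs) echo "Interface Instructions" ;; esac
  case $name in *_*.cs) echo "Species Instructions" ;; esac
)

# Prints "C# File Instructions" then "Species Instructions".
applicable_instructions root/Metazoa/Chordata/Mammalia/Carnivora/Canidae/Canis/Canis_lupus.cs
# Prints "C# File Instructions" then "Interface Instructions".
applicable_instructions root/Metazoa/Chordata/Mammalia/Carnivora/Canidae/Canis/ICanis.cs
```

Each pattern is tested in its own case statement on purpose: a single case would stop at the first match and hide the overlapping instructions.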


🎯 Skills for Domain Tasks

Skills are task-focused utilities for solving specific problems within the repository.

Skill | Purpose | When to Use
Pet Lookup | Find species commonly kept as pets | Searching for domestic/pet animals, comparing amenability to captivity across families
Species Lookup | Find specific species by name, common name, or TaxId | Locating individual species, checking properties like conservation status or lifespan
Interface Validation | Validate that species and genus classes implement required interfaces | Ensuring interface compliance across large taxa, bulk validation tasks
Breadcrumb Traversal | Use breadcrumb metadata for efficient navigation | Finding related taxa, cross-cutting queries, avoiding deep file scans
Breadcrumb Creation | Generate and maintain breadcrumb metadata | Creating new taxa, updating taxonomy levels, aggregating species data

Usage: When facing a repository task, check the relevant skill for the recommended approach and query strategies.
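As a rough illustration of what the Species Lookup and Interface Validation skills do at the filesystem level, here is a POSIX shell sketch. It is not the skills' actual query strategy: find_species and check_interfaces are hypothetical helpers, and the interface check assumes the convention from the Structure section (a folder X that contains X.cs should also contain IX.cs).

```shell
#!/bin/sh
# Sketch of two skill-style queries; both function names are hypothetical.
# Assumes Genus_species.cs file naming and that folder X holds X.cs and IX.cs.

# Species Lookup by scientific name: "Canis lupus" -> .../Canis_lupus.cs
find_species() (
  file="$(printf '%s' "$1" | tr ' ' '_').cs"
  find "${2:-root}" -type f -name "$file"
)

# Interface Validation: report folders whose X.cs has no matching IX.cs.
# Checks file presence only; it does not parse C# to confirm that the class
# actually implements the interface.
check_interfaces() (
  find "${1:-root}" -type d | while IFS= read -r dir; do
    name=${dir##*/}
    if [ -f "$dir/$name.cs" ] && [ ! -f "$dir/I$name.cs" ]; then
      echo "missing interface: $dir/I$name.cs"
    fi
  done
)
```

For example, find_species "Canis lupus" walks root/ for Canis_lupus.cs, and check_interfaces root prints one line per folder that breaks the convention (no output means every class file has its interface file).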


🧪 Measure Copilot Activity: Reusable Harnesses

This repository includes two production-grade harnesses for measuring and benchmarking Copilot behavior on large codebases:

Harness | Technology | Best For
CLI-Based | PowerShell + Copilot CLI | Cross-platform, detailed resource metrics
SDK-Based | Node.js + Copilot SDK | Direct programmatic access, lower latency

Both harnesses:

  • Execute identical scenarios across your codebase
  • Collect identical metrics (tool calls, file access, tokens, execution time)
  • Track whether breadcrumbs were used
  • Measure Copilot's efficiency navigating your repository
  • Generate comparable JSON/CSV results

See copilot-harness.md for detailed setup, architecture comparison, and how to run benchmarks. If you've already run the harnesses, check the comparison guide and results analysis.


🐾 Demo: Pet Species Lookup

See Copilot navigate 160K files in seconds!

This repo includes a pet lookup feature that demonstrates breadcrumb-based navigation. Instead of scanning thousands of files, Copilot uses metadata tags to instantly find pet species:

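# Find all pet-containing taxa in one command
# (grep -r recurses on its own; in bash, root/**/ is not recursive unless globstar is enabled)
grep -r --include=breadcrumb.md "has-pets" root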

Result: 14 breadcrumbs tagged, covering dogs, cats, hamsters, rabbits, guinea pigs, chinchillas, ferrets, goldfish, and budgerigars.

Query | Method | Time
"Find pet mammals" | grep "has-pets" | <1s
"Is Felis catus a pet?" | Read genus breadcrumb | <1s
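The second query, reading a single genus breadcrumb, can be sketched in POSIX shell as follows. This is an assumption-laden illustration rather than the repo's pet lookup skill: is_pet_genus is a hypothetical helper, and it assumes each genus folder holds a breadcrumb.md in which pet taxa carry the literal tag "has-pets".

```shell
#!/bin/sh
# Sketch: answer "Is <Genus species> a pet?" from the genus breadcrumb alone.
# is_pet_genus is hypothetical. Assumes (not confirmed here) that each genus
# folder has a breadcrumb.md and that pet taxa contain the tag "has-pets".
# Exit status: 0 = tagged, 1 = not tagged, 2 = no breadcrumb found.

is_pet_genus() (
  genus=${1%% *}                        # "Felis catus" -> "Felis"
  # find -path matches the whole path, so this finds .../Felis/breadcrumb.md
  # at any depth; keep only the first hit.
  crumb=$(find "${2:-root}" -type f -path "*/$genus/breadcrumb.md" | head -n 1)
  if [ -z "$crumb" ]; then
    echo "no breadcrumb found for $genus" >&2
    exit 2
  fi
  if grep -q "has-pets" "$crumb"; then
    echo "$genus: tagged has-pets ($crumb)"
  else
    echo "$genus: not tagged"
    exit 1
  fi
)
```

The find walk is the expensive step here; when the genus folder path is already known (it can be derived from the taxonomy), grepping that one breadcrumb.md directly avoids scanning the tree.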

Try it: Ask Copilot "Can you recommend a pet for my kid that lives in an apartment?"

📄 Read the full pets.md writeup →


VS Code Setup

⚠️ Important: This repo has 15,579 .csproj files. Extensions will try to scan them all, causing VS Code to hang.

Quick Start

Run the included script to disable problematic extensions for this workspace:

Windows (PowerShell):

.\.vscode\disable-extensions.ps1

Linux/macOS:

chmod +x .vscode/disable-extensions.sh
./.vscode/disable-extensions.sh

Use the --global (Bash) or -Global (PowerShell) flag to disable the extensions globally instead of only for this workspace.

What's in .vscode/

File | Purpose
settings.json | Disables C# project discovery, OmniSharp, file watchers
extensions.json | Lists 38 extensions to disable for this workspace
disable-extensions.ps1 | PowerShell script to apply extension disabling
disable-extensions.sh | Bash script for Linux/macOS

Key Settings Applied

// .vscode/settings.json highlights
{
    "omnisharp.autoStart": false,              // Disable OmniSharp
    "dotnet.defaultSolution": "disable",       // Disable solution discovery
    "files.watcherExclude": { "**/root/**": true }  // Skip file watching
}

Extensions Disabled

The scripts disable these extension categories:

Category | Extensions
C#/.NET | ms-dotnettools.csharp, csdevkit, vscode-dotnet-runtime
Azure | All ms-azuretools.* extensions (12 total)
Python | ms-python.python, pylance, debugpy, ruff
Web/JS | vscode.typescript-language-features, eslint, prettier, tailwindcss
Other | vue.volar, playwright, emmet, and more

See .vscode/extensions.json for the complete list.

Acknowledgments

Breadcrumb Navigation: This repo's breadcrumb metadata approach was inspired by @ekuris-repos's excellent markdown-frontmatter pattern in the Swarm project. Their implementation demonstrates how YAML frontmatter can elegantly organize and aggregate hierarchical data across large codebases.

Generated By

AnimalKingdomGenerator — uses NCBI taxonomy + Wikidata + Copilot SDK.

License

MIT
