# Database Setup for Master Birder Paper

This notebook sets up the SQLite database for the master birder paper project.
The database is stored as a single file, `master_birder.db`, in the `data/` directory.

## Prerequisites

Make sure you have all dependencies installed:
```bash
pdm install
```

If you need to add new dependencies later, use:
```bash
pdm add <package-name>
```

For development dependencies:
```bash
pdm add -G dev <package-name>
```


In [3]:
import sqlite3
import os
import pandas as pd
from pathlib import Path

# Set up paths
project_root = Path.cwd().parent  # Go up one level from notebooks/
data_dir = project_root / "data"
db_path = data_dir / "master_birder.db"

print(f"Project root: {project_root}")
print(f"Data directory: {data_dir}")
print(f"Database path: {db_path}")

# Ensure data directory exists
data_dir.mkdir(exist_ok=True)
print(f"Data directory exists: {data_dir.exists()}")


Project root: /Users/ken/Documents/wk/master-birder-paper
Data directory: /Users/ken/Documents/wk/master-birder-paper/data
Database path: /Users/ken/Documents/wk/master-birder-paper/data/master_birder.db
Data directory exists: True
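
`Path.cwd().parent` only points at the project root when the notebook is launched from `notebooks/`. A minimal sketch of a more robust lookup follows, assuming the project root is the nearest ancestor directory that contains `pyproject.toml` (the file PDM uses for project metadata). `find_project_root` is a hypothetical helper, not something this project defines.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    # Check the starting directory first, then walk up through its ancestors.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

With this in place, `project_root = find_project_root(Path.cwd())` would give the same result whether the kernel starts in `notebooks/` or in the project root.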


## Database Connection and Setup

Create a connection to the SQLite database. If the database doesn't exist, it will be created automatically.


In [4]:
# Create database connection
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

print(f"Connected to database: {db_path}")
print(f"Database file exists: {db_path.exists()}")

# Enable foreign key enforcement (SQLite leaves it off by default)
cursor.execute("PRAGMA foreign_keys = ON")
print("Foreign key constraints enabled")


Connected to database: /Users/ken/Documents/wk/master-birder-paper/data/master_birder.db
Database file exists: True
Foreign key constraints enabled
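
SQLite applies `PRAGMA foreign_keys` per connection, so every new connection to the database has to switch it on again. The sketch below wraps that step and reads the setting back to confirm it took effect. `open_db` is a hypothetical helper name, and the in-memory database in the usage lines stands in for `db_path`.

```python
import sqlite3


def open_db(path) -> sqlite3.Connection:
    """Open a SQLite connection with foreign key enforcement switched on."""
    connection = sqlite3.connect(path)
    connection.execute("PRAGMA foreign_keys = ON")
    # Reading the pragma back yields 1 when enforcement is active.
    # A build without foreign key support returns no row at all.
    row = connection.execute("PRAGMA foreign_keys").fetchone()
    if row is None or row[0] != 1:
        connection.close()
        raise RuntimeError("Foreign key enforcement could not be enabled")
    return connection


demo_conn = open_db(":memory:")  # stand-in for db_path
fk_enabled = demo_conn.execute("PRAGMA foreign_keys").fetchone()[0]
demo_conn.close()
```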


## Database Schema Creation

This section creates the Avibase database schema based on the Mermaid diagram in `data/avibase.schema.mer`.

### Avibase Schema

The database includes the following tables:
- **AvibaseID**: Main table with Avibase IDs and concept labels
- **ParentChildRelationships**: Hierarchical relationships between concepts
- **OriginalConcepts**: Links Avibase IDs to original concept IDs
- **TaxanomicConcepts**: Taxonomic information including scientific and common names
- **NameConcepts**: Nomenclatural information (protonyms, authors, publications)
- **LifeHistory**: Life history traits and characteristics
- **GeoGraphicRange**: Geographic distribution information
- **OtherRelationships**: Various relationship types between concepts
- **Synonyms**: Multilingual synonym information


In [5]:
# Avibase schema creation
# Based on the Mermaid diagram in data/avibase.schema.mer

create_tables_sql = """
-- Avibase Database Schema
-- Based on the Mermaid diagram in data/avibase.schema.mer

-- Main Avibase ID table
CREATE TABLE IF NOT EXISTS AvibaseID (
    avibase_id TEXT PRIMARY KEY,
    concept_label TEXT
);

-- Parent-Child relationships table
CREATE TABLE IF NOT EXISTS ParentChildRelationships (
    avibase_id TEXT,
    version TEXT,
    parent_id TEXT,
    fract_weight REAL,
    PRIMARY KEY (avibase_id, version),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id),
    FOREIGN KEY (parent_id) REFERENCES AvibaseID (avibase_id)
);

-- Original concepts linking table
CREATE TABLE IF NOT EXISTS OriginalConcepts (
    avibase_id TEXT,
    concept_id TEXT,
    PRIMARY KEY (avibase_id, concept_id),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id)
);

-- Taxonomic concepts table
CREATE TABLE IF NOT EXISTS TaxanomicConcepts (
    concept_id TEXT PRIMARY KEY,
    avibase_id TEXT,
    taxon_name_id TEXT,
    authority TEXT,
    scientific_name TEXT,
    common_name TEXT,
    higher_classification TEXT,
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id),
    FOREIGN KEY (taxon_name_id) REFERENCES NameConcepts (taxon_name_id)
);

-- Name concepts table
CREATE TABLE IF NOT EXISTS NameConcepts (
    taxon_name_id TEXT PRIMARY KEY,
    protonym TEXT,
    authors TEXT,  -- Storing as TEXT since SQLite doesn't have native array support
    year INTEGER,
    publication_source TEXT,
    tsn TEXT
);

-- Life history traits table
CREATE TABLE IF NOT EXISTS LifeHistory (
    avibase_id TEXT,
    trait TEXT,
    reference TEXT,
    value TEXT,
    PRIMARY KEY (avibase_id, trait, reference),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id)
);

-- Geographic range table
CREATE TABLE IF NOT EXISTS GeoGraphicRange (
    avibase_id TEXT,
    region TEXT,
    status TEXT,  -- Using TEXT instead of ENUM for SQLite compatibility
    PRIMARY KEY (avibase_id, region),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id)
);

-- Other relationships table
CREATE TABLE IF NOT EXISTS OtherRelationships (
    avibase_id TEXT,
    related_id TEXT,
    relationship_type TEXT,  -- Using TEXT instead of ENUM for SQLite compatibility
    PRIMARY KEY (avibase_id, related_id, relationship_type),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id),
    FOREIGN KEY (related_id) REFERENCES AvibaseID (avibase_id)
);

-- Synonyms table
CREATE TABLE IF NOT EXISTS Synonyms (
    avibase_id TEXT,
    language TEXT,
    synonym TEXT,
    reference TEXT,
    PRIMARY KEY (avibase_id, language, synonym),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id)
);

-- Create indexes for better performance
CREATE INDEX IF NOT EXISTS idx_taxonomic_concepts_avibase_id ON TaxanomicConcepts (avibase_id);
CREATE INDEX IF NOT EXISTS idx_taxonomic_concepts_taxon_name_id ON TaxanomicConcepts (taxon_name_id);
CREATE INDEX IF NOT EXISTS idx_parent_child_parent_id ON ParentChildRelationships (parent_id);
CREATE INDEX IF NOT EXISTS idx_life_history_avibase_id ON LifeHistory (avibase_id);
CREATE INDEX IF NOT EXISTS idx_geographic_range_avibase_id ON GeoGraphicRange (avibase_id);
CREATE INDEX IF NOT EXISTS idx_other_relationships_avibase_id ON OtherRelationships (avibase_id);
CREATE INDEX IF NOT EXISTS idx_other_relationships_related_id ON OtherRelationships (related_id);
CREATE INDEX IF NOT EXISTS idx_synonyms_avibase_id ON Synonyms (avibase_id);
"""

# Execute the schema creation
cursor.executescript(create_tables_sql)
conn.commit()

print("Avibase database schema created successfully!")
print("Tables created:")

# List all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%';")
tables = cursor.fetchall()
for table in tables:
    print(f"  - {table[0]}")

print(f"\nTotal tables created: {len(tables)}")


Avibase database schema created successfully!
Tables created:
  - AvibaseID
  - ParentChildRelationships
  - OriginalConcepts
  - TaxanomicConcepts
  - NameConcepts
  - LifeHistory
  - GeoGraphicRange
  - OtherRelationships
  - Synonyms

Total tables created: 9
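
To illustrate what the foreign key declarations do in practice, here is a small sketch on an in-memory database using a two-table subset of the schema above. The IDs and labels are made up for illustration and are not real Avibase data.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON")
demo.executescript("""
CREATE TABLE AvibaseID (
    avibase_id TEXT PRIMARY KEY,
    concept_label TEXT
);
CREATE TABLE Synonyms (
    avibase_id TEXT,
    language TEXT,
    synonym TEXT,
    reference TEXT,
    PRIMARY KEY (avibase_id, language, synonym),
    FOREIGN KEY (avibase_id) REFERENCES AvibaseID (avibase_id)
);
""")

# Insert a parent row first, then a child row that references it.
demo.execute("INSERT INTO AvibaseID VALUES (?, ?)", ("DEMO-0001", "placeholder concept"))
demo.execute(
    "INSERT INTO Synonyms VALUES (?, ?, ?, ?)",
    ("DEMO-0001", "en", "placeholder name", None),
)

# A child row that points at a missing parent is rejected.
try:
    demo.execute(
        "INSERT INTO Synonyms VALUES (?, ?, ?, ?)",
        ("DEMO-9999", "en", "orphan name", None),
    )
    orphan_rejected = False
except sqlite3.IntegrityError as exc:
    orphan_rejected = True
    print(f"Rejected: {exc}")

# Only the valid synonym row remains.
synonym_count = demo.execute("SELECT COUNT(*) FROM Synonyms").fetchone()[0]
demo.close()
```

Without the `PRAGMA foreign_keys = ON` line, the orphan row would be accepted silently, which is why the pragma matters when loading data.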


## Database Verification

Verify that the database was created correctly and show the table structures.


In [10]:
# Show table schemas
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()

for table in tables:
    table_name = table[0]
    print(f"\nTable: {table_name}")
    print("-" * 50)
    
    # Get table schema
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()
    
    for col in columns:
        col_id, name, data_type, not_null, default_val, pk = col
        pk_str = " (PRIMARY KEY)" if pk else ""
        not_null_str = " NOT NULL" if not_null else ""
        default_str = f" DEFAULT {default_val}" if default_val else ""
        print(f"  {name}: {data_type}{not_null_str}{default_str}{pk_str}")

# Show database file size
if db_path.exists():
    file_size = db_path.stat().st_size
    print(f"\nDatabase file size: {file_size:,} bytes ({file_size/1024:.1f} KB)")



Table: AvibaseID
--------------------------------------------------
  avibase_id: TEXT (PRIMARY KEY)
  concept_label: TEXT

Table: ParentChildRelationships
--------------------------------------------------
  avibase_id: TEXT (PRIMARY KEY)
  version: TEXT (PRIMARY KEY)
  parent_id: TEXT
  fract_weight: REAL

Table: OriginalConcepts
--------------------------------------------------
  avibase_id: TEXT (PRIMARY KEY)
  concept_id: TEXT (PRIMARY KEY)

Table: TaxanomicConcepts
--------------------------------------------------
  concept_id: TEXT (PRIMARY KEY)
  avibase_id: TEXT
  taxon_name_id: TEXT
  authority: TEXT
  scientific_name: TEXT
  common_name: TEXT
  higher_classification: TEXT

Table: NameConcepts
--------------------------------------------------
  taxon_name_id: TEXT (PRIMARY KEY)
  protonym: TEXT
  authors: TEXT
  year: INTEGER
  publication_source: TEXT
  tsn: TEXT

Table: LifeHistory
--------------------------------------------------
  avibase_id: TEXT (PRIMARY KEY)
  tra

## Cleanup

Close the database connection when done.


In [17]:
# Close the connection
conn.close()
print("Database connection closed.")
print(f"Database file saved at: {db_path}")
print(f"File size: {db_path.stat().st_size:,} bytes")


Database connection closed.
Database file saved at: /Users/ken/Documents/wk/master-birder-paper/data/master_birder.db
File size: 118,784 bytes
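
If a cell raises before `conn.close()` runs, the connection stays open. One option is `contextlib.closing`, sketched below on an in-memory database. Note that using a `sqlite3` connection directly in a `with` statement only manages the transaction (commit or rollback) and does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() guarantees close() is called, even if the body raises.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    # The inner `with` commits on success and rolls back on an exception.
    with demo_conn:
        demo_conn.execute("CREATE TABLE t (x INTEGER)")
        demo_conn.execute("INSERT INTO t VALUES (1)")
    row_count = demo_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```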


## Next Steps

1. **Review Schema**: Check the tables created above against the Mermaid diagram in `data/avibase.schema.mer`
2. **Update Schema**: If the diagram changes, update the `CREATE TABLE` statements in this notebook to match
3. **Data Migration**: If you have existing data, create migration scripts
4. **Integration**: Use the helper functions in your analysis notebooks
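
As a sketch of what a lookup from an analysis notebook could look like, the hypothetical helper below returns the names recorded for one Avibase ID. It is demonstrated on an in-memory database with a single illustrative row and made-up IDs; in practice the connection would point at `data/master_birder.db`.

```python
import sqlite3


def names_for(connection: sqlite3.Connection, avibase_id: str) -> list[dict]:
    """Return scientific and common names recorded for one Avibase ID."""
    cur = connection.execute(
        "SELECT concept_id, scientific_name, common_name "
        "FROM TaxanomicConcepts WHERE avibase_id = ? ORDER BY concept_id",
        (avibase_id,),
    )
    # Build dictionaries keyed by column name from the cursor metadata.
    columns = [description[0] for description in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Demo on a cut-down table with one illustrative row.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE TaxanomicConcepts ("
    "concept_id TEXT PRIMARY KEY, avibase_id TEXT, "
    "scientific_name TEXT, common_name TEXT)"
)
demo.execute(
    "INSERT INTO TaxanomicConcepts VALUES (?, ?, ?, ?)",
    ("demo-concept-1", "DEMO-0001", "Turdus migratorius", "American Robin"),
)
result = names_for(demo, "DEMO-0001")
demo.close()
```

The query uses a `?` placeholder rather than string formatting, so the ID is passed as a bound parameter.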

## Dependencies

Current dependencies are sufficient for SQLite database operations:
- `sqlite3` (built into Python)
- `pandas` (for data manipulation)
- `pathlib` (built into Python, for path handling)

No additional PDM dependencies are needed for basic SQLite functionality.
