A comprehensive CLI tool for extracting database schemas and data from MariaDB servers. Designed for efficient database migration, development environment setup, and backup operations with advanced features including foreign key preservation, selective data extraction, and progressive processing.
- Extract: Database metadata extraction with detailed table information
- DDL: Complete schema extraction with CREATE TABLE statements
- Dump: Traditional full database backup using mysqldump
- Data: Advanced selective data extraction with foreign key preservation
- Foreign key dependency resolution with topological sorting
- Configurable data sampling (percentage or fixed row counts)
- Progressive extraction with resume capability for large datasets
- Pattern-based table filtering and exclusion
- Batch processing for optimal performance
- Docker-based execution with zero local dependencies
```bash
# Clone the repository
git clone https://github.com/rafaelkamimura/mariadb-extractor.git
cd mariadb-extractor

# Build the Docker image
make build
```

Building from source requires Go 1.25 or later:

```bash
go build -o mariadb-extractor
```

```bash
# Create configuration from template
make env-example

# Edit .env with your database credentials
vim .env
```

Example `.env` configuration:
```
MARIADB_HOST=your-database-host
MARIADB_PORT=3306
MARIADB_USER=your-username
MARIADB_PASSWORD=your-password
MARIADB_OUTPUT_PREFIX=extraction
```

```bash
# Recommended: complete pipeline with sample data (fastest)
make pipeline

# Alternative: full data extraction
make pipeline-full

# Custom: specific databases and tables
make pipeline-custom ARGS="--databases db1,db2 --sample-tables users:1000"
```

The `data` command provides advanced selective extraction with foreign key preservation:
```bash
# Extract with 10% sampling
./mariadb-extractor data --all-user-databases --sample-percent 10

# Extract specific databases
./mariadb-extractor data --databases db1,db2 --exclude-tables "*_log,*_audit"

# Custom sampling per table
./mariadb-extractor data \
  --databases myapp \
  --sample-tables "users:1000,orders:5000" \
  --exclude-tables "*_history"

# Resume interrupted extraction
./mariadb-extractor data --resume extraction-id
```

| Flag | Description | Default |
|---|---|---|
| `--all-user-databases` | Extract all non-system databases | - |
| `--databases` | Comma-separated list of databases | - |
| `--exclude-tables` | Pattern-based table exclusion | - |
| `--sample-percent` | Global sampling percentage (0-100) | 0 |
| `--sample-tables` | Per-table row limits (table:count) | - |
| `--chunk-size` | Rows per chunk for large tables | 10000 |
| `--batch-size` | INSERT statement batch size | 100 |
| `--timeout` | Query timeout in seconds | 300 |
| `--resume` | Resume from previous extraction | - |
Extract complete database schemas:

```bash
# Extract all user databases
./mariadb-extractor ddl --all-user-databases

# Extract specific databases
./mariadb-extractor ddl --databases db1,db2
```

Output:

- `output/mariadb-ddl.md` - Formatted documentation
- `output/init-scripts/01-extracted-schema.sql` - Executable SQL script
Full database backup using mysqldump:

```bash
# Dump all user databases
./mariadb-extractor dump --all-user-databases

# Schema only
./mariadb-extractor dump --all-databases --schema-only

# Compressed output
./mariadb-extractor dump --all-databases --compress
```

Extract database and table metadata:

```bash
# Extract all databases
./mariadb-extractor extract --all-databases

# Generate JSON output
./mariadb-extractor extract --output metadata
```

| Target | Description |
|---|---|
| `make pipeline` | Complete pipeline with 10% sample data |
| `make pipeline-full` | Complete pipeline with full data |
| `make pipeline-custom` | Pipeline with custom extraction parameters |

| Target | Description |
|---|---|
| `make ddl` | Extract database schemas |
| `make setup-from-ddl` | Initialize local database with schema |
| `make extract-data-sample` | Extract 10% sample data |
| `make extract-data-full` | Extract complete data |
| `make seed-dev-data` | Import extracted data to local database |

| Target | Description |
|---|---|
| `make setup-dev` | Start local MariaDB and Adminer |
| `make dev-db-connect` | Connect to local database via CLI |
| `make status` | Show service and file status |
| `make clean` | Clean generated files and containers |
```bash
# 1. Configure connection
cp .env.example .env
vim .env

# 2. Run pipeline (DDL -> Setup -> Extract -> Seed)
make pipeline

# 3. Access database
# Web UI: http://localhost:8080
# CLI: make dev-db-connect
```

```bash
# Extract complete production data
make pipeline-full

# Or manually with custom settings
make ddl
make setup-from-ddl
make extract-data-custom ARGS="--databases prod_db --max-rows 50000"
make seed-dev-data
```

```bash
# Extract specific tables with sampling
docker run --rm \
  --env-file .env \
  -v $(pwd):/app/output \
  mariadb-extractor data \
  --databases users_db,orders_db \
  --sample-tables "users:10000,orders:50000,products:all" \
  --exclude-tables "*_temp,*_backup"
```

```
mariadb-extractor/
├── cmd/
│   ├── root.go       # CLI root command
│   ├── extract.go    # Metadata extraction
│   ├── ddl.go        # Schema extraction
│   ├── dump.go       # Full backup
│   └── data.go       # Selective data extraction
├── internal/
│   └── config/
│       └── env.go    # Environment configuration
├── output/           # Generated files
│   └── init-scripts/
│       └── *.sql     # Database initialization scripts
└── docker-compose.yml
```
The `data` command implements a multi-stage extraction pipeline:

1. Schema Analysis: discovers foreign key relationships
2. Dependency Resolution: topological sort for correct table ordering
3. Extraction Planning: optimizes based on table sizes and sampling settings
4. Progressive Extraction: chunks large tables with progress tracking
5. Data Generation: creates optimized INSERT statements
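The dependency-resolution step can be sketched with Kahn's algorithm. This is an illustrative sketch, not the tool's actual code; `topoSort` and the `deps` map shape are assumed names:

```go
package main

import "fmt"

// topoSort orders tables so that referenced (parent) tables come before
// the tables that reference them. deps maps each table to the tables
// its foreign keys point at.
func topoSort(deps map[string][]string) []string {
	indegree := map[string]int{}  // number of unresolved parents per table
	children := map[string][]string{} // parent -> tables that reference it
	for t, parents := range deps {
		if _, ok := indegree[t]; !ok {
			indegree[t] = 0
		}
		for _, p := range parents {
			children[p] = append(children[p], t)
			indegree[t]++
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
		}
	}
	var queue, order []string
	for t, d := range indegree {
		if d == 0 {
			queue = append(queue, t)
		}
	}
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		order = append(order, t)
		for _, c := range children[t] {
			if indegree[c]--; indegree[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	return order
}

func main() {
	deps := map[string][]string{
		"orders":      {"users"},
		"order_items": {"orders", "products"},
		"users":       {},
		"products":    {},
	}
	// Any valid ordering puts users/products before orders before order_items.
	fmt.Println(topoSort(deps))
}
```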
- Automatic dependency detection via `information_schema`
- Topological sorting ensures correct extraction order
- `SET FOREIGN_KEY_CHECKS=0/1` wrapper for safe imports
- Preserves referential integrity across sampled data
| Variable | Description | Default |
|---|---|---|
| `MARIADB_HOST` | Database host | localhost |
| `MARIADB_PORT` | Database port | 3306 |
| `MARIADB_USER` | Database username | - |
| `MARIADB_PASSWORD` | Database password | - |
| `MARIADB_OUTPUT_PREFIX` | Output file prefix | mariadb-extract |
| `MARIADB_TIMEOUT` | Query timeout (seconds) | 300 |
| `MARIADB_CHUNK_SIZE` | Rows per chunk | 10000 |
| `MARIADB_BATCH_SIZE` | Batch insert size | 100 |
- MariaDB: Local database instance (port 3307)
- Adminer: Web-based database management (port 8080)
- Extractor: Main application container
- Chunked Processing: Configurable chunk size for memory efficiency
- Batch Inserts: Reduces I/O with configurable batch sizes
- Progress Tracking: Resume capability for interrupted extractions
- Connection Pooling: Optimized database connections
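Batched inserts can be sketched as follows; `batchInserts` is a hypothetical helper, not part of the tool's API:

```go
package main

import (
	"fmt"
	"strings"
)

// batchInserts groups pre-rendered value tuples into multi-row INSERT
// statements of at most batchSize rows each, which cuts per-statement
// overhead when the dump is imported.
func batchInserts(table string, rows []string, batchSize int) []string {
	var stmts []string
	for i := 0; i < len(rows); i += batchSize {
		end := i + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		stmts = append(stmts, fmt.Sprintf("INSERT INTO %s VALUES %s;",
			table, strings.Join(rows[i:end], ", ")))
	}
	return stmts
}

func main() {
	rows := []string{"(1,'a')", "(2,'b')", "(3,'c')"}
	for _, s := range batchInserts("users", rows, 2) {
		fmt.Println(s)
	}
}
```

With `--batch-size 100` (the default), every 100 rows share one INSERT statement.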
- Percentage Sampling: Consistent sampling across all tables
- Fixed Row Counts: Specific limits per table
- Pattern Exclusion: Skip log, audit, and temporary tables
- Foreign Key Preservation: Maintains relationships in sampled data
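A minimal sketch of how exclusion globs and sampling might translate to SQL; the function names and the `RAND()`-based percentage approach are assumptions, not the tool's actual implementation:

```go
package main

import (
	"fmt"
	"path"
)

// excluded reports whether a table matches any exclusion glob
// (e.g. "*_log", "*_audit").
func excluded(table string, patterns []string) bool {
	for _, p := range patterns {
		if ok, _ := path.Match(p, table); ok {
			return true
		}
	}
	return false
}

// sampleQuery builds the SELECT for one table: a fixed row cap when one
// is configured, otherwise an approximate percentage sample via RAND().
func sampleQuery(table string, rowLimit int, percent float64) string {
	switch {
	case rowLimit > 0:
		return fmt.Sprintf("SELECT * FROM %s LIMIT %d", table, rowLimit)
	case percent > 0:
		return fmt.Sprintf("SELECT * FROM %s WHERE RAND() < %.2f", table, percent/100)
	default:
		return fmt.Sprintf("SELECT * FROM %s", table)
	}
}

func main() {
	fmt.Println(excluded("audit_log", []string{"*_log"}))
	fmt.Println(sampleQuery("users", 1000, 0))
	fmt.Println(sampleQuery("orders", 0, 10))
}
```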
- `output/mariadb-ddl.md`: Human-readable schema documentation
- `output/init-scripts/01-extracted-schema.sql`: Complete DDL statements
- `output/data-extract.sql`: INSERT statements with data
- `data-extract.progress`: Resume tracking file
- `output/mariadb-extract.md`: Formatted database information
- `output/mariadb-extract.json`: Structured metadata
Connection Timeout

```bash
# Increase timeout
export MARIADB_TIMEOUT=600
```

Foreign Key Errors

- Handled automatically with `SET FOREIGN_KEY_CHECKS=0`
- Tables extracted in dependency order

Large Dataset Memory Issues

```bash
# Reduce chunk size
export MARIADB_CHUNK_SIZE=5000
```

Resume Failed Extraction

```bash
# Check progress file
ls *.progress

# Resume with ID
make extract-data-resume
```

```bash
# Install dependencies
go mod download

# Build binary
go build -o mariadb-extractor

# Run tests
go test ./...
```

```bash
# Build image
docker build -t mariadb-extractor .

# Run with local code
docker run --rm -v $(pwd):/app/output mariadb-extractor --help
```

- No hardcoded credentials in source code
- Environment-based configuration
- Secure password handling via temporary config files
- Optional table exclusion for sensitive data
- Pattern-based filtering for PII protection
- Docker Engine 20.10+
- Docker Compose 1.29+
- Go 1.25+
- MariaDB client tools (for dump command)
- Network access to MariaDB server
MIT License - See LICENSE file for details
Contributions are welcome. Please ensure:
- Code follows Go best practices
- Tests pass (`go test ./...`)
- Docker build succeeds
- Documentation is updated
For issues, feature requests, or questions, please open an issue on GitHub.