A highly customizable Python framework for validating database tables. Easily add your own validation rules to meet any data quality requirement. Designed for extensibility, performance, and comprehensive reporting.
- Multi-Database Support: PostgreSQL, MySQL, SQLite, SQL Server, and more.
- Chunked Data Processing: Efficiently handles large tables by processing data in chunks.
- Extensible Validation Rules: Add or modify validation logic by simply creating new rule classes.
- Duplicate Detection: Advanced duplicate analysis with detailed reporting.
- Comprehensive Excel Reporting: Auto-generates summary, detailed, and failure reports.
- Early Stopping: Stops validation on first failure for fast feedback.
- Logging: Built-in logging for traceability and debugging.
- Easy Integration: Modular design for seamless integration into any Python project.
You can install the package from the source using pip:
pip install .
For development, you can install it in editable mode:
pip install -e .
from db_validation import DatabaseConfig, DatabaseConnector, ValidationEngine
from db_validation.rules.basic_rules import NonASCIIRule, NullValueRule
from db_validation.rules.email_rule import EmailValidationRule
from db_validation.reporting.excel_reporter import ExcelReporter
# Configure your database connection
config = DatabaseConfig(
host="localhost",
port=5432,
database="your_db",
username="your_user",
password="your_password",
driver="postgresql"
)
db_connector = DatabaseConnector(config)
engine = ValidationEngine(db_connector)
# Add built-in or custom rules
engine.add_rule(NonASCIIRule())
engine.add_rule(NullValueRule())
engine.add_rule(EmailValidationRule())
# Validate tables
results = engine.validate_tables(["users", "orders"])
# Generate Excel report
reporter = ExcelReporter(results)
report_path = reporter.generate_report("validation_report.xlsx")
db_connector.close()
print(f"Validation complete. Report at: {report_path}")
To add your own validation logic, subclass ValidationRule
from rules/base.py
:
from db_validation.rules.base import ValidationRule
import pandas as pd
class MyCustomRule(ValidationRule):
def __init__(self):
super().__init__(name="my_custom_rule", description="Describe your rule here")
def validate(self, data: pd.Series):
# Your validation logic
return True, [], []
Register your rule with the engine:
engine.add_rule(MyCustomRule())
- NonASCIIRule: Detects non-ASCII characters in text columns.
- NullValueRule: Checks for null or empty values.
- DataTypeConsistencyRule: Ensures consistent data types within columns.
- EmailValidationRule: Validates email format using regex.
- DuplicateValidationRule: Detects duplicate records based on primary key(s).
- ExcelReporter: Generates summary, detailed, and failure sheets.
- EnhancedDuplicateReportGenerator: (Optional) For advanced duplicate analysis.
- Add new rules in
src/db_validation/rules/
. - Add new report types in
src/db_validation/reporting/
. - Utilities and logging in
src/db_validation/utils/
.
Unit tests are provided in tests/test_rules.py
:
python -m unittest discover tests
Vinay Reddy
*This framework is designed for maximum flexibility—add any validation logic you need, and generate actionable reports for your