Skip to content

reddy0852/Dynamic-Database-Validation-Framework

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Database Validation Framework

A highly customizable Python framework for validating database tables. Easily add your own validation rules to meet any data quality requirement. Designed for extensibility, performance, and comprehensive reporting.


Features

  • Multi-Database Support: PostgreSQL, MySQL, SQLite, SQL Server, and more.
  • Chunked Data Processing: Efficiently handles large tables by processing data in chunks.
  • Extensible Validation Rules: Add or modify validation logic by simply creating new rule classes.
  • Duplicate Detection: Advanced duplicate analysis with detailed reporting.
  • Comprehensive Excel Reporting: Auto-generates summary, detailed, and failure reports.
  • Early Stopping: Stops validation on first failure for fast feedback.
  • Logging: Built-in logging for traceability and debugging.
  • Easy Integration: Modular design for seamless integration into any Python project.

Installation

You can install the package from the source using pip:

pip install .

For development, you can install it in editable mode:

pip install -e .

Usage Example

from db_validation import DatabaseConfig, DatabaseConnector, ValidationEngine
from db_validation.rules.basic_rules import NonASCIIRule, NullValueRule
from db_validation.rules.email_rule import EmailValidationRule
from db_validation.reporting.excel_reporter import ExcelReporter

# Configure your database connection
config = DatabaseConfig(
    host="localhost",
    port=5432,
    database="your_db",
    username="your_user",
    password="your_password",
    driver="postgresql"
)

db_connector = DatabaseConnector(config)
engine = ValidationEngine(db_connector)

# Add built-in or custom rules
engine.add_rule(NonASCIIRule())
engine.add_rule(NullValueRule())
engine.add_rule(EmailValidationRule())

# Validate tables
results = engine.validate_tables(["users", "orders"])

# Generate Excel report
reporter = ExcelReporter(results)
report_path = reporter.generate_report("validation_report.xlsx")

db_connector.close()
print(f"Validation complete. Report at: {report_path}")

Adding Custom Validation Rules

To add your own validation logic, subclass ValidationRule from rules/base.py:

from db_validation.rules.base import ValidationRule
import pandas as pd

class MyCustomRule(ValidationRule):
    def __init__(self):
        super().__init__(name="my_custom_rule", description="Describe your rule here")

    def validate(self, data: pd.Series):
        # Your validation logic
        return True, [], []

Register your rule with the engine:

engine.add_rule(MyCustomRule())

Built-in Validation Rules

  • NonASCIIRule: Detects non-ASCII characters in text columns.
  • NullValueRule: Checks for null or empty values.
  • DataTypeConsistencyRule: Ensures consistent data types within columns.
  • EmailValidationRule: Validates email format using regex.
  • DuplicateValidationRule: Detects duplicate records based on primary key(s).

Reporting

  • ExcelReporter: Generates summary, detailed, and failure sheets.
  • EnhancedDuplicateReportGenerator: (Optional) For advanced duplicate analysis.

Extending the Framework


Testing

Unit tests are provided in tests/test_rules.py:

python -m unittest discover tests

Author

Vinay Reddy


*This framework is designed for maximum flexibility—add any validation logic you need, and generate actionable reports for your

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages