# Introduction to Data Validation

## What is Data Validation?

Data validation is the process of checking that data meets defined rules, standards, or criteria before it is used in analysis or downstream processes.
It is a crucial step in the data lifecycle because it catches quality problems early and makes results more reliable.

## Why Validate Data?

1. **Data Quality:** Ensures that data is correct and consistent.  
2. **Error Prevention:** Avoids incorrect analyses and decisions.  
3. **Compliance:** Ensures that data follows required rules or standards.  
4. **Efficiency:** Helps detect problems early, reducing manual correction.

## Common Types of Validation

1. **Type Validation:** Checks if values match the expected type (string, number, date, etc.).  
2. **Range Validation:** Ensures values are within acceptable limits.  
3. **Format Validation:** Ensures fields follow a specific pattern (e.g., email, ZIP code).  
4. **Uniqueness Validation:** Checks that key fields contain no duplicate values.  
5. **Completeness Validation:** Verifies that required fields are filled.
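
To make the five types above concrete, here is a minimal sketch that applies one check of each kind to a few records using only the Python standard library. The sample records, the field names (`id`, `age`, `email`, `signup`), the 0 to 120 age limit, and the simplified email pattern are all illustrative assumptions, not rules taken from a real dataset.

```python
import re
from datetime import date

# Hypothetical sample records; field names and values are assumptions for illustration.
records = [
    {"id": 1, "age": 25, "email": "joao@email.com", "signup": date(2024, 1, 15)},
    {"id": 2, "age": 130, "email": "maria@email.com", "signup": date(2024, 2, 1)},
    {"id": 2, "age": 40, "email": "invalido", "signup": None},
]

# Deliberately simple pattern: something@something.something (not a full email grammar).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(rows):
    """Return a list of (row index, problem) tuples, in the order they are found."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and not None.
        for field in ("id", "age", "email", "signup"):
            if row.get(field) is None:
                problems.append((i, f"missing {field}"))

        age = row.get("age")
        if age is not None:
            # Type: age must be an integer.
            if not isinstance(age, int):
                problems.append((i, "age is not an integer"))
            # Range: assumed acceptable limits of 0 to 120.
            elif not 0 <= age <= 120:
                problems.append((i, "age out of range"))

        # Format: email must match the simplified pattern.
        email = row.get("email")
        if email is not None and not EMAIL_PATTERN.match(email):
            problems.append((i, "invalid email format"))

        # Uniqueness: the id must not have appeared in an earlier row.
        if row.get("id") in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(row.get("id"))
    return problems


problems = validate(records)
for index, problem in problems:
    print(f"row {index}: {problem}")
```

Collecting every problem in a list, rather than stopping at the first failure, makes it possible to report all issues in a dataset in a single pass.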

## Practical Example


```python
# Example of manual data validation

# Sample data ('idade' is Portuguese for "age")
data = {
    'idade': [25, 30, -5, 40],
    'email': ['joao@email.com', 'maria@email.com', 'invalido', 'pedro@email.com']
}

# Basic validation: an age must not be negative
for idade in data['idade']:
    if idade < 0:
        print(f"Invalid Age: {idade}")

# Basic validation: an email must contain '@'
for email in data['email']:
    if '@' not in email:
        print(f"Invalid Email: {email}")
```

Output:

```
Invalid Age: -5
Invalid Email: invalido
```


## Challenges of Manual Validation

1. Repetitive and error-prone code
2. Difficult to maintain and update
3. Lack of standardization
4. Difficulty in validating complex rules
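
One way to reduce some of this repetition, shown here only as a rough sketch, is to declare the rules as data and run them through a single generic loop. The `RULES` table, the `check` helper, and the upper age limit of 120 are hypothetical names and values introduced for illustration. The data is the same as in the practical example.

```python
# Each rule is (column, description, predicate). The table layout is an
# illustrative assumption, not a standard format.
RULES = [
    ("idade", "must be between 0 and 120",
     lambda v: isinstance(v, int) and 0 <= v <= 120),
    ("email", "must contain '@'",
     lambda v: isinstance(v, str) and "@" in v),
]


def check(data, rules):
    """Apply every rule to every value in its column and collect error messages."""
    errors = []
    for column, description, is_valid in rules:
        for position, value in enumerate(data[column]):
            if not is_valid(value):
                errors.append(f"{column}[{position}]={value!r} {description}")
    return errors


data = {
    "idade": [25, 30, -5, 40],
    "email": ["joao@email.com", "maria@email.com", "invalido", "pedro@email.com"],
}

errors = check(data, RULES)
for message in errors:
    print(message)
```

Adding a rule then means adding one entry to `RULES` rather than writing another loop, although rules that span several columns would still need extra design.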
