# Introduction to Validations in Metadata-Driven Data Management

In metadata-driven data management, validations ensure the integrity and quality of both the metadata and the actual data. Validations are checks and constraints that verify whether information conforms to predefined rules. In this notebook, we will explore two primary types of validations:

- **Validation of Meta-Model Constraints**: The meta-model serves as the blueprint for structuring metadata, defining the entities, attributes, and relationships within a given system. Meta-model constraints specify the rules and standards that metadata must adhere to. For instance, the author's name should be a string, ensuring consistency and clarity in documenting the metadata. Validating meta-model constraints ensures that the metadata remains accurate, well-defined, and aligned with the intended structure.

- **Validation of Actual Data Using Metadata Constraints**: While meta-model constraints govern the structure of metadata, the metadata in turn serves as a foundation for validating the actual data. Metadata constraints, such as minimum and maximum values, regex patterns, or length requirements, can be used to validate the underlying data. For example, if the metadata dictates that the minimum value of a price is zero, data validation checks that no data point related to pricing is negative. This type of validation safeguards against inconsistencies and inaccuracies in the underlying data.
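
As a rough illustration of the second kind of validation, the sketch below checks a single value against minimum, maximum, length, and regex constraints. `SimpleConstraint` and `check_value` are hypothetical names introduced only for this example; they are not the `Constraint` class from `meta_model.py`, whose exact fields may differ.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SimpleConstraint:
    """Hypothetical stand-in for a metadata constraint (assumption, not meta_model.Constraint)."""
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_length: Optional[int] = None
    regex_pattern: Optional[str] = None


def check_value(value: Any, constraint: SimpleConstraint) -> List[str]:
    """Return a list of violation messages; an empty list means the value passes."""
    violations = []
    if constraint.min_value is not None and value < constraint.min_value:
        violations.append(f"{value!r} is below the minimum {constraint.min_value}")
    if constraint.max_value is not None and value > constraint.max_value:
        violations.append(f"{value!r} is above the maximum {constraint.max_value}")
    if constraint.max_length is not None and len(str(value)) > constraint.max_length:
        violations.append(f"{value!r} is longer than {constraint.max_length} characters")
    if constraint.regex_pattern is not None and re.fullmatch(constraint.regex_pattern, str(value)) is None:
        violations.append(f"{value!r} does not match {constraint.regex_pattern!r}")
    return violations


# A price constraint with a minimum of zero, as in the example above
price_constraint = SimpleConstraint(min_value=0)
print(check_value(-11.0, price_constraint))  # one violation reported
print(check_value(25.5, price_constraint))   # no violations
```

Returning a list of messages rather than raising on the first failure is a design choice made for this sketch: it lets a caller report every violated constraint at once.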

In the subsequent sections, we will delve into practical examples and exercises to demonstrate how these validations can be implemented using Python and a metadata-driven approach.

## Let's instantiate the Issuances entity and its related objects

In [1]:
# let's instantiate the Issuances entity and its related objects,
# importing the classes defined in meta_model.py
import sys
import warnings
from datetime import datetime
from importlib import import_module
from pathlib import Path
from typing import Optional, List, Dict, Literal

import pandas as pd

# Add the parent directory to sys.path
sys.path.append(str(Path.cwd().parent))

# Import modules from the parent directory
meta_model = import_module('meta_model')

from meta_model import Author, Constraint, Attribute, Entity, Relationship

warnings.filterwarnings('always', category=UserWarning)

# instantiate the Author for the metadata objects
author = Author(code="UNIMI", name="UniMi", description="Università degli Studi di Milano")

# Instantiate Constraints for Issuances Entity
# An ISIN is a 2-letter country code, a 9-character alphanumeric identifier, and 1 check digit
isin_constraint_issuances = Constraint(code="ISIN_CONSTRAINT", name="ISIN Regex", regex_pattern=r"^[A-Z]{2}[-]{0,1}[0-9A-Z]{9}[-]{0,1}[0-9]{1}$")
issue_date_constraint = Constraint(code="ISSUE_DATE_CONSTRAINT", name="No Constraint")
publication_price_constraint = Constraint(code="PUBLICATION_PRICE_CONSTRAINT", name="Min Value 0", min_value=0)
volume_constraint = Constraint(code="VOLUME_CONSTRAINT", name="No Constraint")
market_capitalization_constraint = Constraint(code="MARKET_CAPITALIZATION_CONSTRAINT", name="No Constraint")
issuer_lei_constraint = Constraint(code="ISSUER_LEI_CONSTRAINT", name="Max Length 20", max_length=20)

# Instantiate Attributes for Issuances Entity
isin_attribute_issuances = Attribute(code="ISIN", name="ISIN", constraint=isin_constraint_issuances)
issue_date_attribute = Attribute(code="ISSUE_DATE", name="Issue Date", constraint=issue_date_constraint)
publication_price_attribute = Attribute(code="PUBLICATION_PRICE", name="Publication Price", constraint=publication_price_constraint)
volume_attribute = Attribute(code="VOLUME", name="Volume", constraint=volume_constraint)
market_capitalization_attribute = Attribute(code="MARKET_CAPITALIZATION", name="Market Capitalization", constraint=market_capitalization_constraint)
issuer_lei_attribute = Attribute(code="ISSUER_LEI", name="Issuer LEI", constraint=issuer_lei_constraint)

# Instantiate the Issuances Entity
issuances_entity = Entity(
    code="ISSUANCES",
    name="Issuances",
    description="Information about securities issued",
    author=author,
    version="1.0",
    valid_from=datetime.now(),
)

# Add attributes using the add_attribute method
issuances_entity.add_attribute(isin_attribute_issuances)
issuances_entity.add_attribute(issue_date_attribute)
issuances_entity.add_attribute(publication_price_attribute)
issuances_entity.add_attribute(volume_attribute)
issuances_entity.add_attribute(market_capitalization_attribute)
issuances_entity.add_attribute(issuer_lei_attribute)

## Exercise 2
1. Validation of meta-model constraints: modify the Author class to add a method called validate that checks whether the name of the author is a string.
2. Validation of actual data using metadata constraints: create a class that validates an attribute, taking as input the attribute, the corresponding column of a dataframe, and a row index. Focus on publication_price and validate a row where the publication_price is negative (e.g. row index 6, with value -11).

In [2]:
# create class
class AuthorNew:
    """
    Represents an author of metadata objects.

    Attributes:
    - code (str): Unique code identifying the author.
    - name (str): Name of the author.
    - description (Optional[str]): Additional description about the author (optional).
    """
    def __init__(self, code: str, name: str, description: Optional[str] = None):
        self.code = code
        self.name = name
        self.description = description

    def validate(self):
        """
        Validate the author instance.

        Warns:
        - UserWarning: If the author's name is not a string.
        """
        if not isinstance(self.name, str):
            warnings.warn("Author's name should be a string.", UserWarning)

# test it with string input
author_new_string = AuthorNew("UNIMI", "UniMi")
author_new_string.validate()

# test it with non-string input (expected to emit a UserWarning)
author_new_non_string = AuthorNew("UNIMI", 2)
author_new_non_string.validate()



In [3]:
issuances_df = pd.read_csv("../data/issuances.csv", sep=";")

class AttributeValidator:
    """
    Class to validate attributes in a DataFrame.

    Attributes:
    - attribute (Attribute): The metadata attribute to be validated.
    - column (str): The column name in the DataFrame corresponding to the attribute.
    - data_frame (pd.DataFrame): The DataFrame containing the data to be validated.
    """
    def __init__(self, attribute, column, data_frame):
        self.attribute = attribute
        self.column = column
        self.data_frame = data_frame

    def validate_row(self, row_index):
        """
        Validate a specific row for the attribute in the DataFrame.

        Parameters:
        - row_index (int): The index of the row to be validated.

        Raises:
        - ValueError: If the data violates the metadata constraints for the specified row.
        """
        # Read the value once, then check whether it is missing
        value = self.data_frame.at[row_index, self.column]
        if pd.isnull(value):
            raise ValueError(f"Missing value found in column '{self.column}' at row {row_index + 1}.")

        # Get min_value and max_value from metadata constraints
        min_value = self.attribute.constraint.min_value
        max_value = self.attribute.constraint.max_value

        if min_value is not None and value < min_value:
            raise ValueError(f"Invalid value '{value}' in column '{self.column}' at row {row_index + 1}. Value should be greater than or equal to {min_value}.")

        if max_value is not None and value > max_value:
            raise ValueError(f"Invalid value '{value}' in column '{self.column}' at row {row_index + 1}. Value should be less than or equal to {max_value}.")

        if min_value is None and max_value is None:
            print(f"Info: Metadata constraints not specified for column '{self.column}'. No validation for min_value and max_value.")

# Example Usage
publication_price_attribute = Attribute(code="PUBLICATION_PRICE", name="Publication Price", constraint=Constraint(code="PUBLICATION_PRICE_CONSTRAINT", name="Min Value 0", min_value=0))

# Validate against the issuances_df DataFrame loaded at the top of this cell
validator = AttributeValidator(attribute=publication_price_attribute, column='publication_price', data_frame=issuances_df)

row_to_validate = 6  # Change this to the specific row you want to validate

try:
    validator.validate_row(row_to_validate)
    print(f"Validation successful for row {row_to_validate + 1}.")
except ValueError as e:
    print(f"Validation failed for row {row_to_validate + 1}: {e}")


Validation failed for row 7: Invalid value '-11.0' in column 'publication_price' at row 7. Value should be greater than or equal to 0.
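
The validator above checks one row at a time. As a possible extension, the sketch below checks a whole column in one pass and returns the offending rows. `find_violations` is a hypothetical helper, and the small DataFrame is made-up sample data standing in for `issuances.csv`; neither is part of the original notebook.

```python
import pandas as pd


def find_violations(df, column, min_value=None, max_value=None):
    """Return the rows of df whose value in `column` is missing or outside the given bounds."""
    values = df[column]
    # Missing values always count as violations
    mask = values.isna()
    if min_value is not None:
        mask |= values < min_value
    if max_value is not None:
        mask |= values > max_value
    return df[mask]


# Illustrative data only (assumption); the notebook itself reads ../data/issuances.csv
sample_df = pd.DataFrame({"publication_price": [10.0, 0.0, -11.0, None, 42.5]})

bad_rows = find_violations(sample_df, "publication_price", min_value=0)
print(bad_rows.index.tolist())  # [2, 3]: the negative price and the missing value
```

In practice, the bounds would come from the attribute's constraint (for example `min_value=publication_price_attribute.constraint.min_value`) rather than being passed as literals.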
