The code starts by importing the os module and checking the current working directory using the %pwd magic command.

The current working directory is changed to the parent directory using os.chdir("../"), and the updated working directory is displayed using %pwd.

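The code imports dataclass from dataclasses and Path from pathlib, then defines DataValidationConfig, a frozen dataclass with the fields root_dir, STATUS_FILE, and ALL_REQUIRED_FILES. It also imports the project's constants and the read_yaml and create_directories helpers.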

The ConfigurationManager class is defined. Its constructor takes the paths of the config and params YAML files (defaulting to CONFIG_FILE_PATH and PARAMS_FILE_PATH), reads both files, and creates the artifacts root directory.

The get_data_validation_config method within ConfigurationManager retrieves the data validation configuration from the YAML file and returns an instance of DataValidationConfig.

The DataValiadtion class is defined (the name is a misspelling of DataValidation, kept here as it appears in the code). It takes a DataValidationConfig object as input.

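The validate_all_files_exist method within DataValiadtion lists the entries of artifacts/data_ingestion/samsum_dataset (the path is hardcoded rather than read from the configuration) and checks whether each entry appears in ALL_REQUIRED_FILES. After every entry it rewrites the status file with True or False, so the final status reflects only the last entry checked. The method does not verify that every required file is actually present.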

The code creates an instance of ConfigurationManager, retrieves the data validation configuration, creates an instance of DataValiadtion, and performs the file validation.

Both try/except blocks catch Exception only to re-raise it, so any error that occurs during execution propagates to the caller unchanged.

Overall, the code sets up the data validation configuration, performs the validation of required files' existence, and updates the validation status.

In [1]:
import os

In [2]:
%pwd

'f:\\artificial intelegnce\\study\\ML End To End Projects Krish Naik\\github\\Text-Summarizer-Project\\research'

In [3]:
os.chdir("../")

In [4]:
%pwd

'f:\\artificial intelegnce\\study\\ML End To End Projects Krish Naik\\github\\Text-Summarizer-Project'

In [5]:
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataValidationConfig:
    root_dir: Path
    STATUS_FILE: str
    ALL_REQUIRED_FILES: list

In [6]:
from textSummarizer.constants import *
from textSummarizer.utils.common import read_yaml, create_directories

In [7]:
class ConfigurationManager:
    def __init__(
        self,
        config_filepath = CONFIG_FILE_PATH,
        params_filepath = PARAMS_FILE_PATH):

        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)

        create_directories([self.config.artifacts_root])


    
    def get_data_validation_config(self) -> DataValidationConfig:
        config = self.config.data_validation

        create_directories([config.root_dir])

        data_validation_config = DataValidationConfig(
            root_dir=config.root_dir,
            STATUS_FILE=config.STATUS_FILE,
            ALL_REQUIRED_FILES=config.ALL_REQUIRED_FILES,
        )

        return data_validation_config

In [8]:
import os
from textSummarizer.logging import logger

In [9]:
class DataValiadtion:
    def __init__(self, config: DataValidationConfig):
        self.config = config


    
    def validate_all_files_exist(self)-> bool:
        try:
            validation_status = None

            all_files = os.listdir(os.path.join("artifacts","data_ingestion","samsum_dataset"))

            for file in all_files:
                if file not in self.config.ALL_REQUIRED_FILES:
                    validation_status = False
                    with open(self.config.STATUS_FILE, 'w') as f:
                        f.write(f"Validation status: {validation_status}")
                else:
                    validation_status = True
                    with open(self.config.STATUS_FILE, 'w') as f:
                        f.write(f"Validation status: {validation_status}")

            return validation_status
        
        except Exception as e:
            raise e

In [10]:
try:
    config = ConfigurationManager()
    data_validation_config = config.get_data_validation_config()
    data_validation = DataValiadtion(config=data_validation_config)
    data_validation.validate_all_files_exist()
except Exception as e:
    raise e

[2023-06-19 19:26:10,179: INFO: common: yaml file: config\config.yaml loaded successfully]
[2023-06-19 19:26:10,185: INFO: common: yaml file: params.yaml loaded successfully]
[2023-06-19 19:26:10,186: INFO: common: created directory at: artifacts]
[2023-06-19 19:26:10,188: INFO: common: created directory at: artifacts/data_validation]


import os: Importing the os module for operating system related functionalities.

%pwd: A Jupyter magic command that retrieves the current working directory and displays it as output.

os.chdir("../"): Changing the current working directory to the parent directory.

%pwd: Retrieving the current working directory again to verify the change.
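
Outside Jupyter, where the %pwd magic is not available, the same check can be made with os.getcwd(). The following is a minimal sketch that runs inside a temporary directory so it does not depend on the project layout; the research folder name mirrors the notebook's location, and the helper name move_to_parent is made up for illustration.

```python
import os
import tempfile


def move_to_parent():
    """Return (cwd before, cwd after) around os.chdir("../")."""
    before = os.getcwd()
    os.chdir("../")
    after = os.getcwd()
    return before, after


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as project_root:
    # Recreate the notebook's situation: start inside <project>/research.
    research_dir = os.path.join(project_root, "research")
    os.makedirs(research_dir)
    os.chdir(research_dir)

    before, after = move_to_parent()

    # Leave the temporary directory before it is cleaned up.
    os.chdir(original_cwd)

print(os.path.basename(before))  # research
```

Because every later relative path (config files, the artifacts folder) is resolved against the working directory, this step has to run before ConfigurationManager is created.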

from dataclasses import dataclass: Importing the dataclass decorator from the dataclasses module.

from pathlib import Path: Importing the Path class from the pathlib module.

@dataclass(frozen=True): Decorating the DataValidationConfig class with dataclass and setting frozen=True to make it immutable.

class DataValidationConfig: Defining the DataValidationConfig class with attributes root_dir, STATUS_FILE, and ALL_REQUIRED_FILES.
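
What frozen=True does and does not guarantee can be seen in a short sketch. The field values below are illustrative assumptions (only artifacts/data_validation appears in the notebook's log output); the real values come from config.yaml.

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class DataValidationConfig:
    root_dir: Path
    STATUS_FILE: str
    ALL_REQUIRED_FILES: list


# Illustrative values only; STATUS_FILE and ALL_REQUIRED_FILES are assumed.
cfg = DataValidationConfig(
    root_dir=Path("artifacts/data_validation"),
    STATUS_FILE="artifacts/data_validation/status.txt",
    ALL_REQUIRED_FILES=["train", "test", "validation"],
)

try:
    cfg.STATUS_FILE = "elsewhere.txt"  # attribute assignment is blocked
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# frozen=True is shallow: the list itself can still be mutated in place.
cfg.ALL_REQUIRED_FILES.append("extra")
```

If full immutability matters, storing ALL_REQUIRED_FILES as a tuple would be one option.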

from textSummarizer.constants import *: Importing all constants from the textSummarizer.constants module.

from textSummarizer.utils.common import read_yaml, create_directories: Importing the read_yaml and create_directories functions from the textSummarizer.utils.common module.

class ConfigurationManager: Defining the ConfigurationManager class responsible for managing the project's configuration.

The __init__ method within ConfigurationManager initializes the class instance with config_filepath and params_filepath parameters and reads YAML files using the read_yaml function. It also creates necessary directories using the create_directories function.
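
The implementation of create_directories in textSummarizer.utils.common is not shown in the notebook. The sketch below is a hypothetical stand-in, assuming it wraps os.makedirs and logs a message like the one seen in the output; the verbose parameter is an assumption as well.

```python
import os
import tempfile


def create_directories(paths, verbose=True):
    """Hypothetical stand-in: create each directory, tolerating existing ones."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
        if verbose:
            print(f"created directory at: {path}")


with tempfile.TemporaryDirectory() as tmp:
    artifacts_root = os.path.join(tmp, "artifacts")
    create_directories([artifacts_root])
    # A second call is harmless because of exist_ok=True.
    create_directories([artifacts_root])
    created = os.path.isdir(artifacts_root)
```

Tolerating existing directories is what would let the notebook be re-run without clearing the artifacts folder first.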

The get_data_validation_config method within ConfigurationManager reads the data_validation section of the loaded configuration, creates its root_dir, and returns an instance of DataValidationConfig populated with root_dir, STATUS_FILE, and ALL_REQUIRED_FILES.
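
The contents of config.yaml are not shown in the notebook, so the mapping from YAML section to dataclass can only be sketched under assumptions. Below, a plain dict plays the role of the data_validation section and SimpleNamespace stands in for the attribute-style object that read_yaml appears to return. Only root_dir is confirmed by the log output; the other values and the function name build_data_validation_config are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from types import SimpleNamespace


@dataclass(frozen=True)
class DataValidationConfig:
    root_dir: Path
    STATUS_FILE: str
    ALL_REQUIRED_FILES: list


# Hypothetical data_validation section; only root_dir matches the log output.
raw_section = {
    "root_dir": "artifacts/data_validation",
    "STATUS_FILE": "artifacts/data_validation/status.txt",
    "ALL_REQUIRED_FILES": ["train", "test", "validation"],
}

# Stand-in for the object read_yaml returns (supports dotted access).
section = SimpleNamespace(**raw_section)


def build_data_validation_config(section) -> DataValidationConfig:
    """Copy the three settings from the config section into the dataclass."""
    return DataValidationConfig(
        root_dir=Path(section.root_dir),
        STATUS_FILE=section.STATUS_FILE,
        ALL_REQUIRED_FILES=list(section.ALL_REQUIRED_FILES),
    )


data_validation_config = build_data_validation_config(section)
```

Unlike the notebook code, this sketch wraps root_dir in Path so the value matches the field's annotation; the notebook passes the YAML value through unchanged.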

import os: Importing the os module.

from textSummarizer.logging import logger: Importing the logger object from the textSummarizer.logging module. The cells shown here do not call it directly.

class DataValiadtion: Defining the DataValiadtion class responsible for validating the required files' existence.

The __init__ method within DataValiadtion initializes the class instance with a DataValidationConfig object.

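The validate_all_files_exist method within DataValiadtion compares the ingested dataset against the configured file list. It uses os.listdir to list the entries of artifacts/data_ingestion/samsum_dataset, a path that is hardcoded rather than taken from the configuration. For each entry it checks whether the name is in the ALL_REQUIRED_FILES list: if not, it sets the validation status to False and writes it to the STATUS_FILE; otherwise it sets the status to True and writes that. Because the status file is rewritten on every iteration, the final result depends only on the last entry returned by os.listdir. The check also runs in one direction only: an unexpected entry can produce False, but a required file that is absent from the directory is never detected.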
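
A stricter variant, offered as a sketch rather than as the project's method, checks that every required name is present and writes the status once. The function signature, the directory names, and the required-file list below are assumptions chosen for the demonstration.

```python
import os
import tempfile


def validate_all_files_exist(data_dir, required_files, status_file) -> bool:
    """Return True only if every required name is present in data_dir."""
    present = set(os.listdir(data_dir))
    missing = sorted(set(required_files) - present)
    validation_status = not missing

    # Write the status once, after all names have been checked.
    with open(status_file, "w") as f:
        f.write(f"Validation status: {validation_status}")
    return validation_status


# Demonstration in a temporary directory; all names are hypothetical.
required = ["train", "test", "validation"]
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "samsum_dataset")
    os.makedirs(os.path.join(data_dir, "train"))
    os.makedirs(os.path.join(data_dir, "test"))
    status_file = os.path.join(tmp, "status.txt")

    # "validation" is missing at this point.
    incomplete = validate_all_files_exist(data_dir, required, status_file)

    os.makedirs(os.path.join(data_dir, "validation"))
    complete = validate_all_files_exist(data_dir, required, status_file)

    with open(status_file) as f:
        final_status_text = f.read()
```

This version ignores extra entries in the directory, whereas the notebook's loop can report False for them. Which behavior is wanted depends on whether unexpected files should fail validation.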

The try block starts, where the configuration manager is created, the data validation configuration is retrieved, an instance of DataValiadtion is created, and the validate_all_files_exist method is called.

If any exception occurs during the execution, it is caught in the except block, and the exception is raised again.
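
Catching an exception only to re-raise it has the same effect as not catching it at all. The pattern becomes useful when something happens in between, such as logging. The sketch below uses the standard logging module as a stand-in for the project's logger; the logger name, run_stage, and failing_stage are hypothetical.

```python
import logging

# Stand-in for the project's logger; the name is hypothetical.
logger = logging.getLogger("textSummarizerDemo")


def run_stage(stage):
    """Run a pipeline stage, logging any failure before re-raising it."""
    try:
        return stage()
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("Data validation stage failed")
        raise  # a bare raise re-raises the active exception unchanged


def failing_stage():
    # Simulates os.listdir failing because data ingestion has not run yet.
    raise FileNotFoundError("artifacts/data_ingestion/samsum_dataset")


try:
    run_stage(failing_stage)
    propagated = None
except FileNotFoundError as err:
    propagated = err
```

The caller still receives the original FileNotFoundError, and the failure is also recorded in the log.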

Overall, the code snippet sets up the configuration, validates the existence of required files, and writes the validation status to a file.




