## Demos of ErrorHandle and Preflight Checks
This notebook demonstrates three ways to report error conditions to a script's user: error handling in procedural code, error handling within a class, and preflight checks on an input DataFrame. The ErrorHandle class checks and reports ad hoc conditions within a program. Preflight (preflight.py) is a special case: its methods use ErrorHandle to report problems found when prechecking input files (preflight.CheckExcelFiles class) and input DataFrames (preflight.CheckDataFrame class). All three use cases look up their messages in the table in the admin file ErrorCodes.xlsx.

JDL 2/12/24; Version 8/30/24

In [1]:
import os, sys
# Suppress Intel MKL warning with latest Pandas
os.environ['MKL_SERVICE_FORCE_INTEL'] = '1'

import pandas as pd
import logging
logging.basicConfig(level=logging.ERROR, filename='demo.log', format='%(message)s')

#Use ErrorCodes.xlsx in the libs directory (same directory as the modules)
path_libs = os.path.join(os.getcwd(), 'libs') + os.sep
path_err_codes = path_libs

#Add the libs subdirectory to sys.path
if path_libs not in sys.path: sys.path.append(path_libs)

#Import needed modules
from libs.error_handling import ErrorHandle
from libs.preflight import CheckDataFrame
from libs.projtables import Table
from libs.projtables import ProjectTables
import util

### Case 1: Error Handling in procedural code
* Instantiate the ErrorHandle class to use for checking conditions during processing
* Set .Locn for error message lookup by code. Base and error-specific rows are in ErrorCodes.xlsx
* This example uses Python logging to write to a log file (demo.log, initialized above in the `logging.basicConfig()` command). The cell reinitializes the log file before running its check
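
The log reset mentioned above can be sketched as follows. This is a minimal illustration, not the ErrorHandle implementation: the `reset_log_file` function here is a hypothetical stand-in that truncates any file attached to a logger through a `logging.FileHandler`, and the logger name and temporary path are assumptions made so the example runs on its own.

```python
import logging
import os
import tempfile

def reset_log_file(logger):
    """Truncate every file attached to the logger via a FileHandler (assumed behavior)."""
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler):
            handler.flush()
            # Opening in 'w' mode empties the file; the handler keeps appending afterwards
            with open(handler.baseFilename, 'w'):
                pass

# Hypothetical logger writing to a temporary demo.log
log_path = os.path.join(tempfile.mkdtemp(), 'demo.log')
logger = logging.getLogger('reset_demo')
logger.setLevel(logging.ERROR)
logger.propagate = False
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(handler)

logger.error('stale message from an earlier run')
reset_log_file(logger)
logger.error('fresh message')
handler.flush()

with open(log_path) as f:
    contents = f.read()
print(contents)
```

After the reset, only the message logged afterwards remains in the file, which is the behavior a cell needs when it is rerun several times in one session.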


In [2]:
"""
ErrorHandle is designed to make it easy to add and maintain error checks.
In errs.is_fail below, the first argument is a Boolean check that evaluates to True
in case of error. The second argument, 1, is the routine-specific error code (aka
errs.iCodeLocal). The errs.RecordErr method uses errs.Locn to look up the base error
code (errs.iCodeBase) in ErrorCodes.xlsx (errs.df_errs). It adds the base error code
to the local error code to get the lookup error code (aka errs.iCodeReport) and then
looks up the error message. This approach makes it possible to assign integers 1, 2, 3,
etc. to the errors in each function, decoupled from their iCodeReport codes in df_errs.
The err_param argument is an optional suffix that gets appended to the reported error
message; in this case, the sum that exceeds the limit of 4.
"""
#Instance the ErrorHandle object
errs = ErrorHandle(path_err_codes, ErrMsgHeader='Procedural Demo', IsLog=True)
if errs.IsLog: errs.reset_log_file(logging.getLogger())

#This string is the lookup key for getting the base error code from ErrorCodes.xlsx
errs.Locn = 'ProceduralDemo'

# Example code (named total to avoid shadowing the built-in sum)
x = 2
y = 3
total = x + y

#Check that x + y is less than or equal to 4
if errs.is_fail(total > 4, 1, err_param=str(total)): errs.RecordErr()

Procedural Demo
ERROR: x + y should be less than or equal to 4. Sum: 5
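
The base-plus-local code lookup described in the cell above can be sketched in a few lines. This is an illustration under stated assumptions, not the ErrorHandle code: the two dictionaries stand in for the ErrorCodes.xlsx table, the base code 100 is invented, and the function name `lookup_message` is hypothetical. Only the message text is taken from the output above.

```python
# Hypothetical stand-ins for the rows of ErrorCodes.xlsx (layout and codes are assumptions)
BASE_CODES = {'ProceduralDemo': 100}  # base rows: Locn -> base error code
MESSAGES = {101: 'ERROR: x + y should be less than or equal to 4. Sum: '}  # error rows

def lookup_message(locn, i_code_local, err_param=''):
    """Combine the base and local codes, then look up the message and append the suffix."""
    i_code_base = BASE_CODES[locn]              # plays the role of iCodeBase
    i_code_report = i_code_base + i_code_local  # plays the role of iCodeReport
    return MESSAGES[i_code_report] + err_param

msg = lookup_message('ProceduralDemo', 1, err_param='5')
print(msg)  # ERROR: x + y should be less than or equal to 4. Sum: 5
```

Because each function only refers to small local integers, the reported codes can be renumbered in the table by changing one base row rather than editing every check.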


### Case 2: Error Handling within a Class
* Passing Locn argument as util.current_fn() tells .is_fail to look up the error by name of the function where the error occurred
* errs is instanced in .__init__()

In [3]:
class SumAndProduct():
    def __init__(self, path_err_codes, x, y):
        self.x = x
        self.y = y
        self.sum = None
        self.product = None
        self.errs = ErrorHandle(path_err_codes, ErrMsgHeader='', IsHandle=True, IsLog=True)
        if self.errs.IsLog: self.errs.reset_log_file(logging.getLogger())

    @property
    def procedure_to_do_all_steps(self):
        self.calculate_sum()
        if not self.errs.IsErr: self.calculate_product()
        return self.sum, self.product

    def calculate_sum(self):
        self.sum = self.x + self.y
        print('Calculated sum is', self.sum)
        if self.errs.is_fail(self.sum > 6, 1, Locn=util.current_fn(),
                             err_param=str(self.sum)): self.errs.RecordErr()

    def calculate_product(self):
        self.product = self.x * self.y
        print('Calculated product is', self.product)
        if self.errs.is_fail(self.product > 5, 1, Locn=util.current_fn(),
                             err_param=str(self.product)): self.errs.RecordErr()

sum_inputs, prod_inputs = SumAndProduct(path_err_codes, 2, 3).procedure_to_do_all_steps

Calculated sum is 5
Calculated product is 6
ERROR: x * y should be less than or equal to 5. Product: 6
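
The source of util.current_fn() is not shown in this notebook. One plausible way to obtain the name of the calling function, assuming that is what the helper returns, is to inspect the caller's stack frame with the standard library. The sketch below is an assumption about the behavior, not the project's implementation.

```python
import inspect

def current_fn():
    """Return the name of the function that called this helper (assumed behavior)."""
    # f_back is the caller's frame; co_name is that function's unqualified name
    return inspect.currentframe().f_back.f_code.co_name

def calculate_sum():
    # The lookup key is the enclosing function's own name
    return current_fn()

locn = calculate_sum()
print(locn)  # calculate_sum
```

Note that `co_name` is the bare function name without the class, so two classes with a method of the same name would share a lookup key under this approach.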


### Case 3: Performing Preflight Checks on Input Data
There are two modes for using the CheckDataFrame methods to precheck input data. In the first, path_err_codes is the only required argument when instancing the CheckDataFrame object, and each preflight method takes the DataFrame, plus any other optional arguments, as input.

In [4]:
#Create a demo DataFrame
df = pd.DataFrame(data={'idx':['idx1', 'idx2', 'idx3'],
                        'col_a':[1, 2, 3], 
                        'col_b':[10, 20, 'a']})
df

    idx  col_a col_b
0  idx1      1    10
1  idx2      2    20
2  idx3      3     a


In [5]:
"""
preflight.CheckDataFrame Mode 1 - Preflight on df passed as optional argument 
                                  to specific preflight methods
"""
#Instance a CheckDataFrame class with all default arguments
ckdf = CheckDataFrame(path_err_codes)

#Check that the DataFrame contains (at least) columns col_a and col_b
IsOk = ckdf.ContainsRequiredCols(cols_req=['col_a', 'col_b'], df=df)

#Check that DataFrame col_b is all numeric (fails because of string value 'a')
if IsOk: IsOk = ckdf.ColNumeric('col_b', df=df)

#Other programming steps or preflight checks...


ERROR: Column must contain only non-null numeric values: col_b
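
The two checks used in this cell can be approximated with plain pandas. The sketch below is a rough equivalent under assumptions, not the CheckDataFrame code: the function names `contains_required_cols` and `col_is_numeric` are hypothetical, and the numeric test relies on `pd.to_numeric(..., errors='coerce')`, which also accepts numeric strings such as '10', so the real check may be stricter.

```python
import pandas as pd

def contains_required_cols(df, cols_req):
    """Return the required columns missing from df (an empty list means the check passes)."""
    return [col for col in cols_req if col not in df.columns]

def col_is_numeric(df, col):
    """True if every value in df[col] converts to a non-null number."""
    # Values that cannot be converted become NaN and fail the notna() test
    return bool(pd.to_numeric(df[col], errors='coerce').notna().all())

df = pd.DataFrame({'idx': ['idx1', 'idx2', 'idx3'],
                   'col_a': [1, 2, 3],
                   'col_b': [10, 20, 'a']})

missing = contains_required_cols(df, ['col_a', 'col_b'])
print(missing)                      # []
print(col_is_numeric(df, 'col_a'))  # True
print(col_is_numeric(df, 'col_b'))  # False
```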


In [6]:
"""
preflight.CheckDataFrame Mode 2 - Data are .df attribute of a custom Table class 
                                  instance passed to CheckDataFrame as optional argument.
                                  
See libs/projtables.py for hard-coded setup of this demo.

Custom Table objects (projtables.Table) are convenient containers for df and other 
table metadata in multi-table projects. Attributes can specify metadata such as how to 
import df and set its index. These can be initialized in projtables.ProjectTables.__init__(),
which runs when tbls is instanced.
In our example, this includes initialization code that sets lists for preflight checks: 

self.DemoData.required_cols = ['idx', 'col_a', 'col_b']
self.DemoData.numeric_cols = ['col_a', 'col_b']

For this example, instancing a programmatically named Table object enables referring
to the table and its data as tbls.DemoData and tbls.DemoData.df throughout the project.
"""

#Initialize project's ProjectTables (its __init__() hard codes instancing DemoData Table)
tbls = ProjectTables(files=None, lst_files=None)
tbls.DemoData.df = df

#print Table preflight list attributes and the DataFrame attribute
print('DemoData.required_cols', tbls.DemoData.required_cols)
print('DemoData.numeric_cols', tbls.DemoData.numeric_cols)
print('\n', tbls.DemoData.df)

DemoData.required_cols ['idx', 'col_a', 'col_b']
DemoData.numeric_cols ['col_a', 'col_b']

     idx  col_a col_b
0  idx1      1    10
1  idx2      2    20
2  idx3      3     a


In [7]:
#Instance a preflight.CheckDataFrame class with DemoData Table as argument
ckdf = CheckDataFrame(path_err_codes, tbl=tbls.DemoData)

#Check that DataFrame contains its required columns
IsOk = ckdf.ContainsRequiredCols()

#The default index column is hard-coded when the Table is instanced, so the index can be set automatically
tbls.DemoData.ResetDefaultIndex()

#print the indexed DataFrame
tbls.DemoData.df

      col_a col_b
idx              
idx1      1    10
idx2      2    20
idx3      3     a


In [8]:
#Check that indexed DataFrame column values are all numeric (fails due to string value 'a')
if IsOk: IsOk = ckdf.LstColsAllNumeric()

ERROR: Column must contain only non-null numeric values: col_b
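
To make Mode 2 concrete, here is a minimal sketch of a table container that carries its own preflight metadata. Everything in it is an assumption for illustration: the `DemoTable` dataclass, its `idx_col` attribute, and the `non_numeric_cols` helper are hypothetical and do not reproduce projtables.Table or CheckDataFrame. The numeric test again uses `pd.to_numeric(..., errors='coerce')`.

```python
from dataclasses import dataclass, field
from typing import Optional

import pandas as pd

@dataclass
class DemoTable:
    """Hypothetical stand-in for a Table: a DataFrame plus preflight metadata."""
    name: str
    idx_col: str
    required_cols: list = field(default_factory=list)
    numeric_cols: list = field(default_factory=list)
    df: Optional[pd.DataFrame] = None

    def reset_default_index(self):
        """Make idx_col the index unless it already is."""
        if self.df.index.name != self.idx_col:
            self.df = self.df.set_index(self.idx_col)

def non_numeric_cols(tbl):
    """List the columns in tbl.numeric_cols that hold a null or non-numeric value."""
    return [col for col in tbl.numeric_cols
            if not pd.to_numeric(tbl.df[col], errors='coerce').notna().all()]

demo = DemoTable(name='DemoData', idx_col='idx',
                 required_cols=['idx', 'col_a', 'col_b'],
                 numeric_cols=['col_a', 'col_b'])
demo.df = pd.DataFrame({'idx': ['idx1', 'idx2', 'idx3'],
                        'col_a': [1, 2, 3],
                        'col_b': [10, 20, 'a']})
demo.reset_default_index()

bad_cols = non_numeric_cols(demo)
print(bad_cols)  # ['col_b']
```

Keeping the column lists on the table object means the check functions need no per-call arguments, which is the convenience Mode 2 offers over Mode 1.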
