### Design and implementation of automated and dynamic clinical pathways for low resource settings

Geletaw Sahle, Demesewu Amenu, Girum Ketema, Frank Verbeke, Jan Cornelis, Bart Jansen

#### Goal

The aims of the research design and implementation are to:

1.  Design a hybrid and dynamic algorithm for generating clinical pathways (CPs)
2.  Dynamically validate the knowledge-based CPs using evidence (data-driven), i.e. validate and map the knowledge-based clinical pathways against local conditions or context by tracing the history
3.  Arrange (or re-arrange) the decision priority of the CPs based on the context, for example by introducing multi-criteria decision analysis (probability-based, severity-based)
4.  Investigate a mechanism for generating potential multi-disease CPs

#### Importing Library

In [None]:
#import pandas
import pandas as pd 

# knowledge based indicators: Extracted from the CGs and used for a gold standard
import import_ipynb
import CG_rulesets_and_indicators

#imports secure module for creating a secure random object
import secrets      

# Import the pickle package to retrieve the saved CP model
import pickle

from sklearn.metrics import accuracy_score
from sklearn import metrics

import os.path

from imblearn.combine import SMOTETomek
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder  # used by the model update and tracing cells
import numpy as np

from collections import Counter

### Introduction: Description of inputs, outputs and parameters

In [None]:
"""
Input - List of the presented signs and symptoms
"""
Measured_Signs_and_Symptoms_List ={} 

"""
    Potential list of signs and symptoms extracted from the CGs (endorsed by WHO, 
    the Ministry of Health and a group of experts)
"""
Potential_List_of_Signs_and_Symptoms = {}

"""
     An indicator (or criterion) extracted from the CGs (expert opinion). 
     It is used to check whether the measured symptoms satisfy the condition or not. 
     It is also used as an EXIT criterion
"""
Indicator = {}

"""
    Output -  A list of generated CP
"""
Generated_CP_LIST = {}  

"""  A list of parameters for ranking CP """
RankingParameters = {'Probability', 'Severity', 'Cost', 'Weight', 'Evidence'} 

""" A list of pruning parameters """
PruningParameters = {'Probability', 'Severity', 'Cost', 'Weight', 'Evidence'} 

Flag = False

### Indicator (or Exit Criteria)

We extracted the indicators (or exit criteria) from the clinical guidelines (CGs), which are used as the gold standard, for individual measured symptoms and for combinations of measured symptoms. The indicators (or rulesets) are arranged in a nested Python dictionary structure.
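
The module `CG_rulesets_and_indicators` is not shown in this notebook, so the following is only a sketch of the layout that the later cells appear to assume: a category level, a finding level, and an integer-indexed set of rules, each rule mapping symptom names to the required value. The names `example_indicator` and `get_rules` are hypothetical, and the rule contents are placeholders rather than values taken from the CGs.

```python
# Hypothetical layout of the nested indicator dictionary (placeholder values,
# NOT the rules extracted from the clinical guidelines).
example_indicator = {
    'referralIndicator': {
        'convulsion': {
            0: {'convulsion': 'yes'},
            1: {'convulsion': 'yes', 'headache': 'yes'},
        },
    },
    'treatableIndicator': {
        'vaginalBleeding': {
            0: {'vaginalBleeding': 'yes'},
        },
    },
}


def get_rules(indicator, category, finding):
    """Return the integer-indexed rules for one finding, or {} if absent."""
    return indicator.get(category, {}).get(finding, {})


rules = get_rules(example_indicator, 'referralIndicator', 'convulsion')
for index in range(len(rules)):
    print(index, rules[index])
```

Integer keys let the rules be walked with `range(len(rules))`, which is how the generator cells below iterate over an indicator.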

2.1. Importing the knowledge-based CP: CG rulesets

In [None]:
"""
    The CG rulesets are structured as a nested dictionary 
"""
import import_ipynb
import CG_rulesets_and_indicators

2.2. Assign the imported indicator to a python dictionary 

In [None]:
#assign the imported indicator to a python dictionary 
indicator = CG_rulesets_and_indicators.CG_rulesets()

2.3. Checking whether the sample indicators or rulesets are properly displayed

In [None]:
indicator['referralIndicator']['urgentAttention']

### Define a class to access the knowledge-based CP rulesets (or indicators)

In [None]:
class clinicalPathways_Indicator:
    """
        Functions for accessing and signaling urgent attention pathways
    """
    def get_urgentAttention_indicators(*args):
        urgentAttention_indicator = indicator['referralIndicator']['urgentAttention']
        return urgentAttention_indicator

    """
    Functions for accessing the referral clinical pathway indicators
    """
    def get_Convulsion_R_indicator(*args):
        convulision_indicator = indicator['referralIndicator']['convulsion']
        return convulision_indicator
    def get_severe_Pre_eclampsia_R_indicator(*args):
        pre_eclampsia_indicator= indicator['referralIndicator']['severe_pre_eclampsia']
        return pre_eclampsia_indicator
    
    def get_severe_hypertension_R_indicator(*args):
        severe_hypertension_indicator= indicator['referralIndicator']['severe_hypertension']
        return severe_hypertension_indicator
    
    def get_vaginalBleeding_R_indicator(*args):
        vaginalBleeding_indicator= indicator['referralIndicator']['vaginalBleeding']
        return vaginalBleeding_indicator
    
    def get_pretermLabour_R_indicator(*args):
        pretermLabour_indicator= indicator['referralIndicator']['pretermLabour']
        return pretermLabour_indicator
    
    def get_PROM_R_indicator(*args):
        PROM_indicator= indicator['referralIndicator']['PROM']
        return PROM_indicator
    
    def get_unsurePregnancy_R_indicator(*args):
        unsurePregnancy_indicator= indicator['referralIndicator']['unsurePregnancy']
        return unsurePregnancy_indicator

    """
    Functions for accessing the treatable clinical pathway indicators
    """
    def get_Convulsion_T_indicator(*args):
        convulision_indicator = indicator['treatableIndicator']['convulsion']
        return convulision_indicator
    
    def get_vaginalBleeding_T_indicator(*args):
        vaginalBleeding_indicator = indicator['treatableIndicator']['vaginalBleeding']
        return vaginalBleeding_indicator
    
    def get_pretermLabour_T_indicator(*args):
        pretermLabour_indicator = indicator['treatableIndicator']['pretermLabour']
        return pretermLabour_indicator
    
    def get_notUrgentAttention_T_indicator(*args):
        notUrgentAttention_indicator = indicator['treatableIndicator']['notUrgentAttention']
        return notUrgentAttention_indicator
    
    def get_unsurePregnancy_T_indicator(*args):
        unsurePregnancy_indicator = indicator['treatableIndicator']['unsurePregnancy']
        return unsurePregnancy_indicator
    
    """
    Functions for accessing the consideration clinical pathway indicators
    """
    def get_unsurePregnancy_C_indicator(*args):
        unsurePregnancy_indicator = indicator['considerationIndicator']['unsurePregnancy']
        return unsurePregnancy_indicator
    
    def get_notUrgentAttention_C_indicator(*args):
        notUrgentAttention_indicator = indicator['considerationIndicator']['notUrgentAttention']
        return notUrgentAttention_indicator

In [None]:
CPIndicator = clinicalPathways_Indicator()

In [None]:
CPIndicator.get_severe_Pre_eclampsia_R_indicator()

###  Presented (or Measured) Symptoms: Input wizard

Initialize the dominant factors: extracted from the CGs

In [None]:
# initialize the dominant factors
# dominant factors for a pregnant patient, based on the clinical guidelines
# ('headache' is listed once, with the value 'yes', further down)
dominat_Factors = dict([
    ('convulsion', 'yes'),
    ('diffcultyBreathing', 'yes'),     
    ('vaginalBleeding', 'yes'),
    ('abdominalPain', 'yes'),
    ('swollenPainfulCalf', 'yes'),
    ('decreased_absent_FetalMovements', 'yes'),
    ('BP', ('≥140/90','≥160/110')),
    ('blurredVision', 'yes'),
    ('painfulContractions', 'yes'),
    ('Sudden_GushOFclear_or_pale_fluid', 'yes'),
    ('Temperature', '≥38°C'),
    ('headache', 'yes'),
    ('weakness', 'yes'),
    ('backPain', 'yes'),
    ('proteinuria', ('without proteinuria','≥ 1+ proteinuria')),
 ])

Entry point initiation: based on the dominant factors and a random choice of signs and symptoms

In [None]:
import secrets                              # imports secure module.
cp_EntrySigns = secrets.SystemRandom()      # creates a secure random object.
presented_Signs_and_Symptoms = [] # initializing the presented signs and symptoms 
# Check which dominant factors are present before initiating the CP entry
for signs, value in dominat_Factors.items():
    if value == "yes":
        # add the dominant presented sign to the presented signs and symptoms list
        presented_Signs_and_Symptoms.append(signs)

Initialize the CP Entry

In [None]:
# initialize the CP Entry 
entryNum_to_select = 4               # set the number to select here (just convention).
list_of_presented_signs = cp_EntrySigns.sample(presented_Signs_and_Symptoms, entryNum_to_select)
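
`SystemRandom.sample` raises `ValueError` when asked for more items than the list contains, which can happen if fewer than four dominant factors are present. A minimal sketch of a guarded selection follows; the helper `pick_entry_signs` and the demo list are hypothetical and not part of the notebook.

```python
import secrets


def pick_entry_signs(candidates, how_many):
    """Pick up to `how_many` distinct signs using the OS-backed random source."""
    rng = secrets.SystemRandom()
    # sample() raises ValueError if more items are requested than are available
    how_many = min(how_many, len(candidates))
    return rng.sample(candidates, how_many)


demo_signs = ['convulsion', 'headache', 'backPain']
entry = pick_entry_signs(demo_signs, 4)   # asks for 4, only 3 are available
print(entry)
```

The order of the returned signs is random, so only the set of selected signs is predictable here.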

A CP wizard function for accepting inputs

In [None]:
def CP_Input_Wizard():
    print("Enter the presented signs and symptoms:")
    # allowed answers for the symptoms that are not plain yes/no questions
    allowed_answers = {
        'Temperature': {">=38", "<38"},
        'BP': {">=160/90", "<160/90"},
        'Fever': {"yes", "no", "persistent"},
    }
    for Symptoms in list_of_presented_signs:
        allowed = allowed_answers.get(Symptoms, {"yes", "no"})
        # ask for the sign or symptom selected at random for the CP entry
        response = input(Symptoms + ": ").strip().lower()
        # control and validate the user input: keep asking until the answer is valid
        while response not in allowed:
            response = input("Please enter one of {}: ".format(sorted(allowed))).strip().lower()
        # add the symptom to the measured symptoms list unless it is absent
        if response != "no":
            Measured_Signs_and_Symptoms_List[Symptoms] = response
    return Measured_Signs_and_Symptoms_List

Call to accept the measured symptoms

In [None]:
#Measured_Signs_and_Symptoms_List = CP_Input_Wizard() 

In [None]:
Measured_Signs_and_Symptoms_List

###  Explore Possible Measured Combinations

In [None]:
import itertools
from itertools import combinations

Measured_Signs_and_Symptoms_List = {
    'convulsion': 'yes', 
    'Fever' : 'yes',             
    'BP':'>=140/90', 
    'headache':'yes',
    'blurredVision':'yes', 
    'abdominalPain':'yes', 
    'bleeding':'yes',
}


In [None]:
#Measured_Signs_and_Symptoms_List['BP']

#### Process Possible Combinations for generating CP

In [None]:
"""
        A function to generate possible combinations of signs and symptoms: the function returns both the individual 
        measured symptoms and all possible combinations of measured symptoms
"""
def possible_SignsandSymptoms_Combinations(value):
    # build the list locally so that repeated calls do not accumulate duplicates
    Possiblecombinations = []
    noOFmeasuredSymptoms = len(value)
    for j in range(1, noOFmeasuredSymptoms+1):
        # all combinations of j measured symptoms (j = 1 gives the individual symptoms)
        for i in combinations(value, j):
            Possiblecombinations.append(i)
    return Possiblecombinations

In [None]:
nestedList = possible_SignsandSymptoms_Combinations(Measured_Signs_and_Symptoms_List)
#nestedList
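
The number of generated combinations grows exponentially: n measured symptoms give 2**n - 1 non-empty combinations. The sketch below illustrates this on a three-symptom example; `all_nonempty_combinations` and `demo` are hypothetical names used only for illustration.

```python
from itertools import combinations


def all_nonempty_combinations(symptoms):
    """Return every non-empty combination of the symptom names, smallest first."""
    names = list(symptoms)
    result = []
    for size in range(1, len(names) + 1):
        result.extend(combinations(names, size))
    return result


demo = {'convulsion': 'yes', 'BP': '>=140/90', 'headache': 'yes'}
combos = all_nonempty_combinations(demo)
print(len(combos))   # 2**3 - 1 = 7
print(combos[0], combos[-1])
```

With the seven measured symptoms used in this notebook the same procedure yields 127 combinations, so the approach is practical only for a small number of measured symptoms.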

#### Checking and Validating Combination of Parameters

In [None]:
"""
    Iterate over the generated combinations of measured parameters, index them, and attach 
    the measured value to each parameter for CP generation
"""
def comibnationofParam(nestedList):
    # build the dictionary locally so that repeated calls start from an empty result
    possibleComb = {}
    # i indexes the dictionary of combinations
    for i, subList in enumerate(nestedList):
        # look up the measured value of every symptom in the combination
        possibleComb[i] = {k: Measured_Signs_and_Symptoms_List[k] for k in subList}
    return possibleComb  # possible combinations in dictionary format

In [None]:
# with 7 measured symptoms, 2**7 - 1 = 127 non-empty combinations are explored
# presented possible combinations of measured symptoms, i.e.
presentedMeasuredSymptoms = comibnationofParam(nestedList)
# returns an indexed Python dictionary of measured symptoms (both individual symptoms and combinations of measured symptoms)

5.3. Potentially eligible measured symptoms for CP generation

In [None]:
presentedMeasuredSymptoms

### Process (or Generate) Clinical Pathways 

Initialize the generated CP output list

In [None]:
# initialize the generated CP list as an empty decision table
Generated_CP_LIST = pd.DataFrame(columns=['Measured_Symptoms', 'Urgent_Attention', 'Generated_CP', 'Finding', 'Evidence', 'Prior_Prob', 'Accuracy', 'Pred_CP', 'Severity', 'Cost', 'Weight'])

Call the CG indicators (rulesets)

In [None]:
#Call all the knowledge based indicators
indicators = {
    'urgentIndicators':{0:clinicalPathways_Indicator.get_urgentAttention_indicators()},
    'referralIndicators':{
        0:clinicalPathways_Indicator.get_Convulsion_R_indicator(),
        1:clinicalPathways_Indicator.get_severe_Pre_eclampsia_R_indicator(),
        2:clinicalPathways_Indicator.get_severe_hypertension_R_indicator(),
        3:clinicalPathways_Indicator.get_vaginalBleeding_R_indicator(),
        4:clinicalPathways_Indicator.get_pretermLabour_R_indicator(),
        5:clinicalPathways_Indicator.get_PROM_R_indicator(),
        6:clinicalPathways_Indicator.get_unsurePregnancy_R_indicator(),
    },
    'treatableIndicators':{
        0:clinicalPathways_Indicator.get_Convulsion_T_indicator(),
        1:clinicalPathways_Indicator.get_vaginalBleeding_T_indicator(),
        2:clinicalPathways_Indicator.get_pretermLabour_T_indicator(),
        #3:clinicalPathways_Indicator.get_notUrgentAttention_T_indicator(),
        3:clinicalPathways_Indicator.get_unsurePregnancy_T_indicator(),
    }
}

In [None]:
mss  = {
    0:{"convulsion":"yes"},
    1:{"Fever":"Yes"},
    2:{"BP":">=140/90"},
    3:{"temprature":">=38"}
}
len(mss)

In [None]:
for col in mss:
    print(mss[col])

In [None]:
for i in range(0, len(indicators['urgentIndicators'][0])):
    print(indicators['urgentIndicators'][0][i])

In [None]:
for col in mss:
    for i in range(0, len(indicators['urgentIndicators'][0])):
        if mss[col] == indicators['urgentIndicators'][0][i]:
            print(mss[col])

A Function for generating Clinical Pathways

In [None]:
def clinicalPathway_generator(Generated_CP_LIST,presentedMeasuredSymptoms, indicators, Urgent_Attention, CP, finding):
    """
        Generate the clinical pathways based on the measured symptoms.
        The indicators are used to validate the measured symptoms and as exit criteria.
    """
    # presentedMeasuredSymptoms: indexed dictionary of measured symptoms and their combinations
    for index in range(0, len(presentedMeasuredSymptoms)):
        # indexed dictionary of indicators (used as the gold standard for evaluation)
        for j in range(0, len(indicators)):
            # exit criterion: every measured (symptom, value) pair must appear in indicator j,
            # so a subset of an indicator also counts as a match
            if dict(indicators[j], **presentedMeasuredSymptoms[index]) == indicators[j]:
                # the evidence, probability and ranking fields are filled in by later steps
                result = pd.Series(data={'Measured_Symptoms':presentedMeasuredSymptoms[index],'Urgent_Attention':Urgent_Attention,'Generated_CP':CP,'Finding':finding, 'Evidence':'', 'Prior_Prob':'','Accuracy':'','Pred_CP':'','Severity':'', 'Cost':'', 'Weight':''}, name=len(Generated_CP_LIST))
                # DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
                Generated_CP_LIST = pd.concat([Generated_CP_LIST, result.to_frame().T])
    return Generated_CP_LIST  

In [None]:
clinicalPathway_generator(Generated_CP_LIST, presentedMeasuredSymptoms, indicators['referralIndicators'][0], Urgent_Attention='Yes',CP='R',finding='Convulsion') 
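
The comparison `dict(indicator, **measured) == indicator` used in the generator is a containment test rather than strict equality: it holds when every measured (symptom, value) pair also appears in the indicator. The sketch below shows this with a hypothetical rule; the helper names are illustrative only.

```python
def is_covered_by(measured, rule):
    """True when every measured (symptom, value) pair also appears in `rule`."""
    # merging `measured` into `rule` changes nothing only if it is contained in it
    return dict(rule, **measured) == rule


def is_covered_by_items(measured, rule):
    """The same test written with dictionary views."""
    return measured.items() <= rule.items()


rule = {'convulsion': 'yes', 'BP': '>=140/90'}      # hypothetical rule
print(is_covered_by({'convulsion': 'yes'}, rule))   # contained in the rule
print(is_covered_by({'convulsion': 'no'}, rule))    # value differs
print(is_covered_by({'headache': 'yes'}, rule))     # symptom not in the rule
```

Because a partial presentation matches, a single measured symptom can trigger a rule that lists several symptoms. Whether that is the intended clinical behavior, or whether strict equality is preferable, is a design decision for the guideline experts.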

A function for executing clinical pathways

In [None]:
def execute_clincalPathways(Generated_CP_LIST):
    # findings listed in the same order as the indexed entries of the indicators dictionary
    referralFindings = ['Convulsion', 'severe_Pre_eclampsia', 'severe_hypertension', 'vaginalBleeding', 'pretermLabour', 'PROM', 'unsurePregnancy']
    # 'notUrgentAttention' is currently disabled in indicators['treatableIndicators']
    treatableFindings = ['Convulsion', 'vaginalBleeding', 'pretermLabour', 'unsurePregnancy']

    # execute if there are any urgent conditions
    Generated_CP_LIST = clinicalPathway_generator(Generated_CP_LIST, presentedMeasuredSymptoms, indicators['urgentIndicators'][0], Urgent_Attention='Yes',CP='NC',finding='UrgentAttention') 

    # execute the referral clinical pathways
    for i, finding in enumerate(referralFindings):
        Generated_CP_LIST = clinicalPathway_generator(Generated_CP_LIST, presentedMeasuredSymptoms, indicators['referralIndicators'][i], Urgent_Attention='Yes',CP='R',finding=finding) 

    # execute the treatable clinical pathways
    for t, finding in enumerate(treatableFindings):
        Generated_CP_LIST = clinicalPathway_generator(Generated_CP_LIST, presentedMeasuredSymptoms, indicators['treatableIndicators'][t], Urgent_Attention='No',CP='T',finding=finding) 

    # execute the consideration clinical pathways, i.e. multi-disease clinical pathways (not implemented yet)

    return Generated_CP_LIST

In [None]:
Generated_CP_LIST = execute_clincalPathways(Generated_CP_LIST)

In [None]:
print("The generated CP output are: ")
#Generated_CP_LIST

In [None]:
#Generated_CP_LIST['Measured_Symptoms']
Generated_CP_LIST.sort_values(by=['Generated_CP'],ascending = True)

In [None]:
type(Generated_CP_LIST)


In [None]:
# A function that converts the measured symptom dicts into data frame format
def Transform_DictValue_to_df(Generated_CP_LIST):
    # one row per generated CP, one column per measured symptom
    # (DataFrame.append was removed in pandas 2.0, so the frame is built in one step)
    msdf = pd.DataFrame(list(Generated_CP_LIST['Measured_Symptoms']))
    return msdf

def filiter_proceed_cpdf(Generated_CP_LIST):
    cpdf = Generated_CP_LIST.filter(['Urgent_Attention','Generated_CP', 'Finding'], axis=1)
    return cpdf 
    
# merge the measured-symptom data frame and the processed CP data frame
def mergeProceed_msDF_and_cpDF(msdf, cpdf):
    concat_result = pd.concat([msdf, cpdf], sort=False, axis=1)
    concat_result.fillna('', inplace=True)
    return concat_result

In [None]:
msdf = Transform_DictValue_to_df(Generated_CP_LIST)
cpdf = filiter_proceed_cpdf(Generated_CP_LIST)

In [None]:
msdfCols = msdf.columns.tolist()
msdfCols

In [None]:
generatedCP_dataframe = mergeProceed_msDF_and_cpDF(msdf, cpdf)
df1 = generatedCP_dataframe
df1

In [None]:
Generated_CP_LIST.groupby('Generated_CP')['Finding'].value_counts().to_frame('Frequency')#summarised

In [None]:
def reterive_unquie_CPs(df):
    # count how often each distinct row (unique CP) occurs
    cols = df.columns.tolist()
    return df.groupby(cols).size().reset_index(name='Generated_CP_Freq')

In [None]:
reterive_unquie_CPs(df1)

In [None]:
df2 = df1
cols = df2.columns.tolist()
cols

In [None]:
df2['Generated_CP_Freq'] = 1
df2 = df2.groupby(cols).Generated_CP_Freq.count().reset_index()
df2

In [None]:
print(len(df1), "generated CPs", len(df2),"unique CPs")
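
The same frequency count can be expressed without pandas, which may help when checking the grouped result by hand. This is a minimal sketch on hypothetical rows; `count_unique_pathways` is not part of the notebook.

```python
from collections import Counter


def count_unique_pathways(rows):
    """Count identical rows; each row is a dict of column name -> value."""
    # a frozenset of items is hashable and ignores the order of the keys
    return Counter(frozenset(row.items()) for row in rows)


# Hypothetical generated pathways, for illustration only
rows = [
    {'convulsion': 'yes', 'Generated_CP': 'R', 'Finding': 'Convulsion'},
    {'Finding': 'Convulsion', 'convulsion': 'yes', 'Generated_CP': 'R'},
    {'BP': '>=140/90', 'Generated_CP': 'T', 'Finding': 'unsurePregnancy'},
]
counts = count_unique_pathways(rows)
print(len(rows), "generated CPs", len(counts), "unique CPs")
```

The first two rows differ only in key order, so they are counted as one unique pathway with frequency 2.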

In [None]:
df2['Freq'] = 1
df2.groupby(['Generated_CP','Finding', 'Urgent_Attention', 'BP']).Freq.count().reset_index()
# df2.drop_duplicates()

In [None]:
#pd.crosstab(index=df1['Generated_CP'], columns=df1['Finding'])

In [None]:
#df=Generated_CP_LIST.reset_index()
#df1['Freq']=df1.groupby(by='Generated_CP')['Generated_CP'].transform('count')
#df1

In [None]:
# append the measured symptoms of every generated CP to msdf
# (DataFrame.append was removed in pandas 2.0; pd.concat is used instead)
xx = pd.DataFrame(list(Generated_CP_LIST['Measured_Symptoms']))
msdf = pd.concat([msdf, xx], ignore_index=True, sort=False)
msdf.head()

In [None]:
### here its merge
cpdf = Generated_CP_LIST.filter(['Urgent_Attention','Generated_CP', 'Finding'], axis=1)
cpdf.head()

In [None]:
concat_result = pd.concat([msdf, cpdf], sort=False, axis=1)
concat_result.fillna('', inplace=True)
concat_result

In [None]:
xx = pd.DataFrame(list(Generated_CP_LIST['Measured_Symptoms'][27].items()),columns = ['MS','Value'])
xx

In [None]:
df = pd.DataFrame(Generated_CP_LIST['Measured_Symptoms'][27].items()).T
df

In [None]:
df1 = pd.DataFrame()
df1=df
df1

In [None]:
frames = [df, df1]
result = pd.concat(frames)
result

In [None]:
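for i in range(0, len(Generated_CP_LIST)):
    df = pd.DataFrame(Generated_CP_LIST['Measured_Symptoms'][i].items()) 
    # keep the result: pd.concat returns a new frame (DataFrame.append was removed in pandas 2.0)
    df1 = pd.concat([df1, df], ignore_index=True)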
df1

### Tracing Evidence

The goal is to dynamically validate the knowledge-based CPs generated above using evidence (data-driven), i.e. to validate and map the knowledge-based clinical pathways against local conditions or context by tracing the history.

As discussed in our mid-April meeting, we suggested two approaches for tracing evidence (or validating the knowledge-based pathways):

I.  During clinical pathway generation (i.e. while executing the clinicalPathway_generator function). This approach is fine if all the measured symptoms are available or presented. Otherwise, strategy II is recommended, because measured symptom results are often delayed in real-world implementations.

II. After the execution of the knowledge-based clinical pathways has finished. This method applies once the knowledge-based pathways have been executed, and is used to support and validate the decisions.
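
Strategy II can be sketched as a separate pass over the finished pathways. The example below uses plain Python structures instead of the notebook's data frames; the function name and the sample inputs are hypothetical. As in the evidence check used later in this notebook, evidence here only means that every measured symptom exists as a column of the historical records.

```python
def trace_evidence_after_generation(generated_pathways, history_columns):
    """Strategy II: annotate finished pathways with a 'Yes'/'No' evidence flag."""
    known = set(history_columns)
    annotated = []
    for pathway in generated_pathways:
        # evidence exists when all measured symptom names are known columns
        has_evidence = set(pathway['Measured_Symptoms']) <= known
        # dict(...) builds a copy, so the input pathways are left unchanged
        annotated.append(dict(pathway, Evidence='Yes' if has_evidence else 'No'))
    return annotated


# Hypothetical inputs, for illustration only
history_columns = ['convulsion', 'BP', 'headache', 'CP']
generated = [
    {'Measured_Symptoms': {'convulsion': 'yes'}, 'Generated_CP': 'R'},
    {'Measured_Symptoms': {'Fever': 'yes'}, 'Generated_CP': 'T'},
]
for row in trace_evidence_after_generation(generated, history_columns):
    print(row['Generated_CP'], row['Evidence'])
```

Keeping the tracing step separate means that pathway generation never has to wait for delayed measurements or for the historical data to load.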



In [None]:
hcdata = pd.read_csv("Preg2020-Table 1_Updated.csv", encoding='utf-8')

#### Check whether evidence is found for the generated pathways

In [None]:
#Copy the columns for the health center datasets
pres = pd.DataFrame(columns=hcdata.columns)
#drop the target class
pres = pres.drop('CP', axis=1)
pres

In [None]:
def clinicalPathway_generator_MethodII(Generated_CP_LIST,presentedMeasuredSymptoms, indicators, Urgent_Attention, CP, finding):
    """
        Generate the clinical pathways based on the measured symptoms and trace the evidence for every match.
        The indicators are used to validate the measured symptoms and as exit criteria.
    """
    for index in range(0, len(presentedMeasuredSymptoms)):
        for j in range(0, len(indicators)):
            if presentedMeasuredSymptoms[index] == indicators[j]: # check exact matching, exit criterion
                temp = presentedMeasuredSymptoms[index]
                Prob = ''  # stays empty when no evidence is found
                
                # trace the evidence from the history for the matched measured symptoms
                if TracingCP.check_Evidence(pres, temp) == 'Yes':
                    Prob = TracingCP.trace_and_predict_ProbabiliticCP(presented)
                    Evidence = 'Yes'
                else:
                    Evidence = 'No'
                    TracingCP.insert_unseen_measuredSymptoms(historicalRecords)
                # use the column names of the decision table (Generated_CP, Prior_Prob)
                result = pd.Series(data={'Measured_Symptoms':presentedMeasuredSymptoms[index],'Urgent_Attention':Urgent_Attention,'Generated_CP':CP,'Finding':finding, 'Evidence':Evidence, 'Prior_Prob':Prob,'Severity':'', 'Cost':'', 'Weight':''}, name=len(Generated_CP_LIST))
                # DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
                Generated_CP_LIST = pd.concat([Generated_CP_LIST, result.to_frame().T])
    return Generated_CP_LIST  

In [None]:
class Update_CPModel():
    """
        Re-train and update the CP model based on new records 
    """
    def splitTargetClass(data):
        X = data.drop('CP', axis=1)
        y = data['CP']
        return X,y
    
    
    def re_Train_CPModel(historicalRecords):
        """
            SMOTE and 10-fold cross-validation gave reasonable performance during experimentation 
        """
        
        # fill the missing values with a "not available" marker
        historicalRecords=TracingCP.fill_missing_values(historicalRecords)
        
        # encode the data
        historicalRecords = historicalRecords.apply(LabelEncoder().fit_transform)
        
        # split the class
        X,y = Update_CPModel.splitTargetClass(historicalRecords)
        
        # handle the data imbalance with SMOTE over-sampling followed by Tomek-link cleaning
        os_us = SMOTETomek(sampling_strategy=0.5)
        # fit_resample replaces fit_sample, which recent imbalanced-learn releases no longer provide
        X_smote_res, y_smote_res = os_us.fit_resample(X, y)

        # run the cross-validation once and reuse the scores
        scores = cross_val_score(GaussianNB(), X_smote_res, y_smote_res, cv=10)
        print("10-Fold Cross-Validation: NB Accuracy with SMOTE dataset", scores, 
                     "", "Average", np.average(scores))
        
        #Save the updated model
        """                
        Update_CPModel.save_the_Updated_CPModel(CPModel)
        """
    
    def save_the_Updated_CPModel(CPModel, *args):
        # Save the model to a file in the current working directory
        Pkl_Filename = "Pickle_CP_Model.pkl"  

        with open(Pkl_Filename, 'wb') as file:  
            pickle.dump(CPModel, file)

In [None]:
class TracingCP:
    """
        Trace evidence for the generated CPs in the historical health center records
    """
    
    def check_Evidence(pres, measuredSymptoms, *args):
        """
            Return "Yes" when every measured symptom is a column of the existing records, otherwise "No"
        """
        # retrieve the column names (the measured symptom names)
        getColumnList = list(measuredSymptoms)
        # check whether the retrieved columns are found in the existing records
        if set(getColumnList).issubset(pres.columns):
            Flag = "Yes"
        else:
            Flag = "No"
        return Flag
    

    def load_clinicalPathways_model(*args):
        """
            Load the clinical pathway model back from file. The model was trained and saved to the file. It is
            retrieved for predicting and calculating the posterior probability.
            Returns None when the file is missing or empty.
        """
        Pkl_Filename = "Pickle_CP_Model.pkl"
        if os.path.exists(Pkl_Filename):  
            try: 
                with open(Pkl_Filename, 'rb') as file:  
                    Pickled_CP_Model = pickle.load(file)
                return Pickled_CP_Model
            except EOFError:
                print("The pickled model file is empty")
        return None
    
    
    def trace_and_predict_ProbabiliticCP(presented, *args):
        """
            Predict the CP class for the presented symptoms with the saved model and
            return the predicted class, the class priors and the accuracy
        """
        # fill the missing values 
        presented = TracingCP.fill_missing_values(presented)

        # encode the presented symptoms for calculating the probability and the target
        encoded_presented = presented.apply(LabelEncoder().fit_transform)
        
        # get the clinical pathway model
        ClinicalPathwayModel = TracingCP.load_clinicalPathways_model()

        # predict the CP class with the loaded model
        y_pred = ClinicalPathwayModel.predict(encoded_presented)
        
        # NOTE: the model is scored against its own predictions, so this value is always 1.0;
        # a meaningful accuracy needs held-out labels, e.g. ClinicalPathwayModel.score(X_test, y_test)
        accuracy = ClinicalPathwayModel.score(encoded_presented, y_pred)

        # get the class priors
        priorClass = ClinicalPathwayModel.class_prior_
        
        # predictive probabilities 
        print("Pred_CP_Class:",y_pred,
              "Pred_Prob:", ClinicalPathwayModel.predict_proba(encoded_presented).mean(),
              "predict_log_proba:", ClinicalPathwayModel.predict_log_proba(encoded_presented).mean(),
              "Accuracy:",accuracy)
        
        return y_pred,priorClass,accuracy
    
    def insert_unseen_measuredSymptoms(historicalRecords,*args):
        """
            Insert unseen measured symptoms into the health center records, i.e.
            append them to the historical records for future tracing
        """
        # i is the index of the current row of Generated_CP_LIST (set by the calling loop)
        newRecord = pd.DataFrame([Generated_CP_LIST['Measured_Symptoms'][i]])
        # DataFrame.append was removed in pandas 2.0; pd.concat adds the new record instead
        historicalRecords = pd.concat([historicalRecords, newRecord], ignore_index=True)
        
        # update the saved model using the new unseen records
        Update_CPModel.re_Train_CPModel(historicalRecords)
        
        return historicalRecords.shape

    def fill_missing_values(presented):
        # replace the missing values in every column with the marker "Notavailable"
        presented = presented.fillna("Notavailable")
        return presented

In [None]:
TracingCP.load_clinicalPathways_model()
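
The save and load steps form a pickle round trip. The sketch below shows that round trip on a temporary file, with a plain dictionary standing in for the trained model; `save_model`, `load_model` and the file name are hypothetical. Note that pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize `model` to `path`."""
    with open(path, 'wb') as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Return the unpickled object, or None when the file is missing or empty."""
    if not os.path.exists(path):
        return None
    try:
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    except EOFError:
        return None


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo_model.pkl')
    print(load_model(path))                          # the file does not exist yet
    save_model({'class_prior': [0.5, 0.5]}, path)    # stand-in for a trained model
    print(load_model(path))
```

Returning `None` for both failure cases lets the caller test the result once before calling `predict` on it.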

####  Check whether evidence is found or not

In [None]:
for i in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    #print(Generated_CP_LIST['Measured_Symptoms'][i])
    Evidence = TracingCP.check_Evidence(pres, Generated_CP_LIST['Measured_Symptoms'][i])
    if Generated_CP_LIST['Evidence'][i] == '':
        # .loc writes to the decision table itself rather than to a temporary copy
        Generated_CP_LIST.loc[i, 'Evidence'] = Evidence

In [None]:
Generated_CP_LIST

#### Tracing Probability for the Measured Symptoms

In [None]:
from sklearn.preprocessing import LabelEncoder
#encoded_data = hcdata.apply(LabelEncoder().fit_transform)

In [None]:
#Copy the columns for the health center datasets
presented = pd.DataFrame(columns=hcdata.columns)
#drop the target class
presented = presented.drop('CP', axis=1)
presented

In [None]:
for i in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    if Generated_CP_LIST['Evidence'][i] == 'Yes':
        #Make sure the new row is empty
        presented = presented.iloc[0:0]
        
        #get the new measured symptoms 
        new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][i])
        
        #append the measured symptoms and prepare the new rows for predictions 
        presented = presented.append(new_row,ignore_index=True)
        presented.reset_index(inplace=True, drop=True)


        #fill the missing values 
        presented = TracingCP.fill_missing_values(presented)
        
        # Predic the CP for the measured symptoms
        pred_CP_Class, priorClass_Prob,accuracy = TracingCP.trace_and_predict_ProbabiliticCP(presented)
        
        #assign the new prediction class on the decision table 
        Generated_CP_LIST['Prior_Prob'][i] = priorClass_Prob.round(2)
        
        if pred_CP_Class == 1:
            Generated_CP_LIST['Pred_CP'][i] = 'T'
        else:
            Generated_CP_LIST['Pred_CP'][i] = 'R'
        
        Generated_CP_LIST['Accuracy'][i] = accuracy
    else:
        # Evidence is not found, based on the measured Symptoms
        # Add on the historical record; train and update the saved model
        historicalRecords=hcdata
        #TracingCP.insert_unseen_measuredSymptoms(historicalRecords)
        #NewpresValue = Generated_CP_LIST['Measured_Symptoms'][i]
        
        #test = test.append(NewpresValue,ignore_index=True)

In [None]:
Generated_CP_LIST

### CP Ranking

In [None]:
"""
    if IS Evidence(TRUE) then
        Visualize the CP ranking based on Evidence ; Display the ranking difference, If there is any ;
    else if IS choice Found(TRUE) then
        Process the CP ranking based on the choice such as Probability, Severity, or Cost ;
    else if IS manualWeight Found then
        Visualize the CP ranking based on the manual weight refinement ;
    else
        Process accordingly (default processing);
"""

class clinicalPathways_Ranking:
    #Generated_CP_LIST['Measured_Symptoms']
    
    def with_Evidence(Generated_CP_LIST, *args):
        #Ranking based on Evidence
        return Generated_CP_LIST[Generated_CP_LIST['Evidence']=='Yes']
    
    def without_Evidence(Generated_CP_LIST, *args):
        #print("Generated Clinical Pathways with no Evidence")
        return Generated_CP_LIST[Generated_CP_LIST['Evidence'] == 'No']

    def defaultRanking(Generated_CP_LIST, *args):
        #Default processing: First, the Treatable CP. Second, Referral CP. Lastly, NC(Not Classified)
        return Generated_CP_LIST.sort_values(by=['CP'],ascending = False)
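
The branch order of the ranking pseudocode above can be sketched as a single dispatcher. This is a minimal sketch under stated assumptions, not the notebook's implementation: the function name `rank_pathways`, the `choice` and `manual_weights` parameters, and the toy table are hypothetical; only the column names `Evidence`, `CP`, and `Prob` follow the notebook.

```python
import pandas as pd

def rank_pathways(cp_table, choice=None, manual_weights=None):
    """Hypothetical dispatcher that mirrors the branch order of the pseudocode."""
    if (cp_table['Evidence'] == 'Yes').any():
        # Evidence found: list the pathways that have evidence first
        return cp_table.sort_values(by='Evidence', ascending=False, kind='mergesort')
    if choice is not None and choice in cp_table.columns:
        # A ranking criterion was chosen (e.g. 'Prob'): highest value first
        return cp_table.sort_values(by=choice, ascending=False, kind='mergesort')
    if manual_weights is not None:
        # Manual weights per CP class (assumed format: {'R': 2, 'T': 1})
        weights = cp_table['CP'].map(manual_weights).fillna(0)
        order = weights.sort_values(ascending=False, kind='mergesort').index
        return cp_table.loc[order]
    # Default: Treatable (T), then Referral (R), then Not Classified (NC)
    return cp_table.sort_values(by='CP', ascending=False, kind='mergesort')

# Toy decision table (illustrative values only)
toy = pd.DataFrame({
    'CP': ['R', 'T', 'NC'],
    'Evidence': ['No', 'No', 'No'],
    'Prob': [0.2, 0.9, 0.5],
})
by_prob = rank_pathways(toy, choice='Prob')
by_default = rank_pathways(toy)
by_weight = rank_pathways(toy, manual_weights={'R': 2, 'T': 1})
```

Keeping the branches in one function makes the priority between evidence, user choice, and manual weights explicit, which the three separate class methods do not.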

In [None]:
RankingwithEvidence = clinicalPathways_Ranking.with_Evidence(Generated_CP_LIST)
RankingwithEvidence

In [None]:
RankingwithoutEvidence = clinicalPathways_Ranking.without_Evidence(Generated_CP_LIST)
RankingwithoutEvidence

### CP Pruning

In [None]:
"""
    if generated CP LIST Is EMPTY then
        Go to Algorithm-5 and adjust the Criteria (fall back on CP ranking). Adjust and eliminate one ;
    else if too much pruning (TRUE) then 
        Display eventual warning ;
    else
        Process the pruning based on the pruning parameters ;
        If it FULFILL the endorsed indicator, EXIT;
    end
"""
class clinicalPathways_Pruning:
    def default(*args):
        #the goal is to filter out referral pathways quickly to minimize delay
        return Generated_CP_LIST.loc[(Generated_CP_LIST['Evidence'] == 'Yes') & (Generated_CP_LIST['Generated_CP'] == 'R')]
    
    def using_Urgent_Attention(*args):
        #Quickly identifies urgent attention based on CGs (Gold Standard)
        try:
            return Generated_CP_LIST[Generated_CP_LIST['Urgent_Attention'] == 'Yes']
        except Exception as e:
            print('There was an error in your urgent attention value, The value is empty :{0}'.format(e))
    
    def using_Evidence(*args):
        try:
            return Generated_CP_LIST[Generated_CP_LIST['Evidence'] == 'No']
        except Exception as e:
            print('There was an error in your evidence value, The value is empty :{0}'.format(e))
    
    def using_Prob(*args):
        return Generated_CP_LIST[Generated_CP_LIST['Prob'] >= 0.5]
    
    def using_Severity(*args):
        try:
            return Generated_CP_LIST[Generated_CP_LIST['Severity'] >= 0.5]
        except Exception as e:
            print('There was an error in your severity value, The value is empty:{0}'.format(e))
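
The pruning pseudocode also mentions an empty list and a "too much pruning" warning, which the class above does not yet handle. A minimal sketch of those two checks follows. The helper name `prune_pathways`, the `min_fraction` parameter, and its default of 0.2 are assumptions made for illustration, not values taken from the notebook.

```python
import warnings
import pandas as pd

def prune_pathways(cp_table, column, threshold, min_fraction=0.2):
    """Hypothetical helper: keep rows whose `column` value is >= threshold."""
    if cp_table.empty:
        # Nothing to prune: the caller should fall back on the ranking step
        return cp_table
    pruned = cp_table[cp_table[column] >= threshold]
    if len(pruned) < min_fraction * len(cp_table):
        # Too much pruning: warn instead of failing silently
        warnings.warn("Pruning removed most pathways; consider relaxing the criteria.")
    return pruned

# Toy decision table (illustrative values only)
toy = pd.DataFrame({'CP': ['R', 'T', 'R'], 'Prob': [0.9, 0.4, 0.6]})
kept = prune_pathways(toy, 'Prob', 0.5)
```

With this toy table, `kept` holds the two rows whose `Prob` is at least 0.5.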

In [None]:
clinicalPathways_Pruning.default()

In [None]:
evd = Generated_CP_LIST.loc[(Generated_CP_LIST['Evidence']>='Yes') & (Generated_CP_LIST['Generated_CP'] == 'R')& (Generated_CP_LIST['Pred_CP'] == 'T')]
evd

### Output: Match Analysis

In [None]:
"""
    The exact match between dynamically generated CP and data-driven CP and their variation.
"""
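def macthAnalysis(output, KB_CP, DD_CP):
    # rows with evidence where the knowledge-based CP equals KB_CP and the data-driven CP equals DD_CP
    evdR = output.loc[(output['Evidence'] == 'Yes') & (output['Generated_CP'] == KB_CP) & (output['Pred_CP'] == DD_CP)] 
    return evdR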
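
To see all combinations of knowledge-based and data-driven pathways at once, rather than one pair per call, a cross-tabulation can be used. The sketch below assumes the column names `Generated_CP` and `Pred_CP` from the decision table; the toy values are made up for illustration.

```python
import pandas as pd

# Toy decision table (illustrative values only)
toy = pd.DataFrame({
    'Generated_CP': ['R', 'R', 'T', 'T', 'R'],
    'Pred_CP':      ['R', 'T', 'T', 'T', 'T'],
})

# Rows: knowledge-based CP; columns: data-driven CP; cells: number of pathways
agreement = pd.crosstab(toy['Generated_CP'], toy['Pred_CP'])

# Fraction of pathways on which both approaches agree
match_rate = (toy['Generated_CP'] == toy['Pred_CP']).mean()
```

The off-diagonal cells of `agreement` are the variations between the two approaches, for example `agreement.loc['R', 'T']` counts pathways the guidelines refer but the model predicts as treatable.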

In [None]:
macthAnalysis(Generated_CP_LIST, "R","T")

In [None]:
Generated_CP_LIST

In [None]:
len(Generated_CP_LIST['Measured_Symptoms'][7])

In [None]:
for i in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    if Generated_CP_LIST['Evidence'][i] == 'Yes':
        # print(Generated_CP_LIST['Measured_Symptoms'][i])
        new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][i])
        presented = presented.append(new_row,ignore_index=True)
        presented.reset_index(inplace=True, drop=True)
        TracingCP.trace_and_predict_ProbabiliticCP(presented)

In [None]:
Generated_CP_LIST

In [None]:
new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][4])
presented = presented.append(new_row,ignore_index=True)
presented.reset_index(inplace=True, drop=True)

In [None]:
TracingCP.trace_and_predict_ProbabiliticCP(presented)

In [None]:
for i in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    #print(check_Column(Generated_CP_LIST['Measured_Symptoms'][i]))
    Evidence = TracingCP.check_Evidence(pres, Generated_CP_LIST['Measured_Symptoms'][i])
    Generated_CP_LIST['Evidence'][i] = Evidence
    """
        If evidence is found in the historical record using the measured symptoms,
        retrieve the pre-trained clinical pathway model to predict and trace the probability for the presented measured symptoms.
    """
    if Evidence == 'Yes' and Generated_CP_LIST['Prob'][i] == '': 
        Generated_CP_LIST['Prob'][i]=y_pred
        #call a function for tracing the probability 
    else:
        """
            Add the measured symptoms to the existing dataset
            Train and update the Clinical Pathway Model
        """
        #call the function for adding the unseen records
        pres = TracingCP.insert_unseen_measuredSymptoms(pres, Generated_CP_LIST['Measured_Symptoms'][i])
        
        #pres = pres.append(Generated_CP_LIST['Measured_Symptoms'][i], ignore_index=True)
        # re-call the clinical pathway model

In [None]:
pres = TracingCP.fill_missing_values(pres)

In [None]:
pres

In [None]:
pres = pres.iloc[0:0]
pres= pres.drop(['convulsion', 'blurredVision'], axis=1)
pres

In [None]:
pre

In [None]:
pre= pre.drop(['convulsion', 'blurredVision'], axis=1)

In [None]:
pre[0:0]

In [None]:
Generated_CP_LIST['Measured_Symptoms'][5]

In [None]:
def dyna_join(df, positions):
    return pd.concat([df, df.iloc[:, positions].apply(','.join, 1).rename('new_col')], axis=1)
dyna_join(pre, [0, -2])
pre[0:0]

In [None]:
for i in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    #print(Generated_CP_LIST['Measured_Symptoms'][i])
    Evidence = check_Evidence(Generated_CP_LIST['Measured_Symptoms'][i])
    if Evidence=='No':
        """
            #ADD: if the measured symptoms are not available, 
            #append them to the historical records for future tracing
        """
        pre = pre.append(Generated_CP_LIST['Measured_Symptoms'][i], ignore_index=True)
        """
            Fill a value of not available ....
        """
        fill_missing_values(pre)

In [None]:
pre = insert_unseen_measuredSymptoms(hcdata, ms)
pre

In [None]:
"""    
    If the measured symptoms are found in the historical record: 
        Retrieve the measured symptoms from Generated_CP_LIST 
        Call and retrieve the pre-trained clinical pathway model for prediction and tracing the probability.
    else
        Add the measured symptoms to the historical record as new information 
        Train and update the Clinical Pathway Model 
        Save the Clinical Pathway Model
"""

# If all the columns are available
pre = pre.append(Generated_CP_LIST['Measured_Symptoms'][1], ignore_index=True)
pre['BP']

In [None]:
Generated_CP_LIST

In [None]:
def fill_missing_values(presented):
    #Fill the missing values
    for col in presented.columns:
        # replace NA values in each column with the placeholder "Notavailable"
        presented[col] = presented[col].fillna("Notavailable")
    # return the whole DataFrame (not only the last column) so callers can reassign it
    return presented

In [None]:
fill_missing_values(pre)
pre

In [None]:
test = pd.DataFrame(columns=hcdata.columns)

Generated_CP_LIST['Measured_Symptoms'][0]

In [None]:
test

In [None]:
#ADD: if the column is not available, append the record to the historical records
test = test.append(Generated_CP_LIST['Measured_Symptoms'][0], ignore_index=True)
fill_missing_values(test)

In [None]:
test

In [None]:
ClinicalPathwayModel = load_clinicalPathways_model()
#ClinicalPathwayModel

In [None]:
#delete all rows in pandas
pre=pre[0:0]
pre

In [None]:
#Encode the presented symptoms for calculating the probability and the target
encoded_presented = pre.apply(LabelEncoder().fit_transform)
#print(encoded_presented)

#Predict using the loaded clinical pathway model
y_pred = ClinicalPathwayModel.predict(encoded_presented)
y_pred

In [None]:
def check_Evidence(measuredSymptoms, *args):
    """
        Return "Yes" if every measured symptom is a column of the
        historical record `pres`, otherwise "No".
    """
    # Retrieve the symptom names (the dictionary keys)
    getColumnList = list(measuredSymptoms)
    # Check whether the retrieved names are found in the existing record
    if set(getColumnList).issubset(pres.columns):
        Flag = "Yes"
    else:
        Flag = "No"
    return Flag

In [None]:
#define a function for checking evidence 
def getEvidence(hcdata,measuredSymptoms, Evidence='No'):
    """
        This function checks whether there is evidence for the measured symptoms in the historical records
    """
    if len(measuredSymptoms)>1:
        for key, val in measuredSymptoms.items():
            if key in hcdata.columns:
                Evidence='Yes'
    else:
        key, val = next(iter(measuredSymptoms.items())) 
        if key in hcdata.columns:
            Evidence='Yes'
    return key, val, Evidence
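
Note that `getEvidence` reports `'Yes'` as soon as any one key is a known column, and it returns only the last key it visited. When several symptoms are measured together, it can help to see which ones are covered. The helper below is a hypothetical sketch of that idea; the name `evidence_coverage` and the `'Partial'` label are assumptions, not part of the notebook.

```python
def evidence_coverage(measured_symptoms, known_columns):
    """Hypothetical helper: report which measured symptoms exist as columns."""
    found = [k for k in measured_symptoms if k in known_columns]
    missing = [k for k in measured_symptoms if k not in known_columns]
    if found and not missing:
        status = 'Yes'        # every symptom has historical evidence
    elif found:
        status = 'Partial'    # only some symptoms have historical evidence
    else:
        status = 'No'         # no symptom has historical evidence
    return status, found, missing

# Illustrative call with made-up column names
status, found, missing = evidence_coverage(
    {'headache': 'Yes', 'BP': 'High'}, ['headache', 'fever'])
```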

In [None]:
for col in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    key, val, Evidence= getEvidence(hcdata, Generated_CP_LIST['Measured_Symptoms'][col])
    Generated_CP_LIST['Evidence'][col] = Evidence

In [None]:
#Result
Generated_CP_LIST

In [None]:
# Select those only have evidence
#Generated_CP_LIST[Generated_CP_LIST['Evidence']=='Yes']

#Generated_CP_LIST[Generated_CP_LIST['Evidence'] == 'No']
hcdata.head(1)

In [None]:
def fill_missing_values(presented):
    #Fill the missing values
    for col in presented.columns:
        # replace NA values in each column with the placeholder "Notavailable"
        presented[col] = presented[col].fillna("Notavailable")
    # return the whole DataFrame (not only the last column) so callers can reassign it
    return presented

In [None]:
#Copy the columns for the health center datasets
pre = pd.DataFrame(columns=hcdata.columns)
#drop the target class
pre = pre.drop('CP', axis=1)

In [None]:
#delete all rows in pandas
pre=pre[0:0]
pre

In [None]:
if set(getColumnList).issubset(pre.columns):
    print("Yes")
else:
    print("No")

In [None]:
for col in temp:
    #print(col, temp[col])
    if col in pre:
        Flag = False
    else:
        Flag = True

In [None]:
#check whether the given column is found the historical record or not
Flag

In [None]:
# if all the columns are available
pre = pre.append(temp, ignore_index=True)
pre

In [None]:
temp = Generated_CP_LIST['Measured_Symptoms'][5]
temp

In [None]:
for col in Generated_CP_LIST['Measured_Symptoms']:
    pre = pre.append(col)

In [None]:
pre

In [None]:
pres = pres.append(temp, ignore_index=True)

In [None]:
# Converting into list of tuple 
#list = [(k, v) for k, v in temp.items()] 
for i in temp:
    print(i, temp[i])

In [None]:
for i in Generated_CP_LIST['Measured_Symptoms'][5]:
    print(i, Generated_CP_LIST['Measured_Symptoms'][5][i])

In [None]:
#pres
for col in Generated_CP_LIST['Measured_Symptoms']:
    #print(col)
    for i in col:
        #print(i, col[i])
        if i in pres.columns:
            # the symptom is present in the historical record
            print(i, col[i])

In [None]:
pres

In [None]:
new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][4])
presented = presented.append(new_row,ignore_index=True)
presented.reset_index(inplace=True, drop=True)

# Call the function to fill the missing values 
fill_missing_values(presented)

#Encode the presented symptoms for calculating the probability and the target
encoded_presented = presented.apply(LabelEncoder().fit_transform)
#print(encoded_presented)

#Predict using the loaded clinical pathway model
y_pred = ClinicalPathwayModel.predict(encoded_presented)
y_pred

In [None]:
measuredSymptoms = Generated_CP_LIST['Measured_Symptoms'][5]
#print(len(measuredSymptoms))
#new_row = pd.Series(measuredSymptoms)
#print(new_row)
#new_row['headache'], new_row['BP']
temp = measuredSymptoms.keys()
temp

In [None]:
#Copy the columns for the health center datasets
pres = pd.DataFrame(columns=hcdata.columns)
#drop the target class
pres = pres.drop('CP', axis=1)
pres

In [None]:
for col in range (0, len(Generated_CP_LIST['Measured_Symptoms'])):
    # print(Generated_CP_LIST['Measured_Symptoms'][col])
    temp = pd.Series(Generated_CP_LIST['Measured_Symptoms'][col])

In [None]:
new_row.values

In [None]:
#Create a dataframe for building the model 
for col in range (0, len(Generated_CP_LIST['Measured_Symptoms'])):
    new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][col])
    print(new_row)
    presented = presented.append(new_row,ignore_index=True)
    presented.reset_index(inplace=True, drop=True)
    print(new_row)
    # Call the function to fill the missing values 
    fill_missing_values(presented)
    #Print(presented)
    
    #Encode the presented symptoms for calculating the probability and the target
    encoded_presented = presented.apply(LabelEncoder().fit_transform)
    #print(encoded_presented)
    
    #predict the target class
    y_pred = CPmodel.predict(encoded_presented)
    y_pred

In [None]:
class traceEvidence1:
    #Probabilistic Evidence Table 
    def calculate_Probability(key):
        """
            Calculating the probability (referral or treated) based on historical evidence
        """
        df_s = hcdata.groupby(key)['CP'].value_counts() / hcdata.groupby(key)['CP'].count()
        df_f = df_s.reset_index(name='Probability')
        probEvidenceTable = df_f[df_f.values == 'Yes']
        return probEvidenceTable
    
    #get the calculated probability
    def get_R_Probability(key):
        probEvidenceTable = traceEvidence1.calculate_Probability(key)
        if 'Refer' in probEvidenceTable.CP.values:
            prob = probEvidenceTable[probEvidenceTable['CP'] == 'Refer'][['Probability']].values
        else: 
            prob = 0
        return prob
    def get_T_Probability(key):
        probEvidenceTable = traceEvidence1.calculate_Probability(key)
        # a bare membership test; wrapping it in a list would always be truthy
        if 'Treated' in probEvidenceTable.CP.values:
            prob = probEvidenceTable[probEvidenceTable['CP'] == 'Treated'][['Probability']].values
        else:
            prob = 0
        return prob

In [None]:
for col in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    key, val, Evidence= getEvidence(hcdata, Generated_CP_LIST['Measured_Symptoms'][col])
    print(key)
    if key in hcdata.columns:
        if Generated_CP_LIST['CP'][col] == 'R':
            prob = traceEvidence1.get_R_Probability(key)
        if Generated_CP_LIST['CP'][col] == 'T':
            prob = traceEvidence1.get_T_Probability(key)
    else:
        prob=0
    #Generated_CP_LIST['Prob'][col] = prob

In [None]:
#traceEvidence1.get_R_Probability('headache')
key = 'headache'
#hcdata.groupby(key)['CP'].value_counts() / hcdata.groupby(key)['CP'].count()

In [None]:
#problem where there is a logical and
Generated_CP_LIST

for col in range (0, len(Generated_CP_LIST['Measured_Symptoms'])):
    presented = pd.DataFrame(columns=hcdata.columns)
    new_row = pd.Series(data=presentedMeasuredSymptoms[4])
    presented = presented.append(new_row,ignore_index=True)
    presented.reset_index(inplace=True, drop=True)
    presented

# Tracing Evidence

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

#### Tracing the evidence based on the measured symptoms  

In [None]:
#hcdata = pd.read_csv("Preg2020-Table 1.csv", encoding='utf-8')
hcdata = pd.read_csv("Preg2020-Table 1_Updated.csv", encoding='utf-8')

In [None]:
hcdata.shape

In [None]:
# Group data by CP and summarize disease name 
hcdata.groupby(["CP"])[["CP"]].describe()

In [None]:
# Function to calculate missing values by column
def missing_values_table(df):
   
    # Total missing values
    mis_val = df.isnull().sum()
    
    # Percentage of missing values
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    
    # Make a table with the results
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    
    # Rename the columns
    mis_val_table_ren_columns = mis_val_table.rename(
    columns = {0 : 'Missing Values', 1 : '% of Total Values'})
    
    # Sort the table by percentage of missing values, descending
    # .iloc[:, 1] != 0: keep only columns whose missing percentage is not zero
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:,1] != 0].sort_values(
    '% of Total Values', ascending=False).round(2)  # round(2), keep 2 digits
    
    # Print some summary information
    print("Your selected dataframe has {} columns.".format(df.shape[1]) + '\n' + 
    "There are {} columns that have missing values.".format(mis_val_table_ren_columns.shape[0]))
    
    # Return the dataframe with missing information
    return mis_val_table_ren_columns

In [None]:
missing_values_table(hcdata)

In [None]:
def fill_missing_values(presented):
    #Fill the missing values
    for col in presented.columns:
        # replace NA values in each column with the placeholder "Notavailable"
        presented[col] = presented[col].fillna("Notavailable")
    # return the whole DataFrame (not only the last column) so callers can reassign it
    return presented

In [None]:
fill_missing_values(hcdata)
missing_values_table(hcdata)

In [None]:
#old data 
from sklearn.preprocessing import LabelEncoder
encoded_data = hcdata.apply(LabelEncoder().fit_transform)
encoded_data.head(2)
#len(encoded_data)

In [None]:
from sklearn.naive_bayes import GaussianNB
import numpy as np

In [None]:
def splitTargetClass(data):
    X = data.drop('CP', axis=1)
    y = data['CP']
    return X,y

In [None]:
X,y=splitTargetClass(encoded_data)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,random_state=42)
CPmodel = GaussianNB()
CPmodel.fit(X_train,y_train)
y_pred = CPmodel.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)*100

In [None]:
accuracy

#### Save the model for future use

In [None]:
#Import pickle Package
import pickle

In [None]:
# Save the model to a file in the current working directory
Pkl_Filename = "Pickle_CP_Model.pkl"  

with open(Pkl_Filename, 'wb') as file:  
    pickle.dump(CPmodel, file)

In [None]:
# Load the Model back from file
with open(Pkl_Filename, 'rb') as file:  
    Pickled_CP_Model = pickle.load(file)

Pickled_CP_Model
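
The save and load cells above can be combined into a "load if present, otherwise build and save" pattern, which matches the idea of retrieving a saved CP model when one exists. This is a minimal sketch: the helper `load_or_create` is hypothetical, and a plain dictionary stands in for the trained model so the example runs without a dataset.

```python
import os
import pickle
import tempfile

def load_or_create(path, factory):
    """Hypothetical helper: load a pickled object, or build and save it first."""
    if os.path.exists(path):
        with open(path, 'rb') as fh:
            return pickle.load(fh)
    obj = factory()
    with open(path, 'wb') as fh:
        pickle.dump(obj, fh)
    return obj

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'Pickle_CP_Model.pkl')
    # First call: no file yet, so the factory runs and the result is saved
    first = load_or_create(model_path, lambda: {'trained': True})
    # Second call: the file exists, so the factory is ignored
    second = load_or_create(model_path, lambda: {'trained': False})
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.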

### Tracing the evidence based on the new measured symptoms for predicting CP class (Referral or Treated) 

In [None]:
def fill_missing_values(presented):
    #Fill the missing values
    for col in presented.columns:
        # replace NA values in each column with the placeholder "Notavailable"
        presented[col] = presented[col].fillna("Notavailable")
    # return the whole DataFrame (not only the last column) so callers can reassign it
    return presented

In [None]:
#Copy the columns for the health center datasets
presented = pd.DataFrame(columns=hcdata.columns)
#drop the target class
presented = presented.drop('CP', axis=1)
presented

In [None]:
ms = dict(Generated_CP_LIST['Measured_Symptoms'])

In [None]:
#Generated_CP_LIST

In [None]:
#calculating evidence for the presented measured symptoms
Generated_CP_LIST['Measured_Symptoms'][4]

In [None]:
Generated_CP_LIST['Measured_Symptoms']

In [None]:
def checkEvidence(measuredSymptoms, Evidence='No'):
    """
        This function checks whether there is an evidence or not for the presented symptoms
    """
    if len(measuredSymptoms)>1:
        for key, val in measuredSymptoms.items():
            if key in hcdata.columns:
                Evidence='Yes'
    else:
        key, val = next(iter(measuredSymptoms.items())) 
        if key in hcdata.columns:
            Evidence='Yes'
    return key, val, Evidence

In [None]:
key, val, evidence = checkEvidence(Generated_CP_LIST['Measured_Symptoms'][4], Evidence='No')
key, val, evidence

In [None]:
hcdata.groupby(key).size().div(len(hcdata))

In [None]:
df_s = hcdata.groupby(key)['CP'].value_counts() / hcdata.groupby(key)['CP'].count()
df_f = df_s.reset_index(name='Probability')
probEvidence = df_f[df_f.values == 'Yes']
probEvidence

In [None]:
#probEvidence['abdominalPain']

In [None]:
#probEvidence.get_value(5, 'CP')
treatedPro = probEvidence[probEvidence['CP'] == 'Treated'][['Probability']].values
referralPro = probEvidence[probEvidence['CP'] == 'Refer'][['Probability']].values

treatedPro,referralPro

In [None]:
key

In [None]:
hcdata.groupby(key)['CP'].value_counts() / hcdata.groupby(key)['CP'].count()
df_s.reset_index(name='Probability')

In [None]:
key, val # Explore the referral and treatable evidence [Prior probability]

In [None]:
class traceEvidence:
    #Probabilistic Evidence Table 
    def calculate_Probability(key):
        """
            Calculating the probability (referral or treated) based on historical evidence
        """
        df_s = hcdata.groupby(key)['CP'].value_counts() / hcdata.groupby(key)['CP'].count()
        df_f = df_s.reset_index(name='Probability')
        probEvidenceTable = df_f[df_f.values == 'Yes']
        return probEvidenceTable
    
    #get the calculated probability
    def get_Probability(key, *args):
        # use the table computed for this key, not the global probEvidence
        probEvidenceTable = traceEvidence.calculate_Probability(key)
        # default to 0 when a class has no historical evidence for this symptom
        referralPro, treatedPro = 0, 0
        if 'Treated' in probEvidenceTable.CP.values:
            treatedPro = probEvidenceTable[probEvidenceTable['CP'] == 'Treated'][['Probability']].values
        if 'Refer' in probEvidenceTable.CP.values:
            referralPro = probEvidenceTable[probEvidenceTable['CP'] == 'Refer'][['Probability']].values
        return referralPro, treatedPro

In [None]:
referralPro, treatedPro = traceEvidence.get_Probability(key)

In [None]:
referralPro, treatedPro

In [None]:
if 'Treated' in probEvidence.CP.values:
    treatedPro = probEvidence[probEvidence['CP'] == 'Treated'][['Probability']].values
    print(treatedPro) 

In [None]:
if 'Refer' in probEvidence.CP.values:
    referralPro = probEvidence[probEvidence['CP'] == 'Refer'][['Probability']].values
    print(referralPro) 

In [None]:
treatedPro = probEvidence[probEvidence['CP'] == 'Treated'][['Probability']].values
treatedPro 

In [None]:
referralPro = probEvidence[probEvidence['CP'] == 'Refer'][['Probability']].values
referralPro

In [None]:
def get_specific_ClinicalPathways(measuredSymptoms,*args):
    #execute specific referral or treatable pathways based on the measured symptoms
    ans = "No path is found based on the measured symptoms"
    treatedDF = Generated_CP_LIST[(Generated_CP_LIST['Measured_Symptoms']==measuredSymptoms) & 
                        (Generated_CP_LIST['CP'] == 'T') ]
    referralDF = Generated_CP_LIST[(Generated_CP_LIST['Measured_Symptoms']==measuredSymptoms) & 
                        (Generated_CP_LIST['CP'] == 'R') ]
    # no pathway at all: neither a treatable nor a referral row matches
    if len(treatedDF) == 0 and len(referralDF) == 0:
        return measuredSymptoms, ans
    else:
        return treatedDF, referralDF

In [None]:
#get the measured symptoms, the value and the availability of evidence
key, val, evidence = checkEvidence(vv)
key, val, evidence 

#### Prior probability for the measured signs and symptoms

In [None]:
for k in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    key, val, evidence = checkEvidence(Generated_CP_LIST['Measured_Symptoms'][k])
    if evidence == 'Yes':
        print(key, hcdata.groupby(key).size().div(len(hcdata)))
    else:
        print(key, "has no evidence in the given dataset")

In [None]:
# Def a function to iterate and return the measured symptoms 
def availableEvidence(measuredSymptoms):
    #if it's nested
    if len(measuredSymptoms)>1:
        for key, val in measuredSymptoms.items():
            return key
    else:
        key, val = next(iter(measuredSymptoms.items())) 
    return key

In [None]:
for k in range(0, len(Generated_CP_LIST['Measured_Symptoms'])):
    ms = availableEvidence(Generated_CP_LIST['Measured_Symptoms'][k])
    if ms in hcdata.columns:
        print(ms,"Yes")
    else:
        print(ms, "No")

In [None]:
availableEvidence(Generated_CP_LIST['Measured_Symptoms'][1])

In [None]:
if len(Generated_CP_LIST['Measured_Symptoms'][5])>1:
    temp = dict(Generated_CP_LIST['Measured_Symptoms'][5])

In [None]:
temp

In [None]:
def Evidence(measuredSymptoms):
    if measuredSymptoms in hcdata.columns:
        print("True")

In [None]:
Evidence(Generated_CP_LIST['Measured_Symptoms'][4])

In [None]:

#Sample Test for indvidual signs and symptoms
new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][4])
presented = presented.append(new_row,ignore_index=True)
presented.reset_index(inplace=True, drop=True)
# Call the function to fill the missing values 
fill_missing_values(presented)
#Encode the presented symptoms for calculating the probability and the target
encoded_presented = presented.apply(LabelEncoder().fit_transform)
#print(encoded_presented)

#Predict using the pickled clinical pathway model
y_pred = Pickled_CP_Model.predict(encoded_presented)
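
One caveat with the cell above: `LabelEncoder().fit_transform` is refitted on the presented rows, so the integer codes may not match the codes the model saw during training. A possible alternative, sketched here under assumptions, is to fit one encoder on the training features and reuse it for new rows. The toy columns and values are made up, and `OrdinalEncoder` is a swapped-in technique, not what the notebook uses.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import OrdinalEncoder

# Toy historical records (illustrative values only)
train = pd.DataFrame({
    'headache': ['Yes', 'No', 'Yes', 'No'],
    'BP':       ['High', 'Normal', 'High', 'Normal'],
    'CP':       ['Refer', 'Treated', 'Refer', 'Treated'],
})
X_raw, y = train.drop('CP', axis=1), train['CP']

# Fit the encoder once on the training features; unseen categories map to -1
encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
X = encoder.fit_transform(X_raw)
clf = GaussianNB().fit(X, y)

# Encode a newly presented row with the SAME fitted encoder
new_row = pd.DataFrame({'headache': ['Yes'], 'BP': ['High']})
pred = clf.predict(encoder.transform(new_row))
```

If this route is taken, the fitted encoder would need to be saved alongside the model so that later sessions encode new rows the same way.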

In [None]:
y_pred

In [None]:
for col in range (0, len(Generated_CP_LIST['Measured_Symptoms'])):
    new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][col])
    presented = presented.append(new_row,ignore_index=True)
    presented.reset_index(inplace=True, drop=True)
    # Call the function to fill the missing values 
    fill_missing_values(presented)
    #Encode the presented symptoms for calculating the probability and the target
    encoded_presented = presented.apply(LabelEncoder().fit_transform)
    #print(encoded_presented)
    #y_pred = CPmodel.predict(encoded_presented)
encoded_presented.shape

In [None]:
#Create a dataframe for building the model 
for col in range (0, len(Generated_CP_LIST['Measured_Symptoms'])):
    new_row = pd.Series(data=Generated_CP_LIST['Measured_Symptoms'][col])
    print(new_row)
    presented = presented.append(new_row,ignore_index=True)
    presented.reset_index(inplace=True, drop=True)
    print(new_row)
    # Call the function to fill the missing values 
    fill_missing_values(presented)
    #Print(presented)
    
    #Encode the presented symptoms for calculating the probability and the target
    encoded_presented = presented.apply(LabelEncoder().fit_transform)
    #print(encoded_presented)
    
    #predict the target class
    y_pred = CPmodel.predict(encoded_presented)
    y_pred

In [None]:
# Creating a new dataset for predicting the inputs
presented = pd.DataFrame(columns=hcdata.columns)
new_row = pd.Series(data={'Category':'Pregnancy', 'Type':'1rst ANC Visit', 'Status':'<16 weeks gestation','Headache':'Yes'}, name='x')
presented = presented.append(new_row)
presented.reset_index(inplace=True, drop=True)
presented

In [None]:
presented = pd.DataFrame(columns=hcdata.columns)
new_row = pd.Series(data=presentedMeasuredSymptoms[4])
presented = presented.append(new_row,ignore_index=True)
presented.reset_index(inplace=True, drop=True)
presented

In [None]:
#Fill the missing values
for col in presented.columns:
    # replacing na values in college with No college 
    presented[col].fillna("Notavailable", inplace = True) 

In [None]:
presented = presented.drop('CP', axis=1)
presented

In [None]:
encoded_presented = presented.apply(LabelEncoder().fit_transform)
encoded_presented

In [None]:
y_pred = CPmodel.predict(encoded_presented)
y_pred

In [None]:
# Function to calculate missing values by column
def missing_values_table(df):
   
    # Total missing values
    mis_val = df.isnull().sum()
    
    # Percentage of missing values
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    
    # Make a table with the results
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    
    # Rename the columns
    mis_val_table_ren_columns = mis_val_table.rename(
    columns = {0 : 'Missing Values', 1 : '% of Total Values'})
    
    # Sort the table by percentage of missing values, descending
    # .iloc[:, 1] != 0: keep only columns whose missing percentage is not zero
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:,1] != 0].sort_values(
    '% of Total Values', ascending=False).round(2)  # round(2), keep 2 digits
    
    # Print some summary information
    print("Your selected dataframe has {} columns.".format(df.shape[1]) + '\n' + 
    "There are {} columns that have missing values.".format(mis_val_table_ren_columns.shape[0]))
    
    # Return the dataframe with missing information
    return mis_val_table_ren_columns

In [None]:
missing_values_table(hcdata)

In [None]:
hcdata.head(2)

In [None]:
presentedSymptomPro = hcdata.groupby('abdominalPain').size().div(len(hcdata))
presentedSymptomPro

In [None]:
prob = hcdata.groupby(['CP','abdominalPain']).size().div(len(hcdata)).div(presentedSymptomPro,axis=0,level='abdominalPain')
prob

In [None]:
hcdata.groupby('abdominalPain').count()['CP'] / len(hcdata)

In [None]:
(hcdata.groupby(['abdominalPain', 'CP']).count() / hcdata.groupby('abdominalPain').count())['Category']


In [None]:
df_s = hcdata.groupby('abdominalPain')['CP'].value_counts() / hcdata.groupby('abdominalPain')['CP'].count()
df_f = df_s.reset_index(name='Probability')
probEvidence = df_f[df_f.values == 'Yes']
#df_f.head()  # your conditional probability table

In [None]:
probEvidence

In [None]:
def calculate_Probability(hcdata, measuredSymptoms, CP):
    # Calculating the probability from historical records
    probDF = hcdata.groupby(measuredSymptoms)[CP].value_counts() / hcdata.groupby(measuredSymptoms)[CP].count()
    #creating a probability column
    probDF = probDF.reset_index(name='Probability')
    
    #select and return only the rows where the measured symptom is present ('Yes')
    probEvidence = probDF[probDF.values == 'Yes']
    return probEvidence
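
The same conditional probability table can be obtained with `value_counts(normalize=True)`, which divides by the group size directly. The sketch below uses a made-up five-row table; the column names follow the notebook, the values do not come from the health center data.

```python
import pandas as pd

# Toy historical records (illustrative values only)
toy = pd.DataFrame({
    'abdominalPain': ['Yes', 'Yes', 'Yes', 'No', 'No'],
    'CP':            ['Refer', 'Refer', 'Treated', 'Treated', 'Treated'],
})

# P(CP | abdominalPain): share of each CP class within each symptom value
cond = (toy.groupby('abdominalPain')['CP']
           .value_counts(normalize=True)
           .rename('Probability')
           .reset_index())

# Probability of referral when abdominal pain is present
p_refer = cond[(cond['abdominalPain'] == 'Yes') &
               (cond['CP'] == 'Refer')]['Probability'].iloc[0]
```

Here two of the three patients with abdominal pain were referred, so `p_refer` is 2/3.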

In [None]:
probEvidence = calculate_Probability(hcdata, 'abdominalPain', 'CP')
probEvidence

In [None]:
referalProb = probEvidence[probEvidence.values == 'Refer']['Probability']
referalProb = referalProb.values[0]
referalProb

In [None]:
treatedProb = probEvidence[probEvidence.values == 'Treated']['Probability']
treatedProb = treatedProb.values[0]
treatedProb

In [None]:
"""cp_means = hcdata.groupby('CP').count().mean() #var()
cp_means"""

In [None]:
data,Class=splitTargetClass(hcdata)

P(Class|HCData) = P(HCData|Class) * P(Class) / P(HCData)

where:
1. Class is a particular CP class (e.g. Refer or Treated)
2. HCData is the health center data and features
3. P(Class|HCData) is called the posterior
4. P(HCData|Class) is called the likelihood
5. P(Class) is called the prior
6. P(HCData) is called the marginal probability
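
As a worked example of the formula above, the posterior for a single observed symptom can be computed by hand. All numbers below are made up for illustration and are not taken from the health center dataset.

```python
# Assumed prior P(Class) and likelihood P(headache = Yes | Class)
prior = {'Refer': 0.3, 'Treated': 0.7}
likelihood = {'Refer': 0.8, 'Treated': 0.2}

# Marginal P(headache = Yes): sum of likelihood * prior over all classes
marginal = sum(likelihood[c] * prior[c] for c in prior)

# Posterior P(Class | headache = Yes) by Bayes' rule
posterior = {c: likelihood[c] * prior[c] / marginal for c in prior}
```

With these numbers the marginal is 0.8 * 0.3 + 0.2 * 0.7 = 0.38, so the posterior for Refer is 0.24 / 0.38, roughly 0.63, even though the prior for Refer was only 0.3.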

#### Calculating the Prior
Prior = P(CP), i.e. the number of times a class (Referral or Treated) appears divided by the total number of observations:

P(CP = Referral) and P(CP = Treated)

In [None]:
def priorProbability(data, Class):
    prior = data.groupby(Class).size().div(len(data)) 
    return prior

In [None]:
prior = priorProbability(hcdata,Class)
prior

#### Calculating Likelihood: 
Likelihood is generated for each feature of the health center dataset. The likelihood is the probability of observing each feature value given the CP class label.
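
For a single feature, the likelihood table described above can be sketched with a row-normalized cross-tabulation. The toy rows are made up for illustration, and `headache` is used only as an example feature name.

```python
import pandas as pd

# Toy historical records (illustrative values only)
toy = pd.DataFrame({
    'CP':       ['Refer', 'Refer', 'Treated', 'Treated', 'Treated'],
    'headache': ['Yes', 'No', 'No', 'No', 'Yes'],
})

# Each row is a CP class and sums to 1: P(headache value | CP class)
likelihood_table = pd.crosstab(toy['CP'], toy['headache'], normalize='index')
```

In this toy table `likelihood_table.loc['Refer', 'Yes']` is 0.5 and `likelihood_table.loc['Treated', 'Yes']` is 1/3.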

In [None]:
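def compute_likelihood(data, Class):
    # Build one likelihood table per feature: P(feature value | CP class)
    likelihood = {}
    for col in data.columns: 
        likelihood[col] = data.groupby([Class, col]).size().div(len(data)).div(prior)
        #print(likelihood)
    return likelihood

In [None]:
# The function has its own name so the result does not overwrite it
likelihood = compute_likelihood(data, Class)
likelihood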

In [None]:
def posteriorProbablity():
    # NOTE: the feature names and values below (Age, Income, Student, Credit_Rating)
    # appear to be placeholders and need to be replaced with health center features
    # Probability that the person will refer to the nearest 
    p_referral = likelihood['Age']['yes']['<=30'] * likelihood['Income']['yes']['medium'] * \
            likelihood['Student']['yes']['yes'] * likelihood['Credit_Rating']['yes']['fair'] \
            * prior['yes']

    # Probability that the person will be treated in the health center 
    p_no = likelihood['Age']['no']['<=30'] * likelihood['Income']['no']['medium'] * \
           likelihood['Student']['no']['yes'] * likelihood['Credit_Rating']['no']['fair'] \
           * prior['no']

    print ('Yes : ', p_referral)
    print ('No :  ', p_no)

In [None]:
prior = hcdata.groupby('CP').size().div(len(hcdata)) 
prior

In [None]:
likelihood['abdominalPain'] = hcdata.groupby(['CP', 'abdominalPain']).size().div(len(hcdata)).div(prior)

In [None]:
likelihood
# Headache == 'Yes'

P(Class|features) = P(features|Class) * P(Class) / P(features)

P(Class|HCData) = P(HCData|Class) * P(Class) / P(HCData)

where:
1. Class is a particular CP class (e.g. Refer or Treated)
2. HCData is the health center data and features
3. P(Class|HCData) is called the posterior
4. P(HCData|Class) is called the likelihood
5. P(Class) is called the prior
6. P(HCData) is called the marginal probability

#### Prior 

In [None]:
# Number of Referral
n_referral = hcdata['CP'][hcdata['CP'] == 'Refer'].count()

# Number of Treated
n_treated = hcdata['CP'][hcdata['CP'] == 'Treated'].count()

# Total rows
total_ppl = hcdata['CP'].count()

In [None]:
# Number of referral divided by the total rows
P_referral = n_referral/total_ppl

# Number of treated divided by the total rows
P_treated = n_treated/total_ppl

In [None]:
P_referral, P_treated

#### Likelihood

In [None]:
# Group the data by CP and calculate the mean of each numeric feature per class
# (non-numeric columns are skipped; they need to be encoded first)
hcdata_means = hcdata.groupby('CP').mean(numeric_only=True)

In [None]:
# Group the data by CP and calculate the variance of each numeric feature per class
# (non-numeric columns are skipped; they need to be encoded first)
hcdata_variance = hcdata.groupby('CP').var(numeric_only=True)

In [None]:
# Mean of one feature for the Treated class
# feature_mean = hcdata_means.loc['Treated', 'abdominalPain']
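
Gaussian Naive Bayes models each numeric feature within a class as normally distributed, so the per-class mean and variance are what turn a feature value into a likelihood. The following is a minimal sketch of that density; the function name `gaussian_likelihood` and the numbers passed to it are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_likelihood(x, mean, var):
    """Density of N(mean, var) at x, i.e. p(x | class) for one numeric feature."""
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Hypothetical per-class statistics for a single feature
p_at_mean = gaussian_likelihood(x=1.0, mean=1.0, var=0.25)
p_off_mean = gaussian_likelihood(x=2.0, mean=1.0, var=0.25)
print(p_at_mean, p_off_mean)
```

The density is highest at the class mean and falls off with distance, which is why a class whose mean is close to the observed value receives a larger likelihood.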

In [None]:
hcdata.head(2)

In [None]:
from sklearn.preprocessing import LabelEncoder
encoded_data = hcdata.apply(LabelEncoder().fit_transform)

In [None]:
encoded_data.head(2)

In [None]:
from sklearn.naive_bayes import GaussianNB
import numpy as np

In [None]:
clf = GaussianNB()
clf.fit(encoded_data.drop(['CP'], axis=1), encoded_data['CP'])

In [None]:
# Split the dataset into training and test sets with scikit-learn.
# The model is fitted on the training set; the test set is held out to estimate its accuracy.

from sklearn.model_selection import train_test_split

# Features and target taken from the label-encoded data
X = encoded_data.drop(['CP'], axis=1)
y = encoded_data['CP']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [None]:
# Fit a Gaussian Naive Bayes model on the training set

from sklearn.naive_bayes import GaussianNB

model = GaussianNB().fit(X_train, y_train)

In [None]:
predicted_y = model.predict(X_test)  # predict the CP label for each test record

In [None]:
from sklearn.metrics import accuracy_score
# Compare the predicted labels with the true test labels.
# The result gets its own name so the accuracy_score function is not overwritten.
accuracy = accuracy_score(y_test, predicted_y)
print(accuracy)
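
A single train/test split gives one accuracy figure that depends on how the rows happened to be divided. As a hedged sketch, k-fold cross-validation with `cross_val_score` (already imported at the top of the notebook) averages over several splits. The data below is synthetic and the names `X_demo` and `y_demo` are assumptions for the example; they stand in for the encoded features and CP labels.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)

# Synthetic stand-in data: 100 records with 4 binary-encoded features
X_demo = rng.integers(0, 2, size=(100, 4)).astype(float)
# The label copies the first feature, so there is a clear signal to learn
y_demo = X_demo[:, 0].astype(int)

# 5-fold cross-validation: one accuracy value per fold
scores = cross_val_score(GaussianNB(), X_demo, y_demo, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```

Reporting the mean together with the spread across folds gives a more stable estimate than one split alone.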

In [None]:
NewPerson = pd.DataFrame()

In [None]:
NewPerson['Headache'] = ['Yes']
NewPerson['Category'] = ['Test']
NewPerson

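# Caution: a LabelEncoder fitted on this single row maps every value to 0,
# so these codes may not match the encoding used to train the model.
NewPerson_data = NewPerson.apply(LabelEncoder().fit_transform)
NewPerson_data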

In [None]:
# The encoded record is stored in the DataFrame NewPerson_data
predicted_y = model.predict(NewPerson_data)

In [None]:
predicted_y
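
For a prediction on a new record to be meaningful, the record needs the same feature columns as the training data and the same value-to-code mapping. The sketch below shows one way to do this by keeping the fitted encoders and reusing them. The `train` frame, the `encoders` dictionary and `toy_model` are hypothetical names introduced for illustration and are not the notebook's own objects.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Hypothetical training frame; names and values are illustrative only.
train = pd.DataFrame({
    "Headache": ["Yes", "No", "Yes", "No"],
    "Fever":    ["Yes", "Yes", "No", "No"],
    "CP":       ["Refer", "Refer", "Treated", "Treated"],
})

# Fit one encoder per column and keep it for later reuse.
encoders = {col: LabelEncoder().fit(train[col]) for col in train.columns}
encoded = train.apply(lambda s: encoders[s.name].transform(s))

features = ["Headache", "Fever"]
toy_model = GaussianNB().fit(encoded[features], encoded["CP"])

# Encode the new record with the same encoders and the same feature columns.
new_person = pd.DataFrame({"Headache": ["Yes"], "Fever": ["Yes"]})
new_encoded = new_person.apply(lambda s: encoders[s.name].transform(s))

# Map the numeric prediction back to the original CP label.
predicted_label = encoders["CP"].inverse_transform(
    toy_model.predict(new_encoded[features])
)
print(predicted_label)
```

Keeping the encoders in a dictionary also makes it possible to translate the numeric prediction back into a readable CP label with `inverse_transform`.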