# Skeleton Notebook:

**Context**: After an important document was stolen from the Generic Space Empire (GSC), Non-Copyrighted Alien spies began attacking the prize of the GSC, the space station. These attacks are more complex than earlier ones: each requires compromising a combination of components, often distributed across workstations. To defend against them, the GSC has hired our team to develop a methodology for characterizing the faults found in the system after an attack.

**Motivation**: A mode is a term for a behavior of a system. Every component in a system has behaviors associated with its known modes, as well as an unknown mode that covers behavior the component model does not describe. Mode estimation takes the modes, state, and observable variables and tries to determine a diagnosis that explains the observed outcome state of the system. It is a useful tool for characterizing what could have occurred in a system to produce a given end state.


# Formulation

Mode estimation will help us identify possible diagnoses for the system, but we still need a model and a search mechanism on which mode estimation can be performed. We will formulate this as a search problem over a constraint satisfaction problem (CSP) model, which is described in greater detail below.

**Model**: To build our model, we plan to use a constraint satisfaction problem (CSP) model. CSP models are used to describe real-world problems. Our model will represent an input state, which is based on a set of variable assignments, and a list of conditions for a state to be a solution. These conditions will be defined by a set of constraints on the aforementioned variables.

The constraints that we decide for our system will be vital in defining how the mode estimation will search for the required decisions, as well as how the search will be performed.

For the sake of our problem, we will use a very simple CSP model: troubleshooting circuits. For our model:

**Input:**
For each workstation:
 - number of circuits
 - nominal voltage required
 
 
For the full fleet:
 - number of workstations
 - total capacity 

**Output:** For each workstation, determine:
 - voltage passed to system
 
**Example Constraints:**
 - certain combinations of circuits will be constrained (for example two ground circuits cannot exist in the same loop, a ground circuit must be paired with a positive circuit). 
 - red circuits cannot exist in the same grouping as blue circuits


**Purpose:** Assess whether the circuit board is working properly for the workstation
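
As a rough illustration of how the variables, domains, and example constraints above fit together, the sketch below encodes a single hypothetical workstation. The circuit names (`c1`, `c2`, `c3`), the colour domain, and the helper `no_red_with_blue` are assumptions made for this example and are not part of the model class that follows.

```python
from itertools import product

# Hypothetical workstation with three circuits; each circuit gets a colour.
variables = ["c1", "c2", "c3"]
domains = {v: ["red", "blue", "green"] for v in variables}

def no_red_with_blue(assignment):
    """Red circuits cannot exist in the same grouping as blue circuits."""
    colours = set(assignment.values())
    return not ({"red", "blue"} <= colours)

constraints = [no_red_with_blue]

def is_consistent(assignment):
    """An assignment is consistent when it violates no constraint."""
    return all(constraint(assignment) for constraint in constraints)

# Enumerate every full assignment and keep the ones that satisfy the CSP.
all_assignments = [dict(zip(variables, values))
                   for values in product(*(domains[v] for v in variables))]
solutions = [a for a in all_assignments if is_consistent(a)]
print(len(all_assignments), len(solutions))
```

Of the 27 possible colourings, the 12 that mix red and blue are rejected, which leaves 15 solutions.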

The class below will be used to build a model from this CSP.

In [5]:
class model_generation():
    
    # CONSTRAINT Decision Variables
    # will be used to define the conditions for a state to be a solution [i.e. passes a given needed voltage]
    

    # Variable 
    # what variables will be used to define each of the work stations
    
    # Domains
    # possible assignments/values of each variable
    
    def __init__(self, variables, domains, constraints, parents=None):
        self.variables = variables
        self.domains = domains
        # used to track parents for mode propagation (empty if none are supplied)
        self.parents = parents if parents is not None else {}
        self.constraints = {}
        for var in variables:
            if var in constraints:
                self.constraints[var] = constraints[var]
            else:
                self.constraints[var] = []
    
    def assign(self, var, val, assignment):
        # add {var: val} to assignment, overriding the old value if there was one
        assignment[var] = val

    
    def add_constraint(self, constraint):
        # add a constraint
        pass
    
    def consistent(self, variables, assignments):
        # given a list of variables and their assignments,
        # check whether they are consistent with all constraints
        pass
    
    def search_assigment_brute_force(self, assignments):
        # Given an empty or partial assignment,
        # try every possible value for each unassigned variable that does not violate constraints.
        # Stop when one solution is found;
        #     in that case, len(assignment) == len(self.variables).
        # If every combination of assignments has been searched and no solution was found, return None.
        
        pass
    
    
    def satisfied(self, assignment):
        # check whether the given assignment answers the question and is consistent with the constraints;
        # in other words, every variable is assigned a value that does not violate any constraint
        pass
    
    def full_assignment(self, assignment):
        # check if given assignment is full assignment to variables 
        pass
        
    def find_nonassigned_variables(self):
        # find decision variables not yet assigned
        pass
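
The brute-force search described in the class comments could look roughly like the standalone sketch below. It is written as a free function rather than a method so that it runs on its own; the names `brute_force_search` and `different_polarity`, and the two-circuit example, are hypothetical.

```python
def brute_force_search(variables, domains, is_consistent, assignment=None):
    """Depth-first search over partial assignments.

    Returns the first full, consistent assignment found, or None if the
    whole space has been searched without success.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment
    # Pick the first variable that has not been assigned yet.
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        candidate = {**assignment, var: value}
        if is_consistent(candidate):
            result = brute_force_search(variables, domains, is_consistent, candidate)
            if result is not None:
                return result
    return None

# Hypothetical example: two circuits that must carry different polarities.
variables = ["a", "b"]
domains = {"a": ["ground", "positive"], "b": ["ground", "positive"]}

def different_polarity(assignment):
    # Only check the constraint once both circuits are assigned.
    if "a" in assignment and "b" in assignment:
        return assignment["a"] != assignment["b"]
    return True

solution = brute_force_search(variables, domains, different_polarity)
print(solution)
```

The search stops at the first solution, so which solution is returned depends on the order of the variables and of the values in each domain.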
        
        

# Simple Search

To start diagnosing the system, our team will begin with a simple search of the model that looks for logical entailment. Note that a logical entailment check does not involve probabilistic mode estimation; it serves only as a baseline for comparing mode estimation against simple searching. In this function, we assume full knowledge and observability of the system. We then search through the system to see whether there is a clear cause that could have resulted in the error. For example, if workstation 2 failed, and workstation 2 depends solely on component A1, which is the final leaf, then component A1 is the diagnosis for the failure. In this method, we look at the states of the system and compare the state output with the end state output.

To achieve this, we treat the diagnosis problem as hypothesis testing. First, generate all candidate faults given the symptoms. Second, test whether a given candidate accounts for all symptoms.


In [6]:
def candidates(symptom, model):
    # this function generates candidate faults for a given symptom
    # by identifying which components in the model are connected to where
    # the fault took place
    
    # should return a set of components
    raise NotImplementedError

def testing(symptom, candidate, model):
    # this function tests whether the candidate component's fault models
    # predict the observations
    
    # should return True or False
    raise NotImplementedError


def abductive_diagnosis(symptom, model):
    # given a set of symptoms within a model (aka the starting state),
    # this function collects every diagnosis (there may be several) that could account for the symptoms
    # if no clear answer is found, the function returns an empty list
    
    candidates_set = candidates(symptom, model)
    
    diagnoses = []
    for candidate in candidates_set:
        if testing(symptom, candidate, model):
            diagnoses.append(candidate)
    
    return diagnoses


The above approach works only if the fault models are exhaustive, and it is only good for single-component faults. For unknown or multiple faults, we need a more sophisticated approach that defines modes of failure rather than just individual components.
To determine hidden failures, the system must rely on two assumptions:
 - the set of diagnoses should be complete
 - the set of diagnoses should exploit all available information

From these assumptions, the model must generate a list of candidates.

The modes_diagnosis function below takes a brute-force approach: it enumerates every assignment of mode values to the components, then tests whether each assignment is consistent with the observation. With n components, the number of candidates is 2^n when each component is either good or unknown (no failure modes), and 4^n when two failure modes are also included.


In [None]:
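from itertools import product

def consistent(symptom, candidate, model):
    # this function tests whether a candidate, i.e. a set of mode assignments,
    # is consistent with the symptom
    
    # should return True or False
    raise NotImplementedError

def modes_diagnosis(symptom, model, inputs, modes=("G", "U")):
    # essentially the search_assigment_brute_force method
    # this function generates all assignments of modes that are consistent
    # with the observation
    #
    # each component is assigned G for good or U for unknown
    # components can also be assigned failure modes such as S1, always
    # true, or S2, always false, by passing modes=("G", "U", "S1", "S2")
    # (the modes parameter is an addition to the original skeleton)
    
    components = list(model)
    diagnosis = []
    for mode_values in product(modes, repeat=len(components)):
        # assemble into a test_diagnosis
        test_diagnosis = dict(zip(components, mode_values))
        if consistent(symptom, test_diagnosis, model):
            # add to the set of diagnoses
            diagnosis.append(test_diagnosis)
    
    return diagnosis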
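
The candidate counts quoted above can be checked directly by enumeration. In the sketch below the component names and the helper `enumerate_mode_assignments` are hypothetical; the mode labels follow the comments in the cell above.

```python
from itertools import product

def enumerate_mode_assignments(components, modes):
    """Yield every assignment of one mode to each component."""
    for values in product(modes, repeat=len(components)):
        yield dict(zip(components, values))

components = ["A1", "A2", "X1"]        # hypothetical component names
two_modes = ["G", "U"]                  # good / unknown
four_modes = ["G", "U", "S1", "S2"]     # plus two failure modes

count_two = len(list(enumerate_mode_assignments(components, two_modes)))
count_four = len(list(enumerate_mode_assignments(components, four_modes)))
print(count_two, count_four)  # 2**3 and 4**3
```

Even for three components the space grows from 8 to 64 candidates once failure modes are added, which is why the later sections look for ways to avoid full enumeration.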

This method is a very simple search algorithm. It is computationally expensive and, when an unknown mode is responsible for the system outcome, it can return a non-optimal solution. To improve the fidelity of our search algorithm, our system will need to compare the constraints of the model.

# Probabilistic Mode estimation

Once the simple search has been established as a baseline for comparison, the team will focus on developing the mode estimation function. The mode estimation function works by relying on prior probability to determine future likelihood.


The modes in the simple search function were assigned without regard to how likely they are. The number of mode estimates that can be tested is exponential due to the unknown mode, and the number of satisfiable diagnoses among those estimates can be large as well. The problem only gets worse when failure modes are included in the search. If we have an a priori distribution over the possible failure modes of the components, then a few mode estimates can approximate most of the probability mass.
The function below covers the methodology that we will take to determine mode estimation for a given leaf node.

In [None]:
def prior_probability(model, probs):
    # assign prior probabilities to component modes, so that our model incorporates
    # a priori failure mode probabilities
    # assumption: probs maps each mode to its prior, shared by all components
    priors = {}
    for component in model:
        priors[component] = {}
        for mode in probs:
            priors[component][mode] = probs[mode]
    return priors

def p_mode_estimate(observations, model, modes):
    # assume independence between modes and an equal chance of the observation given a mode assignment
    # find the probabilities of all the modes given their corresponding observations (aka the posterior)
    
    # assume consistent interpretations are equally likely
    
    # model counting using DPLL
    
    # should return the set of posterior probabilities
    raise NotImplementedError
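
A small numeric sketch may help show why a few mode estimates can capture most of the probability mass. It assumes independent components, a shared prior per mode, and that every consistent candidate explains the observation equally well, so the posterior is simply the prior renormalised over the consistent candidates. The prior values, component names, and the observation (which rules out "both components good") are all made up for illustration.

```python
from itertools import product
from math import prod

# Hypothetical priors, shared by every component.
priors = {"G": 0.97, "S1": 0.01, "S2": 0.01, "U": 0.01}
components = ["A1", "A2"]

def prior_of(candidate):
    """Prior of a full mode assignment, assuming independent components."""
    return prod(priors[mode] for mode in candidate.values())

def posterior(consistent_candidates):
    """Renormalise the priors over the consistent candidates only."""
    total = sum(prior_of(c) for c in consistent_candidates)
    return [(c, prior_of(c) / total) for c in consistent_candidates]

all_candidates = [dict(zip(components, modes))
                  for modes in product(priors, repeat=len(components))]
# Hypothetical observation: it is inconsistent with both components being good.
consistent_candidates = [c for c in all_candidates
                         if not (c["A1"] == "G" and c["A2"] == "G")]

ranked = sorted(posterior(consistent_candidates), key=lambda item: -item[1])
top_six_mass = sum(p for _, p in ranked[:6])
print(len(ranked), round(top_six_mass, 3))
```

Under these assumed numbers, the six single-fault candidates out of fifteen consistent ones carry more than 98% of the posterior mass.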
    
        

# Optimal CSP formulation

In the consistency-based diagnosis formulation above, we approached the diagnostic problem by:
- defining a model consisting of components (functions with set inputs and outputs) and a system structure (a definition of how the outputs of one component are connected to the inputs of another)
- defining a diagnostic problem as a triple of model, observations, and inputs
- generating candidate mode assignments for all components, and checking whether those assignments are consistent with the observations
- defining consistency as a state that satisfies all constraints
 
We can take a different approach by representing each component as a constraint whose variables are the relevant quantities in the component description, plus a variable that represents its mode. For example, an AND gate could be represented as the following constraint:

- {<good, 1st input 1, 2nd input 1, output 1>,<good, 1st input 1, 2nd input 0, output 0>,<good, 1st input 0, 2nd input 1, output 0>,<good, 1st input 0, 2nd input 0, output 0>,<s1, 1st input 1, 2nd input 1, output 1>,<s1, 1st input 1, 2nd input 0, output 1>,<s1, 1st input 0, 2nd input 1, output 1>,<s1, 1st input 0, 2nd input 0, output 1>,<s0, 1st input 1, 2nd input 1, output 0>,<s0, 1st input 1, 2nd input 0, output 0>,<s0, 1st input 0, 2nd input 1, output 0>,<s0, 1st input 0, 2nd input 0, output 0>,<unknown, 1st input 0 or 1, 2nd input 0 or 1, output 0 or 1>}

After specifying the component constraints, we can specify the system structure as another set of constraints, for example that the output of component A1 must equal the first input of component X1.
In this formulation, we could come up with a utility function f(x) and a heuristic h(x) for each component constraint x. With f and h, we can generate candidates in best-first order, then check the consistency of a particular candidate X with the value assignment of the state variables Y.
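
The AND-gate constraint listed above can be generated programmatically rather than written out by hand. The sketch below builds the same set of `(mode, input 1, input 2, output)` tuples and uses it to ask which modes are compatible with a given observation; the function names are hypothetical.

```python
from itertools import product

def and_gate_constraint():
    """Allowed (mode, in1, in2, out) tuples for an AND gate."""
    allowed = set()
    for in1, in2 in product((0, 1), repeat=2):
        allowed.add(("good", in1, in2, in1 & in2))   # nominal behaviour
        allowed.add(("s1", in1, in2, 1))             # output stuck at 1
        allowed.add(("s0", in1, in2, 0))             # output stuck at 0
        for out in (0, 1):
            allowed.add(("unknown", in1, in2, out))  # no restriction
    return allowed

AND = and_gate_constraint()

def consistent_modes(in1, in2, out):
    """Modes of a single AND gate compatible with the observed values."""
    return sorted(mode for mode in ("good", "s1", "s0", "unknown")
                  if (mode, in1, in2, out) in AND)

print(consistent_modes(1, 1, 0))
print(consistent_modes(1, 1, 1))
```

Observing inputs 1 and 1 with output 0 rules out the good and stuck-at-1 modes, leaving only `s0` and `unknown`.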

# Constraint-Based A* Search

The above sections discussed the logic of identifying a consistent solution and what constraints are. Furthermore, we have laid out how we plan to use mode estimation to identify solutions. From here, we can apply a more robust search algorithm to find the optimal solution.

A* is a best-first search algorithm. With constraint-based A* (CBA*), A* is used to search over the decision variables. In our problem, the decision variables are the components in the system, and the decision is which mode we assign to each component. The initial state has no assignments, and the goal state is a complete assignment of modes to all components. After a full assignment of the decision variables, we need to check whether it is consistent with the state variables, i.e. the inputs and observations.
With A* we need to identify the functions in f(x)=g(x)+h(x). g(x) could be the product of the probabilities of the decision variables already assigned, while h(x) could be the product of the highest possible probability of each remaining unassigned variable. Because probabilities multiply, g and h combine as a sum only when they are expressed as log-probabilities. Defined this way, h(x) is an admissible heuristic: we are maximizing probability, and h(x) never underestimates the probability of the best completion. Luckily, we have the probability functions from the previous section to assist with this calculation.
Our code for CBA* is shown below.

In [3]:
def constraint_based_A_star(f, model):
    # f is the utility function described above; it scores a (partial) assignment
    
    queue = [{}]  # start from the empty assignment
    expanded = []  # a list, because dict assignments cannot be stored in a set
    while len(queue) > 0:
        queue.sort(key=f)  # best-first: the highest utility ends up last
        assignment = queue.pop()  # get the best assignment in the queue
        expanded.append(assignment)
        
        # check if assignment is a full assignment to the decision variables
        if model.full_assignment(assignment):  # potential goal
            if model.consistent(model.variables, assignment):
                return assignment
        else:
            x_i = model.find_nonassigned_variables()
            neighbors = split_on_variable(assignment, x_i)
            # add each x_k in neighbors to queue if not in expanded
            for nbr in neighbors:
                if nbr not in expanded:
                    queue.append(nbr)
            
    # get here, expanded all possibilities but no solution
    return None

In [6]:
def split_on_variable(assignment, x_i):
    # purpose: choose our successor states as extensions of the current state by
    # picking one of the variables that has not yet been assigned
    return [union(assignment, d_j, x_i) for d_j in x_i]

In [10]:
def consistent(assignment,goal):
    
    for constraint in goal:
        #searches over non-decision variables and checks constraints
        pass

In [12]:
def union(assignment, d_j, x_i):
    return (assignment, x_i[d_j])
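
For comparison with the skeleton cells above, here is a compact, self-contained sketch of the same best-first idea. It is not the notebook's final implementation: it takes a hypothetical `priors` dictionary and an `is_consistent` callable instead of the model class, uses `heapq` as the priority queue, and scores nodes with the product g·h, which ranks nodes the same way as a sum of log-probabilities.

```python
import heapq
from itertools import count
from math import prod

def cba_star(priors, is_consistent):
    """Return the most probable full mode assignment that is consistent.

    priors: {component: {mode: probability}} (assumed independent)
    is_consistent: callable taking a full assignment dict
    """
    components = list(priors)
    best = {c: max(priors[c].values()) for c in components}

    def f(assignment):
        g = prod(priors[c][m] for c, m in assignment.items())
        h = prod(best[c] for c in components if c not in assignment)
        return g * h  # optimistic bound on any completion of assignment

    tie = count()  # tie-breaker so the heap never has to compare dicts
    queue = [(-f({}), next(tie), {})]
    while queue:
        _, _, assignment = heapq.heappop(queue)
        if len(assignment) == len(components):
            if is_consistent(assignment):
                return assignment
            continue
        # Components are assigned in a fixed order, so no state repeats.
        var = next(c for c in components if c not in assignment)
        for mode in priors[var]:
            child = {**assignment, var: mode}
            heapq.heappush(queue, (-f(child), next(tie), child))
    return None

# Hypothetical two-component system where "both good" is ruled out.
priors = {"A1": {"G": 0.9, "U": 0.1}, "A2": {"G": 0.8, "U": 0.2}}

def not_both_good(assignment):
    return not (assignment["A1"] == "G" and assignment["A2"] == "G")

result = cba_star(priors, not_both_good)
print(result)
```

The search first tries the all-good assignment (probability 0.72), finds it inconsistent, and then returns A1 good with A2 unknown (probability 0.18), the most likely consistent candidate under these assumed priors.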

One of the drawbacks of CBA* is that it keeps searching over problematic areas of the state space. For example, even if one of the states at the higher depth levels contains a premise that is logically impossible, the children of that node will still be added to the queue and expanded. This creates expansions that are not needed. To avoid searching spaces that we do not need to, we can apply a pruning strategy to CBA* to improve the computational processing time.

This improvement is only possible if we can detect an inconsistency after a full assignment. The consistency function we have only returns a boolean, so it needs to be modified to also return which component assignments caused the conflict.

Suppose a conflict involves several components being assigned values that cannot hold together. We can use De Morgan's law to convert the conflict into constituent kernels: a disjunction of statements, each naming a value that is not allowed for one of the components.

Now we can run CBA* with a mechanism to detect conflicts. We store each detected conflict in a list. For each conflict in the list, we remove from the search queue any new assignment that would manifest that conflict. This approach is referred to as Conflict-Directed A* Search (CDA*).
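
The De Morgan step can be illustrated with a short sketch. It assumes a conflict is stored as a dictionary of the offending `component: mode` pairs and that each component has a finite mode domain; the function name `constituent_kernels` and the example values are hypothetical.

```python
def constituent_kernels(conflict, domains):
    """Single-variable assignments that each resolve the conflict.

    By De Morgan's law, not(A1=G and A2=G) is (A1!=G or A2!=G), so giving
    any one conflict variable a different value avoids the conflict.
    """
    kernels = []
    for var, bad_value in conflict.items():
        for value in domains[var]:
            if value != bad_value:
                kernels.append({var: value})
    return kernels

domains = {"A1": ["G", "U"], "A2": ["G", "U"], "X1": ["G", "U"]}
conflict = {"A1": "G", "A2": "G"}   # hypothetical conflict
kernels = constituent_kernels(conflict, domains)
print(kernels)
```

Component X1 does not appear in the conflict, so it contributes no kernel; only changing A1 or A2 can resolve it.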

# Conflict-Directed A* Search 

A conflict is a partial assignment to decision variables whose members cannot all be true at once. To identify a conflict, we modify the consistent function that was written above.

In [None]:
def split_on_conflicts(assignment, conflict):
    
    
    # is_consistent is a boolean, and conflict is the conflict
    
    # for each assignment c_i in the constituent kernels of the conflict, keep it if it is self-consistent
    return (assignment, conflict)

Combining our prior definitions of consistency, our new definition of conflict, and a mode estimation heuristic, CDA* is shown in the algorithm below. It is expected that this search method will produce the same result as CBA*, but with less computational time required.

In [None]:
def cda_star(model):
    queue = [{}]
    rho = []  # list of conflicts
    expanded = []  # a list, because dict assignments cannot be stored in a set
    
    while len(queue) > 0:
        # take the next assignment from the queue (best-first ordering is not shown here)
        assignment = queue.pop()
        expanded.append(assignment)
        # potential goal
        if model.full_assignment(assignment):  # (i.e. every decision variable is assigned)
            # consistent is the modified version that also returns the conflict
            is_consistent, conflict = consistent(assignment)
            if is_consistent:
                return assignment
            else:
                rho.append(conflict)
                queue = search_queue_for_conflict(queue, conflict)
        else:  # partial assignment to decision variables
            if all_conflict_resolved(assignment, rho):
                # some decision variable is not assigned in assignment
                xi = model.find_nonassigned_variables()
                neighbors = split_on_variable(assignment, xi)
            else:
                # some conflict is not resolved
                conflict_unresolved = some_conflict_resolved(assignment, rho)
                neighbors = split_on_conflicts(assignment, conflict_unresolved)
            for each in neighbors:
                if each not in expanded:
                    queue.append(each)
                        
    # if the code gets here, it has looked through everything but could not find a solution
    return None

In [None]:
def search_queue_for_conflict(queue,conflict):
    # remove anything from Q that manifests conflict

    return queue

In [None]:
def all_conflict_resolved(assignment, rho):
    # check if assignment resolves all conflicts in rho
    return True

In [None]:
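def some_conflict_resolved(assignment, rho):
    # check whether assignment resolves only some of the conflicts in rho
    # and find one unresolved conflict
    conflict_unresolved = None  # placeholder: None means every conflict is resolved
    return conflict_unresolved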
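
The three helper stubs above could be fleshed out along the following lines. This is a sketch under the assumption that assignments and conflicts are both dictionaries of `component: mode` pairs; the names `manifests`, `resolves`, `prune_queue`, and `first_unresolved` are hypothetical stand-ins for the notebook's helpers.

```python
def manifests(assignment, conflict):
    """True if the assignment contains every pair in the conflict."""
    return all(assignment.get(var) == val for var, val in conflict.items())

def resolves(assignment, conflict):
    """True if the assignment gives some conflict variable another value."""
    return any(var in assignment and assignment[var] != val
               for var, val in conflict.items())

def prune_queue(queue, conflict):
    """Remove anything from the queue that manifests the conflict."""
    return [a for a in queue if not manifests(a, conflict)]

def first_unresolved(assignment, rho):
    """Return one conflict the assignment does not resolve, or None."""
    for conflict in rho:
        if not resolves(assignment, conflict):
            return conflict
    return None

conflict = {"A1": "G", "A2": "G"}   # hypothetical conflict
queue = [{"A1": "G", "A2": "G", "X1": "U"}, {"A1": "U"}, {"A1": "G"}]
pruned = prune_queue(queue, conflict)
print(pruned)
print(first_unresolved({"A1": "G"}, [conflict]))
```

Note the distinction between the two checks: a partial assignment such as `{"A1": "G"}` does not yet manifest the conflict, so it stays in the queue, but it does not resolve the conflict either, so the search would split on that conflict next.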

**Evaluation:** To evaluate the performance of the algorithms discussed above, we can look at run-time complexity and space complexity by keeping track of the number of nodes expanded and the maximum queue size. We expect simple search (brute force) to be the most computationally expensive, followed by Constraint-Based A* Search, with Conflict-Directed A* Search the cheapest because it uses pruning to avoid repeatedly exploring states that all fail for the same reason, i.e. the same conflict.
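
One lightweight way to collect these two measurements is a small counter object that each search loop updates once per expansion. The `SearchStats` class and the exhaustive `brute_force_count` loop below are hypothetical and only show where the bookkeeping would go.

```python
from dataclasses import dataclass

@dataclass
class SearchStats:
    nodes_expanded: int = 0
    max_queue_size: int = 0

    def record(self, queue):
        """Call once per loop iteration, before popping from the queue."""
        self.nodes_expanded += 1
        self.max_queue_size = max(self.max_queue_size, len(queue))

def brute_force_count(domains, stats):
    """Expand every partial assignment depth-first, recording statistics."""
    variables = list(domains)
    queue = [{}]
    while queue:
        stats.record(queue)
        assignment = queue.pop()
        if len(assignment) < len(variables):
            # Variables are assigned in order, so the next one is at this index.
            var = variables[len(assignment)]
            queue.extend({**assignment, var: v} for v in domains[var])
    return stats

stats = brute_force_count({"a": [0, 1], "b": [0, 1], "c": [0, 1]}, SearchStats())
print(stats)
```

For three binary variables the exhaustive loop expands all 15 nodes of the search tree; running the same counter inside CBA* and CDA* would give directly comparable figures.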

**Expected Result**: An optimal solution after searching through the CSP to find which mode is most likely.