### Exploration (40 pts)

In these problems, you are meant to do creative exploration.  Define and explore:

E.1 A discrete inference problem (20 pts)

E.2 A continuous inference problem (20 pts)

This is meant to be open-ended: you should not feel the need to write a book chapter, but neither should you just change the numbers in one of the problems above. After doing the readings and problems above, you should pick a concept you want to understand better or a simple modeling idea you want to try out. You can also start to explore ideas for your project. The general idea is for you to teach yourself (and potentially a classmate) about a concept from the assignments and readings, or to solidify your understanding of required technical background. For additional guidance, see the grading rubric below.

You can use the readings and other sources for inspiration, but here are a few ideas:
- An inference problem using categorical data
- A disease for which there are two different tests
- A two-dimensional continuous inference problem
- The idea of a conjugate prior


#### Exploration Grading Rubric

Exploration problems will be graded according to the elements in the table below. The scores in the column headers indicate the number of points possible for each rubric element (given in the rows). A score of zero for an element is possible if it is missing entirely.

|     | Substandard (+1) | Basic (+2) | Good (+3) | Excellent (+5) |
| :-- | :----------- | :---- | :--- | :-------- |
| <b> Pedagogical Value </b> | No clear statement of idea or concept being explored or explained; lack of motivating questions. | Simple problem with adequate motivation; still could be a useful addition to an assignment. | Good choice of problem with effective illustrations of concept(s).  Demonstrates a deeper level of understanding. | Problem also illustrates or clarifies common conceptual difficulties or misconceptions. |
| <b> Novelty of Ideas </b> | Copies existing problem or makes only a trivial modification; lack of citation(s) for source of inspiration. | Concepts are similar to those covered in the assignment but with some modifications of an existing exercise. | Ideas have clear pedagogical motivation; creates different type of problem or exercise to explore related or foundational concepts more deeply. | Applies a technique or explores concept not covered in the assignment or not discussed at length in lecture. | 
| <b> Clarity of Explanation </b> | Little or confusing explanation; figures lack labels or useful captions; no explanation of motivations. | Explanations are present, but unclear, unfocused, wordy or contain too much technical detail. | Clear and concise explanations of key ideas and motivations. | Also clear and concise, but includes illustrative figures; could be read and understood by students from a variety of backgrounds. |
| <b> Depth of Exploration </b> | Content is obvious or closely imitates assignment problems. | Uses existing problem for different data. | Applies a variation of a technique to solve a problem with an interesting motivation; explores a concept in a series of related problems. | Applies several concepts or techniques; has clear focus of inquiry that is approached from multiple directions.|

#### Exploration Introduction
It has been several years since I have looked at probability in depth, so I wanted to use this exploration to refresh my knowledge through moderately complex, new inference problems. Below, I work through one discrete inference problem and one continuous inference problem, trying to uncover and refresh some basic knowledge along the way.

#### E.1 A discrete inference problem
Consider drawing three cards from a standard 52-card deck, where the non-numbered cards take the following values: Ace = 1, Jack = 11, Queen = 12, King = 13. <br><br>
When we draw three cards, let $X$ be the value of the first card drawn, $X \in \{1, 2, \dots, 13\}$, each value with probability $\frac{1}{13}$. Let $Y$ be the sum of the values of the next two cards drawn, $Y \in \{2, 3, \dots, 26\}$. The distribution of $Y$ depends on $X$ because the cards are drawn without replacement: after the first card, the probability of drawing another card with the same value as $X$ drops from $\frac{4}{52} = \frac{1}{13}$ to $\frac{3}{51}$, while each of the other values rises to $\frac{4}{51}$. The probabilities change again on the second draw for $Y$, which lowers the probability of drawing the same value as the first card of $Y$.
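
The first step of this reasoning can be checked with exact fractions. The helper below is a minimal sketch: `next_card_probs` is a hypothetical name (not part of the notebook's `deck` class), and it assumes a standard deck of 13 values with 4 copies each. It returns the probability of each value on the second draw, given the value of the first card.

```python
from fractions import Fraction

def next_card_probs(x: int, n_vals: int = 13, n_each: int = 4):
    """p(next card = v | first card = x) for every value v, drawing without replacement."""
    remaining = n_vals * n_each - 1  # 51 cards left in a standard deck
    return {
        v: Fraction(n_each - 1 if v == x else n_each, remaining)
        for v in range(1, n_vals + 1)
    }

probs = next_card_probs(x=2)
print(probs[2], probs[5])  # prints: 1/17 4/51
```

`Fraction` reduces automatically, so $\frac{3}{51}$ prints as `1/17`; every other value keeps all four copies and gets $\frac{4}{51}$.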
<br><br>
Suppose that, knowing only $X$ and $Y$, we want to infer which cards were drawn. In general we cannot determine the cards with certainty, because many different pairs of cards produce the same sum $Y$. However, we can still work out which pairs are consistent with the observations and how probable each one is, which lets us make informed guesses as follows.


##### Defining the deck of cards
We first define a deck of cards that lets us draw cards (without replacement) with the correct probabilities.

```python
# define the deck of cards
import random

class deck:
    """A deck with nEach copies of each value 1..nVals (Ace=1, Jack=11, Queen=12, King=13)."""

    def __init__(self, nVals:int=13, nEach:int=4):
        # card values and the number of copies of each value
        self.vals = list(range(1, nVals + 1))
        self.nEach = nEach

        # the cards themselves, initially in sorted order
        self.cards = [v for v in self.vals for _ in range(self.nEach)]

    def deck(self):
        # the cards still remaining in the deck
        return self.cards

    def draw(self, nCards:int=1):
        # draw without replacement: shuffle once, then take cards from the top
        random.shuffle(self.cards)
        return [self.cards.pop(0) for _ in range(nCards)]

    def probability(self, card:int=1):
        # chance that the next card drawn has this value
        return self.cards.count(card) / len(self.cards)

    def value(self, cards:list):
        # total value of a list of drawn cards
        return sum(cards)
```

```python
# draw some cards
d = deck()
print(d.deck())

# event X (draw the first card)
D_1 = d.draw(1)
print(f"X = {d.value(D_1)}")

# event Y (draw the next 2 cards)
D_2 = d.draw(2)
print(f"Y = {d.value(D_2)}")
```

```
[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13]
X = 2
Y = 22
```


Since $X$ is defined as the value of the first drawn card ($D_1$), knowing $X$ tells us exactly which card was drawn: $p(D_1 = x \mid X = x) = 1$. In the example above, $X=2$, so the first card is a $2$. This part is obvious, but it also tells us that the probability of drawing a $2$ in event $Y$ is reduced.
<br><br>
Although I don't show a full algorithm here, we can enumerate (or use dynamic programming to find) the pairs of card values that produce a given $Y$. Let $Y_1$ and $Y_2$ be the individual draws that make up $Y$; the probability we are trying to find is then $p(Y_1,Y_2|X,Y)$.
<br><br>
We can immediately restrict attention to the pairs whose values sum to $Y$. If $Y=22$, there are only three possible combinations when we ignore order: $\{13, 9\}$, $\{12, 10\}$, and $\{11, 11\}$. None of these include the value $2$, so the first card $X=2$ does not change their relative probabilities: each of the values $9$ through $13$ still has all $4$ copies among the remaining $51$ cards.
<br><br>
These three combinations are not equally likely, however. A pair of distinct values can be drawn in two orders, while a pair of equal values can be drawn in only one order and uses up one of that value's copies:
$$
p(\{13, 9\} \mid X) = 2 \cdot \frac{4}{51}\cdot\frac{4}{50} = \frac{32}{2550}, \qquad p(\{11, 11\} \mid X) = \frac{4}{51}\cdot\frac{3}{50} = \frac{12}{2550},
$$
and likewise $p(\{12, 10\} \mid X) = \frac{32}{2550}$. Normalizing over the three combinations gives $p(\{13,9\}|X,Y) = p(\{12,10\}|X,Y) = \frac{32}{76} \approx 0.42$ and $p(\{11,11\}|X,Y) = \frac{12}{76} \approx 0.16$.
<br><br>
Say we guess that the cards drawn in event $Y$ (`D_2` in the code) are $(11, 11)$. This is a valid guess, since it is consistent with $Y=22$, but it is the least probable of the three: guessing $\{13, 9\}$ or $\{12, 10\}$ would be right about 42% of the time, rather than about 16%.
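
The conditional probability of each unordered pair can also be computed programmatically by enumerating every ordered draw $(Y_1, Y_2)$ that sums to $Y$ and weighting it by its probability without replacement. This is a minimal sketch: `pair_posterior` is a hypothetical helper (not part of the notebook), it assumes a standard deck of 13 values with 4 copies each, and it uses exact fractions to avoid rounding.

```python
from fractions import Fraction
from collections import defaultdict

def pair_posterior(x: int, y: int, n_vals: int = 13, n_each: int = 4):
    """Exact p({Y1, Y2} | X=x, Y=y) over unordered pairs, drawing without replacement."""
    # copies of each value left after the first card (value x) is removed
    counts = {v: n_each for v in range(1, n_vals + 1)}
    counts[x] -= 1
    remaining = sum(counts.values())  # 51 for a standard deck

    weights = defaultdict(Fraction)
    for y1 in range(1, n_vals + 1):
        y2 = y - y1
        if not 1 <= y2 <= n_vals:
            continue  # the second card cannot complete the sum
        # probability of the ordered draw (y1, y2)
        p1 = Fraction(counts[y1], remaining)
        c2 = counts[y2] - 1 if y2 == y1 else counts[y2]
        p2 = Fraction(max(c2, 0), remaining - 1)
        # both orders of a pair accumulate into the same unordered key
        weights[tuple(sorted((y1, y2), reverse=True))] += p1 * p2

    total = sum(weights.values())  # p(Y = y | X = x)
    return {pair: w / total for pair, w in weights.items()}

post = pair_posterior(x=2, y=22)
for pair, p in post.items():
    print(pair, p, float(p))
# expected: (13, 9) -> 8/19, (12, 10) -> 8/19, (11, 11) -> 3/19
```

Since $\frac{32}{76} = \frac{8}{19}$ and $\frac{12}{76} = \frac{3}{19}$, the doubled pair $\{11, 11\}$ is the least likely explanation of $Y=22$.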

```python
# show the cards drawn for event Y
print(D_2)
```

```
[13, 9]
```


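We can see that my guess of $\{11, 11\}$ was incorrect: the cards actually drawn were $\{13, 9\}$. A single draw cannot confirm or refute the reasoning; only the frequencies over many repeated draws reveal how often each guess is right.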
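
To see how often each guess would be right, we can simulate the draws for $Y$ many times with the first card fixed at $X=2$. We keep only the draws where $Y=22$ (rejection sampling) and tally how often each unordered pair occurs. This is a hypothetical, self-contained sketch: `simulate_pairs` is not part of the notebook, it rebuilds the deck instead of using the `deck` class above, and the trial count and seed are arbitrary choices.

```python
import random
from collections import Counter

def simulate_pairs(x: int, y: int, n_trials: int = 200_000, seed: int = 0):
    """Estimate p({Y1, Y2} | X=x, Y=y) by simulation and rejection of draws with sum != y."""
    rng = random.Random(seed)
    deck = [v for v in range(1, 14) for _ in range(4)]
    deck.remove(x)  # condition on the first card: one copy of x is gone
    counts = Counter()
    for _ in range(n_trials):
        y1, y2 = rng.sample(deck, 2)  # next two cards, without replacement
        if y1 + y2 == y:  # keep only draws consistent with the observed Y
            counts[tuple(sorted((y1, y2), reverse=True))] += 1
    accepted = sum(counts.values())
    return {pair: c / accepted for pair, c in counts.items()}

est = simulate_pairs(x=2, y=22)
for pair, p in sorted(est.items(), key=lambda kv: -kv[1]):
    print(pair, round(p, 3))
```

With about 6,000 accepted draws, the estimated frequencies should land close to the exact conditional probabilities obtained by counting draws without replacement: roughly 0.42 for each of $\{13, 9\}$ and $\{12, 10\}$, and roughly 0.16 for $\{11, 11\}$.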

#### E.2 A continuous inference problem
This past summer, I interned at Surgo Health, a company that tries to solve healthcare problems by looking at the social and behavioral contexts of patients. To do so, they use causal machine learning methods to better understand and classify patients and their actions. Of course, this requires massive datasets; my manager explained that they often avoid causal ML on datasets with fewer than 10,000 samples. Because such datasets are rarely available, part of my internship involved building a generative network that could generate causally linked data across 50+ discrete features. This work inspired me to take this course, but it left me with a specific question: how might we build Bayesian networks for continuous features? As far as I could see, doing so with discrete variables was hard enough, though discrete variables at least allow one to bin values; making causal predictions across continuous features felt somewhat far-fetched. Therefore, for this exploration, I looked into several papers and websites that explain continuous Bayesian networks.<br><br>
Looking ahead in the textbook, I found a section on Bayesian networks with continuous variables. Implementing such a network as a generative model should not be too difficult, since each causal relationship can be defined by a function of the parent values. Similarly, a mixed model with both continuous and discrete variables could generate values as in the continuous model and then add a discretization step that applies thresholds to create $n$ distinct classes.<br><br>
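
A minimal sketch of both ideas, assuming a linear-Gaussian model (each child is a linear function of its parents plus Gaussian noise, one common choice for continuous Bayesian networks) and quantile-based thresholds for the discretization step. The coefficients, noise level, and number of classes are arbitrary illustrative choices, not values from the textbook. Values are generated in ancestral order: parents first, then the child.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# root nodes A and B: no parents, so sample them from assumed marginals
a = rng.normal(0.0, 1.0, n)
b = rng.normal(0.0, 1.0, n)

# child V: linear function of its parents plus Gaussian noise (coefficients are assumptions)
v = 2.0 * a - 1.5 * b + 0.5 + rng.normal(0.0, 0.3, n)

# discretization step: two thresholds split V into n_classes = 3 ordered classes
n_classes = 3
thresholds = np.quantile(v, np.linspace(0, 1, n_classes + 1)[1:-1])
v_class = np.digitize(v, thresholds)  # 0 below the first threshold, 2 above the last

print(thresholds)
print(np.bincount(v_class, minlength=n_classes))  # roughly n / n_classes per class
```

Quantile thresholds give classes of roughly equal size; fixed thresholds chosen from domain knowledge would work the same way with `np.digitize`.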
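Below, I implement a small continuous generative network with three nodes that form a v-structure, in which two parent nodes $A$ and $B$ both point to a common child $V$ ($A \rightarrow V \leftarrow B$): ![V-structure with parents A and B and child V](v-structure.png)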

We can define each node ($A$, $B$, and $V$) by the function that maps its input to its output. For simplicity, I use a linear relationship, as shown below.

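```python
import numpy as np

class Node:
    """A node whose output is a linear function of its input: out = slope * input + intercept."""

    def __init__(self, m:float=1, b:float=0) -> None:
        self.slope = m
        self.intercept = b
        self.out = None  # set by propogate()

    def propogate(self, x:np.ndarray) -> None:
        # apply the linear relationship elementwise and store the result
        self.out = x * self.slope + self.intercept
```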

```python
# define the parent nodes
A = Node(m=3.1, b=1.1)
B = Node(m=0.08, b=-0.79)

# pass inputs to the parent nodes to determine the input for node V
# (beta(1, 1) is the uniform distribution on [0, 1])
A.propogate(np.random.beta(1,1,5))
B.propogate(np.random.beta(1,1,5))

# view the outputs
print(A.out)
print(B.out)
```

```
[3.66662417 2.17961209 1.57043584 1.11726065 2.50628165]
[-0.73056406 -0.76323252 -0.71923614 -0.74444095 -0.78672049]
```


From the output above, it is not obvious how node $V$ should combine the outputs of its two parents, but we can default to an additive relationship.

```python
# define node V
V = Node(m=5, b=0)

# the input to V is the sum of its parents' outputs; then propagate
V_in = np.add(A.out, B.out)
V.propogate(V_in)

# print output
print(V.out)
```

```
[14.68030052  7.08189789  4.25599853  1.86409853  8.59780582]
```
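
The cells above propagate each node by hand. One way to generate all values with a single call is to store each node with its parameters and parent list, then evaluate the nodes in topological order. `LinearGraph` below is a hypothetical sketch (not from the textbook or any library). It reuses the notebook's convention that a child's input is the sum of its parents' outputs, and it assumes nodes are added parents-first, so insertion order is a valid topological order.

```python
import numpy as np

class LinearGraph:
    """Hypothetical container: each node computes out = slope * (input) + intercept."""

    def __init__(self):
        self.params = {}   # node name -> (slope, intercept)
        self.parents = {}  # node name -> list of parent names
        self.order = []    # insertion order; parents must be added before children

    def add_node(self, name, slope=1.0, intercept=0.0, parents=()):
        missing = [p for p in parents if p not in self.params]
        if missing:
            raise ValueError(f"add parents {missing} before {name}")
        self.params[name] = (slope, intercept)
        self.parents[name] = list(parents)
        self.order.append(name)

    def propagate(self, root_inputs):
        """Root nodes read root_inputs[name]; other nodes sum their parents' outputs."""
        out = {}
        for name in self.order:  # parents-first insertion order is topological
            slope, intercept = self.params[name]
            if self.parents[name]:
                x = sum(out[p] for p in self.parents[name])
            else:
                x = root_inputs[name]
            out[name] = slope * x + intercept
        return out

g = LinearGraph()
g.add_node("A", slope=3.1, intercept=1.1)
g.add_node("B", slope=0.08, intercept=-0.79)
g.add_node("V", slope=5.0, intercept=0.0, parents=["A", "B"])

rng = np.random.default_rng(0)
outputs = g.propagate({"A": rng.beta(1, 1, 5), "B": rng.beta(1, 1, 5)})
print(outputs["V"])
```

Rejecting a node whose parents are not yet defined keeps the evaluation order valid without a separate topological sort.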


This short exploration crudely defines a generative model that creates causally linked variables. To make it more complete, I would initialize $A$, $B$, and $V$ as nodes in a graph that could then generate values automatically with one propagation call. One thing that strikes me is the spread of the output from node $V$ compared with its parents. Since $V = 5(A_{out} + B_{out})$ and $B$'s slope is only $0.08$, nearly all of this spread comes from $A$. Even though this generative model has only three nodes with simple linear relationships, working backwards to recover the slopes and intercepts seems much harder than with discrete variables. Hopefully, I can continue to explore this in a future exploration.
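
Whether the parameters can be recovered depends on what is observed. The following is a minimal sketch that assumes the parent outputs $A$ and $B$ are observed alongside $V$, and it adds Gaussian noise with standard deviation 0.5 (an assumption; the notebook's model is noise-free). Ordinary least squares on the summed input estimates $V$'s slope and intercept. Comparing variances shows how little of $V$'s spread comes from $B$. Recovering $A$'s and $B$'s own parameters would similarly require observing their inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# parent outputs, mirroring the notebook's Node parameters (beta(1, 1) is uniform on [0, 1])
a_out = 3.1 * rng.beta(1, 1, n) + 1.1
b_out = 0.08 * rng.beta(1, 1, n) - 0.79

# child V with slope 5 and intercept 0, plus assumed Gaussian noise with sd 0.5
v_out = 5.0 * (a_out + b_out) + 0.0 + rng.normal(0.0, 0.5, n)

# ordinary least squares on [A + B, 1] estimates V's slope and intercept
design = np.column_stack([a_out + b_out, np.ones(n)])
(slope_hat, intercept_hat), *_ = np.linalg.lstsq(design, v_out, rcond=None)
print(f"slope ~ {slope_hat:.3f}, intercept ~ {intercept_hat:.3f}")

# approximate share of V's variance contributed by each (independent) parent
share_a = np.var(5.0 * a_out) / np.var(v_out)
share_b = np.var(5.0 * b_out) / np.var(v_out)
print(f"share from A ~ {share_a:.3f}, share from B ~ {share_b:.4f}")
```

Because $B$ contributes well under 1% of $V$'s variance here, a regression that estimated separate weights for $A$ and $B$ would pin down $A$'s weight far more precisely than $B$'s. This is one concrete reason why working backwards can be hard even in a tiny linear network.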