## Introduction
The goal of this project was to reproduce the neural network and, in part, the results of the following paper: Estimating individual treatment effect: generalization bounds and algorithms (Uri Shalit et al.). The authors of this paper aim to predict the individual treatment effect (ITE) from observational data. To this end they propose the CFR (Counterfactual Regression) framework. The authors implemented their network in TensorFlow; in this project we aim to reconstruct it in PyTorch.

## Individual treatment effect
The individual treatment effect (ITE) is the effect of a certain treatment on a specific patient. It is very useful to estimate the outcome of a treatment before the patient receives it: a doctor could then choose the treatment based on the estimated outcome. Observational data, e.g. records of medication given, are used to make such predictions. The ITE is defined as:

$$ITE (x) = E[Y_1 - Y_0 | x]$$

Here $$Y_1$$ is the outcome with the treatment and $$Y_0$$ the outcome without it. The ITE of a treatment is thus the expected outcome with the treatment minus the expected outcome without it, given the observed covariates $$x$$.
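As a small illustration of the formula, the sketch below estimates the ITE by evaluating an outcome model twice, once with the treatment switched on and once with it switched off. The function `predict_outcome` and its coefficients are made up for this example and simply stand in for any trained model.

```python
def predict_outcome(x, t):
    """Hypothetical outcome model: expected outcome for covariates x under treatment t."""
    baseline = 2.0 + 0.5 * x[0] - 0.25 * x[1]
    effect = 1.0 + 0.5 * x[0]  # the treatment effect depends on the covariates
    return baseline + t * effect


def estimate_ite(x):
    """ITE(x) = E[Y_1 | x] - E[Y_0 | x], using the model for both expectations."""
    return predict_outcome(x, 1) - predict_outcome(x, 0)


patient = [2.0, 4.0]  # invented covariates
ite = estimate_ite(patient)
print(ite)  # 2.0
```

A positive value means the model expects a higher outcome with the treatment than without it for this patient.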

## IHDP dataset
This project uses the IHDP (Infant Health and Development Program) dataset, which contains data on prematurely born infants and their mothers. It includes 747 patients, for each of whom 25 covariates (x) were measured, and it comes in 100 replications (realizations of the outcomes).

![Table](img/Flowdiagram_repo_projec_Page-3.png)

This table shows the structure of the dataset.
The value $$t \in \{1,0\}$$ indicates whether a patient was treated (1) or not (0), y_factual is the observed outcome, and yc_factual is the counterfactual outcome, i.e. the outcome that would have been observed under the opposite treatment. For example, if a diabetic patient received insulin, t would be 1, y_factual would be the "factual" outcome after the insulin treatment, and yc_factual would be the outcome under the opposite treatment, so when no insulin was given.
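To make the column semantics concrete, the sketch below converts a few rows into the potential outcomes $$Y_1$$ and $$Y_0$$ and takes their difference. The rows are invented for illustration and are not taken from the IHDP data; only the column names `t`, `y_factual` and `yc_factual` follow the description above.

```python
# Invented example rows; the values are not from the IHDP dataset.
rows = [
    {"t": 1, "y_factual": 6.5, "yc_factual": 2.5},
    {"t": 0, "y_factual": 3.0, "yc_factual": 7.0},
]


def potential_outcomes(row):
    """Return (y1, y0): the outcome with treatment and the outcome without it."""
    if row["t"] == 1:
        # Treated: the factual outcome is Y_1, the counterfactual is Y_0.
        return row["y_factual"], row["yc_factual"]
    # Not treated: the factual outcome is Y_0, the counterfactual is Y_1.
    return row["yc_factual"], row["y_factual"]


def true_effect(row):
    y1, y0 = potential_outcomes(row)
    return y1 - y0


effects = [true_effect(r) for r in rows]
print(effects)  # [4.0, 4.0]
```

Note that which column plays the role of $$Y_1$$ depends on t, so the effect is not simply y_factual minus yc_factual.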

## CFR-net
To recreate the CFR network we first worked through the authors' TensorFlow code step by step. To gain insight into the proposed network we created a flowchart:
![Flowchart](img/Flowdiagram_repo_projec_Page-2.png)

The network consists of 6 hidden layers, uses ReLU as its non-linear activation, and applies multiple dropout layers. Halfway through, the hidden representation is concatenated with the t-values. Finally, the network outputs a vector y (factual).

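```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FCNet(nn.Module):
    """
    Simple fully connected CFR-style network in PyTorch.
    Three layers build the representation; three further layers and a linear
    output layer predict the outcome from the representation and t.
    """

    def __init__(self):
        super().__init__()

        p = 0.4  # dropout probability

        # Representation layers
        self.h_in = nn.Linear(25, 100)
        self.layer_1 = nn.Linear(100, 100)
        self.layer_2 = nn.Linear(100, 100)
        # Outcome layers; layer_3 takes 100 features plus the treatment t
        self.layer_3 = nn.Linear(101, 100)
        self.layer_4 = nn.Linear(100, 100)
        self.layer_5 = nn.Linear(100, 100)

        self.do1 = nn.Dropout(p=p)
        self.do2 = nn.Dropout(p=p)
        self.do3 = nn.Dropout(p=p)
        self.do4 = nn.Dropout(p=p)
        self.do5 = nn.Dropout(p=p)
        self.do6 = nn.Dropout(p=p)
        self.fc6 = nn.Linear(100, 1)

    def _build_output_graph(self, h_rep, t):
        # Reconstructed from the layer definitions above (an assumption):
        # concatenate t (shape [N, 1]) with the representation, then predict y.
        h = torch.cat((h_rep, t), dim=1)
        h = self.do4(F.relu(self.layer_3(h)))
        h = self.do5(F.relu(self.layer_4(h)))
        h = self.do6(F.relu(self.layer_5(h)))
        return self.fc6(h)

    def forward(self, x, t):
        h = self.do1(F.relu(self.h_in(x)))
        h = self.do2(F.relu(self.layer_1(h)))
        h_rep = self.do3(F.relu(self.layer_2(h)))
        # Pass the representation (not the layer_1 output) to the outcome layers
        h = self._build_output_graph(h_rep, t)

        return h, h_rep
```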


## Training loop
The training loop of the CFR network uses the Adam optimizer and an MSE loss criterion. The dataset was split into 672 training samples and 75 test samples. The network was trained for 2000 epochs.
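A minimal sketch of such a loop is shown below. It uses random tensors in place of the IHDP data and a small stand-in network, so the names `TinyNet`, `x_train`, `t_train` and `y_train`, the learning rate, the full-batch updates and the reduced epoch count are all assumptions for illustration rather than the settings used in this project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Random stand-in data with the training shape: 672 samples, 25 covariates.
x_train = torch.randn(672, 25)
t_train = torch.randint(0, 2, (672, 1)).float()
y_train = torch.randn(672, 1)


class TinyNet(nn.Module):
    """Hypothetical stand-in model that takes covariates x and treatment t."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(26, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x, t):
        return self.body(torch.cat((x, t), dim=1))


model = TinyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
criterion = nn.MSELoss()

n_epochs = 20  # the project used 2000; kept small so the sketch runs quickly
losses = []
model.train()
for epoch in range(n_epochs):
    optimizer.zero_grad()              # reset gradients from the previous step
    y_pred = model(x_train, t_train)   # forward pass on the full training set
    loss = criterion(y_pred, y_train)  # mean squared error on factual outcomes
    loss.backward()                    # backpropagate
    optimizer.step()                   # Adam update
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

For evaluation on the 75 test samples one would call `model.eval()` first, so that any dropout layers are disabled.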


## Corrections for distribution difference
To compensate for the difference in group size (treated / non-treated), sample re-weighting was introduced. Each sample was re-weighted with the following formula:

$$w_i = \frac{t_i}{2u} + \frac{1-t_i}{2(1-u)}$$ for $$i = 1 \ldots n$$

Here $$t_i \in \{1,0\}$$ is the treatment indicator of sample $$i$$ and $$u$$ is the treatment probability, i.e. the chance of being treated (called p_t in the code).

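```python
# Sample reweighting (t: treatment indicator, p_t: probability of being treated)
if flags.get_val('reweight_sample'):
    w_t = t / (2 * p_t)              # weight contribution for treated samples
    w_c = (1 - t) / (2 * (1 - p_t))  # weight contribution for control samples
    sample_weight = w_t + w_c
else:
    sample_weight = 1.0
```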
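As a worked example of the weighting formula, the sketch below computes the weights for a small invented batch. Estimating `p_t` as the fraction of treated samples is an assumption made here for illustration, and `sample_weights` is a hypothetical helper name.

```python
def sample_weights(t, p_t):
    """w_i = t_i / (2 * p_t) + (1 - t_i) / (2 * (1 - p_t)) for every sample."""
    return [ti / (2 * p_t) + (1 - ti) / (2 * (1 - p_t)) for ti in t]


# Invented batch: 2 treated and 6 control samples.
t = [1, 1, 0, 0, 0, 0, 0, 0]
p_t = sum(t) / len(t)  # 0.25, fraction of treated samples (assumption)
weights = sample_weights(t, p_t)

print(weights[0])    # weight of a treated sample: 2.0
print(weights[-1])   # weight of a control sample: about 0.667
print(sum(weights))  # total weight, equal to the number of samples (up to rounding)
```

The smaller treated group gets the larger per-sample weight, so both groups end up with the same total weight (4.0 each in this example).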

In addition to the re-weighting of the samples, an imbalance error was added to the loss function. The imbalance error adjusts for the bias induced by the imbalance between the treatment groups. There are different methods for calculating it; for this project the squared linear Maximum Mean Discrepancy (MMD) and the Wasserstein distance were used. The actual computation of these imbalance errors goes a bit beyond the scope of this blog post. However, it is good to know that the distribution imbalance was corrected for in two different ways, which also leads to two different sets of results.
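Without reproducing the full computation, the intuition behind the linear MMD variant can be sketched in a few lines: it compares the mean of the learned representation in the treated group with the mean in the control group. The function below is a simplified, unweighted version written for illustration, with invented two-dimensional representations; it is not the code used in this project, and the Wasserstein variant is not shown.

```python
def column_means(rows):
    """Mean of each feature (column) over all rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_linear_mmd(rep_treated, rep_control):
    """Squared Euclidean distance between the two group-mean representations."""
    mean_t = column_means(rep_treated)
    mean_c = column_means(rep_control)
    return sum((a - b) ** 2 for a, b in zip(mean_t, mean_c))


# Invented representations (one row per sample).
rep_treated = [[1.0, 2.0], [3.0, 4.0]]  # group mean [2.0, 3.0]
rep_control = [[0.0, 1.0], [2.0, 1.0]]  # group mean [1.0, 1.0]

imbalance = squared_linear_mmd(rep_treated, rep_control)
print(imbalance)  # 5.0
```

The value is zero when both groups have the same mean representation and grows as the group means drift apart, which is why adding it to the loss pushes the network towards balanced representations.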

## Outcome measures
- PEHE (Precision in Estimation of Heterogeneous Effect)
- ATE (Average Treatment Effect)
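For reference, both measures can be computed from predicted and true treatment effects, as in the sketch below. It follows the commonly used definitions (root of the mean squared ITE error for PEHE, absolute error in the average effect for ATE). The function names and the numbers are invented for illustration, and whether this project reports exactly these variants is an assumption.

```python
import math


def sqrt_pehe(ite_true, ite_pred):
    """Root of the mean squared difference between predicted and true ITE."""
    n = len(ite_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(ite_true, ite_pred)) / n)


def ate_error(ite_true, ite_pred):
    """Absolute difference between the predicted and the true average effect."""
    n = len(ite_true)
    return abs(sum(ite_pred) / n - sum(ite_true) / n)


# Invented effects for four patients.
ite_true = [4.0, 2.0, 3.0, 5.0]
ite_pred = [3.0, 3.0, 3.0, 7.0]

print(sqrt_pehe(ite_true, ite_pred))  # about 1.2247
print(ate_error(ite_true, ite_pred))  # 0.5
```

PEHE penalizes errors for each individual patient, while the ATE error only looks at the average, so individual errors with opposite signs can cancel out in the latter.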

## Conclusion