
# **Part 2: Sensitivity Analysis and Method Comparison**

# **Comparison of ATE: Complete Case vs. Imputed**

In this analysis, we compared the Average Treatment Effect (ATE) estimates using two approaches to handle missing data: **Complete Case analysis and Imputation**.
- **ATE Values**: Both methods produced the exact same estimate of **1.621**, showing no difference between the two approaches.
- **Confidence Intervals (CIs)**: The intervals are very wide, ranging from approximately **-58 to 61** for both methods. This indicates substantial uncertainty around the estimates.
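- **Consistency Across Methods**: Because both methods yield the same ATE, missing data does not appear to drive variation or bias in the estimate. However, an identical result to three decimal places also occurs when few or no observations are missing in the variables used, in which case the two approaches analyze essentially the same data.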
- **Uncertainty and Reliability**: The large width of the CIs highlights that while the point estimate is consistent, the results are imprecise. This reduces confidence in drawing strong conclusions from the ATE alone.
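A minimal sketch of this comparison on synthetic data, assuming a single covariate that is missing completely at random, simple mean imputation (the imputation method used in the notebook is not stated here), and a regression-adjusted ATE rather than the causal forest estimator. All names (`regression_ate`, `complete_case`, `mean_impute`) are hypothetical. The last lines illustrate that when no values are missing, both strategies analyze the same rows and return the same ATE.

```python
import numpy as np

def regression_ate(y, t, x):
    """ATE as the treatment coefficient from OLS of y on [1, t, x]
    (assumes a constant, additive treatment effect)."""
    design = np.column_stack([np.ones(len(y)), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def complete_case(y, t, x):
    """Drop rows where the covariate is missing."""
    keep = ~np.isnan(x)
    return y[keep], t[keep], x[keep]

def mean_impute(y, t, x):
    """Keep every row; fill missing covariate values with the observed mean."""
    return y, t, np.where(np.isnan(x), np.nanmean(x), x)

# Synthetic data (hypothetical): true ATE = 1.5
rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(size=n)
t = (rng.random(n) < 0.5).astype(float)
y = 1.5 * t + 2.0 * x + rng.normal(size=n)

# Make about 20% of the covariate missing completely at random
x_missing = x.copy()
x_missing[rng.random(n) < 0.2] = np.nan

for label, prep in [("complete case", complete_case), ("mean imputed", mean_impute)]:
    yy, tt, xx = prep(y, t, x_missing)
    print(f"{label:>13}: n={len(yy)}, ATE={regression_ate(yy, tt, xx):.3f}")

# With no missing values, both strategies analyze identical data
same = np.isclose(regression_ate(*complete_case(y, t, x)),
                  regression_ate(*mean_impute(y, t, x)))
print("identical when nothing is missing:", same)
```

With real missingness the two estimates usually differ at least slightly, since complete case analysis works with fewer rows.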




# **Comparison of ATE: Causal Forest vs. DML**
This comparison highlights how different causal inference methods can produce varying estimates of the Average Treatment Effect (ATE).
- **ATE Values**:
   - Causal Forest estimated the ATE at **1.621**.
   - DML (Double Machine Learning) estimated the ATE at **1.529**.
- **Confidence Intervals**:
   - The Causal Forest estimate has a very wide confidence interval **(approximately -58 to 61)**, which reflects substantial uncertainty.
   - The DML estimate has a much narrower confidence interval **(-1 to 4)**, indicating greater precision.

**Key Findings:**
   - Both methods produce similar point estimates (1.621 vs. 1.529), so the choice of estimator does not materially change the central estimate.
   - The Causal Forest estimate has a much wider confidence interval, suggesting it is more sensitive to noise or variability in the dataset.
   - The divergence between the confidence intervals suggests that the choice of method can meaningfully influence both the interpretation and the reliability of the findings.
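A minimal sketch of the partialling-out form of Double Machine Learning, assuming a partially linear model with a continuous treatment. To keep the example dependency-free it uses ordinary least squares as the nuisance learner; DML applications usually plug in flexible ML models, and the learners and settings used in the notebook are not shown here. `dml_plr`, `ols_fit_predict`, and the synthetic data are hypothetical.

```python
import numpy as np

def ols_fit_predict(x_train, y_train, x_test):
    """Linear nuisance model: a stand-in for the flexible ML learners DML normally uses."""
    def with_intercept(a):
        return np.column_stack([np.ones(len(a)), a])
    coef, *_ = np.linalg.lstsq(with_intercept(x_train), y_train, rcond=None)
    return with_intercept(x_test) @ coef

def dml_plr(y, t, x, n_folds=5, seed=0):
    """Partially linear DML with cross-fitting.
    Returns the effect estimate and a 95% normal-approximation CI."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, t_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Residualize outcome and treatment using models fit on the other folds
        y_res[test] = y[test] - ols_fit_predict(x[train], y[train], x[test])
        t_res[test] = t[test] - ols_fit_predict(x[train], t[train], x[test])
    # Regress outcome residuals on treatment residuals
    theta = np.sum(t_res * y_res) / np.sum(t_res ** 2)
    # Standard error from the orthogonal score's influence function
    psi = (y_res - theta * t_res) * t_res
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(t_res ** 2)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)

# Synthetic confounded data (hypothetical): x drives both t and y; true effect = 2.0
rng = np.random.default_rng(42)
n = 2_000
x = rng.normal(size=(n, 3))
t = x @ np.array([1.0, -0.5, 0.5]) + rng.normal(size=n)
y = 2.0 * t + x @ np.array([1.5, 1.0, -1.0]) + rng.normal(size=n)

theta, (lo, hi) = dml_plr(y, t, x)
print(f"DML estimate: {theta:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Cross-fitting (fitting nuisance models on other folds) is what lets the residual-on-residual estimate keep a simple analytic standard error even when flexible learners are used.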

# **Exclude Small-Sample Countries and Rerun Key Analyses**
- **Data Sizes**:
 The dataset originally had **46,162 observations**, and after applying the “exclude small-sample countries” filter it still has **46,162**. No countries were actually excluded, either because (a) every country met the minimum sample-size threshold, so the exclusion criterion was never triggered, or (b) the filtering step was not applied correctly.
- **ATE (Large Countries)**:
 The ATE is **1.6213**, with a confidence interval ranging from **-58.7259 to 61.9685**, the same as the full-sample result, as expected given the unchanged sample. While the point estimate suggests a small positive effect, the extremely wide confidence interval shows high uncertainty: the true effect could be negative, zero, or positive.

- **Key Findings:**
  - **No Change in Sample**: The dataset remained unchanged despite attempting to exclude small-sample countries.
  - **Uncertain ATE**: The wide confidence interval makes the estimate unreliable and prevents strong conclusions about the treatment effect.
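One way to tell explanations (a) and (b) apart is to compare the smallest per-country sample size against the threshold before filtering. A sketch in plain Python, with hypothetical country codes, sizes, and `min_n` values (the threshold used in the notebook is not stated):

```python
from collections import Counter

def exclude_small_countries(countries, min_n):
    """Return indices of rows whose country has at least `min_n` respondents."""
    counts = Counter(countries)
    return [i for i, c in enumerate(countries) if counts[c] >= min_n]

# Hypothetical country codes and sizes, for illustration only
countries = ["AT"] * 120 + ["BE"] * 80 + ["CY"] * 15

# Compare the smallest country size with the threshold before filtering
print("smallest country size:", min(Counter(countries).values()))

for min_n in (10, 50):
    kept = exclude_small_countries(countries, min_n)
    dropped = sorted(set(countries) - {countries[i] for i in kept})
    print(f"min_n={min_n}: {len(countries)} -> {len(kept)} rows, dropped {dropped}")
```

Here a threshold of 10 leaves all 215 rows in place, while 50 drops `CY`. If the smallest country in the real data already has at least `min_n` respondents, explanation (a) accounts for the unchanged 46,162 rows; otherwise the filter was likely not applied (b).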




# **Test Effect of Survey Weights: Weighted vs. Unweighted Results**
- **ATE Estimates:**
   - Unweighted: **1.621** with a confidence interval of **-60 to 60**.
   - Weighted: **-11.858** with a confidence interval of **-110 to 90**.
- **Effect of Weighting**:
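 Applying survey weights shifts the ATE from slightly positive (1.621) to negative (-11.858). This suggests that weighting substantially alters the point estimate, potentially due to adjustments for sampling design and population representation. However, both confidence intervals include zero and overlap almost entirely, so the change in sign should not be over-interpreted.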
- **Confidence Intervals**:
 In both cases, the intervals are very wide, covering a broad range of possible values. This shows there is still high uncertainty in the results, regardless of whether weights are applied.

- **Key Findings**:
  - Weighting substantially influences the point estimate, highlighting the importance of accounting for the survey design.
  - Precision remains a problem, as the true ATE could still plausibly lie anywhere within the wide intervals, limiting the strength of conclusions.
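A minimal sketch of how weights enter a simple estimator, assuming a weighted difference in means (the notebook's weighted estimator is not shown here) and hypothetical lognormal weights. It also computes the Kish effective sample size, (sum of w)^2 / (sum of w^2), which shrinks as the weights become more variable; this is one common reason weighted intervals are wider than unweighted ones.

```python
import numpy as np

def weighted_ate(y, t, w):
    """Weighted difference in mean outcomes between treated and control units."""
    treated, control = t == 1, t == 0
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[control], weights=w[control]))

def kish_effective_n(w):
    """Kish effective sample size: (sum of weights)^2 / sum of squared weights."""
    return w.sum() ** 2 / np.sum(w ** 2)

# Synthetic data and weights (hypothetical)
rng = np.random.default_rng(1)
n = 1_000
t = rng.integers(0, 2, size=n)
y = 1.0 * t + rng.normal(size=n)
w_equal = np.ones(n)
w_design = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # highly variable weights

print("unweighted ATE:", y[t == 1].mean() - y[t == 0].mean())
print("equal weights: ", weighted_ate(y, t, w_equal))
print("design weights:", weighted_ate(y, t, w_design))
print("effective n:", kish_effective_n(w_equal), "vs", round(kish_effective_n(w_design)))
```

Equal weights reproduce the unweighted estimate exactly, while variable weights both move the estimate and reduce the effective sample size well below `n`.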


# **Compute E-value**
- **Metric Overview:**
 The E-value is a sensitivity analysis tool that evaluates how strongly an unmeasured confounder would need to be associated with both the treatment and the outcome to fully explain away the observed effect.
- **Results:**
  - **E-value: 10.34**. An unmeasured confounder would need to be associated with both the treatment and the outcome by a risk ratio of at least 10.34 each, conditional on the measured covariates, to fully explain away the observed association.
  
- **Key Findings:**
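  - **Robustness to Confounding**: The high E-value suggests that the point estimate is fairly robust: only a very strong unmeasured confounder could nullify the association. This applies to the point estimate only; if the E-value is computed from the same effect estimate, whose confidence intervals above include zero, the E-value for the confidence limit would be 1.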
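A sketch of the E-value calculation for a risk ratio, following VanderWeele and Ding (2017): for RR ≥ 1 the E-value is RR + sqrt(RR × (RR − 1)), protective estimates (RR < 1) are inverted first, and the E-value for a confidence interval uses the limit closest to 1, or equals 1 if the interval contains 1. Because the ATE here is on the outcome's own scale, it must first be converted to a risk ratio; `rr_from_smd` shows the approximation RR ≈ exp(0.91 × d) for a standardized mean difference d. That conversion is an assumption, since the one used in the notebook is not stated, and all function names are hypothetical.

```python
import math

def e_value(rr):
    """E-value for a risk ratio; protective estimates (rr < 1) are inverted first."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1).
    If the interval contains 1, no confounding is needed to reach the null."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if lower > 1 else upper)

def rr_from_smd(d):
    """Approximate risk ratio for a standardized mean difference d (continuous outcome)."""
    return math.exp(0.91 * d)

print(e_value(2.0))            # point E-value for RR = 2
print(e_value_ci(0.5, 3.0))    # interval contains 1
print(e_value(rr_from_smd(0.5)))
```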
  



# **Summary Analysis**
- **ATE Variability**: The estimated Average Treatment Effect (ATE) changes depending on the method used.
  - Complete Case, Imputed, Causal Forest, Large Countries, and Unweighted analyses all converge on a similar value (approximately 1.62).
  - DML produces a slightly different estimate (approximately 1.53).
  - The Weighted approach substantially shifts the ATE to -11.86.

- **Confidence Intervals**:
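  - Complete Case, Imputed, Causal Forest, Large Countries, and Unweighted analyses all have extremely wide confidence intervals (approximately -58.73 to 61.97), signaling very low precision.
  - DML has a much narrower confidence interval (-3.42 to 6.48), suggesting greater precision.
  - The Weighted analysis also has a very wide confidence interval (-114.94 to 91.23).
  - All of these intervals include zero, so none of the estimates can be distinguished from no effect.
- **E-value**: The E-value is 2.62, with a lower confidence limit of 0.50; the upper confidence limit is missing. This point E-value differs from the 10.34 reported in the E-value section. Because an E-value cannot be smaller than 1, the 0.50 figure most likely refers to the lower limit of the risk ratio rather than to an E-value; if that risk-ratio interval includes 1, the E-value for the confidence limit is 1. The point E-value suggests some robustness to unmeasured confounding, but the incomplete confidence interval limits the interpretation.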
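A small sketch that tabulates the estimates reported in this summary and checks each interval's width and whether it contains zero. The values are copied from the bullets above, not recomputed from the data.

```python
# Point estimates and confidence intervals as reported in this summary (copied, not recomputed)
results = {
    "Complete Case, Imputed, Causal Forest, Large Countries, Unweighted": (1.62, -58.73, 61.97),
    "DML": (1.53, -3.42, 6.48),
    "Weighted": (-11.86, -114.94, 91.23),
}

for name, (ate, lo, hi) in results.items():
    # Interval width as a rough precision measure; zero inclusion as a null check
    print(f"{name}: ATE={ate:.2f}, CI width={hi - lo:.2f}, includes zero: {lo <= 0 <= hi}")
```

The DML interval is roughly a dozen times narrower than the others, yet every interval still contains zero.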




### For plots illustrating these results, see: https://github.com/jcdumlao14/ESS11DataAnalysis/blob/main/Sensitivity_Analysis_And_Method_Comparison.ipynb