<a href="https://colab.research.google.com/github/jcdumlao14/ESS11DataAnalysis/blob/main/Part2_Sensitivity_Analysis_and_Method_Comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Part 2 Sensitivity Analysis and Method Comparison**

# **Comparison of ATE: Complete Case vs. Imputed**

In this analysis, we compared the Average Treatment Effect (ATE) estimates from two approaches to handling missing data: **Complete Case analysis** and **Imputation**.
- **ATE Values**: Both methods produced the same estimate of **-3.34**.
- **Confidence Intervals (CIs)**: The intervals are very wide, ranging from approximately **-40.6 to 33.9** for both methods, and they include zero. This indicates substantial uncertainty around the estimates.
- **Consistency Across Methods**: Because the point estimates and the intervals agree, the handling of missing data does not appear to drive the result here. Agreement this exact is unusual for two different samples, though, so it is worth confirming that the complete-case step actually dropped rows and that the imputed variables enter the model; otherwise the two runs may have used effectively the same data.
- **Uncertainty and Reliability**: Although the point estimate is consistent, the width of the CIs means the result is imprecise, and the ATE alone does not support strong conclusions.
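
One way to check whether the two runs could differ at all is to count how many rows a complete-case step would drop. The sketch below assumes the data is available as a list of row dictionaries with `None` marking a missing value. The function name `missingness_report`, the column names, and the tiny dataset are hypothetical and are not taken from the notebook.

```python
def missingness_report(rows, columns):
    """Count missing values per column and the number of fully observed rows."""
    missing_per_column = {col: 0 for col in columns}
    complete_rows = 0
    for row in rows:
        row_is_complete = True
        for col in columns:
            if row.get(col) is None:
                missing_per_column[col] += 1
                row_is_complete = False
        if row_is_complete:
            complete_rows += 1
    return {
        "n_rows": len(rows),
        "n_complete": complete_rows,
        "n_dropped": len(rows) - complete_rows,
        "missing_per_column": missing_per_column,
    }


# Hypothetical toy data: the column names are placeholders, not ESS variables.
rows = [
    {"treatment": 1, "outcome": 7.0, "age": 34},
    {"treatment": 0, "outcome": 5.0, "age": None},
    {"treatment": 1, "outcome": None, "age": 51},
    {"treatment": 0, "outcome": 6.0, "age": 45},
]
report = missingness_report(rows, ["treatment", "outcome", "age"])
print(report)
```

If `n_dropped` is zero for the columns that enter the model, complete-case and imputed analyses run on the same rows and identical estimates are expected.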




# **Comparison of ATE: Causal Forest vs. DML**
This comparison highlights how different causal inference methods can produce varying estimates of the Average Treatment Effect (ATE).
- **ATE Values**:
   - Causal Forest estimated the ATE at **-3.34**, suggesting a small negative effect.
   - DML (Double Machine Learning) estimated the ATE at **0.58**, pointing to a small positive effect.
- **Confidence Intervals**:
   - The Causal Forest estimate has a very wide confidence interval **(-40.6 to 33.9)**, which reflects substantial uncertainty.
   - The DML estimate has a much narrower confidence interval **(-3.0 to 4.2)**, indicating greater precision.

**Key Findings:**
   - **Different Directions**: The point estimates differ in sign as well as magnitude (negative vs. positive). Both intervals include zero, however, so neither method distinguishes the effect from zero and the sign difference should not be over-interpreted.
   - **Uncertainty Gap**: Causal Forest’s interval is roughly ten times wider than DML’s (about 74.5 vs. 7.2 units), which suggests its estimate is more sensitive to noise or variability in this dataset. DML provides a more stable and precise estimate.
   - **Method Choice Matters**: The divergence between the two results underlines how the choice of causal method can meaningfully influence both the interpretation and the reliability of the findings.
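
To make the DML idea concrete, here is a minimal sketch of the partialling-out estimator for a partially linear model with cross-fitting. It is an illustration under stated assumptions: both nuisance models are plain least-squares fits, the data is simulated with a known effect of 2.0, and the helper names (`ols_fit_predict`, `dml_plr`) are hypothetical. The notebook's actual DML implementation, learners, and variables are not shown in this section, so this is not the procedure that produced the 0.58 estimate.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Fit least squares with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def dml_plr(y, t, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta * t + g(X) + noise."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    t_res = np.empty(len(y))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Residualize outcome and treatment using models fitted on the other folds.
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        t_res[test] = t[test] - ols_fit_predict(X[train], t[train], X[test])
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Standard error from the estimated score function.
    psi = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(psi ** 2) / len(y)) / np.mean(t_res ** 2)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


# Simulated data with a true effect of 2.0 (an assumption for illustration only).
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
t = 0.5 * X[:, 0] + rng.normal(size=n)
y = 2.0 * t + X[:, 1] + rng.normal(size=n)

theta, ci = dml_plr(y, t, X)
print(f"theta = {theta:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the estimate comes from a regression of residuals on residuals, its interval is typically much tighter than an interval built from heterogeneous per-unit effects, which is one plausible reason for the gap in widths reported above.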

# **Exclude Small-Sample Countries and Rerun Key Analyses**
- **Data Sizes**:
 The dataset originally had **46,162 observations**, and after applying the “exclude small-sample countries” filter, the size is still **46,162**. This means that no countries were actually excluded. This could be because (a) the exclusion criteria were not triggered, or (b) the filtering step was not applied correctly.
- **ATE (Large Countries)**:
 The ATE is **-3.34**, with a confidence interval ranging from **-40.6 to 33.9**. While the point estimate suggests a small negative effect, the extremely wide confidence interval shows high uncertainty, meaning the true effect could be negative, neutral, or even positive.

- **Key Findings:**
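  - **No Change in Sample**: The dataset remained unchanged despite the attempt to exclude small-sample countries, so the "large countries" ATE is simply the full-sample estimate and does not yet serve as a robustness check.
  - **Uncertain ATE**: The wide confidence interval, which includes zero, makes the estimate unreliable and prevents strong conclusions about the treatment effect.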
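
A filter of this kind can be made self-checking by reporting which countries it removed. The sketch below is one possible way to do that in plain Python. The country key `cntry`, the threshold value, and the toy rows are assumptions for illustration; the notebook's actual column name and cutoff are not given in this section.

```python
from collections import Counter


def exclude_small_countries(rows, min_n, key="cntry"):
    """Keep rows whose country has at least min_n observations; also report dropped countries."""
    counts = Counter(row[key] for row in rows)
    kept = [row for row in rows if counts[row[key]] >= min_n]
    dropped = sorted(country for country, n in counts.items() if n < min_n)
    return kept, dropped


# Hypothetical toy data: three countries with 5, 2, and 4 observations.
rows = (
    [{"cntry": "AA"} for _ in range(5)]
    + [{"cntry": "BB"} for _ in range(2)]
    + [{"cntry": "CC"} for _ in range(4)]
)

kept, dropped = exclude_small_countries(rows, min_n=3)
print(f"{len(rows)} rows before, {len(kept)} rows after, dropped: {dropped}")

# A threshold that no country falls below leaves the data unchanged,
# which is the situation described above and is worth flagging explicitly.
kept_all, dropped_none = exclude_small_countries(rows, min_n=1)
if not dropped_none:
    print("Warning: no country was excluded; check the threshold and the filter.")
```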




# **Test Effect of Survey Weights: Weighted vs. Unweighted Results**
- **ATE Estimates:**
   - Unweighted: **-3.34** with a confidence interval of **-40.6 to 33.9**.
   - Weighted: **-2.28** with a confidence interval of **-35.3 to 30.7**.
- **Effect of Weighting**:
 Applying survey weights moves the ATE from **-3.34** to **-2.28**, making it slightly less negative. Because the weights account for the sampling design, the weighted estimate may better represent the population, although the shift is small relative to the width of either interval.
- **Confidence Intervals**:
 In both cases, the intervals are very wide, covering a broad range of possible values. This shows there is still high uncertainty in the results, regardless of whether weights are applied.

- **Key Findings**:
  - Weighting influences the point estimate but does not eliminate uncertainty.
  - The direction of the effect remains negative in both analyses, though the weighted result suggests a somewhat smaller effect size.
  - Precision remains a problem, as the true ATE could still plausibly lie anywhere within the wide intervals.
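
The mechanics of weighting can be illustrated with a simple weighted difference in means. This is a deliberately simplified stand-in, not the estimator used in the notebook, and the outcomes, treatment indicators, and weights below are made-up values chosen so the arithmetic is easy to follow.

```python
def weighted_mean(values, weights):
    """Weighted average of values; weights must sum to a positive number."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def difference_in_means(outcomes, treated, weights=None):
    """Treated-minus-control mean outcome, optionally using survey weights."""
    if weights is None:
        weights = [1.0] * len(outcomes)
    y1 = [y for y, d in zip(outcomes, treated) if d == 1]
    w1 = [w for w, d in zip(weights, treated) if d == 1]
    y0 = [y for y, d in zip(outcomes, treated) if d == 0]
    w0 = [w for w, d in zip(weights, treated) if d == 0]
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Made-up example: the second treated respondent represents three times as many people.
outcomes = [5.0, 7.0, 4.0, 6.0]
treated = [1, 1, 0, 0]
weights = [1.0, 3.0, 1.0, 1.0]

unweighted = difference_in_means(outcomes, treated)
weighted = difference_in_means(outcomes, treated, weights)
print(f"unweighted = {unweighted:.2f}, weighted = {weighted:.2f}")
```

Here the weighted treated mean is (5 + 3 × 7) / 4 = 6.5 rather than 6.0, so the estimate moves from 1.0 to 1.5. Weights shift the point estimate toward the respondents who represent more of the population, but they do not by themselves make the estimate more precise.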


# **Compute E-value**
- **Metric Overview:**
 The E-value is a sensitivity analysis tool that evaluates how strongly an unmeasured confounder would need to be associated with both the treatment and the outcome to fully explain away the observed effect.
- **Results:**
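  - **E-value: 6.14**. This means that an unmeasured confounder would need to be associated with both the treatment and the outcome by a risk ratio of at least 6.14 each to account entirely for the observed association.
  - **E-value Lower CI**: Reported as **67.31**. This is far higher than the main E-value, which should not happen: the E-value for the confidence interval is computed from the confidence limit closest to the null, so it can never exceed the E-value for the point estimate, and it equals 1 when the interval includes the null. This points to a calculation or reporting issue.
- **Key Findings:**
  - **Robustness to Confounding**: Taken at face value, the high E-value suggests that only a very strong unmeasured confounder could nullify the association. That reading needs caution, because the ATE's confidence interval (**-40.6 to 33.9**) includes zero, in which case the E-value for the interval would be 1.
  - **Potential Inconsistency**: The reported values match what the risk-ratio formula E = RR + sqrt(RR × (RR - 1)) returns for RR = 3.34 (about 6.14) and RR of about 33.9 (about 67.3). This suggests that the absolute ATE and the upper CI bound may have been passed in as if they were risk ratios. If so, the E-value should be recomputed after converting the effect to the risk-ratio scale.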
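
For reference, the E-value on the risk-ratio scale can be computed directly from its closed form, E = RR + sqrt(RR × (RR - 1)), with the interval version using the confidence limit closest to the null. The sketch below implements only that risk-ratio formula. It does not show how the notebook computed its values, and the function names are hypothetical. An ATE expressed as a mean difference would first need to be converted to an approximate risk ratio, a step that is not covered here.

```python
import math


def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), after inverting RR < 1."""
    if rr <= 0:
        raise ValueError("a risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1 <= upper:
        return 1.0  # the interval already includes the null
    closest_to_null = lower if lower > 1 else upper
    return e_value(closest_to_null)


# Treating the reported ATE magnitude and CI bound as if they were risk ratios
# gives values close to those reported above.
print(round(e_value(3.34), 2))
print(round(e_value(33.9), 2))

# For an interval that spans the null, the E-value for the interval is 1.
print(e_value_for_ci(0.8, 2.5))
```

By construction `e_value_for_ci` never returns more than `e_value` of the point estimate, which is why a "lower CI" above the main E-value signals a problem.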




# **Summary Analysis**
- **ATE Variability**: The estimated Average Treatment Effect (ATE) changes depending on the method used. Complete Case, Imputed, Causal Forest, Large Countries, and Unweighted analyses all converge on the same value (**-3.34**). In contrast, the DML and Weighted approaches produce different estimates, with DML shifting the ATE into positive territory (**0.58**) and weighting making it less negative (**-2.28**).
- **High Uncertainty**: Most methods—including Complete Case, Imputed, Causal Forest, Large Countries, and Unweighted—share extremely wide confidence intervals (**-40.6 to 33.9**), which signals very low precision. Even the weighted analysis has wide bounds (**-35.3 to 30.7**), underscoring substantial uncertainty across approaches.
- **DML Contrast**: The DML method stands out by producing a positive ATE and a much narrower confidence interval, indicating greater precision compared to the other methods.
- **E-value**: The reported E-value of **6.14** would indicate some robustness to unmeasured confounding. However, the lower confidence bound (**67.31**) exceeds the E-value itself and the upper bound is missing or invalid, which suggests a calculation or interpretation issue that should be re-checked.
- **Implications**: The results highlight how strongly the choice of method shapes both the magnitude and the reliability of the ATE. Most methods yield highly uncertain estimates. Only DML narrows the interval substantially (about 7.2 units wide, versus 74.5 for the causal forest), while weighting narrows it only slightly (about 66.0 units). Both also alter the point estimate, shifting it toward positive (DML) or less negative (weighting), and every interval reported here still includes zero.
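
The comparison above can be tabulated with a few lines of Python. The estimates and interval bounds are the rounded values reported in this section; the dictionary layout and the `summarize` helper are illustrative choices, not part of the notebook.

```python
# (ATE, CI lower, CI upper) as reported in the sections above.
estimates = {
    "Complete Case": (-3.34, -40.6, 33.9),
    "Imputed": (-3.34, -40.6, 33.9),
    "Causal Forest": (-3.34, -40.6, 33.9),
    "Large Countries": (-3.34, -40.6, 33.9),
    "Unweighted": (-3.34, -40.6, 33.9),
    "Weighted": (-2.28, -35.3, 30.7),
    "DML": (0.58, -3.0, 4.2),
}


def summarize(estimates):
    """Return one row per method with the CI width and whether the CI includes zero."""
    rows = []
    for method, (ate, lower, upper) in estimates.items():
        rows.append({
            "method": method,
            "ate": ate,
            "ci_width": round(upper - lower, 2),
            "includes_zero": lower <= 0 <= upper,
        })
    return rows


summary = summarize(estimates)
print(f"{'method':<16} {'ATE':>6} {'width':>6}  includes zero")
for row in summary:
    print(f"{row['method']:<16} {row['ate']:>6.2f} {row['ci_width']:>6.1f}  {row['includes_zero']}")
```

Laying the results out this way makes it easy to see that five of the seven rows are identical and that no interval excludes zero.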


Refer to the plot in the linked notebook for an illustration: https://github.com/jcdumlao14/ESS11DataAnalysis/blob/main/Sensitivity_Analysis_And_Method_Comparison.ipynb