# Gender Bias Study continued: what-if analysis

This is a continuation of the Gender Bias Study, in which we explore this dataset and the DoWhy `gcm` library a bit further. Specifically, we'll explore what the overall admission outcome would look like if women and men had the same department preferences. To do that, we'll use the library's **intervention** feature. First, let's load the data and fit the graph just like we did before:

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

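import pandas as pd
import networkx as nx
import dowhy.gcm as gcm
gcm.config.disable_progress_bars()

# Load the admissions data (columns Gender, Department and Admission).
data = pd.read_csv("student_admissions_berkeley.csv")

# Gender influences both the department applied to and the admission decision;
# the department also influences the admission decision.
causal_model = gcm.StructuralCausalModel(
    nx.DiGraph([("Gender", "Department"), ("Gender", "Admission"), ("Department", "Admission")]))
# Let DoWhy choose a causal mechanism for each node based on the data.
gcm.auto.assign_causal_mechanisms(causal_model, data)

gcm.util.plot(causal_model.graph)

# Learn the causal mechanisms from the data.
gcm.fit(causal_model, data)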
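
Conceptually, fitting estimates each node's distribution given its parents. For `Department`, that is P(Department | Gender), which is exactly the "department preference" of each gender. DoWhy's `assign_causal_mechanisms` chooses its own models for this, so the count-based sketch below is only a simplified stand-in; the helper `conditional_frequencies` and the toy records are hypothetical and are not the Berkeley data.

```python
from collections import Counter, defaultdict


def conditional_frequencies(records, parent, child):
    """Estimate P(child | parent) from a list of dict records by counting."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[parent]][row[child]] += 1
    result = {}
    for p, cnt in counts.items():
        total = sum(cnt.values())
        # Normalize the counts for this parent value into probabilities.
        result[p] = {c: n / total for c, n in cnt.items()}
    return result


# Hypothetical toy records, NOT the Berkeley admissions data.
toy = [
    {"Gender": "Male", "Department": "A"},
    {"Gender": "Male", "Department": "A"},
    {"Gender": "Male", "Department": "C"},
    {"Gender": "Female", "Department": "C"},
    {"Gender": "Female", "Department": "C"},
    {"Gender": "Female", "Department": "A"},
    {"Gender": "Female", "Department": "E"},
]
prefs = conditional_frequencies(toy, "Gender", "Department")
print(prefs["Male"])
print(prefs["Female"])
```

Different rows of `prefs` for the two genders are what the study calls different department preferences.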

In the first part of the study, we argued that the imbalance comes from men and women having *different preferences* in department choice, combined with departments having different admission rates. We can show the different admission rates not only in the original data: we can also use the `gcm.draw_samples` function to generate new data from the fitted causal model:

In [None]:
# Generate new data from the fitted model and count it per admission outcome and department.
sampled_data = gcm.draw_samples(causal_model, num_samples=10000)
sampled_data.groupby(["Admission", "Department"]).size().unstack().plot(kind='bar');

As we can see, the generated data reproduce the different per-department admission rates that we already saw in the real data.
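
Conceptually, `draw_samples` generates data by visiting the nodes in topological order (Gender, then Department, then Admission) and drawing each node given the values already drawn for its parents. The sketch below illustrates this ancestral-sampling idea in plain Python; the probability tables, the two departments, and the helper names `draw` and `ancestral_samples` are invented for illustration and are not DoWhy internals or fitted values.

```python
import random
from collections import Counter

# Hypothetical conditional distributions (invented numbers, not fitted to the Berkeley data).
P_GENDER = {"Male": 0.6, "Female": 0.4}
P_DEPT_GIVEN_GENDER = {
    "Male": {"A": 0.7, "E": 0.3},
    "Female": {"A": 0.2, "E": 0.8},
}
# P(Admission = "Yes" | Gender, Department)
P_ADMIT = {
    ("Male", "A"): 0.6, ("Male", "E"): 0.2,
    ("Female", "A"): 0.7, ("Female", "E"): 0.25,
}


def draw(dist, rng):
    """Draw one key from a {value: probability} dict."""
    values, weights = zip(*dist.items())
    return rng.choices(values, weights=weights, k=1)[0]


def ancestral_samples(n, rng):
    """Sample each node in topological order, conditioning on its already-sampled parents."""
    samples = []
    for _ in range(n):
        g = draw(P_GENDER, rng)                    # root node
        d = draw(P_DEPT_GIVEN_GENDER[g], rng)      # depends on Gender
        a = "Yes" if rng.random() < P_ADMIT[(g, d)] else "No"  # depends on Gender and Department
        samples.append({"Gender": g, "Department": d, "Admission": a})
    return samples


rng = random.Random(0)
samples = ancestral_samples(10_000, rng)
print(Counter((s["Department"], s["Admission"]) for s in samples))
```

With these invented numbers, women have the higher admission rate within each department, yet most of them apply to the more selective department E, so the aggregate rate favors men (about 0.48 vs. 0.34).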

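So let's examine **what** would happen **if** department preferences were equal between men and women. We can treat this as an intervention on the `Department` node: instead of letting gender influence the choice, every applicant picks one of the six departments uniformly at random. We compute this scenario like this: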

In [None]:
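import random

# Intervention: replace the Department mechanism with a uniform random choice.
# The lambda ignores its input d, so each of the six departments is chosen
# with probability 1/6, regardless of gender.
interventional_data = gcm.interventional_samples(
    causal_model,
    {'Department': lambda d: random.choice(['A', 'B', 'C', 'D', 'E', 'F'])},
    num_samples_to_draw=100000)

# Count admitted and rejected applicants per gender under the intervention.
admissions = interventional_data.groupby(["Admission", "Gender"]).size().unstack()
admissions.plot(kind='bar');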
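
Because the graph has only three nodes, the effect of this intervention can also be worked out by hand: once Department is drawn uniformly and independently of Gender, the admission rate for a gender g is the plain average of its per-department rates, i.e. the sum over departments d of (1/K) · P(Admission = Yes | g, d) for K departments. The sketch below performs this exact computation for a hypothetical two-department toy model; `admission_rate` and the probability tables are illustrative names with invented numbers, not fitted values.

```python
# Hypothetical toy model with the same graph structure (invented numbers).
DEPARTMENTS = ["A", "E"]
P_DEPT_GIVEN_GENDER = {"Male": {"A": 0.7, "E": 0.3}, "Female": {"A": 0.2, "E": 0.8}}
# P(Admission = "Yes" | Gender, Department)
P_ADMIT = {
    ("Male", "A"): 0.6, ("Male", "E"): 0.2,
    ("Female", "A"): 0.7, ("Female", "E"): 0.25,
}


def admission_rate(gender, dept_dist):
    """P(Admission = Yes | Gender = gender) when Department follows dept_dist."""
    return sum(p * P_ADMIT[(gender, d)] for d, p in dept_dist.items())


# The intervention: every department equally likely, independent of gender.
uniform = {d: 1 / len(DEPARTMENTS) for d in DEPARTMENTS}

for g in ("Male", "Female"):
    observed = admission_rate(g, P_DEPT_GIVEN_GENDER[g])  # original preferences
    intervened = admission_rate(g, uniform)               # equalized preferences
    print(g, round(observed, 3), round(intervened, 3))
```

Under the original preferences the toy model gives men the higher aggregate rate (0.48 vs. 0.34); with equalized preferences the comparison reverses (0.40 vs. 0.475), which is the same qualitative pattern as the interventional result above.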

Let's look at the admission rates now:

In [None]:
# Admission rate = admitted / (admitted + rejected), in percent.
for gender in ["Male", "Female"]:
    rate = admissions[gender]["Yes"] / (admissions[gender]["Yes"] + admissions[gender]["No"])
    print(f"{gender} admission rate:", round(100 * rate))

It looks like *women* were in fact preferred. But is this true? Let's take a step back: this result is only a single point estimate. What we actually need to do is repeat this computation many times, each time on a random subset of our data. Why is that?

We fitted our causal model on one dataset, drew one set of samples from the interventional distribution, and obtained a point estimate from those samples. Both steps introduce variability: fitting on a different subset of the data would yield slightly different mechanisms, and a different set of draws would yield slightly different counts. To account for that, we can fit the same causal model on different random subsets of our original data, perform the intervention on each fitted model, and repeat the whole operation many times. This gives us a confidence interval instead of a single number:

In [None]:
def intervene(causal_model, interventions):
    """Draw interventional samples and return the admission rate for each gender."""
    admissions = gcm.interventional_samples(causal_model,
                                           interventions,
                                           num_samples_to_draw=10000).groupby(["Admission", "Gender"]).size().unstack()
    return {
        'male': admissions["Male"]["Yes"]/(admissions["Male"]["Yes"]+admissions["Male"]["No"]),
        'female': admissions["Female"]["Yes"]/(admissions["Female"]["Yes"]+admissions["Female"]["No"])
    }

# bootstrap_training_and_sampling refits the causal model on a random subset of `data`
# before each call to `intervene`; confidence_intervals repeats this 50 times and
# returns the median and the 95% interval of each estimate.
median, intervals = gcm.confidence_intervals(
    gcm.bootstrap_training_and_sampling(intervene,
                                        causal_model, data,
                                        interventions={'Department': lambda d: random.choice(['A', 'B', 'C', 'D', 'E', 'F'])}),
    confidence_level=0.95,
    num_bootstrap_resamples=50)

median, intervals
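
`gcm.confidence_intervals` summarizes the repeated estimates as a median and an interval. The basic idea, a percentile bootstrap, can be sketched in plain Python: recompute a statistic on many resampled copies of the data, sort the results, and read off the lower and upper percentiles. This is a simplified illustration on hypothetical 0/1 outcomes with made-up helper names (`percentile`, `bootstrap_ci`); it resamples with replacement and does not refit any model, so it is not a description of how DoWhy implements `bootstrap_training_and_sampling` or `confidence_intervals` internally.

```python
import random
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 1]."""
    idx = min(len(sorted_values) - 1, max(0, round(q * (len(sorted_values) - 1))))
    return sorted_values[idx]


def bootstrap_ci(data, statistic, num_resamples=1000, confidence_level=0.95, seed=0):
    """Percentile bootstrap: recompute `statistic` on resampled copies of `data`."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(data, k=len(data))) for _ in range(num_resamples)
    )
    alpha = (1 - confidence_level) / 2
    interval = (percentile(estimates, alpha), percentile(estimates, 1 - alpha))
    return statistics.median(estimates), interval


# Hypothetical admission outcomes: 1 = admitted, 0 = rejected (45% admitted).
outcomes = [1] * 45 + [0] * 55
boot_median, (low, high) = bootstrap_ci(outcomes, lambda xs: sum(xs) / len(xs))
print(boot_median, low, high)
```

The returned pair mirrors the `(median, intervals)` structure of the cell above; more resamples give more stable interval endpoints at the cost of more computation.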

Looking at the `intervals`, we can see that the male and female intervals overlap considerably. Still, the results suggest a small bias in favor of women. The original [UC Berkeley gender bias study](https://homepage.stat.uiowa.edu/~mbognar/1030/Bickel-Berkeley.pdf) comes to the conclusion that there is a "[small but statistically significant bias in favor of women](https://en.wikipedia.org/wiki/Simpson%27s_paradox#UC_Berkeley_gender_bias)", which is consistent with our results here. However, we should keep in mind that there may be hidden confounders, similar to Department, that are not contained in this dataset.
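
Overlapping intervals for two groups do not by themselves settle whether the groups differ; a more direct check is an interval for the difference itself. The sketch below computes a percentile bootstrap interval for "female rate minus male rate" on hypothetical 0/1 outcomes (invented group sizes and rates, resampling each group independently); `bootstrap_difference` is an illustrative name. In the DoWhy cell, the analogous step would presumably be to add a `'difference'` entry to the dictionary returned by `intervene`, which is an assumption rather than something shown in this notebook.

```python
import random


def bootstrap_difference(male, female, num_resamples=2000, confidence_level=0.95, seed=0):
    """Percentile bootstrap interval for mean(female) - mean(male)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(num_resamples):
        # Resample each group with replacement and record the difference in admission rates.
        m = rng.choices(male, k=len(male))
        f = rng.choices(female, k=len(female))
        diffs.append(sum(f) / len(f) - sum(m) / len(m))
    diffs.sort()
    alpha = (1 - confidence_level) / 2
    lo = diffs[int(alpha * (num_resamples - 1))]
    hi = diffs[int((1 - alpha) * (num_resamples - 1))]
    return lo, hi


# Hypothetical 0/1 admission outcomes per group (not the Berkeley data).
male = [1] * 400 + [0] * 600      # 40% admitted
female = [1] * 475 + [0] * 525    # 47.5% admitted
lo, hi = bootstrap_difference(male, female)
print(f"95% interval for female - male: [{lo:.3f}, {hi:.3f}]")
```

If the interval for the difference excludes zero, the data support a difference between the groups even when the two per-group intervals overlap.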