GCN-Bias

Mitigating gender bias in occupation classification of job biographies with graph convolutional network

We implement TextGCN (Yao et al., 2018; https://github.com/yao8839836/text_gcn) to classify job biographies from the biosbias dataset (https://github.com/Microsoft/biosbias). To investigate the mitigation of gender bias, we compare the predictions of our trained model on the original test dataset with its predictions on a transformed test dataset in which explicit gender indicators are removed, or "scrubbed", following De-Arteaga et al. (2019).
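To make "scrubbing" concrete, here is a minimal Python sketch that deletes gendered pronouns from a biography with a regular expression. The word list and the `scrub` helper are hypothetical illustrations. The project's scrubbed test set was built following De-Arteaga et al. (2019) and may remove other indicators as well.

```python
import re

# Hypothetical list of explicit gender indicators (pronouns only); the list
# behind the project's scrubbed test set may differ.
GENDER_WORDS = ["he", "she", "him", "her", "his", "hers", "himself", "herself"]

# \b word boundaries stop "he" from matching inside words such as "the" or "chef".
_GENDER_RE = re.compile(r"\b(?:" + "|".join(GENDER_WORDS) + r")\b", re.IGNORECASE)


def scrub(text: str) -> str:
    """Remove gender words, then tidy the whitespace they leave behind."""
    text = _GENDER_RE.sub("", text)
    text = re.sub(r"\s+", " ", text)              # collapse runs of whitespace
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # drop spaces before punctuation
    return text.strip()


print(scrub("She is a nurse. Her patients trust her."))
# -> is a nurse. patients trust.
```

Word-boundary matching keeps the sketch from corrupting ordinary words, at the cost of leaving slightly ungrammatical sentences. A bag-of-words model such as TextGCN does not depend on that grammar.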

The data can be downloaded from https://drive.google.com/drive/folders/1h2oILArbrTsdN5VrdhWAKpzppwOXtyZO?usp=sharing

Note that this is only a demo. It does not train the TextGCN or generate predictions for the "scrubbed" test dataset. The trained model and the scrubbed-dataset predictions are loaded from files instead.

Also note that we ran into problems running the demo locally in Jupyter notebooks because of conflicting packages. We had more success running it on Google Colab with our Google Drive mounted to access the data. The Drive-mounted version of the demo is included in the Google Drive folder above, so it can be run directly in Colab.
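The sketch below shows one way a notebook can work in both settings. It mounts Google Drive when it detects Colab and otherwise falls back to a local folder. `google.colab.drive.mount` is Colab's own API. The function name `resolve_data_dir` and the folder names `data` and `GCN-Bias` are hypothetical placeholders for wherever the downloaded data actually lives.

```python
from pathlib import Path


def resolve_data_dir(local_dir: str = "data", drive_subdir: str = "GCN-Bias") -> Path:
    """Return the data directory, mounting Google Drive first when running in Colab.

    Both default folder names are placeholders, not paths taken from the project.
    """
    try:
        from google.colab import drive  # this module only exists inside Colab
    except ImportError:
        return Path(local_dir)  # running locally, e.g. in a Jupyter notebook
    drive.mount("/content/drive")
    return Path("/content/drive/MyDrive") / drive_subdir


DATA_DIR = resolve_data_dir()
print(DATA_DIR)
```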

Imports
- import packages
Data
- define directory path
- read the raw CSV files for both the original and reduced datasets from the directory into pandas
- create a summary statistics table with the sizes of the training and test sets and the number of occupation labels
- create an occupation frequency chart for the original dataset
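The summary table and the frequency counts could be computed along the following lines. This is a sketch under assumptions. The column name `occupation` and the file name in the comment are hypothetical, because the notebook's actual file layout is not described here.

```python
import pandas as pd

# In the notebook the frames would come from CSV files in the data directory,
# e.g. pd.read_csv("data/train.csv"); that file name is hypothetical.


def summary_table(splits: dict) -> pd.DataFrame:
    """One row per dataset with training size, test size and number of occupation labels.

    `splits` maps a dataset name (e.g. "original", "reduced") to a
    (train_df, test_df) pair, each with an "occupation" column (assumed name).
    """
    rows = []
    for name, (train, test) in splits.items():
        labels = pd.concat([train["occupation"], test["occupation"]])
        rows.append({
            "dataset": name,
            "n_train": len(train),
            "n_test": len(test),
            "n_occupations": labels.nunique(),
        })
    return pd.DataFrame(rows).set_index("dataset")


def occupation_frequencies(df: pd.DataFrame) -> pd.Series:
    """Number of biographies per occupation, most frequent first."""
    return df["occupation"].value_counts()
```

Calling `.plot.bar()` on the Series returned by `occupation_frequencies` (matplotlib required) would draw the frequency chart.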
Utility Functions
- define some utility functions
Graphs
- clean original test dataset by removing words that are not included in the training vocabulary
- build feature vectors and one-hot labels for the original test data
- load the feature vectors, one-hot labels and adjacency matrix for the original training data from the directory
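The vocabulary cleaning step and the one-hot labels can be sketched as follows. The helper names are hypothetical, and splitting on whitespace is a simplifying assumption. The notebook may tokenise differently.

```python
import numpy as np


def remove_oov(docs, vocab):
    """Drop every token that does not appear in the training vocabulary."""
    vocab = set(vocab)
    return [" ".join(tok for tok in doc.split() if tok in vocab) for doc in docs]


def one_hot(labels, label_list):
    """Return an (n_docs, n_labels) float32 matrix with a single 1 per row."""
    index = {label: i for i, label in enumerate(label_list)}
    out = np.zeros((len(labels), len(label_list)), dtype=np.float32)
    for row, label in enumerate(labels):
        out[row, index[label]] = 1.0
    return out


print(remove_oov(["an experienced cardiac surgeon"], ["experienced", "surgeon"]))
# -> ['experienced surgeon']
```

The order of `label_list` must be the same one used for the training labels. Otherwise, column `i` of the test labels would refer to a different occupation than output `i` of the model.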
Model
- define initialisation functions
- define layers
- define training metrics
- define model
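For orientation, the model in Yao et al. (2018) is a two-layer GCN, $Z = \mathrm{softmax}(\tilde{A}\,\mathrm{ReLU}(\tilde{A} X W_0) W_1)$, where $\tilde{A}$ is the symmetrically normalised adjacency matrix with self-loops. The NumPy sketch below shows Glorot initialisation, the normalisation, this forward pass and a masked accuracy metric. It is an illustration and not necessarily how the notebook implements the model. Dropout and training are omitted, and all function names are hypothetical.

```python
import numpy as np


def glorot(shape, rng):
    """Glorot/Xavier uniform initialisation for a weight matrix."""
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    return rng.uniform(-limit, limit, size=shape)


def normalize_adj(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}.

    Skip the + I if the stored adjacency matrix already contains self-loops.
    """
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gcn_forward(a_norm, x, w0, w1):
    """Two-layer GCN: softmax(A_norm ReLU(A_norm X W0) W1)."""
    hidden = np.maximum(a_norm @ x @ w0, 0.0)
    return softmax(a_norm @ hidden @ w1)


def masked_accuracy(probs, labels_one_hot, mask):
    """Accuracy over the nodes selected by a boolean mask (e.g. labelled documents)."""
    correct = probs.argmax(axis=1) == labels_one_hot.argmax(axis=1)
    return float(correct[mask].mean())
```

In TextGCN both documents and words are graph nodes, so the mask restricts the metric to the labelled document nodes.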
Prediction
- load the GCN trained on the original dataset (with gender indicators)
- predict occupation labels for the original test dataset
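The model's softmax outputs can be turned into occupation labels by taking the argmax over classes. Here is a minimal sketch with hypothetical names. `probs` is an (n_docs, n_classes) array, and `label_list` gives the class order used for the one-hot labels.

```python
import numpy as np


def predict_labels(probs, label_list):
    """Map each row of class probabilities to its most likely occupation."""
    return [label_list[i] for i in np.argmax(probs, axis=1)]


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
print(predict_labels(probs, ["nurse", "professor", "surgeon"]))
# -> ['nurse', 'surgeon']
```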
Analyses
- calculate TPR, TPR gender gap and $\pi_{g,y}$ for the gender "female"
- plot $\text{Gap}_{female,y}$ against $\pi_{female,y}$ and compute their correlation
- compute gender imbalance and compounding factor
- load predictions on scrubbed test dataset
- compute the TPR, TPR gender gap and correlation between the TPR gender gap and $\pi_{female,y}$ on the scrubbed dataset
- plot of $\text{Gap}_{female,y}$ and $\pi_{female,y}$ for original compared to scrubbed test dataset
- compute the proportion of compounding factors pulled towards 1 after scrubbing
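The per-occupation quantities in the analyses follow De-Arteaga et al. (2019):

- $\text{TPR}_{g,y} = P[\hat{Y} = y \mid G = g, Y = y]$
- $\text{Gap}_{g,y} = \text{TPR}_{g,y} - \text{TPR}_{\sim g,y}$
- $\pi_{g,y} = P[G = g \mid Y = y]$

The pandas sketch below computes these and the Pearson correlation between $\text{Gap}_{female,y}$ and $\pi_{female,y}$ across occupations. The column names `gender`, `true` and `pred`, as well as the function names, are assumptions.

```python
import numpy as np
import pandas as pd


def gender_gap_table(df: pd.DataFrame, group: str = "female") -> pd.DataFrame:
    """Per-occupation TPR for `group` and for everyone else, the TPR gap, and pi.

    Expects one row per test biography with (assumed) columns
    "gender", "true" (gold occupation) and "pred" (predicted occupation).
    """
    rows = []
    for occupation, sub in df.groupby("true"):
        in_group = sub["gender"] == group
        hit = sub["pred"] == occupation  # true positive for this occupation
        rows.append({
            "occupation": occupation,
            "tpr_group": hit[in_group].mean(),  # TPR_{g,y}; NaN if no such biographies
            "tpr_rest": hit[~in_group].mean(),  # TPR_{~g,y}
            "pi": in_group.mean(),              # pi_{g,y} = P[G = g | Y = y]
        })
    table = pd.DataFrame(rows).set_index("occupation")
    table["gap"] = table["tpr_group"] - table["tpr_rest"]
    return table


def gap_pi_correlation(table: pd.DataFrame) -> float:
    """Pearson correlation between the TPR gap and pi across occupations."""
    valid = table.dropna(subset=["gap", "pi"])
    return float(np.corrcoef(valid["pi"], valid["gap"])[0, 1])
```

Calling `gender_gap_table` on the original predictions and again on the scrubbed-dataset predictions gives two tables whose `gap` columns can be compared occupation by occupation.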

natashabutt/GCN-Bias
