<a href="https://colab.research.google.com/github/peeyushsinghal/nlp-debias/blob/main/nlp_debias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Which embedding should we debias: vanilla BERT or a sentence embedding?


*   We compare the cosine similarity each model assigns to a similar and a dissimilar sentence pair to see which produces the higher-quality embedding.
*   The bias-bench authors mean-pool the token outputs of vanilla LMs (autoencoding and autoregressive variants) to study bias in sentence encoders. Research gap: dedicated sentence-embedding models can be used instead.
*   Sentence transformers are the SoTA models for producing sentence embeddings, so they, rather than vanilla LMs, are the ones likely to be widely adopted and therefore the ones we need to examine for bias.



In [1]:
!pip install -q sentence_transformers


In [2]:
from sentence_transformers import SentenceTransformer, models, util
from transformers import AutoModel, AutoTokenizer
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tokenzier = AutoTokenizer.from_pretrained("bert-base-uncased")
model     = AutoModel.from_pretrained("bert-base-uncased")
model = model.to(device)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [3]:
word_embedding_model = models.Transformer('sentence-transformers/all-MiniLM-L6-v2')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), 'mean')
st_model = SentenceTransformer(modules=[word_embedding_model, pooling_model], device=device)


In [4]:
def vanilla_bert(st_a, st_b):
    # Tokenize both sentences as one padded batch and move it to the model's device.
    inputs = tokenzier([st_a, st_b], padding=True, truncation=True, return_tensors="pt").to(device)
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)

    # Boolean masks that are True for real tokens and False for PAD tokens.
    mask_a = inputs.attention_mask[0].bool()
    mask_b = inputs.attention_mask[1].bool()

    # Mean pooling over the non-PAD token embeddings of each sentence.
    st_a_emb_v = outputs.last_hidden_state[0][mask_a].mean(dim=0)
    st_b_emb_v = outputs.last_hidden_state[1][mask_b].mean(dim=0)

    # scipy returns the cosine distance, so similarity = 1 - distance.
    from scipy.spatial.distance import cosine
    return 1 - cosine(st_a_emb_v.cpu().numpy(), st_b_emb_v.cpu().numpy())

In [5]:
def st_bert(st_a, st_b):  
    st_a_emb = st_model.encode(st_a,  convert_to_tensor=True)
    st_b_emb = st_model.encode(st_b,  convert_to_tensor=True)

    from scipy.spatial.distance import cosine
    return 1- cosine(st_a_emb.cpu().detach().numpy(), st_b_emb.cpu().detach().numpy())  

In [6]:
st_a = "I am looking for an apartment in London"
st_b = "I am looking for a place to stay in London"

print("When sentences are similar")
print("==========================")
print("vanilla BERT", vanilla_bert(st_a, st_b))
print("SoTA Sentence Transformer", st_bert(st_a, st_b))
print()
print("When sentences are totally different")
print("==========================")
st_a = "I am looking for an apartment in London"
st_b = "I am looking for a zoo to donate my pet Lion in London" 
print("vanilla BERT", vanilla_bert(st_a, st_b))
print("SoTA Sentence Transformer", st_bert(st_a, st_b))

When sentences are similar
vanilla BERT 0.9250388741493225
SoTA Sentence Transformer 0.8244884014129639

When sentences are totally different
vanilla BERT 0.8194140195846558
SoTA Sentence Transformer 0.48806947469711304






*   The SoTA Sentence Transformer (all-MiniLM-L6-v2) is a distilled model with a smaller embedding size (384) than BERT-base (768).

**Embedding quality shows up on divergent sentences: for the unrelated pair, vanilla BERT still reports a similarity of 0.82, while the sentence transformer drops to 0.49. This makes the stronger case for using sentence transformers as the embedding to debias.**
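
As a rough way to quantify that observation, the sketch below computes, for each model, the gap between the similar-pair and different-pair scores printed above (rounded to four decimals). The dictionary layout and variable names are illustrative only and are not part of the notebook.

```python
# Cosine similarities copied from the cell output above (rounded to 4 decimals).
scores = {
    "vanilla BERT": {"similar": 0.9250, "different": 0.8194},
    "Sentence Transformer": {"similar": 0.8245, "different": 0.4881},
}

# A larger gap means the embedding separates related from unrelated sentences better.
gaps = {name: s["similar"] - s["different"] for name, s in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: similar - different = {gap:.4f}")
```

The gap is roughly 0.11 for vanilla BERT and 0.34 for the sentence transformer. This is a single sentence pair per condition, so it is an illustration of the trend and not a benchmark.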

# Diagnosing Bias

In [1]:
!rm -rf /content/bias-bench
!rm -rf /content/mlm-scoring

In [2]:
!git clone https://github.com/PrithivirajDamodaran/bias-bench.git
%cd bias-bench 
%pwd
!python -m pip install -qe .

Cloning into 'bias-bench'...
remote: Enumerating objects: 323, done.
remote: Counting objects: 100% (217/217), done.
remote: Compressing objects: 100% (206/206), done.
remote: Total 323 (delta 118), reused 11 (delta 11), pack-reused 106
Receiving objects: 100% (323/323), 6.47 MiB | 9.14 MiB/s, done.
Resolving deltas: 100% (155/155), done.
/content/bias-bench

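We run the SEAT (Sentence Encoder Association Test) tests. Each one is a hypothesis test: we check whether the p-value is significant and read the effect size as the strength of the association.

The null hypothesis is that the two sets of target sentences do not differ in their association with the two sets of attribute sentences.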
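
To make the test concrete, here is a minimal NumPy sketch of a WEAT/SEAT-style permutation test and effect size, assuming the standard formulation (association = mean cosine to attribute set A minus mean cosine to attribute set B). The function names and the toy embeddings are hypothetical; this is not the bias-bench implementation, which should be consulted for the exact details.

```python
import numpy as np

def cosine_matrix(U, V):
    """Pairwise cosine similarities between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T

def association(W, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to set B."""
    return cosine_matrix(W, A).mean(axis=1) - cosine_matrix(W, B).mean(axis=1)

def seat_test(X, Y, A, B, n_samples=10000, seed=0):
    """Return (p_value, effect_size) for target sets X, Y and attribute sets A, B."""
    s_x = association(X, A, B)
    s_y = association(Y, A, B)
    s_all = np.concatenate([s_x, s_y])

    # Effect size: difference of mean associations, scaled by the pooled std.
    effect_size = (s_x.mean() - s_y.mean()) / s_all.std(ddof=1)

    # Non-parametric p-value: how often a random equal-size split of the
    # targets yields a larger test statistic than the observed split.
    observed = s_x.sum() - s_y.sum()
    rng = np.random.default_rng(seed)
    n_x = len(s_x)
    hits = 0
    for _ in range(n_samples):
        perm = rng.permutation(len(s_all))
        stat = s_all[perm[:n_x]].sum() - s_all[perm[n_x:]].sum()
        hits += int(stat > observed)
    # Count the observed split itself so the p-value is never exactly zero.
    p_value = (hits + 1) / (n_samples + 1)
    return p_value, effect_size

# Toy data (assumption): X and A cluster around one axis, Y and B around another.
rng = np.random.default_rng(0)
dim = 8
e1, e2 = np.eye(dim)[0], np.eye(dim)[1]
A = e1 + 0.1 * rng.normal(size=(8, dim))
B = e2 + 0.1 * rng.normal(size=(8, dim))
X = e1 + 0.1 * rng.normal(size=(8, dim))
Y = e2 + 0.1 * rng.normal(size=(8, dim))

p_value, effect_size = seat_test(X, Y, A, B, n_samples=2000)
print("pval:", p_value, "esize:", effect_size)
```

On this deliberately biased toy data the p-value is small and the effect size is large. On real embeddings, a p-value above the chosen significance level means the null hypothesis of no differential association cannot be rejected.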

In [3]:
%%time
! python ./experiments/seat.py --n_samples 100000  --model_name_or_path sentence-transformers/all-MiniLM-L6-v2

Running SEAT benchmark:
 - persistent_dir: /content/bias-bench
 - tests: None
 - n_samples: 100000
 - parametric: False
 - model: BertModel
 - model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
Running test angry_black_woman_stereotype
Loading /content/bias-bench/data/seat/angry_black_woman_stereotype.jsonl...
Computing sentence encodings
	Done!
Computing cosine similarities...
Null hypothesis: no difference between WhiteFemaleNames and BlackFemaleNames in association to attributes NearAntonyms and AngryBlackWomanStereotype
Computing pval...
Using non-parametric test
Drawing 99999 samples (and biasing by 1)
pval: 0.218
computing effect size...
esize: 0.293
Running test angry_bl

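# Debiasing
We use the INLP (Iterative Nullspace Projection) technique to debias.
* The idea is to learn a projection matrix that removes the linearly recoverable bias information (here, gender) from the embeddings.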
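
The idea can be sketched in a few lines of NumPy: repeatedly fit a linear probe for the protected attribute, then project the data onto the nullspace of every probe direction found so far. This is a simplified illustration under stated assumptions: binary labels in {-1, +1}, a ridge least-squares probe standing in for the linear classifier, and hypothetical function names and toy data. It is not the code in `inlp_projection_matrix.py`.

```python
import numpy as np

def fit_direction(X, y, ridge=1e-3):
    """Ridge least-squares probe on +/-1 labels (assumed stand-in classifier)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    d = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + ridge * np.eye(d), Xc.T @ yc)

def probe_accuracy(X, y):
    """In-sample accuracy of a freshly fitted linear probe."""
    w = fit_direction(X, y)
    scores = (X - X.mean(axis=0)) @ w
    return float(np.mean(np.sign(scores) == y))

def inlp_projection(X, y, n_iters=3):
    """Projection matrix onto the joint nullspace of n_iters probe directions."""
    d = X.shape[1]
    P = np.eye(d)
    directions = []
    for _ in range(n_iters):
        # Fit a probe on the data with previously found directions removed.
        w = fit_direction(X @ P, y)
        norm = np.linalg.norm(w)
        if norm < 1e-10:
            break  # nothing linearly recoverable is left
        directions.append(w / norm)
        # Orthonormalize all directions, then remove their span.
        Q, _ = np.linalg.qr(np.stack(directions, axis=1))
        P = np.eye(d) - Q @ Q.T
    return P

# Toy data (assumption): the label leaks through the first coordinate.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 200)
X = rng.normal(size=(400, 6))
X[:, 0] += 2.0 * y

P = inlp_projection(X, y, n_iters=3)
acc_before = probe_accuracy(X, y)
acc_after = probe_accuracy(X @ P, y)
print("probe accuracy before:", acc_before, "after:", acc_after)
```

After projection, a new linear probe should do much worse than before and move toward chance, which is the goal of the procedure: the attribute is no longer easy to read off linearly.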

In [4]:
from google.colab import drive
drive.mount('/content/gdrive',force_remount=False)

Mounted at /content/gdrive


In [5]:
!mkdir ./data/text
import os
!cp "/content/gdrive/MyDrive/wikipedia-text-data/wikipedia-2.5.txt.zip"  "./data/text/wikipedia-2.5.txt.zip"
os.system('unzip ./data/text/wikipedia-2.5.txt.zip -d ./data/text/')

0

In [6]:
%%time
import nltk
nltk.download('punkt')
!python ./experiments/inlp_projection_matrix.py  --bias_type gender --model_name_or_path sentence-transformers/all-MiniLM-L6-v2

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Computing projection matrix:
 - persistent_dir: /content/bias-bench
 - model: BertModel
 - model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
 - bias_type: gender
 - n_classifiers: 80
 - seed: 0
Loading INLP data:   6% 88640/1372632 [00:13<03:17, 6491.58it/s]
INLP dataset collected:
 - Num. male sentences: 10000
 - Num. female sentences: 10000
 - Num. neutral sentences: 10000
Encoding male sentences: 100% 10000/10000 [03:44<00:00, 44.57it/s]
Encoding female sentences: 100% 10000/10000 [03:36<00:00, 46.10it/s]
Encoding neutral sentences: 100% 10000/10000 [03:17<00:00, 50.76it/s]
Dataset split sizes:
Train size: 14700; Dev size: 6300; Test size: 9000
iteration: 79, accuracy: 0.33380952380952383: 100% 80/80 [06:53<00:00,  5.17s/it]
Saving computed projection matrix to: /content/bias-bench/results/projection_matrix/all-MiniLM-L6-v2.pt
CPU times: user 18.7 s, sys: 2.79 s, total: 21.5 s
Wall time: 18min 1s


In [7]:
%%time
! python ./experiments/seat_debias.py --n_samples 100000   --tests sent-weat6 sent-weat6b sent-weat7 sent-weat7b sent-weat8 sent-weat8b --model INLPBertModel --projection_matrix  /content/bias-bench/results/projection_matrix/all-MiniLM-L6-v2.pt --model_name_or_path sentence-transformers/all-MiniLM-L6-v2
#! python ./experiments/seat_debias.py --n_samples 100000   --tests sent-weat3 sent-weat3b sent-weat4 sent-weat5 sent-weat5b sent-angry_black_woman_stereotype sent-angry_black_woman_stereotype_b --model INLPBertModel --projection_matrix  /content/bias-bench/results/projection_matrix/all-MiniLM-L6-v2.pt --model_name_or_path sentence-transformers/all-MiniLM-L6-v2
#! python ./experiments/seat_debias.py --n_samples 100000   --tests sent-religion1 sent-religion1b sent-religion2 sent-religion2b --model INLPBertModel --projection_matrix  /content/bias-bench/results/projection_matrix/all-MiniLM-L6-v2.pt --model_name_or_path sentence-transformers/all-MiniLM-L6-v2

Running SEAT benchmark:
 - persistent_dir: /content/bias-bench
 - tests: ['sent-weat6', 'sent-weat6b', 'sent-weat7', 'sent-weat7b', 'sent-weat8', 'sent-weat8b']
 - n_samples: 100000
 - parametric: False
 - model: INLPBertModel
 - model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
 - bias_direction: None
 - projection_matrix: /content/bias-bench/results/projection_matrix/all-MiniLM-L6-v2.pt
 - load_path: None
 - bias_type: None
Running test sent-weat6
Loading /content/bias-bench/data/seat/sent-weat6.jsonl...
Computing sentence encodings
	Done!
Computing cosine similarities...
Null hypothesis: no difference between MaleNames and FemaleNames in association to attributes Career and Family
Computing pval...
Using non-parametric test
Drawing 99999 samples (and biasing by 1)
pval: 0.867
computing effect size...
esize: -0.198
Running test sent-weat6b
Loading /content/bias-bench/data/seat/sent-weat6b.jsonl...
Computing sentence encodings
	Done!
Computing cosine similarities...
Null hypo
