A super-quick demo to start discussions on bias in the Finnish BERT language model

ogencoglu/Gender_Bias_FinBERT

“Pekka”s are 260 times more likely to be an engineer than “Emilia”s! And “Tiia”s are 410 times more likely to love shopping than “Matti”s!

Possible Gender Bias in Finnish BERT?

The purpose of this mini-experiment is to:

  • quickly showcase how easy it is nowadays to load a pretrained language model and start using it for various tasks such as feature extraction, question answering, or mask filling in just a couple of lines of code
  • thank all the researchers and practitioners that made the abovementioned point possible
  • raise awareness of possible gender bias, stereotypical bias, and other types of bias in these models
  • open up a constructive discussion around this topic and consequently learn from each other as a community
  • (hopefully) help the pursuit of inventing more equitable models without compromising accuracy

and is not to:

  • blame or accuse
  • play the victim (I honestly don't know how this is even possible but somehow got accused of it already in the first 24 hours ¯\_(ツ)_/¯)
  • drop the science hat and get political
  • get stuck in discussing specific examples or edge cases and lose the focus of the big picture

Note: this is just a quick & dirty demo to spark some discussion. There are, for example, several studies that quantify bias in contextual language models rigorously.
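The "couple of lines" claim above can be made concrete; a minimal sketch using the Hugging Face `pipeline` API with the FinBERT checkpoint linked below (the example sentence is illustrative):

```python
from transformers import pipeline

# Load FinBERT once and reuse it for a downstream task.
# "feature-extraction" returns one contextual embedding per input token.
extractor = pipeline(
    "feature-extraction",
    model="TurkuNLP/bert-base-finnish-cased-v1",
)

vectors = extractor("Tämä on esimerkkilause.")  # "This is an example sentence."
# vectors[0] is a list of per-token embedding vectors (one per subword)
```

The same one-liner pattern works for the other tasks mentioned above by swapping the task string, e.g. `"question-answering"` or `"fill-mask"`.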

Google Colab Online Demo

Open in Colab

Locally

Run Gender_Bias_FinBERT_showcase.ipynb

Remarks

1 - As expected from the training data, the model is obsessed with Finnish politicians:

```python
from transformers import pipeline

# Fill-mask pipeline with FinBERT v1 (model card linked below)
pipe = pipeline("fill-mask", model="TurkuNLP/bert-base-finnish-cased-v1")

sentence = f"{pipe.tokenizer.mask_token} sanoi."  # "[MASK] said."
pipe(sentence, top_k=10)
```

```python
[{'score': 0.05185465142130852,
  'sequence': '[CLS] Tuomioja sanoi. [SEP]',
  'token': 15697,
  'token_str': 'Tuomioja'},
 {'score': 0.04348713159561157,
  'sequence': '[CLS] Soini sanoi. [SEP]',
  'token': 8574,
  'token_str': 'Soini'},
 {'score': 0.04261799901723862,
  'sequence': '[CLS] hän sanoi. [SEP]',
  'token': 361,
  'token_str': 'hän'},
 {'score': 0.04241926968097687,
  'sequence': '[CLS] Niinistö sanoi. [SEP]',
  'token': 5975,
  'token_str': 'Niinistö'},
 {'score': 0.042333755642175674,
  'sequence': '[CLS] Hän sanoi. [SEP]',
  'token': 737,
  'token_str': 'Hän'},
 {'score': 0.038560837507247925,
  'sequence': '[CLS] Vanhanen sanoi. [SEP]',
  'token': 9499,
  'token_str': 'Vanhanen'},
 {'score': 0.03149525821208954,
  'sequence': '[CLS] Stubb sanoi. [SEP]',
  'token': 12173,
  'token_str': 'Stubb'},
 {'score': 0.03149261325597763,
  'sequence': '[CLS] Lipponen sanoi. [SEP]',
  'token': 10367,
  'token_str': 'Lipponen'},
 {'score': 0.025139369070529938,
  'sequence': '[CLS] Sipilä sanoi. [SEP]',
  'token': 9517,
  'token_str': 'Sipilä'},
 {'score': 0.01755082793533802,
  'sequence': '[CLS] Katainen sanoi. [SEP]',
  'token': 11015,
  'token_str': 'Katainen'}]
```

2 - Multilingual BERT shows less imbalance in several of these examples, but its Finnish token vocabulary is limited
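This comparison can be repeated by swapping in the standard multilingual checkpoint; a sketch, with the caveat that Finnish names missing from its vocabulary are split into subwords, which limits how comparable the target scores are:

```python
from transformers import pipeline

mbert = pipeline("fill-mask", model="bert-base-multilingual-cased")

sentence = f"{mbert.tokenizer.mask_token} on insinööri."

# If a target name is not a single token in the vocabulary, the pipeline
# falls back to scoring its first subword, so interpret scores with care.
print(mbert(sentence, targets=["Pekka", "Emilia"]))
```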

Finnish BERT model - FinBERT v1

HuggingFace Transformers model card: TurkuNLP/bert-base-finnish-cased-v1

Publication

TurkuNLP FinBERT page
