Repository for probing research

Our pipeline

Collected training data.
"Spoiled" the data so that the gender of adjectives is broken.
Trained two Russian BERT models.
Annotated data from RuSentEval with Stanza, either per sentence or per word.
Conducted three types of probing experiments (on the CLS token, on the mean sentence embedding, and per token), relying largely on NeuroX.
Compared the results.
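To make the three probing setups concrete, here is a minimal sketch of how the three kinds of representations could be derived from one sentence's hidden states. It is not the code from src and does not use NeuroX. The function names, the toy vectors, the one-token-per-word tokenisation, and the choice to leave [CLS] and [SEP] out of the mean are all assumptions made for illustration.

```python
from statistics import fmean

SPECIAL_TOKENS = ("[CLS]", "[SEP]")  # assumption: BERT-style special tokens


def cls_embedding(hidden_states):
    """Sentence representation taken from the first position ([CLS])."""
    return list(hidden_states[0])


def mean_embedding(hidden_states, tokens):
    """Sentence representation averaged over non-special tokens (an assumption)."""
    kept = [vec for tok, vec in zip(tokens, hidden_states)
            if tok not in SPECIAL_TOKENS]
    # zip(*kept) groups the values of each dimension across tokens
    return [fmean(dim) for dim in zip(*kept)]


def per_token_embeddings(hidden_states, tokens):
    """One (token, vector) pair per non-special token, for token-level probes."""
    return [(tok, list(vec)) for tok, vec in zip(tokens, hidden_states)
            if tok not in SPECIAL_TOKENS]


# Toy input: 2-dimensional vectors standing in for real BERT hidden states.
tokens = ["[CLS]", "красивая", "дом", "[SEP]"]
hidden_states = [[1.0, 0.0], [0.0, 2.0], [4.0, 6.0], [9.0, 9.0]]

cls_vec = cls_embedding(hidden_states)
mean_vec = mean_embedding(hidden_states, tokens)
token_vecs = per_token_embeddings(hidden_states, tokens)
```

In each setup the resulting vectors would then be passed to a probing classifier together with the labels from the annotated data: one label per sentence for the first two functions, one label per token for the third.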

The src directory contains the probing code, and example_probing.ipynb is an example notebook.

Repository contents:
data
html_visualisations
rusenteval
src
visualisations
README.md
example_probing.ipynb
illiterate_text_deeppavlov.ipynb
rubert_training.ipynb
statistics&confusion_matrix.ipynb