Contains the data and code for a paper on 'break' and large language models.

break-llms

Reference

Code for

Petersen, Erika and Christopher Potts. 2022. Lexical Semantics with Large Language Models: A Case Study of English break. Ms., Stanford University.

Overview

annotated_break_data.csv: the annotated dataset.
annotated_dataset_study.ipynb: basic statistics and tables for the annotated dataset.
static.ipynb: static representations for break in various versions of word2vec, GloVe, and fastText.
get_all_reps.ipynb: gets all the break representations for all the models we consider. These representations are required for the notebooks probing.ipynb and visualizations.ipynb.
probing.ipynb: probing experiment code.
visualizations.ipynb: t-SNE-based visualizations of the break representations.
wordnet.ipynb: basic analysis of the WordNet hypernym graph for break.
break_utils.py: helper code for many of the notebooks.
fig: directory containing visualizations included in the paper (output from visualizations.ipynb and wordnet.ipynb).
reps: directory in which representations are stored when get_all_reps.ipynb is run.
results: probing results files for the probes reported in the paper.
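annotated_dataset_study.ipynb produces basic statistics and tables for annotated_break_data.csv. As a rough, dependency-free illustration of that kind of table, the sketch below counts the values of one CSV column and reports their shares. The README does not list the CSV's columns, so the column name "sense" used here is a hypothetical placeholder, and `label_table` is an illustrative helper, not a function from break_utils.py.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def label_table(lines: Iterable[str], column: str) -> List[Tuple[str, int, float]]:
    """Tabulate one CSV column as (value, count, proportion), most frequent first.

    `lines` may be an open file object or any iterable of CSV text lines.
    `column` must match a header in the CSV; "sense" below is hypothetical.
    """
    reader = csv.DictReader(lines)
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical usage against the repository file; substitute the real column name.
# with open("annotated_break_data.csv", newline="") as f:
#     for value, n, share in label_table(f, "sense"):
#         print(f"{value}\t{n}\t{share:.1%}")
```

Passing lines rather than a path keeps the helper easy to try on small in-memory samples before pointing it at the real file.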
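static.ipynb looks at static representations of break in word2vec, GloVe, and fastText. The notebook may well load these through a library; as a self-contained sketch only, the code below parses the plain-text vector format (GloVe files have no header, while word2vec and fastText text files begin with a "vocabulary-size dimension" line) and lists the nearest neighbors of a word by cosine similarity. The function names are illustrative assumptions, not the repository's API.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def load_text_vectors(lines: Iterable[str]) -> Dict[str, Vector]:
    """Parse 'word v1 v2 ...' lines into a dict, skipping an optional header.

    Assumes words contain no internal whitespace.
    """
    vectors: Dict[str, Vector] = {}
    for i, line in enumerate(lines):
        parts = line.split()
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # word2vec/fastText text header: vocabulary size, dimension
        if len(parts) < 2:
            continue  # blank or malformed line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors: Dict[str, Vector], word: str, k: int = 5) -> List[Tuple[str, float]]:
    """The k words most cosine-similar to `word`, excluding the word itself."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A static model assigns break a single vector regardless of context, so its neighbor list necessarily mixes the word's different senses; that contrast with contextual representations is the usual motivation for comparing the two.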
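get_all_reps.ipynb extracts break representations from every model considered. With subword tokenizers a single occurrence of break (or an inflected form) can be split across several tokens, so those token vectors have to be combined somehow. The README does not say how the notebook handles this; the sketch below assumes one common choice, mean-pooling every token whose character span overlaps the target word. Character offsets of this kind can be obtained, for example, from a Hugging Face fast tokenizer called with `return_offsets_mapping=True`. The helper name `pool_word_vector` is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def pool_word_vector(
    offsets: Sequence[Tuple[int, int]],
    hidden: Sequence[Vector],
    span: Tuple[int, int],
) -> Vector:
    """Mean of the token vectors whose character offsets overlap `span`.

    offsets[i] is the (start, end) character span of token i and hidden[i] is
    that token's vector from one model layer. Zero-width special-token spans
    such as (0, 0) never satisfy the overlap test, so they are ignored.
    """
    start, end = span
    selected = [vec for (s, e), vec in zip(offsets, hidden) if s < end and e > start]
    if not selected:
        raise ValueError("no token overlaps the target span")
    dim = len(selected[0])
    return [sum(vec[d] for vec in selected) / len(selected) for d in range(dim)]
```

Other reasonable choices, such as keeping only the first subword, would change only the `selected` step; which one the paper uses should be checked in the notebook itself.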
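probing.ipynb runs the probing experiments, in which a lightweight classifier is trained on frozen representations to predict an annotated label and is scored on held-out examples. The README does not state which classifier the notebook uses, so the sketch below substitutes a deliberately simple nearest-centroid probe to show the train/evaluate shape of the experiment. All names here are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, Vector]:
    """Average the training vectors belonging to each label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids: Dict[str, Vector], vec: Vector) -> str:
    """Label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def probe_accuracy(
    X_train: Sequence[Vector], y_train: Sequence[str],
    X_test: Sequence[Vector], y_test: Sequence[str],
) -> float:
    """Fit on the training split and return accuracy on the test split."""
    centroids = fit_centroids(X_train, y_train)
    correct = sum(predict(centroids, v) == label for v, label in zip(X_test, y_test))
    return correct / len(y_test)
```

Keeping the probe this weak is one way to make high accuracy more attributable to the representations than to the classifier; a logistic-regression probe would slot into the same fit/predict structure.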
