Data and code for the paper on English 'break' and large language models

epetsen/break-llms

break-llms

Reference

Code for

Petersen, Erika and Christopher Potts. 2022. Lexical Semantics with Large Language Models: A Case Study of English break. Ms., Stanford University.

Overview

  1. annotated_break_data.csv: the annotated dataset

  2. annotated_dataset_study.ipynb: basic statistics and tables for the annotated dataset.

  3. static.ipynb: static representations for break in various versions of word2vec, GloVe, and fastText.

  4. get_all_reps.ipynb: gets all the break representations for all the models we consider. These representations are required for the notebooks probing.ipynb and visualizations.ipynb.

  5. probing.ipynb: probing experiment code.

  6. visualizations.ipynb: t-SNE-based visualizations of the break representations.

  7. wordnet.ipynb: basic analysis of the WordNet hypernym graph for break.

  8. break_utils.py: helper code for many of the notebooks.

  9. fig: directory containing visualizations included in the paper (output from visualizations.ipynb and wordnet.ipynb).

  10. reps: directory in which representations are stored when get_all_reps.ipynb is run.

  11. results: probing results files for the probes reported in the paper.
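
As a rough illustration of the kind of basic statistics that could be computed over an annotated CSV like annotated_break_data.csv, here is a minimal sketch using only the Python standard library. It is not taken from annotated_dataset_study.ipynb. The column names `sentence` and `sense`, the label values, and the toy rows are all hypothetical stand-ins, since the actual schema of the dataset is not described in this README.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for annotated_break_data.csv. The real column names
# and labels are not documented here, so "sentence" and "sense" (and the
# rows themselves) are made up purely for illustration.
TOY_CSV = """sentence,sense
She broke the vase.,separate
He broke the record.,surpass
They broke the window.,separate
"""


def label_counts(csv_file, label_column):
    """Count how often each value occurs in one column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


counts = label_counts(io.StringIO(TOY_CSV), "sense")

# Print one label per line, most frequent first.
for label, n in counts.most_common():
    print(f"{label}\t{n}")
```

To try this on the real file, one could pass `open("annotated_break_data.csv", newline="", encoding="utf-8")` in place of the in-memory string and substitute whatever label column the dataset actually uses.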
