An R package for estimating and doing statistical inference on context-specific word embeddings.

prodriguezsosa/conText


About

conText provides a fast, flexible, and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018), and to evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021).
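As a rough illustration of the 'a la carte' idea, the sketch below averages the pre-trained embeddings of the words surrounding a target word and multiplies that average by a transformation matrix. It uses only base R. The toy embeddings, the toy matrix `A`, and the helper `alc_embed()` are made up for this example and are not part of conText.

```r
# Toy pre-trained embeddings: 4 words, 3 dimensions (made-up numbers).
pre_trained <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("border", "policy", "reform", "law"), NULL)
)

# Toy transformation matrix (3 x 3). In practice this matrix is estimated
# for a specific set of pre-trained embeddings; here it is arbitrary.
A <- diag(c(2, 1, 0.5))

# Hypothetical helper: embed a target word from the words in its context.
alc_embed <- function(context_words, pre_trained, A) {
  # Keep only context words that have a pre-trained embedding.
  known <- context_words[context_words %in% rownames(pre_trained)]
  # Average the embeddings of the remaining context words.
  avg <- colMeans(pre_trained[known, , drop = FALSE])
  # Apply the transformation matrix to the averaged context vector.
  as.numeric(A %*% avg)
}

embedding <- alc_embed(c("border", "policy", "law", "the"), pre_trained, A)
```

Here "the" is dropped because it has no row in the toy embeddings, so the result depends only on the three known context words.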

How to Install

install.packages("conText")

Datasets

To use conText you will need three objects:

  1. A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
  2. A set of (GloVe) pre-trained embeddings.
  3. A transformation matrix specific to the pre-trained embeddings.

conText includes sample objects for all three, but keep in mind that these are meant only to illustrate how the functions work. We have also shared the raw versions of these objects in a Dropbox folder, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds).
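Once the raw files are downloaded, they can be read with base R and checked for compatibility before use. The sketch below is a minimal example under two assumptions: that both files sit in one local directory of your choosing, and that each file stores a numeric matrix (embeddings with one row per word, and a square transformation matrix of matching dimension). The helper `load_alc_inputs()` is hypothetical and not part of conText.

```r
# Hypothetical helper: read the embeddings and transformation matrix from a
# local directory and check that their dimensions agree.
load_alc_inputs <- function(dir) {
  pre_trained      <- readRDS(file.path(dir, "glove.rds"))
  transform_matrix <- readRDS(file.path(dir, "khodakA.rds"))

  stopifnot(
    # Assumption: both objects are stored as matrices.
    is.matrix(pre_trained),
    is.matrix(transform_matrix),
    # The transformation matrix should be square ...
    nrow(transform_matrix) == ncol(transform_matrix),
    # ... and match the dimensionality of the embeddings.
    ncol(pre_trained) == nrow(transform_matrix)
  )

  list(pre_trained = pre_trained, transform_matrix = transform_matrix)
}

# Usage (the directory name is a placeholder):
# inputs <- load_alc_inputs("path/to/downloaded/files")
# dim(inputs$pre_trained)
```

Checking dimensions up front matters because a transformation matrix is specific to the embeddings it was estimated for.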

Quick Start Guides

Check out this Quick Start Guide to get going with conText (last updated: 08/04/2023).

Latest Updates

We are hugely thankful to Will Hobbs and Breanna Green for bringing to our attention clear examples where finite sample bias was larger than we had anticipated when implementing our main estimation routine, conText. We are actively collaborating with them to evaluate alternative fixes. In the meantime, we have implemented jackknife debiasing and recommend using it. Please refer to the Finite Sample Bias vignette for additional information on the issue and simulation results using various debiasing methods.
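For readers unfamiliar with the technique, the sketch below shows a generic leave-one-out jackknife bias correction in base R. It is not the package's internal implementation: the statistic (the squared norm of a sample mean vector), the simulated data, and the helper names are all chosen purely for illustration, because that statistic is a simple case where the plug-in estimate is biased upward in finite samples.

```r
# Generic leave-one-out jackknife bias correction (illustrative only).
# x: numeric matrix with one observation per row; stat: function of x
# returning a single number.
jackknife_debias <- function(x, stat) {
  n <- nrow(x)
  theta_hat <- stat(x)  # plug-in estimate on the full sample
  # Recompute the statistic with each observation left out in turn.
  theta_loo <- vapply(
    seq_len(n),
    function(i) stat(x[-i, , drop = FALSE]),
    numeric(1)
  )
  # Standard jackknife bias-corrected estimate.
  n * theta_hat - (n - 1) * mean(theta_loo)
}

# Example statistic: squared Euclidean norm of the column means.
sq_norm_of_mean <- function(x) sum(colMeans(x)^2)

set.seed(42)
# Simulated data whose true mean is zero, so the true squared norm is 0.
x <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)

plug_in  <- sq_norm_of_mean(x)
debiased <- jackknife_debias(x, sq_norm_of_mean)
```

For this particular statistic the correction subtracts the summed sample variances divided by the sample size, so `debiased` is always smaller than `plug_in` and can fall below zero in small samples.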

Multilanguage Resources

For those working in languages other than English, we have a set of data and code resources here: https://alcembeddings.org/
