unsupervised-domain-clusters

This repository contains code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models".

data

The multi-domain German-English parallel data we used in the paper is available here (626MB). It is a new data split we created that avoids duplicate examples and leakage from the train split to the dev/test splits. The original multi-domain data first appeared in Koehn and Knowles (2017) and consists of five datasets available on the OPUS website.

code

The code is available as a notebook in the src directory. Please contact me at roee.aharoni@gmail.com with any questions.
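The paper's core idea is that sentence representations from pretrained language models cluster by domain without supervision, e.g. by fitting a Gaussian Mixture Model over the embeddings. A minimal sketch of that clustering step is below, using scikit-learn; the synthetic embeddings here are a stand-in for real sentence vectors from a pretrained model (e.g., averaged hidden states), and the choice of 5 components mirrors the five domains in the multi-domain data. This is an illustrative assumption, not the repository's exact code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for sentence embeddings from a pretrained LM;
# shape: (num_sentences, hidden_dim). Each block of 20 rows
# simulates sentences from one domain.
rng = np.random.default_rng(0)
num_domains = 5
embeddings = np.vstack([
    rng.normal(loc=i * 10.0, scale=1.0, size=(20, 32))
    for i in range(num_domains)
])

# Fit a GMM with one component per assumed domain and read off
# the unsupervised cluster assignment for each sentence.
gmm = GaussianMixture(n_components=num_domains, random_state=0)
cluster_ids = gmm.fit_predict(embeddings)

print(cluster_ids.shape)
```

With real data, the cluster assignments can then be compared against the known domain labels to measure how well the pretrained representations separate domains.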

bibtex

If you find this useful for your work, please use the following citation:

@inproceedings{aharoni2020unsupervised,
  title = "Unsupervised Domain Clusters in Pretrained Language Models",
  author = "Aharoni, Roee and Goldberg, Yoav",
  booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2020",
  url = "https://arxiv.org/abs/2004.02105",
  publisher = "Association for Computational Linguistics"
}
