unsupervised-domain-clusters

This repository contains code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models".

data

The multi-domain German-English parallel data we used in the paper is available here (626MB). It is a new data split that we created to avoid duplicate examples and leakage from the train split into the dev/test splits. The original multi-domain data first appeared in Koehn and Knowles (2017) and consists of five datasets available on the OPUS website.
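The README does not say exactly how duplicates and leakage were detected when building the split. As a rough illustration only, here is a minimal Python sketch that assumes exact matching after whitespace normalization, and treats a dev/test pair as leaked if either its German or its English side already appears in an earlier split. The helper names (normalize, dedupe, drop_leaked, clean_splits) and the toy sentences are hypothetical; this is not the procedure used to produce the released data.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: one (German source, English target) sentence pair.
Pair = Tuple[str, str]


def normalize(sentence: str) -> str:
    """Collapse runs of whitespace so trivially different copies compare equal (an assumption)."""
    return " ".join(sentence.split())


def dedupe(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep the first occurrence of each normalized (source, target) pair, preserving order."""
    seen: Set[Pair] = set()
    unique: List[Pair] = []
    for src, tgt in pairs:
        key = (normalize(src), normalize(tgt))
        if key not in seen:
            seen.add(key)
            unique.append((src, tgt))
    return unique


def drop_leaked(pairs: Iterable[Pair], reference: Iterable[Pair]) -> List[Pair]:
    """Drop pairs whose source OR target side already appears in the reference split."""
    ref_src: Set[str] = set()
    ref_tgt: Set[str] = set()
    for src, tgt in reference:
        ref_src.add(normalize(src))
        ref_tgt.add(normalize(tgt))
    return [
        (src, tgt)
        for src, tgt in pairs
        if normalize(src) not in ref_src and normalize(tgt) not in ref_tgt
    ]


def clean_splits(
    train: List[Pair], dev: List[Pair], test: List[Pair]
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Dedupe each split, then remove train->dev, train->test and dev->test overlap."""
    train = dedupe(train)
    dev = drop_leaked(dedupe(dev), train)
    test = drop_leaked(dedupe(test), train + dev)
    return train, dev, test


if __name__ == "__main__":
    # Toy, made-up sentence pairs for illustration.
    train = [("Hallo Welt", "Hello world"), ("Hallo  Welt", "Hello world"), ("Danke", "Thanks")]
    dev = [("Danke", "Thank you"), ("Guten Morgen", "Good morning")]
    test = [("Guten Morgen", "Good morning"), ("Tschüss", "Bye")]
    train, dev, test = clean_splits(train, dev, test)
    print(len(train), len(dev), len(test))  # → 2 1 1
```

Checking each side separately (rather than only whole pairs) is a deliberately strict choice: a test sentence whose German side was seen in training can still inflate scores even if its English reference differs slightly.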

code

The code is available as a notebook in the src directory. For any questions, please contact me at roee.aharoni@gmail.com.

bibtex

If you find this useful for your work, please use the following citation:

@inproceedings{aharoni2020unsupervised,
  title={Unsupervised domain clusters in pretrained language models},
  author={Aharoni, Roee and Goldberg, Yoav},
  booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year={2020},
  url={https://arxiv.org/abs/2004.02105},
  publisher = "Association for Computational Linguistics"
}
