Skip to content

Commit

Permalink
Cornell movies dialog dataset (#1319)
Browse files Browse the repository at this point in the history
* cornell movie dialogs
  • Loading branch information
shahules786 committed Feb 10, 2023
1 parent dfeb969 commit 6dc8848
Show file tree
Hide file tree
Showing 2 changed files with 691 additions and 0 deletions.
42 changes: 42 additions & 0 deletions notebooks/data-augmentation/movie-dialogs/README.md
@@ -0,0 +1,42 @@
## Dataset Summary

The dataset was created using
[Cornell Movies Dialog Corpus](https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html)
which contains a large metadata-rich collection of fictional conversations
extracted from raw movie scripts. Dialogs and meta-data from the underlying
Corpus were used to design a dataset that can be used to InstructGPT based
models to learn movie scripts.

Example :

```
User: Assume RICK and ALICE are characters from a fantasy-horror movie, continue the conversation between them
RICK: I heard you screaming. Was it a bad one?
ALICE: It was bad.
RICK: Doesn't the dream master work for you anymore?
Assistant: Sure
ALICE: I can't find him.
RICK: Hey, since when do you play Thomas Edison? This looks like Sheila's.
ALICE: It is...was. It's a zapper, it might help me stay awake.
RICK: Yeah, or turn you into toast.
```

## Usage

```python

from datasets import load_dataset
dataset = load_dataset("shahules786/OA-cornell-movies-dialog")
```

## Citations

```
@InProceedings{Danescu-Niculescu-Mizil+Lee:11a,
author={Cristian Danescu-Niculescu-Mizil and Lillian Lee},
title={Chameleons in imagined conversations:
A new approach to understanding coordination of linguistic style in dialogs.},
booktitle={Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011},
year={2011}
}
```

0 comments on commit 6dc8848

Please sign in to comment.