GitHub - npow/ubuntu-corpus: Ubuntu Dialog Corpus

This repository contains the source code to extract the dialogs used in the following paper:

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems (arXiv:1506.08909).

# psql -d template1
> create database ubuntu;

# ln -s /path/to/ubuntu/corpus data
# node createTable.js
# pypy main.py

This produces a file, ubuntu.sql. The copy command in the next step reads it from /tmp/ubuntu.sql.

# psql -d ubuntu
> copy messages from '/tmp/ubuntu.sql';

# node createTable.js index

# node extractDialogs.js nicks.txt
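
The steps above can be sketched as a single driver script. This is only an illustration, not part of the repository: the names `pipeline_commands` and `run_pipeline` are hypothetical, the two interactive psql statements are passed with psql's `-c` option instead of being typed at the prompt, and the corpus path is the same placeholder used above. It also assumes that ubuntu.sql is already at /tmp/ubuntu.sql when the copy step runs.

```python
import shlex
import subprocess

# Placeholder taken from the steps above; replace it before a real run.
CORPUS_PATH = "/path/to/ubuntu/corpus"


def pipeline_commands(corpus_path=CORPUS_PATH):
    """Return the documented steps as argument lists, in execution order."""
    return [
        # Interactive psql statements are passed with -c (an adaptation).
        ["psql", "-d", "template1", "-c", "create database ubuntu;"],
        ["ln", "-s", corpus_path, "data"],
        ["node", "createTable.js"],
        ["pypy", "main.py"],
        # Assumes ubuntu.sql is available at /tmp/ubuntu.sql at this point.
        ["psql", "-d", "ubuntu", "-c", "copy messages from '/tmp/ubuntu.sql';"],
        ["node", "createTable.js", "index"],
        ["node", "extractDialogs.js", "nicks.txt"],
    ]


def run_pipeline(dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in pipeline_commands():
        print("$ " + shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Running it as written only prints the commands. Calling `run_pipeline(dry_run=False)` would execute them in order, which requires psql, node, and pypy on the PATH and a PostgreSQL role that is allowed to read server-side files for the copy step.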
