Bayesian generative model for conversation data
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data first commit Sep 23, 2015
results sc Sep 23, 2015
src sc Oct 14, 2015
stopwords first commit Sep 23, 2015
LICENSE Initial commit Sep 23, 2015
README.md sc Sep 24, 2015

README.md

Bayesian Echo Chamber

A new Bayesian generative model for social interaction data, for uncovering influence relations from time-stamped conversation data.

Please refer to

Fangjian Guo, Charles Blundell, Hanna Wallach and Katherine A. Heller. The Bayesian Echo Chamber: modeling social influence via linguistic accommodation. AISTATS 2015, San Diego, CA, USA. JMLR: W&CP volume 38.

for details of the model.

Files

├── data                              a collection of datasets
├── results                           results are produced here
│   ├── 12-angry-men-analytics.Rmd    generating report from result
│   └── Makefile                      compile an html report from R markdown
├── src
│   ├── bec.py                        main "Bayesian echo chamber" class
│   ├── bec_sampler.py                a wrapper of the sampler of bec
│   ├── hawkes.py                     an implementation of Hawkes process
│   ├── likelihoods.py                several likelihoods
│   ├── run_bec_12angrymen.py         a demo script producing result for data/12-angry-men
│   ├── slice_sampler.py              slice sampler
│   ├── talkbankXMLparse.py           parser for talkbank xml format
└── stopwords
    └── english.stop                  list of stop words in English

Usage

  1. Run python run_bec_12angrymen.py would produce samples and other auxiliary files under results/12-angry-men/. One could customize scripts based on run_bec_12angrymen.py for other datasets and configurations.
  2. Run make under results/ could produce an html report compiled from R Markdown file 12-angry-men-analytics.Rmd. One could customize the Rmd file for analyzing other datasets.

Dependencies

  • Python modules (tested under Python 2.7)
    • numpy, scipy
    • matplotlib
    • nltk for word stemming in talkbankXMLparse.py
  • R libraries for generating report
    • knitr
    • ggplot2
    • coda
    • plyr
    • qgraph
    • pander

Datasets

The conversation data is read from the TalkBank xml format. A conversation consists of several utterances, with each utterance described with the following entities: speaker, content, start time and end time, which looks like the snippet below.

<u who="Juror 7" uID="#7">
<w>So</w>
<w>how</w>
<w>come</w>
<w>you</w>
<w>vote</w>
<w>not</w>
<w>guilty</w>
<media start="47.4640" end="49.3820" unit="s"/>
</u>

Currently, we have prepared the following datasets under data/ directory.

  1. 12 Angry Men: transcribed from the 1957 movie subtitle.
  2. SCOTUS: oral arguments from 50 years of the United States Supreme Court, obtained from TalkBank.
  3. synthetic: a synthetic example with 3 agents speaking with a vocabulary of 20, with time stamps generated from a Hawkes process and contents generated from the BEC model.

Authors

This repo is maintained by Richard Guo. We also acknowledge the earlier contribution of Juston Moore to bec.py, likelihoods.py and slice_sampler.py.