GitHub - richardkwo/bayesian-echo-chamber: Bayesian generative model for conversation data

Bayesian Echo Chamber

A new Bayesian generative model for social interaction data, for uncovering influence relations from time-stamped conversation data.

Please refer to

Fangjian Guo, Charles Blundell, Hanna Wallach and Katherine A. Heller. The Bayesian Echo Chamber: modeling social influence via linguistic accommodation. AISTATS 2015, San Diego, CA, USA. JMLR: W&CP volume 38.

for details of the model.

Files

├── data                              a collection of datasets
├── results                           results are produced here
│   ├── 12-angry-men-analytics.Rmd    generating report from result
│   └── Makefile                      compile an html report from R markdown
├── src
│   ├── bec.py                        main "Bayesian echo chamber" class
│   ├── bec_sampler.py                a wrapper of the sampler of bec
│   ├── hawkes.py                     an implementation of Hawkes process
│   ├── likelihoods.py                several likelihoods
│   ├── run_bec_12angrymen.py         a demo script producing result for data/12-angry-men
│   ├── slice_sampler.py              slice sampler
│   └── talkbankXMLparse.py           parser for TalkBank XML format
└── stopwords
    └── english.stop                  list of stop words in English

Usage

Running python run_bec_12angrymen.py produces samples and other auxiliary files under results/12-angry-men/. For other datasets and configurations, write a custom script based on run_bec_12angrymen.py.
Running make under results/ compiles an HTML report from the R Markdown file 12-angry-men-analytics.Rmd. The Rmd file can be customized to analyze other datasets.

Dependencies

Python modules (tested under Python 2.7)
- numpy, scipy
- matplotlib
- nltk for word stemming in talkbankXMLparse.py
R libraries for generating report
- knitr
- ggplot2
- coda
- plyr
- qgraph
- pander
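Before running the sampler, it can help to confirm that the Python modules listed above are importable. The helper below is a minimal sketch and not part of this repository; the function name missing_modules is hypothetical, and it only checks importability, not versions.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Python-side dependencies named in this README.
    required = ["numpy", "scipy", "matplotlib", "nltk"]
    absent = missing_modules(required)
    if absent:
        print("Missing Python modules: " + ", ".join(absent))
    else:
        print("All Python dependencies are importable.")
```

The R libraries would need a separate check from within R.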

Datasets

Conversation data is read from the TalkBank XML format. A conversation consists of a sequence of utterances, each described by four fields: speaker, content, start time and end time. A single utterance looks like the snippet below.

<u who="Juror 7" uID="#7">
<w>So</w>
<w>how</w>
<w>come</w>
<w>you</w>
<w>vote</w>
<w>not</w>
<w>guilty</w>
<media start="47.4640" end="49.3820" unit="s"/>
</u>
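
To illustrate how the four fields map onto this markup, here is a minimal sketch that extracts them from the utterance above with Python's standard xml.etree.ElementTree. It is not the code in talkbankXMLparse.py: the function name parse_utterance and the returned dictionary layout are hypothetical, and the sketch assumes the element carries no XML namespace, which full TalkBank files may declare.

```python
import xml.etree.ElementTree as ET

# The utterance shown above, embedded as a string for illustration.
SNIPPET = """
<u who="Juror 7" uID="#7">
<w>So</w>
<w>how</w>
<w>come</w>
<w>you</w>
<w>vote</w>
<w>not</w>
<w>guilty</w>
<media start="47.4640" end="49.3820" unit="s"/>
</u>
"""


def parse_utterance(xml_text):
    """Extract speaker, words, start time and end time from one <u> element.

    Assumes no XML namespace and exactly one <media> child.
    """
    u = ET.fromstring(xml_text.strip())
    media = u.find("media")
    return {
        "speaker": u.get("who"),
        "words": [w.text for w in u.findall("w") if w.text],
        "start": float(media.get("start")),
        "end": float(media.get("end")),
    }


utterance = parse_utterance(SNIPPET)
print(utterance["speaker"], utterance["words"], utterance["start"], utterance["end"])
```

Stemming and stop-word removal, which the repository handles with nltk and stopwords/english.stop, are left out of this sketch.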

The following datasets are currently available under the data/ directory.

12 Angry Men: transcribed from the subtitles of the 1957 movie.
SCOTUS: oral arguments from 50 years of the United States Supreme Court, obtained from TalkBank.
synthetic: a synthetic example with 3 agents speaking with a vocabulary of 20, with time stamps generated from a Hawkes process and contents generated from the BEC model.
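
As a rough illustration of how time stamps for several agents can be drawn from a Hawkes process, the sketch below simulates a multivariate Hawkes process by Ogata's thinning method. It is not the code in hawkes.py and does not reproduce the synthetic dataset: the exponential kernel, the function names, and all parameter values (mu, alpha, beta, horizon) are assumptions chosen for illustration. Here alpha[j][i] is the jump in agent i's intensity caused by an event from agent j.

```python
import math
import random


def intensities(t, events, mu, alpha, beta):
    """Per-agent intensity at time t, given past events [(time, agent), ...]."""
    lam = list(mu)
    for t_k, j in events:
        decay = math.exp(-beta * (t - t_k))
        for i in range(len(mu)):
            lam[i] += alpha[j][i] * decay
    return lam


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate events on [0, horizon) with Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Intensities only decay until the next event, so the total
        # intensity at the current time bounds it from above.
        lam_bar = sum(intensities(t, events, mu, alpha, beta))
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            return events
        lam = intensities(t, events, mu, alpha, beta)
        u = rng.random() * lam_bar
        if u < sum(lam):
            # Accept the candidate and attribute it to an agent with
            # probability proportional to that agent's intensity.
            agent, cum = 0, lam[0]
            while agent < len(lam) - 1 and u >= cum:
                agent += 1
                cum += lam[agent]
            events.append((t, agent))


# Hypothetical parameters for 3 agents (not those of the synthetic dataset).
mu = [0.5, 0.5, 0.5]
alpha = [[0.0, 0.3, 0.1],
         [0.2, 0.0, 0.3],
         [0.1, 0.2, 0.0]]
events = simulate_hawkes(mu, alpha, beta=1.0, horizon=50.0, seed=1)
print(len(events), "events; first few:", events[:3])
```

The quadratic cost of recomputing intensities from the full history is acceptable for a short example; a longer simulation would usually update the exponential sums recursively.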

Authors

This repo is maintained by Richard Guo. We also acknowledge the earlier contributions of Juston Moore to bec.py, likelihoods.py and slice_sampler.py.
