Bayesian Echo Chamber
A new Bayesian generative model for social interaction data that uncovers influence relations from time-stamped conversation data.
Please refer to
Fangjian Guo, Charles Blundell, Hanna Wallach and Katherine A. Heller. The Bayesian Echo Chamber: modeling social influence via linguistic accommodation. AISTATS 2015, San Diego, CA, USA. JMLR: W&CP volume 38.
for details of the model.
├── data                            a collection of datasets
├── results                         results are produced here
│   ├── 12-angry-men-analytics.Rmd  generating report from result
│   └── Makefile                    compile an html report from R markdown
├── src
│   ├── bec.py                      main "Bayesian echo chamber" class
│   ├── bec_sampler.py              a wrapper of the sampler of bec
│   ├── hawkes.py                   an implementation of Hawkes process
│   ├── likelihoods.py              several likelihoods
│   ├── run_bec_12angrymen.py       a demo script producing result for data/12-angry-men
│   ├── slice_sampler.py            slice sampler
│   └── talkbankXMLparse.py         parser for talkbank xml format
└── stopwords
    └── english.stop                list of stop words in English
Running python run_bec_12angrymen.py (found in src/) produces samples and other auxiliary files under results/12-angry-men/. The script can serve as a starting point for custom scripts targeting other datasets and configurations.
Running make under results/ produces an html report compiled from the R Markdown file 12-angry-men-analytics.Rmd. The Rmd file can be customized to analyze other datasets.
- Python modules (tested under Python 2.7)
- numpy, scipy
- nltk for word stemming in
- R libraries for generating report
The conversation data is read from the TalkBank XML format. A conversation consists of several utterances, each described by four fields: speaker, content, start time and end time. A single utterance looks like this:
<u who="Juror 7" uID="#7">
  <w>So</w>
  <w>how</w>
  <w>come</w>
  <w>you</w>
  <w>vote</w>
  <w>not</w>
  <w>guilty</w>
  <media start="47.4640" end="49.3820" unit="s"/>
</u>
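To make the four fields concrete, here is a minimal sketch of reading one such utterance with Python's standard xml.etree.ElementTree module. It is not the code in talkbankXMLparse.py. The function name parse_utterance is hypothetical, and the sketch assumes the elements carry no XML namespace, as in the sample utterance; full TalkBank files may need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Sample utterance copied from the README snippet.
SAMPLE = """
<u who="Juror 7" uID="#7">
  <w>So</w> <w>how</w> <w>come</w> <w>you</w>
  <w>vote</w> <w>not</w> <w>guilty</w>
  <media start="47.4640" end="49.3820" unit="s"/>
</u>
"""


def parse_utterance(u):
    """Return (speaker, words, start, end) for one <u> element.

    Hypothetical helper for illustration; assumes no XML namespace.
    """
    speaker = u.get("who")
    # Content: the text of each <w> child, in document order.
    words = [w.text for w in u.findall("w") if w.text]
    media = u.find("media")
    # Assumption: an utterance without <media> has no timing information.
    start = float(media.get("start")) if media is not None else None
    end = float(media.get("end")) if media is not None else None
    return speaker, words, start, end


utterance = parse_utterance(ET.fromstring(SAMPLE.strip()))
print(utterance)
```

Each utterance becomes a plain tuple, so a whole conversation can be held as a list of such tuples sorted by start time.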
Currently, we have prepared the following datasets under
- 12 Angry Men: transcribed from the subtitles of the 1957 movie.
- SCOTUS: oral arguments from 50 years of the United States Supreme Court, obtained from TalkBank.
- synthetic: a synthetic example with 3 agents speaking with a vocabulary of 20, with time stamps generated from a Hawkes process and contents generated from the BEC model.
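
For readers unfamiliar with how time stamps can be drawn from a Hawkes process, the following is a rough sketch using Ogata's thinning algorithm for three agents. It is not the implementation in hawkes.py and does not reproduce the synthetic dataset. The exponential kernel, the function name simulate_hawkes and all parameter values are assumptions chosen for illustration, and generating utterance contents from the BEC model is not covered.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a multivariate Hawkes process on [0, horizon) by thinning.

    Assumed intensity of agent p (illustrative, not taken from the repo):
        lambda_p(t) = mu[p] + sum over past events (t_i, q) of
                      alpha[q][p] * exp(-beta * (t - t_i))
    Returns a time-ordered list of (time, agent) pairs.
    """
    rng = random.Random(seed)
    n = len(mu)
    excite = [0.0] * n  # excitation part of each agent's intensity at time t
    t = 0.0
    events = []
    while True:
        # Intensities only decay between events, so the current total
        # is a valid upper bound until the next accepted event.
        bound = sum(mu) + sum(excite)
        wait = rng.expovariate(bound)
        t += wait
        if t >= horizon:
            break
        decay = math.exp(-beta * wait)
        excite = [e * decay for e in excite]
        rates = [mu[p] + excite[p] for p in range(n)]
        # Thinning: accept the candidate with probability sum(rates) / bound,
        # and reuse the same uniform draw to pick the speaking agent.
        u = rng.random() * bound
        if u < sum(rates):
            acc = 0.0
            for p in range(n):
                acc += rates[p]
                if u < acc:
                    break
            events.append((t, p))
            # The event by agent p excites every agent r by alpha[p][r].
            for r in range(n):
                excite[r] += alpha[p][r]
    return events


# Illustrative parameters for 3 agents (assumed values).
mu = [0.5, 0.3, 0.2]
alpha = [[0.0, 0.4, 0.1],
         [0.3, 0.0, 0.2],
         [0.2, 0.2, 0.0]]
events = simulate_hawkes(mu, alpha, beta=1.0, horizon=100.0, seed=1)
print(len(events), events[:3])
```

Here alpha[q][p] plays the role of an influence weight from agent q to agent p: the larger it is, the more likely p is to speak shortly after q does.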