# Converting FOMC Dataset into the ConvoKit Format

This notebook constructs a ConvoKit-formatted version of the dataset originally distributed with the following paper:

[Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential U.S. economic policy meetings. Chenhao Tan and Lillian Lee. Presented at Text As Data 2016](https://chenhaot.com/papers/de-emphasis-fomc.html).

Please cite this paper when using this corpus in your research.

**Main Contributors:** Johan Michalove, Joy Ming, Austen Mack-Crane

**Conversion Notebook Contributors:** Johan Michalove, Joy Ming, Austen Mack-Crane, Yash Chatha, Sean Zhang

**Original Dataset:** [FOMC](https://chenhaot.com/pages/de-emphasis-fomc.html)

## Installation and Setup

In [None]:
# For Colab
# try:
#     import convokit
# except ModuleNotFoundError:
#     !pip install convokit

In [None]:
import convokit
from tqdm import tqdm
from convokit import Corpus, Speaker, Utterance
from collections import defaultdict

## Downloading the Data

In [None]:
# For Colab

# from google.colab import drive
# drive.mount('/content/drive')




In [None]:
path_to_data = "/content/drive/My Drive/fomc_transcripts.jsonlist"
with open(path_to_data, "r", encoding='utf-8', errors='ignore') as f:
    fomc_transcripts = f.readlines()

In [None]:
import json
# Each line of the .jsonlist file is one JSON record (one meeting).
data = [json.loads(line) for line in fomc_transcripts]
len(data)

268


In [None]:
print(data[0])



In [None]:
speaker_names = set()
# TODO: lowercase names, take out spaces

for meeting in data:
  speeches = meeting['speeches']
  for speech in speeches:
    speaker_names.add(speech['speaker'])

In [None]:
print(speaker_names)

{'M R . COLDWELL.', 'MR. SIMPSON.', 'MR.   MAYO.', 'MR. SANTOMERO.', 'VICE CHAIRMAN MCDONOUGH.', 'MR. SHEETS.', 'MR. JOHNSON.', 'MR. MOSKOW.', 'MS. KOLE.', 'MR. MELZER.', 'MS. MINEHAN AND OTHERS.', 'MR. BEEBE.', 'MS. BIES.', 'MR.   TRUMAN.', "MR. O'CONNELL.", 'MR. BRAYTON.', 'M R . BALLES.', 'MR.   SYRON.', 'MR. GARDNER.', 'GOVERNOR JOHNSON.', 'M R . PARDEE.', 'MR SHEETS.', 'MR. BLACK.', 'MS. YELLEN.', 'MR. ROOS.', 'MR. EVANS.', 'MR. PORTER.', 'MR.   LINDSEY.', 'MR. DAVIS.', 'MR. BOHNE.', 'MS.   GREENE.', 'MR. PARKINSON.', 'MR.   STERNLIGHT.', 'MR. PLOSSER.', 'MR. SLIFMAN.', 'MR.   BLACK.', 'MR.   ALTMANN.', 'MR. DOYLE.', 'CHAIRMAN VOLCKER.', 'MR. MCDONALD.', 'MS. PHILLIPS.', 'MR. ROBINSON.', 'MR. ALTMANN.', 'MR. AHMED.', 'MS. MOSSER.', 'MR. PARRY.', 'MR. KEEHN.', 'MR. HOENIG.', 'VICE CHAIRMAN SOLOMON ET AL.', 'MR. ALTMANN .', 'VICE CHAIRMAN VOLCKER.', 'MR. RUDEBUSCH.', 'MR. MAYO.', 'CHAIRMAN MILLER.', 'M R . AXILROD.', 'MR. MANNION.', 'MR. LEAHY.', 'MR.    RICE.', 'MR. CZERWINSKI.', '
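The set above contains several spellings of what appear to be the same speaker label (for example `'MR. MAYO.'` and `'MR.   MAYO.'`, or `'MR. ALTMANN.'` and `'MR. ALTMANN .'`), which is what the TODO in the previous cell points at. Below is a minimal sketch of one possible normalization. `normalize_speaker_name` is a hypothetical helper, not part of ConvoKit and not applied anywhere in this notebook, and its three rules are assumptions about how the variants differ rather than a verified cleaning procedure.

```python
import re

def normalize_speaker_name(name: str) -> str:
    """Hypothetical helper: collapse spacing variants of a speaker label."""
    # Collapse runs of whitespace and trim both ends.
    name = re.sub(r"\s+", " ", name).strip()
    # Rejoin a split title at the start of the label, e.g. "M R ." -> "MR."
    name = re.sub(r"^M\s?([RS])\s?\.", r"M\1.", name)
    # Remove stray whitespace before a period, e.g. "ALTMANN ." -> "ALTMANN."
    name = re.sub(r"\s+\.", ".", name)
    return name

# Labels taken from the printed set above.
for raw in ['M R . COLDWELL.', 'MR.   MAYO.', 'MR. ALTMANN .', 'MS.   GREENE.']:
    print(repr(raw), '->', repr(normalize_speaker_name(raw)))
```

This sketch leaves labels such as `'MR SHEETS.'` (missing period) and group labels such as `'MS. MINEHAN AND OTHERS.'` untouched; merging those would need additional rules. Applying any normalization before creating speakers would change the speaker IDs and the speaker count reported later.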

## Creating Speakers

In [32]:

corpus_speakers = {
  speaker: Speaker(id=speaker, meta={'is_chair': 'CHAIR' in speaker and 'VICE' not in speaker,
                                     'is_vice_chair': 'VICE' in speaker})
  for speaker in speaker_names
}

In [26]:
print(corpus_speakers)

{'M R . COLDWELL.': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': None, 'id': 'M R . COLDWELL.'}), 'MR. SIMPSON.': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': None, 'id': 'MR. SIMPSON.'}), 'MR.   MAYO.': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': None, 'id': 'MR.   MAYO.'}), 'MR. SANTOMERO.': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': None, 'id': 'MR. SANTOMERO.'}), 'VICE CHAIRMAN MCDONOUGH.': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': True}, 'vectors': [], 'owner': None, 'id': 'VICE CHAIRMAN MCDONOUGH.'}), 'MR. SHEETS.': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': None, 'id': 'MR. SHEETS.'}), 'MR. JOHNSON.': Speaker({'obj_type': 'sp

## Creating Utterance Objects

In [57]:
# Map each utterance ID ("<meeting date>_<speech index>") to its Utterance object.
utterance_corpus = {}

for meeting in data:
  speeches = meeting['speeches']
  for speech in speeches:
    idx = "{}_{}".format(meeting['date'], speech['speech_index'])
    speaker = corpus_speakers[speech['speaker']]
    if int(speech['speech_index']) > 1:
      reply_to = "{}_{}".format(meeting['date'], int(speech['speech_index'])-1)
    else:
      reply_to = None
    # Proxy timestamp: meeting date (YYYYMMDD) followed by a five-digit speech index
    timestamp = int(meeting['date'])*100000 + int(speech['speech_index'])
    utterance_corpus[idx] = Utterance(id=idx, speaker=speaker, 
                                      text=speech['text'], 
                                      reply_to=reply_to,
                                      timestamp=timestamp,
                                      conversation_id=str(meeting['date']))
    
    # utterance_corpus[idx].add_meta('meeting_date', int(meeting['date']))
    utterance_corpus[idx].add_meta('speech_index', int(speech['speech_index']))

In [41]:
utterance_corpus['19770118_1']
utterance_corpus['19770118_2']

Utterance({'obj_type': 'utterance', 'meta': {'speech_index': 2}, 'vectors': [], 'speaker': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd2cede4e50>, 'id': 'MR. WALLICH.'}), 'conversation_id': '19770118', 'reply_to': '19770118_1', 'timestamp': None, 'text': "The purpose of the agreement--which exists in principle, and its main components remain to be finalized in a few details--is to reduce official sterling balances. These have been a disturbing element due to their volatility. The agreement provides that the BIS [Bank for International Settlements] will finance the Bank of England to the extent that these balances are reduced, except by bond funding and to the extent that the British reserves simultaneously go down. The details are to be worked out. If the BIS cannot fully carry through that financing, it has a fallback with respect to the participating central banks. Now, the B
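The ID, `reply_to`, and timestamp scheme used in the loop above can be checked in isolation, without ConvoKit. The sketch below restates that logic as a small function; `utterance_keys` is a hypothetical name introduced here for illustration only.

```python
from typing import Optional, Tuple

def utterance_keys(meeting_date: str, speech_index: int) -> Tuple[str, Optional[str], int]:
    """Hypothetical helper mirroring the ID scheme of the conversion loop."""
    utt_id = f"{meeting_date}_{speech_index}"
    # Every speech after the first replies to its immediate predecessor,
    # so each meeting becomes a single linear reply chain.
    reply_to = f"{meeting_date}_{speech_index - 1}" if speech_index > 1 else None
    # Date (YYYYMMDD) shifted left by five decimal digits, plus the speech index.
    timestamp = int(meeting_date) * 100000 + speech_index
    return utt_id, reply_to, timestamp

print(utterance_keys("19770118", 2))
# ('19770118_2', '19770118_1', 1977011800002)
print(utterance_keys("19771018", 1))
# ('19771018_1', None, 1977101800001)
```

The resulting integers order utterances within a meeting and across meetings, assuming no meeting has 100,000 or more speeches. They are not real clock times.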

## Creating Corpus from List of Utterances

In [59]:
utterance_list = list(utterance_corpus.values())
fomc_corpus = Corpus(utterances=utterance_list)

## Updating Conversation and Corpus-Level Metadata (skipped)

## Processing Utterance Texts

In [43]:
from convokit.text_processing import TextParser

In [64]:
parser = TextParser(verbosity=10000)
fomc_corpus = parser.transform(fomc_corpus)


10000/108504 utterances processed
20000/108504 utterances processed
30000/108504 utterances processed
40000/108504 utterances processed
50000/108504 utterances processed
60000/108504 utterances processed
70000/108504 utterances processed
80000/108504 utterances processed
90000/108504 utterances processed
100000/108504 utterances processed
108504/108504 utterances processed


In [None]:
fomc_corpus.get_utterance('19770118_1').retrieve_meta('parsed')

## Saving Created Datasets

In [65]:
fomc_corpus.dump('fomc-corpus', base_path='/content/drive/My Drive/')

## Running Stats and Checking Corpus Contents

In [48]:
fomc_corpus.print_summary_stats()

Number of Speakers: 364
Number of Utterances: 108504
Number of Conversations: 268


In [49]:
corpus = fomc_corpus

In [50]:
corpus.conversations[next(iter(corpus.conversations))]


Conversation({'obj_type': 'conversation', 'meta': {}, 'vectors': [], 'tree': None, 'owner': <convokit.model.corpus.Corpus object at 0x7fd2cede4e50>, 'id': '19770118'})

In [51]:
corpus.utterances[next(iter(corpus.utterances))]


Utterance({'obj_type': 'utterance', 'meta': {'speech_index': 1, 'parsed': [{'rt': 6, 'toks': [{'tok': 'Gentlemen', 'tag': 'NNS', 'dep': 'npadvmod', 'up': 6, 'dn': []}, {'tok': ',', 'tag': ',', 'dep': 'punct', 'up': 6, 'dn': []}, {'tok': 'this', 'tag': 'DT', 'dep': 'det', 'up': 3, 'dn': []}, {'tok': 'meeting', 'tag': 'NN', 'dep': 'nsubj', 'up': 6, 'dn': [2]}, {'tok': 'will', 'tag': 'MD', 'dep': 'aux', 'up': 6, 'dn': []}, {'tok': 'now', 'tag': 'RB', 'dep': 'advmod', 'up': 6, 'dn': []}, {'tok': 'come', 'tag': 'VB', 'dep': 'ROOT', 'dn': [0, 1, 3, 4, 5, 8, 11]}, {'tok': 'to', 'tag': 'TO', 'dep': 'aux', 'up': 8, 'dn': []}, {'tok': 'order', 'tag': 'VB', 'dep': 'advcl', 'up': 6, 'dn': [7, 9, 10]}, {'tok': ',', 'tag': ',', 'dep': 'punct', 'up': 8, 'dn': []}, {'tok': 'please', 'tag': 'UH', 'dep': 'intj', 'up': 8, 'dn': []}, {'tok': '.', 'tag': '.', 'dep': 'punct', 'up': 6, 'dn': []}]}, {'rt': 1, 'toks': [{'tok': 'There', 'tag': 'EX', 'dep': 'expl', 'up': 1, 'dn': []}, {'tok': 'are', 'tag': 'VBP'
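The `parsed` metadata in the output above is a list of sentences. Each sentence has an `rt` field and a `toks` list whose entries carry `tok`, `tag`, `dep`, `up`, and `dn`. A minimal sketch of reading that structure follows, assuming that `rt` is the index of the root token, `up` the index of a token's parent, and `dn` the indices of its children. The sample sentence is copied from the output above; `root_token` and `children` are hypothetical helpers, not ConvoKit functions.

```python
# First sentence of the 'parsed' output shown above, copied verbatim.
sentence = {'rt': 6, 'toks': [
    {'tok': 'Gentlemen', 'tag': 'NNS', 'dep': 'npadvmod', 'up': 6, 'dn': []},
    {'tok': ',', 'tag': ',', 'dep': 'punct', 'up': 6, 'dn': []},
    {'tok': 'this', 'tag': 'DT', 'dep': 'det', 'up': 3, 'dn': []},
    {'tok': 'meeting', 'tag': 'NN', 'dep': 'nsubj', 'up': 6, 'dn': [2]},
    {'tok': 'will', 'tag': 'MD', 'dep': 'aux', 'up': 6, 'dn': []},
    {'tok': 'now', 'tag': 'RB', 'dep': 'advmod', 'up': 6, 'dn': []},
    {'tok': 'come', 'tag': 'VB', 'dep': 'ROOT', 'dn': [0, 1, 3, 4, 5, 8, 11]},
    {'tok': 'to', 'tag': 'TO', 'dep': 'aux', 'up': 8, 'dn': []},
    {'tok': 'order', 'tag': 'VB', 'dep': 'advcl', 'up': 6, 'dn': [7, 9, 10]},
    {'tok': ',', 'tag': ',', 'dep': 'punct', 'up': 8, 'dn': []},
    {'tok': 'please', 'tag': 'UH', 'dep': 'intj', 'up': 8, 'dn': []},
    {'tok': '.', 'tag': '.', 'dep': 'punct', 'up': 6, 'dn': []},
]}

def root_token(sent):
    """Return the text of the sentence's root token (assumes 'rt' is its index)."""
    return sent['toks'][sent['rt']]['tok']

def children(sent, index):
    """Return the texts of the tokens listed in 'dn' for the token at `index`."""
    return [sent['toks'][i]['tok'] for i in sent['toks'][index]['dn']]

print(root_token(sentence))    # come
print(children(sentence, 8))   # ['to', ',', 'please']
```

Note that the root token has no `up` key in the output above, so code that walks upward should use `tok.get('up')` rather than `tok['up']`.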

In [52]:
next(corpus.iter_speakers()).meta

{'is_chair': True, 'is_vice_chair': False}


In [61]:
fomc_corpus.get_conversation('19771018').get_chronological_utterance_list()[:10]

[Utterance({'obj_type': 'utterance', 'meta': {'speech_index': 1}, 'vectors': [], 'speaker': Speaker({'obj_type': 'speaker', 'meta': {'is_chair': True, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': 'CHAIRMAN BURNS.'}), 'conversation_id': '19771018', 'reply_to': None, 'timestamp': 1977101800001, 'text': "Mr. Gardner is absent today. As far as possible, I would like to have the full Federal Reserve family present at a meeting where quasi-final decisions with regard to monetary policy are made. And in view of that, we can get through as much business as we can this afternoon and stop short of trying to reach any decision of monetary policy; and we have a great deal of work to do. We'll start as we always do, with the minutes of the last meeting, and I take it there is no problem.", 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': '19771018_1'}),
 Utterance({'obj_type': 'utterance', 'meta': {'speech_ind

In [60]:
fomc_corpus.get_conversation('19771018').get_chronological_speaker_list()[:10]


[Speaker({'obj_type': 'speaker', 'meta': {'is_chair': True, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': 'CHAIRMAN BURNS.'}),
 Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': 'MR. COLDWELL.'}),
 Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': 'MR. MAYO.'}),
 Speaker({'obj_type': 'speaker', 'meta': {'is_chair': True, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': 'CHAIRMAN BURNS.'}),
 Speaker({'obj_type': 'speaker', 'meta': {'is_chair': False, 'is_vice_chair': False}, 'vectors': [], 'owner': <convokit.model.corpus.Corpus object at 0x7fd17532f850>, 'id': 'MR. KICHLINE.'}),
 Speaker({'obj_type': 'speaker', 'meta': {