# Question Typology module

This notebook demonstrates using the QuestionTypology module, implemented from the paper [Asking Too Much? The Rhetorical Role of Questions in Political Discourse](http://www.cs.cornell.edu/~cristian/Asking_too_much.html).

Note that QuestionTypology can be broken down into two parts:
* Feature extraction (i.e., extracting question motifs; Section 4 of the paper referenced above)
* Constructing latent question representations and clustering them into types (Section 5)

At present, the code runs end-to-end; a future release of ConvoKit will decouple these two steps for greater flexibility.
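To give a rough feel for the first part (motif extraction), here is a minimal sketch that treats each question as a set of dependency arcs and keeps the arc combinations that recur across questions. This is a simplified stand-in, not the module's actual implementation: the toy `questions` data, the `frequent_arc_sets` helper, and the `min_support` and `max_size` parameters are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each question is reduced to a set of dependency arcs.
questions = [
    {"be_*", "be_will", "will>*"},
    {"be_*", "be_will"},
    {"be_*", "be_what"},
    {"expect_*", "when>*"},
    {"expect_*"},
]


def frequent_arc_sets(arc_sets, min_support=2, max_size=3):
    """Count every arc combination up to max_size and keep the frequent ones."""
    counts = Counter()
    for arcs in arc_sets:
        for size in range(1, max_size + 1):
            # Sort the arcs so the same combination always yields the same tuple.
            for combo in combinations(sorted(arcs), size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


motifs = frequent_arc_sets(questions)
for motif, support in sorted(motifs.items()):
    print(motif, support)
```

In this toy setting, an arc set that appears in at least `min_support` questions plays the role of a motif; the real pipeline works on parsed questions and is considerably more involved.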

In [1]:
import os
import pkg_resources
import numpy as np
from convokit import Corpus, QuestionTypology, download

First, we load the corpus. As in the paper, we will run the module on the parliamentary questions dataset, which can either be downloaded or read from an existing folder:

In [48]:
# corpus = Corpus(filename='name-of-local-folder')

(To load from an existing folder, replace the cell below with the commented-out cell above.)

In [40]:
corpus = Corpus(filename=download("parliament-corpus"))

In [41]:
len(corpus.conversations)

216894

We now train a QuestionTypology object on the parliament corpus. As in the paper, we extract 8 clusters, or types of questions. (Note that this may take a while.)

In [42]:
questionTypology = QuestionTypology(num_dims=25, 
                                    num_clusters=8, 
                                    verbose=10000, 
                                    random_seed=125)

In [43]:
corpus = questionTypology.fit_transform(corpus)

running motif extraction pipeline
loading spacy vocab
loading spacy vocab
getting question arcs
	10000
	20000
	30000
	40000
	50000
	60000
	70000
	80000
	90000
	100000
	110000
	120000
	130000
	140000
	150000
	160000
	170000
	180000
	190000
	200000
making motif tree
	counting itemsets
	first pass
	10000
	20000
	30000
	40000
	50000
	60000
	70000
	80000
	90000
	100000
	110000
	120000
	130000
	140000
	150000
	160000
	170000
	180000
	190000
	200000
	210000
	220000
	230000
	240000
	250000
	260000
	270000
	280000
	290000
	300000
	310000
	320000
	and then the rest
	 6 60984
	10000
	20000
	30000
	40000
	50000
	60000
	 7 16116
	10000
	 8 2311
	 9 136
	 10 26
	 11 5
	writing itemsets
	building tree
fitting motifs to questions
	fitting arcsets
	10000
	20000
	30000
	40000
	50000
	60000
	70000
	80000
	90000
	100000
	110000
	120000
	130000
	140000
	150000
	160000
	170000
	180000
	190000
	200000
	210000
	220000
	230000
	240000
	250000
	260000
	270000
	280000
	290000
	300000
	310000
	320000
handling red

	1030000
	1040000
	1050000
	1060000
	1070000
	1080000
	1090000
	1100000
	1110000
	1120000
	1130000
	1140000
	1150000
	1160000
	1170000
	1180000
	1190000
	1200000
	1210000
	1220000
	1230000
	1240000
	1250000
	1260000
	1270000
	1280000
	1290000
	1300000
	1310000
	1320000
	1330000
	1340000
	1350000
	1360000
	1370000
	1380000
	1390000
	1400000
	1410000
	1420000
	1430000
	1440000
	1450000
	1460000
	1470000
	1480000
	1490000
	1500000
	1510000
	1520000
	1530000
	1540000
	1550000
	1560000
	1570000
	1580000
	1590000
	1600000
	1610000
	1620000
	1630000
	1640000
	1650000
	1660000
	1670000
	1680000
	1690000
	1700000
	1710000
	1720000
	1730000
	1740000
	1750000
	1760000
	1770000
	1780000
	1790000
	1800000
	1810000
	1820000
	1830000
	1840000
	1850000
	1860000
	1870000
	1880000
	1890000
	1900000
	1910000
	1920000
	1930000
	1940000
	1950000
	1960000
	1970000
	1980000
	1990000
	2000000
	2010000
	2020000
	2030000
	2040000
	2050000
	2060000
	2070000
	2080000
	2090000
	2100000
	2110000
	2120000
	2130000
	

Once training is complete, we can inspect the output of the typology -- which question motifs, answer fragments, and questions are assigned to each type?

In [44]:
questionTypology.display_totals()

Total Motifs: 2478
Total Questions: 199079
Total Fragments: 2860
Number of Motifs in each cluster:  [281, 389, 324, 258, 246, 283, 251, 446]
Number of Questions of each type:  [14816, 38999, 39499, 14700, 15591, 21584, 28893, 24997]
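As a quick sanity check on these totals, the per-type counts can be summed and turned into shares. The sketch below copies the two lists from the output above into plain Python lists (the variable names are ours, not part of the module) and does not require a trained model.

```python
# Per-type counts copied from the display_totals() output above.
motifs_per_type = [281, 389, 324, 258, 246, 283, 251, 446]
questions_per_type = [14816, 38999, 39499, 14700, 15591, 21584, 28893, 24997]

total_motifs = sum(motifs_per_type)
total_questions = sum(questions_per_type)

# Index of the type with the most questions assigned to it.
largest_type = max(range(len(questions_per_type)), key=questions_per_type.__getitem__)

for qtype, (n_motifs, n_questions) in enumerate(zip(motifs_per_type, questions_per_type)):
    share = 100 * n_questions / total_questions
    print(f"type {qtype}: {n_motifs:4d} motifs, {n_questions:6d} questions ({share:.1f}%)")
```

Both sums agree with the reported totals (2478 motifs and 199079 questions), and the types are clearly unbalanced: types 1 and 2 each account for roughly a fifth of all questions.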


In [45]:
print('10 examples for types 1-8:')
for i in range(8):
    questionTypology.display_motifs_for_type(i, num_egs=10)
    questionTypology.display_answer_fragments_for_type(i, num_egs=10)
    questionTypology.display_question_answer_pairs_for_type(corpus, i, num_egs=10)

10 examples for types 1-8:
	10 sample question motifs for type 0 (281 total motifs):
		1. ('be_*', 'be_will')
		2. ('be_*', 'be_will', 'will>*')
		3. ('be_*', 'will>*')
		4. ('expect_*',)
		5. ('give_*',)
		6. ('say_*', 'say_be')
		7. ('tell_*', 'tell_be')
		8. ('prepared_*',)
		9. ('expect_*', 'when>*')
		10. ('be_*', 'be_what')
	10 sample answer fragments for type 0 (360 total fragments) :
		1. expect_do
		2. be>*
		3. envisage_*
		4. asked_has
		5. have_wait
		6. expect_*
		7. examining_is
		8. apply_will
		9. until>*
		10. be_subject
	10 sample question-answer pairs that were assigned type 0 (14959 total questions with this type) :
		Question 1. I compliment my hon Friend on the pressure that she has exerted to ensure that the M25 is completed , but can she say whether any studies have been made on the introduction of flexibility and extra lanes , if required , as the traffic build - up , especially through my constituency , already suggests that that motorway will be heavily over 

	10 sample question-answer pairs that were assigned type 1 (44262 total questions with this type) :
		Question 1. As a significant proportion of all those who are convicted of terrorism are subsequently released only to commit fresh terrorist offences . , is it not common sense , as well as a means of discouraging violence in Northern Ireland , for them to be kept in prison for the duration of the emergency ?
		Answer 1. I see no justification for saying that people should be sentenced and that their release should be dependent on the general security position in the Province or on a political decision of this House rather than be linked to the offence that they have committed .
		Question 2. Notwithstanding his answer to the hon Member for Linlithgow ( Mr. Dalyell ) , does the Minister accept that a massive amount of incapacity benefit is not taken up ? Would he accept that there is therefore no reason to go ahead with the planned , but much opposed , cuts in the benefit ?
		Answer 2.

	10 sample question-answer pairs that were assigned type 2 (39581 total questions with this type) :
		Question 1. In 2007 funds were awarded under capital expenditure grants—the Bellwin formula—to Hull and Gloucestershire . Will similar moneys be awarded to repair bridges and roads that were severely damaged in the September floods in North Yorkshire ?
		Answer 1. My right hon Friend the Secretary of State for Environment , Food and Rural Affairs made a statement dealing with the Bellwin formula and some of the flooding . I will look at the suggestion my hon Friend has made .
		Question 2. My right hon and learned Friend will be aware that I am not a member of such an organisation and never have been . Will he also recognise that the important thing about structures of any type is that they must be sound , they must be based on what can be delivered and they must last ? In the present circumstances , it looks unlikely that any dates will ever be met .
		Answer 2. I share my hon Friend 

	10 sample question-answer pairs that were assigned type 3 (14795 total questions with this type) :
		Question 1. What actual progress has been made with the top three projects recommended by the northern electrification taskforce , which was chaired by the Under - Secretary of State for Transport , the hon Member for Harrogate and Knaresborough ( Andrew Jones ) ?
		Answer 1. The report was a cross - party report from the taskforce , which was chaired by my hon Friend . Much has obviously been learned about electrification since then , but the report forms part of the foundation for deciding how we will move forward with further electrification and how we will prioritise those particular schemes .
		Question 2. Given the state of the defence budget , the fact that we are fighting a war and the possible danger of duplication by investing sums of money in European alternatives to NATO defence structures , what possible justification can there be for spending any significant sums at all o

	10 sample question-answer pairs that were assigned type 5 (21017 total questions with this type) :
		Question 1. May I compliment the Government on their firm stand over the MFA negotiations ? Will my hon Friend ensure that pressure is kept up throughout the GATT negotiations , and bear in mind how important the textile industry is both to Scotland and to those employed in it ?
		Answer 1. I am grateful to my hon Friend for his kind remarks . Indeed , I know that he recently met the Minister for Trade . We are all aware of the textile industry 's importance to the Scottish economy .
		Question 2. May I support the Minister 's efforts to reduce delayed discharges as a way of reducing cancellations ? I draw her attention to the single most effective measure that is being taken in Milton Keynes to deal with delayed discharges , which is a single social care assessment protocol agreed by social services and the NHS. May I urge her to try to make sure that a national social care assessment

	10 sample question-answer pairs that were assigned type 6 (26246 total questions with this type) :
		Question 1. I thank the Minister for that reply . Does she agree with her colleague the Minister for Children and Families that all parents should have the right to request flexible working ? Indeed , given that there are so many reasons other than caring responsibilities for people wanting to manage their work - life balance differently , does she agree that there would be benefits for all of society if the right were extended to everyone ?
		Answer 1. My right hon Friend the Minister for Children and Families was giving her personal view on how we might build on successful policies to try to make work much more flexible for millions of people . For example , the right to request flexible working has led to 47 per cent . of new mothers working flexi - time , compared with just 17 per cent . in 2002—a massive change . From this April , as the hon Lady is , I am sure , well aware , we a

	10 sample question-answer pairs that were assigned type 7 (26765 total questions with this type) :
		Question 1. One of my constituents who works 16 hours a week and is a carer for a disabled relative has discovered that because of the living wage she no longer qualifies for carer ’s allowance , leaving her with a substantial shortfall . Why on earth have this Government forced her and thousands of others into this desperate situation ?
		Answer 1. We as a Government spend £ 2.3 billion a year in supporting the invaluable work that carers do in this country . The impact of the national living wage will always be reviewed .
		Question 2. Increasingly in London , young people are finding it impossible to afford to rent or buy a home , so why , under this Government , are we seeing the lowest number of housing starts since the 1920s and a housing bubble driven by wealthy overseas buyers ?
		Answer 2. On the last point , it is this Government who are introducing capital gains tax for over

We see that each utterance is now annotated with a type assignment (`qtype`) and with its distance to the centroid of each type's cluster (`qtype_dists`):

In [67]:
utterance = corpus.utterances['2007-02-22c.404.6']
print(utterance.text)
print(utterance.meta['qtype'])
print(utterance.meta['qtype_dists'])

I thank the Minister for that reply . Does she agree with her colleague the Minister for Children and Families that all parents should have the right to request flexible working ? Indeed , given that there are so many reasons other than caring responsibilities for people wanting to manage their work - life balance differently , does she agree that there would be benefits for all of society if the right were extended to everyone ?
6
[1.26085632 0.86030356 1.03717211 1.24421135 1.30523081 1.05682812
 0.33531053 1.23572878]
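One way to read these two fields together is to assume that the assigned type is the cluster whose centroid is nearest. That is consistent with this example, though it is an assumption rather than something the output itself confirms. The sketch below copies the values printed above into plain Python and also computes the margin to the second-nearest centroid as a rough indication of how clear-cut the assignment is.

```python
# Values copied from the output above for utterance '2007-02-22c.404.6'.
qtype = 6
qtype_dists = [1.26085632, 0.86030356, 1.03717211, 1.24421135,
               1.30523081, 1.05682812, 0.33531053, 1.23572878]

# Rank the type indices from nearest to farthest centroid.
ranked = sorted(range(len(qtype_dists)), key=qtype_dists.__getitem__)
nearest, runner_up = ranked[0], ranked[1]

# Gap between the best and second-best centroid distance.
margin = qtype_dists[runner_up] - qtype_dists[nearest]

print(f"nearest centroid: type {nearest} (assigned type: {qtype})")
print(f"runner-up: type {runner_up}, margin: {margin:.3f}")
```

Here the nearest centroid is type 6, matching `qtype`, and the runner-up (type 1) is more than twice as far away.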


By uncommenting the following cell, we can write these utterances, now annotated with the question type features, to disk.

In [69]:
# corpus.dump('name-of-local-folder')