# Linguistic Style in Political Discourse: A Quantitative Analysis of Part-of-Speech and Formality Distributions in Joe Biden's 2020 Speeches

This study explores how part-of-speech (POS) distributions and formality scores differentiate Joe Biden's formal and informal speeches. It applies Heylighen and Dewaele's (1999) F-score to two of his 2020 speeches: the Democratic Convention speech (formal) and the Thanksgiving Day speech (informal). Formality is measured from the relative frequencies of POS categories in each speech, following the expectation that "the more formal the language excerpt, the higher the value of F" (Heylighen & Dewaele, 1999, p. 13). The analysis aims to show how a speaker's choice of word classes shifts with the formality of the context. Biden's Democratic Convention speech scored higher (F ≈ 57.19) than his Thanksgiving speech (F ≈ 54.96). This difference suggests that Biden adapts his language style to the formality of each occasion.

### Joe Biden's Formal Speech - Democratic Convention Speech

The dataset's source: https://www.kaggle.com/datasets/christianlillelund/2020-democratic-convention-speeches

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Read the data (the with block closes the file automatically)

with open('Joe_Biden.txt', 'r', encoding='utf-8') as dataIn:
    Joe_Biden_Speech = dataIn.read()

In [3]:
# Process our string
doc = nlp(Joe_Biden_Speech)

In [4]:
# Let's tokenize the formal speech
Joe_Biden_Speech_TextTokens = [token.text for token in doc]

In [5]:
# Here we have a list of tokens that are alphabetic
Joe_Biden_Speech_TextTokensAlpha = [token.text for token in doc if token.text.isalpha()]

# Here we have a list of PoSes of the tokens that are alphabetic 
# We use token.text.isalpha(), NOT token.pos_.isalpha() to filter
Joe_Biden_Speech_TextPoSesAlpha_Tags = [token.pos_ for token in doc if token.text.isalpha()]
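The `token.text.isalpha()` filter keeps only tokens made up entirely of letters. A minimal sketch, using a hand-written token list for illustration (not actual spaCy output), shows what the filter drops: contraction pieces such as `n't` and `'re`, punctuation, digit strings, and hyphenated forms. One consequence is that the NUM counts reported next can only include spelled-out numbers such as "one".

```python
# Hand-written tokens for illustration only; these are NOT real spaCy output.
sample_tokens = ["We", "do", "n't", "'re", "2020", "America", ",", "COVID-19", "one"]

# Same test as the filter above: keep a token only if every character is a letter.
kept = [tok for tok in sample_tokens if tok.isalpha()]
dropped = [tok for tok in sample_tokens if not tok.isalpha()]

print("kept:   ", kept)
print("dropped:", dropped)
```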

In [6]:
# Count how often each part of speech occurs among the alphabetic tokens
# Note: FreqDict = frequency dictionary
# Note: PoS = part of speech

from collections import Counter

Joe_Biden_Speech_FreqDictPoSAlpha = Counter(Joe_Biden_Speech_TextPoSesAlpha_Tags)
print(Joe_Biden_Speech_FreqDictPoSAlpha)

Counter({'NOUN': 615, 'PRON': 465, 'VERB': 402, 'ADP': 348, 'DET': 290, 'ADJ': 238, 'AUX': 181, 'ADV': 170, 'CCONJ': 143, 'PROPN': 111, 'PART': 83, 'SCONJ': 52, 'NUM': 29, 'INTJ': 4})


In [7]:
# Let's use pprint to format the output nicely

import pprint

Joe_Biden_Speech_FreqDictPoSAlpha = Counter(Joe_Biden_Speech_TextPoSesAlpha_Tags)
pprint.pprint(Joe_Biden_Speech_FreqDictPoSAlpha)

Counter({'NOUN': 615,
         'PRON': 465,
         'VERB': 402,
         'ADP': 348,
         'DET': 290,
         'ADJ': 238,
         'AUX': 181,
         'ADV': 170,
         'CCONJ': 143,
         'PROPN': 111,
         'PART': 83,
         'SCONJ': 52,
         'NUM': 29,
         'INTJ': 4})


From the output above, the frequencies of the different word types are as follows:  

NOUN: 615,  
PRON: 465,  
VERB: 402,  
ADP: 348,  
DET: 290,  
ADJ: 238,  
AUX: 181,  
ADV: 170,  
CCONJ: 143,  
PROPN: 111,  
PART: 83,  
SCONJ: 52,  
NUM: 29,  
INTJ: 4

These tags come from the coarse-grained [Universal POS tag set](https://universaldependencies.org/u/pos/), which spaCy reports through `token.pos_`:

    ADJ: adjective
    ADP: adposition
    ADV: adverb
    AUX: auxiliary
    CCONJ: coordinating conjunction
    DET: determiner
    INTJ: interjection
    NOUN: noun
    NUM: numeral
    PART: particle
    PRON: pronoun
    PROPN: proper noun
    PUNCT: punctuation
    SCONJ: subordinating conjunction
    SYM: symbol
    VERB: verb
    X: other
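Heylighen and Dewaele's formula (quoted below) uses only eight of these tags. A minimal sketch of how the Democratic Convention counts split into their formal and deictic groups, assuming ADP stands in for prepositions and DET for articles (the same mapping the notebook uses later); every other tag counts toward the total but toward neither group. The set names `FORMAL_TAGS` and `DEICTIC_TAGS` are illustrative.

```python
from collections import Counter

# Assumed mapping of Heylighen & Dewaele's categories onto UPOS tags:
# ADP approximates "prepositions" and DET approximates "articles".
FORMAL_TAGS = {"NOUN", "ADJ", "ADP", "DET"}
DEICTIC_TAGS = {"PRON", "VERB", "ADV", "INTJ"}

# Counts copied from the Democratic Convention output above.
counts = Counter({'NOUN': 615, 'PRON': 465, 'VERB': 402, 'ADP': 348, 'DET': 290,
                  'ADJ': 238, 'AUX': 181, 'ADV': 170, 'CCONJ': 143, 'PROPN': 111,
                  'PART': 83, 'SCONJ': 52, 'NUM': 29, 'INTJ': 4})

total = sum(counts.values())
formal = sum(counts[tag] for tag in FORMAL_TAGS)
deictic = sum(counts[tag] for tag in DEICTIC_TAGS)
other = total - formal - deictic  # e.g. AUX, PROPN, CCONJ, PART, SCONJ, NUM

print(f"formal:  {formal} ({formal / total:.1%})")
print(f"deictic: {deictic} ({deictic / total:.1%})")
print(f"other:   {other} ({other / total:.1%})")
```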



In [8]:
Joe_Biden_Speech_totalPoSCount = sum(Joe_Biden_Speech_FreqDictPoSAlpha.values())
print(Joe_Biden_Speech_totalPoSCount)

3131


Heylighen and Dewaele (1999, p. 13) state:  
"In conclusion, the formal, non-deictic category of words, whose frequency is expected to increase with the formality of a text, includes the nouns, adjectives, prepositions and articles. The deictic category, whose frequency is expected to decrease with increasing formality of speech-styles, consists of the pronouns, verbs, adverbs, and interjections. The remaining category of conjunctions has no a priori correlation with formality. If we add up the frequencies of the formal categories, subtract the frequencies of the deictic categories and normalize to 100, we get a measure which will always increase with an increase of formality. This leads us to the following simple formula:

$$F = \frac{\text{noun frequency} + \text{adjective freq.} + \text{preposition freq.} + \text{article freq.} - \text{pronoun freq.} - \text{verb freq.} - \text{adverb freq.} - \text{interjection freq.} + 100}{2}$$

The frequencies are here expressed as percentages of the number of words belonging to a particular category with respect to the total number of words in the excerpt. F will then vary between 0 and 100% (but obviously never reach these limits). The more formal the language excerpt, the higher the value of F is expected to be."
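The formula can be wrapped in a single reusable function so that both speeches are scored the same way. A minimal sketch, assuming the input is a `Counter` of UPOS tags like the ones printed in this notebook, with ADP standing in for prepositions and DET for articles; `formality_score` is an illustrative name, not part of the original notebook.

```python
from collections import Counter


def formality_score(counts: Counter) -> float:
    """Heylighen & Dewaele's F-score from a Counter of UPOS tag frequencies.

    Assumes ADP approximates prepositions and DET approximates articles;
    percentages are taken over all counted (alphabetic) tokens.
    """
    total = sum(counts.values())

    def pct(tag: str) -> float:
        # Relative frequency of one tag, as a percentage of all counted tokens.
        # A Counter returns 0 for tags that never occur.
        return counts[tag] / total * 100

    formal = pct("NOUN") + pct("ADJ") + pct("ADP") + pct("DET")
    deictic = pct("PRON") + pct("VERB") + pct("ADV") + pct("INTJ")
    return (formal - deictic + 100) / 2


# Usage with the Democratic Convention counts printed above.
democratic = Counter({'NOUN': 615, 'PRON': 465, 'VERB': 402, 'ADP': 348, 'DET': 290,
                      'ADJ': 238, 'AUX': 181, 'ADV': 170, 'CCONJ': 143, 'PROPN': 111,
                      'PART': 83, 'SCONJ': 52, 'NUM': 29, 'INTJ': 4})
print(round(formality_score(democratic), 2))
```

Summing the formal and deictic percentages separately is algebraically the same as the term-by-term formula; only the order of floating-point additions differs.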

In [9]:
# To compute the formality score, get the relative frequency (as a percentage of all alphabetic tokens) of each of the eight parts of speech in the formula

# noun: NOUN
RelFreqNounPerc = Joe_Biden_Speech_FreqDictPoSAlpha['NOUN']/Joe_Biden_Speech_totalPoSCount*100
print('Noun: ', RelFreqNounPerc)

# adjective: ADJ
RelFreqAdjPerc = Joe_Biden_Speech_FreqDictPoSAlpha['ADJ']/Joe_Biden_Speech_totalPoSCount*100
print('Adjective: ',RelFreqAdjPerc)

# preposition: ADP (UPOS "adposition", used here for prepositions)
RelFreqAdpPerc = Joe_Biden_Speech_FreqDictPoSAlpha['ADP']/Joe_Biden_Speech_totalPoSCount*100
print('Preposition: ',RelFreqAdpPerc)

# article: DET (UPOS "determiner"; includes articles but also other determiners such as demonstratives)
RelFreqDetPerc = Joe_Biden_Speech_FreqDictPoSAlpha['DET']/Joe_Biden_Speech_totalPoSCount*100
print('Article: ',RelFreqDetPerc)

# pronoun: PRON
RelFreqPronPerc = Joe_Biden_Speech_FreqDictPoSAlpha['PRON']/Joe_Biden_Speech_totalPoSCount*100
print('Pronoun: ', RelFreqPronPerc)

# verb: VERB
RelFreqVerbPerc = Joe_Biden_Speech_FreqDictPoSAlpha['VERB']/Joe_Biden_Speech_totalPoSCount*100
print('Verb: ',RelFreqVerbPerc)

# adverb: ADV
RelFreqAdvPerc = Joe_Biden_Speech_FreqDictPoSAlpha['ADV']/Joe_Biden_Speech_totalPoSCount*100
print('Adverb: ',RelFreqAdvPerc)

# interjection: INTJ
RelFreqIntjPerc = Joe_Biden_Speech_FreqDictPoSAlpha['INTJ']/Joe_Biden_Speech_totalPoSCount*100
print('Interjection: ',RelFreqIntjPerc)

Noun:  19.642286809326094
Adjective:  7.601405301820504
Preposition:  11.114659853082081
Article:  9.26221654423507
Pronoun:  14.85148514851485
Verb:  12.83934845097413
Adverb:  5.429575215586075
Interjection:  0.12775471095496646
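The eight near-identical statements in the cell above can be collapsed into one dictionary comprehension. A minimal sketch, reusing the counts printed earlier; the `CATEGORY_TAGS` mapping and `rel_freq_perc` dictionary are illustrative names, not part of the original notebook.

```python
from collections import Counter

# Counts copied from the Democratic Convention output above.
counts = Counter({'NOUN': 615, 'PRON': 465, 'VERB': 402, 'ADP': 348, 'DET': 290,
                  'ADJ': 238, 'AUX': 181, 'ADV': 170, 'CCONJ': 143, 'PROPN': 111,
                  'PART': 83, 'SCONJ': 52, 'NUM': 29, 'INTJ': 4})
total = sum(counts.values())

# The eight F-score categories, each mapped to the UPOS tag that stands in for it.
CATEGORY_TAGS = {
    "Noun": "NOUN",
    "Adjective": "ADJ",
    "Preposition": "ADP",
    "Article": "DET",
    "Pronoun": "PRON",
    "Verb": "VERB",
    "Adverb": "ADV",
    "Interjection": "INTJ",
}

# Relative frequency of each category as a percentage of all alphabetic tokens.
rel_freq_perc = {name: counts[tag] / total * 100 for name, tag in CATEGORY_TAGS.items()}

for name, perc in rel_freq_perc.items():
    print(f"{name}: {perc:.2f}")
```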


In [10]:
# F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 
FormalityScore_DemocraticSpeech = (RelFreqNounPerc + RelFreqAdjPerc + RelFreqAdpPerc + RelFreqDetPerc - RelFreqPronPerc - RelFreqVerbPerc - RelFreqAdvPerc - RelFreqIntjPerc + 100)/2

print(FormalityScore_DemocraticSpeech)

57.18620249121686
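Because all eight percentages share the same denominator (3,131 alphabetic tokens), the score can also be checked by hand from the raw counts. The formal categories total 615 + 238 + 348 + 290 = 1,491 tokens and the deictic categories 465 + 402 + 170 + 4 = 1,041 tokens, so:

$$F = \frac{1}{2}\left(\frac{1491 - 1041}{3131} \times 100 + 100\right) = \frac{1}{2}\left(\frac{450}{3131} \times 100 + 100\right) \approx \frac{14.372 + 100}{2} \approx 57.19$$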


### Joe Biden's Informal Speech - Thanksgiving Speech

The dataset's source: https://www.kaggle.com/datasets/vyombhatia/joe-bidens-speeches-of-this-week

In [11]:
# Read the data (the with block closes the file automatically)

with open('Joe_Biden_Thanksgiving2020.txt', 'r', encoding='utf-8') as dataIn:
    Joe_Biden_Thanksgiving_Speech = dataIn.read()

In [12]:
# Process our string
document = nlp(Joe_Biden_Thanksgiving_Speech)

In [13]:
# Let's tokenize the informal speech.
Joe_Biden_Thanksgiving_Speech_TextTokens = [token.text for token in document]

In [14]:
# Here we have a list of tokens that are alphabetic
Joe_Biden_Thanksgiving_Speech_TextTokensAlpha = [token.text for token in document if token.text.isalpha()]

# Here we have a list of PoSes of the tokens that are alphabetic 
# We use token.text.isalpha(), NOT token.pos_.isalpha() to filter
Joe_Biden_Thanksgiving_Speech_TextPoSesAlpha_Tags = [token.pos_ for token in document if token.text.isalpha()]

In [15]:
# Count how often each part of speech occurs among the alphabetic tokens
# Note: FreqDict = frequency dictionary
# Note: PoS = part of speech

from collections import Counter

Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha = Counter(Joe_Biden_Thanksgiving_Speech_TextPoSesAlpha_Tags)
print(Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha)

Counter({'NOUN': 410, 'VERB': 322, 'PRON': 318, 'ADP': 227, 'DET': 209, 'ADJ': 149, 'ADV': 132, 'AUX': 120, 'PROPN': 112, 'CCONJ': 89, 'PART': 83, 'SCONJ': 38, 'NUM': 15, 'INTJ': 2})


In [16]:
# Let's use pprint to format the output nicely

import pprint

Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha = Counter(Joe_Biden_Thanksgiving_Speech_TextPoSesAlpha_Tags)
pprint.pprint(Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha)

Counter({'NOUN': 410,
         'VERB': 322,
         'PRON': 318,
         'ADP': 227,
         'DET': 209,
         'ADJ': 149,
         'ADV': 132,
         'AUX': 120,
         'PROPN': 112,
         'CCONJ': 89,
         'PART': 83,
         'SCONJ': 38,
         'NUM': 15,
         'INTJ': 2})


From the output above, the frequencies of the different word types are as follows:  

NOUN: 410,  
VERB: 322,  
PRON: 318,  
ADP: 227,  
DET: 209,  
ADJ: 149,  
ADV: 132,  
AUX: 120,  
PROPN: 112,  
CCONJ: 89,  
PART: 83,  
SCONJ: 38,  
NUM: 15,  
INTJ: 2

In [17]:
Joe_Biden_Thanksgiving_Speech_totalPoSCount = sum(Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha.values())
print(Joe_Biden_Thanksgiving_Speech_totalPoSCount)

2226


In [18]:
# To compute the formality score, get the relative frequency (as a percentage of all alphabetic tokens) of each of the eight parts of speech in the formula

# noun: NOUN
RelFreqNounPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['NOUN']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Noun: ', RelFreqNounPerc_Thanksgiving)

# adjective: ADJ
RelFreqAdjPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['ADJ']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Adjective: ',RelFreqAdjPerc_Thanksgiving)

# preposition: ADP (UPOS "adposition", used here for prepositions)
RelFreqAdpPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['ADP']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Preposition: ',RelFreqAdpPerc_Thanksgiving)

# article: DET (UPOS "determiner"; includes articles but also other determiners such as demonstratives)
RelFreqDetPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['DET']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Article: ',RelFreqDetPerc_Thanksgiving)

# pronoun: PRON
RelFreqPronPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['PRON']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Pronoun: ', RelFreqPronPerc_Thanksgiving)

# verb: VERB
RelFreqVerbPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['VERB']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Verb: ',RelFreqVerbPerc_Thanksgiving)

# adverb: ADV
RelFreqAdvPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['ADV']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Adverb: ',RelFreqAdvPerc_Thanksgiving)

# interjection: INTJ
RelFreqIntjPerc_Thanksgiving = Joe_Biden_Thanksgiving_Speech_FreqDictPoSAlpha['INTJ']/Joe_Biden_Thanksgiving_Speech_totalPoSCount*100
print('Interjection: ',RelFreqIntjPerc_Thanksgiving)

Noun:  18.418688230008982
Adjective:  6.693620844564241
Preposition:  10.197663971248877
Article:  9.389038634321652
Pronoun:  14.285714285714285
Verb:  14.465408805031446
Adverb:  5.929919137466308
Interjection:  0.08984725965858043


In [19]:
# F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 
FormalityScore_ThanksgivingSpeech = (RelFreqNounPerc_Thanksgiving + RelFreqAdjPerc_Thanksgiving + RelFreqAdpPerc_Thanksgiving + RelFreqDetPerc_Thanksgiving - RelFreqPronPerc_Thanksgiving - RelFreqVerbPerc_Thanksgiving - RelFreqAdvPerc_Thanksgiving - RelFreqIntjPerc_Thanksgiving + 100)/2

print(FormalityScore_ThanksgivingSpeech)

54.96406109613656
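The same hand check applies to the Thanksgiving speech. Out of 2,226 alphabetic tokens, the formal categories total 410 + 149 + 227 + 209 = 995 tokens and the deictic categories 318 + 322 + 132 + 2 = 774 tokens, so:

$$F = \frac{1}{2}\left(\frac{995 - 774}{2226} \times 100 + 100\right) = \frac{1}{2}\left(\frac{221}{2226} \times 100 + 100\right) \approx \frac{9.928 + 100}{2} \approx 54.96$$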


### Conclusion

In [20]:
print('Democratic Convention Speech F-Score:', FormalityScore_DemocraticSpeech)
print('Thanksgiving Speech F-Score:', FormalityScore_ThanksgivingSpeech)

Democratic Convention Speech F-Score: 57.18620249121686
Thanksgiving Speech F-Score: 54.96406109613656


In [21]:
# Check which speech has the higher formality score

if FormalityScore_DemocraticSpeech > FormalityScore_ThanksgivingSpeech:
    print("Joe Biden's Democratic Convention Speech has a higher Formality Score than his Thanksgiving Speech")
elif FormalityScore_ThanksgivingSpeech > FormalityScore_DemocraticSpeech:
    print("Joe Biden's Thanksgiving Speech has a higher Formality Score than his Democratic Convention Speech")
else:
    print("Both speeches have the same Formality Score")

Joe Biden's Democratic Convention Speech has a higher Formality Score than his Thanksgiving Speech
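One design choice worth checking: UPOS separates auxiliaries (AUX) from main verbs, so the `VERB` count used above leaves out tokens tagged AUX. The quoted definition speaks only of "verbs", so whether auxiliaries belong in the verb frequency is an interpretive assumption. A minimal sketch, reusing the counts printed above, recomputes both scores with AUX folded into the deictic verb category to see whether the ordering holds; `formality_score_counts` and its `include_aux` flag are illustrative names, not part of the original notebook.

```python
from collections import Counter

# Counts copied from the two Counter outputs printed above.
democratic = Counter({'NOUN': 615, 'PRON': 465, 'VERB': 402, 'ADP': 348, 'DET': 290,
                      'ADJ': 238, 'AUX': 181, 'ADV': 170, 'CCONJ': 143, 'PROPN': 111,
                      'PART': 83, 'SCONJ': 52, 'NUM': 29, 'INTJ': 4})
thanksgiving = Counter({'NOUN': 410, 'VERB': 322, 'PRON': 318, 'ADP': 227, 'DET': 209,
                        'ADJ': 149, 'ADV': 132, 'AUX': 120, 'PROPN': 112, 'CCONJ': 89,
                        'PART': 83, 'SCONJ': 38, 'NUM': 15, 'INTJ': 2})


def formality_score_counts(counts: Counter, include_aux: bool = False) -> float:
    """F-score from raw counts; include_aux=True treats AUX as verbs (an assumption)."""
    total = sum(counts.values())
    formal = counts["NOUN"] + counts["ADJ"] + counts["ADP"] + counts["DET"]
    deictic = counts["PRON"] + counts["VERB"] + counts["ADV"] + counts["INTJ"]
    if include_aux:
        deictic += counts["AUX"]
    # Same as (sum of formal % - sum of deictic % + 100) / 2, since all
    # percentages share the denominator `total`.
    return (formal - deictic) / total * 100 / 2 + 50


for label, counts in [("Democratic Convention", democratic), ("Thanksgiving", thanksgiving)]:
    print(f"{label}: F = {formality_score_counts(counts):.2f}, "
          f"F with AUX as verbs = {formality_score_counts(counts, include_aux=True):.2f}")
```

With these counts, folding AUX into the verb category lowers both scores (to roughly 54.30 and 52.27), but the Democratic Convention speech still scores higher.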


This study examined the formality of Joe Biden's 2020 speeches using Heylighen and Dewaele's (1999) F-score, computed from part-of-speech frequencies. As expected, his formal Democratic Convention speech scored higher (F ≈ 57.19) than his informal Thanksgiving speech (F ≈ 54.96), a difference of about 2.2 points. This result is consistent with the expectation that Biden adapts his language style to the formality of each setting. These findings add to our understanding of Biden's communication strategies by showing how his linguistic choices shift across contexts.