# Disclosure Tone

In this module, we learn how to compute disclosure tone using the Loughran and McDonald (2011) dictionaries.

Before we begin, let's import the modules we'll be using in this tutorial.

In [None]:
import pandas as pd
import re
import requests
from nltk.tokenize import word_tokenize
from collections import Counter
from nltk.corpus import stopwords
from MyFunctions import html_to_text

#### Calculating Tone

To calculate disclosure tone, we first count the number of positive words (**# POS WORDS**) based on a positive word list and the number of negative words (**# NEG WORDS**) based on a negative word list. We can then compute positive and negative tone as:

$
\begin{align}
POS\ TONE = \frac{\#\ POS\ WORDS}{\#\ TOTAL\ WORDS}
\end{align}
$

$
\begin{align}
NEG\ TONE = \frac{\#\ NEG\ WORDS}{\#\ TOTAL\ WORDS}
\end{align}
$

Or, we can calculate a net tone as:

$
\begin{align}
NET\ TONE = \frac{\#\ POS\ WORDS - \#\ NEG\ WORDS}{\#\ POS\ WORDS + \#\ NEG\ WORDS}
\end{align}
$
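As a quick check on the three formulas, here is a small worked example. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any actual disclosure.

```python
# Hypothetical counts, chosen only to illustrate the arithmetic
num_pos_words = 30
num_neg_words = 10
num_total_words = 1000

# Positive and negative tone scale each count by the total word count
pos_tone_example = num_pos_words / num_total_words
neg_tone_example = num_neg_words / num_total_words

# Net tone compares positive and negative counts to each other only
net_tone_example = (num_pos_words - num_neg_words) / (num_pos_words + num_neg_words)

print(pos_tone_example, neg_tone_example, net_tone_example)
```

Note that net tone is bounded between -1 (only negative words) and +1 (only positive words), while positive and negative tone are each bounded between 0 and 1.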

#### Negative and Positive Sentiment Dictionaries
    
Tim Loughran and Bill McDonald, professors at Notre Dame, created dictionaries of positive and negative words and have posted their lists at https://sraf.nd.edu/textual-analysis/resources/.

Let's import these word lists into our program.

In [None]:
neg_words = pd.read_excel('LoughranMcDonald_SentimentWordLists_2018.xlsx', sheet_name='Negative', header=None)
neg_words = neg_words.rename(columns={0: "token"})
neg_words['token'] = neg_words['token'].str.lower()
neg_words.head()

In [None]:
pos_words = pd.read_excel('LoughranMcDonald_SentimentWordLists_2018.xlsx', sheet_name='Positive', header=None)
pos_words = pos_words.rename(columns={0: "token"})
pos_words['token'] = pos_words['token'].str.lower()
pos_words.head()

#### Apple's Earnings Announcement 8-K

We're going to practice this approach using Apple's April 30, 2020 earnings announcement 8-K (https://www.sec.gov/Archives/edgar/data/320193/000032019320000050/a8-kexhibit991q2202032.htm). Let's first download the HTML source code using the **requests.get** function and convert it to text using the **html_to_text** function that we imported above from **MyFunctions**.

In [None]:
# Update (Aug 2021): requests to the EDGAR website must declare a User-Agent header

headers = {'User-Agent': 'ORGANIZATION youremail@yourinstitution.edu'}
url = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000050/a8-kexhibit991q2202032.htm'
apple_8K = requests.get(url, headers=headers).text
text = html_to_text(apple_8K)
print(text)

#### Pre-Processing and Tokenization

Next, let's convert the text to lower case, create word tokens, remove stop words and non-alphabetic tokens, and then create a DataFrame containing the counts of all remaining tokens.

In [None]:
def get_counts(text):
      
    # Convert text to lower case
    
    text = text.lower()
    
    # Create tokens
    
    tokens = word_tokenize(text)
    
    # Remove stop words (build the stop-word set once instead of once per token)
        
    stop_words = set(stopwords.words('english'))
    tokens = [t for t in tokens if t not in stop_words]
    
    # Remove non-alphabetic (i.e., punctuation, numbers) tokens
    
    tokens = [t for t in tokens if t.isalpha()] 
    
    # Count tokens
    
    counts = Counter(tokens)

    # Create a DataFrame of our token counts
    
    df = pd.DataFrame.from_dict(counts, orient='index').reset_index()
    df = df.rename(columns={"index": "token", 0: "count"})
    df = df.sort_values(by=["count"],ascending=[False])
    
    return df

df = get_counts(text)
df.head(10)
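To see what each pre-processing step does on a small input, the sketch below mirrors the same sequence (lower-case, tokenize, drop stop words, drop non-alphabetic tokens, count) using only the standard library. It is an illustration, not a replacement for **get_counts**: the regular-expression tokenizer, the tiny `TOY_STOP_WORDS` set, and the sample sentence are all assumptions made for this example, and NLTK's tokenizer and stop-word list will behave differently on real filings.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set (NLTK's English list is much longer)
TOY_STOP_WORDS = {"the", "a", "of", "and", "in", "to"}

def toy_token_counts(sample_text):
    # Lower-case, then split into runs of word characters or single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", sample_text.lower())
    # Remove stop words
    tokens = [t for t in tokens if t not in TOY_STOP_WORDS]
    # Remove non-alphabetic tokens (punctuation, numbers)
    tokens = [t for t in tokens if t.isalpha()]
    return Counter(tokens)

sample = "Revenue grew 5% in the quarter, and revenue in services grew."
toy_counts = toy_token_counts(sample)
print(toy_counts.most_common())
```

Here the number `5`, the punctuation marks, and the stop words `in`, `the`, and `and` are all dropped, so only six tokens remain to be counted.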

#### Merge with Negative Words List

Let's merge the **neg_words** DataFrame with our **df** DataFrame to identify the negative words in the disclosure.

In [None]:
data = pd.merge(df, neg_words, on='token', how='left', indicator=True)
data[data._merge == 'both']

In [None]:
data['neg'] = 0
data.loc[data._merge == 'both', 'neg'] = 1
data = data.drop(columns=['_merge'])

data[data.neg == 1]
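The left merge with `indicator=True` can be easier to follow on a toy example. In the sketch below, both DataFrames are hypothetical stand-ins (four made-up token counts and a three-word negative list). The `_merge` column equals `'both'` when a token appears in both DataFrames and `'left_only'` when it appears only in the token counts.

```python
import pandas as pd

# Hypothetical token counts and a hypothetical three-word negative list
toy_df = pd.DataFrame({"token": ["revenue", "decline", "growth", "loss"],
                       "count": [5, 2, 3, 1]})
toy_neg = pd.DataFrame({"token": ["decline", "loss", "impairment"]})

# A left merge keeps every row of toy_df; _merge records where each token was found
toy = pd.merge(toy_df, toy_neg, on="token", how="left", indicator=True)

# Flag tokens found in both DataFrames as negative words
toy["neg"] = (toy["_merge"] == "both").astype(int)
toy = toy.drop(columns=["_merge"])
print(toy)
```

Because this is a left merge, `impairment` (which is in the negative list but not in the token counts) does not appear in the result, and every token from the disclosure is retained whether or not it is negative.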

#### Merge with Positive Words List

Let's do the same thing for the positive word list.

In [None]:
data = pd.merge(data, pos_words, on='token', how='left', indicator=True)
data['pos'] = 0
data.loc[data._merge == 'both', 'pos'] = 1
data = data.drop(columns=['_merge'])

#data[data.pos == 1]
data.head()

#### Total Negative Word Count

We can now sum up the counts of negative words in the disclosure using the **sum** function.

In [None]:
num_neg = data[data.neg == 1]['count'].sum()
print(num_neg)

#### Total Positive Word Count

We can do the same for positive counts.

In [None]:
num_pos = data[data.pos == 1]['count'].sum()
print(num_pos)

#### Total Word Count

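We can also sum the entire **count** column in the **data** DataFrame to obtain the total word count of the disclosure. Because **get_counts** dropped stop words and non-alphabetic tokens, this total covers only the tokens that remain after pre-processing.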

In [None]:
total_wc = data['count'].sum()
print(total_wc)

#### Create Tone Measures

Let's now calculate net tone ([**num_pos** - **num_neg**]/[**num_pos** + **num_neg**]), positive tone (**num_pos/total_wc**), and negative tone (**num_neg/total_wc**).

In [None]:
net_tone = (num_pos - num_neg)/(num_pos + num_neg)
pos_tone = num_pos / total_wc
neg_tone = num_neg / total_wc

print(f'Net Tone is equal to {net_tone:.3f}')
print(f'Positive Tone is equal to {pos_tone:.3f}')
print(f'Negative Tone is equal to {neg_tone:.3f}')
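One edge case worth keeping in mind: net tone is undefined when a disclosure contains no positive and no negative words, because the denominator is zero. The helper below is one possible way to handle that case. The function name `safe_net_tone` and the choice to return NaN are assumptions for illustration, not part of the original tutorial.

```python
import math

def safe_net_tone(num_pos, num_neg):
    """Return net tone, or NaN when no positive or negative words were found."""
    denominator = num_pos + num_neg
    if denominator == 0:
        # No tone words at all, so the ratio is undefined
        return math.nan
    return (num_pos - num_neg) / denominator

print(safe_net_tone(30, 10))
print(safe_net_tone(0, 0))
```

Returning NaN rather than 0 keeps "no tone words" distinct from "equal numbers of positive and negative words", which matters if the results are later averaged across many disclosures.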

#### Exercise

1. Create a function called **get_tone** that takes the EDGAR URL as input and returns the **net_tone**, **pos_tone**, and **neg_tone** of the disclosure.
2. Calculate the **net_tone**, **pos_tone**, and **neg_tone** of American Airlines's July 21, 2010 earnings announcement 8-K (https://www.sec.gov/Archives/edgar/data/6201/000000620110000035/ar07218k.htm). Briefly open and skim the 8-K. Does the tone you calculated reflect the sentiment of the disclosure?
3. Calculate the **net_tone**, **pos_tone**, and **neg_tone** of American Airlines's April 18, 2013 earnings announcement 8-K (https://www.sec.gov/Archives/edgar/data/6201/000000620113000036/amrearningsreleaseq120138k.htm). Briefly open and skim the 8-K. Does the tone you calculated reflect the sentiment of the disclosure?

#### Solution for # 1

#### Solution for # 2

#### Solution for # 3