# Textual Analysis Project

In this project, you will calculate disclosure tone for firms' earnings announcement 8-Ks in 2019 Q1. You will then calculate cumulative abnormal returns for the [0,+1] window surrounding the earnings announcement date and test the relation between earnings announcement tone and cumulative abnormal returns.

# Instructions

#### Import Modules

1. Import the following modules:
    1. requests
    2. pandas as pd
    3. word_tokenize from nltk.tokenize
    4. Counter from collections
    5. statsmodels.api as sm 
    6. html_to_text from MyFunctions (you will need to download the **MyFunctions.py** file and save it in your current working directory)    

#### Solution - Import Modules

#### Import Negative and Positive Word Lists

1. Import the negative word list into a **pandas** DataFrame called **neg_words** using the **pd.read_excel** function.
2. Import the positive word list into a **pandas** DataFrame called **pos_words** using the **pd.read_excel** function.
3. Rename the column header in the **neg_words** and **pos_words** DataFrames to 'token'.
4. Convert the words in the **neg_words** and **pos_words** DataFrames to lower case.

The word lists are included in the **LoughranMcDonald_SentimentWordLists** Excel file, available at https://sraf.nd.edu/textual-analysis/resources/.

#### Solution - Import Negative and Positive Word Lists
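
A minimal sketch of one way to load and clean a word list. It assumes each sheet of the workbook (called 'Negative' and 'Positive' below) holds a single column of upper-case words with no header row; the sheet names and file name are assumptions, so check the downloaded file. With `header=None`, the first word is kept as data instead of being consumed as the column header, and **clean_word_list** (a hypothetical helper) then names the column 'token' and lower-cases it.

```python
import pandas as pd


def clean_word_list(df: pd.DataFrame) -> pd.DataFrame:
    """Rename the single column to 'token' and lower-case every word."""
    df = df.copy()
    df.columns = ["token"]
    df["token"] = df["token"].astype(str).str.lower()
    return df


def load_word_list(path: str, sheet: str) -> pd.DataFrame:
    # Assumption: the sheet has one column of words and no header row.
    raw = pd.read_excel(path, sheet_name=sheet, header=None)
    return clean_word_list(raw)


# Usage (file and sheet names are assumptions; adjust to the downloaded workbook):
# neg_words = load_word_list("LoughranMcDonald_SentimentWordLists.xlsx", "Negative")
# pos_words = load_word_list("LoughranMcDonald_SentimentWordLists.xlsx", "Positive")
```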

#### Create a Function to Obtain Tone

Create a new function called **get_tone** which takes the URL of an EDGAR filing and returns the **net_tone** of the filing, where **net_tone** is calculated as follows:

$
\begin{align}
NET\ TONE = \frac{\#\ POS\ WORDS - \#\ NEG\ WORDS}{\#\ POS\ WORDS + \#\ NEG\ WORDS}
\end{align}
$

See the **Disclosure Tone** module for additional instruction on how to create this function.

#### Solution - Create a Function to Obtain Tone
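
A minimal sketch, not the approach from the **Disclosure Tone** module. It splits the work in two: **net_tone_from_text** (a hypothetical helper) counts positive and negative words and applies the NET TONE formula, and **get_tone** downloads the filing. Assumptions: **html_to_text** takes an HTML string and returns plain text; the word lists are passed in as Python sets (e.g. `set(pos_words['token'])`) rather than read from globals; and a regular-expression tokenizer stands in for **word_tokenize** so the counting logic runs without NLTK data (swap in `word_tokenize(text.lower())` to match the assignment). The SEC asks automated clients to identify themselves in a User-Agent header; the default value below is a placeholder.

```python
import math
import re
from collections import Counter


def net_tone_from_text(text, pos_words, neg_words):
    """Return (#pos - #neg) / (#pos + #neg) for `text`, or NaN if no listed word appears.

    pos_words and neg_words are sets of lower-case tokens.
    """
    # Stand-in for nltk's word_tokenize: lower-case the text and keep runs of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear, so absent words add nothing.
    n_pos = sum(counts[word] for word in pos_words)
    n_neg = sum(counts[word] for word in neg_words)
    if n_pos + n_neg == 0:
        return math.nan  # avoid dividing by zero when a filing has no sentiment words
    return (n_pos - n_neg) / (n_pos + n_neg)


def get_tone(url, pos_words, neg_words, user_agent="Jane Doe jane.doe@example.com"):
    """Download an EDGAR filing and return its net tone (sketch; needs network access)."""
    # Imported inside the function so net_tone_from_text can run without these packages.
    import requests
    from MyFunctions import html_to_text  # course helper; assumed to map HTML to plain text

    # The user_agent default is a hypothetical placeholder; replace it with your own details.
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    response.raise_for_status()
    return net_tone_from_text(html_to_text(response.text), pos_words, neg_words)
```

For example, with positive words {strong, growth} and negative words {weak, loss}, the text "Strong growth, but weak demand and a loss. Growth!" has 3 positive and 2 negative hits, so its net tone is (3 - 2) / (3 + 2) = 0.2. Returning NaN for filings with no sentiment words lets **dropna** remove them later.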

#### Import Earnings Announcement 8-K URLs

Import the '8-K URLs.txt' file into a **pandas** DataFrame called **urls** using the **pd.read_csv** function. Note: The columns are delimited with the '|' symbol.

#### Solution - Import Earnings Announcement 8-K URLs
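
A small sketch of the import on an in-memory stand-in for '8-K URLs.txt'. The two rows and the column names (**cik**, **ticker**, **date_filed**, **url**) are assumptions for illustration; the point is that **pd.read_csv** splits on commas by default, so the '|' delimiter must be passed with the **sep** option.

```python
import io

import pandas as pd

# Hypothetical stand-in for '8-K URLs.txt'; column names and rows are illustrative.
sample = io.StringIO(
    "cik|ticker|date_filed|url\n"
    "1|AAA|2019-02-01|https://example.com/aaa-8k.htm\n"
    "2|BBB|2019-02-05|https://example.com/bbb-8k.htm\n"
)

# sep="|" is required because read_csv splits on commas by default.
urls = pd.read_csv(sample, sep="|")
# In the notebook: urls = pd.read_csv("8-K URLs.txt", sep="|")
print(urls.head())
```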

#### Compute Tone

1. Create a new file called 'tone.txt' in your current working directory using the **open** function. Write a header row containing the following variables delimited with the '|' symbol: **cik**, **ticker**, **date_filed**, **url**, and **net_tone**.
2. Loop through the filings in the **urls** DataFrame using the **iterrows** function and compute the **net_tone** of each filing using the **get_tone** function. For each filing, write a new row to the 'tone.txt' file containing the following variables delimited with a '|' symbol: **cik**, **ticker**, **date_filed**, **url**, and **net_tone**.

#### Solution - Compute Tone
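
One way to write the loop, sketched as a function that receives the tone function as an argument so the file-writing logic can be tried without network access. The name **write_tone_file** is hypothetical, and the **urls** columns are assumed to be **cik**, **ticker**, **date_filed**, and **url**. The `with` block closes the file even if a download raises an error partway through.

```python
import pandas as pd


def write_tone_file(urls: pd.DataFrame, tone_func, path: str = "tone.txt") -> None:
    """Write one '|'-delimited row per filing: cik|ticker|date_filed|url|net_tone."""
    with open(path, "w") as f:
        f.write("cik|ticker|date_filed|url|net_tone\n")
        for _, row in urls.iterrows():
            net_tone = tone_func(row["url"])
            f.write(f"{row['cik']}|{row['ticker']}|{row['date_filed']}|{row['url']}|{net_tone}\n")


# In the notebook (pos_set and neg_set are hypothetical names for the word sets):
# write_tone_file(urls, lambda u: get_tone(u, pos_set, neg_set))
```

A NaN tone is written as the text `nan`, which **pd.read_csv** reads back as a missing value by default.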

#### Import Tone Data

Import the 'tone.txt' file into a new **pandas** DataFrame called **data** using the **pd.read_csv** function. Use the **sep** option to specify the '|' delimiter and the **parse_dates** option to read in the **date_filed** variable as a **datetime** object.

#### Solution - Import Tone Data
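
A sketch of the import on an in-memory stand-in for 'tone.txt' (rows are illustrative). Without `sep="|"`, each line would be read as a single column, and **parse_dates** could not find **date_filed**.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'tone.txt'; the second filing has a missing tone.
tone_txt = io.StringIO(
    "cik|ticker|date_filed|url|net_tone\n"
    "1|AAA|2019-02-01|https://example.com/aaa-8k.htm|0.5\n"
    "2|BBB|2019-02-05|https://example.com/bbb-8k.htm|nan\n"
)

data = pd.read_csv(tone_txt, sep="|", parse_dates=["date_filed"])
# In the notebook: data = pd.read_csv("tone.txt", sep="|", parse_dates=["date_filed"])
print(data.dtypes)
```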

#### Calculate Cumulative Abnormal Returns

1. Import the 'ret.csv' file into a new **pandas** DataFrame called **ret** using the **pd.read_csv** function. Use the **parse_dates** option to read in the **Date** variable as a datetime object. The 'ret.csv' file was created by scraping Yahoo! Finance (see the Web Scraping Yahoo! Finance Tutorials) and contains the following variables:
    1. **ticker**
    2. **Date**
    3. **ret** - Raw one-day return
    4. **ewret** - Equal-weighted index return
2. Calculate **abnret** as **ret** - **ewret**.
3. Sort the **ret** DataFrame by **ticker** (ascending order) and **Date** (descending order) using the **sort_values** function and the **ascending** option. With this ordering, the row immediately above each date within a ticker holds the next trading date.
4. Use the **groupby** and **shift** functions to create a new variable called **abnret_lead1** equal to the abnormal return (**abnret**) for each ticker on the subsequent trading date (i.e., date *t+1*).
5. Create a new variable called **car01** equal to the cumulative abnormal return from day 0 to day +1 using the following formula:

    $
    \begin{align}
    CAR\ [0,+1]\ =\ [\ (1 + r_{t})\ \times\ (1 + r_{t+1})\ ] - 1
    \end{align}
    $

    where $r_{t}$ and $r_{t+1}$ are the abnormal returns on day $t$ (the current date) and day $t+1$ (the subsequent trading date).

#### Solution - Calculate Cumulative Abnormal Returns
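
A minimal sketch of steps 2 to 5 on a made-up **ret** DataFrame (tickers and returns are illustrative). It assumes **ret** has one row per ticker per trading day with no gaps, so in descending date order `shift(1)` pulls the next trading day's value. Grouping by **ticker** keeps the shift from reaching into another ticker's rows, so each ticker's last date gets NaN for **abnret_lead1** and **car01**. Sorting **Date** ascending and using `shift(-1)` would be equivalent.

```python
import pandas as pd

# Illustrative daily returns; in the notebook:
# ret = pd.read_csv("ret.csv", parse_dates=["Date"])
ret = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04",
                            "2019-01-02", "2019-01-03"]),
    "ret": [0.02, 0.01, -0.01, 0.03, -0.02],
    "ewret": [0.01, 0.00, 0.00, 0.01, 0.00],
})

# Step 2: abnormal return = raw return minus equal-weighted index return.
ret["abnret"] = ret["ret"] - ret["ewret"]

# Step 3: ticker ascending, Date descending, so the previous row is date t+1.
ret = ret.sort_values(["ticker", "Date"], ascending=[True, False])

# Step 4: within each ticker, take the abnormal return from the previous row (t+1).
ret["abnret_lead1"] = ret.groupby("ticker")["abnret"].shift(1)

# Step 5: compound the two daily abnormal returns.
ret["car01"] = (1 + ret["abnret"]) * (1 + ret["abnret_lead1"]) - 1
print(ret)
```

For AAA on 2019-01-02, the abnormal returns are 0.01 on day 0 and 0.01 on day +1, so CAR[0,+1] = 1.01 × 1.01 - 1 = 0.0201.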

#### Merge the Data and Ret DataFrames

1. Merge the **data** DataFrame with the **ret** DataFrame and call the resulting DataFrame **data**. Merge on **ticker** and **date_filed** in the **data** DataFrame and on **ticker** and **Date** in the **ret** DataFrame. Use an inner merge.
2. Drop all rows with missing (NaN) values in the **data** DataFrame using the **dropna** function.

#### Solution - Merge the Data and Ret DataFrames
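
A sketch of the merge and clean-up on toy frames (values are illustrative). Because the date columns have different names in the two DataFrames, **left_on** and **right_on** are used instead of **on**; the shared **ticker** key appears once in the result, while both **date_filed** and **Date** are kept. The inner merge drops filings with no matching trading day, and **dropna** removes rows whose **car01** is NaN.

```python
import pandas as pd

# Toy inputs; in the notebook these are the DataFrames built in earlier steps.
data = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "date_filed": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-02"]),
    "net_tone": [0.2, -0.1, 0.4],
})
ret = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB"],
    "Date": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-03"]),
    "car01": [0.0201, -0.0001, float("nan")],
})

# Inner merge: keep only filings that match a (ticker, trading date) row in ret.
data = data.merge(ret, how="inner",
                  left_on=["ticker", "date_filed"], right_on=["ticker", "Date"])

# Drop rows with any missing value (here, BBB's NaN car01).
data = data.dropna()
print(data)
```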

#### Run a Basic OLS Regression of Cumulative Abnormal Returns on Tone

1. Examine the summary statistics of the **net_tone** and **car01** variables using the **describe** function. What is the median value of **net_tone**?
2. Use the **OLS** function from the **statsmodels.api** module to run a basic OLS regression with **car01** as the dependent variable and **net_tone** as the independent variable. Include a constant term in the regression (**sm.OLS** does not add one automatically). Print the OLS model summary output using the **summary** function.
3. Comment on the relation between tone and the cumulative abnormal return.

#### Solution - Run a Basic OLS Regression of Cumulative Abnormal Returns on Tone
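
A sketch of steps 1 and 2 on a small, made-up DataFrame constructed so that car01 = 0.001 + 0.05 × net_tone exactly. To keep the example dependency-light, the regression is solved with **numpy.linalg.lstsq** on a design matrix whose first column is ones (the constant term). This is only a cross-check that should reproduce the intercept and slope **sm.OLS** reports, not a replacement for the **summary** table with its standard errors, t-statistics, and R². The statsmodels calls appear as comments.

```python
import numpy as np
import pandas as pd

# Synthetic data, not results: car01 = 0.001 + 0.05 * net_tone by construction.
data = pd.DataFrame({
    "net_tone": [-0.2, 0.0, 0.2, 0.4],
    "car01": [-0.009, 0.001, 0.011, 0.021],
})

# Step 1: summary statistics; the '50%' row is the median.
print(data[["net_tone", "car01"]].describe())
median_tone = data["net_tone"].median()

# Step 2 with statsmodels (as the assignment asks):
# import statsmodels.api as sm
# X = sm.add_constant(data["net_tone"])      # adds the 'const' column
# model = sm.OLS(data["car01"], X).fit()
# print(model.summary())

# Cross-check: least squares with an explicit column of ones for the constant.
X = np.column_stack([np.ones(len(data)), data["net_tone"].to_numpy()])
y = data["car01"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

A positive slope on **net_tone** would indicate that more positive announcements are associated with higher CAR[0,+1]; whether it is statistically meaningful has to be judged from the standard errors in the **summary** output.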