# Index Return data

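- Using Yahoo Finance (via `yfinance`) for historical index and sector ETF return data
- Going to need the following:
    - Dates of the announcements
    - Index returns for the 10 days before the announcement
    - Index returns on the day of the announcement
    - Index returns for the 10 days after the announcement
    - (the code below pulls a wider window of 15 calendar days on each side)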

In [20]:
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta

In [33]:

fomc_statements = pd.read_csv('dates/fomc_statements.csv')
fomc_statements['Date'] = pd.to_datetime(fomc_statements['Date'])
fomc_statements['document_type'] = 'statement' 

tickers = [
    '^GSPC',     # S&P 500
    '^IXIC',     # NASDAQ Composite
    '^DJI',      # Dow Jones Industrial Average
    '^RUT',      # Russell 2000
    '^W5000',    # Wilshire 5000
    'XLF',       # Financials Sector (ETF)
    'XLRE',      # Real Estate Sector (ETF)
    'XLU',       # Utilities Sector (ETF)
    'XLY',       # Consumer Discretionary Sector (ETF)
    'XLP',       # Consumer Staples Sector (ETF)
    'XLE',       # Energy Sector (ETF)
    'XLV',       # Healthcare Sector (ETF)
    'XLI',       # Industrials Sector (ETF)
    'XLB',       # Materials Sector (ETF)
    'XLK',       # Information Technology Sector (ETF)
    'XLC',       # Communication Services Sector (ETF)
    '^IRX',      # Three-month Treasury Bill Yield
    '^TNX',      # Ten-year Treasury Yield
]

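# yfinance treats `end` as exclusive, and pct_change() drops the first downloaded
# row, so pad both ends a little beyond the 15-day window.
start_date = fomc_statements['Date'].min() - pd.Timedelta(days=20)
end_date = fomc_statements['Date'].max() + pd.Timedelta(days=16)

all_indices_data = {}
for ticker in tickers:
    #print(f"Downloading data for {ticker}...")
    data = yf.download(ticker, start=start_date, end=end_date)
    # Flatten possible MultiIndex columns (field, ticker) down to the field names
    data.columns = data.columns.get_level_values(0)
    # Daily simple return from closing prices (for ^IRX and ^TNX this is the
    # percent change in the quoted yield, not a price return)
    data['return'] = data['Close'].pct_change()
    all_indices_data[ticker] = data[['return']].dropna()
    #print(f"Data for {ticker} downloaded.")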

rows = []

# One output row per (announcement, ticker), with one column per day offset
for _, row_fomc in fomc_statements.iterrows():
    date = row_fomc['Date']
    document_type = row_fomc['document_type']
    for ticker in tickers:
        row = {'announcement_date': date, 'ticker': ticker, 'document_type': document_type}
        # Offsets are calendar days, so weekends and market holidays stay NA
        for t in range(-15, 16):
            target_date = date + pd.Timedelta(days=t)
            if target_date in all_indices_data[ticker].index:
                row[f'T{t:+}'] = all_indices_data[ticker].loc[target_date, 'return']
            else:
                row[f'T{t:+}'] = pd.NA
        rows.append(row)

final_df = pd.DataFrame(rows)

column_order = ['announcement_date', 'ticker', 'document_type'] + [f'T{t:+}' for t in range(-15, 16)]
final_df = final_df[column_order]

final_df

final_df.to_csv('raw_data/long_format_output.csv', index=False)
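
The loop above steps through calendar-day offsets, so any offset that lands on a weekend or market holiday comes out as missing. If the goal is "10 days before / after" in trading days, one possible alternative is to index by position in the return series instead. The sketch below is only an illustration: `event_window` is a hypothetical helper (not part of this notebook), it assumes the return series has a sorted, timezone-naive `DatetimeIndex`, and it treats T+0 as the first trading day on or after the announcement.

```python
import numpy as np
import pandas as pd


def event_window(returns: pd.Series, event_date, before: int = 10, after: int = 10) -> dict:
    """Collect returns at trading-day offsets around an event (hypothetical helper).

    `returns` must be indexed by sorted trading dates. T+0 is the first
    trading day on or after `event_date`.
    """
    idx = returns.index
    # Position of the first trading day >= event_date
    pos0 = idx.searchsorted(pd.Timestamp(event_date))
    window = {}
    for t in range(-before, after + 1):
        pos = pos0 + t
        if 0 <= pos < len(idx):
            window[f'T{t:+}'] = float(returns.iloc[pos])
        else:
            # Offset falls outside the downloaded range
            window[f'T{t:+}'] = np.nan
    return window
```

With the data built above, a call might look like `event_window(all_indices_data['^GSPC']['return'], date)`. Whether T+0 should roll forward to the next trading day (as here) or backward is a design choice that depends on when the announcement reaches the market.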


### Problem
We still need the actual announcement dates for the intermeeting documents. The dates in their HTML links are the same as the dates of the Fed statements, so using them as-is would pull return data around the statement date rather than the intermeeting date.

How to get around this?
Either find a dataset with the intermeeting dates or build our own by hand (would take an hour, maybe).
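
Once a list of intermeeting dates exists, it could be stacked with the statement dates and checked for the overlap described above. This is a rough sketch under assumptions: `combine_announcements` is a hypothetical helper, the `'intermeeting'` label is made up to mirror the existing `'statement'` label, and the manual file is assumed to have a `Date` column like `dates/fomc_statements.csv`.

```python
import pandas as pd


def combine_announcements(statements: pd.DataFrame, intermeeting: pd.DataFrame) -> pd.DataFrame:
    """Stack both date tables and flag repeated dates (hypothetical helper)."""
    # Tag each source with its own document_type ('intermeeting' is an assumed label)
    statements = statements.assign(document_type='statement')
    intermeeting = intermeeting.assign(document_type='intermeeting')
    combined = pd.concat([statements, intermeeting], ignore_index=True)
    combined['Date'] = pd.to_datetime(combined['Date'])
    # Mark every row whose date appears more than once in the combined table
    combined['date_collision'] = combined.duplicated(subset='Date', keep=False)
    return combined.sort_values('Date').reset_index(drop=True)
```

Rows with `date_collision == True` are the ones where an intermeeting entry still carries a statement date and needs to be corrected by hand before pulling returns.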

# Sentiment Analysis 