# Historical News Data Collection and Processing

## Overview:
This notebook is designed to fetch and store historical news data related to stock tickers using the EODHD API. The news data can then be used for further analysis, such as sentiment analysis or correlation with stock price movements.

## Purpose:
The primary goal of this notebook is to automate the process of retrieving and storing news articles for specified stock tickers over a given date range. The notebook processes the news data and exports it to CSV format for ease of analysis.

## Steps:
1. **Setup and Imports**: The notebook starts by importing the required libraries, such as `pandas` for data manipulation and `requests` for making API calls.
2. **API Configuration**: We define the API key and base URL for fetching news data from the EODHD API. The user specifies parameters like the ticker symbol, date range, and maximum number of articles to fetch.
3. **Data Fetching**: The `fetch_news_data` function is responsible for making API requests and retrieving news data for the specified stock ticker. It handles pagination to retrieve large datasets and checks for errors in the API response.
4. **Data Saving**: The `save_to_csv` function exports the news data into a CSV file format, with the file name based on the ticker symbol and stored in a specified folder.
5. **Workflow Execution**: The `run_workflow` function ties everything together, fetching the data and saving it to CSV. It is designed to be easy to run for multiple tickers and date ranges.
6. **Execution Example**: The final step in the notebook demonstrates how to define a stock ticker and date range, and execute the workflow to fetch the news data for that ticker.

## Output:
- CSV files containing the fetched news data are saved under `../<folder>/processed/` (by default `../data/processed/`). The file names are formatted as `<ticker>_news.csv` (e.g., `SPY.US_news.csv` for SPY, an ETF that tracks the S&P 500).

## Use Cases:
- **Historical Sentiment Analysis**: Use the historical news data for performing sentiment analysis to understand market sentiment around a specific stock or index.
- **Correlation Studies**: Analyze the correlation between stock price movements and specific news events over time.
- **Backtesting**: Incorporate the news data into backtesting frameworks for trading strategies that rely on news sentiment or historical events.
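
As a first step toward the correlation use case, the sketch below counts articles per calendar day so the counts can later be lined up with daily price data. It assumes each news record is a dictionary with an ISO 8601 `date` string; the helper name `daily_article_counts` and the sample records are made up for illustration and are not part of the notebook.

```python
from collections import Counter


def daily_article_counts(articles):
    """Count articles per calendar day from ISO 8601 `date` strings (assumed field)."""
    counts = Counter()
    for article in articles:
        date_str = article.get("date", "")
        if date_str:
            counts[date_str[:10]] += 1  # Keep only the YYYY-MM-DD part
    return dict(sorted(counts.items()))


# Made-up sample records, for illustration only
sample = [
    {"date": "2024-09-10T14:30:00+00:00", "title": "Example headline A"},
    {"date": "2024-09-10T09:00:00+00:00", "title": "Example headline B"},
    {"date": "2024-09-09T18:45:00+00:00", "title": "Example headline C"},
]
print(daily_article_counts(sample))
```

Records without a `date` value are skipped rather than counted under an empty key.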

## Customization:
The user can modify the following:
- **API Key**: Replace the placeholder API key with a valid key.
- **Ticker and Date Range**: Update the ticker symbol and date range to fetch news data for different stocks or time periods.
- **Maximum Articles**: Adjust the `max_articles` parameter to control how much data to fetch per ticker.

## Requirements:
- **EODHD API Key**: A valid API key for the EODHD service is required to fetch news data.
- **Libraries**: Ensure that `pandas` and `requests` are installed in the environment to handle data manipulation and API requests.

## Conclusion:
This notebook automates fetching historical news data for a stock ticker over an extended time period, and the same workflow can be rerun for other tickers and date ranges. It can be integrated into larger workflows for financial analysis, trading strategies, or research projects.


#### Importing libraries
This cell imports the necessary libraries for fetching news data and saving it as a CSV file.
- `pandas`: Used for handling data and saving it to a CSV file.
- `requests`: Used for making HTTP requests to fetch data from the EODHD API.

In [41]:
import pandas as pd
import requests

#### Setup API key and basic configuration
This cell sets up the API key and other parameters needed for fetching news data from the EODHD API. It defines:
- `api_key`: The API key for accessing the EODHD API (replace with your own).
- `base_url`: The base URL for fetching news data.
- `limit`: Number of articles per request (set to 100 here; the API accepts up to 1000 and defaults to 50).
- `max_articles`: The maximum number of articles to fetch for each ticker (500 in this case).


In [42]:
# API configuration
api_key = "66848eff49cbd1.37105331"  # Replace with your EODHD API key (no leading or trailing spaces)
base_url = "https://eodhd.com/api/news"
limit = 100  # The number of articles per request (max 1000, default 50)
max_articles = 500  # Define the maximum number of articles to fetch
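
Rather than hard-coding the key in the notebook, it could be read from an environment variable. The sketch below is one possible approach, not part of the original notebook; the variable name `EODHD_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var="EODHD_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()  # Strip stray whitespace around the key
    if not key:
        raise ValueError(f"Set the {env_var} environment variable to your EODHD API key")
    return key


# Usage (requires the variable to be set beforehand):
# api_key = load_api_key()
```

Stripping whitespace guards against a key pasted with a stray space, which would otherwise end up in the request URL.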


#### Function to fetch news data from EODHD API
This function fetches historical news articles for a given stock ticker and date range. It makes multiple requests if necessary to get all the data up to `max_articles`.

**Parameters:**
- `ticker`: The stock ticker symbol.
- `start_date`: Start date for fetching news (format: 'YYYY-MM-DD').
- `end_date`: End date for fetching news (format: 'YYYY-MM-DD').
- `limit`: The number of articles per request.
- `api_key`: The API key for authentication.
- `max_articles`: Maximum number of articles to fetch.

**Returns:**
- A list of all news articles fetched from the API.

In [43]:
def fetch_news_data(ticker, start_date, end_date, limit, api_key, max_articles=500):
    """Fetch up to `max_articles` news items for `ticker`, paging with limit/offset."""
    offset = 0
    all_news_data = []

    while len(all_news_data) < max_articles:
        # Let requests build and URL-encode the query string
        params = {
            "s": ticker, "from": start_date, "to": end_date,
            "limit": limit, "offset": offset,
            "api_token": api_key, "fmt": "json",
        }
        response = requests.get(base_url, params=params, timeout=30)

        if response.status_code != 200:
            print(f"Failed to fetch data: {response.status_code}")
            break

        news_data = response.json()
        if not news_data:
            break  # Stop when no more data is returned

        all_news_data.extend(news_data)
        offset += limit  # Move to the next batch of data

    # The last batch can overshoot the cap, so trim the result
    return all_news_data[:max_articles]
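
The offset-based paging logic can be checked without network access by swapping the HTTP call for a stand-in. The sketch below is illustrative only: `paginate`, `fake_fetch_page`, and the fake articles are hypothetical names and data, not part of the notebook or the EODHD API.

```python
def paginate(fetch_page, limit, max_articles):
    """Collect items from fetch_page(offset, limit) until a page is empty or the cap is hit."""
    items = []
    offset = 0
    while len(items) < max_articles:
        page = fetch_page(offset, limit)
        if not page:
            break  # No more data
        items.extend(page)
        offset += limit
    return items[:max_articles]  # Trim any overshoot from the last page


# Stand-in for the API: 7 fake articles served in slices
fake_articles = [{"title": f"article {i}"} for i in range(7)]


def fake_fetch_page(offset, limit):
    return fake_articles[offset:offset + limit]


print(len(paginate(fake_fetch_page, limit=3, max_articles=5)))    # 5
print(len(paginate(fake_fetch_page, limit=3, max_articles=100)))  # 7
```

The first call stops at the cap after two pages; the second stops when the stand-in returns an empty page.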

#### Function to save news data to a CSV file
This function saves the fetched news data into a CSV file for further analysis.

**Parameters:**
- `data`: The DataFrame containing the fetched news data.
- `ticker`: The stock ticker symbol.
- `folder`: The folder name (default: 'data'); the file is written to `../<folder>/processed/`.

**Returns:**
- None: The file is saved in the specified folder.

In [44]:
import os

def save_to_csv(data, ticker, folder='data'):
    """
    Save the news data to a CSV file.
    
    Parameters:
    data (pd.DataFrame): News data.
    ticker (str): Stock ticker symbol.
    folder (str): Folder name; the CSV goes to ../<folder>/processed/.
    
    Returns:
    None
    """
    out_dir = f"../{folder}/processed"
    os.makedirs(out_dir, exist_ok=True)  # Create the output folder if it is missing
    filename = f"{out_dir}/{ticker}_news.csv"
    data.to_csv(filename)
    print(f"Data saved to {filename}")
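
If the news records contain nested values, those end up in the CSV as stringified Python objects. The sketch below shows one way to flatten a record before saving. It assumes, without confirmation from this notebook, that a record may carry a `symbols` list and a `sentiment` dictionary; the helper `flatten_article` and the sample record are hypothetical.

```python
import csv
import io


def flatten_article(article):
    """Flatten one level of nesting so every value fits in a single CSV cell."""
    flat = {}
    for key, value in article.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}_{sub_key}"] = sub_value  # e.g. sentiment_polarity
        elif isinstance(value, list):
            flat[key] = ";".join(str(v) for v in value)  # Join list items into one cell
        else:
            flat[key] = value
    return flat


# Made-up record; the field names are assumptions about the response shape
sample = {
    "date": "2024-09-10T14:30:00+00:00",
    "title": "Example headline",
    "symbols": ["SPY.US", "VOO.US"],
    "sentiment": {"polarity": 0.5, "neg": 0.1, "neu": 0.6, "pos": 0.3},
}
row = flatten_article(sample)

# Write to an in-memory buffer to show the resulting CSV layout
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

A list of flattened rows can be passed to `pd.DataFrame` in the same way as the raw records.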

#### Main workflow function
This function orchestrates the process: it fetches the news data, converts it to a DataFrame, and saves it to a CSV file. If no articles are returned, nothing is saved.

**Parameters:**
- `ticker`: The stock ticker symbol.
- `start_date`: The start date for the data collection.
- `end_date`: The end date for the data collection.

**Returns:**
- None: Saves the news data to a CSV file if data is fetched successfully.

In [45]:
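def run_workflow(ticker, start_date, end_date):
    # Use the page size and article cap defined in the configuration cell
    news_data = fetch_news_data(ticker, start_date, end_date, limit=limit, api_key=api_key, max_articles=max_articles)
    
    if news_data:
        save_to_csv(pd.DataFrame(news_data), ticker, folder='data')
    else:
        print(f"No news data fetched for {ticker}")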
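
To run the workflow for several tickers, a small loop that keeps going when one ticker fails may be enough. The sketch below takes the workflow as an argument so it can be tried with a stand-in; `run_for_tickers`, `fake_workflow`, and the ticker `BAD.US` are hypothetical names used only for illustration.

```python
def run_for_tickers(tickers, start_date, end_date, workflow):
    """Call workflow(ticker, start_date, end_date) per ticker and collect failures."""
    failures = {}
    for ticker in tickers:
        try:
            workflow(ticker, start_date, end_date)
        except Exception as exc:  # Keep going if one ticker fails
            failures[ticker] = str(exc)
    return failures


# Stand-in workflow that fails for one made-up ticker
def fake_workflow(ticker, start_date, end_date):
    if ticker == "BAD.US":
        raise ValueError("simulated failure")
    print(f"Processed {ticker} from {start_date} to {end_date}")


failures = run_for_tickers(["SPY.US", "BAD.US", "AAPL.US"], "2024-01-01", "2024-09-10", fake_workflow)
print(failures)
```

In the notebook, `run_workflow` would be passed in place of `fake_workflow`.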
    
    

#### Define ticker and date range, then run the workflow
This cell defines the stock ticker symbol (e.g., `SPY.US`, an ETF that tracks the S&P 500) and the date range for fetching news data. It then calls the `run_workflow` function to fetch the news data and save it.

**Parameters defined:**
- `ticker`: The stock ticker symbol (e.g., 'SPY.US').
- `start_date`: The start date of the news data ('2010-01-01').
- `end_date`: The end date of the news data ('2024-09-10').

In [46]:
# Define the ticker (exchange-suffixed symbol, e.g. SPY.US)
ticker = 'SPY.US'  # SPDR S&P 500 ETF, used here as a proxy for the S&P 500 index
start_date = '2010-01-01'
end_date = '2024-09-10'

run_workflow(ticker, start_date, end_date)

Data saved to ../data/processed/SPY.US_news.csv
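
With the article cap at 500, a range as long as 2010-01-01 to 2024-09-10 may hit the cap before the whole period is covered. One possible workaround is to split the range into shorter windows and fetch each one separately. The sketch below only generates the windows; `yearly_windows` is a hypothetical helper, not part of the notebook.

```python
from datetime import date, timedelta


def yearly_windows(start_date, end_date):
    """Split [start_date, end_date] into consecutive windows that end on 31 December."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    windows = []
    current = start
    while current <= end:
        window_end = min(date(current.year, 12, 31), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = window_end + timedelta(days=1)  # First day of the next window
    return windows


for window_start, window_end in yearly_windows("2022-06-15", "2024-09-10"):
    print(window_start, window_end)
```

Because the CSV file name depends only on the ticker, saving each window separately would overwrite the previous file; collecting the results from `fetch_news_data` for all windows and saving once avoids that.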
