# Stock Data Download and Preparation

This notebook implements the data download and preparation process for stock analysis. It:
1. Downloads historical stock data from Yahoo Finance
2. Retrieves news and press releases
3. Prepares and formats the data for further processing

## Key Parameters
- `stock_symbol`: Target stock symbol (e.g., "CRM")
- `start_date`: Historical data start date (`YYYY-MM-DD`)
- `end_date`: Historical data end date (`YYYY-MM-DD`)

## Data Collection Process
1. Downloads historical stock data:
   - Daily price data (Open, High, Low, Close, Volume)
   - Adjusted closing prices, used to compute daily returns and their up/down bin labels
2. Retrieves associated news and press releases:
   - Financial news articles
   - Company press releases

## Output Format
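Data is saved as a CSV file named `{stock_symbol}_{start_date}_{end_date}.csv` inside the data directory, with columns:
- `Date`
- Stock price data: `Adj Close Price`, `Returns`, `Bin Label`
- `News`: JSON-encoded list of articles (date, headline, summary)
- `PressReleases`: JSON-encoded list of press releases (date, headline, description)

The file also contains an unnamed index column, which pandas reads back as `Unnamed: 0`.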
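Because `News` and `PressReleases` are stored as JSON strings inside CSV cells, a reader has to decode them after loading. A minimal sketch, assuming the layout shown in the notebook outputs (including the unnamed index column): passing `index_col=0` keeps pandas from creating an `Unnamed: 0` column, and `converters` runs `json.loads` on each cell. The two-row sample and its placeholder headline are invented for illustration, and `load_prepared_csv` is a hypothetical helper.

```python
import io
import json

import pandas as pd

# Hypothetical two-row sample shaped like the prepared CSV: an unnamed index
# column, price columns, and JSON-encoded News/PressReleases lists.
# The headline and summary strings are placeholders, not real data.
SAMPLE_CSV = (
    ",Date,Adj Close Price,Returns,Bin Label,News,PressReleases\n"
    "0,2022-04-04,175.594757,0.023693,U3,[],[]\n"
    '1,2022-04-05,172.268631,-0.018942,D2,'
    '"[{""date"": ""20220405120000"", ""headline"": ""Placeholder headline"", '
    '""summary"": ""Placeholder summary""}]",[]\n'
)


def load_prepared_csv(source) -> pd.DataFrame:
    """Read the prepared CSV, using the first column as the index
    (so no 'Unnamed: 0' column appears) and decoding the JSON list columns."""
    return pd.read_csv(
        source,
        index_col=0,
        parse_dates=["Date"],
        converters={"News": json.loads, "PressReleases": json.loads},
    )


df = load_prepared_csv(io.StringIO(SAMPLE_CSV))
print(df.loc[1, "News"][0]["headline"])
```

With the columns decoded up front, later cells could iterate `df.loc[i, 'News']` directly instead of calling `json.loads` each time.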

## Usage
This data will be used as input for the PrimoGPT model to generate NLP features in subsequent analysis steps.

### Note
I manually moved the generated data from the temporary output folder to `data_for_train_primogpt` for better organization.

### The cell below is for package installation on Google Colab

In [1]:
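# Import required modules and set up paths
import sys
sys.path.append('../../')  # make the local primogpt package importable

import json
import os

import pandas as pd  # used explicitly below (pd.read_csv)
from primogpt.create_prompt import *
from primogpt.prepare_data import *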



In [2]:
# Define stock symbol and date range
stock_symbol = "AAPL"
start_date = "2022-04-01"
end_date = "2025-02-28"

# Create a directory for the downloaded and prepared raw data
data_dir = f"data/{stock_symbol}_{start_date}_{end_date}"
os.makedirs(data_dir, exist_ok=True)

In [3]:
# Download and prepare raw data
# This includes stock prices, news, and press releases
prepare_data_for_symbol(stock_symbol, data_dir, start_date, end_date)

[*********************100%***********************]  1 of 1 completed


Returns done
Skipping news item with invalid timestamp: -62135596800
Skipping news item with invalid timestamp: -62135596800
Skipping news item with invalid timestamp: -62135596800
Skipping news item with invalid timestamp: -62135596800
News done
Press releases done


Unnamed: 0,Date,Adj Close Price,Returns,Bin Label,News,PressReleases
1,2022-04-04,175.594757,0.023693,U3,[],[]
2,2022-04-05,172.268631,-0.018942,D2,[],"[{""date"": ""2022-04-05 12:00:00"", ""headline"": ""..."
3,2022-04-06,169.090149,-0.018451,D2,[],"[{""date"": ""2022-04-06 19:30:00"", ""headline"": ""..."
4,2022-04-07,169.395187,0.001804,U1,[],[]
5,2022-04-08,167.377869,-0.011909,D2,"[{""date"": ""20220410052100"", ""headline"": ""Huawe...",[]
...,...,...,...,...,...,...
724,2025-02-21,245.550003,-0.001139,D1,"[{""date"": ""20250221164515"", ""headline"": ""Super...",[]
725,2025-02-24,247.100006,0.006312,U1,"[{""date"": ""20250224182619"", ""headline"": ""Unity...","[{""date"": ""2025-02-24 06:00:00"", ""headline"": ""..."
726,2025-02-25,247.039993,-0.000243,D1,"[{""date"": ""20250225202117"", ""headline"": ""Alpha...",[]
727,2025-02-26,240.360001,-0.027040,D3,"[{""date"": ""20250226161300"", ""headline"": ""Apple...",[]
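The value in the `Skipping news item with invalid timestamp` messages, -62135596800, is the Unix-seconds representation of 0001-01-01T00:00:00Z, a placeholder that APIs commonly emit for an unset date, so those items carry no usable publication time. The sketch below is a hypothetical helper (`format_news_timestamp`), not the code inside `prepare_data_for_symbol`; it assumes timestamps arrive as Unix seconds and renders them in UTC (also an assumption) in the `YYYYMMDDHHMMSS` form seen in the `News` column.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware UTC datetime. Adding timedelta avoids platform limits
# of datetime.fromtimestamp() for large negative values.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_news_timestamp(unix_seconds: int):
    """Return a 'YYYYMMDDHHMMSS' string (UTC, assumed), or None when the
    timestamp is a non-positive placeholder such as -62135596800."""
    if unix_seconds <= 0:
        print(f"Skipping news item with invalid timestamp: {unix_seconds}")
        return None
    return (EPOCH + timedelta(seconds=unix_seconds)).strftime("%Y%m%d%H%M%S")


# -62135596800 s before the epoch is exactly 0001-01-01 00:00:00 UTC.
print(EPOCH + timedelta(seconds=-62135596800))
print(format_news_timestamp(-62135596800))
print(format_news_timestamp(1668441600))
```

Treating every non-positive value as invalid is a design choice for this sketch: genuine news items cannot predate 1970, so the check also catches other sentinel values.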


In [4]:
# Load the raw data file
csv_file_name = f"{stock_symbol}_{start_date}_{end_date}.csv"
csv_file_path = os.path.join(data_dir, csv_file_name)

# Display the first 50 rows of the data
df = pd.read_csv(csv_file_path)
df.head(50)

Unnamed: 0,Date,Adj Close Price,Returns,Bin Label,News,PressReleases
0,2022-04-04,175.594757,0.023693,U3,[],[]
1,2022-04-05,172.268631,-0.018942,D2,[],"[{""date"": ""2022-04-05 12:00:00"", ""headline"": ""..."
2,2022-04-06,169.090149,-0.018451,D2,[],"[{""date"": ""2022-04-06 19:30:00"", ""headline"": ""..."
3,2022-04-07,169.395187,0.001804,U1,[],[]
4,2022-04-08,167.377869,-0.011909,D2,"[{""date"": ""20220410052100"", ""headline"": ""Huawe...",[]
5,2022-04-11,163.107086,-0.025516,D3,"[{""date"": ""20220411161916"", ""headline"": ""Marke...",[]
6,2022-04-12,164.986649,0.011523,U2,"[{""date"": ""20220412163000"", ""headline"": ""4 Rea...",[]
7,2022-04-13,167.682953,0.016343,U2,"[{""date"": ""20220413162802"", ""headline"": ""Alpha...",[]
8,2022-04-14,162.654449,-0.029988,D3,"[{""date"": ""20220414160132"", ""headline"": ""Dow J...","[{""date"": ""2022-04-14 08:00:00"", ""headline"": ""..."
9,2022-04-18,162.437943,-0.001331,D1,"[{""date"": ""20220418192500"", ""headline"": ""CEO p...",[]
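Many rows in the preview have an empty `News` or `PressReleases` list (`[]`). A quick coverage count shows how many trading days actually have text attached before the file is used for feature generation. A minimal sketch, assuming the columns still hold raw JSON strings as they do after the plain `pd.read_csv` call above; `text_coverage` is a hypothetical helper and the three-row frame is a stand-in with placeholder payloads.

```python
import json

import pandas as pd


def text_coverage(frame: pd.DataFrame) -> pd.DataFrame:
    """Count decoded news and press-release items per row.

    Assumes News/PressReleases hold JSON-encoded lists, as in the loaded CSV.
    """
    counts = pd.DataFrame(index=frame.index)
    counts["Date"] = frame["Date"]
    counts["n_news"] = frame["News"].map(lambda s: len(json.loads(s)))
    counts["n_press"] = frame["PressReleases"].map(lambda s: len(json.loads(s)))
    return counts


# Small stand-in for the loaded `df` (placeholder JSON payloads).
sample = pd.DataFrame({
    "Date": ["2022-04-04", "2022-04-05", "2022-04-08"],
    "News": ["[]", "[]", '[{"date": "20220410052100", "headline": "h", "summary": "s"}]'],
    "PressReleases": [
        "[]",
        '[{"date": "2022-04-05 12:00:00", "headline": "h", "description": "d"}]',
        "[]",
    ],
})
coverage = text_coverage(sample)
print(coverage)
print("Days without news:", int((coverage["n_news"] == 0).sum()))
```

On the real `df`, the same call would report how many of the roughly 727 rows have no news at all.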


In [7]:
# Print the news items stored for one sample row (index 155) to verify the content
news_content = df.loc[155, 'News']
news_content_json = json.loads(news_content)

print("News:")
for news_item in news_content_json:
    print(f"Date: {news_item['date']}, Headline: {news_item['headline']}, Summary: {news_item['summary']}\n")

News:
Date: 20221114160000, Headline: Apple-Epic App Store Legal Battle Worries US Antitrust Enforcers, Summary: (Bloomberg) -- The US Justice Department is taking a stance in the latest showdown between Apple Inc. and Epic Games Inc. over the iPhone maker’s dominance of the marketplace for mobile applications.Most Read from BloombergFTX’s Balance Sheet Was BadChina Plans Property Rescue in Latest Surprise Policy ShiftBiden, Xi Chart Path to Warmer Ties With Blinken China VisitMusk Publicly Punishes Twitter Engineers Who Call Him Out OnlineUS Stocks, Bonds Drop as Fed Signals Further Hikes: Markets WrapThe

Date: 20221114162800, Headline: Apple CEO Tim Cook says company is still hiring but being ‘very deliberate’, Summary: Amid a backdrop of hiring freezes and layoffs across the tech industry, Apple's Tim Cook said the smartphone giant was bringing on new employees in some roles but taking a more restrained approach to overall hiring.

Date: 20221114163000, Headline: Apple Stock Slides
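Each `News` cell holds several articles for one trading day, and their `date` fields use a compact `YYYYMMDDHHMMSS` string that need not match the row's `Date` (the 2022-04-08 row, for instance, begins with an article stamped 20220410052100). Flattening the column into one row per article makes those timestamps easier to inspect. A sketch, assuming the JSON keys shown in the output above; `news_long_format` is a hypothetical helper and the sample headlines are placeholders.

```python
import json

import pandas as pd


def news_long_format(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per news item, with its timestamp parsed as 'published'.

    Assumes 'News' holds JSON-encoded lists of dicts with 'date'
    (YYYYMMDDHHMMSS), 'headline' and 'summary' keys.
    """
    items = frame[["Date", "News"]].copy()
    items["News"] = items["News"].map(json.loads)
    # explode() turns an empty list into NaN, so drop those rows; resetting the
    # index keeps each article aligned with its own details below.
    items = items.explode("News").dropna(subset=["News"]).reset_index(drop=True)
    if items.empty:
        return pd.DataFrame(columns=["Date", "published", "headline", "summary"])
    details = pd.DataFrame(items["News"].tolist())
    out = pd.concat([items[["Date"]], details], axis=1)
    out = out.rename(columns={"date": "published"})
    out["published"] = pd.to_datetime(out["published"], format="%Y%m%d%H%M%S")
    return out


sample = pd.DataFrame({
    "Date": ["2022-11-14", "2022-11-15"],
    "News": [
        '[{"date": "20221114160000", "headline": "A", "summary": "a"},'
        ' {"date": "20221114162800", "headline": "B", "summary": "b"}]',
        "[]",
    ],
})
long_news = news_long_format(sample)
print(long_news)
```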

In [9]:
# Print the press releases stored for one sample row (index 156) to verify the content
press_releases = df.loc[156, 'PressReleases']
press_releases_json = json.loads(press_releases)

print("Press Releases:")
for release in press_releases_json:
    print(f"Date: {release['date']}, Headline: {release['headline']}, Description: {release['description']}\n")

Press Releases:
Date: 2022-11-15 08:00:00, Headline: Emergency SOS via Satellite Available Today on the iPhone 14 Lineup in the US and Canada, Description: Apple ® today announced its groundbreaking safety service Emergency SOS via satellite is now available to customers in the US and Canada. Emergency SOS via satellite is available in the US and Canada starting today, November 15, and will come to France, Germany, Ireland, and the UK in December. Apple’ s groundbreaking safety service Emergency SOS via satellite...
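The two display cells index each item directly (`news_item['summary']`, `release['description']`), which raises `KeyError` if an item lacks a field, and they print long summaries in full. A defensive variant, shown here as a hypothetical helper `format_items`, uses `dict.get` and truncates long text; the sample payload is a placeholder.

```python
import json


def format_items(raw_json: str, text_key: str, max_chars: int = 200) -> list:
    """Render JSON-encoded news or press-release items as display lines.

    Uses dict.get so an item missing a key prints a blank instead of
    raising KeyError, and truncates the text field to `max_chars`.
    """
    lines = []
    for item in json.loads(raw_json):
        text = item.get(text_key) or ""
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        lines.append(
            f"Date: {item.get('date', '')}, Headline: {item.get('headline', '')}, "
            f"{text_key.capitalize()}: {text}"
        )
    return lines


raw = '[{"date": "2022-11-15 08:00:00", "headline": "Placeholder", "description": "' + "x" * 300 + '"}]'
for line in format_items(raw, "description", max_chars=10):
    print(line)
```

In the notebook, the same helper would be called as `format_items(df.loc[155, 'News'], 'summary')` or `format_items(df.loc[156, 'PressReleases'], 'description')`.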

