# Exploring Stock Market Data

This notebook provides step-by-step instructions for replicating what Brian did in the third demo: analyzing Apple's closing stock prices between October 24, 2024 and October 23, 2025.

Please follow along, and feel free to play with different variations too! 

In [1]:
# jupyter ai configure openai \
#   --api-key not-needed \
#   --base-url http://localhost:11434/v1

## Step 1: Define the Dates and Ticker Variables

These variables will be used at the end to generate a summary of the insights.

In [2]:
start_date = '2024-10-24'
end_date = '2025-10-23'
ticker = 'AAPL'

## Step 2: Download Apple Stock Data

- Open Jupyter chat by clicking on the chat bubble icon on the left sidebar of JupyterLab
- Create a new chat by clicking `+Chat` (choose any name for the file)
- In the chat window, attach the Markdown file `yfinance_docs.md` using the `@` symbol
  - Type `@`, then select `file` from the autocomplete menu
  - You'll see a list of the files available in the lesson's directory; choose `yfinance_docs.md`
- To download Apple stock data, use a prompt like this:
   > Use yfinance to download Apple (AAPL) stock data for this period:
   > - start date: October 24, 2024
   > - end date: October 23, 2025
   >
   > Save the returned results in a DataFrame called `aapl`
- Transfer the generated code to the cell below and run it

In [3]:
import yfinance as yf

# Define start and end dates
start_date = "2024-10-24"
end_date = "2025-10-23"

# Download Apple stock data for the specified period
aapl = yf.download("AAPL", start=start_date, end=end_date)

# Display the first few rows to confirm
print(aapl.head())


<span style="color:green; font-weight:bold;">Note:</span> If you see a message saying that cached data was used due to rate limit errors, this is a temporary issue in this learning environment that occurs when too many requests are sent simultaneously. You can run the notebook locally if you wish to download data for different stocks.
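One detail worth checking in the generated code: yfinance documents the `end` argument of `yf.download` as exclusive, so a request ending on `2025-10-23` may stop at the previous trading day. Below is a minimal sketch of how an inclusive end date could be derived, assuming that behavior holds in your installed version. The helper `inclusive_end` is hypothetical and not part of yfinance.

```python
from datetime import date, timedelta


def inclusive_end(end_date: str) -> str:
    """Return the day after `end_date` (YYYY-MM-DD) so an exclusive end bound still covers it."""
    return (date.fromisoformat(end_date) + timedelta(days=1)).isoformat()


# Hypothetical usage: pass the shifted date as `end` if you want October 23 included
print(inclusive_end("2025-10-23"))
```

If you try this, compare the last index value of `aapl` before and after the change to confirm how your version treats `end`.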

- Print the columns of the DataFrame `aapl`: `aapl.columns`

In [None]:
# Print the column labels of the DataFrame
print(aapl.columns)

- In the chat window, ask how you can flatten the columns of the DataFrame using a prompt like this:
  > The DataFrame aapl has multiIndexed columns [('Close','AAPL'),('High','AAPL'),('Low','AAPL'),('Open','AAPL'),('Volume','AAPL')]. Flatten the columns by removing 'AAPL'.

In [None]:
# Flatten multi-indexed columns by taking only the first level names
aapl.columns = aapl.columns.get_level_values(0)

# Verify the change
print(aapl.columns)
print(aapl.head())
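To see what the flattening does without downloading anything, here is a small sketch on a synthetic DataFrame whose columns mimic the two-level layout described in the prompt. The numbers are made up, and `droplevel` is shown only as one possible alternative to `get_level_values(0)`.

```python
import pandas as pd

# Synthetic stand-in for the yfinance result: two-level columns (field, ticker)
columns = pd.MultiIndex.from_tuples(
    [("Close", "AAPL"), ("High", "AAPL"), ("Low", "AAPL"), ("Open", "AAPL"), ("Volume", "AAPL")]
)
demo = pd.DataFrame([[230.0, 232.0, 229.0, 231.0, 1000]], columns=columns)

# Option 1: keep only the first level of the column labels
flat_a = demo.copy()
flat_a.columns = flat_a.columns.get_level_values(0)

# Option 2: drop the ticker level by position and get a new DataFrame back
flat_b = demo.droplevel(1, axis=1)

print(list(flat_a.columns))
print(list(flat_b.columns))
```

Both options only make sense for a single ticker. With several tickers, dropping the second level would leave duplicate column names.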

## Step 3: Calculate Basic Statistics & Metrics

- In the same chat window, use a prompt like this to calculate the basic descriptive statistics of the DataFrame:
  > Display the shape and statistical summary of the DataFrame aapl.

In [None]:
# Display the shape of the DataFrame
print("Shape of aapl:", aapl.shape)

# Display descriptive statistics
print("\nStatistical summary:")
print(aapl.describe())

- To calculate the total return, use a prompt like this:
  > Use the Close column of DataFrame aapl to find the total return in percentage (`total_return`) based on the start price and end price.

In [None]:
# Get the starting and ending prices
start_price = aapl['Close'].iloc[0]
end_price = aapl['Close'].iloc[-1]

# Calculate total return in percentage
total_return = ((end_price - start_price) / start_price) * 100

print(f"Total Return: {total_return:.2f}%")
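As a quick sanity check of the formula, the same calculation can be run on round numbers where the answer is easy to verify by hand. This is only an illustration: the prices are invented, and `total_return_pct` is a hypothetical helper name.

```python
def total_return_pct(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100


# Made-up prices, not real AAPL data: a move from 200 to 250 is a 25% gain
print(f"{total_return_pct(200.0, 250.0):.2f}%")
# A move from 200 to 150 is a 25% loss
print(f"{total_return_pct(200.0, 150.0):.2f}%")
```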

## Step 4: Visualize the Closing Price

- To visualize Apple's Closing price, use a prompt like this:
  > Create a line chart showing the closing price trend using the column 'Close' of the DataFrame `aapl`.
  >
  > Use matplotlib to create a professional-looking chart with:
  > - Clear title and axis labels
  > - Grid for readability
  > - Appropriate colors and styling

In [None]:
import matplotlib.pyplot as plt

# Set the style first so it applies to the whole figure, then set the plot size
plt.style.use('seaborn-v0_8-darkgrid')
plt.figure(figsize=(12, 6))

# Plot the closing price
plt.plot(aapl['Close'], color='royalblue', linewidth=2)

# Add title and labels
plt.title("Apple Inc. (AAPL) Closing Price Trend", fontsize=16, fontweight='bold')
plt.xlabel("Date", fontsize=12)
plt.ylabel("Closing Price (USD)", fontsize=12)

# Add grid lines for better readability
plt.grid(True, linestyle='--', alpha=0.6)

# Display the chart
plt.tight_layout()
plt.show()

- To find the dates that correspond to the peak and lowest prices, use a prompt like this:
  > Use the Close column of the `aapl` DataFrame to find and print:
  > - the peak date (in a variable called `peak_date`) that corresponds to the maximum closing price `peak_price`
  > - the lowest date (in a variable called `lowest_date`) that corresponds to the minimum closing price `lowest_price`
  >
  > Update the above code to show the peak and low prices in the line chart.

In [None]:
import matplotlib.pyplot as plt

# --- Find peak and lowest prices and corresponding dates ---
peak_price = aapl['Close'].max()
lowest_price = aapl['Close'].min()

peak_date = aapl['Close'].idxmax()
lowest_date = aapl['Close'].idxmin()

print(f"Peak Date: {peak_date}, Peak Price: {peak_price:.2f}")
print(f"Lowest Date: {lowest_date}, Lowest Price: {lowest_price:.2f}")

# --- Plot the closing price trend ---
plt.style.use('seaborn-v0_8-darkgrid')  # set the style before creating the figure
plt.figure(figsize=(12, 6))

plt.plot(aapl['Close'], color='royalblue', linewidth=2, label='Closing Price')

# Mark the peak and lowest points
plt.scatter(peak_date, peak_price, color='green', s=100, label='Peak Price')
plt.scatter(lowest_date, lowest_price, color='red', s=100, label='Lowest Price')

# Annotate the peak and lowest points
plt.annotate(f'Peak: {peak_price:.2f}', 
             xy=(peak_date, peak_price), 
             xytext=(peak_date, peak_price + 10),
             arrowprops=dict(facecolor='green', shrink=0.05),
             fontsize=10, color='green')

plt.annotate(f'Low: {lowest_price:.2f}', 
             xy=(lowest_date, lowest_price), 
             xytext=(lowest_date, lowest_price - 15),
             arrowprops=dict(facecolor='red', shrink=0.05),
             fontsize=10, color='red')

# Add title, labels, and grid
plt.title("Apple Inc. (AAPL) Closing Price Trend with Peak and Lowest Points", fontsize=16, fontweight='bold')
plt.xlabel("Date", fontsize=12)
plt.ylabel("Closing Price (USD)", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()

plt.tight_layout()
plt.show()
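The difference between `max()` and `idxmax()` is easy to miss: the first returns the value, the second returns the index label where that value occurs. Here is a small sketch on an invented four-day series (not real AAPL data), assuming a date index like the one yfinance returns.

```python
import pandas as pd

# Made-up closing prices on four business days
close = pd.Series(
    [101.0, 108.5, 97.2, 104.0],
    index=pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06", "2025-01-07"]),
    name="Close",
)

peak_price, peak_date = close.max(), close.idxmax()
lowest_price, lowest_date = close.min(), close.idxmin()

# idxmax/idxmin return the index label, here a pandas Timestamp
print(f"Peak: {peak_price:.2f} on {peak_date:%Y-%m-%d}")
print(f"Low:  {lowest_price:.2f} on {lowest_date:%Y-%m-%d}")
```

If the same extreme price occurs on several days, `idxmax` and `idxmin` return the first occurrence.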

- To find the context related to the peak and lowest date, use a prompt like this:
  > For the `peak_date` and `lowest_date`, search for related Apple news using Serper. The Serper API key is saved in a .env file. Store the snippets of the found articles in a json string `news_snippets` that has these fields: peak_date, lowest_date, peak_news_snippets, lowest_news_snippets.

<span style="color:green; font-weight:bold;">Note:</span> The `SERPER_API_KEY` variable is already defined in this environment, so you do not need to create a `.env` file.

In [None]:
import os
import json
import requests
from dotenv import load_dotenv

# --- Load API key from .env ---
load_dotenv()
SERPER_API_KEY = os.getenv("SERPER_API_KEY")

# --- Prepare search queries using the found dates (date only, without the time part) ---
peak_query = f"Apple stock news around {peak_date:%Y-%m-%d}"
lowest_query = f"Apple stock news around {lowest_date:%Y-%m-%d}"

# --- Set up Serper API endpoint and headers ---
serper_url = "https://google.serper.dev/news"
headers = {
    "X-API-KEY": SERPER_API_KEY,
    "Content-Type": "application/json"
}

# --- Function to perform search and extract snippets ---
def get_news_snippets(query, max_results=3):
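    payload = json.dumps({"q": query})
    response = requests.post(serper_url, headers=headers, data=payload, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors such as an invalid API key
    data = response.json()

    # Collect the snippet text of the first few articles, if any were returned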
    snippets = []
    if "news" in data:
        for article in data["news"][:max_results]:
            snippet_text = article.get("snippet", "")
            snippets.append(snippet_text)
    return snippets

# --- Get news for both dates ---
peak_news_snippets = get_news_snippets(peak_query)
lowest_news_snippets = get_news_snippets(lowest_query)

# --- Combine results into a single JSON string ---
news_snippets = json.dumps({
    "peak_date": str(peak_date),
    "lowest_date": str(lowest_date),
    "peak_news_snippets": peak_news_snippets,
    "lowest_news_snippets": lowest_news_snippets
}, indent=4)

# Display the JSON string
print(news_snippets)
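The parsing step can be tried offline before spending API calls. The sketch below assumes only what the cell above already assumes: the response is a dictionary with a `news` list whose items may carry a `snippet` field. The helper name `extract_snippets` and the mock payload are hypothetical, not a real Serper response.

```python
def extract_snippets(data: dict, max_results: int = 3) -> list[str]:
    """Collect non-empty 'snippet' strings from a Serper-style response dict."""
    articles = data.get("news", [])
    snippets = [article.get("snippet", "").strip() for article in articles]
    return [s for s in snippets if s][:max_results]


# Hypothetical payload shaped like what the cell above expects
mock_response = {
    "news": [
        {"title": "Example headline A", "snippet": "First example snippet."},
        {"title": "Example headline B"},  # no snippet field
        {"title": "Example headline C", "snippet": "Second example snippet."},
    ]
}

print(extract_snippets(mock_response))
# A response without a "news" key yields an empty list instead of raising
print(extract_snippets({"message": "Unauthorized"}))
```

Unlike the cell above, this variant skips empty snippets, so the report prompt does not receive blank entries.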

## Step 5: Analyze Volatility

- To calculate the stock's volatility, use a prompt like this:
  > In the DataFrame aapl, find the overall volatility in percentage using the column Close. Volatility is the standard deviation of the daily percentage changes. Save the result in a variable called `volatility`.

In [None]:
# --- Calculate daily percentage change ---
daily_returns = aapl['Close'].pct_change()

# --- Compute volatility as standard deviation of daily returns (in percentage) ---
volatility = daily_returns.std() * 100

print(f"Overall Volatility: {volatility:.2f}%")
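To make the definition concrete, the sketch below computes the same quantity two ways on four invented prices: by hand with the standard library, and with pandas as in the cell above. It relies on pandas' `std()` using the sample standard deviation (dividing by n - 1) by default, which matches `statistics.stdev`.

```python
import statistics

import pandas as pd

prices = [100.0, 102.0, 101.0, 105.0]  # made-up closes, not real AAPL data

# Manual daily percentage changes: (p_t - p_{t-1}) / p_{t-1}
manual_returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
manual_vol = statistics.stdev(manual_returns) * 100  # sample std (n - 1)

# Same calculation with pandas; the first pct_change value is NaN and std() skips it
pandas_vol = pd.Series(prices).pct_change().std() * 100

print(f"manual: {manual_vol:.4f}%  pandas: {pandas_vol:.4f}%")
```

Note that this is a daily figure. It is not annualized, which matches how the notebook defines `volatility`.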

- To find and plot the rolling volatility, use a prompt like this:
  > Calculate the rolling volatility as the 20-day standard deviation of the daily percentage change and plot it. Identify days of high volatility where volatility is greater than mean + std. Save the days of high volatility in a DataFrame called `high_vol_days`.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# --- Compute daily returns ---
daily_returns = aapl['Close'].pct_change()

# --- Compute 20-day rolling volatility (in percentage) ---
rolling_volatility = daily_returns.rolling(window=20).std() * 100

# --- Plot the rolling volatility ---
plt.figure(figsize=(12, 6))
plt.plot(rolling_volatility, label="20-Day Rolling Volatility", color='blue')
plt.title("Apple 20-Day Rolling Volatility (%)")
plt.xlabel("Date")
plt.ylabel("Volatility (%)")
plt.legend()
plt.grid(True)
plt.show()

# --- Identify high-volatility days using threshold mean + std ---
vol_mean = rolling_volatility.mean()
vol_std = rolling_volatility.std()
threshold = vol_mean + vol_std

high_vol_days = aapl.loc[rolling_volatility > threshold].copy()
high_vol_days['Rolling_Volatility'] = rolling_volatility[rolling_volatility > threshold]

# Display the resulting DataFrame
high_vol_days.head()
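One consequence of the 20-day window is that the first part of the series has no rolling volatility at all. The sketch below illustrates this on synthetic random-walk prices (not real AAPL data). The seed, the 60-day length, and the 1% daily noise are arbitrary choices for the illustration.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices for 60 business days
rng = np.random.default_rng(0)
dates = pd.bdate_range("2025-01-01", periods=60)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, size=60)), index=dates)

rolling_vol = close.pct_change().rolling(window=20).std() * 100

# The first 20 values are NaN: one from pct_change plus 19 from the window warm-up
print("NaN count:", int(rolling_vol.isna().sum()))

# mean() and std() skip NaN, and NaN > threshold is False, so warm-up days are never flagged
threshold = rolling_vol.mean() + rolling_vol.std()
high_days = rolling_vol[rolling_vol > threshold]
print("Days above threshold:", len(high_days))
```

This is why the plot in the cell above starts roughly a month after the first trading day, and why `high_vol_days` cannot contain dates from that initial stretch.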

## Step 6: Report Generation

- To generate a report summarizing the insights, use a prompt like this:
  
  > Use gpt-4.1-mini to generate a summary that takes in these variables:
  > - ticker: stock ticker (string)
  > - start_date: analysis starting period (string)
  > - end_date: analysis end period (string)
  > - numerical metrics: total_return & volatility (in percentage)
  > - peak_date, peak_price
  > - lowest_date, lowest_price
  > - high_vol_days: pandas DataFrame showing high volatility days
  > - news_snippets: string containing snippet of news for the peak and lowest dates
  >
  > The OpenAI API key is stored in the .env file. The variables are already defined in the notebook.

In [None]:
from openai import OpenAI
from dotenv import load_dotenv
import os

# --- Load the API Key from .env ---
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# --- Prepare input prompt for GPT-4.1-mini ---
prompt = f"""
You are a financial analyst. Write a clear, concise summary of the following stock analysis.

- **Ticker:** {ticker}
- **Analysis Period:** {start_date} to {end_date}
- **Total Return:** {total_return:.2f}%
- **Overall Volatility:** {volatility:.2f}%
- **Peak Date:** {peak_date} (Price: {peak_price})
- **Lowest Date:** {lowest_date} (Price: {lowest_price})

The stock experienced periods of high volatility on the following days:
{high_vol_days[['Close', 'Rolling_Volatility']].to_string(index=True)}

Relevant news snippets for context:
{news_snippets}

Write the summary as a short narrative for a financial report, using professional but accessible language.
"""

# --- Generate summary with GPT-4.1-mini ---
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": prompt}],
)

# --- Extract and display the generated summary ---
summary = response.choices[0].message.content
print(summary)

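- Alternatively, the same prompt can be sent to a local model served by Ollama (here `qwen2.5:7b-instruct-q4_0`) instead of the OpenAI API:

In [None]: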
import requests

prompt = f"""
You are a financial analyst. Write a clear, concise summary of the following stock analysis.

- **Ticker:** {ticker}
- **Analysis Period:** {start_date} to {end_date}
- **Total Return:** {total_return:.2f}%
- **Overall Volatility:** {volatility:.2f}%
- **Peak Date:** {peak_date} (Price: {peak_price})
- **Lowest Date:** {lowest_date} (Price: {lowest_price})

The stock experienced periods of high volatility on the following days:
{high_vol_days[['Close', 'Rolling_Volatility']].to_string(index=True)}

Relevant news snippets for context:
{news_snippets}

Write the summary as a short narrative for a financial report, using professional but accessible language.
"""

# --- Send request to local Ollama model ---
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:7b-instruct-q4_0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # disable streaming to get a single JSON object
    },
)

# --- Parse model output ---
result = response.json()

# In non-stream mode, Ollama wraps output under result["message"]["content"]
summary = result["message"]["content"]
print(summary)

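- To try different local models without repeating the prompt, the request can be wrapped in a function that takes the model name as a parameter:

In [None]: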
import requests

def generate_financial_summary(
    model: str,
    ticker: str,
    start_date: str,
    end_date: str,
    total_return: float,
    volatility: float,
    peak_date: str,
    peak_price: float,
    lowest_date: str,
    lowest_price: float,
    high_vol_days,
    news_snippets: str,
):
    """Generate a professional stock summary using a local Ollama model."""
    
    prompt = f"""
    You are a financial analyst. Write a clear, concise summary of the following stock analysis.

    - **Ticker:** {ticker}
    - **Analysis Period:** {start_date} to {end_date}
    - **Total Return:** {total_return:.2f}%
    - **Overall Volatility:** {volatility:.2f}%
    - **Peak Date:** {peak_date} (Price: {peak_price})
    - **Lowest Date:** {lowest_date} (Price: {lowest_price})

    The stock experienced periods of high volatility on the following days:
    {high_vol_days[['Close', 'Rolling_Volatility']].to_string(index=True)}

    Relevant news snippets for context:
    {news_snippets}

    Write the summary as a short narrative for a financial report, using professional but accessible language.
    """

    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
    )

    result = response.json()
    return result["message"]["content"]
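The function above indexes straight into `result["message"]["content"]`, which raises a bare `KeyError` if the server replies with something else, for example when the model has not been pulled. Below is a minimal sketch of a more defensive parser. It assumes failures arrive as a JSON object with an `error` key, which is worth verifying against your Ollama version. The helper name and the sample payloads are hypothetical.

```python
def extract_ollama_content(result: dict) -> str:
    """Return the assistant text from a non-streaming /api/chat style response dict."""
    if "error" in result:
        raise RuntimeError(f"Model server returned an error: {result['error']}")
    try:
        return result["message"]["content"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {sorted(result)}") from exc


# Hypothetical payloads for illustration; not captured from a real server
ok = {"model": "demo", "message": {"role": "assistant", "content": "Summary text."}, "done": True}
print(extract_ollama_content(ok))

try:
    extract_ollama_content({"error": "model not found"})
except RuntimeError as err:
    print(err)
```

A helper like this could replace the last two lines of `generate_financial_summary` so that a missing model produces a readable message rather than a traceback.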

In [None]:
summary = generate_financial_summary(
    "qwen2.5:7b-instruct-q4_0",
    ticker,
    start_date,
    end_date,
    total_return,
    volatility,
    peak_date,
    peak_price,
    lowest_date,
    lowest_price,
    high_vol_days,
    news_snippets,
)

print(summary)