# Using Standard Deviations and Z-Scores to Predict the Future

# What is a Standard Deviation?
To understand the z-score, we first need to understand the standard deviation, because it is the basis for the calculation. Standard deviation measures how much the values in a dataset vary around their average. You can use it to measure the spread of almost any numeric quantity.

For instance, say you take a trip to Antarctica and come across a colony of adult Emperor penguins. Thousands and thousands of them are clustered together, and almost all of them are roughly the same size, just under 4 feet tall. But you notice that a few are a little shorter or taller, and some are much shorter or much taller than the rest. Right there you can see the variation in the heights of Emperor penguins. The heights in your data set (the colony in front of you) can be added up and averaged, and the typical distance of a height from that average can then be calculated: that is the standard deviation.

After lots of counting and measuring, you calculate that the average height is 45 inches and the standard deviation is 3 inches. Assuming the heights are roughly normally distributed, about 68% of the penguins are between 42 and 48 inches tall (within one standard deviation, +/- 3 inches), about 95% are between 39 and 51 inches (two standard deviations), and about 99.7% are between 36 and 54 inches tall (three standard deviations). Just a handful of penguins are shorter than 36 inches or taller than 54 inches.

This is known as the Empirical Rule (no relation to Emperor Penguins 😉) which states that 99.7% of values observed within a normal distribution lie within 3 standard deviations of the mean.

In other words, about 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
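The rule can be checked numerically. The sketch below is only an illustration: it simulates a hypothetical colony of 100,000 penguins with NumPy (the sample size and seed are assumptions, the mean of 45 inches and standard deviation of 3 inches come from the example above) and counts how many heights fall within one, two, and three standard deviations of the mean.

```python
import numpy as np

# Hypothetical colony: 100,000 simulated heights, mean 45 in, SD 3 in
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=45.0, scale=3.0, size=100_000)

mean = heights.mean()
sd = heights.std()

def share_within(k):
    """Fraction of heights within k standard deviations of the mean."""
    return float(np.mean(np.abs(heights - mean) <= k * sd))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed shares should come out close to 68%, 95%, and 99.7%, with small differences due to sampling.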

# What’s a Z-Score?
A z-score is just the number of standard deviations a value is from the mean, positive or negative.

In other words, if a value has a z-score of 0, the data point's value is exactly the same as the average value. A z-score of 1.0 means the value is one standard deviation from the mean, 2.0 means two standard deviations, and so on. A positive z-score shows the value is above the mean; a negative z-score shows it is below the mean.

Using the penguin example, say you choose a penguin and want to assign a z-score to its height. If the penguin is 49.5 inches tall, then its z-score is +1.5, because it is 4.5 inches above the average, and 4.5 inches is 1.5 standard deviations of 3 inches. A penguin that is 39 inches tall has a z-score of -2, because it is two standard deviations below the mean of 45.
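As a quick check of the arithmetic, here is a minimal sketch of the calculation. The helper name `z_score` and the constant names are illustrative choices, not part of any library; the numbers are the ones from the penguin example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

MEAN_HEIGHT = 45.0  # inches, from the penguin example
SD_HEIGHT = 3.0     # inches, from the penguin example

print(z_score(49.5, MEAN_HEIGHT, SD_HEIGHT))  # 1.5
print(z_score(39.0, MEAN_HEIGHT, SD_HEIGHT))  # -2.0
```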

# How Do Traders Use Z-Scores?
Traders use z-scores in a host of different ways, but usually the scores relate to the volatility of a stock, measured as the standard deviation of the stock’s price over a period of time. A favorite tool of technical analysts is Bollinger Bands.

Traders choose a time period, compute a simple moving average of the stock’s price over that period, and plot bands above and below the moving average, typically at a distance of two standard deviations of the price. When the price approaches or crosses one of these Bollinger Bands, that is taken as an indicator that the stock is either overbought (above the upper band) or oversold (below the lower band). The z-score tells the trader exactly how many standard deviations the price is above or below the mean.

So, if the trader is using a two standard deviation Bollinger Band channel, then a stock with a z-score of +2.1 would be overbought, and if the stock’s z-score was -2.1, it would be seen as oversold and a buying opportunity.
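To make this concrete, here is a small sketch of a rolling z-score on synthetic prices. The function names `bollinger_z_score` and `classify`, the 20-bar window, and the simulated price series are all assumptions made for illustration. Note that pandas' rolling `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

def bollinger_z_score(close, window=20):
    """Z-score of each close relative to its rolling mean and rolling SD."""
    sma = close.rolling(window=window).mean()
    sd = close.rolling(window=window).std()
    return (close - sma) / sd

def classify(z, threshold=2.0):
    """Label a z-score against a +/- threshold band (hypothetical helper)."""
    if z > threshold:
        return "overbought"
    if z < -threshold:
        return "oversold"
    return "neutral"

# Synthetic prices: a noisy flat series with a spike on the final bar
rng = np.random.default_rng(seed=1)
prices = pd.Series(100 + rng.normal(0, 0.5, size=60))
prices.iloc[-1] = 105.0

z = bollinger_z_score(prices)
print(round(z.iloc[-1], 2), classify(z.iloc[-1]))
print(classify(2.1), classify(-2.1), classify(0.3))
```

The first 19 z-scores are `NaN` because a full 20-bar window is not yet available; the spike on the last bar pushes its z-score well above +2.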

Then, using other indicators, such as the Relative Strength Index (RSI), paired with the Bollinger Bands, they confirm the overbought or oversold indicators, and this may give them enough confidence to act on a trade.

# Bollinger Bands
Bollinger Bands, a popular tool among investors and traders, help gauge the volatility of stocks and other securities to determine whether they are over- or undervalued. Developed in the 1980s by financial analyst John Bollinger, the bands appear on stock charts as three lines that move with the price. The center line is typically the stock price's 20-day simple moving average (SMA). The upper and lower bands are set at a certain number of standard deviations, usually two, above and below the middle line.

The bands widen when a stock's price becomes more volatile and contract when it is more stable. Many traders see stocks as overbought as their price nears the upper band and oversold as they approach the lower band, signaling an opportune time to trade.
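The widening effect can be seen on simulated data. The sketch below is illustrative only: the price series is a synthetic random walk with a calm half and a volatile half (both made up for the example), and the band width is simply the upper band minus the lower band.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)

# Synthetic closes: 100 calm bars followed by 100 volatile bars
calm = rng.normal(0, 0.2, size=100)
volatile = rng.normal(0, 2.0, size=100)
close = pd.Series(100 + np.cumsum(np.concatenate([calm, volatile])))

window = 20
sma = close.rolling(window).mean()
sd = close.rolling(window).std()
upper = sma + 2 * sd
lower = sma - 2 * sd
width = upper - lower  # equals 4 * sd

# Skip bars 100-119, whose windows mix calm and volatile prices
print(f"mean band width, calm bars:     {width.iloc[:100].mean():.2f}")
print(f"mean band width, volatile bars: {width.iloc[120:].mean():.2f}")
```

The average band width in the volatile half should be several times larger than in the calm half.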


In [None]:
! pip install matplotlib -q

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # the plotting API lives in matplotlib.pyplot

In [None]:
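# Two arrays with the same mean (5) but very different spread
arr1 = np.array([5, 5, 5, 5, 5])    # no variation: standard deviation 0
arr2 = np.array([21, 1, 1, 1, 1])   # one outlier: standard deviation 8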

In [None]:
arr1.mean(), arr1.std()

In [None]:
arr2.mean(), arr2.std()

In [None]:
pd.DataFrame(arr1).hist()

In [None]:
pd.DataFrame(arr2).hist()

In [None]:
import yfinance as yf

In [None]:
# Fetch UBS stock data with a 1-hour timeframe
ticker = yf.Ticker("UBS")
data = ticker.history(period="180d", interval="1h")  # Adjust the period as needed

In [None]:
data.tail(5)

In [None]:
# Calculate the 50-period Simple Moving Average (SMA); 20 is the conventional default
data['SMA'] = data['Close'].rolling(window=50).mean()

# Calculate the 50-period Standard Deviation (SD), using the same window as the SMA
data['SD'] = data['Close'].rolling(window=50).std()

# Calculate the Upper Bollinger Band (UB) and Lower Bollinger Band (LB)
data['UB'] = data['SMA'] + 2 * data['SD']
data['LB'] = data['SMA'] - 2 * data['SD']

In [None]:
import plotly.graph_objs as go

# Create a Plotly figure
fig = go.Figure()

# Add the price chart
fig.add_trace(go.Scatter(x=data.index, y=data['Close'], mode='lines', name='Price'))

# Add the Upper Bollinger Band (UB) and shade the area
fig.add_trace(go.Scatter(x=data.index, y=data['UB'], mode='lines', name='Upper Bollinger Band', line=dict(color='red')))
fig.add_trace(go.Scatter(x=data.index, y=data['LB'], fill='tonexty', mode='lines', name='Lower Bollinger Band', line=dict(color='green')))

# Add the Middle Bollinger Band (MA)
fig.add_trace(go.Scatter(x=data.index, y=data['SMA'], mode='lines', name='Middle Bollinger Band', line=dict(color='blue')))

# Customize the chart layout
fig.update_layout(title='UBS Stock Price with Bollinger Bands',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  showlegend=True)

# Show the chart
fig.show()

In [None]:
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
import os
from google.oauth2 import service_account
from dotenv import dotenv_values
import json
import vertexai
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings, GoogleGenerativeAI
from langchain_openai import ChatOpenAI

In [None]:
config = dotenv_values("./keys/.env")
with open("./keys/complete-tube-421007-208a4862c992.json") as source:
    info = json.load(source)

vertex_credentials = service_account.Credentials.from_service_account_info(info)
vertexai.init(
    project=config["PROJECT"],
    location=config["REGION"],
    credentials=vertex_credentials,
)
google_api_key = config["GEMINI-API-KEY"]
os.environ["GEMINI_API_KEY"] = google_api_key

In [None]:
OPENAI_API_KEY = config.get("OPENAI_API_KEY")

In [None]:
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

In [None]:
columns = list(data.columns)

In [None]:
data.head()

In [None]:
# Fetch AAPL stock data with a 1-hour timeframe
aapl = yf.Ticker("AAPL")
data = aapl.history(period="180d", interval="1h")  # Adjust the period as needed

# Refresh the column list so it matches the newly fetched DataFrame
columns = list(data.columns)

In [None]:
type(data)

In [None]:
# llm = ChatGoogleGenerativeAI(
#                     model="gemini-1.5-pro-001", credentials=vertex_credentials
#                 )

In [None]:
context = """
Bollinger Bands consist of three lines:

Middle Band (MA): This is the simple moving average (SMA) of the closing prices over a specified period. A common choice is a 20-period SMA.
Upper Band (UB): This is the sum of the 20-period SMA and two times the 20-period standard deviation (SD) of the closing prices.
Lower Band (LB): This is the 20-period SMA minus two times the 20-period SD of the closing prices.

How to calculate  Bollinger Bands in Python with a pandas dataframe

# Calculate the 20-period Simple Moving Average (SMA)
data['SMA'] = data['Close'].rolling(window=20).mean()

# Calculate the 20-period Standard Deviation (SD)
data['SD'] = data['Close'].rolling(window=20).std()

# Calculate the Upper Bollinger Band (UB) and Lower Bollinger Band (LB)
data['UB'] = data['SMA'] + 2 * data['SD']
data['LB'] = data['SMA'] - 2 * data['SD']



Interpreting Bollinger Bands

Bollinger Bands can be interpreted as follows:

When the price moves close to the upper band (UB), it may indicate overbought conditions, suggesting a potential price reversal to the downside.
Conversely, when the price approaches the lower band (LB), it may indicate oversold conditions, suggesting a potential price reversal to the upside.
The middle band (MA) represents the average price over the specified period and can serve as a reference point.
Traders often look for price crossovers of the bands or significant price deviations from the bands as potential trading signals.

Bollinger Z-Score
The z-score (Z) measures how far, and in which direction, the closing price (C) lies from the mean (µ, the n-period moving average), in units of the n-period standard deviation (σ). The formula is: Z = (C - µ) / σ. Positive or negative values show that the closing price (C) is above (C > µ) or below (C < µ) the mean, respectively.

Output: return a JSON dictionary with the key Bollinger-z-Score whose value is the calculated z-score.
"""

In [None]:
# I will pass you the instructions for creating Bollinger Bands and how to interpret them.
# Context = {context}

In [None]:
instruction = f"""You are a bot specialized in trading and creating alpha. I will pass you a DataFrame of historical prices with the following columns: {columns}.

Calculate the Bollinger Z-Score for an n-period of 20.

You must always return valid JSON fenced by a markdown code block. Do not return any additional text.

"""

In [None]:
pandas_df_agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0, model="gpt-4o"),
    df=data,
    verbose=True,
    agent_type="tool-calling",
    return_intermediate_steps=True,
    handle_parsing_errors=True,
    allow_dangerous_code=True,  # the agent executes generated Python code
)

In [None]:
results = pandas_df_agent.invoke(instruction)

In [None]:
results.keys()

In [None]:
results['output']