# High/Low Probability Analysis

This notebook estimates, for each 5-minute bar of the trading day, the probability that the day's High or Low has already been established by that bar.

**Logic:**
1. **Load Data:** Read 5-minute OHLC data.
2. **Identify Day Extremes:** For each date, find the actual High and Low of the entire session.
3. **Track Progress:** For every bar, calculate the "High So Far" and "Low So Far".
4. **Compare:** Check if the "High So Far" equals the "Day High" (and likewise the "Low So Far" against the "Day Low"). If yes, that extreme has already been seen.
5. **Aggregate:** Group by bar number (1st bar, 2nd bar, etc.) and calculate the fraction of days on which the extreme was already seen by that bar.
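
The five steps above can be traced by hand on a tiny example. The sketch below uses a made-up frame with two sessions of three bars each (the values and the `toy` / `toy_probs` names are illustrative, not taken from the NIFTY data) and applies the same pandas operations the notebook uses later.

```python
import pandas as pd

# Hypothetical toy data: two sessions ("d1", "d2") with three bars each.
toy = pd.DataFrame({
    "date_only": ["d1", "d1", "d1", "d2", "d2", "d2"],
    "high": [10.0, 12.0, 11.0, 20.0, 19.0, 18.0],
    "low": [9.0, 10.0, 8.0, 18.0, 17.0, 17.5],
})

# Step 2: session extremes, broadcast to every bar of that session.
toy["day_high"] = toy.groupby("date_only")["high"].transform("max")
toy["day_low"] = toy.groupby("date_only")["low"].transform("min")

# Step 3: running extremes up to and including the current bar.
toy["high_so_far"] = toy.groupby("date_only")["high"].cummax()
toy["low_so_far"] = toy.groupby("date_only")["low"].cummin()

# Step 4: the extreme is "set" once the running value matches the session value.
toy["high_set"] = toy["high_so_far"] == toy["day_high"]
toy["low_set"] = toy["low_so_far"] == toy["day_low"]

# Step 5: the mean of a boolean column per bar number is the fraction of days.
toy["bar_index"] = toy.groupby("date_only").cumcount() + 1
toy_probs = toy.groupby("bar_index")[["high_set", "low_set"]].mean()
print(toy_probs)
```

In this toy data the high is set by bar 1 on one of the two days (0.5) and by bar 2 on both (1.0), while the low is set on neither day at bar 1, on one day at bar 2, and on both at bar 3.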

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Configuration
DATA_PATH = "nifty50_minute_complete-5min.csv"

# Optional: Set specific date range (Set to None to use all data)
# START_DATE = "2023-01-01"
# END_DATE = "2023-12-31"
START_DATE = None
END_DATE = None

In [2]:
print("Loading data...")
df = pd.read_csv(DATA_PATH)

# 1. Parse Dates
# Assuming 'date' column exists and is in a parseable format
df['datetime'] = pd.to_datetime(df['date'])
df['date_only'] = df['datetime'].dt.date

# 2. Sort Data
df = df.sort_values('datetime').reset_index(drop=True)

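# 3. Filter Range (if configured)
if START_DATE:
    df = df[df['datetime'] >= pd.to_datetime(START_DATE)]
if END_DATE:
    # Compare on the calendar date so intraday bars on END_DATE itself are kept;
    # comparing raw timestamps would cut off at midnight and drop that whole session.
    df = df[df['datetime'].dt.normalize() <= pd.to_datetime(END_DATE)]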

print(f"Data loaded. Rows: {len(df)}, Days: {df['date_only'].nunique()}")
print(f"Range: {df['date_only'].min()} to {df['date_only'].max()}")
df.head()

Loading data...
Data loaded. Rows: 186482, Days: 2489
Range: 2015-01-09 to 2025-02-07


Unnamed: 0,date,open,high,low,close,open-s,high-s,low-s,close-s,datetime,date_only
0,2015-01-09 09:15:00,8285.45,8301.3,8285.45,8301.2,,,,,2015-01-09 09:15:00,2015-01-09
1,2015-01-09 09:20:00,8300.5,8303.0,8293.25,8301.0,15.05,1.7,7.8,-0.2,2015-01-09 09:20:00,2015-01-09
2,2015-01-09 09:25:00,8301.65,8302.55,8286.8,8294.15,1.15,-0.45,-6.45,-6.85,2015-01-09 09:25:00,2015-01-09
3,2015-01-09 09:30:00,8294.1,8295.75,8280.65,8288.5,-7.55,-6.8,-6.15,-5.65,2015-01-09 09:30:00,2015-01-09
4,2015-01-09 09:35:00,8289.1,8290.45,8278.0,8283.45,-5.0,-5.3,-2.65,-5.05,2015-01-09 09:35:00,2015-01-09
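
The per-bar statistics depend on every session having the same number of bars (the plot further down assumes 75). A quick count of bars per date can surface truncated or irregular sessions before they skew the later bar indices. The sketch below runs on a made-up sample. `bars_per_day` and `EXPECTED_BARS` are hypothetical names, and the value 3 only fits the toy data.

```python
import pandas as pd

# Hypothetical sample: one full three-bar session and one truncated session.
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2024-01-01 09:15", "2024-01-01 09:20", "2024-01-01 09:25",
        "2024-01-02 09:15", "2024-01-02 09:20",
    ])
})
sample["date_only"] = sample["datetime"].dt.date

EXPECTED_BARS = 3  # assumption for the toy data; the notebook's plot assumes 75


def bars_per_day(frame: pd.DataFrame) -> pd.Series:
    """Count the bars recorded for each session date."""
    return frame.groupby("date_only").size()


counts = bars_per_day(sample)
irregular_days = counts[counts != EXPECTED_BARS]
print(irregular_days)
```

On the real data the same call would be `bars_per_day(df)`. Sessions that appear in `irregular_days` contribute to fewer bar indices, which is why `total_days` is computed per bar rather than once overall.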


In [3]:
# --- CORE LOGIC ---

# 1. Calculate the 'True' High and Low for each Day
df['day_high'] = df.groupby('date_only')['high'].transform('max')
df['day_low'] = df.groupby('date_only')['low'].transform('min')

# 2. Calculate the 'Running' High and Low (what we see up to that moment)
df['high_so_far'] = df.groupby('date_only')['high'].cummax()
df['low_so_far'] = df.groupby('date_only')['low'].cummin()

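# 3. Determine if the High/Low has been established
# high_so_far can never exceed day_high (and low_so_far can never fall below day_low),
# so >= / <= here is equivalent to an equality check: the flag turns True on the bar
# that prints the extreme and stays True for the rest of the session.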
df['is_day_high_set'] = (df['high_so_far'] >= df['day_high'])
df['is_day_low_set'] = (df['low_so_far'] <= df['day_low'])
df['is_either_set'] = df['is_day_high_set'] | df['is_day_low_set']

# 4. Assign Bar Indices (1st bar of the day, 2nd bar, etc.)
df['bar_index'] = df.groupby('date_only').cumcount() + 1

df[['datetime', 'bar_index', 'high', 'day_high', 'high_so_far', 'is_day_high_set']].head(10)

Unnamed: 0,datetime,bar_index,high,day_high,high_so_far,is_day_high_set
0,2015-01-09 09:15:00,1,8301.3,8303.0,8301.3,False
1,2015-01-09 09:20:00,2,8303.0,8303.0,8303.0,True
2,2015-01-09 09:25:00,3,8302.55,8303.0,8303.0,True
3,2015-01-09 09:30:00,4,8295.75,8303.0,8303.0,True
4,2015-01-09 09:35:00,5,8290.45,8303.0,8303.0,True
5,2015-01-09 09:40:00,6,8288.3,8303.0,8303.0,True
6,2015-01-09 09:45:00,7,8287.65,8303.0,8303.0,True
7,2015-01-09 09:50:00,8,8284.25,8303.0,8303.0,True
8,2015-01-09 09:55:00,9,8283.6,8303.0,8303.0,True
9,2015-01-09 10:00:00,10,8287.35,8303.0,8303.0,True
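
The flags should satisfy a few invariants: the running extremes stay within the session extremes, a flag never flips back to False within a day, and both flags are True on the last bar of every session. The sketch below checks these on a made-up frame. `flags_are_consistent` is a hypothetical helper, and it assumes rows are already sorted by time within each day.

```python
import pandas as pd


def flags_are_consistent(frame: pd.DataFrame) -> bool:
    """Return True if the running-extreme flags obey the expected invariants."""
    day = frame["date_only"]
    # Running extremes must stay inside the session extremes.
    bounded = bool(
        (frame["high_so_far"] <= frame["day_high"]).all()
        and (frame["low_so_far"] >= frame["day_low"]).all()
    )
    sticky, complete = True, True
    for col in ("is_day_high_set", "is_day_low_set"):
        flags = frame[col].astype(int)
        # Within a day the flag may only step from 0 to 1, never back.
        steps = flags.groupby(day).diff().fillna(0)
        sticky = sticky and bool((steps >= 0).all())
        # On the last bar of every session the extreme must have been seen.
        complete = complete and bool(flags.groupby(day).last().eq(1).all())
    return bounded and sticky and complete


# Hypothetical toy data, built with the same operations as the cell above.
toy = pd.DataFrame({
    "date_only": ["d1", "d1", "d1", "d2", "d2"],
    "high": [10.0, 12.0, 11.0, 20.0, 19.0],
    "low": [9.0, 10.0, 8.0, 18.0, 17.0],
})
toy["day_high"] = toy.groupby("date_only")["high"].transform("max")
toy["day_low"] = toy.groupby("date_only")["low"].transform("min")
toy["high_so_far"] = toy.groupby("date_only")["high"].cummax()
toy["low_so_far"] = toy.groupby("date_only")["low"].cummin()
toy["is_day_high_set"] = toy["high_so_far"] >= toy["day_high"]
toy["is_day_low_set"] = toy["low_so_far"] <= toy["day_low"]

ok = flags_are_consistent(toy)
print(ok)
```

Calling `flags_are_consistent(df)` after the core-logic cell would apply the same checks to the full dataset. A False result would most likely point to unsorted rows or duplicated timestamps.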


In [4]:
# --- PROBABILITY CALCULATION ---

stats = df.groupby('bar_index').agg(
    total_days=('date_only', 'nunique'),
    high_set_count=('is_day_high_set', 'sum'),
    low_set_count=('is_day_low_set', 'sum'),
    either_set_count=('is_either_set', 'sum')
)

# Calculate Probabilities
stats['prob_high_set'] = stats['high_set_count'] / stats['total_days']
stats['prob_low_set'] = stats['low_set_count'] / stats['total_days']
stats['prob_either_set'] = stats['either_set_count'] / stats['total_days']

# Display first 20 bars ('bar_index' is the index of `stats` after the groupby, not a column)
stats[['prob_high_set', 'prob_low_set', 'prob_either_set']].head(20)
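
After `groupby('bar_index').agg(...)`, pandas moves the grouping key into the index of the result, so `bar_index` is no longer a column and selecting it by label raises a `KeyError`. The sketch below shows this on a made-up frame (`demo`, `grouped`, `flat` and `alt` are illustrative names) along with two common ways to get the key back as a column.

```python
import pandas as pd

# Hypothetical data: two bars, two observations each.
demo = pd.DataFrame({
    "bar_index": [1, 1, 2, 2],
    "flag": [True, False, True, True],
})

# Default behaviour: the grouping key becomes the index of the result.
grouped = demo.groupby("bar_index").agg(prob=("flag", "mean"))
print("bar_index" in grouped.columns)  # False
print(grouped.index.name)              # bar_index

# Option 1: turn the index back into an ordinary column afterwards.
flat = grouped.reset_index()

# Option 2: ask groupby to keep the key as a column from the start.
alt = demo.groupby("bar_index", as_index=False).agg(prob=("flag", "mean"))
print(flat)
```

Either option works. Leaving `bar_index` as the index is also fine, as long as later cells read it through `stats.index` rather than `stats['bar_index']`.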

In [5]:
# --- VISUALIZATION ---

plt.figure(figsize=(12, 6))

# 'bar_index' is the index of `stats` (the groupby key), so plot against stats.index
plt.plot(stats.index, stats['prob_high_set'], label='Probability High is Set', color='green', linewidth=2)
plt.plot(stats.index, stats['prob_low_set'], label='Probability Low is Set', color='red', linewidth=2)
plt.plot(stats.index, stats['prob_either_set'], label='Prob High OR Low is Set', color='blue', linestyle='--')

plt.title('Probability that Day High/Low is Established by Bar N', fontsize=14)
plt.xlabel('Bar Index (5-min intervals)', fontsize=12)
plt.ylabel('Probability (0.0 - 1.0)', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)
plt.xlim(1, 75) # Assuming standard 75 bars in a trading day
plt.ylim(0, 1.05)

plt.show()