# Volatility Scan Analysis

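This notebook analyzes the results from the volatility scanner to identify trading candidates that combine high volatility, low spread, and adequate liquidity.
We filter for symbols in the best 45% on each metric:
1. Volatility%: top 45% (>= 55th percentile)
2. Spread%: bottom 45% (<= 45th percentile)
3. Avg Volume: top 45% (>= 55th percentile)

A Spread% of 0.15% is also drawn as a reference target on the scatter plot.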


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import glob
import os

# Set style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = [12, 6]

## Load Data

Find the latest volatility scan CSV file.

In [None]:
# Find latest scan file
list_of_files = glob.glob('volatility_scan_*.csv') 
if not list_of_files:
    print("No scan files found.")
else:
    latest_file = max(list_of_files, key=os.path.getctime)
    print(f"Loading: {latest_file}")
    
    df = pd.read_csv(latest_file)
    # Older scans have no 'Avg Volume' column; add it as 0 so later cells still run
    if 'Avg Volume' not in df.columns:
        df['Avg Volume'] = 0
        
    print(f"Total symbols: {len(df)}")
    display(df.head())
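
The loading cell above only prints a message when no file matches, so `df` stays undefined and later cells fail with a `NameError`. It also ranks files with `os.path.getctime`, which on Unix reports the last metadata change rather than creation time. A minimal sketch of a stricter lookup follows; the helper name `find_latest_scan` is hypothetical and not part of the original notebook, and it assumes that modification time is an acceptable proxy for "latest".

```python
import glob
import os


def find_latest_scan(pattern="volatility_scan_*.csv"):
    """Return the most recently modified file matching `pattern`.

    Hypothetical helper: raises instead of printing, so a missing scan
    stops the notebook at the point where the problem occurs.
    """
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"No files match {pattern!r}")
    # getmtime is the last content modification time on all platforms
    return max(matches, key=os.path.getmtime)
```

With this helper, the loading step would reduce to `df = pd.read_csv(find_latest_scan())`.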

## Data Overview

In [None]:
df.describe()

## Exploratory Data Analysis

Let's look at the distribution of Volatility and Spread.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Volatility Distribution
sns.histplot(data=df, x='Volatility%', bins=50, kde=True, ax=axes[0])
axes[0].set_title('Distribution of Volatility%')

# Spread Distribution (filtering outliers for better view)
spread_view = df[df['Spread%'] < 2.0]  # Filter extreme spreads for visualization
sns.histplot(data=spread_view, x='Spread%', bins=50, kde=True, ax=axes[1], color='orange')
axes[1].set_title('Distribution of Spread% (View < 2.0%)')

plt.tight_layout()
plt.show()

## Volume Analysis

Analyze the volume distribution to filter out illiquid symbols.
We keep the top 45% of symbols by volume (>= 55th percentile). The same cell also computes the volatility and spread cutoffs used in the Filter Candidates section.

In [None]:
# Calculate Thresholds
vol_cutoff = df['Volatility%'].quantile(0.55)
spread_cutoff = df['Spread%'].quantile(0.45)
volume_cutoff = df['Avg Volume'].quantile(0.55)

print(f"Volatility Cutoff (Top 45%): >= {vol_cutoff:.4f}%")
print(f"Spread Cutoff (Bottom 45%): <= {spread_cutoff:.4f}%")
print(f"Volume Cutoff (Top 45%): >= {volume_cutoff:.0f}")

plt.figure(figsize=(10, 5))
# A log scale cannot show zero volumes (e.g. the 0 placeholder for old scans), so plot positive values only
sns.histplot(data=df[df['Avg Volume'] > 0], x='Avg Volume', bins=50, log_scale=True, kde=False)
plt.axvline(x=volume_cutoff, color='r', linestyle='--', label=f'55th Percentile ({volume_cutoff:.0f})')
plt.title('Distribution of Avg Volume (Log Scale)')
plt.legend()
plt.show()
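
To make the percentile semantics concrete, here is a minimal sketch on made-up data (the `toy` frame and its values are purely illustrative, not scanner output). It shows that `quantile(0.55)` interpolates between data points by default, so on a small sample the share of rows kept by `>=` only approximates 45%.

```python
import pandas as pd

# Hypothetical toy data: ten symbols with made-up average volumes
toy = pd.DataFrame({
    "Symbol": [f"SYM{i}" for i in range(10)],
    "Avg Volume": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})

# 55th percentile; pandas interpolates linearly between neighbours by default
toy_cutoff = toy["Avg Volume"].quantile(0.55)

# Keep rows at or above the cutoff, mirroring the volume rule in this notebook
toy_liquid = toy[toy["Avg Volume"] >= toy_cutoff]
toy_share = len(toy_liquid) / len(toy)

print(f"cutoff={toy_cutoff:.1f}, kept {len(toy_liquid)} of {len(toy)} ({toy_share:.0%})")
```

On a full scan with many distinct values the kept share should sit close to 45%; heavy ties (for example many symbols with a volume of 0) can move it noticeably.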

### Volatility vs Spread Scatter Plot

Ideally, we want symbols in the **top-left corner** (High Volatility, Low Spread).
Points are colored and sized by Avg Volume to show liquidity.

In [None]:
plt.figure(figsize=(12, 8))
# Restrict to Spread% < 1.0 and Volatility% < 5.0 so outliers do not compress the plot
plot_df = df[(df['Spread%'] < 1.0) & (df['Volatility%'] < 5.0)]

sns.scatterplot(
    data=plot_df, 
    x='Spread%', 
    y='Volatility%', 
    hue='Avg Volume',
    size='Avg Volume',
    sizes=(10, 200),
    alpha=0.6,
    palette='viridis'
)

plt.title('Volatility% vs Spread% (Sized/Colored by Volume)')
plt.axvline(x=0.15, color='r', linestyle='--', label='Spread Target < 0.15%')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
plt.show()

## Filter Candidates

We keep only the symbols that fall in the best 45% on all three metrics, the **Sweet Spot**:
1. **Volatility%**: Top 45% (>= 55th percentile)
2. **Spread%**: Bottom 45% (<= 45th percentile)
3. **Avg Volume**: Top 45% (>= 55th percentile)

In [None]:
# Apply Filters
candidates = df[
    (df['Volatility%'] >= vol_cutoff) &
    (df['Spread%'] <= spread_cutoff) &
    (df['Avg Volume'] >= volume_cutoff)
].copy()

# --- Filter Analysis ---
print("--- Filter Efficiency Analysis ---")
print(f"Total Symbols: {len(df)}")

pass_vol = len(df[df['Volatility%'] >= vol_cutoff])
pass_spread = len(df[df['Spread%'] <= spread_cutoff])
pass_volume = len(df[df['Avg Volume'] >= volume_cutoff])

print(f"Pass Volatility (>= {vol_cutoff:.4f}%): {pass_vol} ({pass_vol/len(df):.1%})")
print(f"Pass Spread     (<= {spread_cutoff:.4f}%): {pass_spread} ({pass_spread/len(df):.1%})")
print(f"Pass Volume     (>= {volume_cutoff:.0f}):   {pass_volume} ({pass_volume/len(df):.1%})")

# Share of symbols that pass all three filters
print(f"Intersection (All 3): {len(candidates)} ({len(candidates)/len(df):.1%})")
print("-" * 30)

# Sort by Volatility
candidates = candidates.sort_values(by='Volatility%', ascending=False)

print(f"Found {len(candidates)} candidates in the 'Sweet Spot'.")

# Display Top 20
display_cols = ['Symbol', 'Volatility%', 'Spread%', 'Avg Volume', 'Price', 'Path']
display(candidates[display_cols].head(20))
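
When reading the intersection percentage, a useful baseline is what three unrelated 45% filters would keep: 0.45³ ≈ 9.1% of symbols. The sketch below checks that baseline on synthetic, independently generated columns. The `synthetic` frame is random data invented for illustration, and the lognormal distribution is an arbitrary assumption, not a claim about the scanner's output.

```python
import numpy as np
import pandas as pd

# Synthetic, mutually independent columns (made-up data, not scanner output)
rng = np.random.default_rng(0)
n_rows = 10_000
synthetic = pd.DataFrame({
    "Volatility%": rng.lognormal(size=n_rows),
    "Spread%": rng.lognormal(size=n_rows),
    "Avg Volume": rng.lognormal(size=n_rows),
})

# Same percentile rules as the filter cell in this notebook
passes_all = (
    (synthetic["Volatility%"] >= synthetic["Volatility%"].quantile(0.55))
    & (synthetic["Spread%"] <= synthetic["Spread%"].quantile(0.45))
    & (synthetic["Avg Volume"] >= synthetic["Avg Volume"].quantile(0.55))
)

independent_share = 0.45 ** 3        # baseline if the metrics were unrelated
observed_share = passes_all.mean()   # fraction of synthetic rows passing all three

print(f"Independence baseline:  {independent_share:.1%}")
print(f"Synthetic intersection: {observed_share:.1%}")
```

A real intersection well below this baseline would suggest the criteria pull against each other (for example, volatile symbols tending to have wide spreads), while one well above it would suggest they reinforce each other.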

## Export Results

In [None]:
output_file = 'filtered_candidates.csv'
candidates.to_csv(output_file, index=False)
print(f"Saved {len(candidates)} candidates to {output_file}")