# Stock Chart Pattern Recognition Using Deep Learning (CRISP-DM)

This notebook follows the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology to prepare stock market data for Head and Shoulders (H&S) pattern recognition using Deep Learning.

## 1️⃣ Business Understanding & Data Understanding

Objective: Use Deep Learning to identify and predict outcomes of Head and Shoulders (H&S) and Inverse Head and Shoulders (IH&S) patterns, leveraging comprehensive Technical and Fundamental features.



## 1.1 Import Libraries and Setup

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from cassandra.cluster import Cluster
from datetime import datetime
import time
import importlib

# Technical Analysis Libraries
from ta.trend import EMAIndicator
from sklearn.preprocessing import MinMaxScaler


## 1.2 Data Understanding
The data comes from the SETTRADE API and is stored in a Cassandra database.

In [2]:
# ==========================================
# 1️⃣ Connect to Cassandra and prepare the keyspace/table
# ==========================================
try:
    cluster = Cluster(['127.0.0.1'], port=9042)
    session = cluster.connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS stock_data
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    """)
    session.set_keyspace('stock_data')
    
    # Check for the table and create it if it does not exist yet
    session.execute("""
        CREATE TABLE IF NOT EXISTS candlestick_data (
            symbol text,
            time timestamp,
            open float,
            high float,
            low float,
            close float,
            volume bigint,
            value float,
            PRIMARY KEY (symbol, time)
        ) WITH CLUSTERING ORDER BY (time ASC);
    """)
    print("✅ Keyspace and table are ready to use!")

except Exception as e:
    print(f"❌ Error during Cassandra connection/setup: {e}")
    print("Please check that the Cassandra server (127.0.0.1:9042) is running.")

✅ Keyspace and table are ready to use!


## 1.3 Data Extraction (OHLCV)


❌ An unexpected error occurred: name 'get_candlestick_data' is not defined
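
The error above indicates that `get_candlestick_data` was never defined in this session, and its original definition is not shown in this notebook. The following is only a minimal sketch of what such a loader might look like, assuming it receives the open Cassandra `session` and a `symbol` and returns a time-indexed OHLCV DataFrame. The signature, the `OHLCV_COLUMNS` constant, and the `FakeSession` stand-in are all hypothetical; the stand-in exists only so the sketch can run without a database.

```python
from collections import namedtuple

import pandas as pd

# Columns of the candlestick_data table, excluding the partition key `symbol`.
OHLCV_COLUMNS = ["time", "open", "high", "low", "close", "volume", "value"]


def get_candlestick_data(session, symbol: str) -> pd.DataFrame:
    """Hypothetical loader: read one symbol's candles into a time-indexed DataFrame."""
    # %s is the positional placeholder used by the cassandra-driver for simple statements.
    query = (
        "SELECT time, open, high, low, close, volume, value "
        "FROM candlestick_data WHERE symbol = %s"
    )
    rows = session.execute(query, (symbol,))
    records = [tuple(getattr(row, col) for col in OHLCV_COLUMNS) for row in rows]
    df = pd.DataFrame(records, columns=OHLCV_COLUMNS)
    if df.empty:
        return df
    return df.set_index("time").sort_index()


# Offline stand-in for a Cassandra session (assumption: rows expose columns as attributes).
Row = namedtuple("Row", OHLCV_COLUMNS)


class FakeSession:
    def execute(self, query, params):
        return [
            Row(pd.Timestamp("2024-01-03"), 10.2, 10.6, 10.1, 10.5, 1200, 12600.0),
            Row(pd.Timestamp("2024-01-02"), 10.0, 10.3, 9.9, 10.2, 1000, 10200.0),
        ]


df_raw_demo = get_candlestick_data(FakeSession(), "DEMO")
print(df_raw_demo)
```

With a real connection, the call would presumably be `df_raw = get_candlestick_data(session, STOCK_SYMBOL)`, which is the `df_raw` that the later cells expect.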


## 2️⃣ Data Preparation

This step combines the Market Cap proxy, the Technical Grouping, and the Fundamental Data into one feature set.

## 2.1 Feature Engineering: Market Cap & Technical Grouping

In [4]:
# RSIIndicator is the RSI class provided by `ta` (module ta.momentum). It is imported
# here because the setup cell only imports EMAIndicator.
from ta.momentum import RSIIndicator


def create_technical_features_and_grouping(df: pd.DataFrame) -> pd.DataFrame:
    """Compute the Market Cap proxy, EMAs, and RSI, then assign a technical signal group."""
    
    # Work on a copy so the caller's DataFrame is not modified in place.
    df = df.copy()
    
    # --- 1. Add Market Cap column ---
    # Market Cap Proxy = Close Price * Volume (traded value, not true market capitalization)
    df['MarketCap_Proxy'] = df['close'] * df['volume']
    
    # --- 2. Compute indicators ---
    df['EMA5'] = EMAIndicator(close=df['close'], window=5, fillna=False).ema_indicator()
    df['EMA15'] = EMAIndicator(close=df['close'], window=15, fillna=False).ema_indicator()
    df['EMA35'] = EMAIndicator(close=df['close'], window=35, fillna=False).ema_indicator()
    df['EMA89'] = EMAIndicator(close=df['close'], window=89, fillna=False).ema_indicator()
    df['EMA200'] = EMAIndicator(close=df['close'], window=200, fillna=False).ema_indicator()
    df['RSI'] = RSIIndicator(close=df['close'], window=14, fillna=False).rsi()
    
    # --- 3. Assign signal groups (Categorization) ---
    # np.select returns the first matching condition, so the stricter Crash rule
    # is listed before Downtrend: every Crash row also satisfies Downtrend.
    conditions = [
        # a: Strong Momentum / Overbought
        (df['close'] >= df['EMA5']) & (df['RSI'] >= 70),
        
        # b: Clear Uptrend
        (df['close'] >= df['EMA35']) & (df['EMA35'] >= df['EMA89']),
        
        # c: Sideways above EMA89 (short-term EMAs close together)
        (df['close'] >= df['EMA89']) & 
        (np.abs(df['EMA5'] - df['EMA35']) / df['close'] < 0.01),  # uses EMA5 and EMA35
        
        # e: Crash (strict descending order and oversold)
        (df['close'] < df['EMA5']) & (df['EMA5'] < df['EMA15']) & (df['EMA15'] < df['EMA35']) & 
        (df['EMA35'] < df['EMA89']) & (df['EMA89'] < df['EMA200']) & (df['RSI'] <= 30),
        
        # d: Downtrend
        (df['close'] < df['EMA89']) & (df['close'] < df['EMA200']) & (df['EMA89'] < df['EMA200'])
    ]
    
    choices = ['a_Overbought', 'b_ClearUptrend', 'c_SidewaysAbove89', 'e_Crash', 'd_Downtrend']
    
    df['Technical_Group'] = np.select(conditions, choices, default='f_Neutral')
    
    # --- 4. Drop NaN Data ---
    # Drop rows containing NaN (produced by the indicator warm-up period, up to EMA200)
    df_cleaned = df.dropna().copy()
    
    print(f"✅ NaN Data Dropped: {len(df) - len(df_cleaned)} rows removed (Initial trading period / Indicator lookback)")
    
    return df_cleaned
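
`np.select` assigns, for each row, the choice belonging to the first condition that is true. When one condition is a strict subset of another (as the Crash rule is of the Downtrend rule in this function), the order of the list decides whether the narrower label can ever appear. The toy example below illustrates this with simplified stand-in conditions; the arrays and thresholds are made up for the demonstration and are not the notebook's data.

```python
import numpy as np

# Three made-up rows: neutral, plain downtrend, and oversold downtrend.
close = np.array([100.0, 90.0, 80.0])
ema89 = np.array([95.0, 95.0, 95.0])
rsi = np.array([55.0, 40.0, 25.0])

downtrend = close < ema89              # broad condition
crash = (close < ema89) & (rsi <= 30)  # strict subset of the broad condition

# Broad condition first: the narrow label is shadowed and never assigned.
broad_first = np.select([downtrend, crash], ["d_Downtrend", "e_Crash"], default="f_Neutral")

# Narrow condition first: both labels can appear.
strict_first = np.select([crash, downtrend], ["e_Crash", "d_Downtrend"], default="f_Neutral")

print(broad_first.tolist())   # ['f_Neutral', 'd_Downtrend', 'd_Downtrend']
print(strict_first.tolist())  # ['f_Neutral', 'd_Downtrend', 'e_Crash']
```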

## 2.2 Feature Engineering: Fundamental Data (Mock)

Because the fundamental data (EPS, PE, PBV, Yield) is not stored in the candlestick_data table, we generate mock-up data based on the interpretation you provided for each metric.

In [5]:
def add_fundamental_data(df: pd.DataFrame) -> pd.DataFrame:
    """Simulate adding fundamental data (PE, PBV, Yield) according to the specified conditions."""
    
    # 1. EPS: small random values in [-0.5, 0.5); negative values indicate a loss
    df['EPS'] = np.random.uniform(-0.5, 0.5, size=len(df))
    
    # 2. PE (zero when the company is loss-making)
    # If EPS <= 0, set PE = 0.0; if profitable (EPS > 0), compute the actual PE = close / EPS
    df['PE'] = df.apply(
        lambda row: 0.0 if row['EPS'] <= 0 else (row['close'] / row['EPS']), 
        axis=1
    )
    
    # 3. PBV (Relatively high: 1.5 - 5.0)
    df['PBV'] = np.random.uniform(1.5, 5.0, size=len(df))
    
    # 4. PercentYield (Dividend per share / stock price)
    df['PercentYield'] = np.random.uniform(0.0, 0.05, size=len(df))
    
    # Clean up PE where it might be Inf.
    # Assign the result back instead of using inplace=True on a column selection,
    # which may not update the DataFrame in recent pandas versions.
    df['PE'] = df['PE'].replace([np.inf, -np.inf], 0.0)
    
    print("✅ Fundamental Data Added (Mocked based on Interpretation)")
    return df

if not df_raw.empty:
    df_temp_features = create_technical_features_and_grouping(df_raw)
    df_final_features = add_fundamental_data(df_temp_features)
    
    df_model_ready = df_final_features.copy()
    
    print("\n--- Summary of Data Preparation (Ready for DL) ---")
    print(df_model_ready[['close', 'EMA89', 'RSI', 'MarketCap_Proxy', 'Technical_Group', 'PE']].tail(5).to_markdown(index=True))


NameError: name 'df_raw' is not defined
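
Because the mocked fundamentals come from `np.random.uniform`, they change on every run, which makes later results hard to compare. A possible variation, shown here only as a sketch, draws the same ranges from a seeded `np.random.default_rng` so the mock data is reproducible. The helper name `mock_fundamentals`, the `seed` parameter, and the sample prices are hypothetical.

```python
import numpy as np
import pandas as pd


def mock_fundamentals(close: pd.Series, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: reproducible mock EPS, PE, PBV and PercentYield columns."""
    rng = np.random.default_rng(seed)  # seeded generator -> identical values on every run
    n = len(close)
    eps = rng.uniform(-0.5, 0.5, size=n)

    # PE is computed only for profitable rows; loss-making rows get 0.0.
    # The divisor is replaced by 1.0 where EPS <= 0 so no invalid division occurs.
    safe_eps = np.where(eps > 0, eps, 1.0)
    pe = np.where(eps > 0, close.to_numpy(dtype=float) / safe_eps, 0.0)

    return pd.DataFrame(
        {
            "EPS": eps,
            "PE": pe,
            "PBV": rng.uniform(1.5, 5.0, size=n),
            "PercentYield": rng.uniform(0.0, 0.05, size=n),
        },
        index=close.index,
    )


demo_close = pd.Series([10.0, 10.5, 9.8, 11.2, 10.9])
first = mock_fundamentals(demo_close)
second = mock_fundamentals(demo_close)
print(first.equals(second))  # same seed, same mock values
```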

## 3️⃣ Modeling & 4️⃣ Evaluation (Data Labeling Stage)

In this section we show how the Head and Shoulders detection logic is used for data labeling (the target) when training the Deep Learning model, which is a key part of the presentation.

## 3.1 Head and Shoulders Detection and Visualization

In [None]:
# ==========================================================
# Function that draws the detected H&S patterns with Plotly
# ==========================================================
def plot_classic_pattern(df: pd.DataFrame, patterns: list, symbol: str):
    
    # 1. Create the base candlestick figure
    fig = go.Figure()
    fig.add_trace(go.Candlestick(
        x=df.index, open=df['open'], high=df['high'], low=df['low'], close=df['close'], name='Price'
    ))
    
    # Adjust the layout
    fig.update_layout(
        title=f'📈 {symbol} - Head & Shoulders Detection (Data Labeling)',
        xaxis_rangeslider_visible=False,
        height=700,
        template='plotly_white'
    )

    # 2. Draw each pattern (note: this cell does not yet draw the neckline)
    for i, pattern in enumerate(patterns):
        l_idx, h_idx, r_idx = pattern['left_idx'], pattern['head_idx'], pattern['right_idx']
        
        l_time, h_time, r_time = df.index[l_idx], df.index[h_idx], df.index[r_idx]
        
        line_color = '#EF553B' if pattern['type'] == 'H&S' else '#00CC96'
        
        # Draw the pattern line (left shoulder - head - right shoulder)
        fig.add_trace(go.Scatter(
            x=[l_time, h_time, r_time],
            y=[df['close'].iloc[l_idx], df['close'].iloc[h_idx], df['close'].iloc[r_idx]],
            mode='lines+markers',
            line=dict(color=line_color, width=3),
            marker=dict(size=8, symbol='circle'),
            name=f"{pattern['type']} {i+1}",
            showlegend=True
        ))
        
        # 3. Annotations
        fig.add_annotation(
            x=h_time, y=df['high'].iloc[h_idx] * 1.01,
            # <b>...</b> renders bold text without relying on the font `weight`
            # property, which older Plotly versions reject.
            text=f"<b>{pattern['type']} Detected</b>",
            showarrow=True,
            arrowhead=2,
            font=dict(color=line_color, size=10),
            yshift=10 if pattern['type'] == 'H&S' else -10
        )
        
    fig.show()

# --- Main Detection Logic ---
if 'df_model_ready' in locals() and not df_model_ready.empty:
    
    print("\n--- 📉 Applying Classic Detector for DL Label Generation ---")
    
    # 🚨 Run the detection function
    classic_patterns = detect_head_shoulders(df_model_ready, distance=10, tolerance=0.04)
    
    print(f"✅ Total H&S/IH&S patterns found: {len(classic_patterns)}")
    
    if classic_patterns:
        plot_classic_pattern(df_model_ready, classic_patterns, symbol=STOCK_SYMBOL)
    else:
        print("💡 No Head & Shoulders or Inverse Head & Shoulders patterns were found in the given data")
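
The cell above calls `detect_head_shoulders(df, distance=10, tolerance=0.04)`, but its definition does not appear in this notebook. The sketch below is one plausible rule-based version, not the author's implementation. It assumes that `distance` is the half-width of the window used to find pivot points and that `tolerance` is the maximum relative difference allowed between the two shoulders. Both interpretations are assumptions, as is the helper `local_extrema`. The returned keys (`type`, `left_idx`, `head_idx`, `right_idx`) match what `plot_classic_pattern` reads.

```python
import numpy as np
import pandas as pd


def local_extrema(values: np.ndarray, distance: int, find_max: bool) -> list:
    """Indices i whose value is the first max (or min) of values[i-distance : i+distance+1]."""
    pick = np.argmax if find_max else np.argmin
    return [
        i
        for i in range(distance, len(values) - distance)
        if pick(values[i - distance : i + distance + 1]) == distance
    ]


def detect_head_shoulders(df: pd.DataFrame, distance: int = 10, tolerance: float = 0.04) -> list:
    """Assumed logic: three consecutive pivots, extreme middle pivot, similar shoulders."""
    close = df["close"].to_numpy(dtype=float)
    patterns = []
    for kind, find_max in (("H&S", True), ("IH&S", False)):
        pivots = local_extrema(close, distance, find_max)
        for left, head, right in zip(pivots, pivots[1:], pivots[2:]):
            l, h, r = close[left], close[head], close[right]
            # The head must be higher (H&S) or lower (IH&S) than both shoulders.
            head_is_extreme = h > max(l, r) if find_max else h < min(l, r)
            # The shoulders must be within `tolerance` of each other, relative to their mean.
            shoulders_match = abs(l - r) / ((l + r) / 2) <= tolerance
            if head_is_extreme and shoulders_match:
                patterns.append(
                    {"type": kind, "left_idx": left, "head_idx": head, "right_idx": right}
                )
    return patterns


# Synthetic series with three peaks (110, 120, 111) joined by straight lines.
knots_x = [0, 10, 20, 30, 40, 50, 60]
knots_y = [100, 110, 104, 120, 104, 111, 100]
demo = pd.DataFrame({"close": np.interp(np.arange(61), knots_x, knots_y)})

demo_patterns = detect_head_shoulders(demo, distance=5, tolerance=0.04)
print(demo_patterns)
```

Under these assumptions the synthetic series yields a single H&S pattern with the shoulders at positions 10 and 50 and the head at position 30. A rule of this kind only labels price geometry; it does not confirm a neckline break.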


## 5️⃣ Deployment (Final Feature Preparation for DL)

This step converts the prepared data (Price, Volume, Technical, Fundamental) into a 3D array (time-series sequences) that is ready to be fed into a Deep Learning model (CNN-LSTM or Transformer).

## 5.1 Data Scaling and Sequence Creation

In [None]:
# ------------------------------------------------------------------
# 🤖 Final Step: Preparing 3D Array for Deep Learning Model
# ------------------------------------------------------------------
if 'df_model_ready' in locals() and not df_model_ready.empty:
    
    print("\n--- 🧠 Deployment Setup: Data Scaling and Sequence Creation ---")
    
    # 1. Select all features that will be fed into the DL model
    features = [
        'open', 'high', 'low', 'close', 'volume', 'MarketCap_Proxy', 
        'EMA5', 'EMA15', 'EMA35', 'EMA89', 'EMA200', 'RSI',
        'EPS', 'PE', 'PBV', 'PercentYield' 
    ]
    
    df_dl = df_model_ready[features].copy()
    
    # 2. Normalization (MinMaxScaler)
    print(f"📐 Scaling {len(features)} features...")
    scaler = MinMaxScaler(feature_range=(0, 1))
    df_scaled_values = scaler.fit_transform(df_dl)
    df_scaled = pd.DataFrame(df_scaled_values, columns=features, index=df_dl.index)
    
    # 3. Creating Sequences (Sliding Window)
    # The model looks back 30 days (SEQUENCE_LENGTH) to predict the following day.
    # SEQUENCE_LENGTH is not defined in any cell shown above, so it is set here
    # explicitly to the 30 days stated in this comment.
    SEQUENCE_LENGTH = 30
    def create_sequences(data, sequence_length):
        xs = []
        for i in range(len(data) - sequence_length):
            x = data.iloc[i:(i + sequence_length)]
            xs.append(x.values)
        return np.array(xs)

    X_sequences = create_sequences(df_scaled, SEQUENCE_LENGTH)
    
    # 4. Creating Labels (Target Y) - Mock for Demo
    # Simple target: is the closing price 5 days ahead higher than today's close? (Binary Classification)
    FUTURE_PREDICT_DAYS = 5
    y_raw = (df_dl['close'].shift(-FUTURE_PREDICT_DAYS) > df_dl['close']).astype(int)
    
    # Align the labels with the sequences: window i covers rows i .. i + SEQUENCE_LENGTH - 1,
    # and its label is read at row i + SEQUENCE_LENGTH (the day after the window ends).
    y_labels = y_raw.iloc[SEQUENCE_LENGTH:].values
    # The last FUTURE_PREDICT_DAYS labels compare against NaN (no future close), so drop them.
    y_labels = y_labels[:-FUTURE_PREDICT_DAYS].copy() 
    
    # Trim X_sequences so it has the same number of samples as y_labels
    X_sequences = X_sequences[:-FUTURE_PREDICT_DAYS]

    # 5. Summarize the result
    print("\n--- DL Model Input Dimensions ---")
    print(f"Sequence Length (Time Steps): {SEQUENCE_LENGTH} days")
    print(f"Number of Features: {len(features)}")
    print(f"Input Data Shape (Samples, Time Steps, Features): {X_sequences.shape}")
    print(f"Label Data Shape (Samples): {y_labels.shape}")     
    print("\n💡 The data has been converted to a 3D array and is ready for Deep Learning model training.")
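
One caveat about the cell above: the `MinMaxScaler` is fitted on the whole dataset, so the minimum and maximum of later (test) rows influence how earlier (training) rows are scaled. A common alternative is to split chronologically first and fit the scaler on the training portion only. The sketch below shows that idea on random stand-in data. The 80/20 split ratio, the array shapes, and the variable names are assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

SEQUENCE_LENGTH = 30  # same look-back as described in the notebook


def create_sequences(data: np.ndarray, sequence_length: int) -> np.ndarray:
    """Stack sliding windows into an array of shape (samples, time steps, features)."""
    return np.stack(
        [data[i : i + sequence_length] for i in range(len(data) - sequence_length)]
    )


# Stand-in for the feature matrix: 200 time steps, 3 features (random walk).
rng = np.random.default_rng(0)
values = np.cumsum(rng.normal(size=(200, 3)), axis=0)

# Chronological split: the first 80% of rows for training, the rest for testing.
split = int(len(values) * 0.8)

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(values[:split])  # min/max learned from training rows only
test_scaled = scaler.transform(values[split:])       # training min/max reused unchanged

X_train = create_sequences(train_scaled, SEQUENCE_LENGTH)
X_test = create_sequences(test_scaled, SEQUENCE_LENGTH)

print(X_train.shape, X_test.shape)  # (130, 30, 3) (10, 30, 3)
```

With this arrangement the scaled test values can fall outside the [0, 1] range, which is expected: the model sees them on the same scale it was trained on.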
