## DIFFERENCE BETWEEN MODEL 2 AND MODEL 1
### 1. Data Preprocessing  
| **Aspect**             | **Model 2**                          | **Model 1**                             |  
|------------------------|----------------------------------------------------|------------------------------------------------------|  
| **Normalization**       | ✅ Correct: Split first → normalize training data only | ❌ Flawed: Normalizes entire dataset before splitting |  
| **Signal Reshaping**    | Uses `GlobalAveragePooling1D` (retains temporal trends) | Uses `Flatten()` (loses temporal structure)          |  

---

### 2. Architecture  
| **Component**           | **Model 2**                                        | **Model 1**                                   |  
|-------------------------|----------------------------------------------------|-----------------------------------------------|  
| **CNN Branch**          | - 3 Conv1D layers (128 → 256 → 512)<br>- BatchNorm after each Conv<br>- GlobalAveragePooling1D | - 2 Conv1D layers (32 → 64)<br>- MaxPooling + Flatten |  
| **Code Branch**         | - Uses `Embedding` layer<br>- BatchNorm + L2 regularization | - Simple Dense layers (64 → 32)              |  
| **Fusion**              | - Deeper: Dense(512 → 256 → 128)<br>- L2 regularization | - Simpler: Dense(64 → 32)                   |  
| **Regularization**      | ✅ Strong: Dropout + BatchNorm + L2                | ✅ Basic: Dropout only                        |  

---

### 3. Training Configuration  
| **Parameter**           | **Model 2**                                        | **Model 1**                                   |  
|-------------------------|----------------------------------------------------|-----------------------------------------------|  
| **Learning Rate**       | 0.0001 (safer for complex models)                 | 0.001 (riskier for deep nets)                |  
| **Batch Size**          | 64 (better generalization)                        | Not specified (defaults to 32)               |  
| **Early Stopping**      | Patience=20 (avoids premature stops)              | Patience=10 (may stop too early)             |  

In [1]:
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv('../dataset/dataset.csv')

# --------------------------------------------------
# 1.1 Signal Decoding
# --------------------------------------------------
def parse_signal(signal_str):
    cleaned = signal_str.strip('[]').replace(' ', '')
    parts = cleaned.split('),(')
    complex_samples = []
    for p in parts:
        p = p.replace('(', '').replace(')', '')
        try:
            # Convert directly to complex number
            complex_samples.append(complex(p))
        except ValueError:
            # Skip any invalid entries
            continue
    return np.array(complex_samples[:208])

# Apply to all rows
X_signal = np.array([parse_signal(s) for s in df['received_signal']])
# print(X_signal[:1])
X_signal = np.hstack([X_signal.real, X_signal.imag])  # (15000, 416)

# --------------------------------------------------
# 1.2 Secret Code Handling
# --------------------------------------------------
def parse_secret_code(code_str):
    return np.array([int(x) for x in code_str.strip('[]').split(', ')])

X_secret = np.array([parse_secret_code(c) for c in df['secret_code']])  # (15000, 13)

# --------------------------------------------------
# 1.3 Target Preparation
# --------------------------------------------------
y = df[['jet1_x', 'jet1_y', 'jet1_z', 
        'jet2_x', 'jet2_y', 'jet2_z']].values  # (15000, 6)

In [2]:
print(X_signal[:1])

[[ 0.11931579  0.85771666 -1.33474694 -0.50391545 -2.29890648 -2.36311255
  -1.73337179 -2.09075436 -1.68077154 -2.06851403 -2.23007812 -1.68373539
  -0.99208445 -1.52579929 -2.13198495 -1.542111   -1.572531   -1.17894242
  -0.43579253 -0.37435658 -0.20439189 -0.34568808 -0.3036721   0.78714854
  -0.13114397  0.91876952  0.59695031  0.46554019  0.30731446  0.57871762
   1.3063269   1.32965779  1.84256292  2.62761689  3.8078487   2.76445951
   2.48637366  2.88149825  2.38199167  2.72596637  1.39522053  0.8044313
   0.23753253 -0.8312159  -2.08004069 -2.24298221 -2.99989836 -3.15122902
  -3.34715525 -3.41449118 -1.9398927  -2.12972652 -1.46899167 -0.57031672
  -0.59802137 -0.68256009  0.22612962  0.72654863 -0.07467601 -0.28339311
  -1.24302304 -0.46743044 -1.8572562  -2.77289709 -3.06969035 -3.11564192
  -3.84457788 -3.57200481 -3.62208702 -3.10760311 -2.81848032 -1.19532471
  -0.52973206  1.028434    2.24372655  3.21794968  3.74768431  4.09193022
   3.67424067  1.20070218  0.9215773  -

In [3]:
X_signal

array([[ 0.11931579,  0.85771666, -1.33474694, ...,  1.35653706,
         0.09700217,  0.68696531],
       [-0.035028  , -0.9321786 , -1.07103858, ..., -0.31218129,
        -0.86626219, -1.67845885],
       [-0.01784629, -0.30454525, -0.63100547, ...,  2.48600477,
         2.57460476,  2.38237365],
       ...,
       [ 0.18681767,  0.2127872 , -0.10902881, ..., -1.86084518,
        -2.68116411, -3.5353162 ],
       [ 0.28481301,  1.10820823,  1.38517485, ...,  1.33907262,
         1.06573864,  0.82789565],
       [ 0.19803341,  0.78227132,  0.95982124, ...,  1.25057382,
         1.13393457,  1.30820358]])

In [4]:
X_signal.shape

(15000, 416)

In [5]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# --------------------------------------------------
# 1. Data Loading & Initial Processing
# --------------------------------------------------
# [Keep your existing parsing code for X_signal, X_secret, y]
# But REMOVE all normalization here

# --------------------------------------------------
# 2. Initial Splitting (RAW DATA)
# --------------------------------------------------
# First split: 80% train, 20% temp (using raw unnormalized data)
X_sig_train_raw, X_sig_temp_raw, X_sec_train_raw, X_sec_temp_raw, y_train_raw, y_temp_raw = train_test_split(
    X_signal,  # Raw (15000, 416)
    X_secret,   # Raw (15000, 13)
    y,          # Raw (15000, 6)
    test_size=0.2, 
    random_state=42
)

# Second split: 50% validation, 50% test
X_sig_val_raw, X_sig_test_raw, X_sec_val_raw, X_sec_test_raw, y_val_raw, y_test_raw = train_test_split(
    X_sig_temp_raw, 
    X_sec_temp_raw, 
    y_temp_raw, 
    test_size=0.5, 
    random_state=42
)

# --------------------------------------------------
# 3. Proper Normalization (AFTER Splitting)
# --------------------------------------------------
# 3.1 Signal Normalization
scaler_signal = StandardScaler()
# Flatten training signals for proper scaling
X_sig_train_flat = X_sig_train_raw.reshape(-1, 2)  # (12000*208, 2)
scaler_signal.fit(X_sig_train_flat)

# Transform all sets
def scale_and_reshape(X_raw, scaler):
    X_flat = X_raw.reshape(-1, 2)
    X_scaled = scaler.transform(X_flat)
    return X_scaled.reshape(-1, 208, 2)

X_sig_train = scale_and_reshape(X_sig_train_raw, scaler_signal)
X_sig_val = scale_and_reshape(X_sig_val_raw, scaler_signal)
X_sig_test = scale_and_reshape(X_sig_test_raw, scaler_signal)

# 3.2 Secret Code Normalization
scaler_secret = StandardScaler()
scaler_secret.fit(X_sec_train_raw)

X_sec_train = scaler_secret.transform(X_sec_train_raw)
X_sec_val = scaler_secret.transform(X_sec_val_raw)
X_sec_test = scaler_secret.transform(X_sec_test_raw)

# 3.3 Target Normalization
scaler_target = StandardScaler()
scaler_target.fit(y_train_raw)

y_train = scaler_target.transform(y_train_raw)
y_val = scaler_target.transform(y_val_raw)
y_test = scaler_target.transform(y_test_raw)

# --------------------------------------------------
# 4. Verification
# --------------------------------------------------
print("Training shapes:")
print(f"Signals: {X_sig_train.shape}, Codes: {X_sec_train.shape}, Targets: {y_train.shape}")
print("\nValidation shapes:")
print(f"Signals: {X_sig_val.shape}, Codes: {X_sec_val.shape}, Targets: {y_val.shape}")
print("\nTest shapes:")
print(f"Signals: {X_sig_test.shape}, Codes: {X_sec_test.shape}, Targets: {y_test.shape}")

Training shapes:
Signals: (12000, 208, 2), Codes: (12000, 13), Targets: (12000, 6)

Validation shapes:
Signals: (1500, 208, 2), Codes: (1500, 13), Targets: (1500, 6)

Test shapes:
Signals: (1500, 208, 2), Codes: (1500, 13), Targets: (1500, 6)


In [12]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense, concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import BatchNormalization, GlobalAveragePooling1D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Embedding


# --------------------------------------------------
# 3.1 Dual Input Branches
# --------------------------------------------------
# Branch 1: Radar Signal Processor (CNN)
signal_input = Input(shape=(208, 2))
x = Conv1D(128, 5, activation='relu', padding='same')(signal_input)  # Wider kernel
x = BatchNormalization()(x)
x = MaxPooling1D(2)(x)  # 104 timesteps
x = Conv1D(256, 3, activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling1D(2)(x)  # 52 timesteps
x = Conv1D(512, 3, activation='relu', padding='same')(x)
x = GlobalAveragePooling1D()(x)  # Better than Flatten for temporal data
x = Dense(256, activation='relu', kernel_regularizer='l2')(x)
x = Dropout(0.3)(x)  # Reduced dropout

# Branch 2: Code Breaker (Dense Network)
# Treat secret codes as categorical features
code_input = Input(shape=(13,))
y = Embedding(input_dim=1000, output_dim=8)(code_input)  # Adjust input_dim
y = Flatten()(y)
y = Dense(128, activation='relu', kernel_regularizer='l2')(y)
y = BatchNormalization()(y)
y = Dropout(0.2)(y)
y = Dense(64, activation='relu')(y)


# --------------------------------------------------
# 3.2 Combined Strike Force
merged = concatenate([x, y])
z = Dense(512, activation='relu')(merged)
z = Dense(256, activation='relu')(z)
z = Dense(128, activation='relu')(z)
outputs = Dense(6, activation='linear')(z)


# --------------------------------------------------
# War Machine Assembly
# --------------------------------------------------
model = Model(inputs=[signal_input, code_input], outputs=outputs)

# --------------------------------------------------
# Reconnaissance Report (Model Summary)
# --------------------------------------------------
model.summary()

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_13 (InputLayer)       [(None, 208, 2)]             0         []                            
                                                                                                  
 conv1d_12 (Conv1D)          (None, 208, 128)             1408      ['input_13[0][0]']            
                                                                                                  
 batch_normalization_12 (Ba  (None, 208, 128)             512       ['conv1d_12[0][0]']           
 tchNormalization)                                                                                
                                                                                                  
 max_pooling1d_8 (MaxPoolin  (None, 104, 128)             0         ['batch_normalization_12

2. Global Average Pooling (GAP)
Purpose:

Summarizes each feature map by taking the average of all timesteps.

Outputs one value per filter (e.g., 512 filters → 512 values).

Why Replace Flatten() with GAP:

Retains temporal context: Instead of flattening into 52×64=3328 values (losing sequence info), GAP collapses each filter’s output into a single meaningful average.

Reduces overfitting: Fewer parameters → less risk of memorizing noise.

Analogy:
Instead of listing every radar ping’s strength over time, you report the average intensity for each frequency band.



3. Batch Normalization
Purpose:

Stabilizes training by normalizing the outputs of a layer to have zero mean and unit variance.

Applied after Conv/Dense layers but before activation functions.

Why Add It:

Faster convergence: Prevents exploding/vanishing gradients.

Reduces dependency on careful weight initialization.

Acts as a regularizer (reduces overfitting).

Analogy:
Like calibrating radar sensors before each mission to ensure consistent signal strength readings.



# 🍵 Tea Strainer Analogy  

## 1. Radar Signal Branch (Advanced Tea Brewing)  
**Input**: A pot of tea with:  
- **208 tea leaves** (timesteps)  
- **2 leaf types** (real/imaginary radar channels).  

**Steps**:  
1. **First Strainer (`Conv1D(128, 5`))**  
   - Use **128 ultra-sensitive strainers** (filters) with **5-leaf windows** (kernel size=5).  
   - Detects broader flavor patterns (e.g., bitter-sweet waves).  

2. **Quality Control (`BatchNorm`)**  
   - Standardizes the tea’s pH/sugar levels → ensures consistent flavor batches.  

3. **Concentrate (`MaxPooling`)**  
   - Boil down tea to **104 leaves** → retain strongest flavors.  

4. **Second Strainer (`Conv1D(256, 3`))**  
   - **256 precision strainers** with **3-leaf windows** → detect subtle spice notes.  
   - Another `BatchNorm` → stabilize flavors.  
   - Concentrate again → **52 leaves**.  

5. **Third Strainer (`Conv1D(512, 3`))**  
   - **512 nano-strainers** → extract microscopic flavor compounds.  

6. **Final Taste Test (`GlobalAveragePooling1D`)**  
   - Instead of listing all 52×512 flavors, take the **average intensity** of each flavor type.  
   - Output: **512 signature tastes** (e.g., "smoky-7", "sweet-42").  

7. **Blend & Reduce (`Dense(256`))**  
   - Mix 512 tastes → 256 elite flavors.  
   - **Dropout(0.3)**: Randomly block 30% of flavors to prevent over-reliance.  

---

## 2. Code Branch (Spice Lab)  
**Input**: A **13-digit secret code** (spice recipe).  

**Steps**:  
1. **Spice Decoder (`Embedding`)**  
   - Convert each code digit into an **8D spice vector** (e.g., digit "5" → `[0.2, -0.7, ...]`).  
   - Like grinding spices into **aromatic powders** for better mixing.  

2. **Flatten & Mix**  
   - Spread powders into a **104D spice paste** (13 digits × 8D → 104 features).  

3. **Refine (`Dense(128`))**  
   - Extract **128 aromatic compounds** (e.g., cinnamon essence).  
   - `BatchNorm` → stabilize acidity.  
   - **Dropout(0.2)**: Remove 20% of compounds to avoid overpowering.  

4. **Final Extract (`Dense(64`))**  
   - Condense to **64 pure spices** (e.g., "smoky-essence-X").  

---

## 3. Fusion (Master Brew)  
1. **Combine Flavors (`concatenate`)**  
   - Mix **256 tea flavors** + **64 spices** → **320-dimensional super-blend**.  

2. **Master Brewing (`Dense(512 → 256 → 128`))**  
   - Three-step refinement:  
     - **512** → "Harmonize bitter-sweet balance"  
     - **256** → "Adjust aroma intensity"  
     - **128** → "Perfect the aftertaste"  

3. **Serve (`Dense(6`))**  
   - Pour into **6 cups** (coordinates: x,y,z for two jets).  

---

# 🚨 What the Analogy Misses  
1. **Mathematical Precision**:  
   - `GlobalAveragePooling1D` averages values mathematically → no tea equivalent.  
   - `BatchNorm` uses mean/variance normalization → not just "quality control".  

2. **Dynamic Learning**:  
   - Strainers (**filters**) auto-adjust their holes (**weights**) during training.  
   - The `Embedding` layer learns spice vectors from data, unlike fixed grinding.  

3. **Regularization**:  
   1. `L2 regularization` subtly penalizes complex recipes → not just dropout. 
   2. Why Use It in Your Model?
      - Prevents Overfitting:
         Stops the model from relying too heavily on any single feature (e.g., one radar signal spike or code digit).

      - Smooths Predictions:
         Encourages smaller weights → less "jumpy" predictions (critical for radar coordinate estimation).
   3. Imagine you’re crafting a tea recipe:

      - Without L2: You might use 10kg of cinnamon to mask all other flavors (overfitting to cinnamon).

      - With L2: Forces you to use balanced spice quantities (smaller weights), ensuring no single flavor dominates.



 

4. **Cross-Modal Fusion**:  
   - Mixing radar signals (tea) and codes (spices) has no real-world culinary parallel.  

5. **Training Nuances**:  
   - **Lower learning rate (0.0001)**: Simmering instead of boiling → avoids burning.  
   - **Larger batch size (64)**: Tasting 64 teas at once → faster, stable learning.  
   - **Patience=20**: Waits longer for flavor perfection before stopping.  

In [13]:
optimizer = Adam(
    learning_rate=0.0001,  # Reduced from 0.001
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07
)

early_stop = EarlyStopping(
    monitor='val_mae',  # Direct metric
    patience=20,  # Increased patience
    mode='min',
    restore_best_weights=True
)

model.compile(
    optimizer=optimizer,
    loss='mse',
    metrics=['mae']
)



In [14]:
history = model.fit(
    [X_sig_train, X_sec_train],
    y_train,
    validation_data=([X_sig_val, X_sec_val], y_val),
    epochs=200,  # Increased capacity
    batch_size=64,  # Larger batch size
    callbacks=[early_stop],
    verbose=1
)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78

In [15]:
test_loss, test_mae = model.evaluate(  
    [X_sig_test, X_sec_test],   
    y_test,  
    verbose=0  
)  
print(f"Test MSE: {test_loss:.4f} (Normalized)")  
print(f"Test MAE: {test_mae:.4f} (Normalized)")  

# Inverse-transform for real-world error  
y_pred_normalized = model.predict([X_sig_test, X_sec_test])  
y_pred_real = scaler_target.inverse_transform(y_pred_normalized)  
y_test_real = scaler_target.inverse_transform(y_test)  

# Calculate real-world MAE  
mae_real = np.mean(np.abs(y_pred_real - y_test_real))  
print(f"Real-World MAE: {mae_real:.2f} meters")  

Test MSE: 0.6973 (Normalized)
Test MAE: 0.6562 (Normalized)
Real-World MAE: 6566.90 meters
