# **Signal Simulation Framework**

## **1. Introduction**

### **1.0 Assessment Description**

Quantitative research in finance relies heavily on the analysis of historical market data to evaluate the performance of trading strategies before live deployment. However, raw market data frequently contains **noise, missing values, and inconsistent quotes**, which can distort strategy performance if not properly handled. Moreover, naive simulations that assume **perfect trade execution** often lead to **overstated and unrealistic returns**.

This assessment presents a **modular Python framework** developed to simulate trading activity under more realistic market conditions. The framework emphasizes **data integrity, execution realism, and analytical reproducibility**, providing a foundation for evaluating signal-driven trading strategies with statistical rigor.

---

### **1.2 Framework Overview**

The framework is composed of four main components:

#### 1. **Data Validation**
- Cleans and aligns quote and signal datasets.  
- Detects and removes common data issues such as **inverted spreads**, **null entries**, **duplicate rows**, and **non-positive volumes**, and flags **extreme spread outliers** for downstream analysis.

#### 2. **Signal-Based Simulation**
- Executes trading signals using a **latency-aware fill engine**.  
- Tracks order types (open/close), executed prices, slippage, and PnL.  
- Incorporates **execution delay** to better reflect real-world trading latency.

#### 3. **Performance Metrics and Logging**
- Generates detailed **execution logs** for each simulated trade.  
- Computes essential performance metrics including:  
  - Total trade count  
  - Gross and Net PnL  
  - Average trade PnL  
  - Slippage  
  - Maximum drawdown

#### 4. **Framework Extensibility**
- Supports flexible configuration of simulation parameters such as **latency**, **signal thresholds**, and **execution models**.  

- Implements a **probabilistic execution model** that simulates **realistic fill uncertainty** using random draws based on **execution probability**, derived from **price aggressiveness** and the **relative bid–ask spread**.  

- The **relative spread** is calculated as the difference between the best ask and best bid prices divided by the mid price, which keeps the model consistent across assets with different price scales.

---

### **1.3 Purpose and Contribution**

This framework serves as a **lightweight yet realistic research tool** for quantitative analysts and developers.  
It enables:
- Rapid testing and validation of trading signals.  
- Insight into execution costs (slippage, commissions) and fill uncertainty.  
- Transparent reporting of results with reproducible performance analytics.

By integrating careful data validation with realistic execution modeling, the project bridges the gap between **theoretical signal research** and **practical trading implementation**.

---

## **2. Project Flow**

![Flow chart](Project_Flowchart.png)

### **2.0 Folder Structure**


<pre>
SignalSim/
│
├── config/
│   └── config.json
│
├── data/
│   ├── processed_data/
│   └── raw_data/
│
├── output/
│   ├── csvs/
│   ├── logs/
│   └── plots/
│
├── report/
│   ├── notebook_report.ipynb
│   └── Project_Flowchart.png
│
├── src/
│   ├── __init__.py
│   ├── logger_config.py
│   ├── main.py
│   ├── metrics.py
│   ├── plotting.py
│   ├── signal_integration.py
│   ├── simulator.py
│   └── validation.py
│
├── .gitignore
├── README.md
├── requirements.txt
├── setup_and_run.bat
└── setup_and_run.sh
</pre>

---
### **2.1 Data Validation Step**

#### **Signals data validation**

1. Timestamps Must Be Valid & Comparable<br>
    - Signal and quote data both contain a timestamp column. Time alignment is crucial since signals must map to market quotes at the correct moment.<br>
    - All timestamps should be convertible into datetime objects.<br>
    - If formatting issues (e.g., commas in fractional seconds) exist, they are corrected.<br>

2. No Null Values<br>
    - Missing values in signals would invalidate the trading logic. In trading, incomplete data can cause incorrect signals → better to discard such rows.<br>
    - Rows with NaN are dropped.<br>

3. Timestamp Alignment Between Signals and Quotes<br>
    - Each signal must have a corresponding quote at the same timestamp, because generating an order requires the price information contained in the quotes.<br>
    - Any signals not aligned with quotes are removed.<br>
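
The three signal checks above can be sketched with pandas. This is a minimal illustration, not the framework's `validation.py`: the function names `parse_timestamps` and `validate_signals` and the column names `timestamp` and `signal` are assumptions. Unparseable timestamps are coerced to `NaT` so that the null-drop step removes them.

```python
import pandas as pd


def parse_timestamps(values: pd.Series) -> pd.Series:
    """Parse timestamps, treating a comma in fractional seconds as a decimal point."""
    cleaned = values.astype(str).str.replace(",", ".", regex=False)
    return pd.to_datetime(cleaned, errors="coerce")  # unparseable values become NaT


def validate_signals(signals: pd.DataFrame, quotes: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: clean signals and keep only rows aligned with a quote."""
    out = signals.copy()
    out["timestamp"] = parse_timestamps(out["timestamp"])   # 1. valid timestamps
    out = out.dropna()                                      # 2. drop rows with NaN/NaT
    quote_times = parse_timestamps(quotes["timestamp"])
    out = out[out["timestamp"].isin(quote_times)]           # 3. exact timestamp alignment
    return out.reset_index(drop=True)
```

Exact-match alignment mirrors the rule that each signal needs a quote at the same timestamp; a tolerance-based join would be a different design choice.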


#### **Quotes data validation**

1. Timestamps Must Be Valid<br>
    - Same conversion as signals.<br>

2. No Duplicate Rows<br>
    - Duplicates bias spread/volume statistics and can double-count liquidity.<br>
    - Duplicate rows are dropped.<br>

- **Assumption**: <br>
Each timestamp should have a single unique bid/ask record, as the data is sampled at 1-second precision.<br>

3. Timestamps Should Be Strictly Increasing<br>
    - Market data is sequential.<br>
    - If timestamps are still out of order after duplicates are removed, the data is sorted by timestamp.<br>
    - Ensures chronological integrity for time-series analysis and backtesting.<br>

4. No Null Values<br>
    - Quote data should always contain valid bid/ask prices and volumes.<br>
    - Nulls imply incomplete snapshots → invalid for trading simulations.<br>
    - Dropped rows with NaN.<br>

5. Bid-Ask Relationship Must Hold (bid_price <= ask_price)<br>
    - A market with bid > ask indicates corrupted or inverted order book data.<br>
    - Removed rows violating this.<br>

6. Spread Should Be Reasonable<br>
    - Extremely wide spreads often indicate market anomalies, bad ticks, or illiquidity.<br>
    - Rows flagged with `spread_flag = 1` for downstream analysis.<br>

    - Spread defined as:
$$
\text{Spread}_{\text{relative}} = \frac{\text{ask\_price} - \text{bid\_price}}{\text{mid\_price}}
$$

- **Assumption**: <br>
A spread is considered abnormal if it exceeds the mean by more than k times the standard deviation.<br>
$$
\text{spread\_threshold} = \text{mean}_{\text{spread}} + k \times \text{std}_{\text{spread}}
$$


7. Volumes Must Be Positive<br>
    - bid_qty > 0 and ask_qty > 0.<br>
    - Orders with non-positive volume do not exist in real markets → they indicate bad data.<br>
    - Dropped rows with zero/negative volumes.<br>
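
The quote checks can be chained in the same way. In this sketch the function name `validate_quotes`, the column names (`timestamp`, `bid_price`, `ask_price`, `bid_qty`, `ask_qty`), the default `k = 3.0`, and computing the spread threshold after the other filters are all assumptions for illustration.

```python
import pandas as pd


def validate_quotes(quotes: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Hypothetical sketch of the quote checks, returning a frame with spread_flag."""
    q = quotes.copy()
    # 1. Valid timestamps (comma in fractional seconds treated as a decimal point).
    q["timestamp"] = pd.to_datetime(
        q["timestamp"].astype(str).str.replace(",", ".", regex=False), errors="coerce"
    )
    q = q.drop_duplicates()                              # 2. exact duplicate rows
    q = q.dropna()                                       # 4. nulls, including NaT
    q = q.sort_values("timestamp", kind="mergesort")     # 3. chronological order
    q = q[q["bid_price"] <= q["ask_price"]]              # 5. no inverted quotes
    q = q[(q["bid_qty"] > 0) & (q["ask_qty"] > 0)]       # 7. positive volumes
    # 6. Relative spread and the mean + k * std outlier threshold (flag, do not drop).
    mid = (q["bid_price"] + q["ask_price"]) / 2
    q = q.assign(relative_spread=(q["ask_price"] - q["bid_price"]) / mid)
    threshold = q["relative_spread"].mean() + k * q["relative_spread"].std()
    q = q.assign(spread_flag=(q["relative_spread"] > threshold).astype(int))
    return q.reset_index(drop=True)
```

Because the threshold is estimated from the same sample it screens, a handful of extreme spreads can inflate the standard deviation and mask each other; a robust alternative (e.g. median and MAD) is a possible variation.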
    
---


### **2.2 Signal Integration Step**


1. Threshold-Based Discretization (Decision Rule)<br>
    - The trading system converts continuous model outputs into discrete trading actions using a threshold function.<br>
    - This turns noisy continuous signals into discrete, actionable decisions.<br>
    - Action defined as:<br>
$$
\text{Action} =
\begin{cases}
\text{Buy}, & \text{if } s > +\theta \\\\
\text{Sell}, & \text{if } s < -\theta \\\\
\text{Hold}, & \text{otherwise}
\end{cases}
$$

    where:<br>
    - s = signal strength  
    - θ = strength threshold

- **Assumption:**<br>
Small signal fluctuations near zero do not carry strong predictive power, so they are treated as noise (Hold).


2. Categorical Encoding for Downstream Simulation<br>
    - Converting categorical actions (“Buy”, “Sell”, “Hold”) into numerical values (+1, −1, 0) facilitates quantitative modeling, backtesting, and simulation.<br>
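
Both steps amount to one vectorized mapping. A minimal NumPy sketch, where `discretize` is a hypothetical helper and `theta` plays the role of θ; values exactly at ±θ fall through to Hold because the rule uses strict inequalities.

```python
import numpy as np


def discretize(signal_strength, theta: float) -> np.ndarray:
    """Map continuous signal strength s to +1 (Buy), -1 (Sell) or 0 (Hold)."""
    s = np.asarray(signal_strength, dtype=float)
    # np.select takes the first matching condition; |s| <= theta falls to the default (Hold).
    return np.select([s > theta, s < -theta], [1, -1], default=0)
```

For example, `discretize([0.9, -0.7, 0.2, 0.5], theta=0.5)` returns the values `[1, -1, 0, 0]`.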



---

### **2.3 Order Generation Step**

This step models order execution behavior in a simplified yet realistic way, translating a **signal** into actionable trading orders based on current market conditions and position state.

1. A trading signal represents directional intent:
    - **+1 → Buy bias**
    - **−1 → Sell bias**
    - **0 → Hold / No trade**

2. The function `order_generator()` transforms this abstract intent into specific executable orders — determining whether to:
    - Open a new long or short position.
    - Close an existing position.
    - Flip the position (close one side and open the opposite).

3. **Position flipping logic**:
    - If the net position is already long and another Buy signal arrives, it adds to the long position.  
    - If the net position is short and a Buy signal arrives, it first closes the short position before opening a new long.  
    - The same logic applies symmetrically for Sell signals.

4. **Order Pricing**:
    - **Buy Orders:** Sent at the **best ask price**, consuming existing sell-side liquidity (*taker-style execution*).  
    - **Sell Orders:** Sent at the **best bid price**, consuming existing buy-side liquidity (*taker-style execution*).


**Assumptions:**

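- **Position symmetry:** Long and short positions are perfectly symmetrical: one long unit contributes +1 and one short unit contributes −1 to net exposure.  
- **Latency:** Orders are created instantly upon signal arrival; execution latency is modeled separately in the exchange simulation.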
- **Close-First Position Rule:**  
   The simulator always closes existing positions before opening new ones in the opposite direction,  
   ensuring clean transitions and no overlapping long and short exposures.
- **Order type:** Only limit orders are sent.
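
The open/close/flip rules above can be sketched as a small pure function. This is not the signature of the framework's `order_generator()`: the `Order` dataclass, the name `generate_orders`, and the fixed one-unit order size are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    side: int      # +1 buy, -1 sell
    kind: str      # "open" or "close"
    qty: float
    price: float   # limit price: best ask for buys, best bid for sells


def generate_orders(action: int, position: float, best_bid: float, best_ask: float,
                    unit: float = 1.0) -> list[Order]:
    """Hypothetical sketch of the close-first order logic for one signal."""
    if action == 0:
        return []                                   # Hold: no order
    price = best_ask if action == 1 else best_bid   # taker-style pricing at the touch
    orders = []
    if position * action < 0:
        # Opposite exposure exists: close it in full before opening the new side.
        orders.append(Order(side=action, kind="close", qty=abs(position), price=price))
    # Open a new position, or add to an existing one on the same side.
    orders.append(Order(side=action, kind="open", qty=unit, price=price))
    return orders
```

With a short position of 2 and a Buy signal, the sketch emits a close of 2 followed by an open of 1, both priced at the ask.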


---

### **2.4 Exchange Simulation**

The `simulation()` function models a **signal-driven trading process** that mimics live market dynamics by introducing:

- **Fixed latency** between order submission and execution.
- **Probabilistic order fills** based on price aggressiveness and spread penalty.
- **Position and PnL tracking** for both realized and unrealized performance.

It iterates through the timestamped quote data and evaluates whether each received order would have executed in a realistic exchange environment.


#### Core Assumptions

1. **Fixed Latency**  
   Each order experiences a constant **1-second delay** between submission and execution.  
   That is, an order sent at time *t* is matched against the market snapshot at *t + 1 s*.

2. **Market Stability**  
   The quote at *t + 1 s* is assumed to represent actual market conditions when the order is executed.

3. **Execution is Probabilistic**  
   Orders are not guaranteed to fill. Execution depends on:
   - **Price aggressiveness** relative to market price.  
   - **Spread-based penalty** under wide-spread conditions.  
   - **Random draw** that determines whether the order executes.

   Together, these create a **stochastic fill model** that introduces realistic execution uncertainty.

4. **Commission**
   - A flat commission of 0.001 is charged per executed trade, in the same units as PnL.

5. **Capital**
   - The capital is assumed to be unlimited for this simulation.  


#### **(a) Price Aggressiveness**

Measures how favorable the trader’s order price is compared to the current market price.  
Higher aggressiveness increases execution probability.

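For each order side, aggressiveness is a linear function of the sent price $p_{sent}$, scaled using coefficients $c_a$ (buy side) and $c_b$ (sell side) and a minimum floor $m$ (`min_price_aggressiveness`). Here $p_{bid}$ and $p_{ask}$ are the best quotes in the snapshot the order is matched against (*t + 1 s*). An order priced exactly at that touch scores $m$, and an order priced at $c_a \, p_{ask}$ (buy) or $c_b \, p_{bid}$ (sell) scores 1: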

$$
\text{price\_aggressiveness} =
\begin{cases}
\displaystyle
\frac{((1 - m)\,p_{sent}) + (p_{bid}\,((m \cdot c_b) - 1))}{p_{bid}\,(c_b - 1)}, & \text{for sell-side orders} \\\\
\displaystyle
\frac{((1 - m)\,p_{sent}) + (p_{ask}\,((m \cdot c_a) - 1))}{p_{ask}\,(c_a - 1)}, & \text{for buy-side orders}
\end{cases}
$$

- **More aggressive (market-taking)** → higher value → higher fill chance.  
- **More passive** → lower value → lower fill probability.  
- Result is **clipped to [0, 1]** to ensure normalized scale.
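
A direct transcription of the formula, with clipping. The function name and the parameter values in the example below (`c_a = 1.001`, `c_b = 0.999`, `m = 0.2`) are illustrative assumptions; the formula only behaves as described when $c_a > 1$ and $c_b < 1$, so that paying more on a buy, or asking less on a sell, raises the score.

```python
def price_aggressiveness(side: int, p_sent: float, p_bid: float, p_ask: float,
                         c_a: float, c_b: float, m: float) -> float:
    """Score in [0, 1]: m at the execution-time touch, 1 at c_a*p_ask (buy) or c_b*p_bid (sell)."""
    if side == 1:     # buy-side order, compared with the ask
        value = ((1 - m) * p_sent + p_ask * (m * c_a - 1)) / (p_ask * (c_a - 1))
    elif side == -1:  # sell-side order, compared with the bid
        value = ((1 - m) * p_sent + p_bid * (m * c_b - 1)) / (p_bid * (c_b - 1))
    else:
        raise ValueError("side must be +1 (buy) or -1 (sell)")
    return min(max(value, 0.0), 1.0)  # clip to [0, 1]
```

For a buy sent at 100.00 when the execution-time ask is also 100.00, the score equals the floor `m`; if the ask has moved up to 100.10 before execution, the order has become passive and the score clips to 0. This is how the one-second latency feeds into fill probability.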


#### **(b) Spread Penalty**

Wide spreads indicate **low liquidity** or **volatile** conditions.  
A multiplicative penalty (α) reduces fill probability under these conditions:

$$
P_{exec,final} = \alpha \times P_{exec,base}
$$

where  

$$
\alpha =
\begin{cases}
1.0, & \text{for normal (liquid) spreads} \\\\
\text{spread\_penalty\_factor}, & \text{for wide (illiquid) spreads (spread\_flag = 1)}
\end{cases}
$$

This models reduced match probability during stressed or unstable markets.


#### **(c) Probabilistic Fill Decision**

The final execution probability is:

$$
P_{exec} = \alpha \times \text{price\_aggressiveness}
$$

A **uniform random draw** determines whether the order is filled:

$$
\text{rand\_val} \sim U(0, 1), \qquad \text{order is filled if } \text{rand\_val} < P_{exec}
$$

- Introduces **stochastic uncertainty**: any order with $P_{exec} < 1$ can fail to fill, however aggressively it is priced.  
- Ensures execution outcomes are probabilistic, not deterministic.
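
The spread penalty and the draw combine into a single fill decision. A sketch using NumPy's `Generator` API; `fill_decision` is a hypothetical name, and passing in a seeded generator is a suggestion for reproducibility rather than something specified above.

```python
import numpy as np


def fill_decision(aggressiveness: float, spread_flag: int, spread_penalty_factor: float,
                  rng: np.random.Generator) -> tuple[bool, float]:
    """Return (filled, p_exec) for one order, following P_exec = alpha * aggressiveness."""
    alpha = spread_penalty_factor if spread_flag == 1 else 1.0  # penalty only on flagged spreads
    p_exec = alpha * aggressiveness
    rand_val = rng.random()                                     # uniform draw on [0, 1)
    return rand_val < p_exec, p_exec
```

With `aggressiveness = 0.8`, `spread_flag = 1` and `spread_penalty_factor = 0.5`, $P_{exec} = 0.4$, so roughly 40% of such orders fill over many draws; creating the generator with `np.random.default_rng(seed)` makes a whole run repeatable.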


#### **(d) Behavior Summary**

- **Buy signals (1):** Orders sent at **ask price** (*taker-style*).  
- **Sell signals (-1):** Orders sent at **bid price** (*taker-style*).  
- **Hold signals (0):** No order generated.  
- Orders are evaluated against the quote at the **next timestamp (t + dt)**, with dt = 1 s, to simulate latency.  
- **No queuing:** Each signal is processed independently.  
- Fill decisions depend on:
  - `price_aggressiveness`  
  - `spread_penalty_factor`  
  - **random draw (rand_val)**  
- **PnL** is updated dynamically using both **realized** and **unrealized** components via `pnl_obj.update_pnl()`.
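
One way to implement the t + dt lookup is a left join on the shifted timestamp. The function `attach_execution_quotes` and the columns `exec_time`, `exec_bid` and `exec_ask` are hypothetical; how the framework handles an order with no snapshot at exactly t + 1 s is not described here, so this sketch simply leaves those quotes as NaN.

```python
import pandas as pd


def attach_execution_quotes(orders: pd.DataFrame, quotes: pd.DataFrame,
                            dt: pd.Timedelta = pd.Timedelta(seconds=1)) -> pd.DataFrame:
    """Pair each order sent at time t with the quote snapshot at t + dt."""
    out = orders.copy()
    out["exec_time"] = out["timestamp"] + dt
    exec_quotes = quotes.rename(columns={"timestamp": "exec_time",
                                         "bid_price": "exec_bid", "ask_price": "exec_ask"})
    # Left join keeps every order; orders with no snapshot at t + dt get NaN quotes.
    return out.merge(exec_quotes[["exec_time", "exec_bid", "exec_ask"]],
                     on="exec_time", how="left")
```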


#### **Simplifying Assumptions**

- Execution probability depends **only** on price aggressiveness and spread penalty.  
- **Randomness** determines final fill outcome, capturing market uncertainty.  
- **Spread penalty** simulates reduced execution under wide spreads.  
- **Latency = 1 second** ensures deterministic timing for each order event.  
- **No queue priority or partial-fill modeling:** Each signal triggers an atomic decision — filled or not filled.
- **Order Cancelling:** Orders that are not filled are cancelled immediately.

---

## **3. Future Improvements**

### **3.1 Missing Price Approximation**
Both approximations below assume the spread is exactly one tick wide. If the best bid price is missing but the best ask price is available, we can approximate it as:<br>
```best_bid_price = best_ask_price - tick_size```

If the best ask price is missing but the best bid price is available, we can approximate it as:<br>
```best_ask_price = best_bid_price + tick_size```
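
The two fallbacks can be applied column-wise with `fillna`. A sketch assuming a one-tick spread and a hypothetical `fill_missing_touch` helper; rows where both sides are missing stay NaN and would still be dropped by the null check.

```python
import pandas as pd


def fill_missing_touch(quotes: pd.DataFrame, tick_size: float) -> pd.DataFrame:
    """Rebuild a single missing side of the touch, assuming a one-tick spread."""
    q = quotes.copy()
    # Missing bid, ask present: bid = ask - tick.
    q["bid_price"] = q["bid_price"].fillna(q["ask_price"] - tick_size)
    # Missing ask, bid present: ask = bid + tick (rows with both sides missing stay NaN).
    q["ask_price"] = q["ask_price"].fillna(q["bid_price"] + tick_size)
    return q
```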

### **3.2 Execution Probability Model Enhancements**

##### **(a) Volatility-Aware Execution**
Integrate **short-term realized volatility** to dynamically scale the execution probability:

$$ P_{exec} = \alpha \times \text{price\_aggressiveness} \times e^{-\beta \, \sigma_t} $$

where:  
- $\sigma_t$ = short-horizon volatility estimate (e.g., rolling standard deviation of mid-price returns)  
- $\beta$ = volatility sensitivity coefficient  

*Effect:* Orders become harder to execute during volatile periods, mimicking real-world market microstructure.
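
A sketch of the volatility scaling, using a rolling standard deviation of mid-price returns as suggested above. The function name, the window length, the value of `beta`, and treating the warm-up period as zero volatility (no penalty) are all assumptions.

```python
import numpy as np
import pandas as pd


def volatility_scaled_p_exec(mid: pd.Series, aggressiveness: pd.Series, alpha: pd.Series,
                             beta: float, window: int = 20) -> pd.Series:
    """P_exec = alpha * aggressiveness * exp(-beta * sigma_t), sigma_t from rolling returns."""
    returns = mid / mid.shift(1) - 1                                  # simple mid-price returns
    sigma = returns.rolling(window, min_periods=2).std().fillna(0.0)  # warm-up: no penalty
    return alpha * aggressiveness * np.exp(-beta * sigma)
```

Since $e^{-\beta \sigma_t} \le 1$ for $\beta, \sigma_t \ge 0$, this factor can only lower the execution probability, never raise it above the current model's value.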


##### **(b) Adaptive Spread Penalty**
Instead of binary spread flags, use a **continuous penalty function**:

$$ \alpha = e^{-\gamma \cdot (\text{relative\_spread})} $$

where $\text{relative\_spread} = \frac{(\text{ask} - \text{bid})}{\text{mid}}$  

*Effect:* This avoids sharp transitions and better reflects liquidity degradation as spreads widen.
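
The continuous penalty is a one-liner once the relative spread is available. `adaptive_alpha` is a hypothetical helper and `gamma` is a free parameter (its value in the test is arbitrary); setting it to 0 removes the penalty entirely.

```python
import numpy as np


def adaptive_alpha(bid, ask, gamma: float) -> np.ndarray:
    """alpha = exp(-gamma * (ask - bid) / mid), decreasing smoothly as the spread widens."""
    bid = np.asarray(bid, dtype=float)
    ask = np.asarray(ask, dtype=float)
    mid = (bid + ask) / 2
    relative_spread = (ask - bid) / mid
    return np.exp(-gamma * relative_spread)
```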


### **3.3 Market Microstructure Realism**

##### **(a) Queue Position Modeling**
Add **order queue simulation** to differentiate between:
- Immediate fills  
- Passive queue placement  
- Partial fills due to limited depth  

Use **queue-based latency** and **FIFO modeling** to determine realistic time-to-fill and execution priority.

##### **(b) Capital Constraint Modeling**
Currently, the simulation assumes unlimited capital.
As a future improvement, a capital parameter can be introduced.

### **3.4 Risk Management**
Future versions of the simulator can incorporate **risk management mechanisms** to better control portfolio exposure and losses.

##### **(a) Exposure Limits**
Define **maximum allowable position sizes** or **notional exposure** per instrument.  
This prevents excessive concentration in a single asset and ensures consistent capital allocation across trades.

##### **(b) Stop-Loss Mechanism**
Automatically **close positions** once unrealized losses exceed a predefined threshold.  
This helps protect capital from large adverse price movements and reduces drawdown risk.

*Combined Effect:* These features make the simulation framework more realistic by enforcing **capital discipline** and **loss control**, aligning with **professional trading system standards**.
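
Both mechanisms reduce to simple checks before an order is sent. In this sketch `RiskLimits`, `allowed_open_qty`, `unrealized_pnl` and `stop_loss_triggered` are hypothetical names, and marking longs at the bid and shorts at the ask is an assumed convention, not necessarily the one used by `pnl_obj.update_pnl()`.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    max_position: float   # maximum absolute net position, in units
    stop_loss: float      # close when unrealized PnL falls to -stop_loss or below


def allowed_open_qty(position: float, side: int, qty: float, limits: RiskLimits) -> float:
    """Shrink an opening order so that |position + side * qty| <= max_position."""
    return max(0.0, min(qty, limits.max_position - side * position))


def unrealized_pnl(position: float, avg_entry: float, best_bid: float, best_ask: float) -> float:
    """Mark longs at the bid and shorts at the ask (an assumed convention)."""
    if position > 0:
        return position * (best_bid - avg_entry)
    if position < 0:
        return -position * (avg_entry - best_ask)
    return 0.0


def stop_loss_triggered(upnl: float, limits: RiskLimits) -> bool:
    """True when the unrealized loss has reached the stop-loss threshold."""
    return upnl <= -limits.stop_loss
```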


---
## **4. PnL Metrics**


### **4.1 Gross PnL**

Gross PnL represents the **total profit or loss** before deducting commissions.  
It combines both realized and unrealized components from long and short positions.

$$
\text{Gross PnL} = (\text{Realized Long PnL} + \text{Unrealized Long PnL}) + (\text{Realized Short PnL} + \text{Unrealized Short PnL})
$$

or equivalently,

$$
\text{Gross PnL} = \text{Total Long PnL} + \text{Total Short PnL}
$$


### **4.2 Net PnL**

Net PnL accounts for **all trading commissions** and represents the final profit or loss after transaction costs:

$$
\text{Net PnL} = \text{Gross PnL} - \text{Total Commission}
$$

where:

$$
\text{Total Commission} = N_{\text{trades}} \times \text{Commission per Trade}
$$
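
The two formulas translate directly; the default commission below is the flat 0.001 per trade assumed in Section 2.4. The function name and argument layout are illustrative.

```python
def gross_and_net_pnl(realized_long: float, unrealized_long: float,
                      realized_short: float, unrealized_short: float,
                      n_trades: int, commission_per_trade: float = 0.001) -> tuple[float, float]:
    """Gross PnL (Section 4.1) and Net PnL after a flat per-trade commission (Section 4.2)."""
    total_long = realized_long + unrealized_long
    total_short = realized_short + unrealized_short
    gross = total_long + total_short
    net = gross - n_trades * commission_per_trade
    return gross, net
```

For example, `gross_and_net_pnl(10.0, -2.0, 3.0, 1.0, n_trades=100)` gives a gross PnL of 12.0 and a net PnL of about 11.9.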


### **4.3 Average Trade PnL**

Average trade PnL indicates the **average realized profit or loss per closed trade**, excluding unrealized components:

$$
\text{Average Trade PnL} = \frac{\text{Total Realized PnL}}{N_{\text{closed trades}}}
$$

This metric reflects the **average profitability per completed trade** and is useful for evaluating trade-level efficiency.


### **4.4 Average Slippage per Trade**

Average slippage measures the **mean execution cost deviation** (difference between sent price and fill price) across all executed trades:

$$
\text{Average Slippage per Trade} = \frac{\sum_{i=1}^{N_{\text{trades}}} \text{Slippage}_i}{N_{\text{trades}}}
$$

It helps assess **execution quality** — smaller average slippage implies better order placement and market timing.
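
The per-trade metrics in Sections 4.3 and 4.4 can be computed from a trade log. The maximum drawdown listed among the framework's metrics (but not defined in this section) is commonly taken as the largest peak-to-trough fall of cumulative PnL. In this sketch the function names, the slippage sign convention (positive means the fill was worse than the sent price), and starting the PnL curve at 0 are assumptions.

```python
import numpy as np


def average_trade_pnl(realized_pnls: list[float]) -> float:
    """Total realized PnL divided by the number of closed trades."""
    if not realized_pnls:
        return float("nan")          # no closed trades yet
    return sum(realized_pnls) / len(realized_pnls)


def average_slippage(sides, sent_prices, fill_prices) -> float:
    """Mean of side * (fill - sent); positive values mean worse-than-sent fills."""
    sides = np.asarray(sides, dtype=float)
    slip = sides * (np.asarray(fill_prices, dtype=float) - np.asarray(sent_prices, dtype=float))
    return float(slip.mean())


def max_drawdown(cumulative_pnl) -> float:
    """Largest drop from a running peak, with the PnL curve assumed to start at 0."""
    curve = np.concatenate([[0.0], np.asarray(cumulative_pnl, dtype=float)])
    running_peak = np.maximum.accumulate(curve)
    return float((running_peak - curve).max())
```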




---
## **5. Plots**

### **5.1 Relative Bid-Ask Spread Distribution**

![](../output/plots/spread_distribution.png)

### **5.2 Summary Metrics**

![](../output/plots/pnl_slippage_dd_summary_grid.png)