# Modeling Temporary Impact $g_t(x)$ and Optimal Execution Framework

This notebook demonstrates how to model the temporary market impact function $g_t(x)$ using both linear and nonlinear approaches, with data from three tickers (CRWV, FROG, SOUN). It also formulates a mathematical framework for optimal execution subject to the constraint $\sum_i x_i = S$.

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from zipfile import ZipFile
import os

In [None]:
# Extract and Load Data for the Three Tickers

ticker_paths = {
    'CRWV': 'data/CRWV/CRWV.zip',
    'FROG': 'data/FROG/FROG.zip',
    'SOUN': 'data/SOUN/SOUN.zip'
}

ticker_dfs = {}

for ticker, zip_path in ticker_paths.items():
    with ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(f'data/{ticker}/extracted')
        # Assume there is only one CSV per zip
        csv_files = [f for f in os.listdir(f'data/{ticker}/extracted') if f.endswith('.csv')]
        if csv_files:
            df = pd.read_csv(f'data/{ticker}/extracted/' + csv_files[0])
            ticker_dfs[ticker] = df
        else:
            print(f'No CSV found for {ticker}')

# Display the first few rows of each ticker's data
for ticker, df in ticker_dfs.items():
    print(f'\n{ticker} Data:')
    display(df.head())

In [None]:
# Exploratory Data Analysis

for ticker, df in ticker_dfs.items():
    print(f'\nExploring {ticker}...')
    print(df.describe())
    if 'price' in df.columns and 'volume' in df.columns:
        plt.figure(figsize=(12, 4))
        plt.subplot(1, 2, 1)
        plt.plot(df['price'])
        plt.title(f'{ticker} Price')
        plt.xlabel('Time')
        plt.ylabel('Price')
        plt.subplot(1, 2, 2)
        plt.plot(df['volume'])
        plt.title(f'{ticker} Volume')
        plt.xlabel('Time')
        plt.ylabel('Volume')
        plt.tight_layout()
        plt.show()

## Modeling the Temporary Impact $g_t(x)$

The temporary market impact function $g_t(x)$ models the immediate, per-share price concession (slippage) incurred when trading $x$ shares at time $t$. A common approach is to assume a linear relationship: $g_t(x) \approx \beta_t x$. However, this can be a gross oversimplification, because real market impact is often nonlinear owing to limited liquidity, finite order book depth, and other market microstructure effects.

### Linear Model
- **Form:** $g_t(x) = \beta_t x$
- **Pros:** Simple, interpretable, easy to estimate.
- **Cons:** May not capture diminishing/increasing returns to scale, ignores nonlinear liquidity effects.

### Nonlinear Model
- **Form:** $g_t(x) = \alpha_t x^p$ (with $p \neq 1$)
- **Pros:** Can capture concave/convex impact, more flexible.
- **Cons:** More complex to estimate, risk of overfitting with limited data.
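
One common way to get a first estimate of the power-law parameters is to note that $\log g = \log \alpha + p \log x$ is a straight line in log-log space. The sketch below illustrates this on synthetic data; the values `alpha_true` and `p_true`, the noise level, and the sample size are assumptions made up for the example, not estimates from the three tickers. It only applies to strictly positive $x$ and $g$, and it implicitly assumes multiplicative noise, so its estimates can differ from a direct least-squares fit.

```python
import numpy as np

# Synthetic data (assumed, for illustration only):
# g(x) = alpha * x**p with multiplicative log-normal noise
rng = np.random.default_rng(0)
alpha_true, p_true = 0.01, 0.6
x = rng.uniform(100, 10_000, size=500)
g = alpha_true * x**p_true * np.exp(rng.normal(0.0, 0.1, size=x.size))

# Ordinary least squares on log g = log(alpha) + p * log(x).
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
p_hat, log_alpha_hat = np.polyfit(np.log(x), np.log(g), deg=1)
alpha_hat = np.exp(log_alpha_hat)

print(f"p_hat = {p_hat:.3f}, alpha_hat = {alpha_hat:.4g}")
```

The resulting `alpha_hat` and `p_hat` can also serve as starting values (`p0`) for a nonlinear least-squares fit.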

We will fit both models to the data and compare their fit.

In [None]:
# Fit Linear and Nonlinear Models for g_t(x)
from scipy.optimize import curve_fit

def linear_model(x, beta):
    return beta * x

def nonlinear_model(x, alpha, p):
    return alpha * np.power(x, p)

fit_results = {}

for ticker, df in ticker_dfs.items():
    if 'price' in df.columns and 'volume' in df.columns:
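        # Crude proxy: impact = absolute price change between consecutive rows.
        # Volume is unsigned, so the sign of the price move carries no information here.
        df = df.copy()
        df['impact'] = df['price'].diff().abs()
        # Drop the undefined first difference and rows that curve_fit cannot handle
        valid = df['impact'].notna() & df['volume'].notna() & (df['volume'] > 0)
        x = df.loc[valid, 'volume'].to_numpy(dtype=float)
        y = df.loc[valid, 'impact'].to_numpy(dtype=float)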
        # Linear fit
        popt_lin, _ = curve_fit(linear_model, x, y)
        # Nonlinear fit
        popt_nonlin, _ = curve_fit(nonlinear_model, x, y, bounds=([0,0],[np.inf,2]))
        fit_results[ticker] = {'linear': popt_lin, 'nonlinear': popt_nonlin}
        # Plot
        plt.figure(figsize=(6,4))
        plt.scatter(x, y, alpha=0.3, label='Data')
        x_fit = np.linspace(x.min(), x.max(), 100)
        plt.plot(x_fit, linear_model(x_fit, *popt_lin), label='Linear Fit')
        plt.plot(x_fit, nonlinear_model(x_fit, *popt_nonlin), label='Nonlinear Fit')
        plt.title(f'{ticker} Impact Model')
        plt.xlabel('Volume')
        plt.ylabel('Price Impact')
        plt.legend()
        plt.show()
        print(f'{ticker} Linear beta: {popt_lin[0]:.4g}, Nonlinear alpha: {popt_nonlin[0]:.4g}, p: {popt_nonlin[1]:.4g}')

### Discussion: Modeling $g_t(x)$

The linear model $g_t(x) = \beta_t x$ is widely used due to its simplicity and interpretability. However, empirical studies of market impact commonly report a nonlinear, typically concave, relationship between trade size and impact, especially for large trades or in illiquid markets, and the fits above let us check this for each ticker. The nonlinear model $g_t(x) = \alpha_t x^p$ captures concave ($p < 1$) or convex ($p > 1$) relationships, that is, diminishing or increasing marginal impact.

**Findings from the 3 Tickers:**
- For each ticker, we fit both models to the observed price impact vs. volume data.
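- The linear model provides a baseline. Because it is the special case $p = 1$ of the power-law model, a converged nonlinear fit can never have a larger in-sample error; the improvement is meaningful only when the estimated $p$ deviates clearly from 1.
- The estimated $p$ summarizes how quickly impact grows with trade size, which reflects each ticker's liquidity and resilience to large trades.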

**Conclusion:**
- While linear models are useful for intuition and quick estimation, nonlinear models provide a more realistic description of market impact, especially when supported by data.
- With only 3 tickers, our conclusions are tentative, but the methodology generalizes to larger datasets.
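
Because the power-law model contains the linear model as the case $p = 1$, in-sample error alone cannot decide between them. The following is a minimal sketch of a hold-out comparison on synthetic data invented for illustration: the coefficients `0.5` and `0.6`, the noise level, and the 50/50 split are all assumptions, not estimates from CRWV, FROG, or SOUN.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(x, beta):
    return beta * x

def nonlinear_model(x, alpha, p):
    return alpha * np.power(x, p)

# Synthetic, assumed data: concave impact with additive noise
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 100.0, size=400)
y = 0.5 * x**0.6 + rng.normal(0.0, 0.5, size=x.size)

# Random 50/50 split into a fitting set and a hold-out set
idx = rng.permutation(x.size)
train, test = idx[:200], idx[200:]

popt_lin, _ = curve_fit(linear_model, x[train], y[train])
popt_non, _ = curve_fit(nonlinear_model, x[train], y[train],
                        p0=[1.0, 1.0], bounds=([0, 0], [np.inf, 2]))

def rmse(model, popt, rows):
    """Root-mean-square error of a fitted model on the given rows."""
    resid = y[rows] - model(x[rows], *popt)
    return np.sqrt(np.mean(resid**2))

rmse_lin = rmse(linear_model, popt_lin, test)
rmse_non = rmse(nonlinear_model, popt_non, test)
print(f"hold-out RMSE  linear: {rmse_lin:.3f}  nonlinear: {rmse_non:.3f}")
```

If the nonlinear model does not beat the linear one on held-out rows, the extra parameter is probably fitting noise.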

---

## Mathematical Framework for Optimal Execution

Suppose we want to execute a total order of $S$ shares over $N$ time intervals. Let $x_i$ be the number of shares traded at time $t_i$, with $\sum_{i=1}^N x_i = S$. The goal is to choose $x_i$ to minimize total cost, accounting for market impact.

### Mathematical Setup

Let:
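- $S$: Total shares to execute
- $N$: Number of time intervals
- $x_i$: Shares traded at time $t_i$, $i=1,...,N$
- $P_{t_i}$: Reference (pre-trade) price per share at $t_i$
- $g_i(x_i)$: Temporary impact at $t_i$, i.e. the per-share price concession when trading $x_i$ shares
- $C(x_1,...,x_N)$: Total cost of execution

**Objective:**

$$
\min_{x_1,...,x_N} C(x_1,...,x_N) = \sum_{i=1}^N x_i \left[ P_{t_i} + g_i(x_i) \right]
$$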

**Subject to:**

$$
\sum_{i=1}^N x_i = S
$$

and possibly $x_i \geq 0$ (no trading against the direction of the parent order).

#### Solution Techniques
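- If $g_i(x) = \beta_i x$ is linear, the impact cost $x_i g_i(x_i) = \beta_i x_i^2$ is quadratic, so the problem is a quadratic program that can be solved analytically (e.g., Almgren-Chriss framework).
- If $g_i(x)$ is nonlinear but $x\, g_i(x)$ is convex (for example $g_i(x) = \alpha_i x^p$ with $\alpha_i > 0$, $p > 0$, $x \geq 0$), use convex optimization; otherwise fall back on dynamic programming or a global search.
- A Lagrange multiplier enforces the constraint $\sum_i x_i = S$; at an interior optimum it equals the marginal cost in every interval.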
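
As a worked illustration of the linear case, assume each interval costs $x_i \left[ P_{t_i} + \beta_i x_i \right]$ with $\beta_i > 0$ (this per-share reading of $g_i$ is an assumption of the sketch). The Lagrangian is

$$
\mathcal{L}(x, \lambda) = \sum_{i=1}^N \left[ P_{t_i} x_i + \beta_i x_i^2 \right] - \lambda \left( \sum_{i=1}^N x_i - S \right)
$$

Setting $\partial \mathcal{L} / \partial x_i = 0$ gives

$$
P_{t_i} + 2 \beta_i x_i = \lambda
\quad \Longrightarrow \quad
x_i = \frac{\lambda - P_{t_i}}{2 \beta_i}
$$

and substituting into $\sum_i x_i = S$ determines the multiplier:

$$
\lambda = \frac{2S + \sum_{j=1}^N P_{t_j} / \beta_j}{\sum_{j=1}^N 1 / \beta_j}
$$

When the reference price is the same in every interval, $P_{t_i} = P$, this reduces to

$$
x_i = \frac{S / \beta_i}{\sum_{j=1}^N 1 / \beta_j}
$$

so more is traded where impact is cheaper, and equal $\beta_i$ give the uniform schedule $x_i = S/N$. The formula is valid as long as every resulting $x_i$ is nonnegative; otherwise the bound $x_i \geq 0$ becomes active and must be handled explicitly.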

#### Example Algorithm (Sketch)
1. Initialize $x_i = S/N$ for all $i$.
2. Iterate: update $x_i$ to reduce cost, subject to $\sum x_i = S$.
3. Stop when changes are below a threshold.
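
The three steps above can be sketched as projected gradient descent. This is one possible realization, not necessarily the intended one: it assumes a linear per-share impact $g_i(x) = \beta_i x$, so interval $i$ costs $x_i (P_{t_i} + \beta_i x_i)$, and the function names `project_to_budget` and `schedule_by_projected_gradient` as well as the example `beta` values are hypothetical.

```python
import numpy as np

def project_to_budget(v, S):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = S} (sort-based)."""
    u = np.sort(v)[::-1]                       # entries in descending order
    css = np.cumsum(u) - S
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index that stays positive
    theta = css[rho] / (rho + 1)               # common shift applied to all entries
    return np.maximum(v - theta, 0.0)

def schedule_by_projected_gradient(beta, S, P=0.0, tol=1e-8, max_iter=10_000):
    """Minimize sum_i x_i * (P_i + beta_i * x_i) s.t. sum(x) = S, x >= 0."""
    beta = np.asarray(beta, dtype=float)
    N = beta.size
    P = np.broadcast_to(np.asarray(P, dtype=float), (N,))
    x = np.full(N, S / N)                      # step 1: uniform start
    step = 1.0 / (2.0 * beta.max())            # 1 / Lipschitz constant of the gradient
    for _ in range(max_iter):
        grad = P + 2.0 * beta * x              # gradient of the assumed cost
        x_new = project_to_budget(x - step * grad, S)   # step 2: feasible update
        if np.max(np.abs(x_new - x)) < tol:    # step 3: stop on small change
            return x_new
        x = x_new
    return x

# Hypothetical impact coefficients for three intervals
beta = np.array([1e-4, 2e-4, 4e-4])
x_opt = schedule_by_projected_gradient(beta, S=10_000, P=100.0)
print(x_opt, x_opt.sum())
```

With a constant reference price the result should agree with the allocation proportional to $1/\beta_i$, which gives a quick sanity check on the iteration.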

**Tools:**
- Python: `scipy.optimize.minimize` for numerical solutions.
- Analytical methods for special cases.

---

This framework provides a basis for optimal execution that adapts to both linear and nonlinear impact models.

In [None]:
# Example: Optimal Execution with Nonlinear Impact
from scipy.optimize import minimize

S = 10000  # total shares
N = 10     # intervals
alpha = 0.01
p = 0.8
P0 = 100

def cost(x):
    # Total cost: shares traded times (reference price + per-share temporary impact)
    x = np.asarray(x, dtype=float)
    return np.sum(x * (P0 + alpha * np.power(x, p)))

# Constraint: sum(x) = S
cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - S})
# Bounds: x_i >= 0
bounds = [(0, None)] * N
x0 = np.ones(N) * (S / N)

res = minimize(cost, x0, method='SLSQP', bounds=bounds, constraints=cons)
if not res.success:
    print('Optimizer did not converge:', res.message)
print('Optimal x:', res.x)
print('Total cost:', res.fun)