# Research Skills Report: Analysis of EXP3 in OTC Market Pricing

## Chapter 4: Case Study – Analysis of EXP3 in OTC Market Pricing (EXP1 Experiment)

### 4.1 Introduction and Background
In over-the-counter (OTC) financial markets, liquidity providers (LPs) face significant challenges due to limited and asymmetric information. LPs typically do not know how many competitors are quoting prices, cannot observe rival quotes, and only receive information about their own trades. As a result, optimizing pricing strategies in such an environment is non-trivial.

The EXP1 experiment, as presented in the study *“AI Driven Liquidity Provision in OTC Financial Markets”*, investigates whether model-free reinforcement learning algorithms—specifically the EXP3 multi-armed bandit algorithm—can successfully guide LPs in setting optimal spreads in such an opaque market.

### 4.2 Market Structure and Mathematical Formulation
#### Liquidity Provider Quotes
Each LP posts a symmetric bid-ask quote around their own mid-price estimate of the true asset value:
$$b^{(i)}_t = p^{(i)}_t - \frac{s_i}{2}, \quad a^{(i)}_t = p^{(i)}_t + \frac{s_i}{2}$$

where $s_i$ is LP $i$'s spread and $p^{(i)}_t = p^*_t + m^{(i)}_t$ is its mid-price estimate, i.e. the true asset value $p^*_t$ plus an LP-specific estimation error $m^{(i)}_t$.
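As a quick numerical illustration of the quote formulas, the sketch below builds one LP's bid and ask from a true value, an estimation error, and a spread. The function name `lp_quotes`, its parameter names, and the sample numbers are illustrative assumptions, not taken from the study.

```python
def lp_quotes(true_value, mispricing, spread):
    """Return (bid, ask) for one LP at one time step.

    true_value -> p*_t, mispricing -> m_t^(i), spread -> s_i
    (parameter names are hypothetical labels for the symbols above).
    """
    mid = true_value + mispricing   # p_t^(i) = p*_t + m_t^(i)
    bid = mid - spread / 2          # b_t^(i)
    ask = mid + spread / 2          # a_t^(i)
    return bid, ask


bid, ask = lp_quotes(true_value=100.0, mispricing=0.25, spread=0.5)
print(bid, ask)  # 100.0 100.5
```

Note that the quote is symmetric around the LP's own estimate, not around the true value: here the LP overestimates the asset by 0.25, so its bid happens to sit exactly at the true value.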

#### Trader Behavior
The trader's own value estimate and reservation prices are defined analogously. An LP earns a reward only when the trader selects its quote for a trade.

#### Reward Function
The LP’s reward is:
$$\pi^{(i)}_t = \frac{1}{2}s_i \cdot |D^{(i)}_t| + m^{(i)}_t \cdot D^{(i)}_t$$
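The reward formula can be checked against the quote definitions with a small sketch. It assumes a sign convention that the text does not state explicitly: $D^{(i)}_t > 0$ when the LP sells at its ask and $D^{(i)}_t < 0$ when it buys at its bid, with profit measured against the true value $p^*_t$. Under that assumption, the closed-form reward and the per-unit cash-flow calculation agree. All function and parameter names are hypothetical.

```python
import math


def lp_reward(spread, mispricing, signed_volume):
    """Closed-form reward: 0.5 * s_i * |D| + m * D."""
    return 0.5 * spread * abs(signed_volume) + mispricing * signed_volume


def reward_from_cashflows(true_value, mispricing, spread, signed_volume):
    """Same reward computed from the quotes (assumed sign convention:
    signed_volume > 0 means the LP sells at the ask, < 0 means it buys at the bid)."""
    mid = true_value + mispricing
    if signed_volume >= 0:
        ask = mid + spread / 2
        return (ask - true_value) * signed_volume      # sell above true value
    bid = mid - spread / 2
    return (true_value - bid) * (-signed_volume)       # buy below true value


for d in (2, -2):
    closed = lp_reward(0.5, 0.25, d)
    direct = reward_from_cashflows(100.0, 0.25, 0.5, d)
    assert math.isclose(closed, direct)
    print(d, closed)
```

The second term shows why mispricing matters: an LP that overestimates the asset gains extra when it sells and gives up part of its half-spread when it buys.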

### 4.3 EXP3 Algorithm Deployment
To determine optimal spreads under incomplete information, each LP employs the EXP3 bandit algorithm:
```python
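# Simplified EXP3 algorithm (rewards are assumed to be scaled to [0, 1])
import math
import random

def exp3(observe_reward, K, T, gamma=0.1):
    weights = [1.0] * K
    for t in range(T):
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration
        probabilities = [(1 - gamma) * w / total + gamma / K for w in weights]
        action = random.choices(range(K), weights=probabilities)[0]
        reward = observe_reward(action)
        # Importance-weighted update: only the chosen arm's weight changes
        weights[action] *= math.exp(gamma * reward / (K * probabilities[action]))
    return weights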
```
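The loop above leaves open how spreads map to arms and how raw profits are scaled. The following is a minimal sketch of one way to wire this up, assuming each arm is a candidate spread on an evenly spaced grid and that raw rewards are clipped and rescaled into $[0, 1]$, which is the range the standard EXP3 update expects. The grid, the bounds `lo` and `hi`, the helper names, and especially the toy reward function are hypothetical and are not the study's market simulator.

```python
import math
import random


def spread_grid(s_min, s_max, K):
    """K >= 2 evenly spaced candidate spreads (hypothetical discretisation)."""
    step = (s_max - s_min) / (K - 1)
    return [s_min + k * step for k in range(K)]


def rescale(reward, lo, hi):
    """Map a raw reward into [0, 1], clipping values outside [lo, hi]."""
    return min(1.0, max(0.0, (reward - lo) / (hi - lo)))


def run_exp3(raw_reward, spreads, T, lo, hi, gamma=0.1, seed=0):
    """Run EXP3 over the spread grid and return the final arm probabilities."""
    rng = random.Random(seed)
    K = len(spreads)
    weights = [1.0] * K
    for _ in range(T):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        x = rescale(raw_reward(spreads[arm]), lo, hi)
        weights[arm] *= math.exp(gamma * x / (K * probs[arm]))
        # Dividing by the largest weight avoids overflow and leaves probs unchanged
        top = max(weights)
        weights = [w / top for w in weights]
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / K for w in weights]


def toy_reward(s):
    """Toy stand-in for the market: wider spreads earn more per trade but trade less."""
    return s * max(0.0, 1.0 - s)


spreads = spread_grid(0.0, 1.0, 5)
final_probs = run_exp3(toy_reward, spreads, T=5000, lo=0.0, hi=0.25)
for s, p in zip(spreads, final_probs):
    print(f"spread={s:.2f}  prob={p:.3f}")
```

Because of the `gamma / K` exploration term, every spread keeps a probability of at least `gamma / K`, so the LP never stops sampling any arm entirely. In a multi-LP setting each LP would run its own copy of this loop, and the reward each one observes would depend on its rivals' choices.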

### 4.4 Simulation Results and Visualization
#### Figure 1A: LP1 Expected Value Surface
A simulated plot of expected value based on spread combinations of LP1 and LP2 shows how LP1 adjusts its strategy in response to LP2’s behavior.
