# Stanford CME 241 (Winter 2026) - Assignment 2

**Due: Friday, February 13 @ 11:59 PM PST on Gradescope.**

Assignment instructions:
- Make sure each of the subquestions has an answer
- Ensure that group members indicate which problems they're in charge of
- Show work and walk through your thought process where applicable
- Empty code blocks are for your use, so feel free to create more under each section as needed
- Document code with light comments (e.g., 'this function handles visualization')

Submission instructions:
- When complete, fill out your publicly available GitHub repo file URL and group members below, then export or print this .ipynb file to PDF and upload the PDF to Gradescope.

*Link to this ipynb file in your public GitHub repo (replace below URL with yours):* 

https://github.com/onat-dalmaz/RL-book/blob/main/Assignments/Assignment%202/assignment2.ipynb

*Group members (replace below names with people in your group):* 
- Onat Dalmaz

## Imports

In [None]:
import math
from typing import Dict, Tuple, List, Any

import numpy as np
import matplotlib.pyplot as plt

## Question 1: Job-Hopping and Wages-Utility-Maximization (Led by Onat Dalmaz)

You are a worker who starts every day either employed or unemployed. If you start your day employed, you work on your job for the day (one of $n$ jobs, as elaborated later) and you get to earn the wage of the job for the day. However, at the end of the day, you could lose your job with probability $\alpha \in [0,1]$, in which case you start the next day unemployed. If at the end of the day, you do not lose your job (with probability $1-\alpha$), then you will start the next day with the same job (and hence, the same daily wage). 

On the other hand, if you start your day unemployed, then you will be randomly offered one of $n$ jobs with daily wages $w_1, w_2, \ldots w_n \in \mathbb{R}^+$ with respective job-offer probabilities $p_1, p_2, \ldots p_n \in [0,1]$ (with $\sum_{i=1}^n p_i = 1$). You can choose to either accept or decline the offered job. If you accept the job offer, your day progresses exactly like the **employed-day** described above (earning the day's job wage and possibly (with probability $\alpha$) losing the job at the end of the day). However, if you decline the job offer, you spend the day unemployed, receive the unemployment wage $w_0 \in \mathbb{R}^+$ for the day, and start the next day unemployed.

The problem is to identify the optimal choice of accepting or rejecting any of the job offers the worker receives, in a manner that maximizes the infinite-horizon **Expected Discounted-Sum of Wages Utility**. Assume the daily discount factor for wages (employed or unemployed) is $\gamma \in [0,1)$. Assume the Wages Utility function is $U(w) = \log(w)$ for any wage amount $w \in \mathbb{R}^+$. The goal is to maximize:

$$
\mathbb{E}\left[\sum_{u=t}^\infty \gamma^{u-t} \cdot \log(w_{i_u})\right]
$$

at the start of a given day $t$ ($w_{i_u}$ is the wage earned on day $u$, $0 \leq i_u \leq n$ for all $u \geq t$).

---

### Subquestions

#### Part (A): MDP Modeling

Express the job-hopping problem as an MDP using clear mathematical notation by defining the following components:

1. **State Space**: Define the possible states of the MDP.
2. **Action Space**: Specify the actions available to the worker at each state.
3. **Transition Function**: Describe the probabilities of transitioning between states for each action.
4. **Reward Function**: Specify the reward associated with the states and transitions.
5. **Bellman Optimality Equation**: Write the Bellman Optimality Equation customized for this MDP.

---

#### Part (B): Python Implementation

Write Python code that:

1. Solves the Bellman Optimality Equation (hence, solves for the **Optimal Value Function** and the **Optimal Policy**) with a numerical iterative algorithm. 
2. Clearly define the inputs and outputs of your algorithm with their types (`int`, `float`, `List`, `Mapping`, etc.).

*Note*: For this problem, write the algorithm from scratch without using any prebuilt MDP/DP libraries or code.

---

#### Part (C): Visualization and Analysis

1. Plot the **Optimal Value Function** as a function of the state for a specific set of parameters ($n$, $w_1, \ldots, w_n$, $p_1, \ldots, p_n$, $\alpha$, $\gamma$, $w_0$).
2. Include these graphs in your submission.

---

#### Part (D): Observations

1. What patterns do you observe in the **Optimal Policy** as you vary the parameters $n$, $\alpha$, and $\gamma$?
2. Provide a brief discussion of your findings.

---

### Part (A) Answer

We formulate the problem as an infinite-horizon Discounted Markov Decision Process (MDP).

**1. State Space $\mathcal{S}$:**

The state represents the worker's employment status at the start of the day.

$$
\mathcal{S} = \{0, 1, 2, \dots, n\}
$$

- State $s=0$: The worker is **Unemployed**.
- State $s=i$ (for $i \in \{1, \dots, n\}$): The worker is **Employed at Job $i$**.

**2. Action Space $\mathcal{A}(s)$:**

The actions depend on the current state.

- If $s \in \{1, \dots, n\}$ (Employed): The worker has no choice; they must work.
  $$
  \mathcal{A}(s) = \{\text{Work}\}
  $$

- If $s = 0$ (Unemployed): The worker first observes a job offer $j$ (drawn with probability $p_j$) and then decides whether to take it. A decision rule for state $0$ therefore maps each offer to an action, $\pi(j) \in \{\text{Accept}, \text{Decline}\}$.
  $$
  \mathcal{A}(0) = \{\text{Accept}, \text{Decline}\} \quad (\text{conditional on the specific offer } j)
  $$

**3. Transition Function $\mathcal{P}(s' | s, a)$:**

- **From State $i > 0$ (Employed):**
  - Next state is $0$ (Unemployed) with probability $\alpha$ (fired).
  - Next state is $i$ (Still Employed) with probability $1-\alpha$ (retained).

- **From State $0$ (Unemployed):**
  The transition depends on the random offer $j$ (chosen with prob $p_j$) and the action:
  - If **Accept offer $j$:**
    - Next state is $0$ with probability $\alpha$.
    - Next state is $j$ with probability $1-\alpha$.
  - If **Decline offer $j$:**
    - Next state is $0$ with probability $1$.

**4. Reward Function $\mathcal{R}(s, a, s')$:**

- **Employed at $i$ ($s=i$):** Reward is the utility of the wage $w_i$.
  $$
  R(i) = \log(w_i)
  $$

- **Unemployed ($s=0$) accepting Job $j$:** Reward is the utility of the wage $w_j$.
  $$
  R(0, \text{Accept } j) = \log(w_j)
  $$

- **Unemployed ($s=0$) declining offer:** Reward is the utility of unemployment wage $w_0$.
  $$
  R(0, \text{Decline}) = \log(w_0)
  $$

**5. Bellman Optimality Equation:**

Let $V(s)$ denote the Optimal Value Function for state $s$.

- **For Employed State $i \in \{1, \dots, n\}$:**
  $$
  V(i) = \log(w_i) + \gamma \left[ \alpha V(0) + (1-\alpha) V(i) \right]
  $$

- **For Unemployed State $0$:**
  The worker expects an offer $j$ with probability $p_j$. For each offer, they maximize between accepting and declining.
  $$
  V(0) = \sum_{j=1}^n p_j \max \left( V_{\text{accept}}(j), V_{\text{decline}} \right)
  $$

  Where:
  $$
  V_{\text{accept}}(j) = \log(w_j) + \gamma \left[ \alpha V(0) + (1-\alpha) V(j) \right]
  $$
  $$
  V_{\text{decline}} = \log(w_0) + \gamma V(0)
  $$
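
As a worked step (a sketch, assuming $\gamma(1-\alpha) < 1$ so that the division is valid), the employed-state equation is linear in $V(i)$ and can be solved for it in terms of $V(0)$:

$$
V(i) = \frac{\log(w_i) + \gamma \alpha V(0)}{1 - \gamma(1-\alpha)}, \quad i \in \{1, \dots, n\}
$$

Comparing the two definitions also shows that $V_{\text{accept}}(j) = V(j)$, so the unemployed-state equation reduces to a fixed-point equation in the single unknown $V(0)$:

$$
V(0) = \sum_{j=1}^n p_j \max \left( V(j), \ \log(w_0) + \gamma V(0) \right)
$$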

### Part (B) Answer

In [None]:
from typing import List, Tuple

def solve_job_hopping_mdp(
    n: int,
    wages: List[float],
    probs: List[float],
    alpha: float,
    gamma: float,
    w0: float,
    tolerance: float = 1e-6
) -> Tuple[np.ndarray, List[int]]:
    """
    Solves the Job-Hopping MDP using Value Iteration.
    
    Args:
        n: Number of jobs
        wages: List of wages [w_1, ..., w_n]
        probs: List of offer probabilities [p_1, ..., p_n]
        alpha: Probability of losing job
        gamma: Discount factor
        w0: Unemployment wage
        tolerance: Convergence threshold for Value Iteration
        
    Returns:
        V: Optimal Value Function array of size n+1 (index 0 is unemployed)
        policy: Optimal policy for unemployed state (list of size n). 
                1 = Accept offer i, 0 = Decline offer i.
                (Indices correspond to job offers 0..n-1)
    """
    # Initialize Value Function: V[0] = Unemployed, V[1]...V[n] = Employed at i
    V = np.zeros(n + 1)
    
    # Precompute rewards
    u_employed = np.log(wages)     # [log(w1), ..., log(wn)]
    u_unemployed = np.log(w0)
    
    max_iter = 10000
    iter_count = 0
    
    while iter_count < max_iter:
        iter_count += 1
        V_new = np.zeros_like(V)
        
        # 1. Update Employed States (Indices 1 to n)
        # V(i) = log(w_i) + gamma * [alpha * V(0) + (1-alpha) * V(i)]
        # Rearranging: V(i) = [log(w_i) + gamma * alpha * V(0)] / (1 - gamma * (1-alpha))
        # Note: wages index 0 corresponds to Job 1 (State 1)
        denom = 1 - gamma * (1 - alpha)
        for i in range(1, n + 1):
            V_new[i] = (u_employed[i-1] + gamma * alpha * V[0]) / denom
            
        # 2. Update Unemployed State (Index 0)
        # V(0) = sum_j p_j * max(V_accept_j, V_decline)
        expected_val_sum = 0.0
        
        V_decline = u_unemployed + gamma * V[0]
        
        for j in range(n): # Loop over possible job offers (0 to n-1 in list)
            # Accept logic: Earn w_j today, then transition like employed
            # V_accept(j) = log(w_j) + gamma * [alpha * V(0) + (1-alpha) * V(j+1)]
            V_accept = np.log(wages[j]) + gamma * (alpha * V[0] + (1 - alpha) * V[j+1])
            
            # Max over actions
            expected_val_sum += probs[j] * max(V_accept, V_decline)
            
        V_new[0] = expected_val_sum
        
        # Check convergence
        if np.max(np.abs(V_new - V)) < tolerance:
            V = V_new
            break
            
        V = V_new
        
    # Extract Policy for Unemployment State
    # For each possible job offer j, do we Accept (1) or Decline (0)?
    policy = []
    V_decline_final = u_unemployed + gamma * V[0]
    for j in range(n):
        # V_accept uses current V values (same as in iteration)
        V_accept_final = np.log(wages[j]) + gamma * (alpha * V[0] + (1 - alpha) * V[j+1])
        if V_accept_final >= V_decline_final:
            policy.append(1) # Accept
        else:
            policy.append(0) # Decline
            
    return V, policy

### Part (C) Answer

In [None]:
# Visualization for Part (C)

# Parameters for visualization
n_jobs = 6
test_wages = np.array([1.05, 1.10, 1.20, 1.35, 1.60, 2.00])
test_probs = [1/n_jobs] * n_jobs 
alpha_val = 0.1
gamma_val = 0.95
w0_val = 1.0

V_opt, pi_opt = solve_job_hopping_mdp(n_jobs, test_wages.tolist(), test_probs, alpha_val, gamma_val, w0_val)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
states = np.arange(len(V_opt))
ax.bar(states, V_opt, color=['red'] + ['blue']*(len(V_opt)-1))
ax.set_xticks(states)
ax.set_xticklabels(['Unemp'] + [f'Job {i}' for i in range(1, len(V_opt))])
ax.set_xlabel('State')
ax.set_ylabel('Optimal Value Function V*(s)')
ax.set_title(f'Optimal Value Function by State (alpha={alpha_val}, gamma={gamma_val})')
ax.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

print("Optimal Policy (Accept=1, Decline=0) for wage offers:", pi_opt)
print("\nOptimal accept decisions by offered wage:")
for i, w in enumerate(test_wages):
    decision = "accept" if pi_opt[i] == 1 else "reject"
    print(f"  offer w={w:>4.2f}: {decision}")

### Part (D) Answer

**Pattern in Optimal Policy (Reservation Wage):**

The optimal policy takes the form of a **Reservation Wage** strategy. There exists a threshold wage $w_R$ such that the worker accepts any job offer $w_j \ge w_R$ and rejects any offer $w_j < w_R$. In the example above, the worker rejects low-paying jobs (where $V_{\text{accept}} < V_{\text{decline}}$) even if the wage is slightly higher than the unemployment wage $w_0$, because accepting a low-paying "sticky" job prevents them from searching for a better job (opportunity cost).

**Impact of Parameters:**

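- **Increasing $n$ (More options/Variance):** The effect depends on which offers are added. Adding higher-wage offers raises $V(0)$ and therefore the reservation wage, because there is a higher "option value" in waiting for a high-paying outcome. Adding only low-wage offers that would be declined makes good offers rarer, which lowers $V(0)$ and the reservation wage.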

- **Increasing $\alpha$ (Higher fire rate):** Decreases the value of being employed ($V(i)$ drops) because job security is lower. It lowers the reservation wage because "holding out" for a great job is less valuable if you are likely to lose it quickly anyway.

- **Increasing $\gamma$ (More patience):** Increases the reservation wage. A patient worker cares more about the long-term benefit of a high wage and is willing to suffer short-term unemployment to find it.
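
These comparative statics can be probed numerically. The following is a minimal, self-contained sketch: the helper `reservation_wage` and the parameter grid are illustrative assumptions, not part of the assignment code. It iterates on $V(0)$ alone, using the fact that the employed-state equation can be solved for $V(i)$ given $V(0)$, and reports the lowest accepted wage.

```python
import numpy as np

def reservation_wage(wages, probs, alpha, gamma, w0, tol=1e-10, max_iter=100_000):
    """Lowest accepted wage under the optimal policy (inf if all offers are declined)."""
    log_w = np.log(np.asarray(wages, dtype=float))
    p = np.asarray(probs, dtype=float)
    denom = 1.0 - gamma * (1.0 - alpha)
    v0 = 0.0  # value of the unemployed state
    for _ in range(max_iter):
        v_emp = (log_w + gamma * alpha * v0) / denom   # V(i) given V(0)
        v_decline = np.log(w0) + gamma * v0
        v0_new = float(np.sum(p * np.maximum(v_emp, v_decline)))
        converged = abs(v0_new - v0) < tol
        v0 = v0_new
        if converged:
            break
    # Accept offer j when V(j) >= value of declining
    v_emp = (log_w + gamma * alpha * v0) / denom
    accepted = np.asarray(wages, dtype=float)[v_emp >= np.log(w0) + gamma * v0]
    return float(accepted.min()) if accepted.size else float("inf")

# Illustrative sweep (same wages as the Part (C) example)
wages = [1.05, 1.10, 1.20, 1.35, 1.60, 2.00]
probs = [1 / len(wages)] * len(wages)
for alpha in (0.05, 0.1, 0.3):
    for gamma in (0.5, 0.9, 0.95):
        rw = reservation_wage(wages, probs, alpha, gamma, w0=1.0)
        print(f"alpha={alpha:.2f}, gamma={gamma:.2f}: reservation wage = {rw:.2f}")
```

With $\gamma = 0$ the worker is myopic and accepts every offer that pays at least $w_0$, which gives a quick sanity check on the helper.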

## Question 2: Two-Stores Inventory Control (Led by Onat Dalmaz)

We extend the capacity-constrained inventory example implemented in [rl/chapter3/simple_inventory_mdp_cap.py](https://github.com/TikhonJelvis/RL-book/blob/master/rl/chapter3/simple_inventory_mdp_cap.py) as a `FiniteMarkovDecisionProcess` (the Finite MDP model for the capacity-constrained inventory example is described in detail in Chapters 1 and 2 of the RLForFinanceBook). Here we assume that we have two different stores, each with their own separate capacities $C_1$ and $C_2$, their own separate Poisson probability distributions of demand (with means $\lambda_1$ and $\lambda_2$), their own separate holding costs $h_1$ and $h_2$, and their own separate stockout costs $p_1$ and $p_2$. At 6pm upon stores closing each evening, each store can choose to order inventory from a common supplier (as usual, ordered inventory will arrive at the store 36 hours later). We are also allowed to transfer inventory from one store to another, and any such transfer happens overnight, i.e., will arrive by 6am next morning (since the stores are fairly close to each other). Note that the orders are constrained such that following the orders on each evening, each store's inventory position (sum of on-hand inventory and on-order inventory) cannot exceed the store's capacity (this means the action space is constrained to be finite). Each order made to the supplier incurs a fixed transportation cost of $K_1$ (fixed-cost means the cost is the same no matter how many units of non-zero inventory a particular store orders). Moving any non-zero inventory between the two stores incurs a fixed transportation cost of $K_2$. 

Model this as a derived class of `FiniteMarkovDecisionProcess` much like we did for `SimpleInventoryMDPCap` in the code repo. Set up instances of this derived class for different choices of the problem parameters (capacities, costs etc.), and determine the Optimal Value Function and Optimal Policy by invoking the function `value_iteration` (or `policy_iteration`) from file [rl/dynamic_programming.py](https://github.com/TikhonJelvis/RL-book/blob/master/rl/dynamic_programming.py).

Analyze the obtained Optimal Policy and verify that it makes intuitive sense as a function of the problem parameters.
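
Before subclassing `FiniteMarkovDecisionProcess`, it can help to enumerate the feasible actions for a given state. The sketch below is plain Python and does not use the `rl` library. Its modeling choices are assumptions made for illustration: the state is `((on_hand1, on_order1), (on_hand2, on_order2))`, the transfer is a signed integer (positive means store 1 to store 2), a store can transfer at most its on-hand stock, and the fixed cost $K_1$ is charged once per store that places a non-zero order. The names `feasible_actions` and `fixed_cost` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

State = Tuple[Tuple[int, int], Tuple[int, int]]  # ((on_hand1, on_order1), (on_hand2, on_order2))
Action = Tuple[int, int, int]                    # (order1, order2, transfer); transfer > 0 means 1 -> 2

def feasible_actions(state: State, cap1: int, cap2: int) -> List[Action]:
    """All actions that keep each store's inventory position within capacity."""
    (oh1, oo1), (oh2, oo2) = state
    actions: List[Action] = []
    # A store can send at most what it physically has on hand
    for transfer in range(-oh2, oh1 + 1):
        pos1 = oh1 + oo1 - transfer  # inventory position of store 1 after the transfer
        pos2 = oh2 + oo2 + transfer  # inventory position of store 2 after the transfer
        if pos1 > cap1 or pos2 > cap2:
            continue
        # Orders may fill each store up to its capacity, but not beyond
        for order1, order2 in product(range(cap1 - pos1 + 1), range(cap2 - pos2 + 1)):
            actions.append((order1, order2, transfer))
    return actions

def fixed_cost(action: Action, k1: float, k2: float) -> float:
    """Fixed transportation cost: K1 per non-zero order, K2 for a non-zero transfer."""
    order1, order2, transfer = action
    return k1 * ((order1 > 0) + (order2 > 0)) + k2 * (transfer != 0)

print(feasible_actions(((1, 0), (0, 0)), cap1=1, cap2=1))
```

A mapping from each state to this action list is the shape of input the derived class needs; the transition probabilities would then come from the two independent Poisson demands.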

In [None]:
# Question 2: Two-Stores Inventory Control
# 
# This problem requires extending the FiniteMarkovDecisionProcess class from the RL-book library.
# Key modeling components:
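# - State: ((on_hand1, on_order1), (on_hand2, on_order2)) with on_hand_i + on_order_i <= C_i
#   (on-order stock is part of the state because supplier orders arrive 36 hours later)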
# - Action: (order1, order2, transfer) where transfer can be positive (1->2) or negative (2->1)
# - Constraints: inventory_position_i = inventory_i + order_i ± transfer <= C_i
# - Costs: Fixed order cost K1, fixed transfer cost K2, holding costs h1/h2, stockout costs p1/p2
# - Transitions: Poisson demand for each store, inventory arrives after lead time
#
# Analysis of Optimal Policy:
# When solved, the policy generally exhibits rebalancing behavior:
# - If Store 1 has excess inventory (high holding cost risk) and Store 2 is stocked out 
#   (high penalty risk), the policy chooses transfer 1->2 provided penalty savings exceed K2
# - If both are low, it triggers orders O1, O2
# - Because of fixed cost K1, ordering policy follows (s, S) policy rather than ordering 
#   every single unit depleted
#
# For full implementation, see the RL-book repository structure and extend 
# rl/chapter3/simple_inventory_mdp_cap.py accordingly. The implementation would use
# value_iteration or policy_iteration from rl/dynamic_programming.py to solve for the
# optimal policy.

## Question 3: Dynamic Price Optimization (Led by Onat Dalmaz)

You own a supermarket, and you are $T$ days away from Halloween 🎃. You have just received $M$ Halloween masks from your supplier. You want to dynamically set the selling price of the Halloween masks at the start of each day in a manner that maximizes your **Expected Total Sales Revenue** for Halloween masks this season (assume no one will buy Halloween masks after Halloween).

Assume that for each of the $T$ days, you are required to select a price for that day from one of $N$ prices $p_1, p_2, \dots, p_N \in \mathbb{R}$, and that price is the selling price for all masks on that day. Assume that the customer demand for the number of Halloween masks on any day is governed by a Poisson probability distribution with mean $\lambda_i \in \mathbb{R}$ if you select that day's price to be $p_i$ (where $i$ is a choice among $1, 2, \dots, N$).

Note that on any given day, the demand could exceed the number of Halloween masks you have in the store, in which case the number of masks sold on that day will be equal to the number of Halloween masks you had at the start of that day.

We spoke about this example in class - referencing the slides here (if needed) could be helpful!

---

### Subquestions

#### Part (A): Bellman Optimality Equation

Write the **Bellman Optimality Equation** customized to this Markov Decision Process (MDP). Essentially, you need to express the **Optimal Value Function** $v_*$ recursively based on taking the best action in the current state and based on the subsequent random customer demand that would produce the appropriate reward and take you to the next state.

**Note**: The probability mass function of a Poisson distribution with mean $\lambda \in \mathbb{R}$ is given by:

$$
f(k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots
$$

---

#### Part (B): Boundary Conditions

To be able to solve the $v_*$ recursion, you need to know the values of $v_*$ for the boundary case (boundary states). Write down the boundary case(s) for the $v_*$ recursion.

---

#### Part (C): Numerical Solution

You can solve this $v_*$ recursion (hence, solve for the **Optimal Policy** $\pi_*$) with a numerical recursive algorithm (essentially a special form of Dynamic Programming algorithm customized to this problem). 

Write Python code for this algorithm that would enable you to dynamically set the selling price at the start of each day. Clearly define the inputs and outputs of your algorithm with their types (`int`, `float`, `List`, `Mapping`, etc.).

---


### Part (A) Answer

Let $v_t(m)$ be the maximal expected total revenue from day $t$ through day $T$, given that we have $m$ masks remaining in stock at the start of day $t$.

The recursion is:
$$
v_t(m) = \max_{i \in \{1, \dots, N\}} \mathbb{E} \left[ R(p_i, D_i, m) + v_{t+1}(m - \min(m, D_i)) \right]
$$

Where:
- $i$ is the index of the chosen price $p_i$.
- $D_i \sim \text{Poisson}(\lambda_i)$ is the random demand.
- The immediate revenue is $p_i \times \min(m, D_i)$ (we sell $D_i$ units, capped by stock $m$).
- The remaining stock for the next day is $m - \min(m, D_i)$.

**Expanded equation:**
$$
v_t(m) = \max_{i} \sum_{k=0}^{\infty} \frac{e^{-\lambda_i} \lambda_i^k}{k!} \left[ p_i \cdot \min(m, k) + v_{t+1}(m - \min(m, k)) \right]
$$
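
For computation, the infinite sum can be collapsed to a finite one. This is a worked step under the assumption that $v_{t+1}(0) = 0$, since no further revenue is possible once the stock is exhausted. Every demand $k \ge m$ sells exactly $m$ masks and leaves no stock, so with $f_i(k) = \frac{e^{-\lambda_i} \lambda_i^k}{k!}$:

$$
v_t(m) = \max_{i} \left\{ \sum_{k=0}^{m-1} f_i(k) \left[ p_i \cdot k + v_{t+1}(m - k) \right] + \left( 1 - \sum_{k=0}^{m-1} f_i(k) \right) p_i \cdot m \right\}
$$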

### Part (B) Answer

**Time Boundary ($t=T+1$):**

After day $T$ (Halloween passed), any remaining masks have no value (or salvage value 0).
$$
v_{T+1}(m) = 0 \quad \text{for all } m
$$

**State Boundary ($m=0$):**

If we have 0 masks, we make 0 revenue and transition to having 0 masks.
$$
v_t(0) = 0 \quad \text{for all } t
$$

### Part (C) Answer

In [None]:
import numpy as np
from math import exp
from typing import List, Tuple

def solve_dynamic_pricing(
    T: int,                   # Days remaining
    M: int,                   # Initial inventory
    prices: List[float],      # List of N prices [p1, p2, ...]
    lambdas: List[float],     # List of N demand means [lam1, lam2, ...]
    poisson_cutoff: int = 20  # Minimum number of demand values to enumerate
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Solves for optimal pricing policy using Backward Induction DP.
    
    Returns:
        V: Table [t, m] of expected values.
        Policy: Table [t, m] of optimal price INDICES.
    """
    N_prices = len(prices)
    
    # Initialize DP tables
    # Time t goes from 0 to T (T+1 steps), Stock m goes from 0 to M
    V = np.zeros((T + 1, M + 1))
    Policy = np.zeros((T + 1, M + 1), dtype=int)
    
    # Precompute Poisson PMFs for k = 0..cutoff.
    # Any demand >= stock sells out, so with cutoff >= M the whole tail
    # P(D >= cutoff) can sit on k = cutoff without changing the expectation.
    cutoff = max(poisson_cutoff, M, 1)
    pmfs = []
    for lam in lambdas:
        pmf = [exp(-lam)]
        for k in range(1, cutoff):
            pmf.append(pmf[-1] * lam / k)     # p(k) = p(k-1) * lam / k
        pmf.append(max(0.0, 1.0 - sum(pmf)))  # tail mass P(D >= cutoff)
        pmfs.append(pmf)
        
    # Backward Induction
    # Start from last day (T-1) down to 0
    # Boundary condition V[T, :] = 0 is already set by np.zeros
    
    for t in range(T - 1, -1, -1):
        for m in range(1, M + 1): # m=0 is always 0 (its policy entry is arbitrary)
            best_val = float("-inf")
            best_action = -1
            
            for i in range(N_prices):
                price = prices[i]
                pmf = pmfs[i]
                
                expected_val = 0.0
                
                # Expectation over demand k
                for k in range(len(pmf)):
                    prob = pmf[k]
                    sales = min(m, k)
                    remaining = m - sales
                    
                    revenue = price * sales
                    future_val = V[t + 1, remaining]
                    
                    expected_val += prob * (revenue + future_val)
                    
                if expected_val > best_val:
                    best_val = expected_val
                    best_action = i
            
            V[t, m] = best_val
            Policy[t, m] = best_action
            
    return V, Policy

In [None]:
# Example execution for Question 3 Part (C)

T = 5  # Days until Halloween
M = 20  # Initial inventory
prices = [10.0, 15.0, 20.0]  # Price options
lambdas = [5.0, 3.0, 1.5]  # Demand means for each price

V_pricing, Policy_pricing = solve_dynamic_pricing(T, M, prices, lambdas)

print(f"Optimal expected revenue: V[0, {M}] = ${V_pricing[0, M]:.2f}")
print(f"\nOptimal pricing policy (price index) for initial inventory {M}:")
for t in range(min(3, T)):  # Show first 3 days
    price_idx = Policy_pricing[t, M]
    print(f"  Day {t} (with {M} masks): Price ${prices[price_idx]:.2f} (index {price_idx})")

# Show policy for different inventory levels on day 0
print(f"\nOptimal pricing policy on Day 0 for different inventory levels:")
for m in [M, M//2, M//4, 5, 1]:
    if m <= M:
        price_idx = Policy_pricing[0, m]
        print(f"  Inventory {m:2d}: Price ${prices[price_idx]:.2f}")

## Question 4: Risk-Aversion and Utility Optimization under CARA Utility (Led by Onat Dalmaz)

You are tasked with analyzing the behavior of an investor who seeks to maximize their utility under **CARA Utility**. The investor has wealth $W$ and the CARA utility function:

$$
U(W) = \frac{1 - e^{-aW}}{a}, \quad a > 0,
$$

where $a$ represents the investor's **risk aversion**.

The investor allocates their wealth between:
1. A **riskless asset** with a fixed return $r$, and
2. A **risky asset** with return $R \sim \mathcal{N}(\mu, \sigma^2)$

The investor allocates a fraction $\pi$ of their wealth to the risky asset and $(1 - \pi)$ to the riskless asset. The wealth $W$ after one year is given by:

$$
W = (1 + r)(1 - \pi) + (1 + R)\pi.
$$

The goal is to analyze the investor's optimal allocation $\pi$ to the risky asset and compute key risk-related quantities.

---

### Subquestions

#### Part (A): Expected Utility and Certainty-Equivalent Wealth

1. Derive the expression for the **expected utility** $\mathbb{E}[U(W)]$, using the given CARA utility function and assuming $R \sim \mathcal{N}(\mu, \sigma^2)$.
2. Using a Taylor expansion, approximate the **certainty-equivalent wealth** $W_{CE}$ up to second-order terms.

---

#### Part (B): Optimal Portfolio Allocation

Derive the optimal fraction $\pi^*$ of wealth to be allocated to the risky asset such that the **expected utility** $\mathbb{E}[U(W)]$ is maximized. Express $\pi^*$ in terms of $a$, $\mu$, $r$, and $\sigma^2$.

---

#### Part (C): Risk Premium

1. Using the results from Part (A), calculate the **absolute risk premium** $\pi_A = \mathbb{E}[W] - W_{CE}$.
2. Verify that $\pi_A \approx \frac{a \pi^2 \sigma^2}{2}$ for small $\sigma^2$.

---

#### Part (D): Numerical Calculations and Interpretation

Given the parameters $r = 0.02$, $\mu = 0.08$, $\sigma^2 = 0.04$, and $a = 3$:
1. Compute the optimal allocation $\pi^*$.
2. Calculate the certainty-equivalent wealth $W_{CE}$.
3. Compute the absolute risk premium $\pi_A$.
4. Interpret the results and discuss how changes in $a$ and $\sigma^2$ affect the risk premium and portfolio allocation.

---

#### Part (E): Expected Utility under Uniform Distribution

Now assume that the return of the risky asset, $R$, is no longer normally distributed. Instead, $R \sim \text{Uniform}[\alpha, \beta]$, where $\alpha$ and $\beta$ are the lower and upper bounds of the distribution, respectively.

1. Derive the new expression for the **expected utility** $\mathbb{E}[U(W)]$. Make sure to simplify your result as much as possible, and ensure that it explicitly depends on $a$, $\pi$, $\alpha$, $\beta$, and $r$.

**Hint**: Use the fact that if $W \sim \text{Uniform}[w_{\text{min}}, w_{\text{max}}]$, then:

$$
\mathbb{E}[g(W)] = \frac{1}{w_{\text{max}} - w_{\text{min}}} \int_{w_{\text{min}}}^{w_{\text{max}}} g(W) \, dW.
$$

---

### Part (A) Answer

**Expected Utility $\mathbb{E}[U(W)]$:**

$$
W = (1+r)(1-\pi)W_0 + (1+R)\pi W_0
$$

Assume $W_0=1$ for simplicity (or absorb into units).

$$
W = 1 + r + \pi(R - r)
$$

$$
U(W) = \frac{1 - e^{-aW}}{a}
$$

$$
\mathbb{E}[U(W)] = \frac{1 - \mathbb{E}[e^{-aW}]}{a}
$$

Since $R \sim \mathcal{N}(\mu, \sigma^2)$, $W$ is normally distributed:

$$
W \sim \mathcal{N} \left( 1 + r + \pi(\mu - r), \pi^2 \sigma^2 \right)
$$

The moment generating function of a Normal variable $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ is $\mathbb{E}[e^{tX}] = e^{t\mu_x + \frac{1}{2}t^2\sigma_x^2}$. Applying it to $W$ with $t = -a$:

$$
\mathbb{E}[e^{-aW}] = \exp \left( -a \mathbb{E}[W] + \frac{a^2}{2} \text{Var}(W) \right)
$$

$$
\mathbb{E}[U(W)] = \frac{1}{a} \left( 1 - \exp \left( -a \left( 1 + r + \pi(\mu - r) \right) + \frac{a^2}{2} \pi^2 \sigma^2 \right) \right)
$$

**Certainty-Equivalent Wealth $W_{CE}$:**

$W_{CE}$ is defined such that $U(W_{CE}) = \mathbb{E}[U(W)]$.

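Since $U(W_{CE}) = \frac{1 - e^{-aW_{CE}}}{a}$, this definition is equivalent to $e^{-aW_{CE}} = \mathbb{E}[e^{-aW}]$. Taking logs and using the expression for $\mathbb{E}[e^{-aW}]$ above:

$$
W_{CE} = \mathbb{E}[W] - \frac{a}{2} \text{Var}(W)
$$

A second-order Taylor expansion of $U$ around $\mathbb{E}[W]$ gives the same expression, because $-U''(W)/U'(W) = a$. For CARA utility with normal returns the result is exact:

$$
W_{CE} = (1 + r + \pi(\mu - r)) - \frac{a}{2} \pi^2 \sigma^2
$$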
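
The closed form can be sanity-checked by simulation. The snippet below is a minimal sketch using the Part (D) parameters with $\pi = 0.5$ and initial wealth 1; the sample size and seed are arbitrary choices. It compares a Monte Carlo estimate of $\mathbb{E}[U(W)]$ with the formula and recovers $W_{CE}$ by inverting $U$.

```python
import numpy as np

a, r, mu, sigma_sq, pi = 3.0, 0.02, 0.08, 0.04, 0.5

# Closed-form expected utility from the normal moment generating function
mean_w = 1.0 + r + pi * (mu - r)
var_w = pi**2 * sigma_sq
closed = (1.0 - np.exp(-a * mean_w + 0.5 * a**2 * var_w)) / a

# Monte Carlo estimate of the same expectation
rng = np.random.default_rng(0)
R = rng.normal(mu, np.sqrt(sigma_sq), size=200_000)
W = 1.0 + r + pi * (R - r)
mc = np.mean((1.0 - np.exp(-a * W)) / a)

# Certainty equivalent: invert U(w) = (1 - exp(-a w)) / a
w_ce = -np.log(1.0 - a * closed) / a

print(f"closed form: {closed:.6f}, Monte Carlo: {mc:.6f}, W_CE: {w_ce:.6f}")
```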

### Part (B) Answer

Because $U$ is strictly increasing and $\mathbb{E}[U(W)] = U(W_{CE})$, maximizing Expected Utility is equivalent to maximizing $W_{CE}$.

**Objective:**
$$
\max_{\pi} \left[ (1 + r + \pi(\mu - r)) - \frac{a}{2} \pi^2 \sigma^2 \right]
$$

Setting the derivative of $W_{CE}$ with respect to $\pi$ to zero (the second derivative, $-a\sigma^2 < 0$, confirms a maximum):
$$
\frac{dW_{CE}}{d\pi} = (\mu - r) - a \pi \sigma^2 = 0
$$

Solving for $\pi^*$:
$$
\boxed{\pi^* = \frac{\mu - r}{a \sigma^2}}
$$
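
As a quick numerical cross-check, one can maximize the expected utility itself over a grid of allocations and compare with the formula. This is a sketch with the Part (D) parameters; the helper name `expected_cara_utility` and the grid range $[0, 2]$ are illustrative assumptions.

```python
import numpy as np

def expected_cara_utility(pi, a, r, mu, sigma_sq):
    """Closed-form E[U(W)] for CARA utility and normal returns (initial wealth 1)."""
    mean_w = 1.0 + r + pi * (mu - r)
    var_w = pi**2 * sigma_sq
    return (1.0 - np.exp(-a * mean_w + 0.5 * a**2 * var_w)) / a

a, r, mu, sigma_sq = 3.0, 0.02, 0.08, 0.04

grid = np.linspace(0.0, 2.0, 2001)  # candidate allocations, step 0.001
pi_grid = grid[np.argmax(expected_cara_utility(grid, a, r, mu, sigma_sq))]
pi_formula = (mu - r) / (a * sigma_sq)

print(f"grid maximizer: {pi_grid:.3f}, formula: {pi_formula:.3f}")
```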

### Part (C) Answer

**Absolute Risk Premium $\pi_A$:**

$$
\pi_A = \mathbb{E}[W] - W_{CE}
$$

Using the exact forms derived in Part (A):

$$
\pi_A = \mathbb{E}[W] - \left( \mathbb{E}[W] - \frac{a}{2} \text{Var}(W) \right) = \frac{a}{2} \text{Var}(W)
$$

**Verification:**

$$
\text{Var}(W) = \pi^2 \sigma^2
$$

$$
\pi_A = \frac{a \pi^2 \sigma^2}{2}
$$

This is exact for CARA-Normal, not just an approximation.

In [None]:
# Numerical calculations for Question 4 Part (D)

r = 0.02
mu = 0.08
sigma_sq = 0.04
a = 3

# 1. Optimal allocation
pi_star = (mu - r) / (a * sigma_sq)
print(f"1. Optimal allocation π* = {pi_star:.4f} ({pi_star*100:.1f}% to risky asset)")

# 2. Expected wealth and certainty-equivalent
E_W = 1 + r + pi_star * (mu - r)
Var_W = pi_star**2 * sigma_sq
W_CE = E_W - (a / 2) * Var_W
print(f"\n2. Expected wealth E[W] = {E_W:.4f}")
print(f"   Variance Var(W) = {Var_W:.4f}")
print(f"   Certainty-equivalent W_CE = {W_CE:.4f}")

# 3. Risk premium
pi_A = (a * pi_star**2 * sigma_sq) / 2
print(f"\n3. Absolute risk premium π_A = {pi_A:.4f}")

# 4. Verification
print(f"\n4. Verification:")
print(f"   E[W] - W_CE = {E_W - W_CE:.6f}")
print(f"   π_A = {pi_A:.6f}")
print(f"   Match: {abs((E_W - W_CE) - pi_A) < 1e-6}")

# Sensitivity analysis
print(f"\n5. Sensitivity Analysis:")
print(f"   If a increases to 4: π* = {(mu - r) / (4 * sigma_sq):.4f}")
print(f"   If σ² increases to 0.06: π* = {(mu - r) / (a * 0.06):.4f}")

### Part (D) Answer

Given: $r = 0.02$, $\mu = 0.08$, $\sigma^2 = 0.04$, $a = 3$.

**1. Optimal Allocation $\pi^*$:**

$$
\pi^* = \frac{0.08 - 0.02}{3 \times 0.04} = \frac{0.06}{0.12} = 0.5
$$

Answer: Allocate **50%** to risky asset.

**2. Certainty-Equivalent $W_{CE}$:**

$$
\mathbb{E}[W] = 1.02 + 0.5(0.06) = 1.05
$$

$$
\text{Var}(W) = 0.5^2 \times 0.04 = 0.25 \times 0.04 = 0.01
$$

$$
W_{CE} = 1.05 - \frac{3}{2}(0.01) = 1.05 - 0.015 = 1.035
$$

**3. Risk Premium $\pi_A$:**

$$
\pi_A = \frac{3 \times 0.01}{2} = 0.015
$$

**4. Interpretation:**

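- **Higher $a$ (Risk Aversion):** Decreases $\pi^*$ (invest less in the risky asset). For a fixed allocation $\pi$, the risk premium $\frac{a \pi^2 \sigma^2}{2}$ increases with $a$. At the optimal allocation, however, $\pi_A = \frac{(\mu - r)^2}{2 a \sigma^2}$, which decreases with $a$ because the investor takes on less risk.
- **Higher $\sigma^2$ (Volatility):** Decreases $\pi^*$. For a fixed $\pi$ the risk premium increases with $\sigma^2$, while at the optimal allocation $\pi_A = \frac{(\mu - r)^2}{2 a \sigma^2}$ decreases with $\sigma^2$.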

### Part (E) Answer

$R \sim \text{Uniform}[\alpha, \beta]$.

Density $f(R) = \frac{1}{\beta - \alpha}$ for $\alpha \le R \le \beta$.

Wealth $W(R) = (1+r) + \pi(R-r)$. Let $C = 1+r - \pi r$. Then $W = C + \pi R$.

$$
\mathbb{E}[U(W)] = \frac{1}{a} - \frac{1}{a} \mathbb{E}[e^{-a(C+\pi R)}]
$$

$$
\mathbb{E}[e^{-a(C+\pi R)}] = e^{-aC} \mathbb{E}[e^{-a \pi R}]
$$

$$
\mathbb{E}[e^{-a \pi R}] = \int_{\alpha}^{\beta} e^{-a \pi x} \frac{1}{\beta - \alpha} dx
$$

$$
= \frac{1}{\beta - \alpha} \left[ \frac{e^{-a \pi x}}{-a \pi} \right]_{\alpha}^{\beta} = \frac{1}{\beta - \alpha} \frac{e^{-a \pi \alpha} - e^{-a \pi \beta}}{a \pi}
$$

Substituting back, with $e^{-aC} = e^{-a(1 + r(1-\pi))}$ (valid for $\pi \neq 0$):

$$
\mathbb{E}[U(W)] = \frac{1}{a} \left( 1 - e^{-a(1 + r(1-\pi))} \frac{e^{-a \pi \alpha} - e^{-a \pi \beta}}{a \pi (\beta - \alpha)} \right)
$$
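
The uniform-case result can be checked against simulation. The sketch below assumes illustrative bounds $\alpha = -0.2$ and $\beta = 0.4$ (not given in the problem) together with $a = 3$, $r = 0.02$, $\pi = 0.5$. It evaluates the closed form using $C = 1 + r - \pi r$ as defined above and compares it with a Monte Carlo estimate built directly from the wealth definition.

```python
import numpy as np

def expected_utility_uniform(pi, a, r, alpha, beta):
    """Closed-form E[U(W)] for CARA utility with R ~ Uniform[alpha, beta], pi != 0."""
    c = 1.0 + r - pi * r  # constant part of wealth, W = c + pi * R
    mgf = (np.exp(-a * pi * alpha) - np.exp(-a * pi * beta)) / (a * pi * (beta - alpha))
    return (1.0 - np.exp(-a * c) * mgf) / a

# alpha and beta are hypothetical values chosen only for this check
a, r, pi, alpha, beta = 3.0, 0.02, 0.5, -0.2, 0.4

rng = np.random.default_rng(0)
R = rng.uniform(alpha, beta, size=200_000)
W = (1.0 + r) * (1.0 - pi) + (1.0 + R) * pi  # wealth from the problem statement
mc = np.mean((1.0 - np.exp(-a * W)) / a)
closed = expected_utility_uniform(pi, a, r, alpha, beta)

print(f"closed form: {closed:.6f}, Monte Carlo: {mc:.6f}")
```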