---

### General Formulation of Nested Logit Model for Simulation and Estimation

#### **1. Data Structure**
- **Alternatives:** $ \mathcal{A} = \{\text{car}, \text{bike}, \text{train}, \text{metro}\} $, with $ J = 4 $.
- **Attributes:**
  - $ \text{time}_j $: Travel time for alternative $ j $.
  - $ \text{cost}_j $: Travel cost for alternative $ j $.
- **Nests:**
  - **Private Transport ($ \mathcal{N}_1 $):** $ \{\text{car}, \text{bike}\} $
  - **Public Transport ($ \mathcal{N}_2 $):** $ \{\text{train}, \text{metro}\} $

#### **2. Utility Function**
The utility for each individual $ n $ and alternative $ j $ is:
$$
U_{nj} = V_{nj} + \epsilon_{nj},
$$
where:
- **Systematic Utility:**
  $$
  V_{nj} = \text{ASC}_j + \beta_{\text{time}} \cdot \text{time}_{nj} + \beta_{\text{cost}} \cdot \text{cost}_{nj},
  $$
  - $ \text{ASC}_j $: Alternative-specific constant for $ j $.
  - $ \beta_{\text{time}}, \beta_{\text{cost}} $: Coefficients for travel time and cost.

#### **3. Nest-Level Correlation**
- **Scale Parameters ($ \gamma_k $):**
  - $ \gamma_{\text{private}} = 1.0 $: Fixed for private transport.
  - $ \gamma_{\text{public}} \in (0, 1] $: Estimated for public transport.

---

### **4. Choice Probability**

The probability of individual $ n $ choosing alternative $ j $ in nest $ k $ is:
$$
\mathbb{P}_{nj} = \mathbb{P}_{n}(k) \cdot \mathbb{P}_{nj|k},
$$
where:

#### **4.1 Within-Nest Probability**
$$
\mathbb{P}_{nj|k} = \frac{\exp\left(\frac{V_{nj}}{\gamma_k}\right)}{\sum_{i \in \mathcal{N}_k} \exp\left(\frac{V_{ni}}{\gamma_k}\right)}.
$$

#### **4.2 Probability of Choosing a Nest**
$$
\mathbb{P}_{n}(k) = \frac{\left( \sum_{i \in \mathcal{N}_k} \exp\left(\frac{V_{ni}}{\gamma_k}\right) \right)^{\gamma_k}}{\sum_{m=1}^{2} \left( \sum_{i \in \mathcal{N}_m} \exp\left(\frac{V_{ni}}{\gamma_m}\right) \right)^{\gamma_m}}.
$$

#### **4.3 Combined Probability**
$$
\mathbb{P}_{nj} = \frac{\exp\left(\frac{V_{nj}}{\gamma_k}\right)}{\sum_{i \in \mathcal{N}_k} \exp\left(\frac{V_{ni}}{\gamma_k}\right)} \cdot \frac{\left( \sum_{i \in \mathcal{N}_k} \exp\left(\frac{V_{ni}}{\gamma_k}\right) \right)^{\gamma_k}}{\sum_{m=1}^{2} \left( \sum_{i \in \mathcal{N}_m} \exp\left(\frac{V_{ni}}{\gamma_m}\right) \right)^{\gamma_m}}.
$$

---

### **5. Log-Likelihood Function**

#### **5.1 Observed Data**
For each individual $ n $, let:
- $ y_{nj} $: Binary indicator if alternative $ j $ is chosen.

#### **5.2 Log-Likelihood**
$$
\mathcal{L} = \sum_{n=1}^{N} \sum_{j=1}^{J} y_{nj} \cdot \log(\mathbb{P}_{nj}),
$$
where $ \mathbb{P}_{nj} $ is the combined probability for $ j $.

---

### **6. Key Parameters**
- **ASC coefficients ($ \text{ASC}_j $):** Alternative-specific constants for non-baseline alternatives.
- **$ \beta $:** Coefficients for attributes (e.g., travel time and cost).
- **$ \gamma_k $:** Scale parameters for nests (e.g., public vs private transport).

---

### **7. Special Cases**
1. **Multinomial Logit (MNL):**
   - If $ \gamma_k = 1 $ for all $ k $, the Nested Logit reduces to the MNL model, with no within-nest correlation.
2. **Perfect Correlation:**
   - If $ \gamma_k \to 0 $, alternatives within the nest are perfectly correlated and act as a single choice.
```

In [1]:
import numpy as np
import pandas as pd
import jax
import jax.numpy as jnp
from jax import grad, hessian
from scipy.optimize import minimize
from scipy import stats

# True parameters:
true_params = {
    'asc_bike': 0.3,
    'asc_train': 1.0,
    'asc_metro': 1.2,
    'beta_time': -0.05,
    'beta_cost': -0.08,
    'gamma_public': 0.7,  # gamma_private fixed at 1
}

def generate_nested_logit_data(
    n_obs=10000,
    asc_bike=0.3,
    asc_train=1.0,
    asc_metro=1.2,
    beta_time=-0.05,
    beta_cost=-0.08,
    gamma_public=0.7,
    gamma_private=1.0,
    seed=42
):
    np.random.seed(seed)
    alts = np.array(['car', 'bike', 'train', 'metro'])
    n_alt = 4

    time = np.random.uniform(5, 100, size=(n_obs, n_alt))
    cost = np.random.uniform(1, 20, size=(n_obs, n_alt))

    V_car = 0.0 + beta_time*time[:,0] + beta_cost*cost[:,0]
    V_bike = asc_bike + beta_time*time[:,1] + beta_cost*cost[:,1]
    V_train = asc_train + beta_time*time[:,2] + beta_cost*cost[:,2]
    V_metro = asc_metro + beta_time*time[:,3] + beta_cost*cost[:,3]
    V = np.column_stack([V_car, V_bike, V_train, V_metro])

    def nest_exp(V_slice, gamma):
        maxV = np.max(V_slice, axis=1, keepdims=True)
        return np.exp((V_slice - maxV)/gamma)

    public_idx = [2,3]   # train, metro
    private_idx = [0,1]  # car, bike

    V_public = V[:, public_idx]
    exp_public = nest_exp(V_public, gamma_public)
    sum_exp_public = np.sum(exp_public, axis=1, keepdims=True)

    V_private = V[:, private_idx]
    exp_private = nest_exp(V_private, gamma_private)
    sum_exp_private = np.sum(exp_private, axis=1, keepdims=True)

    upper_public = (sum_exp_public**gamma_public)
    upper_private = (sum_exp_private**gamma_private)
    denom_upper = upper_public + upper_private

    P_public = upper_public / denom_upper
    P_private = upper_private / denom_upper

    P_train = (exp_public[:,0:1]/sum_exp_public)*P_public
    P_metro = (exp_public[:,1:2]/sum_exp_public)*P_public
    P_car = (exp_private[:,0:1]/sum_exp_private)*P_private
    P_bike = (exp_private[:,1:2]/sum_exp_private)*P_private

    P_final = np.hstack([P_car, P_bike, P_train, P_metro])

    choices = []
    for i in range(n_obs):
        c = np.random.choice(n_alt, p=P_final[i])
        choices.append(c)
    choices = np.array(choices)

    df_list = []
    for i in range(n_obs):
        for j, alt in enumerate(alts):
            df_list.append({
                'id': i,
                'alt': alt,
                'time': time[i,j],
                'cost': cost[i,j],
                'choice': 1 if choices[i] == j else 0
            })
    df = pd.DataFrame(df_list)

    choice_shares = df.groupby('alt')['choice'].mean()
    print("Choice Shares:\n", choice_shares)

    return df

df = generate_nested_logit_data(
    n_obs=20000,
    asc_bike=true_params['asc_bike'],
    asc_train=true_params['asc_train'],
    asc_metro=true_params['asc_metro'],
    beta_time=true_params['beta_time'],
    beta_cost=true_params['beta_cost'],
    gamma_public=true_params['gamma_public'],
    gamma_private=1.0
)


class NestedLogitModel:
    def __init__(
        self,
        df: pd.DataFrame,
        id_col: str,
        choice_col: str,
        alt_col: str,
        time_col: str,
        cost_col: str
    ):
        # Alternatives: car(0), bike(1), train(2), metro(3)
        # Baseline: car => asc_car=0
        # Parameters to estimate:
        # asc_bike, asc_train, asc_metro, beta_time, beta_cost, gamma_public
        # gamma_private = 1.0 fixed
        self.alternatives = ['car', 'bike', 'train', 'metro']
        self.n_alt = 4
        self.n_features = 6

        df = df.copy()
        df['alt'] = pd.Categorical(df[alt_col], categories=self.alternatives, ordered=True)
        df = df.sort_values([id_col, alt_col])
        self.n_obs = df[id_col].nunique()

        y = df[choice_col].values.reshape(self.n_obs, self.n_alt)
        time_wide = df[time_col].values.reshape(self.n_obs, self.n_alt)
        cost_wide = df[cost_col].values.reshape(self.n_obs, self.n_alt)

        self.y = jnp.array(y)
        self.time = jnp.array(time_wide)
        self.cost = jnp.array(cost_wide)
        self.gamma_private = 1.0

        self.public_idx = jnp.array([2,3])
        self.private_idx = jnp.array([0,1])

    def log_likelihood(self, beta: jnp.ndarray) -> float:
        asc_bike = beta[0]
        asc_train = beta[1]
        asc_metro = beta[2]
        beta_time = beta[3]
        beta_cost = beta[4]
        gamma_public = beta[5]

        gamma_private = self.gamma_private

        V_car = 0.0 + beta_time*self.time[:,0] + beta_cost*self.cost[:,0]
        V_bike = asc_bike + beta_time*self.time[:,1] + beta_cost*self.cost[:,1]
        V_train = asc_train + beta_time*self.time[:,2] + beta_cost*self.cost[:,2]
        V_metro = asc_metro + beta_time*self.time[:,3] + beta_cost*self.cost[:,3]

        V = jnp.stack([V_car, V_bike, V_train, V_metro], axis=1)

        def nest_exp(V_sub, gamma):
            maxV = jnp.max(V_sub, axis=1, keepdims=True)
            return jnp.exp((V_sub - maxV)/gamma)

        V_public = V[:, self.public_idx]
        exp_public = nest_exp(V_public, gamma_public)
        sum_exp_public = jnp.sum(exp_public, axis=1, keepdims=True)

        V_private = V[:, self.private_idx]
        exp_private = nest_exp(V_private, gamma_private)
        sum_exp_private = jnp.sum(exp_private, axis=1, keepdims=True)

        upper_public = sum_exp_public**gamma_public
        upper_private = sum_exp_private**gamma_private
        denom_upper = upper_public + upper_private

        P_public = upper_public / denom_upper
        P_private = upper_private / denom_upper

        # public = train(2), metro(3)
        P_train = (exp_public[:,0:1]/sum_exp_public)*P_public
        P_metro = (exp_public[:,1:2]/sum_exp_public)*P_public

        # private = car(0), bike(1)
        P_car = (exp_private[:,0:1]/sum_exp_private)*P_private
        P_bike = (exp_private[:,1:2]/sum_exp_private)*P_private

        P_final = jnp.hstack([P_car, P_bike, P_train, P_metro])

        eps = 1e-15
        log_probs = jnp.log(P_final + eps)
        ll = jnp.sum(self.y * log_probs)
        return -ll

    def fit_with_inference(self):
        # Starting values close to true parameters:
        # asc_bike=0.3, asc_train=1.0, asc_metro=1.2, beta_time=-0.05, beta_cost=-0.08, gamma_public=0.7
        initial_guess = np.array([0.3, 1.0, 1.2, -0.05, -0.08, 0.7])

        grad_func = grad(self.log_likelihood)
        hess_func = hessian(self.log_likelihood)

        def objective(b):
            return float(self.log_likelihood(b))

        # Bounds:
        # For gamma_public: (1e-3, 1.0)
        # For other parameters, no bounds needed, assume no sign constraints
        bounds = [(None, None),  # asc_bike
                  (None, None),  # asc_train
                  (None, None),  # asc_metro
                  (None, None),  # beta_time
                  (None, None),  # beta_cost
                  (1e-3, 1.0)]   # gamma_public in (0,1]

        result = minimize(
            fun=objective,
            x0=initial_guess,
            method='L-BFGS-B',
            jac=lambda b: np.array(grad_func(jnp.array(b))),
            bounds=bounds,
            options={'gtol':1e-10, 'maxiter':2000}
        )

        if not result.success:
            print("Optimization failed:", result.message)
            return None

        beta_hat = result.x
        hessian_matrix = hess_func(jnp.array(beta_hat))
        try:
            vcov = np.linalg.inv(np.array(hessian_matrix))
        except np.linalg.LinAlgError:
            print("Hessian is singular. Cannot invert.")
            print("Hessian matrix:\n", hessian_matrix)
            return None

        std_errors = np.sqrt(np.diag(vcov))
        t_stats = beta_hat / std_errors
        df = self.n_obs - len(beta_hat)
        p_values = 2 * (1 - stats.t.cdf(np.abs(t_stats), df=df))
        ci_lower = beta_hat - stats.t.ppf(0.975, df=df)*std_errors
        ci_upper = beta_hat + stats.t.ppf(0.975, df=df)*std_errors

        param_names = ['asc_bike','asc_train','asc_metro','beta_time','beta_cost','gamma_public']
        results_df = pd.DataFrame({
            'Coefficient': param_names,
            'Estimate': beta_hat,
            'Std Error': std_errors,
            't Statistic': t_stats,
            'p Value': p_values,
            '95% CI Lower': ci_lower,
            '95% CI Upper': ci_upper
        })

        print("Estimated gamma_public:", beta_hat[5])

        return {
            'results': results_df,
            'log_likelihood': -result.fun,
            'converged': result.success,
            'beta_hat': beta_hat,
            'std_errors': std_errors
        }

model = NestedLogitModel(
    df=df,
    id_col='id',
    choice_col='choice',
    alt_col='alt',
    time_col='time',
    cost_col='cost'
)
results = model.fit_with_inference()

if results is not None:
    print("Estimated Results:")
    print(results['results'])

    # Compare to true parameters
    true_values = [true_params['asc_bike'],
                   true_params['asc_train'],
                   true_params['asc_metro'],
                   true_params['beta_time'],
                   true_params['beta_cost'],
                   true_params['gamma_public']]

    comp_df = results['results'].copy()
    comp_df['True Value'] = true_values

    print("\nComparison of Estimated vs True Parameters:")
    display(comp_df)
else:
    print("Estimation did not produce results.")


Choice Shares:
 alt
bike     0.28885
car      0.24235
metro    0.24900
train    0.21980
Name: choice, dtype: float64
Estimated gamma_public: 0.6782231460971575
Estimated Results:
    Coefficient  Estimate  Std Error  t Statistic  p Value  95% CI Lower  \
0      asc_bike  0.325111   0.023057    14.100554      0.0      0.279919   
1     asc_train  1.006498        NaN          NaN      NaN           NaN   
2     asc_metro  1.193502        NaN          NaN      NaN           NaN   
3     beta_time -0.049651   0.000903   -55.010261      0.0     -0.051420   
4     beta_cost -0.075177   0.002262   -33.235222      0.0     -0.079610   
5  gamma_public  0.678223   0.016482    41.149111      0.0      0.645917   

   95% CI Upper  
0      0.370304  
1           NaN  
2           NaN  
3     -0.047882  
4     -0.070743  
5      0.710529  

Comparison of Estimated vs True Parameters:


  std_errors = np.sqrt(np.diag(vcov))


Unnamed: 0,Coefficient,Estimate,Std Error,t Statistic,p Value,95% CI Lower,95% CI Upper,True Value
0,asc_bike,0.325111,0.023057,14.100554,0.0,0.279919,0.370304,0.3
1,asc_train,1.006498,,,,,,1.0
2,asc_metro,1.193502,,,,,,1.2
3,beta_time,-0.049651,0.000903,-55.010261,0.0,-0.05142,-0.047882,-0.05
4,beta_cost,-0.075177,0.002262,-33.235222,0.0,-0.07961,-0.070743,-0.08
5,gamma_public,0.678223,0.016482,41.149111,0.0,0.645917,0.710529,0.7
