# Coefficient Initialization

> Tianyu Du
> 
> Available since version `1.0.4`

[From ChatGPT] Coefficient initialization matters for model estimation, especially in machine learning and deep learning. The initial coefficients affect how quickly training converges and whether it succeeds at all. A poor choice can slow convergence or leave the optimizer stuck in a suboptimal local minimum, particularly when the loss landscape is non-convex, as in neural networks. It can also worsen vanishing or exploding gradients, which hampers backpropagation. Conversely, a well-chosen scheme such as Xavier or He initialization can lead to faster convergence, better generalization, and more robust models.

In [None]:
import torch
import torch_choice
import matplotlib.pyplot as plt

# Conditional Logit Models

## By default, coefficients are initialized following a standard Gaussian distribution.

Here we create a "big" model with thousands of parameters so that the histogram of each set of initial coefficients clearly shows the distribution it was drawn from.

In [None]:
model = torch_choice.model.ConditionalLogitModel(
    coef_variation_dict={'var_1': 'constant', 'var_2': 'item', 'var_3': 'item-full', 'var_4': 'user'},
    num_param_dict={'var_1': 300, 'var_2': 500, 'var_3': 700, 'var_4': 900},
    num_items=4,
    num_users=10)

In [None]:
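def plot_model_initial_coefficients(model_to_plot: torch.nn.Module) -> None:
    """Plot one histogram per coefficient tensor in the model's state dict."""
    # Four panels: one for each of var_1 through var_4.
    fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(20, 4), dpi=150)

    for i, (coef_name, coef_value) in enumerate(model_to_plot.state_dict().items()):
        # Flatten each coefficient tensor so all of its K entries share one histogram.
        arr = coef_value.view(-1,).to("cpu").numpy()
        axes[i].hist(arr, bins=40)
        axes[i].set_title(f"{coef_name} (K={len(arr)})")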

In [None]:
plot_model_initial_coefficients(model)

## Alternatively, you can draw coefficients from a uniform distribution or set them to zeros using the `weight_initialization` argument.

In [None]:
model = torch_choice.model.ConditionalLogitModel(
    coef_variation_dict={'var_1': 'constant', 'var_2': 'item', 'var_3': 'item-full', 'var_4': 'user'},
    num_param_dict={'var_1': 300, 'var_2': 500, 'var_3': 700, 'var_4': 900},
    num_items=4,
    num_users=10,
    weight_initialization="uniform")

plot_model_initial_coefficients(model)

In [None]:
model = torch_choice.model.ConditionalLogitModel(
    coef_variation_dict={'var_1': 'constant', 'var_2': 'item', 'var_3': 'item-full', 'var_4': 'user'},
    num_param_dict={'var_1': 300, 'var_2': 500, 'var_3': 700, 'var_4': 900},
    num_items=4,
    num_users=10,
    weight_initialization="zero")

plot_model_initial_coefficients(model)

In [None]:
model = torch_choice.model.ConditionalLogitModel(
    coef_variation_dict={'var_1': 'constant', 'var_2': 'item', 'var_3': 'item-full', 'var_4': 'user'},
    num_param_dict={'var_1': 300, 'var_2': 500, 'var_3': 700, 'var_4': 900},
    num_items=4,
    num_users=10,
    weight_initialization="normal")

plot_model_initial_coefficients(model)
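
To put numbers next to the histograms above, the following sketch draws samples with PyTorch's own `torch.nn.init` functions and prints their summary statistics. It is independent of `torch_choice` and is not its internal implementation. In particular, the uniform range of [0, 1] used here is only the default of `torch.nn.init.uniform_`, which is an assumption. The range that `weight_initialization="uniform"` actually uses may differ.

```python
import torch

torch.manual_seed(0)  # fixed seed so the printed statistics are reproducible
K = 10_000            # number of coefficients to draw for each scheme

samples = {
    # Standard Gaussian: mean 0, standard deviation 1.
    "normal": torch.nn.init.normal_(torch.empty(K), mean=0.0, std=1.0),
    # Uniform on [0, 1]; this range is an assumption, see the note above.
    "uniform": torch.nn.init.uniform_(torch.empty(K), a=0.0, b=1.0),
    # Every coefficient starts at exactly zero.
    "zero": torch.nn.init.zeros_(torch.empty(K)),
}

for name, values in samples.items():
    print(
        f"{name:>8}: mean={values.mean().item():.3f}, "
        f"std={values.std().item():.3f}, "
        f"min={values.min().item():.3f}, max={values.max().item():.3f}"
    )
```

With zero initialization every coefficient starts from the same value, so the histogram collapses to a single bar.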

## You can initialize different sets of coefficients differently by passing a dictionary to `weight_initialization`. Coefficients not listed in the dictionary are drawn from a standard normal distribution (the default).

In [None]:
model = torch_choice.model.ConditionalLogitModel(
    coef_variation_dict={'var_1': 'constant', 'var_2': 'item', 'var_3': 'item-full', 'var_4': 'user'},
    num_param_dict={'var_1': 300, 'var_2': 500, 'var_3': 700, 'var_4': 900},
    num_items=4,
    num_users=10,
    weight_initialization={'var_1': 'uniform',
                           'var_2': 'normal',
                           'var_3': 'zero'})  # <-- 'var_4' is omitted, so it falls back to the standard Gaussian default.

plot_model_initial_coefficients(model)
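
The fallback rule described above can be sketched in plain Python. `resolve_initialization` and `DEFAULT_INIT` are hypothetical names introduced for illustration. They are not part of `torch_choice`, and the library may implement this differently.

```python
from typing import Dict, List, Optional, Union

DEFAULT_INIT = "normal"  # the default method named in this tutorial


def resolve_initialization(
    coef_names: List[str],
    weight_initialization: Optional[Union[str, Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: map every coefficient name to an initialization method."""
    if weight_initialization is None:
        # Nothing specified: every coefficient uses the default.
        return {name: DEFAULT_INIT for name in coef_names}
    if isinstance(weight_initialization, str):
        # A single string applies to all coefficients.
        return {name: weight_initialization for name in coef_names}
    # A dictionary: names missing from it fall back to the default.
    return {name: weight_initialization.get(name, DEFAULT_INIT) for name in coef_names}


resolved = resolve_initialization(
    ["var_1", "var_2", "var_3", "var_4"],
    {"var_1": "uniform", "var_2": "normal", "var_3": "zero"},
)
print(resolved)
# {'var_1': 'uniform', 'var_2': 'normal', 'var_3': 'zero', 'var_4': 'normal'}
```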

## You can inspect the method of initialization in the string representation of model coefficients (e.g., `initialization=normal`).

In [None]:
model
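
Besides reading the string representation, you can check an initialization numerically. The sketch below computes summary statistics for every tensor in a module's `state_dict()`. `summarize_coefficients` is a hypothetical helper, not a `torch_choice` function. A `torch.nn.Linear` layer stands in for the choice model so that the snippet runs on its own. Calling `summarize_coefficients(model)` on the model above should work the same way, since it only relies on the standard `torch.nn.Module` interface.

```python
import torch


def summarize_coefficients(model_to_summarize: torch.nn.Module) -> dict:
    """Hypothetical helper: return size, mean, std, min and max of each state-dict tensor."""
    summary = {}
    for name, value in model_to_summarize.state_dict().items():
        # Flatten to one dimension and move to CPU before computing statistics.
        flat = value.detach().reshape(-1).float().cpu()
        summary[name] = {
            "K": flat.numel(),
            "mean": flat.mean().item(),
            "std": flat.std(unbiased=False).item(),
            "min": flat.min().item(),
            "max": flat.max().item(),
        }
    return summary


# Stand-in module (an assumption for this demo): 300 inputs, 4 outputs.
stand_in = torch.nn.Linear(in_features=300, out_features=4)
torch.nn.init.zeros_(stand_in.weight)  # mimic a zero-initialized coefficient

for name, stats in summarize_coefficients(stand_in).items():
    print(name, stats)
```

A zero-initialized tensor reports a standard deviation, minimum and maximum of exactly zero, while a standard Gaussian one should show a mean near 0 and a standard deviation near 1.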

# Nested Logit Model

Initializing nested logit models is very similar to initializing conditional logit models. The only difference is that there are two arguments instead of one: `nest_weight_initialization` for the nest-level coefficients and `item_weight_initialization` for the item-level coefficients. By default, every coefficient is drawn from a standard Gaussian distribution. The coefficient for inclusive values, $\lambda$, has its own initialization scheme, which cannot be modified.

In [None]:
model = torch_choice.model.NestedLogitModel(
    nest_to_item={1: [0, 1, 2], 2: [3, 4], 3: [5, 6, 7]},
    #
    nest_coef_variation_dict={'var_1': 'constant', 'var_2': 'item'},
    nest_num_param_dict={'var_1': 300, 'var_2': 500},
    #
    item_coef_variation_dict={'var_3': 'item-full', 'var_4': 'user'},
    item_num_param_dict={'var_3': 700, 'var_4': 900},
    num_users=100,
    # 
    nest_weight_initialization={'var_1': 'uniform', 'var_2': 'zero'},
    item_weight_initialization={'var_4': 'uniform'}   # <-- 'var_3' is omitted, so it defaults to the standard Gaussian.
)

In [None]:
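def plot_model_initial_coefficients(model_to_plot: torch.nn.Module) -> None:
    """Plot one histogram per coefficient tensor in the model's state dict."""
    # Redefined with five panels to fit the nested logit model's coefficient tensors.
    fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(25, 4), dpi=150)

    for i, (coef_name, coef_value) in enumerate(model_to_plot.state_dict().items()):
        # Flatten each coefficient tensor so all of its K entries share one histogram.
        arr = coef_value.view(-1,).to("cpu").numpy()
        axes[i].hist(arr, bins=40)
        axes[i].set_title(f"{coef_name} (K={len(arr)})")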

In [None]:
plot_model_initial_coefficients(model)