
Questions about the implementation of deepnorm #16

Closed

jiaohuix opened this issue Feb 22, 2023 · 2 comments

jiaohuix commented Feb 22, 2023

I have a doubt about deepnorm. In the paper, the deepnorm_init function uses xavier_normal_(x, gain=beta) for "ffn", "v_proj", and "out_proj":

[image: deepnorm_init pseudocode from the paper]

However, the source code of torchscale uses xavier_normal_(x, gain=1) / beta:

```python
for name, p in self.named_parameters():
    if (
        "fc1" in name
        or "fc2" in name
        or "out_proj" in name
        or "v_proj" in name
    ):
        p.data.mul_(init_scale)
```
Although I know that X ~ N(0, std^2) implies aX ~ N(0, (a·std)^2), I plotted the distributions produced by both methods as a histogram, and the results show some differences between the two:

[image: histogram comparing the two initializations]

```python
import torch
import matplotlib.pyplot as plt
from torch.nn.init import xavier_normal_

torch.manual_seed(1)

init_scale = 0.343
linear1 = torch.nn.Linear(4096, 512)  # 1: xavier_normal_(x, gain=beta)
linear2 = torch.nn.Linear(4096, 512)  # 2: xavier_normal_(x, gain=1) / beta
xavier_normal_(linear1.weight, gain=init_scale)
xavier_normal_(linear2.weight, gain=1)

linear1_weight = linear1.weight.detach().numpy().reshape((-1,))
linear2_weight = linear2.weight.detach().numpy().reshape((-1,)) / init_scale

plt.figure(figsize=(10, 6))
plt.hist([linear1_weight, linear2_weight], bins=100, rwidth=0.8, histtype="step")
plt.xlabel("value")
plt.ylabel("count")
# legend labels must be a list (a set has no stable order)
plt.legend(["1: xavier_normal_(x, gain=beta)", "2: xavier_normal_(x, gain=1) / beta"])
plt.show()
```

Is my implementation wrong? Which method should I use? I hope someone can enlighten me, thank you!

@shumingma
Contributor

Hi @MiuGod0126

$\beta$ is a multiplier, so it should be:

```python
linear2_weight = linear2.weight.detach().numpy().reshape((-1,)) * init_scale
```

instead of

```python
linear2_weight = linear2.weight.detach().numpy().reshape((-1,)) / init_scale
```
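As a sanity check (not part of the original thread), the equivalence of the two initializations can also be verified numerically: both should yield weights with standard deviation beta · sqrt(2 / (fan_in + fan_out)). This sketch assumes the same 4096×512 shape as the example above:

```python
import torch
from torch.nn.init import xavier_normal_

torch.manual_seed(0)
beta = 0.343

# Paper-style: initialize directly with gain = beta
w1 = torch.empty(4096, 512)
xavier_normal_(w1, gain=beta)

# torchscale-style: initialize with gain = 1, then multiply by beta
w2 = torch.empty(4096, 512)
xavier_normal_(w2, gain=1)
w2.mul_(beta)

# Both are samples from N(0, (beta * sqrt(2 / (fan_in + fan_out)))^2)
expected_std = beta * (2.0 / (4096 + 512)) ** 0.5
print(w1.std().item(), w2.std().item(), expected_std)
```

With ~2M samples per tensor, the two empirical standard deviations agree with the analytical value to several decimal places, confirming the two methods draw from the same distribution.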

@jiaohuix
Author

@shumingma Ooooh! Sorry, I carelessly read the mul as a division, thank you for the correction! I now understand deepnorm_init more deeply, and the corrected distribution is as follows:

[image: corrected histogram, the two distributions now overlap]
