Training issue #11

Open
LEECHOONGHO opened this issue Sep 9, 2021 · 5 comments

@LEECHOONGHO

Thanks for sharing the nice model implementation.

[screenshot of the training log warning]

When I start training, the following warning appears. Do you also get the same message? I think it's a fairseq installation problem:

No module named 'lightconv_cuda'

Also, I'm training with batch size 5 on a 24 GB RTX 3090. Could the problem above be the cause?

@keonlee9420
Owner

Hi @LEECHOONGHO, thanks for your attention. Please refer to #5 for that. It should resolve your issue.

@LEECHOONGHO
Author

Thanks for your help. I'll try it!

@v-nhandt21

v-nhandt21 commented Sep 15, 2021

> Thanks for your help. I'll try it!

I'm running into the same thing.

Is there any problem if I keep training with that warning: "No module named 'lightconv_cuda'"?

If you have solved the fairseq issue, can you share a little about your config and environment? I have also tried #5, but got too many errors.

Anyway, thank you so much @keonlee9420.

One more thing, just for discussion: why is the batch size of this model so small? The maximum I can set is 4, while in Tacotron2 it is 64 :))

@LEECHOONGHO
Author

> Thanks for your help. I'll try it!
>
> I'm running into the same thing.
>
> Is there any problem if I keep training with that warning: "No module named 'lightconv_cuda'"?
>
> If you have solved the fairseq issue, can you share a little about your config and environment? I have also tried #5, but got too many errors.
>
> Anyway, thank you so much @keonlee9420.
>
> One more thing, just for discussion: why is the batch size of this model so small? The maximum I can set is 4, while in Tacotron2 it is 64 :))

No, I couldn't solve the fairseq installation problem. It may require reinstalling CUDA or upgrading it to 11.0.

Instead, I use my own lightweight_conv module.
Insert the code below in Parallel-Tacotron2/model/blocks and remove `from fairseq.modules import LightweightConv` in the same file.

Whether you do this or not, the program runs, but you can only train with very low batch sizes.
And the loss stays around 70, so it doesn't seem to be training properly.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightConv(nn.Module):
    def __init__(
        self,
        num_channels,
        kernel_size,
        padding_l,
        weight_softmax,
        num_heads,
        weight_dropout,
        stride=1,
        dilation=1,
        bias=True,
    ):
        super(LightweightConv, self).__init__()

        self.channels = num_channels
        self.heads = num_heads
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding_l
        self.dilation = dilation
        self.dropout_p = weight_dropout
        self.bias = bias
        self.weight_softmax = weight_softmax

        # One kernel per head, shared by all channels assigned to that head.
        self.weights = nn.Parameter(torch.Tensor(self.heads, 1, self.kernel_size), requires_grad=True)

        self.kernel_softmax = nn.Softmax(dim=-1)
        self.dropout = nn.Dropout(self.dropout_p)

        # Keep bias_weights defined (as None) when bias is off, so
        # reset_parameters can test it without an AttributeError.
        self.bias_weights = nn.Parameter(torch.randn(self.heads)) if self.bias else None

        self.reset_parameters()

    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weights)
        if self.bias_weights is not None:
            nn.init.constant_(self.bias_weights, 0.)

    def forward(self, x):
        # x.shape = [width, batch_size, channel] (fairseq's T x B x C layout)
        x = x.permute(1, 2, 0)
        # x.shape = [batch_size, channel, width]
        batch_size, in_channel, width = x.shape

        if self.weight_softmax:
            weights = self.kernel_softmax(self.weights)
        else:
            weights = self.weights

        weights = self.dropout(weights)

        # Fold channel groups into the batch dimension so each head
        # becomes one group of the grouped conv1d below.
        # Note: channel must be divisible by num_heads.
        x = x.view(-1, self.heads, width)

        # F.conv1d accepts bias=None, which covers the bias=False case.
        output = F.conv1d(
            x,
            weights,
            bias=self.bias_weights,
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,
            groups=self.heads,
        )

        # Restore the original [width, batch_size, channel] layout.
        output = output.view(batch_size, -1, width).permute(2, 0, 1)

        return output
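
For reference, a minimal usage sketch of the module above (the hyperparameter values here are made up for illustration, not taken from the repo; the input follows fairseq's time-first T x B x C layout, which the module permutes internally):

import torch

conv = LightweightConv(
    num_channels=256,   # hidden size; must be divisible by num_heads
    kernel_size=3,
    padding_l=1,        # (kernel_size - 1) // 2 preserves the sequence length
    weight_softmax=True,
    num_heads=8,
    weight_dropout=0.1,
)

x = torch.randn(50, 4, 256)   # (time, batch, channels)
y = conv(x)
print(y.shape)                # torch.Size([50, 4, 256])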

@GuangChen2016

@v-nhandt21 @LEECHOONGHO Hi all, I also couldn't solve the fairseq installation problem. Does the LightweightConv module above from @LEECHOONGHO behave the same as the LightweightConv from fairseq.modules? Or are there differences between them? Many thanks.
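
Not an authoritative answer, but in an environment where fairseq does import (on CPU it falls back to a pure-PyTorch implementation even without the lightconv_cuda extension), one way to check is to copy the per-head kernels across and compare outputs. A rough sketch, assuming fairseq's module stores its kernels as a (num_heads, 1, kernel_size) weight parameter and accepts the keyword arguments shown; bias handling differs between the two (per-head vs. per-channel), so it is disabled here:

import torch
from fairseq.modules import LightweightConv as FairseqLightweightConv
# `LightweightConv` below is the custom module from the comment above.

T, B, C, H, K = 50, 2, 64, 8, 3
fs = FairseqLightweightConv(
    C, kernel_size=K, padding_l=K // 2, num_heads=H,
    weight_dropout=0.0, weight_softmax=True, bias=False,
)
mine = LightweightConv(
    C, K, padding_l=K // 2, weight_softmax=True,
    num_heads=H, weight_dropout=0.0, bias=False,
)
mine.weights.data.copy_(fs.weight.data)  # align the per-head kernels

fs.eval()
mine.eval()
x = torch.randn(T, B, C)  # fairseq's T x B x C layout
print(torch.allclose(fs(x), mine(x), atol=1e-6))

With an odd kernel size and padding_l = kernel_size // 2 the padding is symmetric in both, so I'd expect the outputs to match up to floating-point noise; with causal padding (padding_l = kernel_size - 1) the custom module pads symmetrically via F.conv1d while fairseq pads only on the left, so they would diverge there.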
