[Bug] DAN loss may be below 0 #54

Closed
xianyuanliu opened this issue Feb 4, 2021 · 2 comments
@xianyuanliu (Member)

🐛 Bug

When I run DAN on digits_dann_lightn and action_dann_lightn, the MMD loss T_mmd sometimes takes values below 0. This drives T_total_loss below 0, because T_total_loss = T_task_loss + 1 * T_mmd. Is this correct?
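
For reference, a minimal sketch of how the total loss is combined (the values and names here are illustrative only, with the trade-off weight set to 1 as above):

# Illustrative numbers, not taken from the run: a negative MMD term pulls the
# total loss below zero because the two terms are simply added.
T_task_loss = 0.35                      # classification loss, always >= 0
T_mmd = -0.42                           # MMD estimate observed to go negative
T_total_loss = T_task_loss + 1 * T_mmd  # = -0.07, i.e. below 0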

To reproduce

Steps to reproduce the behavior:

In digits_dann_lightn,

  1. After merging Add DAN to digits_dann_lightn #53, set fast_dev_run=False and logger=True in main.py.
  2. Run python main.py --cfg ./configs/MN2UP-DAN.yaml --gpus 1.
  3. Check the losses by printing them or with TensorBoard.

Stack trace/error message

This is my output with repeats=10, epoch=100, init_epoch=20.
T_mmd fluctuates, and so does T_total_loss. I think the loss should stay above 0.

[screenshot of the loss output]

Expected Behaviour

The loss should stay above 0, as it does for CDAN.
[screenshot of the CDAN loss output]

There are some useful links:
ADA code
Xlearn code
I checked these implementations and ours is nearly identical to them, so I am not sure whether our loss output is correct.

Environment

[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.0.3
[pip3] pytorch-memlab==0.2.2
[pip3] torch==1.7.0
[pip3] torchaudio==0.7.0
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.8.1
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              h74a9793_1
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py38hb782905_0
[conda] mkl_fft                   1.2.0            py38h45dec08_0
[conda] mkl_random                1.1.1            py38h47e9c7a_0
[conda] numpy                     1.19.2           py38hadc3359_0
[conda] numpy-base                1.19.2           py38ha3acd2a_0
[conda] pytorch                   1.7.0           py3.8_cuda102_cudnn7_0    pytorch
[conda] pytorch-lightning         1.0.2                    pypi_0    pypi
[conda] pytorch-memlab            0.2.2                    pypi_0    pypi
[conda] torchaudio                0.7.0                      py38    pytorch
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.8.1                py38_cu102    pytorch
@xianyuanliu xianyuanliu added bug Something isn't working question Further information is requested labels Feb 4, 2021
@haipinglu (Member)

@sz144 #53 has been merged into master. Could you please take a look at this MMD issue? Thanks.

@haipinglu haipinglu added this to To do (sorted by urgency) in v0.1.0 via automation Feb 8, 2021
@haipinglu haipinglu moved this from To do (sorted by urgency) to In progress (next 2 weeks) in v0.1.0 Mar 25, 2021
@xianyuanliu (Member, Author)

In this DAN example, we use an unbiased estimator of MMD with linear complexity, following the original paper. An unbiased estimate can dip below 0 even though the true MMD is non-negative, which may be the reason.

Referring to Xlearn, the complete (quadratic-time) version should be:

def DAN(source, target, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
    # guassian_kernel is Xlearn's multi-kernel RBF helper (defined alongside
    # this function in the Xlearn code); it returns a
    # (2 * batch_size, 2 * batch_size) kernel matrix over the concatenated
    # [source; target] features.
    batch_size = int(source.size()[0])
    kernels = guassian_kernel(source, target,
        kernel_mul=kernel_mul, kernel_num=kernel_num, fix_sigma=fix_sigma)

    # Within-domain term: average k(s_i, s_j) + k(t_i, t_j) over all pairs i < j.
    loss1 = 0
    for s1 in range(batch_size):
        for s2 in range(s1 + 1, batch_size):
            t1, t2 = s1 + batch_size, s2 + batch_size
            loss1 += kernels[s1, s2] + kernels[t1, t2]
    loss1 = loss1 / float(batch_size * (batch_size - 1) / 2)

    # Cross-domain term: average k(s_i, t_j) over all source-target pairs,
    # accumulated with a negative sign.
    loss2 = 0
    for s1 in range(batch_size):
        for s2 in range(batch_size):
            t1, t2 = s1 + batch_size, s2 + batch_size
            loss2 -= kernels[s1, t2] + kernels[s2, t1]
    loss2 = loss2 / float(batch_size * batch_size)
    return loss1 + loss2
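
Because each summand of the unbiased estimate mixes positive within-domain terms and negative cross-domain terms, the estimate itself can go negative even though the true MMD is non-negative. As a sketch (not the exact repository code), a common linear-complexity variant in the same style, reusing the guassian_kernel helper above, looks like this; with only batch_size summands its variance is higher, so negative values are more likely:

def DAN_linear(source, target, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
    # Sketch only: linear-time unbiased MMD estimate that pairs each sample
    # with its neighbour in the batch (with wrap-around), in the spirit of the
    # linear-time estimator used by the DAN paper. Each summand
    # k(s1, s2) + k(t1, t2) - k(s1, t2) - k(s2, t1) can be negative, so the
    # whole estimate may fall below 0.
    batch_size = int(source.size()[0])
    kernels = guassian_kernel(source, target,
        kernel_mul=kernel_mul, kernel_num=kernel_num, fix_sigma=fix_sigma)

    loss = 0
    for i in range(batch_size):
        s1, s2 = i, (i + 1) % batch_size            # neighbouring source samples
        t1, t2 = s1 + batch_size, s2 + batch_size   # matching target samples
        loss += kernels[s1, s2] + kernels[t1, t2]   # within-domain similarity
        loss -= kernels[s1, t2] + kernels[s2, t1]   # cross-domain similarity
    return loss / float(batch_size)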

v0.1.0 automation moved this from In progress to Done Apr 8, 2021