The training loss #53

Open
Vickeyhw opened this issue Nov 28, 2023 · 5 comments
Comments

@Vickeyhw

Thanks for your great work! When I run the code with:
python3 tools/train.py --cfg configs/imagenet/r34_r18/dot.yaml
the training loss is much larger than that of the KD method in the first few epochs, and the test accuracy is also low. Is this normal?
[screenshot: training loss and test accuracy curves]

Vickeyhw changed the title from "The training loss curve" to "The training loss" on Nov 28, 2023
@Zzzzz1
Collaborator

Zzzzz1 commented Nov 29, 2023

The loss scale looks too large. Did you change the batch size or the number of GPUs?
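For readers who hit the same symptom: the released ImageNet configs assume a fixed effective batch size, so if the per-GPU batch or the GPU count changes, the learning rate usually has to be rescaled as well. The snippet below is a minimal, repo-independent sketch of the linear LR scaling rule; the reference LR value and its pairing with a 512 batch size are assumptions, not values taken from this repo's configs.

```python
# Illustrative sanity check, not this repo's actual code or config keys:
# verify that the effective batch size and learning rate still match the
# setting the released config was tuned for.

REFERENCE_BATCH_SIZE = 512   # effective batch size of the reference setup
REFERENCE_LR = 0.2           # assumed LR paired with that batch size

def scaled_lr(per_gpu_batch_size: int, num_gpus: int) -> float:
    """Linear scaling rule: LR shrinks/grows with the effective batch size."""
    effective_batch = per_gpu_batch_size * num_gpus
    return REFERENCE_LR * effective_batch / REFERENCE_BATCH_SIZE

# 8 GPUs x 64 images each reproduces the 512-image reference batch, so the
# learning rate is unchanged; halving the GPU count (with the same per-GPU
# batch) should halve the LR as well.
print(scaled_lr(per_gpu_batch_size=64, num_gpus=8))  # 0.2
print(scaled_lr(per_gpu_batch_size=64, num_gpus=4))  # 0.1
```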

@Vickeyhw
Author

@Zzzzz1 I used the original batch size of 512 on 8 2080 Ti GPUs. After re-running the code, I got the following results:
[screenshot: updated training loss and test accuracy curves]
It still seems unstable and much worse than vanilla KD.

@JinYu1998

@Vickeyhw How long does it take you to run an epoch? I find it very strange that it takes me 100 minutes to run a quarter of an epoch on 8×3090 GPUs.

@Vickeyhw
Author

@JinYu1998 23min/epoch.

@JinYu1998

> @JinYu1998 23min/epoch.

Thanks for your response. I think I've identified the problem: since my data is not on an SSD, I/O is causing the slow training...
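For anyone debugging the same slowdown, a quick way to confirm that data loading (rather than the GPUs) is the bottleneck is to time the DataLoader on its own. This is a generic PyTorch sketch, not code from this repo; the dataset path, batch size, and worker count are placeholders to adapt to your setup.

```python
# Time the DataLoader alone: if this per-batch time is close to your full
# training iteration time, the run is I/O-bound (e.g. data on a slow HDD),
# not compute-bound.
import time

import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Placeholder path and transform; adjust to your own dataset layout.
dataset = ImageFolder(
    "/path/to/imagenet/train",
    transform=T.Compose([T.RandomResizedCrop(224), T.ToTensor()]),
)
loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

num_batches = 50
start = time.time()
for i, (images, targets) in enumerate(loader):
    if i + 1 == num_batches:  # measure a fixed number of batches, then stop
        break
print(f"data loading only: {(time.time() - start) / num_batches:.3f} s/batch")
```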
