
Training the problem #6

Closed
SHOUshou0426 opened this issue Oct 24, 2022 · 13 comments

Comments

@SHOUshou0426

Hello, my training run is cut off at epoch 999, when "Evaluating k-NN accuracy" starts, with the error ValueError: range() arg 3 must not be zero. Training on the AFHQ dataset raises the same ValueError: range() arg 3 must not be zero.

Traceback (most recent call last):
  File "train.py", line 258, in <module>
  File "train.py", line 254, in main
  File "train.py", line 190, in training_loop
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\metrics\knn_evaluator.py", line 69, in evaluate
    top1, top5 = knn_classifier(
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\metrics\knn_evaluator.py", line 106, in knn_classifier
    for idx in range(0, num_test_images, imgs_per_chunk):
ValueError: range() arg 3 must not be zero

Here is what I printed while debugging:

num_test_images, num_chunks = test_labels.shape[0], 100
# prints num_test_images = 32

imgs_per_chunk = num_test_images // num_chunks
# prints imgs_per_chunk = 0

Environment: torch==1.11.0+cu113, CUDA 11.3
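
From the printed values, the failure mode seems clear: knn_classifier splits the test features into num_chunks = 100 chunks, so with only 32 validation images the integer division yields 0 and range(0, 32, 0) raises. Below is a minimal sketch of a possible guard; clamping the chunk size to at least 1 is my assumption, not the repository's actual fix.

```python
def chunk_ranges(num_test_images, num_chunks=100):
    """Yield (start, end) index ranges for chunked k-NN evaluation."""
    # Clamp to at least 1 so datasets smaller than num_chunks do not
    # produce a zero step for range() (the error reported above).
    imgs_per_chunk = max(num_test_images // num_chunks, 1)
    for idx in range(0, num_test_images, imgs_per_chunk):
        yield idx, min(idx + imgs_per_chunk, num_test_images)

# With 32 validation images: 32 // 100 == 0 in the original code,
# whereas the clamped version iterates one image per chunk.
print(len(list(chunk_ranges(32))))  # 32
```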

@SHOUshou0426
Author

I use the command like this:
python train.py --mod-type adain --total-nimg 1.6M --batch-size 4 --load-size 320 --crop-size 256 --image-size 256 --train-dataset datasets/l2l_cloth/train --eval-dataset datasets/l2l_cloth/val --out-dir runs --extra-desc some descriptions

@kunheek
Owner

kunheek commented Oct 24, 2022

Hi, can you tell me how many images are in your test set?
I guess this happens when your validation set has fewer than 100 images.

@SHOUshou0426
Author

The training set has 130 images and the validation set has 32 images, but I also get ValueError: range() arg 3 must not be zero when using the AFHQ dataset.

@SHOUshou0426
Author

When training on the AFHQ dataset, the error appears at epoch 19999.

@kunheek
Owner

kunheek commented Oct 24, 2022

I see. Can you share the command you used for the AFHQ dataset? I will reproduce it myself.
Until the problem is fixed, you can train your model without evaluation by adding --evaluation false to the command. You can evaluate it after training using saved checkpoints.

By the way, due to the use of SwAV, I recommend using a batch size larger than 4 (16 should be enough). Also, 130 images may not be enough if you are training a model from scratch.

@SHOUshou0426
Author

The command I use for the AFHQ dataset is:
python train.py --mod-type adain --total-nimg 1.6M --batch-size 16 --load-size 320 --crop-size 256 --image-size 256 --train-dataset datasets/afhq/train --eval-dataset datasets/afhq/val --out-dir runs --extra-desc some descriptions

Does the metrics command from the README run without this error for you?
The command I tried from the README is
python -m metrics fid reconstruction --seed 123 --checkpoint ./checkpoints/afhq-stylegan2-5M.pt --train-dataset ./datasets/afhq/train --eval-dataset ./datasets/afhq/val

and it fails with this error:

C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\utils\cpp_extension.py:322: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
INFO: Could not find files for the given pattern(s).
Traceback (most recent call last):
  File "C:\Users\yuanx\.conda\envs\style2\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\yuanx\.conda\envs\style2\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\metrics\__main__.py", line 88, in <module>
    main()
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\metrics\__main__.py", line 70, in main
    model = StyleAwareDiscriminator(opts)
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\mylib\base_model.py", line 23, in __init__
    self._create_networks()
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\model\model.py", line 72, in _create_networks
    self.G = Generator(
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\model\networks\generator.py", line 42, in __init__
    from .stylegan2_layers import EncodeBlock, StyleBlock
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\model\networks\stylegan2_layers.py", line 5, in <module>
    import model.networks.stylegan2_op as ops
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\model\networks\stylegan2_op\__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "C:\Users\yuanx\Desktop\style\style-aware-discriminator\model\networks\stylegan2_op\fused_act.py", line 10, in <module>
    fused = load(
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\utils\cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\utils\cpp_extension.py", line 1357, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\utils\cpp_extension.py", line 1456, in _write_ninja_file_and_build_library
    _write_ninja_file_to_build_library(
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\utils\cpp_extension.py", line 1898, in _write_ninja_file_to_build_library
    _write_ninja_file(
  File "C:\Users\yuanx\.conda\envs\style2\lib\site-packages\torch\utils\cpp_extension.py", line 2023, in _write_ninja_file
    cl_paths = subprocess.check_output(['where',
  File "C:\Users\yuanx\.conda\envs\style2\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\yuanx\.conda\envs\style2\lib\subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

@kunheek
Owner

kunheek commented Oct 24, 2022

Note that the custom CUDA kernel only works on Linux. It seems that you are using Windows.
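
If you do stay on Windows, a common workaround in StyleGAN2-based code is to fall back to a pure-PyTorch implementation of the fused ops instead of JIT-compiling the CUDA extension. Below is a minimal sketch of such a fallback for fused_leaky_relu, assuming the usual StyleGAN2 signature; whether this repository ships an equivalent fallback is an assumption, not something confirmed here.

```python
import torch.nn.functional as F

def fused_leaky_relu(input, bias=None, negative_slope=0.2, scale=2 ** 0.5):
    # Pure-PyTorch stand-in for the fused CUDA op: add the per-channel bias,
    # apply leaky ReLU, then rescale to preserve the signal magnitude.
    if bias is not None:
        rest_dim = [1] * (input.ndim - bias.ndim - 1)
        input = input + bias.view(1, bias.shape[0], *rest_dim)
    return F.leaky_relu(input, negative_slope=negative_slope) * scale
```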

@SHOUshou0426
Author

Does the command need to be modified?

@SHOUshou0426
Author

My hardware is limited. Will using a batch size smaller than 16 hurt the results?

@kunheek
Owner

kunheek commented Oct 24, 2022

You can use --mod-type=adain, but I cannot guarantee that it will work, as I have never tested the code on Windows. I recommend running the code on Linux (you can use WSL if you are familiar with it).

In general, the larger the batch size, the better. I haven't tested the code with a batch size smaller than 16, so I can't say how smaller batch sizes affect the results.

@SHOUshou0426
Author

Training with --mod-type=adain works without problems; it is the metrics command from the README that fails.

@kunheek
Owner

kunheek commented Oct 24, 2022

It is not that something is broken; '--mod-type' is set automatically according to the checkpoint used. The checkpoint 'afhq-stylegan2-5M.pt' is a model trained with --mod-type=stylegan2.
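
For illustration only, this is roughly what "set according to the checkpoint" means in practice; the key names below are assumptions about the checkpoint layout, not the repository's actual code.

```python
import torch

# Hypothetical sketch: evaluation restores the training-time options saved in
# the checkpoint, so a --mod-type flag on the command line is not needed.
ckpt = torch.load("checkpoints/afhq-stylegan2-5M.pt", map_location="cpu")
train_opts = ckpt.get("options", {})  # key name is an assumption
print(train_opts.get("mod_type"))     # expected: "stylegan2" for this checkpoint
```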

@SHOUshou0426
Author

Thank you for the response. I will try WSL.

@kunheek kunheek closed this as completed Oct 28, 2022