Got error when running on Windows 10. #91

Closed
zzsmg opened this issue Nov 22, 2018 · 7 comments

zzsmg commented Nov 22, 2018

Hi, I got an error when I ran the test process on a Windows 10 system with Python 3.6.5 + PyTorch 0.4.1 + CUDA 9.2.
I'm using this line from the demo file:
python main.py --data_test Demo --scale 4 --pre_train download --test_only --save_results --n_threads 0

The error message is:

Making model...
Download the model

Evaluation:
Traceback (most recent call last):
File "main.py", line 23, in
while not t.terminate():
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 139, in terminate
self.test()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 83, in test
if self.args.save_results: self.ckp.begin_background()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\utility.py", line 141, in begin_background
for p in self.process: p.start()
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'

(C:\Users\Zz\Anaconda3) D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src>Making model...
Download the model

Evaluation:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "C:\Users\Zz\Anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\Zz\Anaconda3\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\Zz\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\main.py", line 23, in
while not t.terminate():
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 139, in terminate
self.test()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 83, in test
if self.args.save_results: self.ckp.begin_background()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\utility.py", line 141, in begin_background
for p in self.process: p.start()
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Please give me some suggestions to help me deal with it. Thanks.

@tabetomo
Contributor

Hi

I got the same issue.

Following the instructions at
https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

I created a new function main() that wraps the whole code except for the import part (see below) and called it. (On Windows, multiprocessing starts child processes by re-importing the main module, so the top-level code has to sit behind the if __name__ == '__main__': guard.) Note that the variable name "model" needs to be changed to avoid an error (UnboundLocalError: local variable 'model' referenced before assignment), i.e. we cannot use both "import model" and a local variable named "model".

def main():
    # the imports (torch, utility, data, model, loss, Trainer, args, ...) stay
    # at the top of main.py exactly as in the original file
    torch.manual_seed(args.seed)
    checkpoint = utility.checkpoint(args)

    if args.data_test == 'video':
        from videotester import VideoTester
        model2 = model.Model(args, checkpoint)  # renamed so it does not shadow the model module
        t = VideoTester(args, model2, checkpoint)
        t.test()
    else:
        if checkpoint.ok:
            loader = data.Data(args)
            model2 = model.Model(args, checkpoint)
            # 'loss' needs the same renaming as 'model', otherwise training
            # hits the same UnboundLocalError by shadowing the loss module
            loss2 = loss.Loss(args, checkpoint) if not args.test_only else None
            t = Trainer(args, loader, model2, loss2, checkpoint)
            while not t.terminate():
                t.train()
                t.test()

            checkpoint.done()

if __name__ == '__main__':
    main()

The problem disappeared in my Windows 10 environment. Good.

However, when running the example line, I got the following:

$ ./demo.sh
Making model...
Download the model

Evaluation:
0it [00:00, ?it/s]
[Set5 x4]       PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[Set14 x4]      PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[B100 x4]       PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[Urban100 x4]   PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[DIV2K x4]      PSNR: nan (Best: nan @epoch 1)
Forward: 9.68s

Saving...
Total: 9.68s

when running

# Standard benchmarks (Ex. EDSR_baseline_x4)
python main.py --data_test Set5+Set14+B100+Urban100+DIV2K --data_range 801-900 --scale 4 --pre_train download --test_only --self_ensemble

in demo.sh

@tabetomo
Contributor

The reported additional issue (cannot test even after the Windows fix) was solved as I mentioned in #115.

i.e. we need to

  1. download the benchmark data:
     https://cv.snu.ac.kr/research/EDSR/benchmark.tar
  2. add the command-line option "--dir_data [path to the benchmark folder]".
     Note that the path given to --dir_data should be "xxx" if your benchmark folder is "xxx/benchmark",
     because the code in data/benchmark.py appends "benchmark" itself (see the sketch below).
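
A minimal sketch of why --dir_data has to point at the parent folder (illustrative only; the exact path-building code in the repository may differ slightly):

import os

dir_data = 'xxx'  # the value passed via --dir_data
# the benchmark loader effectively reads from '<dir_data>/benchmark/<dataset name>'
apath = os.path.join(dir_data, 'benchmark', 'Set5')
print(apath)  # xxx/benchmark/Set5 (xxx\benchmark\Set5 on Windows)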

@tabetomo
Contributor

tabetomo commented Feb 18, 2019

My merge request fixed the following error. It works fine if we don't specify --save_results.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

However, if I include --save_results, we still get the following error even with the patch.

$ ./demo.sh
Making model...
Download the model

Evaluation:
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 26, in main
    while not t.terminate():
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 139, in terminate
    self.test()
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 83, in test
    if self.args.save_results: self.ckp.begin_background()
  File "C:\home\EDSR-PyTorch\src\utility.py", line 141, in begin_background
    for p in self.process: p.start()
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'

tabetomo@DESKTOP /cygdrive/c/home/EDSR-PyTorch/src
$ Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

When I commented out self.ckp.begin_background() and self.ckp.end_background() as mentioned in #105 and set --save_results --n_threads 1, I got


$ ./demo.sh
Making model...
Download the model

Evaluation:
  0%|                                                     | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 26, in main
    while not t.terminate():
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 139, in terminate
    self.test()
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 100, in test
    self.ckp.save_results(d, filename[0], save_list, scale)
  File "C:\home\EDSR-PyTorch\src\utility.py", line 159, in save_results
    self.queue.put(('{}{}.png'.format(filename, p), tensor_cpu))
AttributeError: 'checkpoint' object has no attribute 'queue'

This is not surprising, because if we comment out self.ckp.begin_background(), we never create the queue in the first place.
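
For what it's worth, the root cause is that begin_background defines bg_target as a local function, and the spawn start method used on Windows has to pickle the Process target, which fails for local functions. Below is a minimal sketch of a spawn-safe variant of utility.py's begin_background; the worker loop and the names _bg_target / n_processes are my reconstruction from the tracebacks above, not the repository's exact code.

import imageio
from multiprocessing import Process, Queue

def _bg_target(queue):
    # background worker: save queued images until a None filename arrives
    while True:
        filename, tensor = queue.get()
        if filename is None:
            break
        imageio.imwrite(filename, tensor.numpy())

class checkpoint:
    # ... existing __init__, save_results, end_background, etc. ...

    def begin_background(self):
        self.queue = Queue()
        # a module-level function (unlike a closure) can be pickled under spawn
        self.process = [
            Process(target=_bg_target, args=(self.queue,))
            for _ in range(self.n_processes)
        ]
        for p in self.process:
            p.start()

end_background would then have to put one (None, None) sentinel per worker on the queue before joining the processes.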

If I set --n_threads 0 instead of --n_threads 1 in addition to the above changes, I got

$ ./demo.sh
Making model...
Download the model

Evaluation:
  0%|                                                     | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 26, in main
    while not t.terminate():
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 139, in terminate
    self.test()
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 87, in test
    for lr, hr, filename, _ in tqdm(d, ncols=80):
ValueError: not enough values to unpack (expected 4, got 3)

In summary, I think my merge request is valid, but it seems more modifications would be necessary if we want to save results. So it works if we don't specify --save_results:

$ ./demo.sh
Making model...
Download the model

Evaluation:
100%|#############################################| 5/5 [00:02<00:00,  1.13s/it]
[Set5 x4]       PSNR: 32.288 (Best: 32.288 @epoch 1)
100%|###########################################| 14/14 [00:04<00:00,  3.79it/s]
[Set14 x4]      PSNR: 28.670 (Best: 28.670 @epoch 1)
100%|#########################################| 100/100 [00:20<00:00,  7.02it/s]
[B100 x4]       PSNR: 27.629 (Best: 27.629 @epoch 1)
100%|#########################################| 100/100 [01:14<00:00,  1.39it/s]
[Urban100 x4]   PSNR: 26.188 (Best: 26.188 @epoch 1)
0it [00:00, ?it/s]
[DIV2K x4]      PSNR: nan (Best: nan @epoch 1)
Forward: 104.79s

Saving...
Total: 104.79s

@fafancier

fafancier commented Apr 24, 2019

I also can't save images when using --save_results, which calls the multiprocessing module on Windows 10.
So I copied the image-saving code into the trainer and don't specify --save_results. Then I can save the images, slowly.
Here is my modification (only in trainer.py, in the test() function):

def test(self):
    # requires "import os" and "import imageio" at the top of trainer.py
    # if they are not imported there already
    torch.set_grad_enabled(False)

    epoch = self.optimizer.get_last_epoch() + 1
    self.ckp.write_log('\nEvaluation:')
    self.ckp.add_log(
        torch.zeros(1, len(self.loader_test), len(self.scale))
    )
    self.model.eval()

    timer_test = utility.timer()
    if self.args.save_results: self.ckp.begin_background()
    for idx_data, d in enumerate(self.loader_test):
        for idx_scale, scale in enumerate(self.scale):
            d.dataset.set_scale(idx_scale)
            # make sure the output folder exists; it is normally created by
            # checkpoint only when --save_results is given
            os.makedirs(
                '..\\experiment\\test\\results-{}'.format(d.dataset.name),
                exist_ok=True
            )
            for lr, hr, filename, _ in tqdm(d, ncols=80):
                lr, hr = self.prepare(lr, hr)
                sr = self.model(lr, idx_scale)
                sr = utility.quantize(sr, self.args.rgb_range)

                save_list = [sr]
                self.ckp.log[-1, idx_data, idx_scale] += utility.calc_psnr(
                    sr, hr, scale, self.args.rgb_range, dataset=d
                )
                if self.args.save_gt:
                    save_list.extend([lr, hr])

                if self.args.save_results:
                    self.ckp.save_results(d, filename[0], save_list, scale)

                # added by wfli: save the images directly, without the
                # background processes that fail on Windows
                postfix = ('SR', 'LR', 'HR')
                for v, p in zip(save_list, postfix):
                    normalized = v[0].mul(255 / self.args.rgb_range)
                    tensor_cpu = normalized.byte().permute(1, 2, 0).cpu()
                    imageio.imwrite(
                        '..\\experiment\\test\\results-{}\\{}_x{}_{}.png'.format(
                            d.dataset.name, filename[0], scale, p
                        ),
                        tensor_cpu.numpy()
                    )
                # end of wfli's addition

            self.ckp.log[-1, idx_data, idx_scale] /= len(d)
            best = self.ckp.log.max(0)
            self.ckp.write_log(
                '[{} x{}]\tPSNR: {:.3f} (Best: {:.3f} @epoch {})'.format(
                    d.dataset.name,
                    scale,
                    self.ckp.log[-1, idx_data, idx_scale],
                    best[0][idx_data, idx_scale],
                    best[1][idx_data, idx_scale] + 1
                )
            )

    self.ckp.write_log('Forward: {:.2f}s\n'.format(timer_test.toc()))
    self.ckp.write_log('Saving...')

@sipie800

Same issue on Windows 10 with Python 3.7.8 and torch 1.2.
By the way, using multiprocessing does not seem like a good choice here: testing images eats up GPU memory very quickly, and Python's multiprocessing implementation is not very robust.
Perhaps pushing more of the work onto CUDA would be an option.

@rezraz1

rezraz1 commented Jun 12, 2022

Hi, I tried to run
python main.py --data_test Demo --scale 4 --pre_train download --test_only --save_results
I had a few errors that I was able to fix with the help of #105 and #91, but now I have this problem. What could be the reason for it?

This is what I have done so far to fix the errors:

Error 1: AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'
and
EOFError: Ran out of input
Resolved with #105 (comment):
"It seems there are some conflicts between multiprocessing and your system. Remove this and these lines to disable multiprocessing."

Error 2: AttributeError: 'checkpoint' object has no attribute 'queue'
Resolved by removing --save_results

Error 3: [WinError 1455] The paging file is too small for this operation to complete.
Resolved by using --n_threads 0

Also, my system settings:

windows 8.1
python 3.8.5 base conda
pytorch 1.8.2+cuda10.2

The result I get now is this:

 Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

E:\EDSR-PyTorch-master\EDSR-PyTorch-master\src>python main.py --data_test Demo --scale 4 --pre_train download --test_only --n_threads 0
Making model...
Download the model

Evaluation:
  0%|                                                     | 0/1 [00:00<?, ?it/s]
100%|█████████████████████████████████████████████| 1/1 [00:02<00:00,  2.01s/it]
100%|█████████████████████████████████████████████| 1/1 [00:02<00:00,  2.01s/it]

[Demo x4]       PSNR: 0.000 (Best: 0.000 @epoch 1)
Forward: 2.01s

Saving...
Total: 2.01s

Question:

Why is the PSNR always zero even if I put a few images in the folder, and why is it nan if I use the benchmark data?
What is the problem and what should I do to fix it?

Thank you for your help

@renxiaosa00

Why is the PSNR always zero even if I put a few images in the folder, and why is it nan if I use the benchmark data?
What is the problem and what should I do to fix it?

I have the same question. Why is the PSNR always zero? Training command: --model EDSR --scale 2 --patch_size 96 --save edsr_baseline_x2 --reset
