Got error when running on Windows 10. #91

Closed
zzsmg opened this issue Nov 22, 2018 · 7 comments

zzsmg commented Nov 22, 2018

Hi, I got an error when I ran the test process on a Windows 10 system with Python 3.6.5 + PyTorch 0.4.1 + CUDA 9.2.
I'm using this line from the demo file:
python main.py --data_test Demo --scale 4 --pre_train download --test_only --save_results --n_threads 0

The error message is:

Making model...
Download the model

Evaluation:
Traceback (most recent call last):
File "main.py", line 23, in
while not t.terminate():
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 139, in terminate
self.test()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 83, in test
if self.args.save_results: self.ckp.begin_background()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\utility.py", line 141, in begin_background
for p in self.process: p.start()
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'

(C:\Users\Zz\Anaconda3) D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src>Making model...
Download the model

Evaluation:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "C:\Users\Zz\Anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\Zz\Anaconda3\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\Zz\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\main.py", line 23, in
while not t.terminate():
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 139, in terminate
self.test()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\trainer.py", line 83, in test
if self.args.save_results: self.ckp.begin_background()
File "D:\EDSR-PyTorch-master\EDSR-PyTorch-master\src\utility.py", line 141, in begin_background
for p in self.process: p.start()
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\Zz\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Please give me some suggestions to help me deal with it. Thanks.

@tabetomo
Contributor

Hi

I got the same issue.

Following the instructions at
https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

I created a new function main() that wraps the whole code except for the import part (see below) and called it. (On Windows, multiprocessing starts child processes by re-importing the main module, so the top-level code has to sit behind the if __name__ == '__main__': guard.) Note that the variable name "model" needs to be changed to avoid an error (UnboundLocalError: local variable 'model' referenced before assignment), i.e. we cannot use both "import model" and a local variable named "model".

def main():
    # the imports (torch, utility, data, model, loss, Trainer, args, ...) stay
    # at the top of main.py exactly as in the original file
    torch.manual_seed(args.seed)
    checkpoint = utility.checkpoint(args)

    if args.data_test == 'video':
        from videotester import VideoTester
        model2 = model.Model(args, checkpoint)  # renamed so it does not shadow the model module
        t = VideoTester(args, model2, checkpoint)
        t.test()
    else:
        if checkpoint.ok:
            loader = data.Data(args)
            model2 = model.Model(args, checkpoint)
            # 'loss' needs the same renaming as 'model', otherwise training
            # hits the same UnboundLocalError by shadowing the loss module
            loss2 = loss.Loss(args, checkpoint) if not args.test_only else None
            t = Trainer(args, loader, model2, loss2, checkpoint)
            while not t.terminate():
                t.train()
                t.test()

            checkpoint.done()

if __name__ == '__main__':
    main()

The problem disappeared in my Windows 10 environment. Good.

However, when running the example line, I got the following:

$ ./demo.sh
Making model...
Download the model

Evaluation:
0it [00:00, ?it/s]
[Set5 x4]       PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[Set14 x4]      PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[B100 x4]       PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[Urban100 x4]   PSNR: nan (Best: nan @epoch 1)
0it [00:00, ?it/s]
[DIV2K x4]      PSNR: nan (Best: nan @epoch 1)
Forward: 9.68s

Saving...
Total: 9.68s

when running

# Standard benchmarks (Ex. EDSR_baseline_x4)
python main.py --data_test Set5+Set14+B100+Urban100+DIV2K --data_range 801-900 --scale 4 --pre_train download --test_only --self_ensemble

in demo.sh

@tabetomo
Contributor

The reported additional issue (cannot test even after the Windows fix) was solved as I mentioned in #115.

i.e. we need to

  1. download the benchmark data:
     https://cv.snu.ac.kr/research/EDSR/benchmark.tar
  2. add the command-line option "--dir_data [path to the benchmark folder]".
     Note that the path given to --dir_data should be "xxx" if your benchmark folder is "xxx/benchmark",
     because the code in data/benchmark.py appends "benchmark" itself (see the sketch below).
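
A minimal sketch of why --dir_data has to point at the parent folder (illustrative only; the exact path-building code in the repository may differ slightly):

import os

dir_data = 'xxx'  # the value passed via --dir_data
# the benchmark loader effectively reads from '<dir_data>/benchmark/<dataset name>'
apath = os.path.join(dir_data, 'benchmark', 'Set5')
print(apath)  # xxx/benchmark/Set5 (xxx\benchmark\Set5 on Windows)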

@tabetomo
Contributor

tabetomo commented Feb 18, 2019

My merge request fixed the following error. It works fine if we don't specify --save_results.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

However, if I include --save_results, we still get the following error even with the patch.

$ ./demo.sh
Making model...
Download the model

Evaluation:
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 26, in main
    while not t.terminate():
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 139, in terminate
    self.test()
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 83, in test
    if self.args.save_results: self.ckp.begin_background()
  File "C:\home\EDSR-PyTorch\src\utility.py", line 141, in begin_background
    for p in self.process: p.start()
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'

tabetomo@DESKTOP /cygdrive/c/home/EDSR-PyTorch/src
$ Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\tabetomo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

When I commented out self.ckp.begin_background() and self.ckp.end_background() as mentioned in #105 and set --save_results --n_threads 1, I got


$ ./demo.sh
Making model...
Download the model

Evaluation:
  0%|                                                     | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 26, in main
    while not t.terminate():
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 139, in terminate
    self.test()
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 100, in test
    self.ckp.save_results(d, filename[0], save_list, scale)
  File "C:\home\EDSR-PyTorch\src\utility.py", line 159, in save_results
    self.queue.put(('{}{}.png'.format(filename, p), tensor_cpu))
AttributeError: 'checkpoint' object has no attribute 'queue'

This is not surprising, because if we comment out self.ckp.begin_background(), we never create the queue in the first place.
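
For what it's worth, the root cause is that begin_background defines bg_target as a local function, and the spawn start method used on Windows has to pickle the Process target, which fails for local functions. Below is a minimal sketch of a spawn-safe variant of utility.py's begin_background; the worker loop and the names _bg_target / n_processes are my reconstruction from the tracebacks above, not the repository's exact code.

import imageio
from multiprocessing import Process, Queue

def _bg_target(queue):
    # background worker: save queued images until a None filename arrives
    while True:
        filename, tensor = queue.get()
        if filename is None:
            break
        imageio.imwrite(filename, tensor.numpy())

class checkpoint:
    # ... existing __init__, save_results, end_background, etc. ...

    def begin_background(self):
        self.queue = Queue()
        # a module-level function (unlike a closure) can be pickled under spawn
        self.process = [
            Process(target=_bg_target, args=(self.queue,))
            for _ in range(self.n_processes)
        ]
        for p in self.process:
            p.start()

end_background would then have to put one (None, None) sentinel per worker on the queue before joining the processes.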

If I set --n_threads 0 instead of --n_threads 1 in addition to the above changes, I got

$ ./demo.sh
Making model...
Download the model

Evaluation:
  0%|                                                     | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    main()
  File "main.py", line 26, in main
    while not t.terminate():
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 139, in terminate
    self.test()
  File "C:\home\EDSR-PyTorch\src\trainer.py", line 87, in test
    for lr, hr, filename, _ in tqdm(d, ncols=80):
ValueError: not enough values to unpack (expected 4, got 3)

In summary, I think my merge request is valid, but it seems more modifications would be necessary if we want to save results. So it works if we don't specify --save_results:

$ ./demo.sh
Making model...
Download the model

Evaluation:
100%|#############################################| 5/5 [00:02<00:00,  1.13s/it]
[Set5 x4]       PSNR: 32.288 (Best: 32.288 @epoch 1)
100%|###########################################| 14/14 [00:04<00:00,  3.79it/s]
[Set14 x4]      PSNR: 28.670 (Best: 28.670 @epoch 1)
100%|#########################################| 100/100 [00:20<00:00,  7.02it/s]
[B100 x4]       PSNR: 27.629 (Best: 27.629 @epoch 1)
100%|#########################################| 100/100 [01:14<00:00,  1.39it/s]
[Urban100 x4]   PSNR: 26.188 (Best: 26.188 @epoch 1)
0it [00:00, ?it/s]
[DIV2K x4]      PSNR: nan (Best: nan @epoch 1)
Forward: 104.79s

Saving...
Total: 104.79s

@fafancier

fafancier commented Apr 24, 2019

I also can't save images when using --save_results, which calls the multiprocessing module on Windows 10.
So I copied the image-saving code into the trainer and don't specify --save_results. Then I can save the images, slowly.
Here is my modification (only in trainer.py, in the test() function):

def test(self):
    # requires "import os" and "import imageio" at the top of trainer.py
    # if they are not imported there already
    torch.set_grad_enabled(False)

    epoch = self.optimizer.get_last_epoch() + 1
    self.ckp.write_log('\nEvaluation:')
    self.ckp.add_log(
        torch.zeros(1, len(self.loader_test), len(self.scale))
    )
    self.model.eval()

    timer_test = utility.timer()
    if self.args.save_results: self.ckp.begin_background()
    for idx_data, d in enumerate(self.loader_test):
        for idx_scale, scale in enumerate(self.scale):
            d.dataset.set_scale(idx_scale)
            # make sure the output folder exists; it is normally created by
            # checkpoint only when --save_results is given
            os.makedirs(
                '..\\experiment\\test\\results-{}'.format(d.dataset.name),
                exist_ok=True
            )
            for lr, hr, filename, _ in tqdm(d, ncols=80):
                lr, hr = self.prepare(lr, hr)
                sr = self.model(lr, idx_scale)
                sr = utility.quantize(sr, self.args.rgb_range)

                save_list = [sr]
                self.ckp.log[-1, idx_data, idx_scale] += utility.calc_psnr(
                    sr, hr, scale, self.args.rgb_range, dataset=d
                )
                if self.args.save_gt:
                    save_list.extend([lr, hr])

                if self.args.save_results:
                    self.ckp.save_results(d, filename[0], save_list, scale)

                # added by wfli: save the images directly, without the
                # background processes that fail on Windows
                postfix = ('SR', 'LR', 'HR')
                for v, p in zip(save_list, postfix):
                    normalized = v[0].mul(255 / self.args.rgb_range)
                    tensor_cpu = normalized.byte().permute(1, 2, 0).cpu()
                    imageio.imwrite(
                        '..\\experiment\\test\\results-{}\\{}_x{}_{}.png'.format(
                            d.dataset.name, filename[0], scale, p
                        ),
                        tensor_cpu.numpy()
                    )
                # end of wfli's addition

            self.ckp.log[-1, idx_data, idx_scale] /= len(d)
            best = self.ckp.log.max(0)
            self.ckp.write_log(
                '[{} x{}]\tPSNR: {:.3f} (Best: {:.3f} @epoch {})'.format(
                    d.dataset.name,
                    scale,
                    self.ckp.log[-1, idx_data, idx_scale],
                    best[0][idx_data, idx_scale],
                    best[1][idx_data, idx_scale] + 1
                )
            )

    self.ckp.write_log('Forward: {:.2f}s\n'.format(timer_test.toc()))
    self.ckp.write_log('Saving...')

@sipie800

Same issue on Windows 10 with Python 3.7.8 and torch 1.2.
By the way, using multiprocessing does not seem like a good choice here: testing images eats up GPU memory very quickly, and Python's multiprocessing implementation is not very robust.
Perhaps pushing more of the work onto CUDA would be an option.

@rezraz1

rezraz1 commented Jun 12, 2022

Hi, I tried to run
python main.py --data_test Demo --scale 4 --pre_train download --test_only --save_results
I had a few errors that I was able to fix with the help of #105 and #91, but now I have this problem. What could be the reason for it?

This is what I have done so far to fix the errors:

Error 1: AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'
and
EOFError: Ran out of input
Resolved with #105 (comment):
"It seems there are some conflicts between multiprocessing and your system. Remove this and these lines to disable multiprocessing."

Error 2: AttributeError: 'checkpoint' object has no attribute 'queue'
Resolved by removing --save_results

Error 3: [WinError 1455] The paging file is too small for this operation to complete.
Resolved by using --n_threads 0

Also, my system settings:

windows 8.1
python 3.8.5 base conda
pytorch 1.8.2+cuda10.2

The result I get now is this:

 Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

E:\EDSR-PyTorch-master\EDSR-PyTorch-master\src>python main.py --data_test Demo --scale 4 --pre_train download --test_only --n_threads 0
Making model...
Download the model

Evaluation:
  0%|                                                     | 0/1 [00:00<?, ?it/s]
100%|█████████████████████████████████████████████| 1/1 [00:02<00:00,  2.01s/it]
100%|█████████████████████████████████████████████| 1/1 [00:02<00:00,  2.01s/it]

[Demo x4]       PSNR: 0.000 (Best: 0.000 @epoch 1)
Forward: 2.01s

Saving...
Total: 2.01s

Question:

Why is the PSNR always zero even if I put a few images in the folder, and why is it nan if I use the benchmark data?
What is the problem and what should I do to fix it?

Thank you for your help

@renxiaosa00

Why is the PSNR always zero even if I put a few images in the folder, and why is it nan if I use the benchmark data?
What is the problem and what should I do to fix it?

I have the same question. Why is the PSNR always zero? Training command: --model EDSR --scale 2 --patch_size 96 --save edsr_baseline_x2 --reset
