This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Error in evaluating a model #18

Closed
GaoFangshu opened this issue Jul 15, 2017 · 11 comments

@GaoFangshu

GaoFangshu commented Jul 15, 2017

When I run

eval_only=1 game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model python3 run.py --batchsize 128 --fs_opponent 20 --latest_start 500 --latest_start_decay 0.99 --num_games 1024 --opponent_type AI_SIMPLE --stats winrate --num_eval 10000 --tqdm

I get two errors:

Traceback (most recent call last):
  File "run.py", line 154, in <module>
    evaluator = Eval()
  File "run.py", line 65, in __init__
    ("num_eval", 500)
TypeError: __init__() got an unexpected keyword argument 'define_params'
Exception ignored in: <bound method Eval.__del__ of <__main__.Eval object at 0x7fde053ee160>>
Traceback (most recent call last):
  File "run.py", line 132, in __del__
    self.GC.Stop()
AttributeError: 'Eval' object has no attribute 'GC'

For the first error, TypeError: __init__() got an unexpected keyword argument 'define_params', I changed

class Eval:
    def __init__(self):
        self.args = ArgsProvider(
            call_from = self,
            define_params = [
                ("stats", dict(type=str, choices=["rewards", "winrate"], default="rewards")),
                ("num_eval", 500)
            ]
        )

(starting at line 59 of run.py) to

class Eval:
    def __init__(self):
        self.args = ArgsProvider(
            call_from = self,
            define_args = [   # Changed `define_params` to `define_args`
                ("stats", dict(type=str, choices=["rewards", "winrate"], default="rewards")),
                ("num_eval", 500)
            ]
        )

and it works. But I don't know how to debug the second error, AttributeError: 'Eval' object has no attribute 'GC'.
I do have some .bin files (e.g. save-9059.bin) in the folder. Is there anything I missed?

BTW, how can I test and visualize my model from a .bin file in HTML?
Thanks!
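
A note on the second traceback: Python still calls __del__ on an object whose __init__ raised, so if the constructor fails before self.GC is assigned, the destructor raises an AttributeError of its own. That makes the 'GC' message a likely follow-on of the first failure rather than a separate bug. A minimal sketch of a defensive destructor, using hypothetical stand-in classes (FakeGameContext, SafeEval) rather than the actual code in run.py:

```python
class FakeGameContext:
    """Hypothetical stand-in for the game context held in `self.GC`."""

    def __init__(self):
        self.stopped = False

    def Stop(self):
        self.stopped = True


class SafeEval:
    """Illustrative evaluator whose destructor tolerates a failed __init__."""

    def __init__(self, fail=False):
        if fail:
            # Simulates the constructor raising before `self.GC` exists,
            # as the TypeError from ArgsProvider(...) did in the report.
            raise TypeError("simulated constructor failure")
        self.GC = FakeGameContext()

    def __del__(self):
        # getattr with a default avoids AttributeError when __init__ failed early.
        gc = getattr(self, "GC", None)
        if gc is not None:
            gc.Stop()
```

With a guard like this, a constructor failure would surface only its own traceback, without the extra "Exception ignored in ... __del__" noise.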

@yuandong-tian
Contributor

I have fixed the first issue you mentioned.

For the second one, I think you trained the model with the codebase as it was before the recent refactoring (which added __init__.py to both ./elf and ./rlpytorch). As a result, loading fails because the right module cannot be found.

You can retrain your model with the current codebase. And I will write a patch to fix it.

@yuandong-tian
Contributor

@GaoFangshu You are fast :) We will put the code used to test *.bin in html soon.

@yuandong-tian
Contributor

@GaoFangshu
To address the second issue, first apply this patch:

diff --git a/rlpytorch/rlmethod_base.py b/rlpytorch/rlmethod_base.py
index 2fa0a8d..eb559b7 100644
--- a/rlpytorch/rlmethod_base.py
+++ b/rlpytorch/rlmethod_base.py
@@ -7,8 +7,13 @@

 import abc
 from collections import defaultdict
-from .utils import Stats
-from .args_utils import ArgsProvider
+
+try:
+    from .utils import Stats
+    from .args_utils import ArgsProvider
+except:
+    from utils import Stats
+    from args_utils import ArgsProvider

 class LearningMethod:
     def __init__(self, mi=None, args=None):
diff --git a/rlpytorch/rlmethod_common.py b/rlpytorch/rlmethod_common.py
index f4b36c2..850a6be 100644
--- a/rlpytorch/rlmethod_common.py
+++ b/rlpytorch/rlmethod_common.py
@@ -5,7 +5,11 @@
 # LICENSE file in the root directory of this source tree. An additional grant
 # of patent rights can be found in the PATENTS file in the same directory.

-from .rlmethod_base import *
+try:
+    from .rlmethod_base import *
+except:
+    from rlmethod_base import *
+
 import torch
 import torch.nn as nn
 from torch.autograd import Variable

and then when you load the model, put

sys.path.insert(0, os.path.join("./rlpytorch"))
sys.path.insert(0, os.path.join("./rts/game_MC/"))

before torch.load.
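
For context: torch.load unpickles the saved objects, and pickle records each class by its module path, so a checkpoint saved when the modules were importable as top-level names needs those directories on sys.path at load time. A minimal sketch of scoping that change to the load call; legacy_import_paths is a hypothetical helper, not part of ELF:

```python
import contextlib
import sys


@contextlib.contextmanager
def legacy_import_paths(*paths):
    """Temporarily prepend directories to sys.path (hypothetical helper)."""
    saved = list(sys.path)
    try:
        for p in paths:
            sys.path.insert(0, p)
        yield
    finally:
        # Restore the original search path even if loading fails.
        sys.path[:] = saved
```

The torch.load call would then sit inside a block such as "with legacy_import_paths("./rlpytorch", "./rts/game_MC/"):". Modules imported during the load stay cached in sys.modules, so the loaded objects should keep working after the path is restored.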

@yuandong-tian
Contributor

@GaoFangshu Wait, for the second one, did you specify --load [your model.bin]?

@GaoFangshu
Author

@yuandong-tian Yes, I specified --load save-9059.bin

@yuandong-tian
Contributor

@GaoFangshu Maybe what you hit is not the problem I encountered. Could you share a link to your model so I can try loading it myself?

@GaoFangshu
Author

@yuandong-tian OK, here are the files. I just used the default model file and loaded the .bin:

eval_only=1 game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model python3 run.py --batchsize 128 --fs_opponent 20 --latest_start 500 --latest_start_decay 0.99 --num_games 1024 --opponent_type AI_SIMPLE --stats winrate --num_eval 10000 --tqdm --load save-27992.bin

This is the output:

Namespace(T=6, actor_only=False, ai_type='AI_NN', batchsize=128, discount=0.99, entropy_ratio=0.01, epsilon=0.0, eval=False, eval_freq=10, eval_gpu=1, freq_update=1, fs_ai=50, fs_opponent=20, game_multi=None, gpu=None, grad_clip_norm=None, greedy=False, handicap_level=0, latest_start=500, latest_start_decay=0.99, load='save-27992.bin', max_tick=30000, mcts_threads=64, min_prob=1e-06, num_episode=10000, num_eval=10000, num_games=1024, num_minibatch=5000, opponent_type='AI_SIMPLE', ratio_change=0, record_dir='./record', sample_node='pi', sample_policy='epsilon-greedy', save_dir=None, save_prefix='save', seed=0, simple_ratio=-1, stats='winrate', tqdm=True, verbose_collector=False, verbose_comm=False, wait_per_group=False)
Version:  f25f500a1422b657369d8d8b8c5725d5d74616d7_
Num Actions:  9
Num unittype:  6
Load from save-27992.bin
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=84 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "run.py", line 184, in <module>
    eval_process.run_same_process(mi.clone(all_args.eval_gpu))
  File "/media/gaofangshu/Windows/Users/Fangshu Gao/Desktop/ELF-example/rlpytorch/model_interface.py", line 29, in clone
    mi.models[key] = model.clone(gpu=gpu)
  File "/media/gaofangshu/Windows/Users/Fangshu Gao/Desktop/ELF-example/rlpytorch/model_base.py", line 27, in clone
    model.cuda(gpu)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/_utils.py", line 57, in _cuda
    with torch.cuda.device(device):
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/cuda/__init__.py", line 127, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84
Exception ignored in: <bound method Eval.__del__ of <__main__.Eval object at 0x7fb499e791d0>>
Traceback (most recent call last):
  File "run.py", line 132, in __del__
AttributeError: 'Eval' object has no attribute 'GC'
Beginning stop all collectors ...
Stop all game threads ...

Thank you for your help!

@yuandong-tian
Contributor

@GaoFangshu Oh, I see. You only have one GPU on your machine. By default, evaluation runs on GPU 1 (eval_gpu=1 in the Namespace output above), so you need to set --eval_gpu 0.

See: https://github.com/facebookresearch/ELF/blob/master/rlpytorch/trainer.py#L224
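
The "invalid device ordinal" error means the requested CUDA index does not exist on the machine. A small sketch of validating the index before moving a model onto it; resolve_eval_gpu is a hypothetical helper, not something ELF provides, and falling back to device 0 is an assumed policy:

```python
def resolve_eval_gpu(requested, device_count):
    """Return a usable GPU index, falling back to 0 (hypothetical helper)."""
    if device_count <= 0:
        raise RuntimeError("no CUDA devices available")
    if 0 <= requested < device_count:
        return requested
    # The requested ordinal is out of range, e.g. eval_gpu=1 on a one-GPU machine.
    return 0
```

With PyTorch, device_count would come from torch.cuda.device_count(). On a single-GPU machine, resolve_eval_gpu(1, 1) falls back to 0, which matches the --eval_gpu 0 advice.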

@GaoFangshu
Author

@yuandong-tian Thank you very much! It works!
Could you also take a look at Issue #16? Will there be any documentation about the actions and args in miniRTS?

@yuandong-tian
Contributor

@GaoFangshu We will release documentation on this soon. For now, you can check the arXiv paper: https://arxiv.org/abs/1707.01067

@GaoFangshu
Author

@yuandong-tian Thanks :D

teytaud pushed a commit that referenced this issue Nov 4, 2018
Temporarily add pytorch nightly build instructions