Error in evaluating a model #18

GaoFangshu · 2017-07-15T05:45:00Z

When I run

eval_only=1 game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model python3 run.py --batchsize 128 --fs_opponent 20 --latest_start 500 --latest_start_decay 0.99 --num_games 1024 --opponent_type AI_SIMPLE --stats winrate --num_eval 10000 --tqdm

I get two errors:

Traceback (most recent call last):
  File "run.py", line 154, in <module>
    evaluator = Eval()
  File "run.py", line 65, in __init__
    ("num_eval", 500)
TypeError: __init__() got an unexpected keyword argument 'define_params'
Exception ignored in: <bound method Eval.__del__ of <__main__.Eval object at 0x7fde053ee160>>
Traceback (most recent call last):
  File "run.py", line 132, in __del__
    self.GC.Stop()
AttributeError: 'Eval' object has no attribute 'GC'

For the first error, TypeError: __init__() got an unexpected keyword argument 'define_params', I changed

class Eval:
    def __init__(self):
        self.args = ArgsProvider(
            call_from = self,
            define_params = [
                ("stats", dict(type=str, choices=["rewards", "winrate"], default="rewards")),
                ("num_eval", 500)
            ]
        )

at line 59 of run.py to

class Eval:
    def __init__(self):
        self.args = ArgsProvider(
            call_from = self,
            define_args = [   # Changed `define_params` to `define_args`
                ("stats", dict(type=str, choices=["rewards", "winrate"], default="rewards")),
                ("num_eval", 500)
            ]
        )

and it works. But I don't know how to debug the second error, AttributeError: 'Eval' object has no attribute 'GC'.
I did have some .bin files (e.g. save-9059.bin) in the folder. Is there anything that I missed?

BTW, how can I test and visualize my model from a .bin file in HTML?
Thanks!
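
On the second error: the first traceback shows `__init__` raising at line 65 of run.py, and Python still calls `__del__` on the partially constructed object. Since `self.GC` had not been assigned at that point, the AttributeError is most likely a follow-on to the TypeError rather than a separate bug. A minimal sketch of a destructor that tolerates this, assuming a stand-in `FakeGameContext` class (hypothetical, not ELF's real game context) and a simplified `Eval` with an illustrative `fail` flag:

```python
class FakeGameContext:
    """Hypothetical stand-in for the object stored in self.GC."""

    def __init__(self):
        self.stopped = False

    def Stop(self):
        self.stopped = True


class Eval:
    def __init__(self, fail=False):
        # Define the attribute before anything that can raise, so that
        # __del__ always finds it.
        self.GC = None
        if fail:
            # Simulates the constructor failing part-way through.
            raise TypeError("simulated constructor failure")
        self.GC = FakeGameContext()

    def __del__(self):
        # getattr with a default also covers objects whose __init__
        # failed before self.GC was assigned at all.
        gc = getattr(self, "GC", None)
        if gc is not None:
            gc.Stop()
```

With this shape, a failing constructor reports only the original exception, without the extra "Exception ignored in ... __del__" message.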

yuandong-tian · 2017-07-15T14:54:43Z

I have fixed the first issue you mentioned.

For the second one, I think you must have trained the model using the codebase before the recent refactoring (that puts __init__.py in both ./elf and ./rlpytorch). Therefore there is an issue in loading (not finding the right module).

You can retrain your model with the current codebase. And I will write a patch to fix it.

yuandong-tian · 2017-07-15T15:05:41Z

@GaoFangshu You are fast :) We will put the code used to test *.bin in html soon.

yuandong-tian · 2017-07-15T15:19:26Z

@GaoFangshu
To address the second issue, first apply this patch:

diff --git a/rlpytorch/rlmethod_base.py b/rlpytorch/rlmethod_base.py
index 2fa0a8d..eb559b7 100644
--- a/rlpytorch/rlmethod_base.py
+++ b/rlpytorch/rlmethod_base.py
@@ -7,8 +7,13 @@

 import abc
 from collections import defaultdict
-from .utils import Stats
-from .args_utils import ArgsProvider
+
+try:
+    from .utils import Stats
+    from .args_utils import ArgsProvider
+except:
+    from utils import Stats
+    from args_utils import ArgsProvider

 class LearningMethod:
     def __init__(self, mi=None, args=None):
diff --git a/rlpytorch/rlmethod_common.py b/rlpytorch/rlmethod_common.py
index f4b36c2..850a6be 100644
--- a/rlpytorch/rlmethod_common.py
+++ b/rlpytorch/rlmethod_common.py
@@ -5,7 +5,11 @@
 # LICENSE file in the root directory of this source tree. An additional grant
 # of patent rights can be found in the PATENTS file in the same directory.

-from .rlmethod_base import *
+try:
+    from .rlmethod_base import *
+except:
+    from rlmethod_base import *
+
 import torch
 import torch.nn as nn
 from torch.autograd import Variable

and then when you load the model, put

sys.path.insert(0, os.path.join("./rlpytorch"))
sys.path.insert(0, os.path.join("./rts/game_MC/"))

before torch.load.
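
Why the `sys.path` lines help can be illustrated without PyTorch. `torch.load` relies on Python's pickle machinery for Python objects, and pickle stores classes by module name, so the module that defined the class must be importable under the same name at load time. The sketch below uses plain `pickle` and a throwaway module named `legacy_model_demo` (a hypothetical name used only for illustration, not a file in ELF):

```python
import importlib
import os
import pickle
import sys
import tempfile
import textwrap


def demo_path_dependent_unpickle():
    """Return (failed_without_path, value_loaded_with_path)."""
    with tempfile.TemporaryDirectory() as module_dir:
        # Hypothetical stand-in for a model definition file.
        with open(os.path.join(module_dir, "legacy_model_demo.py"), "w") as f:
            f.write(textwrap.dedent("""
                class Model:
                    def __init__(self, value):
                        self.value = value
            """))

        # "Training time": the module is importable as a top-level module.
        sys.path.insert(0, module_dir)
        importlib.invalidate_caches()
        module = importlib.import_module("legacy_model_demo")
        payload = pickle.dumps(module.Model(42))

        # "Evaluation time": the top-level module name is no longer reachable.
        sys.path.remove(module_dir)
        del sys.modules["legacy_model_demo"]
        try:
            pickle.loads(payload)
            failed_without_path = False
        except ImportError:
            failed_without_path = True

        # Workaround: put the directory back on sys.path before loading.
        sys.path.insert(0, module_dir)
        importlib.invalidate_caches()
        try:
            loaded = pickle.loads(payload)
        finally:
            sys.path.remove(module_dir)
            sys.modules.pop("legacy_model_demo", None)
        return failed_without_path, loaded.value
```

The same reasoning would apply to a model saved before `__init__.py` was added to `./rlpytorch`: the saved file refers to the old top-level module names, which the two `sys.path.insert` calls make importable again.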

yuandong-tian · 2017-07-15T15:27:20Z

@GaoFangshu wait.. for the second one, do you specify --load [your model.bin]?

GaoFangshu · 2017-07-15T15:28:30Z

@yuandong-tian Yes, I specified --load save-9059.bin

yuandong-tian · 2017-07-15T15:32:17Z

@GaoFangshu maybe what you got is not what I encountered. You can specify a link to your model and I will try loading it myself.

GaoFangshu · 2017-07-15T15:57:48Z

@yuandong-tian OK, here are the files. I just used the default model file, and loaded the .bin:

eval_only=1 game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model python3 run.py --batchsize 128 --fs_opponent 20 --latest_start 500 --latest_start_decay 0.99 --num_games 1024 --opponent_type AI_SIMPLE --stats winrate --num_eval 10000 --tqdm --load save-27992.bin

Then I get:

Namespace(T=6, actor_only=False, ai_type='AI_NN', batchsize=128, discount=0.99, entropy_ratio=0.01, epsilon=0.0, eval=False, eval_freq=10, eval_gpu=1, freq_update=1, fs_ai=50, fs_opponent=20, game_multi=None, gpu=None, grad_clip_norm=None, greedy=False, handicap_level=0, latest_start=500, latest_start_decay=0.99, load='save-27992.bin', max_tick=30000, mcts_threads=64, min_prob=1e-06, num_episode=10000, num_eval=10000, num_games=1024, num_minibatch=5000, opponent_type='AI_SIMPLE', ratio_change=0, record_dir='./record', sample_node='pi', sample_policy='epsilon-greedy', save_dir=None, save_prefix='save', seed=0, simple_ratio=-1, stats='winrate', tqdm=True, verbose_collector=False, verbose_comm=False, wait_per_group=False)
Version:  f25f500a1422b657369d8d8b8c5725d5d74616d7_
Num Actions:  9
Num unittype:  6
Load from save-27992.bin
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=84 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "run.py", line 184, in <module>
    eval_process.run_same_process(mi.clone(all_args.eval_gpu))
  File "/media/gaofangshu/Windows/Users/Fangshu Gao/Desktop/ELF-example/rlpytorch/model_interface.py", line 29, in clone
    mi.models[key] = model.clone(gpu=gpu)
  File "/media/gaofangshu/Windows/Users/Fangshu Gao/Desktop/ELF-example/rlpytorch/model_base.py", line 27, in clone
    model.cuda(gpu)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/_utils.py", line 57, in _cuda
    with torch.cuda.device(device):
  File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/cuda/__init__.py", line 127, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84
Exception ignored in: <bound method Eval.__del__ of <__main__.Eval object at 0x7fb499e791d0>>
Traceback (most recent call last):
  File "run.py", line 132, in __del__
AttributeError: 'Eval' object has no attribute 'GC'
Beginning stop all collectors ...
Stop all game threads ...

Thank you for your help!

yuandong-tian · 2017-07-15T16:39:38Z

@GaoFangshu Oh, I see. You only have one GPU on your machine. By default, the test runs on GPU 1 (eval_gpu=1), so you need to set --eval_gpu 0.

See: https://github.com/facebookresearch/ELF/blob/master/rlpytorch/trainer.py#L224
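
The "invalid device ordinal" error means the requested GPU index does not exist on the machine. A small sketch of a guard, assuming a hypothetical helper `pick_eval_gpu` (not part of ELF) that falls back to device 0 when the requested index is out of range:

```python
def pick_eval_gpu(requested, device_count):
    """Return a usable GPU index, or None if no GPU is available.

    requested:    the index asked for (e.g. the value of --eval_gpu)
    device_count: number of visible CUDA devices
    """
    if device_count <= 0:
        return None          # no GPU at all; the caller must stay on CPU
    if 0 <= requested < device_count:
        return requested     # the requested device exists
    return 0                 # fall back to the first device


# A single-GPU machine asking for GPU 1 falls back to GPU 0.
print(pick_eval_gpu(1, 1))
```

With PyTorch installed, the count would come from `torch.cuda.device_count()`, so a call could look like `pick_eval_gpu(all_args.eval_gpu, torch.cuda.device_count())`.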

GaoFangshu · 2017-07-15T16:47:57Z

@yuandong-tian Thank you very much! It works!
And could you please see Issue #16 ? Will there be any documentation about actions and args in miniRTS?

yuandong-tian · 2017-07-15T17:18:55Z

@GaoFangshu we will release the document regarding this soon. For now you can check the arXiv paper. https://arxiv.org/abs/1707.01067

GaoFangshu · 2017-07-15T23:08:53Z

@yuandong-tian Thanks :D

yuandong-tian closed this as completed Jul 18, 2017

gchlodzinski mentioned this issue Aug 4, 2017

unable to start train #27

Closed

teytaud pushed a commit that referenced this issue Nov 4, 2018

Merge pull request #18 from jma127/master

e3f4072

Temporarily add pytorch nightly build instructions

douglasrizzo mentioned this issue Nov 11, 2019

Minimum working example for the RTS game with an RL lifecycle? #138

Open
