Error in evaluating a model #18
I have fixed the first issue you mentioned. For the second one, I think you must have trained the model using the codebase before the recent refactoring (that puts You can retrain your model with the current codebase. And I will write a patch to fix it.
@GaoFangshu You are fast :) We will put the code used to test *.bin in html soon.
@GaoFangshu
and then when you load the model, put
before
@GaoFangshu wait.. for the second one, do you specify
@yuandong-tian Yes, I specified
@GaoFangshu maybe what you got is not what I encountered. You can specify a link to your model and I will try loading it myself.
@yuandong-tian OK, here are the files. I just used the default model file, and loaded the .bin:
Then comes: Namespace(T=6, actor_only=False, ai_type='AI_NN', batchsize=128, discount=0.99, entropy_ratio=0.01, epsilon=0.0, eval=False, eval_freq=10, eval_gpu=1, freq_update=1, fs_ai=50, fs_opponent=20, game_multi=None, gpu=None, grad_clip_norm=None, greedy=False, handicap_level=0, latest_start=500, latest_start_decay=0.99, load='save-27992.bin', max_tick=30000, mcts_threads=64, min_prob=1e-06, num_episode=10000, num_eval=10000, num_games=1024, num_minibatch=5000, opponent_type='AI_SIMPLE', ratio_change=0, record_dir='./record', sample_node='pi', sample_policy='epsilon-greedy', save_dir=None, save_prefix='save', seed=0, simple_ratio=-1, stats='winrate', tqdm=True, verbose_collector=False, verbose_comm=False, wait_per_group=False)
Version: f25f500a1422b657369d8d8b8c5725d5d74616d7_
Num Actions: 9
Num unittype: 6
Load from save-27992.bin
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=84 error=10 : invalid device ordinal
Traceback (most recent call last):
File "run.py", line 184, in <module>
eval_process.run_same_process(mi.clone(all_args.eval_gpu))
File "/media/gaofangshu/Windows/Users/Fangshu Gao/Desktop/ELF-example/rlpytorch/model_interface.py", line 29, in clone
mi.models[key] = model.clone(gpu=gpu)
File "/media/gaofangshu/Windows/Users/Fangshu Gao/Desktop/ELF-example/rlpytorch/model_base.py", line 27, in clone
model.cuda(gpu)
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 147, in cuda
return self._apply(lambda t: t.cuda(device_id))
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 124, in _apply
param.data = fn(param.data)
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
return self._apply(lambda t: t.cuda(device_id))
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/_utils.py", line 57, in _cuda
with torch.cuda.device(device):
File "/media/gaofangshu/Windows/Ubuntu/anaconda3/lib/python3.5/site-packages/torch/cuda/__init__.py", line 127, in __enter__
torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84
Exception ignored in: <bound method Eval.__del__ of <__main__.Eval object at 0x7fb499e791d0>>
Traceback (most recent call last):
File "run.py", line 132, in __del__
AttributeError: 'Eval' object has no attribute 'GC'
Beginning stop all collectors ...
Stop all game threads ...
Thank you for your help!
@GaoFangshu Oh, ic. You only have one GPU on your machine. By default, the test is on gpu=1. So you need to set See: https://github.com/facebookresearch/ELF/blob/master/rlpytorch/trainer.py#L224
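The "invalid device ordinal" error in the traceback means the requested GPU index is not below the number of visible devices. Here is a minimal sketch of a guard you could run before moving a model to a GPU. The helper names `resolve_gpu` and `visible_gpu_count` are hypothetical and not part of ELF; only `torch.cuda.device_count()` is a real PyTorch call.

```python
def visible_gpu_count():
    """Return the number of CUDA devices PyTorch can see (0 if torch is missing)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def resolve_gpu(requested, device_count):
    """Map a requested GPU index to one that exists on this machine.

    Hypothetical helper, not ELF code. Returns None when no GPU should
    be used, the requested index when it is valid, and 0 otherwise.
    """
    if requested is None or device_count <= 0:
        return None  # stay on CPU
    if 0 <= requested < device_count:
        return requested
    return 0  # fall back to the first (and possibly only) device


if __name__ == "__main__":
    # With a single GPU, a request for index 1 resolves to index 0.
    print(resolve_gpu(1, visible_gpu_count()))
```

Falling back silently is a design choice; raising an error that names the valid range may be preferable when the device placement matters.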
@yuandong-tian Thank you very much! It works!
@GaoFangshu we will release the document regarding this soon. For now you can check the arXiv paper. https://arxiv.org/abs/1707.01067
@yuandong-tian Thanks :D
When I run
There come two errors:
For the first error,
TypeError: __init__() got an unexpected keyword argument 'define_params'
I change from line 59 in run.py as and it works. But I don't know how to debug the second error:
AttributeError: 'Eval' object has no attribute 'GC'
I did have some .bin files (e.g. save-9059.bin) in the folder. Is there anything that I missed?
BTW, how can I test and visualize my model from a .bin file in HTML?
Thanks!
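A general Python note on the second error: an AttributeError raised from `__del__` usually means the object is being cleaned up before the attribute was ever assigned, typically because an earlier step failed. That makes it a follow-on symptom and not the root cause. Here is a minimal sketch of a defensive destructor. The `Eval` class below is a hypothetical stand-in, not the code in run.py, and `FakeContext` with its `stop()` method is invented for illustration.

```python
class FakeContext:
    """Hypothetical stand-in for a game context that must be stopped."""

    def __init__(self):
        self.stop_calls = 0

    def stop(self):
        self.stop_calls += 1


class Eval:
    """Hypothetical stand-in for an evaluator; not the ELF implementation."""

    def __init__(self):
        # GC is deliberately not assigned here; it is set later in setup().
        self.stopped = False

    def setup(self, game_context):
        self.GC = game_context

    def __del__(self):
        # getattr with a default avoids a secondary AttributeError when
        # setup() never ran (for example, because an earlier call raised).
        game_context = getattr(self, "GC", None)
        if game_context is not None:
            game_context.stop()
            self.GC = None  # make repeated cleanup a no-op
            self.stopped = True
```

With this pattern, a failure before `setup()` surfaces only the original exception, which is the one worth debugging.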