Reset error in deadly corridor: screen buffer #543

MetallicaSPA · 2023-05-13T19:42:52Z

Hello, I'm following this tutorial: https://www.youtube.com/watch?v=eBCU-tqLGfQ. I'm using Stable-Baselines3 and wrapping the environment for Gymnasium. At random points it gives me the error 'NoneType' object has no attribute 'screen_buffer', pointing to my reset function, which is:

def reset(self):
    self.game.new_episode()
    state = self.game.get_state().screen_buffer
    info = 0
    info = {"info":info}

    return self.grayscale(state), info

So the game should be reset. With basic.wad and defend_the_center.wad the error never occurred. What could be the issue? Any ideas? Thanks in advance.

mwydmuch · 2023-05-13T20:17:48Z

Hi @MetallicaSPA! I may need some help to fully understand what is happening. If you mean that from time to time, you get None from get_state(), then this is expected. In the original ViZDoom API get_state() will return None if the episode ends/reaches the terminal state. So you should always check if it's None or use the self.game.is_episode_finished() check.
If your problem is that self.game.new_episode() doesn't reset your episode then this is unexpected, but I would need a code sample to run to see what is happening.

Also, we now provide official wrappers for Gym and Gymnasium, so you don't need to implement them yourself! Check https://github.com/Farama-Foundation/ViZDoom/tree/master/examples/python directory for Gym, Gymnasium and StableBaselines examples.
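
The None check described in this reply can be sketched as follows. This is only an illustration, assuming a game object that exposes new_episode() and get_state() the way the code in this thread uses them. The helper name reset_with_retry and the FakeGame stand-in are hypothetical and exist only so the snippet runs without ViZDoom installed.

```python
def reset_with_retry(game, max_attempts=5):
    """Start a new episode and return its first non-None state.

    Assumption: `game` behaves like vizdoom.DoomGame, i.e. get_state()
    returns None once the episode has already reached a terminal state.
    """
    for _ in range(max_attempts):
        game.new_episode()
        state = game.get_state()
        if state is not None:
            return state
    raise RuntimeError(
        f"get_state() returned None after {max_attempts} new episodes; "
        "the episode may be ending before the first state is available"
    )


class FakeGame:
    """Hypothetical stand-in for DoomGame: the first `dead_starts` episodes are already over."""

    def __init__(self, dead_starts):
        self.dead_starts = dead_starts
        self.episodes = 0

    def new_episode(self):
        self.episodes += 1

    def get_state(self):
        # Mimic a terminal episode by returning None.
        if self.episodes <= self.dead_starts:
            return None
        return {"screen_buffer": "frame"}


game = FakeGame(dead_starts=2)
state = reset_with_retry(game)
print(game.episodes, state["screen_buffer"])
```

Retrying hides the symptom rather than the cause, so the raised error is deliberately explicit: if it fires, the scenario or config is worth inspecting.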

MetallicaSPA · 2023-05-13T20:33:29Z

If your problem is that self.game.new_episode() doesn't reset your episode then this is unexpected, but I would need a code sample to run to see what is happening.

That seems to be what happens: I tried several times and it fails at different steps, so it feels random.
Here's the full code:

import vizdoom as vzd
import numpy as np
import cv2
import os 

from vizdoom import *
from gymnasium import Env
from gymnasium.spaces import Discrete, Box
from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
from stable_baselines3 import A2C

DEFAULT_CONFIG = "/home/joaquin/TFM/Doom_RL/scenarios/deadly_corridor.cfg"
SCENARIO_PATH = '/home/joaquin/TFM/Doom_RL/scenarios_official/deadly_corridor.wad'
CHECKPOINT_DIR = './train/train_deadly_corridor'
LOG_DIR = './logs/log_deadly_corridor'

render = False  # True shows the window while training; False hides it and makes training faster

class VizDoomGym(Env): 
    # Function that is called when we start the env
    def __init__(self, render=render): 
        # Inherit from Env
        super().__init__()
        # Setup the game 
        self.game = vzd.DoomGame()
        self.game.load_config(DEFAULT_CONFIG)
        self.game.set_doom_scenario_path(SCENARIO_PATH)
        
        self.game.set_doom_game_path("/home/joaquin/TFM/Doom_RL/DOOM2.WAD")
        self.game.set_render_hud(False)
        self.game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)
        # self.game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
        # Set cv2 friendly format.
        # self.game.set_screen_format(vzd.ScreenFormat.BGR24)
        
        # Enables labeling of the in game objects.
        self.game.set_labels_buffer_enabled(True)
        # Enables depth buffer (turned off by default).
        self.game.set_depth_buffer_enabled(True)
        
        # Render frame logic
        if render == False: 
            self.game.set_window_visible(False)
        else:
            self.game.set_window_visible(True)
        
        self.game.clear_available_game_variables()
        self.game.set_available_game_variables([
                                          vzd.GameVariable.AMMO0,
                                          vzd.GameVariable.AMMO1,
                                          vzd.GameVariable.AMMO2,
                                          vzd.GameVariable.AMMO3,
                                          vzd.GameVariable.AMMO4,
                                          vzd.GameVariable.AMMO5,
                                          vzd.GameVariable.AMMO6,
                                          vzd.GameVariable.AMMO7,
                                          vzd.GameVariable.AMMO8,
                                          vzd.GameVariable.AMMO9,
                                          vzd.GameVariable.ARMOR,
                                          vzd.GameVariable.HEALTH,
                                          vzd.GameVariable.POSITION_X,
                                          vzd.GameVariable.POSITION_Y,
                                          vzd.GameVariable.POSITION_Z,
                                          vzd.GameVariable.SELECTED_WEAPON,
                                          vzd.GameVariable.SELECTED_WEAPON_AMMO,
                                          vzd.GameVariable.WEAPON0,
                                          vzd.GameVariable.WEAPON1,
                                          vzd.GameVariable.WEAPON2,
                                          vzd.GameVariable.WEAPON3,
                                          vzd.GameVariable.WEAPON4,
                                          vzd.GameVariable.WEAPON5,
                                          vzd.GameVariable.WEAPON6,
                                          vzd.GameVariable.WEAPON7,
                                          vzd.GameVariable.WEAPON8,
                                          vzd.GameVariable.WEAPON9,
                                          vzd.GameVariable.DAMAGE_TAKEN,
                                          vzd.GameVariable.HITCOUNT
                                          ])
        
        # Start the game 
        self.game.init()
        
        # Get game variables:
        self.damage_taken = 0
        self.hitcount = 0
        self.ammo = 52
    
        # Create the action space and observation space
        self.observation_space = Box(low=0, high=255, shape=(160,120,1), dtype=np.uint8)
        self.action_space = Discrete(14)
        
    # This is how we take a step in the environment
    def step(self, action):
        # Specify action and take step 
        actions = np.identity(14)
        action_reward = self.game.make_action(actions[action], 4) 
        
        # Get all the other stuff we need to return 
        if self.game.get_state(): 
            state = self.game.get_state().screen_buffer
            state = self.grayscale(state)
            
            ammo0 = self.game.get_state().game_variables[0]
            ammo1 = self.game.get_state().game_variables[1]
            ammo2= self.game.get_state().game_variables[2]
            ammo3 = self.game.get_state().game_variables[3]
            ammo4 = self.game.get_state().game_variables[4]
            ammo5 = self.game.get_state().game_variables[5]
            ammo6 = self.game.get_state().game_variables[6]
            ammo7 = self.game.get_state().game_variables[7]
            ammo8 = self.game.get_state().game_variables[8]
            ammo9 = self.game.get_state().game_variables[9]
            armor = self.game.get_state().game_variables[10]
            health = self.game.get_state().game_variables[11] 
            pos_x = self.game.get_state().game_variables[12]
            pos_y = self.game.get_state().game_variables[13]
            pos_z = self.game.get_state().game_variables[14]
            selected_weapon = self.game.get_state().game_variables[15] 
            selected_weapon_ammo = self.game.get_state().game_variables[16] 
            weapon0 = self.game.get_state().game_variables[17]
            weapon1 = self.game.get_state().game_variables[18]
            weapon2 = self.game.get_state().game_variables[19]
            weapon3 = self.game.get_state().game_variables[20]
            weapon4 = self.game.get_state().game_variables[21]
            weapon5 =self.game.get_state().game_variables[22]
            weapon6 = self.game.get_state().game_variables[23]
            weapon7 = self.game.get_state().game_variables[24]
            weapon8 = self.game.get_state().game_variables[25]
            weapon9 = self.game.get_state().game_variables[26]
            damage_taken = self.game.get_state().game_variables[27]
            hitcount = self.game.get_state().game_variables[28]
            
            info = {"ammo0":ammo0, "ammo1":ammo1, "ammo2":ammo2, "ammo3":ammo3,
                    "ammo4":ammo4,"ammo5":ammo5,"ammo6":ammo6,"ammo7":ammo7, "ammo8":ammo8,
                    "ammo9":ammo9, "armor":armor, "health":health, "pos_x":pos_x, 
                    "pos_y":pos_y, "pos_z":pos_z, "selected_weapon":selected_weapon, 
                    "selected_weapon_ammo":selected_weapon_ammo, "weapon0":weapon0,
                    "weapon1":weapon1,"weapon2":weapon2,"weapon3":weapon3,
                    "weapon4":weapon4,"weapon5":weapon5,"weapon6":weapon6,
                    "weapon7":weapon7,"weapon8":weapon8,"weapon9":weapon9, 
                    'damage_taken':damage_taken, 'hitcount':hitcount}
            
            # Calculate rewards:
            total_damage_taken = -damage_taken + self.damage_taken
            self.damage_taken = total_damage_taken
            total_hitcount = hitcount - self.hitcount
            total_ammo = ammo0 + ammo1 + ammo2 + ammo3 + ammo4 + ammo5 + ammo6 + ammo7 + ammo8 + ammo9 - self.ammo
            self.ammo = total_ammo
            
            reward = action_reward + total_damage_taken*10 + total_hitcount*200 + total_ammo*5
            
            truncated = False
        else: 
            state = np.zeros(self.observation_space.shape)
            info = 0
            reward = 0
            truncated = True
        
        info = {"info":info}
        done = self.game.is_episode_finished()
        
        return state, reward, done, truncated, info 
    
    # Define how to render the game or environment 
    def render(self):
        pass
    
    # What happens when we start a new game 
    def reset(self):
        self.game.new_episode()
        state = self.game.get_state().screen_buffer
        info = 0
        info = {"info":info}

        return self.grayscale(state), info
    
    
    # Grayscale the game frame and resize it 
    def grayscale(self, observation):
        gray = cv2.cvtColor(np.moveaxis(observation, 0, -1), cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (160,120), interpolation=cv2.INTER_CUBIC)
        state = np.reshape(resize, (160,120,1))
        return state
    

    
    # Call to close down the game
    def close(self): 
        self.game.close()
        
# ENVIRONMENT CHECK:
# env = VizDoomGym(render=True)

# state = env.reset()

# env_checker.check_env(env)

# TRAIN MODEL

env = VizDoomGym()

checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, 
                                         save_replay_buffer=True, save_vecnormalize=True)
eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, 
                             eval_freq=50000, deterministic=False, render=True, verbose=1)

callback = CallbackList([checkpoint_callback, eval_callback])

model = A2C('CnnPolicy', env, tensorboard_log=LOG_DIR, verbose=1, learning_rate=0.0001, n_steps=8192)
# model = A2C.load('/home/joaquin/TFM/Doom_RL/train/train_basic/best_model_1800000', env)
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)
model.save('vizdoom_A2C')
env.close()

mwydmuch · 2023-05-13T22:13:00Z

How often does it happen? I'm running your code using Stable-Baselines3 2.0.0a5 alpha (one with Gymnasium support), installed in the following way:

pip install "sb3_contrib>=2.0.0a1" --upgrade
pip install "stable_baselines3>=2.0.0a1" --upgrade

and I don't see any problem with the reset method after 200k timesteps. I'm afraid I will need more details to help you: information about your environment and detailed instructions on how to reproduce the problem (and how it occurs).

MetallicaSPA · 2023-05-14T02:10:57Z

How often does it happen?

It happens every time I run that environment, usually before 50k steps. It never happened with basic or defend the center.
Info about my environment:

I'm running everything in Linux Mint 21.1 Vera, under Anaconda using Spyder IDE.
ViZDoom version: 1.2.0
Gymnasium version: 0.26.3
Stable-Baselines3 version: 2.0.0a5

Let me know if you need any more information about my environment.

EDIT: Updated Gymnasium to 0.28.1, still getting the same problem.
Here's the traceback:

File ~/anaconda3/lib/python3.9/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/TFM/Doom_RL/vizdoom_A2C.py:248
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/a2c/a2c.py:194 in learn
return super().learn(

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py:259 in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py:178 in collect_rollouts
new_obs, rewards, dones, infos = env.step(clipped_actions)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:171 in step
return self.step_wait()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/vec_transpose.py:95 in step_wait
observations, rewards, dones, infos = self.venv.step_wait()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:69 in step_wait
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/monitor.py:83 in reset
return self.env.reset(**kwargs)

File ~/TFM/Doom_RL/vizdoom_A2C.py:208 in reset
state = self.game.get_state().screen_buffer

AttributeError: 'NoneType' object has no attribute 'screen_buffer'

mwydmuch · 2023-05-14T15:28:42Z

@MetallicaSPA, I replicated your environment and ran a slightly modified script (attached below). I only changed the paths to the config/log/model files. After 3 million timesteps, there was no error. I checked the deathmatch and deadly corridor environments.

So at the moment, I think the reason might be that your .cfg or .wad files were somehow modified and, for example, now allow the agent to be killed before the episode starts. This is possible, for instance, if episode_start_time in the config is set to a large number. If you are sure that your .cfg/.wad files were not modified, then I will need to ask you to prepare a Dockerfile that I can run to replicate the problem.
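
One quick way to check the start time mentioned above is to read it straight from the config text. The sketch below assumes the .cfg uses plain "key = value" lines with "#" comments. The function name read_episode_start_time and the sample values are made up for illustration and are not taken from the real deadly_corridor.cfg.

```python
def read_episode_start_time(cfg_text):
    """Return the episode_start_time value from ViZDoom-style config text, or None if absent."""
    value = None
    for raw_line in cfg_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, val = line.partition("=")
        if sep and key.strip().lower() == "episode_start_time":
            value = int(val.strip())
    return value


# Hypothetical excerpt, not the contents of any shipped scenario file.
sample_cfg = """
doom_skill = 5
episode_start_time = 14   # tics skipped before the episode starts
episode_timeout = 2100
"""

start_time = read_episode_start_time(sample_cfg)
print("episode_start_time:", start_time)
```

A large value here means many game tics pass before the agent receives its first state, which leaves room for the agent to die before reset() ever sees a frame.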

import vizdoom as vzd
import numpy as np
import cv2
import os 

from vizdoom import *
from gymnasium import Env
from gymnasium.spaces import Discrete, Box
from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
from stable_baselines3 import A2C

SCENARIO = "deadly_corridor"
DEFAULT_CONFIG = os.path.join(scenarios_path, f"{SCENARIO}.cfg")
CHECKPOINT_DIR = f'./vizdoom_train/train_{SCENARIO}'
LOG_DIR = f'./vizdoom_logs/log_{SCENARIO}'

render = False  # True shows the window while training; False hides it and makes training faster

class VizDoomGym(Env): 
    # Function that is called when we start the env
    def __init__(self, render=render): 
        # Inherit from Env
        super().__init__()
        # Setup the game
        self.game = vzd.DoomGame()
        self.game.load_config(DEFAULT_CONFIG)
        
        self.game.set_doom_game_path("doom2.wad")
        self.game.set_render_hud(False)
        #self.game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)
        self.game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
        # Set cv2 friendly format.
        # self.game.set_screen_format(vzd.ScreenFormat.BGR24)
        
        # Enables labeling of the in game objects.
        self.game.set_labels_buffer_enabled(True)
        # Enables depth buffer (turned off by default).
        self.game.set_depth_buffer_enabled(True)
        
        # Render frame logic
        if render == False: 
            self.game.set_window_visible(False)
        else:
            self.game.set_window_visible(True)
        
        self.game.clear_available_game_variables()
        self.game.set_available_game_variables([
                                          vzd.GameVariable.AMMO0,
                                          vzd.GameVariable.AMMO1,
                                          vzd.GameVariable.AMMO2,
                                          vzd.GameVariable.AMMO3,
                                          vzd.GameVariable.AMMO4,
                                          vzd.GameVariable.AMMO5,
                                          vzd.GameVariable.AMMO6,
                                          vzd.GameVariable.AMMO7,
                                          vzd.GameVariable.AMMO8,
                                          vzd.GameVariable.AMMO9,
                                          vzd.GameVariable.ARMOR,
                                          vzd.GameVariable.HEALTH,
                                          vzd.GameVariable.POSITION_X,
                                          vzd.GameVariable.POSITION_Y,
                                          vzd.GameVariable.POSITION_Z,
                                          vzd.GameVariable.SELECTED_WEAPON,
                                          vzd.GameVariable.SELECTED_WEAPON_AMMO,
                                          vzd.GameVariable.WEAPON0,
                                          vzd.GameVariable.WEAPON1,
                                          vzd.GameVariable.WEAPON2,
                                          vzd.GameVariable.WEAPON3,
                                          vzd.GameVariable.WEAPON4,
                                          vzd.GameVariable.WEAPON5,
                                          vzd.GameVariable.WEAPON6,
                                          vzd.GameVariable.WEAPON7,
                                          vzd.GameVariable.WEAPON8,
                                          vzd.GameVariable.WEAPON9,
                                          vzd.GameVariable.DAMAGE_TAKEN,
                                          vzd.GameVariable.HITCOUNT
                                          ])
        
        # Start the game 
        self.game.init()
        
        # Get game variables:
        self.damage_taken = 0
        self.hitcount = 0
        self.ammo = 52
    
        # Create the action space and observation space
        self.observation_space = Box(low=0, high=255, shape=(160,120,1), dtype=np.uint8)
        self.action_space = Discrete(14)
        
    # This is how we take a step in the environment
    def step(self, action):
        # Specify action and take step 
        actions = np.identity(14)
        action_reward = self.game.make_action(actions[action], 4) 
        
        # Get all the other stuff we need to return 
        if self.game.get_state(): 
            state = self.game.get_state().screen_buffer
            state = self.grayscale(state)
            
            ammo0 = self.game.get_state().game_variables[0]
            ammo1 = self.game.get_state().game_variables[1]
            ammo2 = self.game.get_state().game_variables[2]
            ammo3 = self.game.get_state().game_variables[3]
            ammo4 = self.game.get_state().game_variables[4]
            ammo5 = self.game.get_state().game_variables[5]
            ammo6 = self.game.get_state().game_variables[6]
            ammo7 = self.game.get_state().game_variables[7]
            ammo8 = self.game.get_state().game_variables[8]
            ammo9 = self.game.get_state().game_variables[9]
            armor = self.game.get_state().game_variables[10]
            health = self.game.get_state().game_variables[11] 
            pos_x = self.game.get_state().game_variables[12]
            pos_y = self.game.get_state().game_variables[13]
            pos_z = self.game.get_state().game_variables[14]
            selected_weapon = self.game.get_state().game_variables[15] 
            selected_weapon_ammo = self.game.get_state().game_variables[16] 
            weapon0 = self.game.get_state().game_variables[17]
            weapon1 = self.game.get_state().game_variables[18]
            weapon2 = self.game.get_state().game_variables[19]
            weapon3 = self.game.get_state().game_variables[20]
            weapon4 = self.game.get_state().game_variables[21]
            weapon5 = self.game.get_state().game_variables[22]
            weapon6 = self.game.get_state().game_variables[23]
            weapon7 = self.game.get_state().game_variables[24]
            weapon8 = self.game.get_state().game_variables[25]
            weapon9 = self.game.get_state().game_variables[26]
            damage_taken = self.game.get_state().game_variables[27]
            hitcount = self.game.get_state().game_variables[28]
            
            info = {"ammo0":ammo0, "ammo1":ammo1, "ammo2":ammo2, "ammo3":ammo3,
                    "ammo4":ammo4,"ammo5":ammo5,"ammo6":ammo6,"ammo7":ammo7, "ammo8":ammo8,
                    "ammo9":ammo9, "armor":armor, "health":health, "pos_x":pos_x, 
                    "pos_y":pos_y, "pos_z":pos_z, "selected_weapon":selected_weapon, 
                    "selected_weapon_ammo":selected_weapon_ammo, "weapon0":weapon0,
                    "weapon1":weapon1,"weapon2":weapon2,"weapon3":weapon3,
                    "weapon4":weapon4,"weapon5":weapon5,"weapon6":weapon6,
                    "weapon7":weapon7,"weapon8":weapon8,"weapon9":weapon9, 
                    'damage_taken':damage_taken, 'hitcount':hitcount}
            
            # Calculate rewards:
            total_damage_taken = -damage_taken + self.damage_taken
            self.damage_taken = total_damage_taken
            total_hitcount = hitcount - self.hitcount
            total_ammo = ammo0 + ammo1 + ammo2 + ammo3 + ammo4 + ammo5 + ammo6 + ammo7 + ammo8 + ammo9 - self.ammo
            self.ammo = total_ammo
            
            reward = action_reward + total_damage_taken*10 + total_hitcount*200 + total_ammo*5
            
            truncated = False
        else: 
            state = np.zeros(self.observation_space.shape)
            info = 0
            reward = 0
            truncated = True
        
        info = {"info":info}
        done = self.game.is_episode_finished()
        
        return state, reward, done, truncated, info 
    
    # Define how to render the game or environment 
    def render(self):
        pass
    
    # What happens when we start a new game 
    def reset(self):
        self.game.new_episode()
        state = self.game.get_state().screen_buffer
        info = 0
        info = {"info":info}
        #print("Reseting!")

        return self.grayscale(state), info
    
    
    # Grayscale the game frame and resize it 
    def grayscale(self, observation):
        gray = cv2.cvtColor(np.moveaxis(observation, 0, -1), cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (160,120), interpolation=cv2.INTER_CUBIC)
        state = np.reshape(resize, (160,120,1))
        return state
    

    
    # Call to close down the game
    def close(self): 
        self.game.close()
        
# ENVIRONMENT CHECK:
# env = VizDoomGym(render=True)

# state = env.reset()

# env_checker.check_env(env)

# TRAIN MODEL

env = VizDoomGym(render=True)

checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, 
                                         save_replay_buffer=True, save_vecnormalize=True)
eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, 
                             eval_freq=50000, deterministic=False, render=True, verbose=1)

callback = CallbackList([checkpoint_callback, eval_callback])

model = A2C('CnnPolicy', env, verbose=1, learning_rate=0.0001, n_steps=8192)
# model = A2C.load('/home/joaquin/TFM/Doom_RL/train/train_basic/best_model_1800000', env)
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)
model.save('vizdoom_A2C')
env.close()

MetallicaSPA · 2023-05-15T02:31:44Z

So at the moment, I think the reason might be that your .cfg or .wad files were somehow modified and, for example, now allow the agent to be killed before the episode starts

Thanks for this! I modified my cfg file and set the episode start time to 1. After 100k steps it was still running smoothly.
It seems that, for some reason, you can get killed sooner there than in other scenarios.
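
The fix above was made by editing the .cfg file. The same setting can probably also be applied from Python with DoomGame.set_episode_start_time(), called after load_config() and before init(). The sketch below shows only that call order. The configure_game helper and the RecordingGame stand-in are hypothetical, included so the snippet runs without ViZDoom; with the real library one would pass vzd.DoomGame() instead.

```python
class RecordingGame:
    """Hypothetical stand-in for vizdoom.DoomGame that only records calls."""

    def __init__(self):
        self.calls = []

    def load_config(self, path):
        self.calls.append(("load_config", path))

    def set_episode_start_time(self, tics):
        self.calls.append(("set_episode_start_time", tics))

    def init(self):
        self.calls.append(("init",))


def configure_game(game, config_path, start_time_tics=1):
    """Load a config, then override the episode start time before init()."""
    game.load_config(config_path)
    # Assumption: a setter called after load_config() overrides the .cfg value.
    game.set_episode_start_time(start_time_tics)
    game.init()
    return game


game = configure_game(RecordingGame(), "deadly_corridor.cfg")
print(game.calls)
```

Setting the value in code keeps the shipped scenario files untouched, which makes it easier to rule out a modified .cfg when reproducing a problem like this one.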

mwydmuch · 2023-05-15T10:37:19Z

Happy that we've figured this out! :)

MetallicaSPA closed this as completed May 15, 2023

mwydmuch added question scenarios labels May 15, 2023
