Reset error in deadly corridor: screen buffer #543

Closed
MetallicaSPA opened this issue May 13, 2023 · 7 comments

Comments

@MetallicaSPA

Hello, I'm following this tutorial: https://www.youtube.com/watch?v=eBCU-tqLGfQ. I'm using Stable Baselines and wrapping the environment for Gymnasium. At random points during training I get the error 'NoneType' object has no attribute 'screen_buffer', which points to my reset function:

def reset(self):
    self.game.new_episode()
    state = self.game.get_state().screen_buffer
    info = 0
    info = {"info":info}

    return self.grayscale(state), info

I expected this to reset the game. I tried with basic.wad and defend_the_center.wad and the error did not occur there. What could be the issue? Any ideas? Thanks in advance.

@mwydmuch
Collaborator

Hi @MetallicaSPA! I may need some help to fully understand what is happening. If you mean that get_state() returns None from time to time, then this is expected. In the original ViZDoom API, get_state() returns None once the episode ends or reaches a terminal state. So you should always check whether the result is None, or check self.game.is_episode_finished() first.
If your problem is that self.game.new_episode() doesn't reset your episode then this is unexpected, but I would need a code sample to run to see what is happening.

Also, we now provide official wrappers for Gym and Gymnasium, so you don't need to implement them yourself! Check https://github.com/Farama-Foundation/ViZDoom/tree/master/examples/python directory for Gym, Gymnasium and StableBaselines examples.
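
The None check described in this reply could be sketched roughly as follows. This is only an illustration, not code from the thread: `FakeGame` and `FakeState` are hypothetical stand-ins that mimic the three `DoomGame` calls used here so the snippet runs without ViZDoom installed, and the retry limit is an arbitrary assumption.

```python
class FakeState:
    """Hypothetical stand-in for a game state; only screen_buffer is modelled."""

    def __init__(self, screen_buffer):
        self.screen_buffer = screen_buffer


class FakeGame:
    """Hypothetical stub mimicking new_episode/get_state/is_episode_finished.

    The first `dead_starts` episodes are already finished when they begin,
    which imitates an agent that dies before the episode actually starts.
    """

    def __init__(self, dead_starts=0):
        self.dead_starts = dead_starts
        self.episodes = 0

    def new_episode(self):
        self.episodes += 1

    def is_episode_finished(self):
        return self.episodes <= self.dead_starts

    def get_state(self):
        # Like the behaviour described above: None once the episode is over.
        if self.is_episode_finished():
            return None
        return FakeState(screen_buffer=[[0, 0], [0, 0]])


def first_screen_buffer(game, max_retries=5):
    """Start a new episode and return its first screen buffer.

    Retries if the fresh episode has no state, and raises a descriptive
    error instead of failing with an AttributeError on None.
    """
    for _ in range(max_retries):
        game.new_episode()
        state = game.get_state()  # read once; may be None
        if state is not None:
            return state.screen_buffer
    raise RuntimeError(f"no valid state after {max_retries} new_episode() calls")


game = FakeGame(dead_starts=2)
frame = first_screen_buffer(game)
print(game.episodes)  # 3: two dead starts, then a usable episode
```

In a real wrapper, `game` would be the `vzd.DoomGame` instance and the returned buffer would be passed on to the grayscale step. Retrying only hides the symptom, so it is still worth finding out why the episode ends immediately.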

@MetallicaSPA
Author

MetallicaSPA commented May 13, 2023

If your problem is that self.game.new_episode() doesn't reset your episode then this is unexpected, but I would need a code sample to run to see what is happening.

That seems to be what happens. I tried several times and it occurs at a different step each time, so it looks random to me.
Here's the full code:

import vizdoom as vzd
import numpy as np
import cv2
import os 

from vizdoom import *
from gymnasium import Env
from gymnasium.spaces import Discrete, Box
from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
from stable_baselines3 import A2C

DEFAULT_CONFIG = "/home/joaquin/TFM/Doom_RL/scenarios/deadly_corridor.cfg"
SCENARIO_PATH = '/home/joaquin/TFM/Doom_RL/scenarios_official/deadly_corridor.wad'
CHECKPOINT_DIR = './train/train_deadly_corridor'
LOG_DIR = './logs/log_deadly_corridor'

render = False # True shows the window while training; False hides it and makes training faster

class VizDoomGym(Env): 
    # Function that is called when we start the env
    def __init__(self, render=render): 
        # Inherit from Env
        super().__init__()
        # Setup the game 
        self.game = vzd.DoomGame()
        self.game.load_config(DEFAULT_CONFIG)
        self.game.set_doom_scenario_path(SCENARIO_PATH)
        
        self.game.set_doom_game_path("/home/joaquin/TFM/Doom_RL/DOOM2.WAD")
        self.game.set_render_hud(False)
        self.game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)
        # self.game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
        # Set cv2 friendly format.
        # self.game.set_screen_format(vzd.ScreenFormat.BGR24)
        
        # Enables labeling of the in game objects.
        self.game.set_labels_buffer_enabled(True)
        # Enables depth buffer (turned off by default).
        self.game.set_depth_buffer_enabled(True)
        
        # Render frame logic
        if render == False: 
            self.game.set_window_visible(False)
        else:
            self.game.set_window_visible(True)
        
        self.game.clear_available_game_variables()
        self.game.set_available_game_variables([
                                          vzd.GameVariable.AMMO0,
                                          vzd.GameVariable.AMMO1,
                                          vzd.GameVariable.AMMO2,
                                          vzd.GameVariable.AMMO3,
                                          vzd.GameVariable.AMMO4,
                                          vzd.GameVariable.AMMO5,
                                          vzd.GameVariable.AMMO6,
                                          vzd.GameVariable.AMMO7,
                                          vzd.GameVariable.AMMO8,
                                          vzd.GameVariable.AMMO9,
                                          vzd.GameVariable.ARMOR,
                                          vzd.GameVariable.HEALTH,
                                          vzd.GameVariable.POSITION_X,
                                          vzd.GameVariable.POSITION_Y,
                                          vzd.GameVariable.POSITION_Z,
                                          vzd.GameVariable.SELECTED_WEAPON,
                                          vzd.GameVariable.SELECTED_WEAPON_AMMO,
                                          vzd.GameVariable.WEAPON0,
                                          vzd.GameVariable.WEAPON1,
                                          vzd.GameVariable.WEAPON2,
                                          vzd.GameVariable.WEAPON3,
                                          vzd.GameVariable.WEAPON4,
                                          vzd.GameVariable.WEAPON5,
                                          vzd.GameVariable.WEAPON6,
                                          vzd.GameVariable.WEAPON7,
                                          vzd.GameVariable.WEAPON8,
                                          vzd.GameVariable.WEAPON9,
                                          vzd.GameVariable.DAMAGE_TAKEN,
                                          vzd.GameVariable.HITCOUNT
                                          ])
        
        # Start the game 
        self.game.init()
        
        # Get game variables:
        self.damage_taken = 0
        self.hitcount = 0
        self.ammo = 52
    
        # Create the action space and observation space
        self.observation_space = Box(low=0, high=255, shape=(160,120,1), dtype=np.uint8)
        self.action_space = Discrete(14)
        
    # This is how we take a step in the environment
    def step(self, action):
        # Specify action and take step 
        actions = np.identity(14)
        action_reward = self.game.make_action(actions[action], 4) 
        
        # Get all the other stuff we need to return 
        if self.game.get_state(): 
            state = self.game.get_state().screen_buffer
            state = self.grayscale(state)
            
            ammo0 = self.game.get_state().game_variables[0]
            ammo1 = self.game.get_state().game_variables[1]
            ammo2= self.game.get_state().game_variables[2]
            ammo3 = self.game.get_state().game_variables[3]
            ammo4 = self.game.get_state().game_variables[4]
            ammo5 = self.game.get_state().game_variables[5]
            ammo6 = self.game.get_state().game_variables[6]
            ammo7 = self.game.get_state().game_variables[7]
            ammo8 = self.game.get_state().game_variables[8]
            ammo9 = self.game.get_state().game_variables[9]
            armor = self.game.get_state().game_variables[10]
            health = self.game.get_state().game_variables[11] 
            pos_x = self.game.get_state().game_variables[12]
            pos_y = self.game.get_state().game_variables[13]
            pos_z = self.game.get_state().game_variables[14]
            selected_weapon = self.game.get_state().game_variables[15] 
            selected_weapon_ammo = self.game.get_state().game_variables[16] 
            weapon0 = self.game.get_state().game_variables[17]
            weapon1 = self.game.get_state().game_variables[18]
            weapon2 = self.game.get_state().game_variables[19]
            weapon3 = self.game.get_state().game_variables[20]
            weapon4 = self.game.get_state().game_variables[21]
            weapon5 =self.game.get_state().game_variables[22]
            weapon6 = self.game.get_state().game_variables[23]
            weapon7 = self.game.get_state().game_variables[24]
            weapon8 = self.game.get_state().game_variables[25]
            weapon9 = self.game.get_state().game_variables[26]
            damage_taken = self.game.get_state().game_variables[27]
            hitcount = self.game.get_state().game_variables[28]
            
            info = {"ammo0":ammo0, "ammo1":ammo1, "ammo2":ammo2, "ammo3":ammo3,
                    "ammo4":ammo4,"ammo5":ammo5,"ammo6":ammo6,"ammo7":ammo7, "ammo8":ammo8,
                    "ammo9":ammo9, "armor":armor, "health":health, "pos_x":pos_x, 
                    "pos_y":pos_y, "pos_z":pos_z, "selected_weapon":selected_weapon, 
                    "selected_weapon_ammo":selected_weapon_ammo, "weapon0":weapon0,
                    "weapon1":weapon1,"weapon2":weapon2,"weapon3":weapon3,
                    "weapon4":weapon4,"weapon5":weapon5,"weapon6":weapon6,
                    "weapon7":weapon7,"weapon8":weapon8,"weapon9":weapon9, 
                    'damage_taken':damage_taken, 'hitcount':hitcount}
            
            # Calculate rewards:
            total_damage_taken = -damage_taken + self.damage_taken
            self.damage_taken = total_damage_taken
            total_hitcount = hitcount - self.hitcount
            total_ammo = ammo0 + ammo1 + ammo2 + ammo3 + ammo4 + ammo5 + ammo6 + ammo7 + ammo8 + ammo9 - self.ammo
            self.ammo = total_ammo
            
            reward = action_reward + total_damage_taken*10 + total_hitcount*200 + total_ammo*5
            
            truncated = False
        else: 
            state = np.zeros(self.observation_space.shape)
            info = 0
            reward = 0
            truncated = True
        
        info = {"info":info}
        done = self.game.is_episode_finished()
        
        return state, reward, done, truncated, info 
    
    # Define how to render the game or environment 
    def render(self): 
        pass
    
    # What happens when we start a new game 
    def reset(self):
        self.game.new_episode()
        state = self.game.get_state().screen_buffer
        info = 0
        info = {"info":info}

        return self.grayscale(state), info
    
    
    # Grayscale the game frame and resize it 
    def grayscale(self, observation):
        gray = cv2.cvtColor(np.moveaxis(observation, 0, -1), cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (160,120), interpolation=cv2.INTER_CUBIC)
        state = np.reshape(resize, (160,120,1))
        return state
    

    
    # Call to close down the game
    def close(self): 
        self.game.close()
        
# ENVIRONMENT CHECK:        
# env = VizDoomGym(render=True)

# state = env.reset()

# env_checker.check_env(env)

# TRAIN MODEL

env = VizDoomGym()

checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, 
                                         save_replay_buffer=True, save_vecnormalize=True)
eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, 
                             eval_freq=50000, deterministic=False, render=True, verbose=1)

callback = CallbackList([checkpoint_callback, eval_callback])

model = A2C('CnnPolicy', env, tensorboard_log=LOG_DIR, verbose=1, learning_rate=0.0001, n_steps=8192)
# model = A2C.load('/home/joaquin/TFM/Doom_RL/train/train_basic/best_model_1800000', env)
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)
model.save('vizdoom_A2C')
env.close()

@mwydmuch
Collaborator

How often does it happen? I'm running your code using Stable-Baselines3 2.0.0a5 alpha (one with Gymnasium support), installed in the following way:

pip install "sb3_contrib>=2.0.0a1" --upgrade
pip install "stable_baselines3>=2.0.0a1" --upgrade

and I don't see any problem with the reset method after 200k timesteps. I'm afraid I will need more details to help you. Details about your environment, and detailed instructions on how to reproduce the problem (and how it occurs).

@MetallicaSPA
Author

MetallicaSPA commented May 14, 2023

How often does it happen?

It happens every time I run that environment, usually before 50k steps. It never happened with basic or defend the center.
Info about my environment:

I'm running everything on Linux Mint 21.1 Vera, under Anaconda, using the Spyder IDE.
ViZDoom version: 1.2.0
Gymnasium version: 0.26.3
Stable-Baselines3 version: 2.0.0a5

Let me know if you need any more information about my environment.

EDIT: Updated Gymnasium to 0.28.1, still getting the same problem.
Here's the traceback:

File ~/anaconda3/lib/python3.9/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/TFM/Doom_RL/vizdoom_A2C.py:248
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/a2c/a2c.py:194 in learn
return super().learn(

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py:259 in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py:178 in collect_rollouts
new_obs, rewards, dones, infos = env.step(clipped_actions)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:171 in step
return self.step_wait()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/vec_transpose.py:95 in step_wait
observations, rewards, dones, infos = self.venv.step_wait()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:69 in step_wait
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/monitor.py:83 in reset
return self.env.reset(**kwargs)

File ~/TFM/Doom_RL/vizdoom_A2C.py:208 in reset
state = self.game.get_state().screen_buffer

AttributeError: 'NoneType' object has no attribute 'screen_buffer'

@mwydmuch
Collaborator

@MetallicaSPA, I replicated your environment and ran a slightly modified script (attached below). I only changed the paths to the config/log/model files. After 3 million timesteps there was no error. I checked both the deathmatch and deadly corridor environments.

So at the moment, I think the reason might be that your .cfg or .wad files were somehow modified and, for example, now allow the agent to be killed before the episode starts. This is, for example, possible if the episode's start_time in the config is set to a large number. If you are sure that your .cfg/.wad files were not modified, then I will need to ask you to prepare a docker file that I can run to replicate the problem.

import vizdoom as vzd
import numpy as np
import cv2
import os 

from vizdoom import *
from gymnasium import Env
from gymnasium.spaces import Discrete, Box
from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
from stable_baselines3 import A2C

SCENARIO = "deadly_corridor"
DEFAULT_CONFIG = os.path.join(scenarios_path, f"{SCENARIO}.cfg")
CHECKPOINT_DIR = f'./vizdoom_train/train_{SCENARIO}'
LOG_DIR = f'./vizdoom_logs/log_{SCENARIO}'

render = False # True shows the window while training; False hides it and makes training faster

class VizDoomGym(Env): 
    # Function that is called when we start the env
    def __init__(self, render=render): 
        # Inherit from Env
        super().__init__()
        # Setup the game
        self.game = vzd.DoomGame()
        self.game.load_config(DEFAULT_CONFIG)
        
        self.game.set_doom_game_path("doom2.wad")
        self.game.set_render_hud(False)
        #self.game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)
        self.game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
        # Set cv2 friendly format.
        # self.game.set_screen_format(vzd.ScreenFormat.BGR24)
        
        # Enables labeling of the in game objects.
        self.game.set_labels_buffer_enabled(True)
        # Enables depth buffer (turned off by default).
        self.game.set_depth_buffer_enabled(True)
        
        # Render frame logic
        if render == False: 
            self.game.set_window_visible(False)
        else:
            self.game.set_window_visible(True)
        
        self.game.clear_available_game_variables()
        self.game.set_available_game_variables([
                                          vzd.GameVariable.AMMO0,
                                          vzd.GameVariable.AMMO1,
                                          vzd.GameVariable.AMMO2,
                                          vzd.GameVariable.AMMO3,
                                          vzd.GameVariable.AMMO4,
                                          vzd.GameVariable.AMMO5,
                                          vzd.GameVariable.AMMO6,
                                          vzd.GameVariable.AMMO7,
                                          vzd.GameVariable.AMMO8,
                                          vzd.GameVariable.AMMO9,
                                          vzd.GameVariable.ARMOR,
                                          vzd.GameVariable.HEALTH,
                                          vzd.GameVariable.POSITION_X,
                                          vzd.GameVariable.POSITION_Y,
                                          vzd.GameVariable.POSITION_Z,
                                          vzd.GameVariable.SELECTED_WEAPON,
                                          vzd.GameVariable.SELECTED_WEAPON_AMMO,
                                          vzd.GameVariable.WEAPON0,
                                          vzd.GameVariable.WEAPON1,
                                          vzd.GameVariable.WEAPON2,
                                          vzd.GameVariable.WEAPON3,
                                          vzd.GameVariable.WEAPON4,
                                          vzd.GameVariable.WEAPON5,
                                          vzd.GameVariable.WEAPON6,
                                          vzd.GameVariable.WEAPON7,
                                          vzd.GameVariable.WEAPON8,
                                          vzd.GameVariable.WEAPON9,
                                          vzd.GameVariable.DAMAGE_TAKEN,
                                          vzd.GameVariable.HITCOUNT
                                          ])
        
        # Start the game 
        self.game.init()
        
        # Get game variables:
        self.damage_taken = 0
        self.hitcount = 0
        self.ammo = 52
    
        # Create the action space and observation space
        self.observation_space = Box(low=0, high=255, shape=(160,120,1), dtype=np.uint8)
        self.action_space = Discrete(14)
        
    # This is how we take a step in the environment
    def step(self, action):
        # Specify action and take step 
        actions = np.identity(14)
        action_reward = self.game.make_action(actions[action], 4) 
        
        # Get all the other stuff we need to return 
        if self.game.get_state(): 
            state = self.game.get_state().screen_buffer
            state = self.grayscale(state)
            
            ammo0 = self.game.get_state().game_variables[0]
            ammo1 = self.game.get_state().game_variables[1]
            ammo2 = self.game.get_state().game_variables[2]
            ammo3 = self.game.get_state().game_variables[3]
            ammo4 = self.game.get_state().game_variables[4]
            ammo5 = self.game.get_state().game_variables[5]
            ammo6 = self.game.get_state().game_variables[6]
            ammo7 = self.game.get_state().game_variables[7]
            ammo8 = self.game.get_state().game_variables[8]
            ammo9 = self.game.get_state().game_variables[9]
            armor = self.game.get_state().game_variables[10]
            health = self.game.get_state().game_variables[11] 
            pos_x = self.game.get_state().game_variables[12]
            pos_y = self.game.get_state().game_variables[13]
            pos_z = self.game.get_state().game_variables[14]
            selected_weapon = self.game.get_state().game_variables[15] 
            selected_weapon_ammo = self.game.get_state().game_variables[16] 
            weapon0 = self.game.get_state().game_variables[17]
            weapon1 = self.game.get_state().game_variables[18]
            weapon2 = self.game.get_state().game_variables[19]
            weapon3 = self.game.get_state().game_variables[20]
            weapon4 = self.game.get_state().game_variables[21]
            weapon5 = self.game.get_state().game_variables[22]
            weapon6 = self.game.get_state().game_variables[23]
            weapon7 = self.game.get_state().game_variables[24]
            weapon8 = self.game.get_state().game_variables[25]
            weapon9 = self.game.get_state().game_variables[26]
            damage_taken = self.game.get_state().game_variables[27]
            hitcount = self.game.get_state().game_variables[28]
            
            info = {"ammo0":ammo0, "ammo1":ammo1, "ammo2":ammo2, "ammo3":ammo3,
                    "ammo4":ammo4,"ammo5":ammo5,"ammo6":ammo6,"ammo7":ammo7, "ammo8":ammo8,
                    "ammo9":ammo9, "armor":armor, "health":health, "pos_x":pos_x, 
                    "pos_y":pos_y, "pos_z":pos_z, "selected_weapon":selected_weapon, 
                    "selected_weapon_ammo":selected_weapon_ammo, "weapon0":weapon0,
                    "weapon1":weapon1,"weapon2":weapon2,"weapon3":weapon3,
                    "weapon4":weapon4,"weapon5":weapon5,"weapon6":weapon6,
                    "weapon7":weapon7,"weapon8":weapon8,"weapon9":weapon9, 
                    'damage_taken':damage_taken, 'hitcount':hitcount}
            
            # Calculate rewards:
            total_damage_taken = -damage_taken + self.damage_taken
            self.damage_taken = total_damage_taken
            total_hitcount = hitcount - self.hitcount
            total_ammo = ammo0 + ammo1 + ammo2 + ammo3 + ammo4 + ammo5 + ammo6 + ammo7 + ammo8 + ammo9 - self.ammo
            self.ammo = total_ammo
            
            reward = action_reward + total_damage_taken*10 + total_hitcount*200 + total_ammo*5
            
            truncated = False
        else: 
            state = np.zeros(self.observation_space.shape)
            info = 0
            reward = 0
            truncated = True
        
        info = {"info":info}
        done = self.game.is_episode_finished()
        
        return state, reward, done, truncated, info 
    
    # Define how to render the game or environment 
    def render(self): 
        pass
    
    # What happens when we start a new game 
    def reset(self):
        self.game.new_episode()
        state = self.game.get_state().screen_buffer
        info = 0
        info = {"info":info}
        #print("Reseting!")

        return self.grayscale(state), info
    
    
    # Grayscale the game frame and resize it 
    def grayscale(self, observation):
        gray = cv2.cvtColor(np.moveaxis(observation, 0, -1), cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (160,120), interpolation=cv2.INTER_CUBIC)
        state = np.reshape(resize, (160,120,1))
        return state
    

    
    # Call to close down the game
    def close(self): 
        self.game.close()
        
# ENVIRONMENT CHECK:        
# env = VizDoomGym(render=True)

# state = env.reset()

# env_checker.check_env(env)

# TRAIN MODEL

env = VizDoomGym(render=True)

checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, 
                                         save_replay_buffer=True, save_vecnormalize=True)
eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, 
                             eval_freq=50000, deterministic=False, render=True, verbose=1)

callback = CallbackList([checkpoint_callback, eval_callback])

model = A2C('CnnPolicy', env, verbose=1, learning_rate=0.0001, n_steps=8192)
# model = A2C.load('/home/joaquin/TFM/Doom_RL/train/train_basic/best_model_1800000', env)
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)
model.save('vizdoom_A2C')
env.close()

@MetallicaSPA
Author

So at the moment, I think the reason might be that your .cfg or .wad files were somehow modified and, for example, now allow the agent to be killed before the episode starts

Thanks for this! I modified my cfg file and set the episode start time to 1. After 100k steps it was still running smoothly.
It seems that, for some reason, you can get killed sooner in this scenario than in the others.
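
A quick way to spot this kind of config problem before training is to read the start time out of the .cfg text. The sketch below is only an illustration under stated assumptions: it assumes the setting appears as a plain `episode_start_time = N` line with `#` comments, the default of 1 is an assumption about ViZDoom's behaviour when the key is absent, and the sample config text (including the value 140) is made up for the example.

```python
def read_episode_start_time(cfg_text, default=1):
    """Return the episode_start_time found in ViZDoom-style .cfg text.

    Assumes `key = value` lines with `#` starting a comment. The default
    is an assumption about what applies when the key is missing.
    """
    value = default
    for raw_line in cfg_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip().lower() == "episode_start_time":
            value = int(val.strip())
    return value


# Hypothetical config text, not the real deadly_corridor.cfg.
sample_cfg = """
doom_scenario_path = deadly_corridor.wad
episode_start_time = 140   # hypothetical modified value
episode_timeout = 2100
"""

start = read_episode_start_time(sample_cfg)
print(start)  # 140
```

A large value here means many game tics pass before the agent gets control, which matches the explanation above that the agent can be killed before the episode starts. The same setting can also be changed from Python with `game.set_episode_start_time(1)` before `game.init()`.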

@mwydmuch
Collaborator

Happy that we've figured this out! :)
