# Reinforcement learning with NetHack

NetHack is a single-player adventure game, first released in 1987 and inspired by the Dungeons & Dragons roleplaying game. You start the game as a hero in a dungeon, with the goal of descending through 50 levels (all procedurally generated, so unique in every game) to retrieve the Amulet of Yendor and complete quests to offer it to your in-game deity. Death is permanent: if you die, you have to start all over again.

This game has been made available for reinforcement learning as the NetHack Learning Environment (NLE). A NetHack Challenge competition has been launched for NeurIPS 2021, in which teams compete to build the best AI agents to play NetHack.

Website: https://nethackchallenge.com/  
Competition leaderboard: https://www.aicrowd.com/challenges/neurips-2021-the-nethack-challenge/leaderboards  

<br>
<img src="https://github.com/facebookresearch/nle/raw/master/dat/nle/example_run.gif">

Source picture: [facebook github](https://github.com/facebookresearch/nle)






# First play the NetHack game yourself

Immerse yourself in the NetHack challenge by playing the game yourself; it is the best way to get a first insight into the challenge. Although the game interface is "retro" (actually one of the reasons why I like it), the game itself has enormous depth. Please watch this [video](https://www.youtube.com/watch?v=SjuTyJlgLJ8&t=329s); I think it will help you appreciate both the AI challenge around this game and the challenges you face as a player. Have fun!

Link to the game: https://alt.org/nethack/

Quick start:  
You: @ symbol  
Movement: 123456789 (keypad numlock should be on)  
Pick up an object: ,    
Search (hidden doors): s  
Fight: F (followed by a direction)  
Fire (ranged fight): f (followed by a direction)  
Help: h  
What-is: / (lets you move the cursor over the map to get descriptions of the symbols)  


Link to the wiki: https://nethackwiki.com/wiki/

And if you are willing to spend 2.49 EUR, you can buy Vulture for NetHack on Steam. It wraps the original ASCII dungeons of NetHack in a graphical point-and-click interface, while preserving the original keyboard commands for those who wish to play the game as it was intended.


# NetHack Learning Environment (NLE)
NLE was first presented at NeurIPS 2020 (see the [paper](https://arxiv.org/abs/2006.13760)), and an extended version is now used for a challenging competition at NeurIPS 2021. 
+ The GitHub page for NLE: [link](https://github.com/facebookresearch/nle)
+ NeurIPS NetHack starter kit: https://gitlab.aicrowd.com/nethack/neurips-2021-the-nethack-challenge  
+ It also includes a Colab that helped me tremendously to get started: https://www.aicrowd.com/showcase/introtonethack
+ The starter kit also contains a baseline model for the NetHack Challenge based on TorchBeast, with all the code you need to train, run and submit a model: https://gitlab.aicrowd.com/nethack/neurips-2021-the-nethack-challenge/-/tree/master/nethack_baselines/torchbeast 



# Gym  
NLE is loaded as a Gym environment, with all the typical functions that reinforcement learning (RL) researchers will be familiar with. 
Gym is a toolkit for developing and comparing reinforcement learning algorithms, released by OpenAI, the research lab co-founded by Elon Musk (http://gym.openai.com/docs/).
<br>
<img src="https://i.imgur.com/ria9HOm.jpg">

Source: [OpenAI](https://openai.com/)


There are many games made available, among others a whole host of Atari 2600 games.
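
As a minimal sketch of what those "typical functions" look like in practice, the loop below drives an environment through the classic Gym `reset`/`step` cycle. `CountdownEnv` is a hypothetical stand-in (it is not part of Gym or NLE) so that the snippet runs without installing anything; it only mimics the 4-tuple `(obs, reward, done, info)` that the cells in this notebook receive from `env.step`.

```python
class CountdownEnv:
    """Hypothetical toy environment that mimics the classic Gym API."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.steps_left = episode_length

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps_left = self.episode_length
        return {"steps_left": self.steps_left}

    def step(self, action):
        # Apply one action: reward 1.0 when the action is 0, otherwise 0.0.
        self.steps_left -= 1
        reward = 1.0 if action == 0 else 0.0
        done = self.steps_left == 0
        return {"steps_left": self.steps_left}, reward, done, {}


def run_episode(env, policy):
    """Play one episode and return (total_reward, number_of_steps)."""
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
    return total_reward, steps


total, steps = run_episode(CountdownEnv(), policy=lambda obs: 0)
print(total, steps)
```

With NLE installed, `run_episode` should in principle accept the environment created by `gym.make` in place of `CountdownEnv()`, as long as the installed Gym version still returns the 4-tuple used throughout this notebook.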

In [None]:
# @title Setup NLE

%%capture
!pip install -U cmake
!apt update -qq && apt install -qq -y flex bison libbz2-dev libglib2.0 libsm6 libxext6
!pip install -U pip
!pip install git+https://github.com/facebookresearch/nle.git@eric/notebook-render  # this can render notebooks
#!pip install git+https://github.com/facebookresearch/nle.git # latest version

In [None]:
# @title Import all we need
import matplotlib.pyplot as plt
import gym
import nle
import numpy as np
import random 
from nle import nethack as nh
#Google colab tools
from google.colab import files # To handle files and eg export to your browser
import glob # To handle files and eg export to your browser
from google.colab import drive # Mount your Google drive

In [None]:
# Mount Google Drive to import/export files between Colab and your Google Drive

drive.mount('/content/drive')

#Test it
%ls '/content/drive/My Drive/nethack/' 


Mounted at /content/drive
'Kopie van NetHackTutorial.ipynb'  'Nethack RL'   nle.63.0.ttyrec.bz2


In [None]:
# @title make environment and Show a game screen
env = gym.make("NetHackChallenge-v0", savedir=None) 
env.reset()  # each reset generates a new dungeon
env.render('notebook') # show the start game screen
obs, reward, done, info = env.step(0) # move N
env.render('notebook') # show the game screen


Hello Agent, welcome to NetHack!  You are a neutral female human Monk.

## NetHack action space

Here is the link to the code: 
https://github.com/facebookresearch/nle/blob/master/nle/nethack/actions.py

Here is a nice overview in a table format:
https://gist.github.com/HanClinto/310bc189dcb34b9628d5151b168a34b0#actions-env_actions
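
Hard-coded action numbers are easy to get wrong, because the number of an action is its position in the environment's action list, not its key code. A small lookup table from key code to position avoids that. The sketch below uses a hypothetical stand-in list holding the eight key codes that the action listing in this notebook prints for actions 0 to 7; with NLE you would presumably pass `env._actions` instead. `build_key_index` and `action_for_char` are illustrative names, not NLE functions.

```python
def build_key_index(actions):
    """Map each key code in `actions` to its position (the action number)."""
    return {int(key): index for index, key in enumerate(actions)}


def action_for_char(key_index, char):
    """Return the action number that sends `char`, or None if it is unavailable."""
    return key_index.get(ord(char))


# Stand-in for env._actions: key codes of actions 0-7 (k, l, j, h, u, n, b, y).
compass_keys = [107, 108, 106, 104, 117, 110, 98, 121]
key_index = build_key_index(compass_keys)

print(action_for_char(key_index, "n"))  # the key that also answers "no" to a prompt
print(action_for_char(key_index, "b"))  # the key that also selects menu item b
```

The same table explains why a menu choice such as `b` is entered with a movement action: both share one key.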

In [None]:
# Let's investigate the action space and see all the commands possible
print("Action space:", env.action_space)
print()
for i, a in enumerate(env._actions):
    print(f'action {i} is ', a, f' (= keypress {a})')


Action space: Discrete(113)

action 0 is  CompassDirection.N  (= keypress 107)
action 1 is  CompassDirection.E  (= keypress 108)
action 2 is  CompassDirection.S  (= keypress 106)
action 3 is  CompassDirection.W  (= keypress 104)
action 4 is  CompassDirection.NE  (= keypress 117)
action 5 is  CompassDirection.SE  (= keypress 110)
action 6 is  CompassDirection.SW  (= keypress 98)
action 7 is  CompassDirection.NW  (= keypress 121)
action 8 is  CompassDirectionLonger.N  (= keypress 75)
action 9 is  CompassDirectionLonger.E  (= keypress 76)
action 10 is  CompassDirectionLonger.S  (= keypress 74)
action 11 is  CompassDirectionLonger.W  (= keypress 72)
action 12 is  CompassDirectionLonger.NE  (= keypress 85)
action 13 is  CompassDirectionLonger.SE  (= keypress 78)
action 14 is  CompassDirectionLonger.SW  (= keypress 66)
action 15 is  CompassDirectionLonger.NW  (= keypress 89)
action 16 is  MiscDirection.UP  (= keypress 60)
action 17 is  MiscDirection.DOWN  (= keypress 62)
action 18 is  MiscDi

In [None]:
# Check commands that need more than one key press, e.g. a command followed by a direction
obs, reward, done, info = env.step(48) # kick
env.render('notebook') # show the game screen
obs, reward, done, info = env.step(0) # kick direction
env.render('notebook') # show the game screen


In what direction?

In [None]:
# Check a multi-key command that asks for a selection from a list
env.reset()  # each reset generates a new dungeon
obs, reward, done, info = env.step(33) # drop
env.render('notebook') # show the game screen
obs, reward, done, info = env.step(6) # choice b
env.render('notebook') # show the game screen

# Selections normally use the lower-case characters a-z, which are entered
# with the action that shares the same key, e.g.
# a - 24 - apply
# b - 6 - move SW
# c - 30 - close

# Collect the actions whose key is a-z and sort them by character
actionlist = []
for i, a in enumerate(env._actions):
    if 97 <= a <= 122:  # ASCII codes of 'a'..'z'
        actionlist.append([a, i])
actionlist.sort()

print()
print('| Char | Action_nr | Action |')
print('| ---- | --------- | ------ |')
for a, i in actionlist:
    print("|", chr(a), "   |", i, "      | ", a)


What do you want to drop? [a-f or ?*]

In [None]:
# Get rid of the --More-- prompt and continue to the next line
env.reset()  # each reset generates a new dungeon
obs, reward, done, info = env.step(0) # move
print ('Is screen below waiting on space e.g. when --more-- is shown:',obs['misc'][2])
env.render('notebook') # show the game screen
obs, reward, done, info = env.step(44) # inventory
print ('Is screen below waiting on space e.g. when --more-- is shown:',obs['misc'][2])
env.render('notebook') # show the game screen
obs, reward, done, info = env.step(36) # ESC to get rid of the --More-- and get to the next line
print ('Is screen below waiting on space e.g. when --more-- is shown:',obs['misc'][2])
env.render('notebook') # show the game screen

Is screen below waiting on space e.g. when --more-- is shown: 0

## NetHack observation space

In [None]:
# Show the breakdown of the observation space
obs = env.reset()  # each reset generates a new dungeon
print("Observation space:", env.observation_space)

obs.keys()

Observation space: Dict(blstats:Box(-2147483648, 2147483647, (25,), int64), chars:Box(0, 255, (21, 79), uint8), colors:Box(0, 15, (21, 79), uint8), glyphs:Box(0, 5976, (21, 79), int16), inv_glyphs:Box(0, 5976, (55,), int16), inv_letters:Box(0, 127, (55,), uint8), inv_oclasses:Box(0, 18, (55,), uint8), inv_strs:Box(0, 255, (55, 80), uint8), message:Box(0, 255, (256,), uint8), misc:Box(-2147483648, 2147483647, (3,), int32), specials:Box(0, 255, (21, 79), uint8), tty_chars:Box(0, 255, (24, 80), uint8), tty_colors:Box(0, 31, (24, 80), int8), tty_cursor:Box(0, 255, (2,), uint8))


dict_keys(['glyphs', 'chars', 'colors', 'specials', 'blstats', 'message', 'inv_glyphs', 'inv_strs', 'inv_letters', 'inv_oclasses', 'tty_chars', 'tty_colors', 'tty_cursor', 'misc'])

In [None]:
# Show the inventory of the agent ('inv_glyphs', 'inv_strs', 'inv_letters', 'inv_oclasses')
def print_inventory(obs):
  for let, glyph, strs, oclass in zip(
      obs['inv_letters'], obs['inv_glyphs'], obs['inv_strs'], obs['inv_oclasses']):
    
      l = chr(let)
      desc = bytes(strs).decode('utf-8').replace('\0','')
      if let:
          print('In slot (%s) - glyph: %d, (class %d) - "%s"' % (l, glyph, oclass, desc))


#Show the inventory
print_inventory(obs)

In slot ($) - glyph: 2316, (class 12) - "1133 gold pieces"
In slot (a) - glyph: 1928, (class 2) - "a +0 scalpel (weapon in hand)"
In slot (b) - glyph: 2043, (class 3) - "an uncursed +1 pair of leather gloves (being worn)"
In slot (c) - glyph: 2118, (class 6) - "an uncursed stethoscope"
In slot (d) - glyph: 2195, (class 8) - "3 uncursed potions of healing"
In slot (e) - glyph: 2195, (class 8) - "a blessed potion of healing"
In slot (f) - glyph: 2189, (class 8) - "4 uncursed potions of extra healing"
In slot (g) - glyph: 2293, (class 11) - "a wand of sleep (0:5)"
In slot (h) - glyph: 2262, (class 10) - "a blessed spellbook of healing"
In slot (i) - glyph: 2283, (class 10) - "a blessed spellbook of extra healing"
In slot (j) - glyph: 2272, (class 10) - "a blessed spellbook of stone to flesh"
In slot (k) - glyph: 2158, (class 7) - "6 uncursed apples"
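
For an agent it is often handier to have the inventory as a dictionary than as printed text. The sketch below is one possible helper, assuming the same observation keys as `print_inventory`. The names `decode_c_string`, `inventory_to_dict`, `find_item` and `pad` are hypothetical, and the sample arrays are hand-made stand-ins for a real observation.

```python
def decode_c_string(codes):
    """Decode a zero-padded sequence of byte values into a Python string."""
    raw = bytes(int(c) for c in codes)
    return raw.split(b"\0", 1)[0].decode("utf-8", errors="replace")


def inventory_to_dict(obs):
    """Map inventory letter -> item description, skipping empty slots."""
    inventory = {}
    for letter, desc in zip(obs["inv_letters"], obs["inv_strs"]):
        if letter:  # empty slots have letter code 0
            inventory[chr(int(letter))] = decode_c_string(desc)
    return inventory


def find_item(inventory, keyword):
    """Return the letters of all items whose description contains `keyword`."""
    return [letter for letter, desc in inventory.items() if keyword in desc]


def pad(text, width=80):
    """Build a zero-padded row like the ones in obs['inv_strs'] (stand-in data)."""
    return list(text.encode("ascii")) + [0] * (width - len(text))


sample_obs = {
    "inv_letters": [ord("a"), ord("k"), 0],
    "inv_strs": [pad("a +0 scalpel (weapon in hand)"), pad("6 uncursed apples"), pad("")],
}
inventory = inventory_to_dict(sample_obs)
print(inventory)
print(find_item(inventory, "apple"))
```

The letter returned by `find_item` is the character the game expects when it asks which item to eat, drop or apply.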


In [None]:
# Show the stats of the agent ('blstats')
def print_stats(obs):
  bl_meaning = [
    'hero col', 'hero_row', 'strength_pct', 'strength', 'dexterity', 'constitution', 
    'intelligence', 'wisdom', 'charisma', 'score', 'hitpoints', 'max_hitpoints', 'depth', 
    'gold', 'energy', 'max_energy', 'armor_class', 'monster_level', 'experience_level', 
    'experience_points', 'time', 'hunger_state', 'carrying_capacity', 'dungeon_number', 'level_number']

  print('BL STATS')
  print(' '.join(["%s: %d" % (m,s) for m, s in zip(bl_meaning, obs['blstats'])]))

# Show the message to the agent ('message')
def print_message(obs):
  print ('MESSAGE')
  print(bytes(obs['message']).decode('ascii').replace('\0',''))

# Show the character stats
print_stats(obs)

#Show the message
print()
print_message(obs)


BL STATS
hero col: 74 hero_row: 3 strength_pct: 14 strength: 14 dexterity: 10 constitution: 15 intelligence: 11 wisdom: 13 charisma: 16 score: 0 hitpoints: 12 max_hitpoints: 12 depth: 1 gold: 1133 energy: 5 max_energy: 5 armor_class: 8 monster_level: 0 experience_level: 1 experience_points: 0 time: 1 hunger_state: 1 carrying_capacity: 0 dungeon_number: 0 level_number: 1

MESSAGE
Hello Agent, welcome to NetHack!  You are a neutral female gnomish Healer.
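
Index-based access such as `obs['blstats'][1]` is hard to read. A name-based view is a small step; the sketch below reuses the 25 labels from `print_stats` (normalising `hero col` to `hero_col`) and takes the values from the BL STATS output above as a stand-in for a real observation. `blstats_to_dict` is a hypothetical helper name.

```python
BLSTAT_NAMES = [
    "hero_col", "hero_row", "strength_pct", "strength", "dexterity", "constitution",
    "intelligence", "wisdom", "charisma", "score", "hitpoints", "max_hitpoints", "depth",
    "gold", "energy", "max_energy", "armor_class", "monster_level", "experience_level",
    "experience_points", "time", "hunger_state", "carrying_capacity", "dungeon_number",
    "level_number",
]


def blstats_to_dict(blstats):
    """Pair each bottom-line statistic with its label."""
    # zip stops at the shorter input, so extra trailing entries are ignored.
    return {name: int(value) for name, value in zip(BLSTAT_NAMES, blstats)}


# Values copied from the Healer output shown above.
sample = [74, 3, 14, 14, 10, 15, 11, 13, 16, 0, 12, 12, 1,
          1133, 5, 5, 8, 0, 1, 0, 1, 1, 0, 0, 1]
stats = blstats_to_dict(sample)
print(stats["hitpoints"], "/", stats["max_hitpoints"], "HP at row", stats["hero_row"], "col", stats["hero_col"])
```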


The `'glyphs'`, `'chars'` and `'colors'` arrays contain the map of the dungeon on the current level. The map is 21 rows by 79 columns (landscape): the centre part of the screen, excluding the top message line and the bottom status lines.   
<center>
<img src="https://nethackwiki.com/mediawiki/images/c/ce/RogueIBM.png" width=300>
</center>

Hence `obs['chars'][row, column]` gives the information for each tile, where 0 <= row <= 20 and 0 <= column <= 78.
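
Because the hero can stand on the edge of the map, looking at neighbouring tiles needs a bounds check; with NumPy arrays an index of -1 silently wraps around to the last row or column. A minimal sketch, assuming the 21 by 79 map described above (`neighbours` and `DIRECTIONS` are hypothetical names):

```python
MAP_ROWS, MAP_COLS = 21, 79

# (row offset, column offset) for the eight compass directions
DIRECTIONS = {
    "N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1),
    "NE": (-1, 1), "SE": (1, 1), "SW": (1, -1), "NW": (-1, -1),
}


def neighbours(row, col, rows=MAP_ROWS, cols=MAP_COLS):
    """Return {direction: (row, col)} for all neighbouring tiles inside the map."""
    result = {}
    for name, (d_row, d_col) in DIRECTIONS.items():
        n_row, n_col = row + d_row, col + d_col
        if 0 <= n_row < rows and 0 <= n_col < cols:
            result[name] = (n_row, n_col)
    return result


print(neighbours(15, 35))  # interior tile: all eight neighbours
print(neighbours(0, 0))    # top-left corner: only E, S and SE remain
```

Each returned pair can be used directly as an index, e.g. `obs['glyphs'][n_row, n_col]`.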

| Key | Shape | Min | Max | DType | Description |
| --- | ----- | --- | --- | ----- | ----------- |
| `'chars'` | [21, 79] | 0 | 255 | uint8 | Characters shown on the map; used when the screen is rendered |
| `'colors'` | [21, 79] | 0 | 15 | uint8 | Colors of the characters on the map; used when the screen is rendered |
| `'glyphs'` | [21, 79] | 0 | 5976 | int16 | The map as glyphs, where each glyph is a unique identification number for what is shown on the tile. This is what you want to use in your machine learning. Note, however, that only one glyph is given per tile, even if multiple objects are on it. |
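
Since `'chars'` holds plain character codes, the map can be turned back into text or searched for a symbol without rendering it. The sketch below uses a tiny hand-made grid instead of a real observation; `chars_to_lines` and `find_char` are hypothetical helpers, and they should accept `obs['chars']` unchanged because they only iterate over rows and cells.

```python
def chars_to_lines(chars):
    """Convert a 2-D grid of character codes into a list of strings, one per row."""
    return ["".join(chr(int(code)) for code in row) for row in chars]


def find_char(chars, symbol):
    """Return (row, column) of every tile that shows `symbol`."""
    target = ord(symbol)
    return [
        (r, c)
        for r, row in enumerate(chars)
        for c, code in enumerate(row)
        if int(code) == target
    ]


# Stand-in 3 x 5 map: a small room with the hero (@) and a staircase down (>).
grid = [
    [ord(ch) for ch in "-----"],
    [ord(ch) for ch in "|@.>|"],
    [ord(ch) for ch in "-----"],
]
print("\n".join(chars_to_lines(grid)))
print(find_char(grid, "@"), find_char(grid, ">"))
```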




In [None]:
# Show the map ('glyphs', 'chars', 'colors')
env = gym.make("NetHackChallenge-v0", savedir=None)
obs = env.reset()  # each reset generates a new dungeon
print('The shape of the map: ', obs['chars'].shape)
print('Print the chars array as example')
print(obs['chars'])
print()
col, row = obs['blstats'][0], obs['blstats'][1]  # blstats stores the column first, then the row
print('The position of the agent is row:', row, ' and column: ', col)
env.render('notebook') # show the game screen
print('The character on the map tile on which our agent is standing: ', chr(obs['chars'][row, col]))
print('The color on the map tile on which our agent is standing: ', obs['colors'][row, col])
standingon = obs['glyphs'][row, col]


# Lookup table from object class number to its name (e.g. WEAPON_CLASS)
obj_classes = {getattr(nh, x): x for x in dir(nh) if x.endswith('_CLASS')}


def glyph_desc(i):
    # Quick hack from glyph to description, for monsters and objects only
    # To-do: a full encoder that encodes the dungeon into a fixed-size representation (GlyphEncoder)
    desc = ''

    if nh.glyph_is_monster(i):
        desc = f': "{nh.permonst(nh.glyph_to_mon(i)).mname}"'

    if nh.glyph_is_normal_object(i):
        obj = nh.objclass(nh.glyph_to_obj(i))
        appearance = nh.OBJ_DESCR(obj) or nh.OBJ_NAME(obj)
        oclass = ord(obj.oc_class)
        desc = f': {obj_classes[oclass]}: "{appearance}"'

    return desc

print('The glyph on the map tile on which our agent is standing: ', standingon, ' with description:', glyph_desc(standingon))
print()
print('Or to see around you')
# Note: these lookups do not check the map bounds
print('To the west you see (monster/object):', glyph_desc(obs['glyphs'][row, col - 1]))
print('To the east you see (monster/object):', glyph_desc(obs['glyphs'][row, col + 1]))
print('To the north you see (monster/object):', glyph_desc(obs['glyphs'][row - 1, col]))
print('To the south you see (monster/object):', glyph_desc(obs['glyphs'][row + 1, col]))


The shape of the map:  (21, 79)
Print the chars array as example
[[32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]
 ...
 [32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]]

The position of the agent is row: 15  and column:  35

There is a staircase up here.  You see here a bow.

# Random agent

In [None]:
obs = env.reset()
steps = 0
tot_rew = 0

print_stats(obs)
print_inventory(obs)
print_message(obs)
env.render('notebook')


actionlist = [0, 1, 2, 3, 4, 5, 6, 7, 61, 17]
# Names for printing only (renamed from `dict`, which shadowed the built-in)
action_names = {0: 'N', 1: 'E', 2: 'S', 3: 'W', 4: 'NE', 5: 'SE', 6: 'SW', 7: 'NW',
                35: 'EAT', 36: 'ESC', 61: 'PICKUP', 17: 'DOWN'}
while True:
    # action = env.action_space.sample()  # select any random action
    if obs['misc'][2]:
        # The screen is waiting for a key press, e.g. at --More--
        action = 36  # ESC, as in the --More-- cell above
    elif obs['misc'][1]:
        # The game is waiting for a line of text input
        action = 36  # ESC cancels the prompt (the original left the action undefined here)
    elif obs['misc'][0]:
        # The game asks a yes/no question
        action = 5  # the 'n' key (keypress 110, shared with move SE): answer No to be safe
    else:
        action = random.choice(actionlist)  # select a random action from the list
    prev_message = bytes(obs['message']).decode('ascii').replace('\0', '')  # kept for debugging
    obs, rew, done, info = env.step(action)  # execute the action and see the results
    steps += 1  # keep track of the steps
    tot_rew += rew  # keep track of the cumulative reward
    if steps < 5:
        print('action ', action_names.get(action, action))
        env.render('notebook')  # show the first few game screens
    if done:
        break

print('GAME ENDED')
print('Rewards:', tot_rew)
print('Steps:', steps)
print_stats(obs)
print_inventory(obs)
env.render('notebook')



BL STATS
hero col: 5 hero_row: 18 strength_pct: 15 strength: 15 dexterity: 9 constitution: 15 intelligence: 14 wisdom: 14 charisma: 8 score: 0 hitpoints: 14 max_hitpoints: 14 depth: 1 gold: 0 energy: 3 max_energy: 3 armor_class: 7 monster_level: 0 experience_level: 1 experience_points: 0 time: 1 hunger_state: 1 carrying_capacity: 0 dungeon_number: 0 level_number: 1
In slot (a) - glyph: 1923, (class 2) - "a +1 dagger (weapon in hand)"
In slot (b) - glyph: 1976, (class 2) - "a +1 crossbow (alternate weapon; not wielded)"
In slot (c) - glyph: 1912, (class 2) - "51 +2 crossbow bolts (in quiver pouch)"
In slot (d) - glyph: 1912, (class 2) - "32 +0 crossbow bolts"
In slot (e) - glyph: 2032, (class 3) - "an uncursed +2 cloak of displacement (being worn)"
In slot (f) - glyph: 2173, (class 7) - "4 uncursed cram rations"
MESSAGE
Hello Agent, welcome to NetHack!  You are a neutral male gnomish Ranger.

It is already fun to see the results. As expected, there are not too many points: random movement is not going to win this game.
+ 17512 steps, 43 points, Agent-Sam-Hum-Mal-Law starved to death in the Dungeons of Doom on level 1.
+ 78 steps, 0 points,  Agent-Rog-Hum-Fem-Cha died in The Dungeons of Doom on level 1.  Killed by a sewer rat.
+ 9599 steps, 126 points, Agent-Sam-Hum-Fem-Law died in The Dungeons of Doom on level 1.  Killed by a newt, while fainted from lack of food. 
+ 469 steps, 0 points,  Agent-Mon-Hum-Fem-Cha choked on her food in The Dungeons of Doom on level 1.  Choked on a food ration.
+ 29068 steps, 115 points,  Agent-Bar-Hum-Mal-Neu quit in The Dungeons of Doom on level 2.
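
The prompt handling in the random agent loop can be pulled into one function so that it is easy to test on its own. This is a sketch, not NLE's recommended approach: it assumes the meaning of the three `misc` flags as they are used in this notebook (index 0 a yes/no question, index 1 a text prompt, index 2 a `--More--` pause), takes ESC as action 36 as in the `--More--` cell, and takes the `n` key as action 5 from the action listing (keypress 110). `choose_action` is a hypothetical name.

```python
import random

ESC = 36          # assumed index of the ESC command (from the --More-- cell)
ANSWER_NO = 5     # assumed index of the 'n' key (keypress 110 in the action listing)
MOVES = [0, 1, 2, 3, 4, 5, 6, 7]  # the eight one-step compass moves


def choose_action(misc, rng=random):
    """Pick an action: resolve any pending prompt first, otherwise move randomly."""
    in_yn_question, in_text_prompt, waiting_for_space = (int(flag) for flag in misc)
    if waiting_for_space:
        return ESC        # dismiss --More--
    if in_text_prompt:
        return ESC        # cancel the text prompt
    if in_yn_question:
        return ANSWER_NO  # decline, to be safe
    return rng.choice(MOVES)


print(choose_action([0, 0, 1]), choose_action([1, 0, 0]), choose_action([0, 0, 0]))
```

Inside the loop this would be called as `action = choose_action(obs['misc'])`.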

# Watch your replays
You need to be able to inspect the results: statistics are good, but actually seeing how the agent behaves is sometimes better. NLE can write "ttyrec2" files, which use the "ttyrec" Unix terminal recording format with one added byte to store the inputs ("actions"). Without that byte, you would be looking at a replay without any information on the actions being taken.
+ You must explicitly turn on saving by passing a directory as `savedir=` rather than `None`, e.g. `env = gym.make('NetHackChallenge-v0', savedir='replays')`
+ To watch the replays, nle-ttyplay from NLE is recommended: https://discourse.aicrowd.com/t/getting-started-use-nle-ttyplay-instead-of-regular-ttyplay-to-play-back-recordings/6060
+ Since I can't run this in a Colab notebook, I save the files to my Google Drive and watch them with e.g. Termrec: https://sourceforge.net/projects/termrec/ (also runs on Windows), or you can use ttyrec2: https://github.com/Noeda/ttyrec2 (C code)
* Note: this is work in progress. The recorded ttyrec files are somehow empty; maybe I need a virtual monitor, or maybe a Gym wrapper? I also asked on AIcrowd: https://discourse.aicrowd.com/t/colab-notebook-baseline-submission/6118
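
One way to check whether a recording is really empty, without a player, is to count its frames. The sketch below assumes a specific layout that should be verified against the NLE source before relying on it: each frame starts with the classic ttyrec header (three little-endian 32-bit integers: seconds, microseconds, payload length) followed by the one extra byte mentioned above, interpreted here as a channel marker (0 for terminal output, 1 for input). `iter_frames` and `summarize` are hypothetical helpers, and the sample data is synthetic.

```python
import bz2
import struct

HEADER = struct.Struct("<iiiB")  # sec, usec, payload length, channel (assumed layout)


def iter_frames(data):
    """Yield (seconds, microseconds, channel, payload) for each frame in `data`."""
    offset = 0
    while offset + HEADER.size <= len(data):
        sec, usec, length, channel = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        payload = data[offset:offset + length]
        offset += length
        yield sec, usec, channel, payload


def summarize(data):
    """Count frames and payload bytes per channel."""
    summary = {}
    for _, _, channel, payload in iter_frames(data):
        frames, size = summary.get(channel, (0, 0))
        summary[channel] = (frames + 1, size + len(payload))
    return summary


# Synthetic two-frame recording, compressed like the *.ttyrec.bz2 files.
raw = HEADER.pack(0, 0, 5, 0) + b"Hello" + HEADER.pack(1, 0, 1, 1) + b"k"
compressed = bz2.compress(raw)
print(summarize(bz2.decompress(compressed)))
```

For a real file, the bytes could be read with `bz2.open(path, 'rb').read()` and passed to `summarize`; an empty dictionary would confirm that nothing was recorded.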

In [None]:
# Experimental, not sure if I need it
# @title Set-up the virtual display environment
!apt-get update
!apt-get install python-opengl -y
!apt install xvfb -y
!pip install pyvirtualdisplay
!pip install pyglet
!apt-get install ffmpeg -y


In [None]:
# Experimental, not sure if I need it
# @title Start the virtual monitor
from pyvirtualdisplay import Display
display = Display(visible=0, size=(80, 24))
display.start()

<pyvirtualdisplay.display.Display at 0x7f6b8b37d290>

In [None]:
!mkdir replays
%ls


libnethack.so  nhdat@  record    save/     xlogfile*
logfile*       perm*   replays/  sysconf@


In [None]:
env = gym.make('NetHackChallenge-v0', savedir='/tmp/replays')
obs = env.reset()
steps = 0
tot_rew = 0

actionlist = [0, 1, 2, 3, 4, 5, 6, 7, 61, 17]
# Names for printing only (renamed from `dict`, which shadowed the built-in)
action_names = {0: 'N', 1: 'E', 2: 'S', 3: 'W', 4: 'NE', 5: 'SE', 6: 'SW', 7: 'NW',
                35: 'EAT', 36: 'ESC', 61: 'PICKUP', 17: 'DOWN'}
while True:
    # action = env.action_space.sample()  # select any random action
    if obs['misc'][2]:
        action = 36  # ESC dismisses --More--, as in the --More-- cell above
    elif obs['misc'][1]:
        action = 36  # ESC cancels a text prompt (the original left the action undefined here)
    elif obs['misc'][0]:
        action = 5  # the 'n' key (keypress 110, shared with move SE): answer No to be safe
    else:
        action = random.choice(actionlist)  # select a random action from the list
    prev_message = bytes(obs['message']).decode('ascii').replace('\0', '')  # kept for debugging
    obs, rew, done, info = env.step(action)  # execute the action and see the results
    steps += 1  # keep track of the steps
    tot_rew += rew  # keep track of the cumulative reward
    if steps < 2:
        print('action ', action_names.get(action, action))
        env.render('notebook')  # show the first game screen
    if done:
        break

print('GAME ENDED')
print('Rewards:', tot_rew)
print('Steps:', steps)



action  SW

In [None]:
%cd ..
%ls 



/tmp
dap_multiplexer.45bbf937f534.root.log.INFO.20210705-174545.50
dap_multiplexer.INFO@
debugger_1mmmxfuwm3=
drivefs_ipc.0=
drivefs_ipc.0_shell=
hsperfdata_root/
initgoogle_syslog_dir.0/
nlex_1tlxns/
replays/
tmptj6wu1em/


In [None]:
# The code below will not work in a colab environment
!nle-ttyplay

^C


In [None]:
import glob, os, shutil

def save_replays(dataset_path, target_dir='/content/drive/My Drive/nethack'):
    # Copy every compressed recording (*.bz2) in dataset_path to Google Drive
    count = 0
    for path_and_filename in glob.iglob(os.path.join(dataset_path, "*.bz2")):
        shutil.copy(path_and_filename, target_dir)  # keeps the original file name
        print('copied ', path_and_filename)
        count = count + 1
    print('loaded ', count, 'files to Google drive')

save_replays('replays')

copied  replays/nle.61.0.ttyrec.bz2
loaded  1 files to Google drive


# TorchBeast

In [None]:
!pip install "nle[agent]"
!python -m nle.agent.agent --num_actors 80 --batch_size 32 --unroll_length 80 --learning_rate 0.0001 --entropy_cost 0.0001 --use_lstm --total_steps 1000000000

Streaming output truncated to the last 5000 lines.
[INFO:4646 agent:572 2021-07-02 20:43:56,351] Steps 1415680 @ 0.0 SPS. Loss -206.682083. Return per episode: 14.8. Stats:
{'baseline_loss': 65.25215148925781,
 'entropy_loss': -0.6510570645332336,
 'episode_returns': (-13.4502125, 43.059925),
 'mean_episode_return': 14.804856300354004,
 'pg_loss': -271.2831726074219,
 'total_loss': -206.6820831298828}

In [None]:
!python -m nle.scripts.plot

/usr/bin/python3: Error while finding module specification for 'nle.scripts.plot' (ModuleNotFoundError: No module named 'nle')


In [None]:
from nle import nethack as nh

obj_classes = {getattr(nh, x): x for x in dir(nh) if x.endswith('_CLASS')}
glyph_classes = sorted((getattr(nh, x), x) for x in dir(nh) if x.endswith('_OFF'))

#for i in range(nh.MAX_GLYPH):
for i in range(3000):
    desc = ''
    if glyph_classes and i == glyph_classes[0][0]:
        cls = glyph_classes.pop(0)[1]
    
    if nh.glyph_is_monster(i):
        desc = f': "{nh.permonst(nh.glyph_to_mon(i)).mname}"'
    
    if nh.glyph_is_normal_object(i):
        obj = nh.objclass(nh.glyph_to_obj(i))
        appearance = nh.OBJ_DESCR(obj) or nh.OBJ_NAME(obj) 
        oclass = ord(obj.oc_class)
        desc = f': {obj_classes[oclass]}: "{appearance}"'

    print(f'Glyph {i} Type: {cls.replace("_OFF","")} {desc}'  )

Glyph 0 Type: GLYPH_MON : "giant ant"
Glyph 1 Type: GLYPH_MON : "killer bee"
Glyph 2 Type: GLYPH_MON : "soldier ant"
Glyph 3 Type: GLYPH_MON : "fire ant"
Glyph 4 Type: GLYPH_MON : "giant beetle"
Glyph 5 Type: GLYPH_MON : "queen bee"
Glyph 6 Type: GLYPH_MON : "acid blob"
Glyph 7 Type: GLYPH_MON : "quivering blob"
Glyph 8 Type: GLYPH_MON : "gelatinous cube"
Glyph 9 Type: GLYPH_MON : "chickatrice"
Glyph 10 Type: GLYPH_MON : "cockatrice"
Glyph 11 Type: GLYPH_MON : "pyrolisk"
Glyph 12 Type: GLYPH_MON : "jackal"
Glyph 13 Type: GLYPH_MON : "fox"
Glyph 14 Type: GLYPH_MON : "coyote"
Glyph 15 Type: GLYPH_MON : "werejackal"
Glyph 16 Type: GLYPH_MON : "little dog"
Glyph 17 Type: GLYPH_MON : "dingo"
Glyph 18 Type: GLYPH_MON : "dog"
Glyph 19 Type: GLYPH_MON : "large dog"
Glyph 20 Type: GLYPH_MON : "wolf"
Glyph 21 Type: GLYPH_MON : "werewolf"
Glyph 22 Type: GLYPH_MON : "winter wolf cub"
Glyph 23 Type: GLYPH_MON : "warg"
Glyph 24 Type: GLYPH_MON : "winter wolf"
Glyph 25 Type: GLYPH_MON : "hell hound p
