Environment Generation for Zero-Shot Compositional Reinforcement Learning

Overview

This repository includes the implementation of Environment Generation for Zero-Shot Compositional Reinforcement Learning by Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi,Manoj Tiwari, Honglak Lee, and Aleksandra Faust. NeurIPS'21.

Getting Started

Download this repo

svn export https://github.com/google-research/google-research/trunk/compositional_rl

Install Dependencies

Install Bootstrap

Download bootstrap files:

mkdir gwob/bootstrap/ && cd gwob/bootstrap && wget https://github.com/twbs/bootstrap/releases/download/v4.3.1/bootstrap-4.3.1-dist.zip

Unzip and extract:

unzip bootstrap-4.3.1-dist.zip && cp bootstrap-4.3.1-dist/css/bootstrap.min.css . && cp bootstrap-4.3.1-dist/js/bootstrap.min.js . && rm -r bootstrap-4.3.1-dist* && cd ../../

Install MiniWoB

Clone the MiniWoB project:

git clone https://github.com/stanfordnlp/miniwob_plusplus gwob/miniwob_plusplus

Checkout the version that we used in our project:

cd gwob/miniwob_plusplus && git checkout 833a477a8fbfbd2497e95fee019f76df2b9bd75e

Convert all python files from Python 2 to Python 3:

pip install 2to3 && cd ../../ && 2to3 gwob/miniwob_plusplus/python/miniwob -w

Integrate miniwob by making necessary changes:

python3 integrate_miniwob.py

Install miniwob:

pip install gwob/miniwob_plusplus/python/

Install the ChromeDriver with the version that is matching your Chrome browser:

export PATH=$PATH:/path/to/chromedriver

Install gMiniWoB

Install gminiwob:

pip install gwob/

Examples

Generating random websites in gMiniWoB

Open file:///path/to/compositional_rl/gwob/gminiwob/sample_random_website.html in a browser and click "START".
Each time the "START" button is clicked, this will create a random gMiniWoB website using a subset of primitives available in gMiniWoB.

Running a rule-based policy with a fixed test environment

Run python3 gwob/examples/web_environment_example.py --data_dep_path='/path/to/compositional_rl/gwob/ to run a rule-based policy for a simulated shopping website. If you get any errors related to non-headless browsing, make sure to pass --run_headless_mode=True.

Environment design and Q-value network

The following is a simple tutorial for randomly designing an environment and using an LSTM-based DQN to generate logits and values.

import gin
import numpy as np

from a2perf.domains.web_navigation.gwob.CoDE import test_websites
from a2perf.domains.web_navigation.gwob.CoDE import utils
from a2perf.domains.web_navigation.gwob.CoDE import vocabulary_node
from a2perf.domains.web_navigation.gwob.CoDE import web_environment
from a2perf.domains.web_navigation.gwob.CoDE import web_primitives
from a2perf.domains.web_navigation.gwob.CoDE import q_networks

gin.parse_config_files_and_bindings(["/path/to/compositional_rl/gwob/configs/envdesign.gin"], None)

# Create an empty environment.
env = web_environment.GMiniWoBWebEnvironment(
  base_url="file:///path/to/compositional_rl/gwob/",
  global_vocabulary=vocabulary_node.LockedVocabulary())

# Create a q network.
q_net = q_networks.DQNWebLSTM(vocab_size=env.local_vocab.max_vocabulary_size, return_state_value=True)

# Sample a new design of the form {'number_of_pages': Integer, 'action': List[Integer], 'action_page': List[Integer]}.
# `action` denotes primitive indices and `action_page` denotes their corresponding page indices.
# Each item in the `action_page` should be less than `number_of_pages`.
# For this tutorial, we will randomly sample a design.
number_of_pages = np.random.randint(4) + 1
design =  {'number_of_pages': number_of_pages,
            'action': np.random.choice(np.arange(len(web_primitives.CONCEPTS)), 5),
            'action_page': np.random.choice(np.arange(number_of_pages), 5)}

# Design the actual environment.
env.design_environment(
    design, auto_num_pages=True)

# Reset the environment.
state = env.reset()

# Add batch dimension.
state = {key: np.expand_dims(tensor, axis=0) for key, tensor in state.items()}

# Get flattened logits and values.
logits, values = q_net(state)

# Get greedy action.
action = np.argmax(logits)

# Run the action.
new_state, reward, done, info = env.step(action)

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
configs		configs
docker		docker
docs		docs
environment_generation		environment_generation
gwob		gwob
.gitmodules		.gitmodules
README.md		README.md
__init__.py		__init__.py
create_environment.py		create_environment.py
integrate_miniwob.py		integrate_miniwob.py
requirements.txt		requirements.txt
setup_docker.sh		setup_docker.sh

Farama-Foundation/a2perf-web-nav

Folders and files

Latest commit

History

Repository files navigation

Environment Generation for Zero-Shot Compositional Reinforcement Learning

Overview

Getting Started

Install Dependencies

Install Bootstrap

Install MiniWoB

Install gMiniWoB

Examples

Generating random websites in gMiniWoB

Running a rule-based policy with a fixed test environment

Environment design and Q-value network

About

Resources

Stars

Watchers

Forks

Languages