We have published a paper outlining why StarCraft is one of the most challenging offline RL benchmarks to date, along with performance benchmarks for a spectrum of offline RL approaches that we have explored on this data. The algorithms currently released as part of this repository are:
- Behavior Cloning (BC). A fine-tuned version (FT-BC) can be obtained by warm-starting from a trained checkpoint.
Our training setup is heavily config-driven. Our main config can be found in `alphastar/unplugged/configs/alphastar_supervised.py`. Run all of these commands from the root directory into which the package is downloaded.
To run training for a few steps with some config arguments updated, run:
```shell
python alphastar/unplugged/scripts/train.py \
  --config=alphastar/unplugged/configs/alphastar_supervised.py:alphastar.dummy \
  --config.train.max_number_of_frames=16 \
  --config.train.learner_kwargs.batch_size=4 \
  --config.train.datasource.kwargs.shuffle_buffer_size=16 \
  --config.train.optimizer_kwargs.lr_frames_before_decay=4 \
  --config.train.learner_kwargs.unroll_len=3 \
  --config.train.datasource.name=DummyDataSource
```
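Each `--config.path.to.field=value` flag above overrides one entry of the nested config. As a rough illustration only (this is not the repo's actual parser, which uses `ml_collections` config flags), a dotted override maps onto nested attribute assignment like this:

```python
# Toy illustration of how dotted config overrides map onto a nested config.
# This is NOT the repo's parser; it only sketches the nesting semantics of
# flags such as --config.train.learner_kwargs.batch_size=4.
import ast


def apply_override(config: dict, flag: str) -> None:
    """Applies one '--config.a.b.c=value' style override in place."""
    dotted, raw_value = flag.removeprefix('--config.').split('=', 1)
    *parents, leaf = dotted.split('.')
    node = config
    for key in parents:
        node = node[key]  # walk down the nested dicts
    try:
        node[leaf] = ast.literal_eval(raw_value)  # ints, tuples, strings...
    except (ValueError, SyntaxError):
        node[leaf] = raw_value  # fall back to the raw string


config = {'train': {'learner_kwargs': {'batch_size': 128, 'unroll_len': 32}}}
apply_override(config, '--config.train.learner_kwargs.batch_size=4')
```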
To run the same script using Bazel:
```shell
bazel run --cxxopt='-std=c++17' alphastar/unplugged/scripts:train -- \
  --config=alphastar/unplugged/configs/alphastar_supervised.py:alphastar.dummy \
  --config.train.max_number_of_frames=16 \
  --config.train.learner_kwargs.batch_size=4 \
  --config.train.datasource.kwargs.shuffle_buffer_size=16 \
  --config.train.optimizer_kwargs.lr_frames_before_decay=4 \
  --config.train.learner_kwargs.unroll_len=3 \
  --config.train.datasource.name=DummyDataSource
```
Note the extra `--` between `alphastar/unplugged/scripts:train` and the rest of the flags. Do note that these commands train with a dummy architecture on a dummy data source.
To train with real data:

- Follow the instructions for data generation in `alphastar/unplugged/data/README.md`.
- Create a paths Python file with two constants, `BASE_PATH` and `RELATIVE_PATHS`. `BASE_PATH` is the root directory for the converted datasets generated in step 1. `RELATIVE_PATHS` is a dictionary keyed by `(replay_versions, data_split, player_min_mmr)` tuples.
We have provided a template file for setting the data paths appropriately. Copy this template file to a directory of your choice:

```shell
cp alphastar/unplugged/data/paths.py.template /tmp/paths.py
```
Modify the paths based on step 1 and pass the file as `config.train.datasource.kwargs.dataset_paths_fname` when launching training. While training, the particular data that you want to train on can be selected by setting `replay_versions`, `data_split`, `player_min_mmr` and `dataset_paths_fname` under `config.train.datasource.kwargs`, either in the config or on the command line.
-
After these two steps, to confirm the entire training apparatus with training data from SC2 version 4.9.2 (assuming the paths file from step 2 is `/tmp/paths.py`), run:

```shell
python alphastar/unplugged/scripts/train.py \
  --config=alphastar/unplugged/configs/alphastar_supervised.py:alphastar.dummy \
  --config.train.max_number_of_frames=16 \
  --config.train.learner_kwargs.batch_size=4 \
  --config.train.datasource.kwargs.shuffle_buffer_size=16 \
  --config.train.optimizer_kwargs.lr_frames_before_decay=4 \
  --config.train.learner_kwargs.unroll_len=3 \
  --config.train.datasource.name=OfflineTFRecordDataSource \
  --config.train.datasource.kwargs.dataset_paths_fname='/tmp/paths.py' \
  --config.train.datasource.kwargs.replay_versions='("4.9.2",)'
```
-
To run default full-scale training after the real dataset is generated and the paths are updated, run the following command. Note that the default setting runs with all replay versions; if you want to run on specific replay versions only, set `config.train.datasource.kwargs.replay_versions` as shown below.

```shell
python alphastar/unplugged/scripts/train.py \
  --config=alphastar/unplugged/configs/alphastar_supervised.py:alphastar.full \
  --config.train.datasource.kwargs.dataset_paths_fname='/tmp/paths.py' \
  --config.train.datasource.kwargs.replay_versions='("4.9.2",)'
```
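For reference, the paths file from step 2 might look like the sketch below. The example `BASE_PATH`, the relative-path values, and the `lookup` helper are hypothetical and purely illustrative; use the shipped template (`alphastar/unplugged/data/paths.py.template`) as the authoritative starting point.

```python
# Hypothetical paths.py sketch. BASE_PATH, the example relative paths and the
# lookup() helper are illustrative assumptions, not the shipped template.
import os

# Root directory of the converted datasets generated in step 1.
BASE_PATH = '/data/alphastar_unplugged'

# Keys are (replay_versions, data_split, player_min_mmr) tuples; the value
# format here (a relative file pattern) is an assumption for this sketch.
RELATIVE_PATHS = {
    (('4.9.2',), 'train', 3500): 'train/4.9.2/mmr_3500/*.tfrecord',
    (('4.9.2',), 'test', 3500): 'test/4.9.2/mmr_3500/*.tfrecord',
}


def lookup(replay_versions, data_split, player_min_mmr):
    """Resolves a dataset pattern under BASE_PATH (illustrative helper)."""
    key = (tuple(replay_versions), data_split, player_min_mmr)
    return os.path.join(BASE_PATH, RELATIVE_PATHS[key])
```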
To evaluate a random agent in the environment for one full episode, run:
```shell
python alphastar/unplugged/scripts/evaluate.py \
  --config=alphastar/unplugged/configs/alphastar_supervised.py:alphastar.dummy \
  --config.eval.log_to_csv=False \
  --config.eval.evaluator_type=random_params
```
More instructions on how to use these scripts for full-fledged training and evaluation can be found in the docstrings of the scripts. Information about different architecture names can be found here.
Disclaimer: This is not an official Google product.
If you use the agents, architectures, or benchmarks published in this repository, please cite our AlphaStar Unplugged paper.