# StarCraft II - Project Plan for MoveToBeacon minigame

## Tasks
Specific tasks I must be able to solve in order to complete the full MoveToBeacon minigame challenge.

1. **Set up the game environment** <br>
    i. Install StarCraft II Learning Environment (SC2LE / pysc2) <br>
    ii. Parameters of the game (map, resolution, replays) <br>
    
    
2. **How to interact with the game** <br>
    i. How to use the reset and step methods <br>
    ii. What are the observables <br>
    iii. What are the actions 
    
    
3. **How to build an agent** <br>
    i. How to use the step method <br>
    ii. How to choose a valid action id <br>
    iii. How to know what kind of additional parameters are needed (spatial or not, parameter space in general)
    
    
4. **How to preprocess the spatial observation** <br>
    i. How to get the number of possible values that each variable can assume (variable = feature layer in the case of the feature_screen and feature_minimap) -> **manual inspection and custom preprocessing** <br>
    ii. How to embed each layer independently (since they have different vocabularies and semantic meanings) - used one-hot encoding (OHE) where needed, log2 for numerical variables with a large range, and a cast to float for numerical variables limited to a small range <br>
    iii. How to deal with the different screen and minimap resolutions [see specific thread] - assume the same resolution <br>
    iv. How many feature layers to use (all of them or just the ones useful for MoveToBeacon?) [see specific thread] - a custom subset of them
    
    
5. **How to preprocess player information** <br>
    i. Categorical (e.g. player id) vs pure integer (e.g. minerals and gas) distinction - done<br>
    ii. What should be the output of the preprocessing - multi-channel image
    

6. **Which architecture to use for learning state representation** <br>
    i. Where do the actor and critic architectures diverge? - at the last layer <br>
    ii. Which kind of representation should we produce? <br>
    iii. Spatial and non-spatial outputs - done <br>
    iv. Possible architectures [see specific thread] 
    
    
7. **Actor final layers** <br>
    i. How to choose the action id - a single masked softmax over all action ids <br>
    ii. How to encode the information about the chosen action when we need to get the additional parameters - at the moment with an embedding layer, but we could use a FiLM layer <br>
    iii. How to know if parameters are spatial or not - access the parameter specifics and look at the size <br>
    iv. How to sample non-spatial additional params - which layers and which input to use? Fully-connected with one hidden layer of 256 neurons and ReLU activation (we could maybe skip this hidden layer) <br>
    v. How to sample spatial additional params - which layers and which input to use? (conv2d 1x1, or 3x3 with padding of 1, directly after the spatial features -> no activation functions specific to the FullyConvNonSpatial net)
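
As an illustration of task 4.ii, the per-layer embedding could be sketched as below. This is a minimal NumPy sketch, not the project's actual code: the `LAYER_SPEC` dictionary, the layer names picked for it, the numbers of categories and the helper names are all assumptions chosen for illustration, and `log2(1 + x)` is used instead of a plain `log2` so that zero-valued pixels stay finite.

```python
import numpy as np

# Hypothetical spec (an assumption for this sketch):
# layer name -> (kind of preprocessing, number of categories or None).
LAYER_SPEC = {
    "player_relative": ("categorical", 5),
    "selected": ("categorical", 2),
    "unit_hit_points": ("log2", None),
    "unit_density": ("float", None),
}

def one_hot(layer, n_values):
    """Integer layer (H, W) with values in [0, n_values) -> (n_values, H, W)."""
    return np.eye(n_values, dtype=np.float32)[layer].transpose(2, 0, 1)

def log2_scale(layer):
    """Wide-range numerical layer -> log2(1 + x), shape (1, H, W)."""
    return np.log2(1.0 + layer.astype(np.float32))[np.newaxis]

def to_float(layer):
    """Small-range numerical layer -> plain float cast, shape (1, H, W)."""
    return layer.astype(np.float32)[np.newaxis]

def preprocess(layers, spec=LAYER_SPEC):
    """Embed each layer independently and stack the results along channels."""
    channels = []
    for name, (kind, n_values) in spec.items():
        layer = np.asarray(layers[name])
        if kind == "categorical":
            channels.append(one_hot(layer, n_values))
        elif kind == "log2":
            channels.append(log2_scale(layer))
        else:
            channels.append(to_float(layer))
    return np.concatenate(channels, axis=0)
```

With this spec the output has 5 + 2 + 1 + 1 = 9 channels, whatever the spatial resolution of the input layers.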

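The masked softmax of task 7.i can be sketched in the same spirit. The function names below are hypothetical, and the sketch assumes that the list of currently valid action ids is given and contains at least one id; it works on raw NumPy logits rather than on the outputs of a real network.

```python
import numpy as np

def masked_softmax(logits, available_ids):
    """Softmax over all action ids that gives probability 0 to unavailable ones.

    Assumes `available_ids` is non-empty and indexes into `logits`.
    """
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.zeros(logits.shape, dtype=bool)
    mask[list(available_ids)] = True
    # Unavailable actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest available logit for numerical stability.
    exp = np.exp(masked - masked[mask].max())
    return exp / exp.sum()

def sample_action_id(logits, available_ids, rng):
    """Sample one action id from the masked distribution."""
    probs = masked_softmax(logits, available_ids)
    return int(rng.choice(len(probs), p=probs))

# Example with an action space of 3 where only ids 0 and 2 are available.
rng = np.random.default_rng(0)
action_id = sample_action_id([1.0, 2.0, 3.0], [0, 2], rng)
```

Masking the logits before the softmax, rather than zeroing probabilities afterwards, keeps the result a properly normalised distribution over the valid actions only.
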
## Milestones / Tests
Intermediate objectives I plan to achieve along the way.

### Components to test
1. Input preprocessing
2. Shared architecture
3. Critic architecture
4. Actor architecture - action id 
5. Actor architecture - action parameters
6. Critic update
7. Actor update
8. Training cycle
9. Relational architecture

### Milestones
**Agent 1:** 
- Actor and Critic provided only with the information useful for MoveToBeacon (x, y of the agent and of the beacon center, flags exists_beacon and is_selected)
- Simplest FF net that chooses only the function id (so action space = 3)
- Scripted (optimal) choice of additional parameters

Tests the actor architecture for choosing the action id (4), the critic and actor updates (6, 7) and the training cycle (8). <br>
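
The hand-crafted input of Agent 1 and the scripted parameter choice could look roughly like the sketch below. It assumes that the agent's unit and the beacon are marked in the player_relative layer with the values 1 (self) and 3 (neutral), as in pysc2's player-relative encoding, and that "optimal" simply means moving to the beacon center. The function names and the ordering of the state vector are hypothetical.

```python
import numpy as np

# Assumed player_relative values, mirroring pysc2's encoding.
PLAYER_SELF = 1
PLAYER_NEUTRAL = 3  # the beacon is a neutral unit

def centroid(mask):
    """Mean (x, y) of the True pixels of a boolean mask, or None if there are none."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def agent1_state(player_relative, selected):
    """Build [agent_x, agent_y, beacon_x, beacon_y, exists_beacon, is_selected]."""
    player_relative = np.asarray(player_relative)
    selected = np.asarray(selected)
    agent_xy = centroid(player_relative == PLAYER_SELF)
    beacon_xy = centroid(player_relative == PLAYER_NEUTRAL)
    exists_beacon = beacon_xy is not None
    is_selected = bool((selected > 0).any())
    # Missing coordinates default to 0; the flags tell the network to ignore them.
    ax, ay = agent_xy if agent_xy is not None else (0.0, 0.0)
    bx, by = beacon_xy if exists_beacon else (0.0, 0.0)
    return np.array(
        [ax, ay, bx, by, float(exists_beacon), float(is_selected)],
        dtype=np.float32,
    )

def scripted_move_target(state):
    """Scripted parameter choice: the (x, y) pixel closest to the beacon center."""
    return int(round(float(state[2]))), int(round(float(state[3])))
```

The network then only has to learn which of the three function ids to pick, while the spatial argument comes from `scripted_move_target`.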

**Agent 2:** 
- Actor-Critic provided only with one-hot-encoded version of the most important layers (player relative and selected)
- AtariNet-like agent to select action id (so action space = 3)
- Scripted (optimal) choice of additional parameters
    
Tests the input preprocessing (1) and a bit of the shared architecture (2).

**Agent 3:** 
- Actor-Critic provided only with one-hot-encoded version of the most important layers (player relative and selected)
- AtariNet-like agent to select action id and additional parameters
    
Tests actor and critic architecture (3, 4, 5).

**Agent 4:** 
- Actor-Critic provided only with one-hot-encoded version of the most important layers (player relative and selected)
- FullyConvNet-like agent to select action id and additional parameters
    
Tests actor and critic architecture (3, 4, 5).

**Agent 5:** 
- Relational architecture from Relational Deep RL paper
- Actor-Critic provided only with one-hot-encoded version of the most important layers (player relative and selected)
- Only action id selection
- Scripted (optimal) choice of additional parameters

Tests the relational architecture (9).

**Agent 6:** 
- Relational architecture from Relational Deep RL paper
- Actor-Critic provided only with one-hot-encoded version of the most important layers (player relative and selected) + non-spatial info (no minimap)
- Full action selection
    
Agent almost complete. 

The additional input information retained here is not actually useful for MoveToBeacon, so it only makes sense to include it for generalization purposes, which will be addressed when we take on the whole minigame challenge.

**Update:** Agents 1 to 4 completed successfully, Agent 5 not planned, Agent 6 still to be tried. There are problems with GPU memory.

## Issues / Threads
Conceptual choices that I have to make.

1. Trade-off between generalization and complexity (How many features do we actually need? Is it okay to manually filter them before feeding them to the agent?)
2. How to deal with screen and minimap resolutions: ideally they should be 84x84 and 64x64 respectively, but the original paper assumes they have the same resolution. For the moment I will skip the minimap observation entirely, but it could be important in the other minigames.
3. Possible architectures: AtariNet-like, FullyConvNet-like, Relational

## Useful material
Most useful guides, repositories and papers.

StarCraft II Learning Environment from DeepMind: 
- Library repository: https://github.com/deepmind/pysc2 
- Original paper: https://arxiv.org/abs/1708.04782
- Relational paper: https://arxiv.org/abs/1806.01830

Full-game mostly scripted bots: <br>
https://github.com/skjb/pysc2-tutorial 

TensorFlow solutions for A2C and PPO agents for the minigames: <br>
https://github.com/inoryy/reaver <br>
https://github.com/chris-chris/pysc2-examples

Some simple Q-table solutions for the MoveToBeacon minigame: <br>
https://github.com/yvan/nbsblogs/tree/master/pysc2_tut

