DDPG with Auxiliary Rewards:
State -> Actor -> Action
State, Action -> Critic -> Q(State, Action)
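The two forward passes above can be sketched with plain numpy; the layer sizes, weight scales, and tanh nonlinearity are hypothetical stand-ins, not the original architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 8   # hypothetical dimensions

# Hypothetical linear-layer weights standing in for the Actor and Critic networks.
W_actor_hidden = rng.normal(size=(STATE_DIM, HIDDEN)) * 0.1
W_actor_out = rng.normal(size=(HIDDEN, ACTION_DIM)) * 0.1
W_critic = rng.normal(size=(STATE_DIM + ACTION_DIM, 1)) * 0.1

def actor(state):
    # State -> hidden representation -> Action (tanh keeps actions bounded)
    hidden = np.tanh(state @ W_actor_hidden)
    return np.tanh(hidden @ W_actor_out), hidden

def critic(state, action):
    # (State, Action) -> scalar Q(State, Action)
    return np.concatenate([state, action]) @ W_critic

state = rng.normal(size=STATE_DIM)
action, hidden = actor(state)
q = critic(state, action)
```

The actor's `hidden` activation is returned alongside the action because the auxiliary-reward branch below consumes it.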
Aux_reward_i:
State -> Actor (intermediate layer) -> lower-level representation of the state (LRS)
LRS -> aux_reward_module_i -> Aux_reward_i
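The auxiliary branch can be sketched as one small head per auxiliary task, each reading the actor's lower-level representation (LRS) rather than the raw state. The head weights, `HIDDEN` size, and `N_AUX` count are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN, N_AUX = 8, 3   # hypothetical LRS width and number of aux modules

# Hypothetical per-task heads: each aux_reward_module_i is a linear map
# from the LRS to a scalar auxiliary reward.
aux_heads = [rng.normal(size=(HIDDEN, 1)) * 0.1 for _ in range(N_AUX)]

def aux_rewards(lrs):
    # LRS -> aux_reward_module_i -> Aux_reward_i, one scalar per module
    return [float(lrs @ W) for W in aux_heads]

lrs = np.tanh(rng.normal(size=HIDDEN))   # stand-in for the actor's hidden layer
rewards = aux_rewards(lrs)
```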
Critic update: mean_square_loss(Q, Q_obs) -> Critic -> (State, Action)  [the loss gradient flows back through the Critic]
Actor update: -Q -> Critic -> Action -> Actor -> State  [the gradient of -Q flows through the Critic's action input into the Actor]
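The two gradient flows above can be sketched with linear networks and hand-derived gradients; the learning rate, step count, and the fixed scalar `q_target` (standing in for the bootstrapped target r + gamma * Q'(s', a')) are all hypothetical simplifications:

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, LR = 4, 2, 0.02   # hypothetical dims and learning rate

W_actor = rng.normal(size=(S, A)) * 0.1        # hypothetical linear actor
W_critic = rng.normal(size=(S + A, 1)) * 0.1   # hypothetical linear critic

state = rng.normal(size=S)
q_target = 1.0   # stand-in for the bootstrapped target from the target networks

action = state @ W_actor
q_init = float(np.concatenate([state, action]) @ W_critic)

for _ in range(100):
    action = state @ W_actor
    sa = np.concatenate([state, action])
    q = float(sa @ W_critic)

    # Critic update: gradient of mean_square_loss(Q, Q_target) w.r.t. W_critic
    dW_critic = 2.0 * (q - q_target) * sa[:, None]
    W_critic -= LR * dW_critic

    # Actor update: the gradient of -Q flows through the critic's action input
    # (dQ/dAction = the critic weights on the action block), then into the actor.
    dQ_dA = W_critic[S:, 0]
    W_actor -= LR * (-np.outer(state, dQ_dA))   # gradient ascent on Q

action = state @ W_actor
q = float(np.concatenate([state, action]) @ W_critic)
```

With both updates running jointly, the critic tracks the fixed target while the actor pushes the action in the direction that increases Q, which is the same coupled dynamic DDPG uses with full networks and target copies.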