Implement "Observation learning" with DQFD

Tensorforce supports [DQFD](https://arxiv.org/abs/1704.03732) (Deep Q-learning from Demonstrations), which lets another agent act as a teacher by supplying demonstrations for us to learn from. However, this requires either parasite learning (i.e. our agent knows everything the other agent knows) or some other way of giving both agents the same input. That is hard because part of the input is the forecast, which is generated internally.
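
One possible way around the shared-input problem is to have the teacher log its internally generated forecast as part of every recorded state, so the learner trains on exactly what the teacher saw. The sketch below is library-free and does not use the Tensorforce API. `Demonstration`, `build_state`, and `large_margin_loss` are hypothetical names, the feature values are made up, and the default margin is an illustrative choice. The loss is the supervised large-margin term from the DQFD paper, which pushes the teacher's action to score at least a margin above every other action.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Demonstration:
    """One teacher transition (hypothetical record layout)."""
    state: List[float]       # exactly what the teacher saw, forecast included
    action: int              # discrete action the teacher chose
    reward: float
    next_state: List[float]
    terminal: bool


def build_state(external: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Join the shared external input with the teacher's internal forecast."""
    return list(external) + list(forecast)


def large_margin_loss(q_values: Sequence[float], expert_action: int,
                      margin: float = 0.8) -> float:
    """DQFD supervised term: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E).

    l(a_E, a) is 0 for the teacher's action and `margin` for any other action.
    """
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]


# Teacher side: store the forecast together with the external input.
demo = Demonstration(
    state=build_state([0.1, 0.2], forecast=[0.5]),
    action=1,
    reward=1.0,
    next_state=build_state([0.2, 0.3], forecast=[0.6]),
    terminal=False,
)

# Learner side: the Q-values for demo.state would come from the learner's
# network; these numbers are invented for illustration.
loss = large_margin_loss([1.0, 2.0, 0.5], demo.action)
```

The loss is zero only when the teacher's action already beats every other action by at least the margin. Storing the forecast inside `state` means the learner never has to reproduce the teacher's forecasting step, at the cost of depending on the teacher's forecast format.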