# From game theory to general information seeking

## Intro

Minimax assumes the worst-case scenario, in which the opponent plays optimally. This reduces the search to a single worst-case world model, but it can miss rewards that are available against weaker opponents (e.g., by setting traps).

If one knew the opponent's model with certainty (and the opponent's model of oneself), it could be used to predict the adversary's choices in a recursive minimax-style search. The simplest cases are:

1. `X` assumes `O` plays randomly, in which case the evaluation reduces to a simple dynamic-programming search over expected values (expectiminimax without adversary nodes, i.e., expectimax).

2. `X` knows that `O` assumes `X` plays randomly, in which case `X` can simulate how `O` would evaluate each position and play against that. This allows for deterministic traps. It assumes `O` would never realize `X` is not actually playing at random.

Curiosity question: if each player assumes that the other models them as a random player, does play converge back to the original minimax?

3. `X` knows that `O` uses a given heuristic instead of recursive minimax search (mixed models, where an adversary uses minimax for `n` iterations followed by a heuristic evaluation, can also be considered).
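
Scenario 1 can be sketched on a toy tree. The tree, its payoffs, and the names `value` and `best_move` are assumptions made for illustration, not part of any existing implementation. The only difference between the two models is how `O` nodes are backed up: a minimum for an optimal `O`, an average for a uniformly random `O`.

```python
# Hypothetical two-ply game tree: X picks a branch, then O picks a leaf.
# Leaves are payoffs for X; the numbers are made up for illustration.
TREE = [[3, 3], [0, 10]]

def value(node, x_to_move, opponent_model):
    """Value of `node` for X under an assumed model of O."""
    if isinstance(node, (int, float)):
        return node  # leaf: payoff for X
    child_values = [value(child, not x_to_move, opponent_model) for child in node]
    if x_to_move:
        return max(child_values)  # X maximises its own payoff
    if opponent_model == "optimal":
        return min(child_values)  # minimax: O minimises X's payoff
    return sum(child_values) / len(child_values)  # uniformly random O

def best_move(tree, opponent_model):
    """Index of X's best root move under the assumed model of O."""
    scores = [value(child, False, opponent_model) for child in tree]
    return scores.index(max(scores))

print(best_move(TREE, "optimal"))  # 0: the safe branch, guaranteed 3
print(best_move(TREE, "random"))   # 1: the risky branch, expected 5
```

Against an optimal `O` the second branch is worth 0, so minimax avoids it; against a random `O` it is worth 5 on average, so the expected-value search prefers it.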

In each case it should be possible (provided the game is flexible enough) for `X` to increase its expected payoff by deviating from the minimax-optimal policy, especially when it starts at a disadvantage.
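
A deterministic trap of the kind described in scenario 2 can be sketched as follows. The three-ply tree, its payoffs, and the helper names are hypothetical; the sketch assumes `O` evaluates each reply by applying its belief about `X` (best leaf, or average leaf) to the leaves that follow, and never revises that belief.

```python
from statistics import mean

# Hypothetical three-ply tree: X moves, O replies, X makes the final choice.
# TREE[x_move][o_reply] lists the payoffs X can pick from at the last ply.
TREE = {
    "safe": {"a": [2, 2], "b": [2, 2]},
    "trap": {"a": [9, -9, -9], "b": [1, 1, 1]},
}

def o_reply(replies, o_belief_about_x):
    """O picks the reply that minimises X's payoff under O's belief about X."""
    return min(replies, key=lambda r: o_belief_about_x(replies[r]))

def x_payoff(x_move, o_belief_about_x):
    """X's real payoff: O replies per its belief, then X takes its best leaf."""
    replies = TREE[x_move]
    return max(replies[o_reply(replies, o_belief_about_x)])

def best_first_move(o_belief_about_x):
    """X's best opening move, given what X knows about O's belief."""
    return max(TREE, key=lambda move: x_payoff(move, o_belief_about_x))

# O models X correctly (plain minimax): O expects X to take the max leaf.
print(best_first_move(max))   # safe
# O wrongly believes X picks its final leaf uniformly at random.
print(best_first_move(mean))  # trap
```

Under plain minimax the `trap` opening is worth only 1, so `X` plays `safe` for 2. When `O` averages over `X`'s leaves, reply `a` looks best to `O` (mean of -3), and `X` collects 9.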

**Section 1** provides simple demos for these three scenarios, with references to more extensive published work.

***

**Section 2** examines more complicated scenarios, where `X` does not know for sure what the model of `O` is, but might be able to infer it from `O`'s choices. In sufficiently complex games this would in theory motivate "adversary-testing" moves that sacrifice some efficiency or incur some risk to obtain extra information about the adversary.
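
One way to make the inference step concrete is a Bayesian update over candidate opponent models. The sketch below is an assumption-laden illustration: it considers only two hypothetical models (`"minimax"` and `"random"`), and it assumes each position has exactly one minimax-optimal move that a minimax `O` always plays.

```python
def update_belief(prior, likelihood):
    """Bayes' rule over a dict of candidate opponent models."""
    unnormalised = {model: prior[model] * likelihood[model] for model in prior}
    total = sum(unnormalised.values())
    return {model: weight / total for model, weight in unnormalised.items()}

def move_likelihood(played_optimal_move, n_legal_moves):
    """P(observed move | model) for the two hypothetical models of O."""
    return {
        # Assumption: a minimax O always plays the unique optimal move.
        "minimax": 1.0 if played_optimal_move else 0.0,
        # A random O plays every legal move with equal probability.
        "random": 1.0 / n_legal_moves,
    }

belief = {"minimax": 0.5, "random": 0.5}

# O finds the optimal move among 3 legal moves: belief shifts towards minimax.
belief = update_belief(belief, move_likelihood(True, 3))
print(belief)

# O then plays a suboptimal move: the minimax model is ruled out.
belief = update_belief(belief, move_likelihood(False, 3))
print(belief)
```

After the first observation the probability of the minimax model rises from 0.5 to about 0.75; a single suboptimal move then drives it to zero. A softer likelihood (for example, a minimax player that errs with small probability) would avoid ruling a model out on one observation.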

This provides an example of artificial-intelligence information seeking in a context where the agent has a perfect (deterministic or stochastic) model of the world, and it is obvious (to the programmer at least) which cues are valuable and when they should be collected and evaluated.
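
The value of a cue can be quantified as the expected reduction in uncertainty about the opponent model. The following sketch uses assumptions of its own: two candidate models (minimax or uniformly random), exactly one optimal move per position, and Shannon entropy as the uncertainty measure. The function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a belief with P(O is minimax) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def expected_information_gain(p_minimax, n_moves):
    """Expected entropy reduction from watching O choose among n_moves,
    exactly one of which is minimax-optimal."""
    # Probability of seeing the optimal move under the current belief.
    p_obs_best = p_minimax + (1 - p_minimax) / n_moves
    # Posterior P(minimax) after seeing the optimal move (Bayes' rule).
    post_best = p_minimax / p_obs_best
    # Any other move rules out the minimax model, leaving zero entropy.
    expected_posterior_entropy = p_obs_best * entropy(post_best)
    return entropy(p_minimax) - expected_posterior_entropy

for n_moves in (1, 3, 9):
    print(n_moves, round(expected_information_gain(0.5, n_moves), 3))
```

A forced reply (`n_moves = 1`) carries no information, while positions that offer `O` many ways to go wrong are more informative. Under these assumptions, an "adversary-testing" move is one that steers the game into such a position at an acceptable cost.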

A literature review is needed to assess what has already been done in this field.

***

**Section 3** explores model-based inference and information seeking in less constrained environments... NEED A GOOD LEARNING ENVIRONMENT FOR THIS!

A literature review is needed to assess what has already been done in this field.

***

**(Possibly unrelated)**

Genetic algorithm for meta-learning / exploration-exploitation based on simple learning models / information content cues




