# Monte Carlo Planner Design
We'll build a traditional search-based planning algorithm, the kind used in early game-playing programs. It's a place to start. We'll call it a "Monte Carlo" planner because it will probably expand the search tree neither breadth-first nor depth-first, but stochastically.

## Object model
This is a very rudimentary object model that we can use before delving into a toy problem to solve. Keep in mind that these objects represent what the AI *believes* about the world, not what is actually true in the world.

Here's the 30,000-ft view.
* The *Environment* is the "objective reality" in which the Agent lives. 
* An *Agent* is the organism/entity/being that devises a *Plan*. 
  * An Agent exists within the context of an *Environment*.
  * An Agent has a current *Frame*.
  * An Agent can cause certain specific changes in that Environment by way of *Effectors*.
  * An Agent can receive updates to its *Frame* by way of *Sensors*.
  * An Agent can receive a *Reward* from the *Environment*.
  * The Agent's job is to try to maximize this Reward.
* A *Frame* is a summary of the Agent's current belief about the state of the Environment. 
  * A Frame doesn't have to *accurately* represent the entire Environment. It only needs to be accurate enough to increase Reward above some more naive baseline. The complexity of a Frame is programmatically arbitrary, and systems of higher or lesser intelligence can be built by experimenting with different Frame structures.
  * A Frame's internal state gets set by the Agent after the Agent executes an Action. Upon executing an Action, the Agent consults its Sensors and sets the Frame's state accordingly.
  * A Frame must include the Reward received by the Agent upon executing an Action that resulted in entering that Frame.
    * The Reward can simply be an expected value, or it can be a magnitude combined with a probability.
  * A Frame contains a set of *Actions*. The Actions are mutually exclusive, i.e. only one Action can be executed at a time.
  * Multiple Frames can collapse into one another. That is, Frames can be observed to be equivalent, by virtue of representing equivalent states.
* A *Plan* is an object that describes a sequence of *Actions* and the resulting *Consequence* Frames.
  * The *Plan* always starts with the current *Frame*.
* An *Action* represents one or more commands sent to the Effectors.
  * An Action contains one or more *Consequences*.
  * An Action has an expected Reward value, based on some function of the expected Rewards of the states of its Consequences. Presumably, this function is simply a weighted sum.
  * It may be useful for the Action's Reward value to include a measurement of uncertainty.
* A *Consequence* represents the result of performing an Action within a Frame.
  * A Consequence has a probability.
  * A Consequence has a child Frame.
* Eventually, defining new Actions or new compilations of Frame states will count as Actions in and of themselves, thus facilitating higher-order learning. But that depends on exactly how the Frame represents the state. For now, Action and Frame definitions will be fixed and supplied by the programmer.
* The Agent contains an *ActionGenerator* that, given a Frame, randomly produces one or more Actions.
* The Agent contains a *ConsequenceGenerator* that, given a Frame and an Action, randomly produces one or more Consequences.
  * Upon producing these Consequences, the expected Reward of the Action may be updated.
  * Upon updating the Reward of the Action, the ConsequenceGenerator may update the expected Reward of the Frame that the Action came from. The Frame's Reward consists not only of an intrinsic Reward native to that Frame, but also a supplemental Reward consisting of the expected Reward of the best Action one can perform within that Frame. This way, a small temporary gain may be offset by horrible consequences later, and vice versa.
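
As a minimal sketch of the Reward bookkeeping described in the list above, the following assumes that an Action's expected Reward is the probability-weighted sum over its Consequences, and that a Frame's Reward is its intrinsic Reward plus the expected Reward of its best Action. The class and field names (`SketchFrame`, `intrinsic_reward`, and so on) are hypothetical and are not part of the planner's code. The recursion also assumes the Frame graph has no cycles, which may not hold once Frames collapse into one another.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchFrame:
    # Reward native to this frame (hypothetical field name).
    intrinsic_reward: float = 0.0
    # Actions available from this frame, indexed by an arbitrary key.
    actions: Dict[str, "SketchAction"] = field(default_factory=dict)

    def expected_reward(self) -> float:
        # Intrinsic reward plus the expected reward of the best action, if any.
        if not self.actions:
            return self.intrinsic_reward
        best = max(a.expected_reward() for a in self.actions.values())
        return self.intrinsic_reward + best


@dataclass
class SketchConsequence:
    probability: float
    frame: SketchFrame


@dataclass
class SketchAction:
    consequences: List[SketchConsequence] = field(default_factory=list)

    def expected_reward(self) -> float:
        # Probability-weighted sum over the child frames of the consequences.
        return sum(c.probability * c.frame.expected_reward()
                   for c in self.consequences)


# A risky action with a large payoff versus a safe action with a small one.
win = SketchFrame(intrinsic_reward=10.0)
lose = SketchFrame(intrinsic_reward=-5.0)
gamble = SketchAction([SketchConsequence(0.25, win),
                       SketchConsequence(0.75, lose)])
safe = SketchAction([SketchConsequence(1.0, SketchFrame(intrinsic_reward=1.0))])
root = SketchFrame(intrinsic_reward=0.5,
                   actions={"gamble": gamble, "safe": safe})

print(gamble.expected_reward())  # 0.25 * 10 + 0.75 * (-5) = -1.25
print(root.expected_reward())    # 0.5 + max(-1.25, 1.0) = 1.5
```

Here the tempting `gamble` Action scores below the `safe` one once its likely bad outcome is weighed in, so the root Frame's Reward reflects the safe choice.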

### Frames
A frame represents a currently or hypothetically true world state. A frame affords the performance of actions. Performing an action within the context of a frame can generate child frames, which contain the facts that are (or would be) true after the action. Each frame has a desirability score, and the planner's job is to maximize that score.

In [8]:
class Frame:
    def __init__(self):
        self.worldState = None
        
        # A collection of actions that can lead us to this frame.
        # Used for tailing back through a Plan in order to report a victory route
        # once a win condition is discovered.
        self.fromActions = {}
        
        # The action collection represents Action objects, indexed by their keys.
        self.actions = {}
        
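    # Compute a lookup key for this Frame, presumably based on the worldState.
    # Placeholder: assumes worldState has a stable repr(); equal states must
    # produce equal keys for Frames to collapse into one another.
    def key(self):
        return repr(self.worldState)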
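
One way the lookup key could support collapsing equivalent Frames is sketched below. It assumes, purely for illustration, that the world state is a flat dict with string keys and that a `FrameTable` (a hypothetical helper, not part of the planner) keeps one canonical Frame per key.

```python
class KeyedFrame:
    def __init__(self, world_state):
        # Assumption: world_state is a flat dict with string keys.
        self.world_state = world_state

    def key(self):
        # Sort the items so that insertion order does not affect the key.
        return repr(sorted(self.world_state.items()))


class FrameTable:
    def __init__(self):
        # Canonical frames, indexed by their keys.
        self.frames = {}

    def intern(self, frame):
        # Return the already-known frame with the same key, if there is one;
        # otherwise register this frame as the canonical one.
        return self.frames.setdefault(frame.key(), frame)


table = FrameTable()
a = table.intern(KeyedFrame({"x": 1, "y": 2}))
b = table.intern(KeyedFrame({"y": 2, "x": 1}))
print(a is b)  # the two equivalent states collapse into one Frame
```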
        
        

### Action Generator
The Action Generator can generate actions within the context of a frame. It generates a new random action every time it's called.

In [9]:
class Action:
    def __init__(self):
        # An Action contains one or more Consequences (see the object model).
        self.consequences = []

class ActionGenerator:
    def __init__(self):
        pass
    
    # Placeholder: should return a new random Action for the given frame.
    def generateAction(self, frame):
        return None
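
As a rough illustration of what a concrete generator might look like, the sketch below draws uniformly from a fixed, programmer-supplied list of candidate actions and skips those already recorded in the frame's `actions` collection. `ListActionGenerator`, its `candidates` list, and the use of plain strings as actions are assumptions made for this example only.

```python
import random
from types import SimpleNamespace


class ListActionGenerator:
    def __init__(self, candidates, seed=None):
        # Hypothetical: the full list of actions is fixed and known up front.
        self.candidates = list(candidates)
        self.rng = random.Random(seed)

    def generate_action(self, frame):
        # Skip actions the frame already knows about.
        tried = getattr(frame, "actions", {})
        untried = [c for c in self.candidates if c not in tried]
        # Return None once every candidate has been generated for this frame.
        return self.rng.choice(untried) if untried else None


# A stand-in frame that has already explored the "left" action.
frame = SimpleNamespace(actions={"left": object()})
generator = ListActionGenerator(["left", "right", "wait"], seed=0)
action = generator.generate_action(frame)
print(action)  # either "right" or "wait"
```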

### Consequence Generator
Given a frame and an action, the Consequence Generator generates a consequence, i.e. a new frame that may result from performing that action within that frame.

In [None]:
class ConsequenceGenerator:
    def __init__(self):
        pass
    
    def generateConsequence(self, frame, action):
        return None
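
A minimal sketch of one possible generator follows. It assumes a hand-written transition table that maps a (frame key, action) pair to a list of (probability, child frame key) outcomes, and samples one outcome in proportion to its probability. The table contents and the name `TableConsequenceGenerator` are hypothetical.

```python
import random

# Hypothetical transition model:
# (frame key, action) -> list of (probability, child frame key).
TRANSITIONS = {
    ("start", "flip"): [(0.5, "heads"), (0.5, "tails")],
    ("start", "wait"): [(1.0, "start")],
}


class TableConsequenceGenerator:
    def __init__(self, transitions, seed=None):
        self.transitions = transitions
        self.rng = random.Random(seed)

    def generate_consequence(self, frame_key, action):
        outcomes = self.transitions.get((frame_key, action))
        if not outcomes:
            # Nothing is known about this action in this frame.
            return None
        # Sample a single outcome, weighted by its probability.
        weights = [probability for probability, _ in outcomes]
        return self.rng.choices(outcomes, weights=weights, k=1)[0]


generator = TableConsequenceGenerator(TRANSITIONS, seed=0)
print(generator.generate_consequence("start", "wait"))  # (1.0, 'start')
print(generator.generate_consequence("start", "flip"))  # heads or tails, p=0.5
```

Calling the generator repeatedly for the same frame and action is what makes the expansion stochastic: likely outcomes are discovered sooner, unlikely ones later.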

In [12]:
import random
import string
# Generate a random 32-character alphanumeric string.
rndstr = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(32))
print(rndstr)