# Problem Statement
Consider a Hero (Agent) trying to escape the matrix from the evil villain. The matrix is a surprising Environment that the Hero (Agent) must solve in order to escape.

The Environment models a Maze Problem, but with buildings:
- Here we will have a 2D matrix as our observation space
```
[Buildings 1, 2, 3, 4, 5
            [1, 0, 0, 0, 0], --> Floor 5
            [0, 0, 0, 1, 0], --> Floor 4
            [0, 0, 0, 0, 0], --> Floor 3
            [0, 1, 0, 0, 0], --> Floor 2
            [0, 0, 1, 0, 1], --> Floor 1
] 
```
- Each column is a building with floors 1 to 5, numbered from bottom to top
- Each 0 represents a closed door and each 1 represents an open door that the Hero (Agent) must reach
- At each step, the Hero must move to the floor of the first building (the leftmost column) where the door is open, i.e. where the value is 1
- This has to be repeated for N steps; only then will our Hero (Agent) be set free

- Example: suppose the Hero has to solve 3 steps. The 2D building matrix then evolves as follows:

```
Current Observation: 
 [[0 1 0 0 1]
 [0 0 0 1 0]
 [1 0 0 0 0]  --> We send the Hero to Floor 3 because the Door is Open, i.e. 1
 [0 0 1 0 0]
 [0 0 0 0 0]]
>>> Moving to Floor 3 and Accessing Door!

Current Observation: 
 [[1 0 0 1 0] --> We send the Hero to Floor 5 because the Door is Open, i.e. 1
 [0 0 1 0 0]
 [0 0 0 0 1]
 [0 1 0 0 0]
 [0 0 0 0 0]]
>>> Moving to Floor 5 and Accessing Door!

Current Observation: 
 [[0 0 1 0 0]
 [0 1 0 0 0]
 [0 0 0 1 0]
 [1 0 0 0 1] --> We send the Hero to Floor 2 because the Door is Open, i.e. 1
 [0 0 0 0 0]]
>>> Moving to Floor 2 and Accessing Door!

Escaped from the Evil Villain! Reward: 3.0
```
- In the first step, the Hero has to move to floor 3 because that is where the first building has a 1
- In the second step, he has to move to floor 5 because that is where the first building has a 1
- In the third step, he has to move to floor 2 because that is where the first building has a 1
    - Notice how, from step 2 onwards, each step adds a new building with a door
    - The new door is placed randomly: at every step the first building is taken away and a new building, with a door on a random floor, is added to the end of the buildings
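
The per-step update described above can be sketched in isolation. This is a minimal illustration, not the notebook's Environment: the helper name `shift_buildings` is hypothetical, and it builds a new array rather than updating one in place.

```python
import random
import numpy as np

def shift_buildings(obs):
    """Drop the first building (column 0) and append a new building
    with exactly one open door on a random floor.
    Hypothetical helper, for illustration only."""
    n_floors = obs.shape[0]
    new_building = np.zeros((n_floors, 1), dtype=obs.dtype)
    # random.randint is inclusive on both ends
    new_building[random.randint(0, n_floors - 1), 0] = 1
    return np.hstack([obs[:, 1:], new_building])

# First observation from the example above
obs = np.array([
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
next_obs = shift_buildings(obs)
print(next_obs)
```

The first four columns of `next_obs` are the old columns 2 to 5, which is why the second observation in the example starts with a door on Floor 5.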



**Our objective is to provide the floor number that tells our Hero on which floor he will find the open door to escape. If we do not reach the right floor, our Hero will be caught and it is Game Over!**
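
Because the matrix is printed with Floor 5 on top, a row index is not the same thing as a floor number. The sketch below shows one way to convert between the two, assuming exactly one open door in the first building; the function name `open_floor` is hypothetical and not part of the notebook.

```python
import numpy as np

def open_floor(obs):
    """Return the floor number (1 = bottom) of the open door in the
    first building. Assumes the first column contains exactly one 1."""
    n_floors = obs.shape[0]
    first_building = obs[:, 0]
    # argmax gives the row index of the single 1 (row 0 is the top floor)
    row = int(np.argmax(first_building))
    return n_floors - row

# First observation from the example above
obs = np.array([
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
print(open_floor(obs))  # 3
```

Row 2 of a 5-floor building maps to Floor 5 - 2 = 3, matching the first step of the example.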

In [None]:
import random
import numpy as np

class Environment:
    def __init__(self):
        # Number of time steps the Agent is still allowed to interact with the Environment
        self.total_time_steps = 0
        # Grid of doors; populated by reset()
        self.obs = []
        # Floor the Agent currently stands on (1 = bottom, 5 = top)
        self.current_agent_floor = 5

    def reset(self):
        # Reset the number of time steps available to the Agent
        # The Agent must pick the correct floor 10 times to escape
        self.total_time_steps = 10
        # Rows are floors (row 0 is Floor 5, row 4 is Floor 1) and columns are buildings
        self.obs = np.zeros((5, 5), dtype=int)
        # Open exactly one door, on a random floor, in each building
        for col_index in range(5):
            self.obs[random.randint(0, 4), col_index] = 1
        return self.get_observation()

    def get_observation(self):
        # Return the current 5x5 observation of the Environment:
        # rows are floors (top floor first) and columns are buildings
        return self.obs

    def get_actions(self):
        # Return the set of actions that the Agent can perform, which in this case is the floor numbers it can move to
        return [1, 2, 3, 4, 5]

    def is_done(self):
        # Return True once all the time steps are exhausted by the Agent,
        # i.e. it indicates the end of the Episode to the Agent
        return self.total_time_steps == 0

    def move_to_floor(self, to_floor):
        self.current_agent_floor = to_floor
        print(f">>> Moving to Floor {self.current_agent_floor} and Accessing Door!")
        # Close the door on the chosen floor of the first building (row 0 is Floor 5)
        self.obs[5 - self.current_agent_floor, 0] = 0

    def action(self, to_floor = None):
        # Display the current observation space
        print("\nCurrent Observation: \n", self.obs)
        
        self.move_to_floor(to_floor=to_floor)

        # If a door is still open in the first building, the Agent picked the wrong floor
        if self.obs[:, 0].any():
            raise Exception(f"Oh No! You Opened the Wrong Door, Game Over! :(\n\n{self.obs}")
        else:
            reward = 1.0
        # Decrement the total time steps
        self.total_time_steps -= 1

        # Update the observation space
        # Shift every building one position to the left (the first building wraps to the end)
        self.obs = self.obs[:, [1, 2, 3, 4, 0]]
        # Clear the last column so it becomes a new building
        self.obs[:, 4] = 0
        # Open a door on a random floor of the new last building
        self.obs[random.randint(0, 4), 4] = 1

        # Return the reward for this step (1.0 for a correct floor)
        return reward
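
The wrong-door check in `action` works by closing the door on the chosen floor and then testing whether any door in the first building is still open. The sketch below isolates that idea under the same 5x5 layout; the function name `is_correct_floor` is hypothetical, and unlike the Environment it works on a copy instead of mutating the observation.

```python
import numpy as np

def is_correct_floor(obs, to_floor):
    """Return True if `to_floor` is where the first building's door is open.
    Hypothetical helper mirroring the check used by the Environment."""
    n_floors = obs.shape[0]
    first_building = obs[:, 0].copy()
    # Close the door on the chosen floor (row 0 is the top floor)
    first_building[n_floors - to_floor] = 0
    # Any remaining 1 means the open door was on a different floor
    return not first_building.any()

obs = np.array([
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
print(is_correct_floor(obs, 3))  # True
print(is_correct_floor(obs, 1))  # False
```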

## Code Challenge

In [None]:
# The Agent is an entity that enforces a policy, which decides
# the action for each step using its observation of the Environment
class Agent:
    def __init__(self):
        # Declare and Initialize the total reward value as 0
        self.total_reward = 0
        # Declare and Initialize the floor to which the Agent has to move as 1
        self.move_to_floor = 1

    def step(self, env):
        # This is the main function of the Agent, where the Environment's challenge is solved
        # MAIN LOGIC TO SOLVE THE CHALLENGE
        self.set_floor_to_move(env)
        # Move to the chosen floor in the Environment
        reward = env.action(to_floor=self.move_to_floor)
        # Total reward is incremented
        self.total_reward += reward
    
    def set_floor_to_move(self, env):
        # **** Write code here, to set the value of `self.move_to_floor` *****
        # Hint: Remove pass
        pass


# Initialize the Environment and Agent objects
env = Environment()
agent = Agent()

# Reset the Environment to its initial state, which returns the observation
obs = env.reset()
# Announce the start of the game
print("**** BEGIN GAME ****")
# The Agent keeps taking steps until the Environment does not allow it to take any more
# Note: is_done() is checked before every step, so the Agent never acts
#  after the time steps have run out
while not env.is_done():
    agent.step(env)

# Display the total reward
print("\nEscaped from the Evil Villain! Reward:", agent.total_reward)

**** BEGIN GAME ****

Current Observation: 
 [[0 0 0 0 1]
 [0 0 0 1 0]
 [1 0 1 0 0]
 [0 0 0 0 0]
 [0 1 0 0 0]]
>>> Moving to Floor 1 and Accessing Door!


Exception: ignored

## Solution

In [None]:
# The Agent is an entity that enforces a policy, which decides
# the action for each step using its observation of the Environment
class Agent:
    def __init__(self):
        # Declare and Initialize the total reward value as 0
        self.total_reward = 0
        # Declare and Initialize the floor to which the Agent has to move as 1
        self.move_to_floor = 1

    def step(self, env):
        # This is the main function of the Agent, where the Environment's challenge is solved
        # MAIN LOGIC TO SOLVE THE CHALLENGE
        self.set_floor_to_move(env)
        # Move to the chosen floor in the Environment
        reward = env.action(to_floor=self.move_to_floor)
        # Total reward is incremented
        self.total_reward += reward
    
    def set_floor_to_move(self, env):
        # Get the current observation space
        current_obs = env.get_observation()
        # Reverse the first building's column so that index 0 is Floor 1,
        # then find the index of the single open door
        ((floor_index_where_value_is_one, ), ) = np.where(current_obs[:, 0][::-1] == 1)
        # Floors are numbered 1-5, so add 1 to the zero-based index
        self.move_to_floor = floor_index_where_value_is_one + 1

# Initialize the Environment and Agent objects
env = Environment()
agent = Agent()

# Reset the Environment to its initial state, which returns the observation
obs = env.reset()
# Announce the start of the game
print("**** BEGIN GAME ****")
# The Agent keeps taking steps until the Environment does not allow it to take any more
# Note: is_done() is checked before every step, so the Agent never acts
#  after the time steps have run out
while not env.is_done():
    agent.step(env)

# Display the total reward
print("\nEscaped from the Evil Villain! Reward:", agent.total_reward)

**** BEGIN GAME ****

Current Observation: 
 [[0 0 0 0 0]
 [0 0 0 1 0]
 [1 1 1 0 0]
 [0 0 0 0 1]
 [0 0 0 0 0]]
>>> Moving to Floor 3 and Accessing Door!

Current Observation: 
 [[0 0 0 0 0]
 [0 0 1 0 0]
 [1 1 0 0 0]
 [0 0 0 1 1]
 [0 0 0 0 0]]
>>> Moving to Floor 3 and Accessing Door!

Current Observation: 
 [[0 0 0 0 0]
 [0 1 0 0 0]
 [1 0 0 0 0]
 [0 0 1 1 1]
 [0 0 0 0 0]]
>>> Moving to Floor 3 and Accessing Door!

Current Observation: 
 [[0 0 0 0 0]
 [1 0 0 0 0]
 [0 0 0 0 0]
 [0 1 1 1 0]
 [0 0 0 0 1]]
>>> Moving to Floor 4 and Accessing Door!

Current Observation: 
 [[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [1 1 1 0 1]
 [0 0 0 1 0]]
>>> Moving to Floor 2 and Accessing Door!

Current Observation: 
 [[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [1 1 0 1 0]
 [0 0 1 0 1]]
>>> Moving to Floor 2 and Accessing Door!

Current Observation: 
 [[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [1 0 1 0 0]
 [0 1 0 1 1]]
>>> Moving to Floor 2 and Accessing Door!

Current Observation: 
 [[0 0 0 0 1]
 [0 0 0 0 0]
 [0 0