<a href="https://colab.research.google.com/github/lcbjrrr/quantai/blob/main/02_FIAP_Ext_RL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reinforcement Learning (Intro)

Reinforcement learning is a type of machine learning where an agent learns to make better decisions through trial and error. In this setup, the *agent* acts as the learner or decision maker, while the *environment* represents everything the agent interacts with. As the agent explores the environment, it observes the current *state*—a snapshot of the situation—and selects an *action* in response. The environment then returns a new state and provides a *reward*, which signals how good or bad the action was. Over time, the agent uses these rewards to improve its decision-making and discover strategies that lead to better outcomes. Think of it like training a robot to ride an elevator: every time it presses the right button and reaches the desired floor, it earns a reward and learns to repeat that success.

![](https://miro.medium.com/v2/resize:fit:1400/1*suRqGq0gWXQV2bcMGAhA_A.jpeg)

In [15]:
import random

class Environment:
    def __init__(self):
        self.state = 0

    def get_state(self):
        return self.state

    def step(self, action):
        """Take an action and return (next_state, reward, done)."""
        reward = self.calc_reward(action)
        # In this toy environment the reward doubles as the movement:
        # +1 shifts the state to the right, -1 shifts it to the left.
        self.state = self.state + reward
        done = self.state == 3  # The goal is state 3
        return (self.state, reward, done)

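    def calc_reward(self, action):
        if action == "R":   # (R)ight
            return 1        # Reward of 1
        elif action == "L": # (L)eft
            return -1       # Penalty of -1
        else:
            # Fail loudly instead of silently returning None
            raise ValueError(f"Unknown action: {action!r}")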


class Agent:
    def __init__(self):
        self.actions = ["L", "R"]

    def choose_action(self):
        return random.choice(self.actions)


In [16]:
env = Environment()
agent = Agent()

done = False
steps = 0

while not done and steps < 30:
    state = env.get_state()
    action = agent.choose_action()
    next_state, reward, done = env.step(action)

    print(f"Step {steps}: State={state}, Action={action}, Next State={next_state}, Reward={reward}")
    steps += 1

if done:
    print("🎉 Agent reached the goal!")
else:
    print("❌ Agent did not reach the goal in time.")


Step 0: State=0, Action=L, Next State=-1, Reward=-1
Step 1: State=-1, Action=R, Next State=0, Reward=1
Step 2: State=0, Action=R, Next State=1, Reward=1
Step 3: State=1, Action=L, Next State=0, Reward=-1
Step 4: State=0, Action=L, Next State=-1, Reward=-1
Step 5: State=-1, Action=L, Next State=-2, Reward=-1
Step 6: State=-2, Action=R, Next State=-1, Reward=1
Step 7: State=-1, Action=R, Next State=0, Reward=1
Step 8: State=0, Action=R, Next State=1, Reward=1
Step 9: State=1, Action=R, Next State=2, Reward=1
Step 10: State=2, Action=R, Next State=3, Reward=1
🎉 Agent reached the goal!
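
The agent above picks actions at random and never looks at the reward it receives. As a rough illustration of how rewards could feed back into decisions, here is a minimal sketch in which the agent keeps a running average of the reward earned by each action and usually picks the best one. The names `LineEnvironment`, `AveragingAgent`, `learn`, `run_episode`, and the `epsilon` value are assumptions made for this example, not part of the original notebook. It is also not a full reinforcement learning algorithm such as Q-learning; it only works here because the reward in this toy environment does not depend on the state.

```python
import random


class LineEnvironment:
    """Same dynamics as the Environment above: start at 0, goal at state 3."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1 if action == "R" else -1
        self.state += reward
        return self.state, reward, self.state == 3


class AveragingAgent:
    """Hypothetical agent that tracks the average reward of each action."""

    def __init__(self, epsilon=0.1):
        self.actions = ["L", "R"]
        self.epsilon = epsilon  # Exploration rate (assumed value)
        self.totals = {a: 0.0 for a in self.actions}
        self.counts = {a: 0 for a in self.actions}

    def average(self, action):
        # Actions that were never tried default to an average of 0.0
        if self.counts[action] == 0:
            return 0.0
        return self.totals[action] / self.counts[action]

    def choose_action(self):
        # Explore with probability epsilon, otherwise exploit the best average
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=self.average)

    def learn(self, action, reward):
        # Update the running totals used to compute the averages
        self.totals[action] += reward
        self.counts[action] += 1


def run_episode(agent, max_steps=30):
    """Run one episode; return the number of steps taken, or None on timeout."""
    env = LineEnvironment()
    for step in range(1, max_steps + 1):
        action = agent.choose_action()
        _, reward, done = env.step(action)
        agent.learn(action, reward)
        if done:
            return step
    return None


agent = AveragingAgent(epsilon=0.1)
print("Steps to reach the goal:", run_episode(agent))
print("Average reward per action:", {a: agent.average(a) for a in agent.actions})
```

With a small `epsilon`, the agent typically tries `"L"` once, sees the penalty, and then favors `"R"` for the rest of the episode, so it tends to reach the goal in far fewer steps than the random agent.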


## Activity: Elevator

Design a reinforcement learning solution where an agent simulates a person trying to reach a desired floor using an elevator. The user will input three values: the number of floors in the building, the starting floor, and the target floor. At each step the agent can choose to move either “up” or “down,” and the episode continues until the agent arrives at the destination. Reaching the target floor yields a reward of +1, while all other movements result in a reward of 0. The agent should learn to navigate efficiently, avoiding invalid moves beyond the building limits, and use the environment’s feedback to reach the goal. Students are expected to implement an environment class and an agent class, define the decision logic, and simulate an episode with this setup.
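
As one possible starting point, the environment half of the activity could look like the sketch below. It follows the same `step` contract as the `Environment` class earlier in the notebook, returning `(next_state, reward, done)`. Several details are assumptions, since the activity text does not specify them: the class name `ElevatorEnvironment`, floors numbered from 1 to `num_floors`, and the rule that a move beyond the building limits leaves the agent where it is. The agent class, the decision logic, and reading the three values from the user are left to implement.

```python
class ElevatorEnvironment:
    """Hypothetical environment for the elevator activity."""

    def __init__(self, num_floors, start_floor, target_floor):
        # Assumption: floors are numbered 1..num_floors
        for floor in (start_floor, target_floor):
            if not 1 <= floor <= num_floors:
                raise ValueError(f"Floor {floor} is outside 1..{num_floors}")
        self.num_floors = num_floors
        self.state = start_floor
        self.target_floor = target_floor

    def get_state(self):
        return self.state

    def step(self, action):
        """Take an action and return (next_state, reward, done)."""
        if action == "up":
            next_floor = self.state + 1
        elif action == "down":
            next_floor = self.state - 1
        else:
            raise ValueError(f"Unknown action: {action!r}")

        # Assumption: an invalid move beyond the limits keeps the agent in place
        if 1 <= next_floor <= self.num_floors:
            self.state = next_floor

        done = self.state == self.target_floor
        reward = 1 if done else 0  # +1 only on arrival, 0 otherwise
        return self.state, reward, done


env = ElevatorEnvironment(num_floors=5, start_floor=2, target_floor=4)
for action in ["down", "down", "up", "up", "up"]:
    print(action, env.step(action))
```

This sketch does not treat a start floor equal to the target floor as a special case; a complete solution may want to check for that before the first step.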