# Environment Demo

In [47]:
import importlib
import environment 
importlib.reload(environment)

import numpy as np
import time

## Robot features

- 25 cm diameter
- compass
- 360° vision sensor with object recognition in range (50 cm to 100 cm)
- communication with other robots
- ability to pick up objects (when the robot is at the same position as the object)
- holonomic motion (in every direction)
- maximum velocity: 200 cm/s
- maximum acceleration: 400 cm/s²

In [68]:
f"{environment.SIMULATION_ROBOT_SIZE}, {environment.SIMULATION_SENSOR_RANGE}, {environment.SIMULATION_MAX_DISTANCE}"

'1.0, 2.0, 2.0'

The environment is a continuous 2D arena with no physics simulation. A robot can move in any direction and cover any distance up to a defined maximum per step. It can also pick up an object when the object is directly underneath it, and deposit the object it carries when it reaches an edge.
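As a rough illustration of the movement rule, the sketch below applies one (distance, direction) move. The helper name `move`, the angle convention (degrees, 0 along +x, counter-clockwise), and the clamping to the arena bounds are all assumptions for illustration; the real `environment` module may use a different convention. The defaults 2.0 and 20.0 are the values the notebook prints for the maximum distance and arena size.

```python
import numpy as np

def move(position, distance, direction_deg, max_distance=2.0, arena_size=20.0):
    """Return the position after one holonomic move (hypothetical helper)."""
    # Clip the requested distance to the per-step maximum.
    distance = float(np.clip(distance, 0.0, max_distance))
    # Assumed convention: direction in degrees, 0 along +x, counter-clockwise.
    theta = np.deg2rad(direction_deg)
    step = distance * np.array([np.cos(theta), np.sin(theta)])
    # Assumed behaviour: the robot cannot leave the square arena.
    return np.clip(np.asarray(position, dtype=float) + step, 0.0, arena_size)
```

For example, `move([5, 5], 10, 90)` caps the requested distance at 2.0 before moving.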

The robots are equipped with sensors that identify nearby entities. A "neighbor" is a tuple of the object type, the distance to it, and its relative direction. Each robot maintains a list of such tuples with a fixed, predefined number of entries.
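The neighbor list described above could be built along these lines. This is only a sketch: the function name `sense_neighbors`, the nearest-first ordering, the zero padding of unused rows, and the angle convention (degrees in [0, 360), measured from +x) are assumptions, not the environment's confirmed behaviour.

```python
import numpy as np

def sense_neighbors(agent_pos, entity_pos, entity_types, n_neighbors=3, sensor_range=2.0):
    """Build a fixed-size (n_neighbors, 3) array of (type, distance, direction) rows."""
    agent_pos = np.asarray(agent_pos, dtype=float)
    entity_pos = np.asarray(entity_pos, dtype=float).reshape(-1, 2)
    entity_types = np.asarray(entity_types, dtype=float)
    offsets = entity_pos - agent_pos
    distances = np.linalg.norm(offsets, axis=1)
    # Assumed convention: angle in degrees in [0, 360), measured from +x.
    directions = np.degrees(np.arctan2(offsets[:, 1], offsets[:, 0])) % 360.0
    # Keep only entities within sensor range, nearest first.
    in_range = [i for i in np.argsort(distances) if distances[i] <= sensor_range]
    neighbors = np.zeros((n_neighbors, 3))  # unused rows stay zero (assumed padding)
    for row, i in enumerate(in_range[:n_neighbors]):
        neighbors[row] = (entity_types[i], distances[i], directions[i])
    return neighbors
```

A fixed-size array keeps the observation shape constant regardless of how many entities are actually in range, which is convenient for learning algorithms that expect fixed-size inputs.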

## Arena

The arena is 5 m × 5 m and contains robots and colored objects. Its side length in simulation units is:

In [69]:
environment.SIMULATION_ARENA_SIZE 

20.0
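Comparing the robot features with the constants printed above suggests a scale of 25 cm per simulation unit: a 25 cm robot has size 1.0 and a 5 m arena has size 20.0. The sketch below makes that conversion explicit. `CM_PER_UNIT` and both helper names are hypothetical and not part of the `environment` module.

```python
# Assumption inferred from the notebook: 25 cm robot <-> size 1.0, 5 m arena <-> size 20.0.
CM_PER_UNIT = 25.0

def units_to_cm(value):
    """Convert a length in simulation units to centimetres."""
    return value * CM_PER_UNIT

def cm_to_units(value_cm):
    """Convert a length in centimetres to simulation units."""
    return value_cm / CM_PER_UNIT

print(units_to_cm(20.0))  # arena side: 500.0 cm
print(units_to_cm(2.0))   # sensor range and max step distance: 50.0 cm
```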

## Objective

The objective is a list of (color_id, edge) pairs. The robots must pick up the objects of each listed color and deposit them at the matching edge of the arena, which serves as the deposit area for that color.

In [70]:
# Example
[(environment.RED, environment.NORTH_EDGE), (environment.GREEN, environment.SOUTH_EDGE)]

[(3, 180), (5, 0)]

In [71]:
print(f"""Rewards:
movement: + or - the distance covered for a good or bad move
correct pick up: {environment.REWARD_CORRECT_PICK}
wrong pick up: {environment.REWARD_WRONG_PICK}
correct drop down: {environment.REWARD_CORRECT_DROP}
wrong drop down: {environment.REWARD_WRONG_DROP}
""")

Rewards:
movement: + or - the distance covered for a good or bad move
correct pick up: 5
wrong pick up: -5
correct drop down: 10
wrong drop down: -10
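To make the pick-up and drop rewards concrete, here is a minimal sketch of how an objective list could be scored. The numeric values are the ones printed in this notebook. The functions `pick_reward` and `drop_reward` and the exact rules (a pick is "correct" if the color appears in the objective, a drop is "correct" if the (color, edge) pair appears in it) are assumptions, not the environment's actual implementation.

```python
RED, GREEN = 3, 5                  # color ids as printed in the objective example
NORTH_EDGE, SOUTH_EDGE = 180, 0    # edge ids as printed in the objective example
REWARD_CORRECT_PICK, REWARD_WRONG_PICK = 5, -5
REWARD_CORRECT_DROP, REWARD_WRONG_DROP = 10, -10

def pick_reward(color, objective):
    """Assumed rule: picking up a color that appears in the objective is correct."""
    wanted_colors = {c for c, _ in objective}
    return REWARD_CORRECT_PICK if color in wanted_colors else REWARD_WRONG_PICK

def drop_reward(color, edge, objective):
    """Assumed rule: a drop is correct only at the edge paired with that color."""
    return REWARD_CORRECT_DROP if (color, edge) in objective else REWARD_WRONG_DROP

objective = [(RED, NORTH_EDGE), (GREEN, SOUTH_EDGE)]
print(pick_reward(RED, objective))              # 5
print(drop_reward(RED, SOUTH_EDGE, objective))  # -10
```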



## Environment initialization

In [48]:
initial_setting = {
    'agents': np.array([[0, 5], [0, 10], [0, 15]], dtype=float),
    'blocks': np.array([[5, 15], [13, 5], [16, 4], [11, 12]], dtype=float),
    'colors': np.array([environment.RED, environment.RED, environment.BLUE, environment.RED], dtype=int)
}
env = environment.Environment(objective=[(environment.RED, environment.NORTH_EDGE)],
                   size=environment.SIMULATION_ARENA_SIZE, 
                   n_agents=3, 
                   n_blocks=4,
                   n_neighbors=3,
                   sensor_range=environment.SIMULATION_SENSOR_RANGE,
                   sensor_angle=360,
                   max_distance_covered_per_step=environment.SIMULATION_MAX_DISTANCE,
                   sensitivity=0.5,
                   initial_setting=initial_setting)
initial_state, _ = env.reset() # Initial state
env.print_env()
initial_state

. . . . . . 0 . . . . . . 1 . . . . . 2 . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . O . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . O . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . O . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .

[{'neighbors': array([[  1.,   0., 180.],
         [  0.,   0.,   0.],
         [  0.,   0.,   0.]]),
  'carrying': -1},
 {'neighbors': array([[  1.,   0., 180.],
         [  0.,   0.,   0.],
         [  0.,   0.,   0.]]),
  'carrying': -1},
 {'neighbors': array([[  1.,   0., 180.],
         [  0.,   0.,   0.],
         [  0.,   0.,   0.]]),
  'carrying': -1}]
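The state is a list with one dictionary per agent. A small helper like the one below can make it easier to read. It is a sketch under assumptions: `describe_observation` is a hypothetical name, all-zero neighbor rows are treated as padding, and a `carrying` value of -1 is read as "not carrying anything".

```python
import numpy as np

def describe_observation(observation):
    """Return human-readable lines for a list of per-agent observation dicts."""
    lines = []
    for agent_id, agent in enumerate(observation):
        carrying = agent["carrying"]
        # Assumption: -1 means the agent carries nothing.
        status = "nothing" if carrying == -1 else f"value {carrying}"
        lines.append(f"agent {agent_id}: carrying {status}")
        for kind, distance, direction in np.asarray(agent["neighbors"]):
            if kind == 0 and distance == 0 and direction == 0:
                continue  # assumed to be zero padding for an empty slot
            lines.append(
                f"  type {int(kind)} at distance {distance:g}, direction {direction:g} deg"
            )
    return lines
```

Calling `print("\n".join(describe_observation(initial_state)))` would then list each agent's non-empty neighbor rows.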

## Testing

In [65]:
action = env.action_space.sample()
# action = ([1, 180], [0, 0], [0.9, 90])
next_state, reward, done, _, _ = env.step(action)
env.print_env()
reward

. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . 1 . . . . . . . . 2 .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . 0 . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . O . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . O . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . O . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .

0.0

In [66]:
env.agents_location

array([[ 3.28,  6.55],
       [ 0.87, 11.08],
       [ 0.79, 17.92]])

## Random solver

In [58]:
steps = 0
total_reward = 0
done = False
# Sample random actions until the environment reports that the episode is done.
while not done:
    action = env.action_space.sample()
    next_state, reward, done, _, info = env.step(action)
    steps += 1
    total_reward += reward

f"Done in {steps} steps with reward {total_reward}, info: {info}"

"Done in 6233 steps with reward 323.5618995171115, info: {'completed': [(1, 3, 0), (0, 3, 2), (3, 3, 1)]}"
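A random policy can take thousands of steps, so it may help to wrap the loop in a function with a step cap. The sketch below assumes the environment follows the Gymnasium convention that the notebook's unpacking suggests: `reset()` returns `(state, info)` and `step()` returns `(state, reward, terminated, truncated, info)`. The names `run_episode`, `policy`, and `max_steps` are hypothetical.

```python
def run_episode(env, policy=None, max_steps=10_000):
    """Run one episode and return (steps, total_reward, info).

    Falls back to random actions when no policy is given.
    """
    state, info = env.reset()
    steps = 0
    total_reward = 0.0
    while steps < max_steps:
        action = policy(state) if policy is not None else env.action_space.sample()
        # Assumed Gymnasium-style 5-tuple; the fourth value is taken to be `truncated`.
        state, reward, terminated, truncated, info = env.step(action)
        steps += 1
        total_reward += reward
        if terminated or truncated:
            break
    return steps, total_reward, info
```

With the environment above this would be called as `run_episode(env)`. The step cap guarantees the loop ends even if the random policy never completes the objective.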

## Profiling

In the run below, 3000 steps take about 1.8 s, i.e. roughly 1,670 steps per second.

In [67]:
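total_time = 0
for i in range(3000):
    action = env.action_space.sample()
    # Time only env.step; action sampling is excluded from the measurement.
    # perf_counter is a monotonic, high-resolution clock suited to benchmarks.
    start = time.perf_counter()
    next_state, reward, done, _, _ = env.step(action)
    end = time.perf_counter()
    total_time += end - start
total_time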

1.7965059280395508
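The measured time converts to a throughput figure by dividing the step count by the elapsed seconds. The helper below is a sketch of a reusable version. `steps_per_second` and `step_fn` are hypothetical names, and unlike the cell above it times everything done inside `step_fn`, including action sampling if that is part of the callable.

```python
import time

def steps_per_second(step_fn, n_steps=3000):
    """Call step_fn n_steps times and return the measured calls per second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Same arithmetic applied to the timing recorded above.
print(round(3000 / 1.7965059280395508))  # 1670
```

With the environment from this notebook, a call could look like `steps_per_second(lambda: env.step(env.action_space.sample()))`.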