# ML-Agents: Open a UnityEnvironment
<img src="https://github.com/Unity-Technologies/ml-agents/blob/release_18_docs/docs/images/image-banner.png?raw=true" align="middle" width="435"/>

## References:
- https://github.com/Unity-Technologies/ml-agents/blob/release_18_docs/docs/Python-API.md

## Setup

### Import the library

In [11]:
import mlagents

## Run the Environment

In [12]:
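try:
  env.close()  # close an environment left open by a previous run of this cell
except Exception:
  pass  # no environment yet (NameError) or it was already closed

from mlagents_envs.environment import UnityEnvironment
# With file_name=None this connects to the Unity Editor and waits
# (up to timeout_wait seconds) until you press Play in the Editor.
#env = UnityEnvironment(file_name=None, worker_id=0, seed=1, side_channels=[])
env = UnityEnvironment(file_name=None)
# Start interacting with the environment.
env.reset()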

While the cell above is waiting for a connection, press the Play button in the Unity Editor.
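The constructor waits on a network port, and that port is why `env.close()` matters at the end. As we understand `mlagents_envs`, the Editor is reached on port 5004 and built executables on 5005 plus `worker_id`. A `worker_id` other than 0 is rejected when `file_name=None`.

The `expected_port` helper and its constants below are a hypothetical restatement of that logic. They are not part of the ML-Agents API. Check `UnityEnvironment` in your installed version before relying on the numbers.

```python
# Hypothetical helper: restates (as we understand mlagents_envs) how
# UnityEnvironment picks its port. Not part of the ML-Agents API.
DEFAULT_EDITOR_PORT = 5004      # assumed port when file_name is None (Editor)
BASE_ENVIRONMENT_PORT = 5005    # assumed base port for a built executable


def expected_port(file_name, worker_id=0, base_port=None):
    """Return the port a UnityEnvironment would listen on (sketch)."""
    if file_name is None and worker_id != 0:
        # Only the Editor can be reached without a file name, so worker_id stays 0.
        raise ValueError("worker_id must be 0 when connecting to the Editor")
    if base_port is None:
        base_port = BASE_ENVIRONMENT_PORT if file_name else DEFAULT_EDITOR_PORT
    return base_port + worker_id


print(expected_port(None))           # 5004: the Editor
print(expected_port("3DBall", 2))    # 5007: third copy of a built executable
```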

### Reset the environment
To reset the environment, call `env.reset()`. It takes no arguments and returns nothing; it sends a signal to the simulation to reset.

In [13]:
env.reset()

### Behavior Specs


#### Get the Behavior Specs from the Environment

In [14]:
# We will only consider the first Behavior
behavior_name = list(env.behavior_specs)[0]
print(f"Name of the behavior : {behavior_name}")
spec = env.behavior_specs[behavior_name]

Name of the behavior : 3DBall?team=0
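The name printed above is the behavior name followed by a team id after `?team=`. When an Environment holds several behaviors or teams, a small parser can help group them.

`split_behavior_name` is a hypothetical helper, not an ML-Agents function. It assumes the `<name>?team=<id>` format shown in the output and treats a name without a `?team=` part as team 0.

```python
from urllib.parse import parse_qs


def split_behavior_name(behavior_name):
    """Split a name such as '3DBall?team=0' into ('3DBall', 0).

    Hypothetical helper: assumes the '<name>?team=<id>' format; a name
    without a query part is treated as team 0.
    """
    name, _, query = behavior_name.partition("?")
    team = parse_qs(query).get("team", ["0"])[0]
    return name, int(team)


print(split_behavior_name("3DBall?team=0"))  # ('3DBall', 0)
```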


#### Get the Observation Space from the Behavior Specs

In [15]:
# Examine the number of observations per Agent
print("Number of observations : ", len(spec.observation_specs))

Number of observations :  1
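`len(spec.observation_specs)` only counts the observations. To see what each one contains, inspect the individual specs. In release 18 we understand `ObservationSpec` to be a named tuple whose fields include `shape` and `name`.

The `describe_observations` helper below is a hypothetical sketch. It runs against a stand-in named tuple with made-up values so it works without Unity. With a live environment you would pass `spec.observation_specs` instead.

```python
from collections import namedtuple


def describe_observations(observation_specs):
    """Return one human-readable line per observation spec (sketch)."""
    lines = []
    for index, obs_spec in enumerate(observation_specs):
        # Each spec is assumed to expose .name and .shape (a tuple of ints).
        lines.append(f"{index}: {obs_spec.name} with shape {obs_spec.shape}")
    return lines


# Stand-in for a real ObservationSpec; the name and shape are invented.
FakeObsSpec = namedtuple("FakeObsSpec", ["name", "shape"])
for line in describe_observations([FakeObsSpec("example_sensor", (4,))]):
    print(line)  # 0: example_sensor with shape (4,)
```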


#### Get the Action Space from the Behavior Specs

In [16]:
# Is the action space continuous, discrete, or both?
if spec.action_spec.continuous_size > 0:
  print(f"There are {spec.action_spec.continuous_size} continuous actions")
if spec.action_spec.discrete_size > 0:
  # discrete_size is the number of discrete action branches
  print(f"There are {spec.action_spec.discrete_size} discrete action branches")
  # For discrete actions only: how many options does each branch have?
  for action, branch_size in enumerate(spec.action_spec.discrete_branches):
    print(f"Action number {action} has {branch_size} different options")

There are 2 continuous actions
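The episode loop further down samples actions with `spec.action_spec.random_action(n)`. To our understanding, it does the following:

- draws continuous actions uniformly from [-1, 1];
- draws each discrete branch uniformly from `0` to `branch_size - 1`;
- returns one row per Agent.

The pure-Python sketch below restates that behavior with nested lists instead of NumPy arrays. `random_action_lists` is a hypothetical name used only for illustration.

```python
import random


def random_action_lists(n_agents, continuous_size, discrete_branches, rng=random):
    """Pure-Python analogue of ActionSpec.random_action (sketch).

    Continuous values are drawn uniformly from [-1, 1]; each discrete
    branch gets an integer in [0, branch_size). One row per agent.
    """
    continuous = [[rng.uniform(-1.0, 1.0) for _ in range(continuous_size)]
                  for _ in range(n_agents)]
    discrete = [[rng.randrange(branch_size) for branch_size in discrete_branches]
                for _ in range(n_agents)]
    return continuous, discrete


# Three agents, two continuous actions, no discrete branches.
continuous, discrete = random_action_lists(3, 2, ())
print(len(continuous), len(continuous[0]))  # 3 2
```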


### Stepping the environment

#### Get the steps from the Environment
Call `env.get_steps(behavior_name)`. It returns a `(DecisionSteps, TerminalSteps)` pair: the Agents of that behavior that need an action, and the Agents whose episode ended. If the Environment contains several behaviors, call it once for each behavior name.
_Note:_ this does not move the simulation forward.

In [17]:
decision_steps, terminal_steps = env.get_steps(behavior_name)
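`DecisionSteps` and `TerminalSteps` are batches indexed by Agent. `agent_id` holds the ids, and `steps[agent_id]` returns that Agent's individual step, including its `reward`.

The sketch below collects those rewards into a dictionary. `rewards_by_agent` and the `FakeSteps` stand-in (with invented rewards) are hypothetical; they let the example run without Unity. With a live environment you would call `rewards_by_agent(decision_steps)`.

```python
def rewards_by_agent(steps):
    """Map each agent id in a DecisionSteps/TerminalSteps batch to its reward.

    Uses only steps.agent_id (the ids) and steps[agent_id].reward.
    """
    return {int(agent_id): float(steps[agent_id].reward) for agent_id in steps.agent_id}


class FakeStep:
    """Stand-in for a single agent's step (hypothetical)."""

    def __init__(self, reward):
        self.reward = reward


class FakeSteps:
    """Stand-in batch exposing agent_id and indexing by agent id."""

    def __init__(self, rewards):
        self.agent_id = list(rewards)
        self._rewards = rewards

    def __getitem__(self, agent_id):
        return FakeStep(self._rewards[agent_id])


print(rewards_by_agent(FakeSteps({0: 0.1, 3: -1.0})))  # {0: 0.1, 3: -1.0}
```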

#### Set actions for each behavior
To set the actions for the Agents of a Behavior, call `env.set_actions(behavior_name, action)`. Here `action` is an `ActionTuple` that holds 2D arrays, one for continuous and one for discrete actions. The first dimension of each array must equal the number of Agents that requested a decision during the step (`len(decision_steps)`). `spec.action_spec.empty_action(n)` builds such a tuple filled with zeros.

In [18]:
env.set_actions(behavior_name, spec.action_spec.empty_action(len(decision_steps)))
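It is easy to build an action array with the wrong number of rows. One way this happens is reusing an array from a previous step when a different number of Agents requested a decision.

A plain-Python check of the rule above is sketched below. `check_action_batch` is a hypothetical helper that works on nested lists, whereas the real `ActionTuple` holds NumPy arrays. With a live environment you would compare against `len(decision_steps)` and `spec.action_spec.continuous_size`.

```python
def check_action_batch(batch, n_agents, width):
    """Raise ValueError unless batch is an n_agents x width 2D list.

    Hypothetical check of the rule that the first dimension must equal
    the number of agents that requested a decision.
    """
    if len(batch) != n_agents:
        raise ValueError(f"expected {n_agents} rows (one per agent), got {len(batch)}")
    for row in batch:
        if len(row) != width:
            raise ValueError(f"expected rows of length {width}, got {len(row)}")
    return True


# Two agents requested a decision and the behavior has 2 continuous actions.
print(check_action_batch([[0.0, 0.0], [0.5, -0.5]], n_agents=2, width=2))  # True
```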

#### Move the simulation forward
Call `env.step()` to move the simulation forward. The simulation progresses until an Agent requests a decision or terminates.

In [19]:
env.step()
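One decision cycle therefore has a fixed order:

1. Read the steps.
2. Set actions for the Agents that asked for one.
3. Call `env.step()`.
4. Read the steps again.

The sketch below wraps that cycle in a hypothetical `step_once` helper. It checks the call order against a recording stand-in environment, named `fake_env` so that running it does not overwrite the real `env`.

```python
def step_once(env, behavior_name, choose_action):
    """Run one decision cycle: read steps, act, advance, read again.

    choose_action(decision_steps) is a hypothetical policy callback that
    must return an action batch with len(decision_steps) rows.
    """
    decision_steps, _ = env.get_steps(behavior_name)   # does not advance time
    env.set_actions(behavior_name, choose_action(decision_steps))
    env.step()                                          # advances the simulation
    return env.get_steps(behavior_name)


class RecordingEnv:
    """Stand-in environment that records the order of calls (no Unity)."""

    def __init__(self):
        self.calls = []
        self.last_action = None

    def get_steps(self, behavior_name):
        self.calls.append("get_steps")
        return [0, 1], []   # two agents requesting a decision, none terminated

    def set_actions(self, behavior_name, action):
        self.calls.append("set_actions")
        self.last_action = action

    def step(self):
        self.calls.append("step")


fake_env = RecordingEnv()
result = step_once(fake_env, "3DBall?team=0", lambda steps: [[0.0, 0.0]] * len(steps))
print(fake_env.calls)  # ['get_steps', 'set_actions', 'step', 'get_steps']
```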

### Run the Environment for a few episodes

In [20]:
for episode in range(3):
  env.reset()
  decision_steps, terminal_steps = env.get_steps(behavior_name)
  tracked_agent = -1 # -1 indicates not yet tracking
  done = False # For the tracked_agent
  episode_rewards = 0 # For the tracked_agent
  while not done:
    # Track the first agent we see if not tracking
    # Note: len(decision_steps) is the number of agents that requested a decision
    if tracked_agent == -1 and len(decision_steps) >= 1:
      tracked_agent = decision_steps.agent_id[0]

    # Generate an action for all agents
    action = spec.action_spec.random_action(len(decision_steps))

    # Set the actions
    env.set_actions(behavior_name, action)

    # Move the simulation forward
    env.step()

    # Get the new simulation results
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    if tracked_agent in decision_steps: # The agent requested a decision
      episode_rewards += decision_steps[tracked_agent].reward
    if tracked_agent in terminal_steps: # The agent terminated its episode
      episode_rewards += terminal_steps[tracked_agent].reward
      done = True
  print(f"Total rewards for episode {episode} is {episode_rewards}")


Total rewards for episode 0 is 1.2000000327825546
Total rewards for episode 1 is 1.9000000432133675
Total rewards for episode 2 is 2.200000047683716
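The loop above follows a single Agent. To total rewards for every Agent, keep a running sum per `agent_id` and close an episode whenever that Agent shows up in `terminal_steps`.

The `EpisodeRewardTracker` below is a hypothetical sketch. It processes terminal steps before decision steps, on the assumption that an Agent present in both has already started a new episode. It is exercised with stand-in step batches holding invented rewards. In the real loop you would call `tracker.update(decision_steps, terminal_steps)` after each `env.step()`.

```python
from collections import defaultdict


class EpisodeRewardTracker:
    """Accumulate episode rewards for every agent, not only one (sketch)."""

    def __init__(self):
        self.running = defaultdict(float)   # agent_id -> reward so far
        self.finished = []                  # (agent_id, episode total)

    def update(self, decision_steps, terminal_steps):
        # Close ending episodes first; an agent also present in decision_steps
        # is assumed to have started a new episode.
        for agent_id in terminal_steps.agent_id:
            total = self.running.pop(int(agent_id), 0.0)
            total += float(terminal_steps[agent_id].reward)
            self.finished.append((int(agent_id), total))
        for agent_id in decision_steps.agent_id:
            self.running[int(agent_id)] += float(decision_steps[agent_id].reward)


class FakeStep:
    def __init__(self, reward):
        self.reward = reward


class FakeSteps:
    """Stand-in for DecisionSteps/TerminalSteps (hypothetical)."""

    def __init__(self, rewards):
        self.agent_id = list(rewards)
        self._rewards = rewards

    def __getitem__(self, agent_id):
        return FakeStep(self._rewards[agent_id])


tracker = EpisodeRewardTracker()
tracker.update(FakeSteps({0: 0.5, 1: 0.5}), FakeSteps({}))
tracker.update(FakeSteps({1: 0.5}), FakeSteps({0: -1.0}))
print(tracker.finished)  # [(0, -0.5)]
```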


### Close the Environment to free the port it is using

In [21]:
env.close()
print("Closed environment")

Closed environment
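A cell can raise between creating and closing the environment. When that happens, the port stays in use until the environment is closed or the Python process exits; the `try: env.close()` guard at the top works around this.

Python's `contextlib.closing` calls `close()` when a `with` block exits, even if the block raises. It works with any object that has a `close()` method, including `UnityEnvironment`. The sketch below demonstrates this with a stand-in object so it runs without Unity.

```python
from contextlib import closing


class FakeEnv:
    """Stand-in exposing the same close() method as UnityEnvironment."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# With a live Editor you would write:
#   with closing(UnityEnvironment(file_name=None)) as env:
#       ...
try:
    with closing(FakeEnv()) as fake_env:
        raise RuntimeError("simulated failure while stepping")
except RuntimeError:
    pass

print(fake_env.closed)  # True: close() ran even though the block raised
```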
