# Training an RL agent with a standard environment
In this notebook, we show how to train an RL agent with the stable-baselines3 library on an environment provided by CyclesGym. We start by importing the required packages.

In [1]:
from cyclesgym.envs import Corn
from cyclesgym.envs.crop_planning import CropPlanningFixedPlanting
import numpy as np
from cyclesgym.utils.paths import PROJECT_PATH
import wandb
from wandb.integration.sb3 import WandbCallback
import gym
from stable_baselines3 import PPO

First, we define a configuration dictionary and log it with wandb.

In [7]:
config = dict(start_year=1980, end_year=1990,
              total_timesteps=1000, n_steps=80, batch_size=80, n_epochs=10,
              verbose=1, device='cpu', n_weather_samples=50,
              rotation_crops=['CornRM.100', 'SoybeanMG.3'])

wandb.init(
    config=config,
    sync_tensorboard=True,
    project='notebook_experiments',
    monitor_gym=True,
    save_code=True,
    dir=PROJECT_PATH,
)

config = wandb.config
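
To make the interplay of `total_timesteps`, `n_steps`, `batch_size` and `n_epochs` concrete, the sketch below works out how many rollouts and gradient updates these values imply. It assumes a single environment (`n_envs=1`) and that PPO collects whole rollouts of `n_steps` until at least `total_timesteps` have been gathered. The helper name `ppo_budget` is ours and not part of any library.

```python
import math


def ppo_budget(total_timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Rough training budget for on-policy PPO (hypothetical helper)."""
    # One rollout gathers n_steps transitions from every environment.
    rollout_size = n_steps * n_envs
    # Rollouts are collected whole, so the last one may overshoot the target.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    # Each epoch sweeps the rollout buffer once in minibatches of batch_size.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    return {
        "rollout_size": rollout_size,
        "n_rollouts": n_rollouts,
        "timesteps_collected": n_rollouts * rollout_size,
        "gradient_steps_per_rollout": n_epochs * minibatches_per_epoch,
    }


budget = ppo_budget(total_timesteps=1000, n_steps=80, batch_size=80, n_epochs=10)
print(budget)
```

With the values used here, each rollout is consumed as a single minibatch, so every rollout is followed by `n_epochs` gradient steps.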

Now we use a subset of the configuration dictionary to define a crop planning environment that simulates a multi-year rotation between maize and soybeans.

In [8]:
env_conf = {key: config[key] for key in ['start_year', 'end_year', 'rotation_crops', 'n_weather_samples']}

env = CropPlanningFixedPlanting(**env_conf)
env = gym.wrappers.RecordEpisodeStatistics(env)


Simulation 2022_06_15_12_01_33-25539085-741e-46bd-9f02-e9a81d6d4717/control running ...

Simulation time: 1 seconds.
starting generating weather files
done generating weather files
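
Before training, it can help to check the episode structure with a random policy. The following is a minimal sketch, assuming the classic gym API that this notebook's `gym` import suggests (`reset()` returns an observation, `step()` returns a 4-tuple). `YearlyStubEnv` and `_StubSpace` are hypothetical stand-ins so the snippet runs without the Cycles simulator; they assume one decision per simulated year, which is our reading of the crop planning setup rather than something the library documents here.

```python
import random


class _StubSpace:
    """Hypothetical stand-in for a discrete gym action space."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)


class YearlyStubEnv:
    """Hypothetical env: one decision per year, classic gym step/reset API."""

    def __init__(self, start_year, end_year):
        self.start_year, self.end_year = start_year, end_year
        self.action_space = _StubSpace(2)

    def reset(self):
        self.year = self.start_year
        return self.year

    def step(self, action):
        self.year += 1
        done = self.year > self.end_year
        return self.year, 1.0, done, {}


def random_rollout(env):
    """Run one episode with random actions; return (length, total reward)."""
    env.reset()
    done, length, total = False, 0, 0.0
    while not done:
        _, reward, done, _ = env.step(env.action_space.sample())
        length += 1
        total += reward
    return length, total


length, total = random_rollout(YearlyStubEnv(1980, 1990))
print(length, total)
```

Passing the real `env` to `random_rollout` should work the same way under that API assumption. For the years 1980 to 1990 inclusive, the stub yields 11 steps, which is consistent with the `ep_len_mean` of 11 reported in the training log.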


We can now define the learning agent. Here we use the PPO model from the stable-baselines3 library for simplicity.

In [9]:
model = PPO('MlpPolicy', env, n_steps=config['n_steps'], batch_size=config['batch_size'],
            n_epochs=config['n_epochs'], verbose=config['verbose'], tensorboard_log=wandb.run.dir,
            device=config['device'])

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


Now we train the model for the total number of steps specified in the config dictionary.

In [5]:
model.learn(total_timesteps=config["total_timesteps"], callback=[WandbCallback()])
wandb.finish()




Simulation 2022_06_15_11_53_13-03f071f9-d437-4cf2-88f0-84011bfceca9/control running ...

Simulation time: 0 seconds.
Logging to /home/luca/Projects/plant_science/cyclesgym/wandb/run-20220615_115300-5u3t5fey/files/PPO_1

Simulation 2022_06_15_11_53_16-f90c22ee-ee7e-4d08-9fb1-f0eafc2d38da/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_53_19-b2b71c36-4056-4072-890f-2223d2e34793/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_53_23-7411578f-7682-4618-adc8-9c354870236e/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_53_26-a7561144-7389-481a-b354-87f12fc4ec46/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_53_29-2ab26a1a-b94c-4c7e-bb02-e01f586a8c37/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_53_32-ae47b203-d66c-4437-97d6-3c09e0f4d841/control running ...

Simulation time: 1 seconds.

Simulation 2022_06_15_11_53_36-8623599d-1de1-4cac-9512-29d1ad081ec8/con


Simulation 2022_06_15_11_55_14-ad602316-eabb-469b-82b3-6e4c32bffc82/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_55_17-e9942e53-c4d9-4109-b9ca-f1ab72c4dbd4/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_55_21-5fc799c2-8d1e-4e72-bb14-ef3e29978e43/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_55_24-bf9d6bb5-c932-405b-8e95-10e7759ed19e/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_55_27-67dd0c0e-eca3-4bc7-906e-ab08aec839f2/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_55_30-ccc843b4-25c8-49cf-9de5-83ca41fcf8fb/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_55_34-7032835c-e270-4c3d-8a9b-96b9857ef8df/control running ...

Simulation time: 0 seconds.
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 11            |
|    ep_rew_mean          | 1.03e+04      


Simulation 2022_06_15_11_57_09-276e6f05-1335-4adc-af62-98df7eb2cca1/control running ...

Simulation time: 1 seconds.

Simulation 2022_06_15_11_57_13-97fb09b4-1df5-43b6-84c0-21efdfeda6ff/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_57_16-2190bfcd-69de-4a2e-b0c0-eb214c2f90f9/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_57_19-1d57fd17-93e7-4647-a5eb-b25f4d00ffdc/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_57_22-cbd13ff0-b66e-44ae-9860-4aabcc41880c/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_57_25-fe80b992-8bed-4c77-88d0-1467c002c7ff/control running ...

Simulation time: 0 seconds.

Simulation 2022_06_15_11_57_28-7f329899-019c-4ae4-b86f-ce62c728e859/control running ...

Simulation time: 1 seconds.

Simulation 2022_06_15_11_57_32-104fc262-07be-4686-9d9b-8eade49760f8/control running ...

Simulation time: 0 seconds.
------------------------------------------
| rollout/   

<stable_baselines3.ppo.ppo.PPO at 0x7f5b6bba1640>
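
Once training has finished, the learned policy can be rolled out greedily to estimate its return. The sketch below is one way to do that, assuming the classic gym 4-tuple `step()` API and relying only on `model.predict(obs, deterministic=True)`, which in stable-baselines3 returns an `(action, state)` pair. `CountdownEnv` and `AlwaysOne` are hypothetical stand-ins so the snippet runs on its own; they are not CyclesGym or stable-baselines3 classes.

```python
def evaluate_greedy(model, env, n_episodes=3):
    """Average undiscounted return of the deterministic policy."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, done, _info = env.step(action)
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


class CountdownEnv:
    """Hypothetical env: fixed horizon, reward equals the chosen action."""

    def __init__(self, horizon):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


class AlwaysOne:
    """Hypothetical policy exposing the same predict() shape as an SB3 model."""

    def predict(self, obs, deterministic=True):
        return 1, None


mean_return = evaluate_greedy(AlwaysOne(), CountdownEnv(horizon=11))
print(mean_return)
```

With the trained `model` and the real `env` in place of the stubs, the same function gives a quick check that complements the `ep_rew_mean` value in the training log.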

Next, we show how to train a similar agent on a fertilization environment.

In [10]:
config = dict(start_year=1980, end_year=1990,
              total_timesteps=1000, n_steps=80, batch_size=80, n_epochs=10,
              verbose=1, device='cpu', n_weather_samples=50)

wandb.init(
    config=config,
    sync_tensorboard=True,
    project='notebook_experiments',
    monitor_gym=True,
    save_code=True,
    dir=PROJECT_PATH,
)

config = wandb.config

# `Corn` rejects `n_weather_samples` in this CyclesGym version
# ("TypeError: __init__() got an unexpected keyword argument 'n_weather_samples'"),
# so only the year range is forwarded. Depending on the installed version,
# `Corn` may require further constructor arguments.
env_conf = {key: config[key] for key in ['start_year', 'end_year']}

env = Corn(**env_conf)
env = gym.wrappers.RecordEpisodeStatistics(env)

model = PPO('MlpPolicy', env, n_steps=config['n_steps'], batch_size=config['batch_size'],
            n_epochs=config['n_epochs'], verbose=config['verbose'], tensorboard_log=wandb.run.dir,
            device=config['device'])

model.learn(total_timesteps=config["total_timesteps"], callback=[WandbCallback()])
wandb.finish()
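
Constructor arguments differ between CyclesGym environments, so a key that is valid for one class can raise a `TypeError` such as `unexpected keyword argument 'n_weather_samples'` for another. One hedged way to guard against this when reusing a shared config is to keep only the keys that the target constructor accepts. The sketch below uses the standard-library `inspect` module; `accepted_kwargs` and `StubFertilizationEnv` are hypothetical names introduced for illustration, and the stub only mimics a constructor without `n_weather_samples`.

```python
import inspect


def accepted_kwargs(cls, candidates):
    """Keep only the entries of `candidates` that `cls.__init__` accepts by name."""
    params = inspect.signature(cls.__init__).parameters
    # A constructor with **kwargs accepts any name, so nothing is filtered.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    return {k: v for k, v in candidates.items() if k in params}


class StubFertilizationEnv:
    """Hypothetical stand-in whose constructor lacks `n_weather_samples`."""

    def __init__(self, start_year, end_year):
        self.start_year, self.end_year = start_year, end_year


candidates = {"start_year": 1980, "end_year": 1990, "n_weather_samples": 50}
env_conf = accepted_kwargs(StubFertilizationEnv, candidates)
env = StubFertilizationEnv(**env_conf)
print(env_conf)
```

Silently dropping keys can hide configuration mistakes, so it may be preferable to log the discarded names. The filter also cannot tell whether a constructor that takes `**kwargs` forwards them to a parent class that rejects them.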