# RL for fAIr Demo
(See the description for the custom environment: help is needed, and ANYONE can help.)
## What is RL?
Reinforcement Learning (RL) is when an agent learns how to do something by receiving good or bad feedback. In our case, an agent would learn how to optimize fertilization on different farms. 

#### Reasons why RL might be a good idea: 
- High Adaptability to Different Environments: RL is dynamic and can model complex situations. For example, temperature, soil moisture levels, and similar variables change over time and make fertilization different each time. RL is well suited to accounting for these changing conditions. 
  
- Optimization: The agent can 'simulate' optimal fertilization beforehand. For example, a farmer could know in advance how much fertilizer is needed and where to apply the most fertilizer.

- Very awesome

#### Reasons why we might choose different software

- Complexity: Creating an environment that is detailed enough will be a big challenge; it will take a very deep understanding of the problem. Thankfully, however, **anyone can help make the environment better by just expanding our knowledge of the problem** (I will elaborate on this below).

- Simulation Gap: As an add-on to the problem above, if we don't model the farm well, the agent may not be as effective in the field.
  

### Resources

[Nicholas Renotte Series](https://www.youtube.com/playlist?list=PLgNJO2hghbmjlE6cuKMws2ejC54BTAaWV)

# Installing Dependencies

In [3]:
!pip install gym
!pip install tensorflow
!pip install keras-rl2 
!pip install numpy
!pip install pygame



# Creating Environment / Custom Environment

### How Everyone Can Help

To teach the agent how to fertilize, we have to design an environment and decide how the agent can interact with it. In this case, we have to design a farm and how the agent can engage with it. 

To help, research:
- **what defines the condition of the farm**: For example, soil moisture, temperature, nitrogen levels, and nutrient levels all define the condition of the farm.
- **what actions the agent can take**: For example, the agent could add more fertilizer, wait a day, or stop fertilizing.

You'll notice that we currently have only a surface-level understanding of how to describe the farm and how to interact with it. General research into these questions helps us design the environment. 
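As a starting point for that research, the farm condition and the agent's actions can be written down as plain data before any RL code is involved. The sketch below is only an illustration: the variable names, units, bounds, and the action set are all placeholder assumptions, not values from this project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

# Hypothetical (low, high) bounds per state variable; placeholder values only.
STATE_BOUNDS = {
    "soil_moisture_pct": (0.0, 100.0),
    "temperature_c": (-10.0, 50.0),
    "nitrogen_ppm": (0.0, 200.0),
}


class FarmAction(Enum):
    """Hypothetical discrete actions the agent could take."""
    WAIT = 0
    APPLY_FERTILIZER = 1
    STOP_FERTILIZING = 2


@dataclass
class FarmState:
    """Hypothetical description of the farm's condition."""
    soil_moisture_pct: float
    temperature_c: float
    nitrogen_ppm: float

    def as_vector(self) -> List[float]:
        # Order follows STATE_BOUNDS so bounds and values line up.
        return [getattr(self, name) for name in STATE_BOUNDS]

    def is_valid(self) -> bool:
        # True when every variable lies inside its assumed bounds.
        return all(
            low <= getattr(self, name) <= high
            for name, (low, high) in STATE_BOUNDS.items()
        )


def state_bounds() -> Tuple[List[float], List[float]]:
    """Return the lower and upper bounds as two parallel lists."""
    lows = [low for low, _ in STATE_BOUNDS.values()]
    highs = [high for _, high in STATE_BOUNDS.values()]
    return lows, highs


state = FarmState(soil_moisture_pct=40.0, temperature_c=22.5, nitrogen_ppm=80.0)
print(state.as_vector(), state.is_valid(), len(FarmAction))
```

The two lists from `state_bounds()` could then be passed as the `low` and `high` arrays of a `Box` observation space, and `len(FarmAction)` as the size of a `Discrete` action space.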

In [19]:
from gym import Env
from gym.spaces import Discrete, Box
import numpy as np
import random

In [5]:
class CustomEnv(Env):
    def __init__(self): # actions, observation space, starting state, episode length
        
        self.action_space = Discrete(3) # the actions the agent can take: 0 = decrease, 1 = hold, 2 = increase
        self.observation_space = Box(low=np.array([0]), high=np.array([100]), dtype=np.float32) # the current state; could be extended to hold n-dimensional factors
        self.state = 38 + random.randint(-3,3) # starting state
        self.length = 60 # how many steps are left in the episode
        
    def step(self, action): # what happens when the agent takes a step, how we treat actions

        self.state += action - 1 # maps actions 0, 1, 2 to a change of -1, 0, +1
        self.length -= 1 # each step uses up one unit of the episode length
        
        if self.state >= 37 and self.state <= 39: # optimal range and its reward; could also be extended to n factors
            reward = 1 
        else: 
            reward = -1 # discourage states outside the optimal range
        
        # Check if the episode is done
        done = self.length <= 0
            
        self.state += random.randint(-1,1) # random drift in the state

        info = {}
        
        # Return the observation in the same array form that reset() uses
        return np.array([self.state], dtype=np.float32), reward, done, info
        
    def render(self): # visualizations
        pass
    def reset(self): # reset environment after each episode
        self.state = 38 + random.randint(-3,3)
        self.length = 60 
        return np.array([self.state], dtype=np.float32)

In [7]:
env = CustomEnv()
env.observation_space.sample(), env.action_space.sample() # an example state, an example action. Good for testing your environment 

  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")


(array([88.90683], dtype=float32), 1)

# Testing the Environment

In [9]:

episodes = 10
for episode in range(1, episodes+1):
    
    state = env.reset()
    done = False
    score = 0
    
    while not done:
        #env.render()
        action = env.action_space.sample()
        n_state, reward, done, info = env.step(action)
        score += reward


    print('Episode:{} Score:{}'.format(episode,score))

Episode:1 Score:-46
Episode:2 Score:-60
Episode:3 Score:-12
Episode:4 Score:-22
Episode:5 Score:-14
Episode:6 Score:-60
Episode:7 Score:-38
Episode:8 Score:-38
Episode:9 Score:-30
Episode:10 Score:-22
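Random actions score poorly, as expected. To put those numbers in context, it can help to compare them with a simple hand-written rule. The sketch below re-implements the same dynamics as `CustomEnv` in plain Python (so it runs without gym) and compares a random policy against a hypothetical rule that always steers the state toward 38. The function names and the rule itself are assumptions made for this illustration.

```python
import random


def run_episode(policy, steps=60, rng=None):
    """Simulate one episode with the same dynamics as CustomEnv."""
    rng = rng or random.Random()
    state = 38 + rng.randint(-3, 3)      # starting state
    score = 0
    for _ in range(steps):
        action = policy(state, rng)
        state += action - 1              # actions 0, 1, 2 -> change of -1, 0, +1
        score += 1 if 37 <= state <= 39 else -1
        state += rng.randint(-1, 1)      # random drift
    return score


def random_policy(state, rng):
    """Pick one of the three actions uniformly at random."""
    return rng.randrange(3)


def rule_policy(state, rng):
    """Hypothetical baseline: always move the state toward 38."""
    if state < 38:
        return 2
    if state > 38:
        return 0
    return 1


def mean_score(policy, episodes=200, seed=0):
    """Average episode score for a policy, using a seeded generator."""
    rng = random.Random(seed)
    return sum(run_episode(policy, rng=rng) for _ in range(episodes)) / episodes


print("random policy:", mean_score(random_policy))
print("rule policy:  ", mean_score(rule_policy))
```

A trained agent should land well above the random score; in this toy setup the hand-written rule gives a rough idea of what good performance looks like.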


# Create the Deep Learning Model

In [10]:
#Creating the Deep Learning Model
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
import gym
import random
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

In [11]:
states = env.observation_space.shape
actions = env.action_space.n

states, actions 

((1,), 3)

In [14]:
def build_model(states, actions):
    model = Sequential()
    # model.add(Flatten(input_shape=(1,) + states)) # Flatten is needed when observations arrive with extra dimensions (for example, a window of observations)
    model.add(Dense(24, activation='relu', input_shape = states))
    model.add(Dense(24, activation='relu')) # Two dense layers with 24 units each is a common starting point. 
    model.add(Dense(actions, activation='linear'))
    return model
states = (1,)       # input shape for scalar state
actions = 3         # number of possible actions
model = build_model(states, actions)
model.summary() # we can look at our model here


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 24)                48        
                                                                 
 dense_4 (Dense)             (None, 24)                600       
                                                                 
 dense_5 (Dense)             (None, 3)                 75        
                                                                 
Total params: 723
Trainable params: 723
Non-trainable params: 0
_________________________________________________________________
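The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A minimal sketch of that arithmetic, with a helper name chosen just for this example:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Layer widths from the model above: 1 input, two hidden layers of 24, 3 outputs.
layer_sizes = [1, 24, 24, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)  # [48, 600, 75] 723
```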


In [15]:
del model

In [16]:
model = build_model(states, actions)


# Training Model with Keras-RL
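
The agent below uses `BoltzmannQPolicy` for exploration. As a rough illustration of the idea (not keras-rl's actual implementation), Boltzmann exploration turns Q-values into action probabilities with a softmax, so actions that look better are picked more often while worse ones are still tried occasionally. The function names and the temperature parameter `tau` are assumptions made for this sketch.

```python
import math
import random


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over Q-values; a smaller tau makes the choice greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, tau=1.0, rng=random):
    """Draw an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q = [1.0, 2.0, 3.0]  # made-up Q-values for the three actions
print(boltzmann_probs(q, tau=1.0))
print(boltzmann_probs(q, tau=0.1))
print(sample_action(q))
```

With a high temperature the probabilities are close to uniform (more exploration); with a low temperature nearly all of the probability goes to the action with the highest Q-value.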

In [17]:

def build_agent(model, actions): #building the 'agent'
    policy = BoltzmannQPolicy() #Makes the agent try less optimal strategies for better results in the long term. 
    memory = SequentialMemory(limit=50000, window_length=1) #Stores past experiences
    dqn = DQNAgent(model=model, memory=memory, policy=policy, 
                  nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae']) # `learning_rate` replaces the deprecated `lr` argument
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

Training for 50000 steps ...
Interval 1 (0 steps performed)


ValueError: Error when checking input: expected dense_6_input to have 2 dimensions, but got array with shape (1, 1, 1)
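
The error above is consistent with how keras-rl feeds the network: with `window_length=1`, each observation of shape `(1,)` is wrapped in a window, so a batch arrives with shape `(batch, window_length, 1)`, while the first `Dense` layer with `input_shape=states` expects `(batch, 1)`. A likely fix, following the commented-out line in `build_model`, is to start the model with `Flatten(input_shape=(1,) + states)` so the window dimension is collapsed before the dense layers. The sketch below only works through the shape arithmetic in plain Python; the helper names are hypothetical.

```python
def agent_batch_shape(obs_shape, window_length=1, batch_size=1):
    """Shape of the array a windowed agent passes to the model."""
    return (batch_size, window_length) + tuple(obs_shape)


def model_expected_shape(input_shape, batch_size=1):
    """Shape a model built with `input_shape` expects, including the batch axis."""
    return (batch_size,) + tuple(input_shape)


def flattened_features(obs_shape, window_length=1):
    """Number of features per sample after flattening the window."""
    n = window_length
    for dim in obs_shape:
        n *= dim
    return n


states = (1,)  # observation shape from the environment above

got = agent_batch_shape(states)                   # what the agent sends
expected = model_expected_shape(states)           # what Dense(input_shape=states) wants
fixed = model_expected_shape((1,) + states)       # what Flatten(input_shape=(1,) + states) wants

print(got, expected, fixed, flattened_features(states))
```

Here `got` has three dimensions and `expected` has two, which matches the mismatch reported in the error message; `fixed` lines up with `got`.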