The environment can provide a variety of rewards to guide the bot's decision-making process and encourage desired behaviors. For instance, positive rewards can be given when the bot successfully identifies and removes weeds, waters the plants optimally, or performs efficient soil nutrient management. Rewards can also be tied to achieving specific gardening goals, such as promoting healthy plant growth, maximizing crop yield, or maintaining an aesthetically pleasing garden layout. On the other hand, negative rewards or penalties can be assigned when the bot damages plants, fails to address plant diseases promptly, or exhibits inefficient resource utilization. By designing a reward system that aligns with the goals of gardening, the autonomous bot can learn to navigate the environment, adapt its actions, and develop effective gardening strategies that lead to a flourishing garden ecosystem.

## TL;DR

* The game: We want our robot to learn to garden (subject to a number of additional constraints).
* The problem: We can't just have our robot flail around and try everything until it somehow stumbles upon the optimal solution.
* The solution: We create a method for evaluating the current state of the environment.
* The analogy: In chess, there are many ways to evaluate the board long before either opponent wins. The reward function does the same for gardening.
* A potential confusion: This is different from the value produced by the critic model. The reward here is produced by the environment and is fixed and objective (or whatever makes the most sense). The value produced by the critic model is learned through gradient-based training and is analogous to the chess player's brain: it improves with practice, evaluating its current way of playing so as to eventually maximize the reward provided by the environment.
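
To make the last point concrete, here is a minimal sketch of the difference. It assumes a toy problem with made-up state names, and a small lookup-table critic stands in for the gradient-trained critic model; none of these names come from the project itself.

```python
# Hypothetical toy setup; the state names and classes are illustrative only.
def environment_reward(state: str) -> float:
    """Fixed, hand-written reward: it never changes during training."""
    return 1.0 if state == "healthy_garden" else -1.0


class TabularCritic:
    """Learned value estimate: starts at zero and improves with practice."""

    def __init__(self, learning_rate: float = 0.5, discount: float = 0.9):
        self.values = {}
        self.learning_rate = learning_rate
        self.discount = discount

    def value(self, state: str) -> float:
        return self.values.get(state, 0.0)

    def update(self, state: str, reward: float, next_state: str) -> None:
        # TD(0) step: nudge V(state) toward reward + discount * V(next_state)
        target = reward + self.discount * self.value(next_state)
        error = target - self.value(state)
        self.values[state] = self.value(state) + self.learning_rate * error


critic = TabularCritic()
for _ in range(50):
    # The bot keeps reaching a healthy garden and staying there
    critic.update("healthy_garden", environment_reward("healthy_garden"), "healthy_garden")

print(environment_reward("healthy_garden"))  # still 1.0: the reward is fixed
print(critic.value("healthy_garden"))        # has moved from 0 toward the long-run return
```

The reward function returns the same number on every call, while the critic's estimate changes with every update.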

## A potential recipe

Note: this recipe applies to the simulation; we'd develop a new one for real-world use.

    1. Legality of state:
        -x for x units of overlapping area

    2. Conservation of water:
        -1 for each unit of water used

    3. Diversity of garden:
        +x for x > 2 plant species
        0 for 2 species
        -x for x < 2 plant species

    4. Density of garden:
        +% for %density > 75
        -% for %density < 75

    5. Health:
        penalty for dying plants
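
As a rough illustration, the recipe above can be read literally as one unnormalized function. The function name and its arguments are hypothetical, and two details are assumptions the recipe does not state: a density of exactly 75% scores 0, and each dying plant costs 1.

```python
def recipe_reward(overlap_area, water_used, species_count, density_percent, dying_plants):
    """Unnormalized reward following the five recipe terms literally."""
    # 1. Legality of state: -x for x units of overlapping area
    legality = -overlap_area
    # 2. Conservation of water: -1 for each unit of water used
    water = -water_used
    # 3. Diversity: +x above two species, 0 at exactly two, -x below two
    if species_count > 2:
        diversity = species_count
    elif species_count == 2:
        diversity = 0
    else:
        diversity = -species_count
    # 4. Density: +% above 75, -% below 75 (0 at exactly 75 is an assumption)
    if density_percent > 75:
        density = density_percent
    elif density_percent < 75:
        density = -density_percent
    else:
        density = 0
    # 5. Health: the recipe gives no magnitude; -1 per dying plant is an assumption
    health = -dying_plants
    return legality + water + diversity + density + health


print(recipe_reward(overlap_area=0, water_used=3, species_count=3,
                    density_percent=80, dying_plants=1))  # 0 - 3 + 3 + 80 - 1 = 79
```

In this literal form the density term is much larger than the others, so the terms probably need rescaling before they are combined.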

In [None]:
from util_bs.quadtree import QuadTree, QuadTreeNode  # the two names used below

In [None]:
class reward:
    """Scores a garden state; higher is better."""

    def __init__(self, environment):
        # State dict; the methods below read environment["plants"] and environment["water"]
        self.environment = environment
        
        
    def _overlap_penalty(self):
        plant_list = self.environment["plants"]
        qt = QuadTree(0, 0, 183, 91)
        
        for plant in plant_list:
            qt.insert(QuadTreeNode(plant.x_coord - plant.radius, 
                                         plant.y_coord - plant.radius, 
                                         plant.x_coord + plant.radius, 
                                         plant.y_coord + plant.radius))
        overlap_area = 0
        for i in range(len(plant_list)):
            for j in range(i+1, len(plant_list)):
                overlap = qt.calculate_overlap(plant_list[i], plant_list[j])
                if overlap > 0:
                    overlap_area += overlap
        # Penalty: 0 with no overlap, more negative as overlap grows
        # (normalized by the 183 x 91 garden area)
        return -overlap_area / (183 * 91)

    def _water_penalty(self, dampening_factor=0.01):
        # Negative so that using more water lowers the score
        return -dampening_factor * self.environment["water"]
    
    def _diversity_score(self):
        plant_list = self.environment["plants"]
        if not plant_list:
            return 0
        # Ratio of distinct species to total plants
        diversity = len({plant.species for plant in plant_list}) / len(plant_list)
        # A ratio below 0.5 counts as a penalty instead of a reward
        return -diversity if diversity < 0.5 else diversity
    
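    def _death_penalty(self):
        plant_list = self.environment["plants"]
        if not plant_list:
            return 0  # no plants, nothing to penalize (avoids division by zero)
        dead = sum(1 for plant in plant_list if not plant.alive)
        # Negative fraction of plants that are dead
        return -dead / len(plant_list)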
    
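    def get_composite_score(self):
        # _death_penalty() already returns a non-positive value, so it is added, not subtracted
        unnormalized_score = self._overlap_penalty() + self._water_penalty() + self._diversity_score() + self._death_penalty()
        # Average of the four implemented terms (the recipe's density term is not included)
        return unnormalized_score / 4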
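
`calculate_overlap` comes from `util_bs.quadtree`, which is not shown here, so its exact behavior is unknown. The sketch below shows one way the pairwise overlap term could be computed without it. It assumes each plant is approximated by its axis-aligned bounding box, as in the `QuadTreeNode` inserts above. `Plant`, `box_overlap_area`, and `overlap_fraction` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Plant:
    # Hypothetical stand-in with the attributes the class above reads
    x_coord: float
    y_coord: float
    radius: float


def box_overlap_area(a: Plant, b: Plant) -> float:
    """Overlap area of the two plants' axis-aligned bounding boxes."""
    width = (min(a.x_coord + a.radius, b.x_coord + b.radius)
             - max(a.x_coord - a.radius, b.x_coord - b.radius))
    height = (min(a.y_coord + a.radius, b.y_coord + b.radius)
              - max(a.y_coord - a.radius, b.y_coord - b.radius))
    # A non-positive width or height means the boxes do not intersect
    return max(width, 0.0) * max(height, 0.0)


def overlap_fraction(plants, garden_width=183, garden_height=91):
    """Total pairwise overlap area as a fraction of the garden area."""
    total = sum(box_overlap_area(a, b) for a, b in combinations(plants, 2))
    return total / (garden_width * garden_height)


plants = [Plant(0.0, 0.0, 1.0), Plant(1.0, 0.0, 1.0), Plant(50.0, 50.0, 1.0)]
print(box_overlap_area(plants[0], plants[1]))  # 2.0
print(-overlap_fraction(plants))               # the overlap penalty for this layout
```

Bounding boxes overestimate the overlap of circular plants, so this is only an approximation; an exact circle-intersection formula could replace `box_overlap_area` if more precision is needed.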
