## Climate Change and the Optimal Decisions of Vineyard Farmers

#### Background

Wine industry is highly sensitive to the nuances of climate and terroir, is facing unprecedented challenges due to climate change. And climate change will affect vineyards production through shifts in temperature and precipitation patterns. 

One response to the climatic shift is "migration" -- relocate to a more climatically suitable area. But this strategy might incurr a potential loss of location-specific premium associated with established wine-producing regions.

Another possible stretagy is "adaptation". This includes introducing new grape varieties, using new techniques, rethinking vineyard orientations, etc. And this will increase the cost of production. 

#### Problem Statement

We aim to employ a Reinforcement Learning (RL) model to optimize strategies in the context of climate change. The model is designed to operate under varying temperature scenarios, with each scenario influencing the vineyard's state and the effectiveness of potential adaptation actions.  

The RL model framework involves:

__1.Space State ($S$):__
Defined by a range of temperature scenarios and other relevant vineyard conditions. Each state $s\in S$ represents specific environmental and climatic conditions.

__2.Action Space($A$):__

__3.Object Function:__


\begin{eqnarray}
\text{minimize}~\mathcal{C}_T &=& \sum{i\in{1,\dotsc,n}}c_{i}x_{i} \\
\text{subject to}~\sum_{i\in{1,\dotsc,n}}p_{i}x_{i} & \leq & P_{\text{max}}\\
\text{and}~\mathbf{C}\mathbf{x} & \leq & \mathbf{b} \\
\text{and}~x_{i}&\in&{0,1}\qquad{i=1,2,\dots,n}
\end{eqnarray}

In this model:

* $c_{i}\geq 0$ denotes the cost of adopting strategy$i$. 
* $p_{i}$ denotes the potential impact on the location premium by adopting strategy$i$, with $P_{max}$ representing the maximum permissible impact on the premium.
* $x_{i}\in{0,1}$ represents the binary decision of adopting or not adopting strategy$i$. 


#### List of Tasks
* __Task 1__: Specify the vectors
* __Task 2__: 

### Setup

In [1]:
include("CodeLib.jl");

In [2]:
using Plots
using Colors

## Task 1: Specify the Vectors

In this problem, we will choose Pinot Grins as baseline.

The optimal temperature for Pinot Grins to grow is between 13-15 celsius degree (55.4 - 59 Fahrenheit). 

There are several strategies can be taken

### Adaptation Strategies

__1.__ Adjust harvest dates

* Estimated Cost: $0 per acre
* Temperature Offset: 1 per 14 days - but date adjustment cannot be larger than 30 days

__2.__ Switch to another cultivar (assume from Pinot Grins to Cabernet Sauvignon)

* Estimated Cost: $824.6 per acre (the yield per harvested acre for Pinot Grins is 4.74, and price pre ton is 1800 -> 8532 per acre; the yield per harvested acre for Pinot Grins is 3.01, and price pre ton is 2560 -> 7705.6 per acre)
* Temperature Offset: 9 the optimal temperature for Cabernet Sauvignon is 16.5-20 (61.7-68)

__3.__ Production technology -- full-capacity watering

* Estimated Cost: 500-1200 USD per acre
* Temperature Offset: 2-4

__4.__ Production technology -- canopy manipulation

* Estimated Cost: 300-800 USD per acre
* Temperature Offset: 1-3

__5.__ Production technology -- changing row orientation

* Estimated Cost: 5000-10000 USD per acre (one-time cost)
* Temperature Offset: 2-4


### Migration

__6.__ Northward movement

* Estimated Cost (premium loss): $6930 per acre
* Temperature Offset: will reach the optimal temperature

In [3]:
# build a model of the states (temperature)
number_of_rows = 121 # temperature range from 0 - 120 fahrenheit
number_of_cols = 121

nstates = (number_of_rows*number_of_cols);
min_temp = 0
max_temp = 120
min_temp_range = 0:1:120
max_temp_range = 0:1:120 
𝒮 = [(min_temp, max_temp) for min_temp in min_temp_range for max_temp in max_temp_range if min_temp <= max_temp]

nactions = 6
𝒜 = 1:nactions
action_mapping = Dict(1 => "HarvestDate", 2 => "ChangeVineType", 3 => "FullCapacityWatering", 
                      4 => "CanopyManipulation", 5 => "ChangeRowOrientation", 6 => "NorthwardMovement")

Dict{Int64, String} with 6 entries:
  5 => "ChangeRowOrientation"
  4 => "CanopyManipulation"
  6 => "NorthwardMovement"
  2 => "ChangeVineType"
  3 => "FullCapacityWatering"
  1 => "HarvestDate"

In [4]:
# define costs and temperature offsets
action_costs = [550.00, 824.60, 750.00, 550.00, 7500.00, 6930.00] 
# temperature_offsets = [1, 9.00, 3.00, 2.00, 3.00, 100.00]

default_reward = 0.00

current_temp_min = 60
current_temp_max = 70

optimal_temp_min = 55
optimal_temp_max = 59

59

In [5]:
# define specific rewards
optimal_condition_reward = 100000.0
suboptimal_condition_reward = -5000.0

# function to adjust rewards based on strategy cost and temperature offset
function adjust_reward(state::Tuple{Int, Int}, action::Int)
    cost = action_costs[action]

    # Determine if the state meets the optimal temperature criteria
    if state[1] <= optimal_temp_max || state[2] >= optimal_temp_min
        base_reward = optimal_condition_reward
    else
        base_reward = suboptimal_condition_reward
    end

    # Calculate the final reward
    final_reward = base_reward - cost
    return final_reward
end

# set up rewards
rewards = Dict{Tuple{Tuple{Int, Int}, Int}, Float64}()
for state in 𝒮
    for action in 𝒜
        reward = adjust_reward(state, action)
        rewards[(state, action)] = reward
    end
end

# set up absorbing state
absorbing_state_set = Set{Tuple{Int, Int}}()
push!(absorbing_state_set, (optimal_temp_min,optimal_temp_max))

Set{Tuple{Int64, Int64}} with 1 element:
  (55, 59)

In [6]:
# rewards

In [7]:
# function rbf(x::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}},y::Tuple{Int,Int}; σ = 1.0)::Float64
#     d = sqrt((x[1] - y[1])^2 + (x[2] - y[2])^2);
#     return exp(-d/(2*σ^2))
# end;

# σ = 1.0

# # reward shaping
# is_reward_shaping_on = true;

# if (is_reward_shaping_on == true)
#     for s in 𝒮
#         for s′ in 𝒮
#            coordinate = (s,s′);
#             if (haskey(rewards, coordinate) == false && in(coordinate,absorbing_state_set) == false)
#                 rewards[coordinate] = default_reward + optimal_condition_reward*rbf(coordinate, (optimal_temp_min,optimal_temp_max), σ = σ);
#             end
#         end
#     end
# end

In [8]:
world_model = build(MyRectangularGridWorldModel, (
        nrows=number_of_rows, ncols=number_of_cols, rewards = rewards, actions = 𝒜 ));

In [9]:
world_model.moves

Dict{Int64, Tuple{Int64, Int64}} with 6 entries:
  5 => (-3, -3)
  4 => (-2, -2)
  6 => (-5, -11)
  2 => (-9, -9)
  3 => (-3, -3)
  1 => (-1, -1)

In [10]:
# using Pkg
# Pkg.add("Plots")

In [11]:
# rewards_array = zeros(121, 121)  # Initialize an array filled with zeros
# for ((x, y), reward) in world_model.rewards
#     if x in 1:121 && y in 1:121
#         rewards_array[x, y] = reward  # Assign the reward using separate indices
#     end

# end
# using Plots
# heatmap(rewards_array, color=:viridis, aspect_ratio=:equal, 
#         xlabel="X-axis", ylabel="Y-axis", title="Rewards Heatmap")

In [12]:
# rewards_array = zeros(121, 121)  # Initialize an array filled with zeros
# for ((x, y), reward) in world_model.rewards
#     if x in 1:121 && y in 1:121
#         rewards_array[x, y] = reward  # Assign the reward using separate indices
#     end

# end
# using Plots
# heatmap(rewards_array, color=:viridis, aspect_ratio=:equal, 
#         xlabel="X-axis", ylabel="Y-axis", title="Rewards Heatmap")

## Task 2: Q-learning agent

In [13]:
α = 0.6;  # learning rate
γ = 0.95; # discount rate
nstates = (number_of_rows*number_of_cols);
agent_model = build(MyQLearningAgentModel, (
    states = 𝒮,
    actions = 𝒜,
    α = α,
    γ = γ,
    Q = zeros(nstates,nactions)
));

In [14]:
# agent_model.Q

## Task 3: Simulate and visualize

In [15]:
startstate = (current_temp_min,current_temp_max); # start position
number_of_episodes = 20;
number_of_iterations = 100;

In [16]:
my_Q_dictionary = Dict{Tuple{Int,Int}, Array{Float64,2}}();
coordinate = startstate;
for i ∈ 1:number_of_episodes
    # run an episode, and grab the Q
    result = simulate(agent_model, world_model, coordinate, number_of_iterations, ϵ = 0.7);
    agent_model.Q = result.Q;
end
my_Q_dictionary[coordinate] = agent_model.Q;

In [17]:
my_Q_dictionary

Dict{Tuple{Int64, Int64}, Matrix{Float64}} with 1 entry:
  (60, 70) => [-1.97542e13 -1.97456e13 … -1.97526e13 -1.97466e13; 0.0 0.0 … 0.0…

In [18]:
Q = my_Q_dictionary[startstate];
my_π = policy(Q);

In [19]:
Q

14641×6 Matrix{Float64}:
 -1.97542e13  -1.97456e13  -1.97431e13  -1.9746e13   -1.97526e13  -1.97466e13
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
 -1.16678e13  -1.19749e13  -1.16697e13  -1.19908e13  -1.2007e13   -1.17162e13
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
 -9.34243e12  -9.35256e12  -9.34698e12  -9.18148e12  -9.21247e12  -9.42391e12
  0.0          0.0          0.0          0.0          0.0          0.0
  0.0          0.0          0.0          0.0          0.0          0.0
  ⋮                            