In [1]:
# Lets try to do some Reinforcement Learning -.-

### 5. Smart Charging Using Reinforcement Learning:
**Original Exercise:** <br>
Consider an electric taxi driver who can charge her vehicle at home. To simplify the problem, we assume that the vehicle always arrives at home at 2 p.m. and leaves the garage at 4 p.m. each day. We want to design an intelligent charging system (an automated agent). Therefore, instead of a flat charging rate, the charging agent adjusts the charging power every 15 minutes, which is bounded between 0 kW and the highest rate (e.g., 22 kW). Also, the vehicle's battery has a capacity that cannot be exceeded. After leaving the garage, the taxi needs enough energy to complete its working day. The energy demand is a stochastic value following a normal distribution (you should choose the parameters, e.g., 𝜇= 30 kWh, 𝜎 = 5 kWh) and must be generated exactly when the driver wants to leave. The agent’s goal is to avoid running out of energy (you should consider a very high penalty for running out of energy) and to minimize the recharging cost. The recharging cost follows an exponential function of the power (i.e., ![image.png](attachment:image.png)), where 𝛼𝑡 is the time coefficient and p is the charging rate.

The task is to create the environment (a very simple discrete event simulation) that receives the agent's decisions and returns the reward. In addition, you must define a Markov decision process, including states, actions, and reward function, and solve it using a reinforcement learning algorithm (e.g., deep q-network) to find optimal charging policies. To allow the use of discrete action methods, you can consider only limited charging options such as zero, low, medium, high.


**In Bulletpoints:**
- Problem description:
    - An electric taxi driver can charge her vehicle at home between 2 p.m. and 4 p.m. each day
    - The charging agent adjusts the charging power every 15 minutes within a range of 0 kW to 22 kW
    - The vehicle's battery has a limited capacity that cannot be exceeded
    - The taxi needs enough energy to complete its working day, which is a random value following a normal distribution (e.g., 𝜇= 30 kWh, 𝜎 = 5 kWh)
    - The agent’s goal is to avoid running out of energy (with a very high penalty) and to minimize the recharging cost, which is an exponential function of the power (i.e., ![image.png](attachment:image.png)), where 𝛼𝑡 is the time coefficient and p is the charging rate
- Task description:
    - Create the environment that simulates the charging process and the energy demand, and returns the reward to the agent based on its actions
    - Define a Markov decision process, including states, actions, and reward function, that models the problem
    - Solve the Markov decision process using a reinforcement learning algorithm (e.g., deep q-network) to find optimal charging policies
    - Consider only discrete action methods, such as zero, low, medium, high, for the charging power

In [2]:
# First try mit Hilfe von diesem Tutorial:
# https://www.section.io/engineering-education/building-a-reinforcement-learning-environment-using-openai-gym/

In [3]:
import numpy as np
from gym import Env
from gym.spaces import Box, Discrete
import random

In [129]:
class CustomEnv(Env):
    def __init__(self):
        
        # a range of 0 kW to 22 kW
        #self.action_space = Box(low=0, high=22)
        #a range from zero, low, medium to high
        self.action_space = Discrete(4)

        # The vehicle's battery has a limited capacity that cannot be exceeded (69KWh)
        #self.battery_space = Box(low=0, high=69)
        self.battery_space = Box(low=np.array([0]), high = np.array([69]))

        # [20,40] KWh loaded battery at initialization
        self.battery_state = 30 + random.randint(-10,10)

        # The charging agent adjusts the charging power every 15 minutes --> time is in [0,7] in 2 Hours
        self.time = 0


    def step(self, action):
        #print("-- New Step --")
        # Setting loading interval -1 /--> -15 minutes
        self.time += 1


        # Seting new battery state
        #zero
        load = 0
        if action == 2:
            #low
            load += 7
        if action == 3:
            #medium
            load += 14
        if action == 4:
            #high
            load += 22
        self.battery_state += load


        # Calculating Negative Reward from Energy Costs
        kw_price = 1 #random.randint(0,3)
        reward = kw_price * load * (-1)


        #Checking if 2 Hours are done
        #Giving panalty if car ran out of battery
        if self.time >= 7:
            #The taxi needs enough energy to complete its working day,
            # which is a random value following a normal distribution
            # (e.g., 𝜇= 30 kWh, 𝜎 = 5 kWh)
            kwh_needed = np.random.normal(loc=30, scale=5)
            print("kwh needed:" + str(kwh_needed))
            print("battery state:" + str(self.battery_state))
            # The agent’s goal is to avoid running out of energy (with a very high penalty) 
            if kwh_needed > self.battery_state:
                reward -= 100
            #print("Day done!")
            day_done = True
        else:
            day_done = False

        #print("Battery State: " + str(self.battery_state))
        #print("Reward: " + str(reward))
        # Returning the step information
        return self.battery_state, reward, day_done
    
    #def render(self):
        # blabala
    
    def reset(self):
        # [20,40] KWh loaded battery at initialization
        self.battery_state = 30 + random.randint(-10,10)
        # The charging agent adjusts the charging power every 15 minutes --> time is in [0,7] in 2 Hours
        self.time = 0
        #return self.battery_state

In [130]:
env = CustomEnv()

In [131]:
episodes = 7 #7 days
for episode in range(1, episodes+1):
    print("__ Day " + str(episode) + " ___")
    #battery_state = 
    env.reset()
    day_done = False
    score = 0 
    
    while not day_done:
        action = env.action_space.sample()
        n_state, reward, day_done = env.step(action)
        score+=reward
    print('Episode:{} Score:{}'.format(episode, score))

__ Day 1 ___
kwh needed:29.42905358545278
battery state:65
Episode:1 Score:-28
__ Day 2 ___
kwh needed:29.99224850232707
battery state:76
Episode:2 Score:-42
__ Day 3 ___
kwh needed:37.11428092348993
battery state:27
Episode:3 Score:-107
__ Day 4 ___
kwh needed:31.505417424263786
battery state:54
Episode:4 Score:-28
__ Day 5 ___
kwh needed:26.08148921550992
battery state:81
Episode:5 Score:-56
__ Day 6 ___
kwh needed:21.20943629503527
battery state:59
Episode:6 Score:-35
__ Day 7 ___
kwh needed:27.363575893974534
battery state:73
Episode:7 Score:-42


In [138]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

In [121]:
states = env.battery_space.shape
#states = int(69)
states

(1,)

In [120]:
actions = env.action_space.n
actions

4

In [143]:
def build_model(states, actions):
    model = Sequential()    
    model.add(Dense(69, activation='relu', input_shape=(69,)))
    model.add(Dense(36, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

In [144]:
model = build_model(states, actions)

TypeError: VariableMetaclass._variable_v1_call() got an unexpected keyword argument 'experimental_enable_variable_lifting'