In [1]:
# Lets try to do some Reinforcement Learning -.-

### 5. Smart Charging Using Reinforcement Learning:
**Original Exercise:** <br>
Consider an electric taxi driver who can charge her vehicle at home. To simplify the problem, we assume that the vehicle always arrives at home at 2 p.m. and leaves the garage at 4 p.m. each day. We want to design an intelligent charging system (an automated agent). Therefore, instead of a flat charging rate, the charging agent adjusts the charging power every 15 minutes, which is bounded between 0 kW and the highest rate (e.g., 22 kW). Also, the vehicle's battery has a capacity that cannot be exceeded. After leaving the garage, the taxi needs enough energy to complete its working day. The energy demand is a stochastic value following a normal distribution (you should choose the parameters, e.g., 𝜇= 30 kWh, 𝜎 = 5 kWh) and must be generated exactly when the driver wants to leave. The agent’s goal is to avoid running out of energy (you should consider a very high penalty for running out of energy) and to minimize the recharging cost. The recharging cost follows an exponential function of the power (i.e., ![image.png](attachment:image.png)), where 𝛼𝑡 is the time coefficient and p is the charging rate.

The task is to create the environment (a very simple discrete event simulation) that receives the agent's decisions and returns the reward. In addition, you must define a Markov decision process, including states, actions, and reward function, and solve it using a reinforcement learning algorithm (e.g., deep q-network) to find optimal charging policies. To allow the use of discrete action methods, you can consider only limited charging options such as zero, low, medium, high.


**In Bulletpoints:**
- Problem description:
    - An electric taxi driver can charge her vehicle at home between 2 p.m. and 4 p.m. each day
    - The charging agent adjusts the charging power every 15 minutes within a range of 0 kW to 22 kW
    - The vehicle's battery has a limited capacity that cannot be exceeded
    - The taxi needs enough energy to complete its working day, which is a random value following a normal distribution (e.g., 𝜇= 30 kWh, 𝜎 = 5 kWh)
    - The agent’s goal is to avoid running out of energy (with a very high penalty) and to minimize the recharging cost, which is an exponential function of the power (i.e., ![image.png](attachment:image.png)), where 𝛼𝑡 is the time coefficient and p is the charging rate
- Task description:
    - Create the environment that simulates the charging process and the energy demand, and returns the reward to the agent based on its actions
    - Define a Markov decision process, including states, actions, and reward function, that models the problem
    - Solve the Markov decision process using a reinforcement learning algorithm (e.g., deep q-network) to find optimal charging policies
    - Consider only discrete action methods, such as zero, low, medium, high, for the charging power

In [2]:
# First try mit Hilfe von diesem Tutorial:
# https://www.section.io/engineering-education/building-a-reinforcement-learning-environment-using-openai-gym/

In [55]:
import numpy as np
from gym import Env
from gym.spaces import Box, Discrete
import random
import math

In [64]:
class CustomEnv(Env):
    def __init__(self):
        
        # a range of 0 kW to 22 kW
        #self.action_space = Box(low=0, high=22)
        #a range from zero, low, medium to high
        self.action_space = Discrete(4)

        # The vehicle's battery has a limited capacity that cannot be exceeded (69KWh)
        #self.observation_space = Box(low=0, high=69)
        self.observation_space = Box(low=np.array([0]), high = np.array([69]))

        # [20,40] KWh loaded battery at initialization
        self.state = 20 + random.randint(-10,10)

        # The charging agent adjusts the charging power every 15 minutes --> time is in [0,7] in 2 Hours
        self.time = 0


    def step(self, action):
        # Setting loading interval +1 /--> +15 minutes
        self.time += 1

        # Seting new battery state
        #zero
        load = 0
        if action == 2:
            #low
            load += 7
        if action == 3:
            #medium
            load += 14
        if action == 4:
            #high
            load += 22
        self.state += load

        # Calculating Negative Reward from Energy Costs
        reward = self.time * math.exp(load) * (-1)

        #Checking if 2 Hours are done
        #Giving panalty if car ran out of battery
        if self.time >= 8:
            #The taxi needs enough energy to complete its working day, 
            # which is a random value following a normal distribution (e.g., 𝜇= 30 kWh, 𝜎 = 5 kWh)
            kwh_needed = np.random.normal(loc=30, scale=5)
            print("kwh needed:" + str(kwh_needed))
            print("battery state:" + str(self.state))
            # The agent’s goal is to avoid running out of energy (with a very high penalty) 
            if kwh_needed > self.state:
                reward -= 100000000
            done = True
        else:
            done = False

        info = {}

        #print("Battery State: " + str(self.state))
        #print("Reward: " + str(reward))
        # Returning the step information
        return self.state, reward, done, info
    
    def reset(self):
        # [20,40] KWh loaded battery at initialization
        self.state = 20 + random.randint(-10,10)
        # The charging agent adjusts the charging power every 15 minutes --> time is in [0,7] in 2 Hours
        self.time = 0
        return self.state

In [65]:
env = CustomEnv()

In [66]:
env.action_space.sample()

0

In [67]:
episodes = 7 #7 days
for episode in range(1, episodes+1):
    print("__ Day " + str(episode) + " ___")
    state = env.reset()
    done = False
    score = 0 
    
    while not done:
        action = env.action_space.sample()
        n_state, reward, done, info = env.step(action)
        score+=reward
    print('Episode:{} Score:{}'.format(episode, score))

__ Day 1 ___
kwh needed:29.361839521506543
battery state:57
Episode:1 Score:-12033738.273756767
__ Day 2 ___
kwh needed:25.035256627217468
battery state:68
Episode:2 Score:-15634974.327300526
__ Day 3 ___
kwh needed:34.453600617956326
battery state:92
Episode:3 Score:-21654564.54707498
__ Day 4 ___
kwh needed:40.65310940578921
battery state:57
Episode:4 Score:-9624149.1727935
__ Day 5 ___
kwh needed:25.97050506272285
battery state:78
Episode:5 Score:-6034965.083992453
__ Day 6 ___
kwh needed:24.132976501031376
battery state:68
Episode:6 Score:-13240724.090555258
__ Day 7 ___
kwh needed:33.641605802440786
battery state:87
Episode:7 Score:-28862514.81995464


In [68]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

In [69]:
states = env.observation_space.shape
actions = env.action_space.n

In [70]:
actions

4

In [71]:
states

(1,)

In [72]:
def build_model(states, actions):
    model = Sequential()    
    #model.add(Dense(69, activation='relu', input_shape=(69,)))
    model.add(Dense(69, activation='relu', input_shape=states))
    model.add(Dense(36, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

In [73]:
del model

In [74]:
model = build_model(states, actions)

In [75]:
model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_12 (Dense)            (None, 69)                138       
                                                                 
 dense_13 (Dense)            (None, 36)                2520      
                                                                 
 dense_14 (Dense)            (None, 4)                 148       
                                                                 
Total params: 2,806
Trainable params: 2,806
Non-trainable params: 0
_________________________________________________________________


In [76]:
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

In [77]:
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=5000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy, 
                  nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn

In [78]:
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=60000, visualize=False, verbose=1)

Training for 60000 steps ...
Interval 1 (0 steps performed)


  updates=self.state_updates,


    1/10000 [..............................] - ETA: 43:56 - reward: -1096.6332kwh needed:26.327347175040543
battery state:65




   12/10000 [..............................] - ETA: 25:28 - reward: -3473.3383kwh needed:30.516970520707222
battery state:50
   23/10000 [..............................] - ETA: 14:08 - reward: -263250.0209kwh needed:28.761599734422948
battery state:24
   29/10000 [..............................] - ETA: 11:46 - reward: -3657061.1545kwh needed:23.087559443346244
battery state:17
   36/10000 [..............................] - ETA: 10:01 - reward: -6024395.3900kwh needed:28.560689535955632
battery state:108
   47/10000 [..............................] - ETA: 8:18 - reward: -5331458.9537kwh needed:22.28676930951992
battery state:74
   53/10000 [..............................] - ETA: 7:41 - reward: -4728373.4601kwh needed:30.239940726048363
battery state:83
   61/10000 [..............................] - ETA: 7:01 - reward: -4108906.1013kwh needed:31.031504824209783
battery state:82
   69/10000 [..............................] - ETA: 6:31 - reward: -3633083.3474kwh needed:27.31930512097836
ba

<keras.callbacks.History at 0x199212d7220>

In [79]:
results = dqn.test(env, nb_episodes=150, visualize=False)
print(np.mean(results.history['episode_reward']))

Testing for 150 episodes ...
kwh needed:19.528533360610425
battery state:67
Episode 1: reward: -39478.794, steps: 8
kwh needed:27.65883483956314
battery state:85
Episode 2: reward: -39478.794, steps: 8
kwh needed:29.274606369941612
battery state:73
Episode 3: reward: -39478.794, steps: 8
kwh needed:35.96839824282466
battery state:79
Episode 4: reward: -39478.794, steps: 8
kwh needed:31.798328665893063
battery state:79
Episode 5: reward: -39478.794, steps: 8
kwh needed:27.47179632636866
battery state:81
Episode 6: reward: -39478.794, steps: 8
kwh needed:34.520607899687256
battery state:76
Episode 7: reward: -39478.794, steps: 8
kwh needed:31.700415476314596
battery state:74
Episode 8: reward: -39478.794, steps: 8
kwh needed:32.54540693502544
battery state:77
Episode 9: reward: -39478.794, steps: 8
kwh needed:30.456730945534115
battery state:76
Episode 10: reward: -39478.794, steps: 8
kwh needed:29.67526474037167
battery state:85
Episode 11: reward: -39478.794, steps: 8
kwh needed:27.421