# DQN Kubernetes

---

In this notebook, we train an agent to scale a Kubernetes Deployment whose pods receive a variable load generated by the Apache HTTP server benchmarking tool (`ab`).

### 1. Start the Environment

We begin by importing the packages needed to connect to Kubernetes and Prometheus. With minikube, the Prometheus service can be exposed with:
minikube service -n monitoring prometheus-service

The following packages must be installed first:

pip install prometheus-api-client

pip install kubernetes

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

pip install matplotlib

In [1]:
import numpy as np
import time
from kubernetes import client, config
from prometheus_api_client import PrometheusConnect
import k8senv 

In [2]:
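import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set the environment variable (a plain Python variable has no effect on CUDA)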

Next, we start the environment.
We create a Deployment whose number of pods the agent will set. If the Deployment already exists, the API returns an error (409 Conflict).
Note: The Deployment exists even when the number of pods is 0.

In [None]:
config.load_kube_config()
apps_api = client.AppsV1Api()
deployment = client.V1Deployment()
deployment.api_version = "apps/v1"
deployment.kind = "Deployment"
deployment.metadata = client.V1ObjectMeta(name="httpdia")
name = "httpdia"
spec = client.V1DeploymentSpec(
    selector=client.V1LabelSelector(match_labels={"app":"httpdia"}),
    template=client.V1PodTemplateSpec(),
)
container = client.V1Container(
    image="httpd",
    resources = {"limits": {"cpu":"500m"} , "requests": {"cpu":"200m"}},
    name=name, 
)
spec.template.metadata = client.V1ObjectMeta(
    name="httpdia",
    labels={"app":"httpdia"},
)
spec.template.spec = client.V1PodSpec(containers = [container])
dep = client.V1Deployment(
    metadata=client.V1ObjectMeta(name=name),
    spec=spec,
)
dep.spec.replicas = 2

In [None]:
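from kubernetes.client.rest import ApiException

try:
    apps_api.create_namespaced_deployment(namespace="default", body=dep)
except ApiException as e:
    if e.status != 409:  # 409 Conflict: the Deployment already exists
        raise
    print("Deployment already exists")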


We can also change the number of replicas manually and verify the result with
kubectl get pods

In [None]:
dep.spec.replicas = 3
changeddeploy= apps_api.replace_namespaced_deployment(name=name, namespace="default", body=dep)

We check the connection with the environment.

First, we count the pods.

In [None]:
config.load_kube_config()
v1=client.CoreV1Api()
ret=v1.list_namespaced_pod('default',watch=False)
pods=ret.items
pods_names=[pod.metadata.name for pod in pods]
number_pods= 0
for pod in pods_names:
    if (pod.find("httpdia")>-1):
        number_pods = number_pods+1
print("Number of Pods: " + str(number_pods))

We read the metrics from Prometheus, the number of pods from Kubernetes, and the score.

In [3]:
lapse="1m"
namespace_kubernetes ="default"
hostprometheus ="http://127.0.0.1:55167/"

In [4]:
#k8senv.check_kuber_prometheus(lapse,namespace_kubernetes,hostprometheus)
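
The way `k8senv` queries Prometheus is not shown in this notebook. As a rough sketch of how the `lapse` and `namespace_kubernetes` parameters could end up in PromQL queries, the helper below builds rate queries for the counter-type metrics. The cAdvisor metric names and the helpers `rate_query` and `build_queries` are assumptions made for illustration, not the names `k8senv` necessarily uses.

```python
# Assumed cAdvisor metric names; k8senv may query different ones.
COUNTER_METRICS = {
    "receive_packets": "container_network_receive_packets_total",
    "transmit_packets": "container_network_transmit_packets_total",
    "dropped_packets": "container_network_receive_packets_dropped_total",
    "cpu_usage_seconds": "container_cpu_usage_seconds_total",
    "cpu_throttled_seconds": "container_cpu_cfs_throttled_seconds_total",
}


def rate_query(metric, namespace, lapse):
    """Build a PromQL query for the summed per-second rate of a counter."""
    return f'sum(rate({metric}{{namespace="{namespace}"}}[{lapse}]))'


def build_queries(namespace, lapse):
    """Return one PromQL query string per state feature (hypothetical helper)."""
    return {
        name: rate_query(metric, namespace, lapse)
        for name, metric in COUNTER_METRICS.items()
    }


queries = build_queries("default", "1m")
print(queries["cpu_throttled_seconds"])
```

Each query string could then be passed to `PrometheusConnect.custom_query(query=...)` on a client created with the `hostprometheus` URL.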

### 2. Examine the State and Action Spaces

The simulation contains a single agent that changes the number of pods. At each time step, it has one action at its disposal:

- `n` - assign `n` pods

The state space has `9` dimensions: number of pods assigned, file descriptors, received packets, transmitted packets, dropped packets, CPU usage seconds, CPU throttled seconds, memory working bytes, and memory usage bytes.

The reward is calculated as `score = (1 - (float(dropped_packets)/number_pods) - float(cpu_throttled_seconds)) / number_pods`. Dropped packets and CPU throttling are penalized, and the score is divided by the number of pods to reward using as few pods as possible.
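
The reward formula can be written as a small standalone function to see how it behaves. This is a sketch, not the code inside `k8senv`: the function name `compute_score` and the guard against a non-positive pod count are assumptions added here, since the formula divides by the number of pods.

```python
def compute_score(number_pods, dropped_packets, cpu_throttled_seconds):
    """Reward from the formula above; higher is better."""
    if number_pods <= 0:
        # Assumption: the formula is undefined for zero pods, so reject it here.
        raise ValueError("number_pods must be positive")
    penalty = float(dropped_packets) / number_pods + float(cpu_throttled_seconds)
    return (1.0 - penalty) / number_pods


# With no drops and no throttling, fewer pods give a higher reward.
print(compute_score(1, 0, 0))
print(compute_score(10, 0, 0))
# Drops and throttling reduce the reward for the same number of pods.
print(compute_score(10, 5, 0.2))
```

With the same pod count, any dropped packets or throttling lowers the score, and with a clean system the score shrinks as pods are added, which is the trade-off the agent has to learn.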

Run the code cell below to print some information about the environment. (Check that load is being generated against the pods using the CreatePodLoadGenerator notebook.)

In [5]:
# Initialize the environment
initialpods = 10
#env_info = k8senv.step(initialpods,hostprometheus,lapse,namespace_kubernetes)

### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you will see how the agent performs when it selects an action uniformly at random at each time step.

In [6]:
#state = env_info['vector_observations']            # get the current state
score = 0                                          # initialize the score
rangenumberpodsrandom = 30
#print(state)

In [11]:
archivocarga="cargapod.csv"
dataloaded = False

In [12]:
k8senv.enviroment(archivocarga)

True


In [13]:
import pandas as pd
data = pd.read_csv(archivocarga)

In [14]:
print(data)

       pods  file_descriptors  receive_packets  transmit_packets  \
0      10.0               0.0              0.0               0.0   
1      10.0               0.0              0.0               0.0   
2      10.0               0.0              0.0               0.0   
3      10.0               0.0              0.0               0.0   
4      10.0               0.0              0.0               0.0   
...     ...               ...              ...               ...   
14650  20.0             308.3           7219.0            5033.7   
14651  20.0             307.6           7046.1            4859.8   
14652  20.0             296.5           6769.0            4585.5   
14653  20.0             270.3           5923.1            3991.4   
14654  20.0             270.3           5923.1            3991.4   

       dropped_packets  cpu_usage_seconds  cpu_throttled_seconds  \
0                  0.0           0.000754                    0.0   
1                  0.0           0.000650      

In [15]:
max_t = 1
t=0

while True:
    t=t+1
    action = np.random.randint(rangenumberpodsrandom)                         # select an action
    
    ## The next two lines are commented out when using the load file
    #env_info = k8senv.step(action,hostprometheus,lapse,namespace_kubernetes, archivocarga)  # send the action to the environment
    #time.sleep(4)
    env_info = k8senv.step(action,hostprometheus,lapse,namespace_kubernetes)  # send the action to the environment
    next_state = env_info['vector_observations']                              # get the next state
    reward = env_info['rewards'][0]                                           # get the reward
    done = env_info['local_done'][0]                                             # see if episode has finished
    if t >= max_t :
        done=True
    score += reward                                                           # update the score
    state = next_state                                                        # roll over the state to next time step
    
    print(" action: " + str(action) +" reward:" + str(reward) )
    if done:                                                                  # exit loop if episode finished
        break
    
print("Score: {}".format(score))

True
proceso con archivo
 action: 20 reward:-2.32
Score: -4.64


When finished, you can close the environment.

In [None]:
##env.close()

### 4. Train Agent DQN

In [None]:
from collections import deque
import torch

In [None]:
from dqn_agent import Agent
agent = Agent(state_size=9, action_size=20, seed=0)
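
The internals of `dqn_agent.Agent` are not shown here. As a minimal sketch of the epsilon-greedy action selection that the training loop relies on, assuming the agent has one Q-value per possible pod count, the selection step could look like this. The function `epsilon_greedy` is hypothetical and uses only the standard library.

```python
import random


def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)


q_values = [0.1, 0.9, 0.3]
print(epsilon_greedy(q_values, eps=0.0))  # always greedy when eps is 0
print(epsilon_greedy(q_values, eps=1.0))  # always random when eps is 1
```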

In [None]:
def dqn(n_episodes=5, max_t=100, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Deep Q-Learning.
    
    Params
    ======
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon, for epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): multiplicative factor (per episode) for decreasing epsilon
    """
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores
    eps = eps_start                    # initialize epsilon
    initialaction=10
    for i_episode in range(1, n_episodes+1):
        # There is no separate reset; the first step also acts as the reset
        env_info = k8senv.step(initialaction,hostprometheus,lapse,namespace_kubernetes, archivocarga)
        state = np.array(env_info['vector_observations'])
        score = 0
        for t in range(max_t):
            action = agent.act(state, eps)
            action = action.astype(int)
            action= action.item()
            print("Iteration:"+str(t)+" Action:"+str(action)+" Score:"+str(score))
            ## The next 3 lines are commented out when using the load file
            #env_info = k8senv.step(action,hostprometheus,lapse,namespace_kubernetes, archivocarga)
            #print(env_info['vector_observations'])
            #time.sleep(2)
            env_info = k8senv.step(action,hostprometheus,lapse,namespace_kubernetes, archivocarga)
            print(env_info['vector_observations'])
            next_state =  np.array(env_info['vector_observations'])
            reward = env_info['rewards'][0] 
            done = False                   # episodes end only when the max_t limit of the for loop is reached
            agent.step(state, action, reward, next_state, done)
            state = next_state
            score += reward
        scores_window.append(score)       # save most recent score
        scores.append(score)              # save most recent score
        eps = max(eps_end, eps_decay*eps) # decrease epsilon
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        if i_episode % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        if np.mean(scores_window)>=13.0:
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode-100, np.mean(scores_window)))
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoint_project1.pth')
            break
    return scores

scores = dqn()
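
To see what the `eps_start`, `eps_end` and `eps_decay` parameters imply, the decay rule `eps = max(eps_end, eps_decay*eps)` from the loop above can be replayed on its own. The helper name `epsilon_schedule` is an assumption introduced for this sketch.

```python
def epsilon_schedule(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Return the epsilon used in each episode under multiplicative decay."""
    eps = eps_start
    values = []
    for _ in range(n_episodes):
        values.append(eps)
        eps = max(eps_end, eps_decay * eps)  # same update as in dqn()
    return values


short_run = epsilon_schedule(5)
long_run = epsilon_schedule(2000)
print(short_run)
print(long_run[-1])
```

With the default `n_episodes=5`, epsilon stays above 0.97, so almost every action is still random. The floor `eps_end` is only reached after roughly 900 episodes, which is worth keeping in mind when choosing `n_episodes` or `eps_decay`.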

In [None]:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()

In [None]:
agent.qnetwork_local.load_state_dict(torch.load('checkpoint_project1.pth'))
# As in training, a first step with the initial number of pods acts as the reset
env_info = k8senv.step(initialpods,hostprometheus,lapse,namespace_kubernetes, archivocarga)
state = np.array(env_info['vector_observations'])
score = 0

In [None]:
max_t = 100                                        # cap on the number of evaluation steps
for t in range(max_t):
    action = int(agent.act(state).item())          # select an action with the trained agent
    env_info = k8senv.step(action,hostprometheus,lapse,namespace_kubernetes, archivocarga)  # send the action to the environment
    next_state = np.array(env_info['vector_observations'])  # get the next state
    reward = env_info['rewards'][0]                # get the reward
    done = env_info['local_done'][0]               # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break

print("Score: {}".format(score))

In [None]:
##env.close()