# Exploring actions of the DQN
Here we look at the actions taken by a debug DQN run, comparing episode 1 with episode 999, the last of 1000 episodes.
Parameters:
```
    num_dcs = 10
    num_customers = 2
    num_commodities = 4
    orders_per_day = 2
    dcs_per_customer = 3
    demand_mean = 100
    demand_var = 20

    num_steps = 50
    num_episodes = 1000
```
**Commit hash:** 74ab129039874c95bbeee44585d11c75ed881e13

In [34]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
%config InlineBackend.figure_format = "retina"
from pathlib import Path
import os
# Wrap in Path first so this works whether _dh holds str or Path entries.
project_dir = (Path(globals()['_dh'][0]) / "..").resolve()
os.chdir(project_dir)
print("Running notebook from: " + os.path.abspath(""))

from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = "last_expr" # Default jupyter behavior
InteractiveShell.ast_node_interactivity = "all" # All expressions are shown.

Running notebook from: /Users/aleph/Documents/jota/tesis/ts_mcfrl


# 1 - Actions on episode 1


In [49]:
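def add_is_m(df):
    # Cost per delivered unit; rows with 0 cost and 0 units yield NaN, which is replaced with 0.
    df['unit_cost'] = (df.customer_cost / df.customer_units).fillna(0)
    # A unit cost of exactly 1000 flags the movement as a Big M delivery.
    df['is_m'] = df.unit_cost == 1000
    return df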
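
The exact comparison `unit_cost==1000` relies on the floating-point division landing exactly on 1000, and a row with zero units but non-zero cost would yield `inf`. A slightly more defensive variant is sketched below. `add_is_m_safe` and `BIG_M_UNIT_COST` are hypothetical names, the value 1000 is taken from `add_is_m`, and treating every zero-unit row as unit cost 0 is an assumption. Only the first demo row is taken from the report; the other two are invented.

```python
import numpy as np
import pandas as pd

BIG_M_UNIT_COST = 1000  # hypothetical constant name; value taken from add_is_m

def add_is_m_safe(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with unit_cost and is_m columns added."""
    out = df.copy()
    # Replace zero units with NaN so the division never produces inf.
    units = out["customer_units"].replace(0, np.nan)
    out["unit_cost"] = (out["customer_cost"] / units).fillna(0)
    # Tolerant comparison instead of exact float equality.
    out["is_m"] = np.isclose(out["unit_cost"], BIG_M_UNIT_COST)
    return out

demo = pd.DataFrame({
    "customer_units": [638, 0, 10],
    "customer_cost": [638000, 0, 5000],
})
flagged = add_is_m_safe(demo)
print(flagged)
```

Unlike `add_is_m`, this version leaves the input frame unmodified.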

In [75]:
details_1 = pd.read_csv("python/data/results/two_customers_dqn_debug/ep_1/movement_detail_report.csv")
details_1 = add_is_m(details_1)

In [52]:
to_customer_1 = details_1.query('source_kind=="DC"').query('destination_kind=="C"')

In episode 1, deliveries to each customer were spread across five DCs (dcs_3, dcs_5, dcs_6, dcs_8 and dcs_9).

In [27]:
to_customer_1.groupby(["source_name","destination_name"]).size().reset_index().sort_values(["destination_name",'source_name'])

Unnamed: 0,source_name,destination_name,0
0,dcs_3,c_10,42
2,dcs_5,c_10,24
4,dcs_6,c_10,2
6,dcs_8,c_10,2
8,dcs_9,c_10,22
1,dcs_3,c_11,26
3,dcs_5,c_11,36
5,dcs_6,c_11,2
7,dcs_8,c_11,2
9,dcs_9,c_11,26
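
To make the spread easier to read, the counts above can be converted into per-customer shares. This is a minimal sketch: the counts are copied from the table above, and the column names `n` and `share` are illustrative choices, not part of the original report.

```python
import pandas as pd

# Delivery counts per (DC, customer) pair, copied from the episode 1 output above.
ep1_counts = pd.DataFrame({
    "source_name": ["dcs_3", "dcs_5", "dcs_6", "dcs_8", "dcs_9"] * 2,
    "destination_name": ["c_10"] * 5 + ["c_11"] * 5,
    "n": [42, 24, 2, 2, 22, 26, 36, 2, 2, 26],
})

# Share of each customer's deliveries that came from each DC.
totals = ep1_counts.groupby("destination_name")["n"].transform("sum")
ep1_counts["share"] = ep1_counts["n"] / totals

print(ep1_counts.round(3))
```

Each customer receives 92 deliveries in total, so dcs_3 accounts for 42/92 (about 46%) of the deliveries to c_10.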


# 2 - Checking actions on episode 999

In [76]:
details_999 = pd.read_csv("python/data/results/two_customers_dqn_debug/ep_999/movement_detail_report.csv")
details_999 = add_is_m(details_999)

In [56]:
to_customer_999 = details_999.query('source_kind=="DC"').query('destination_kind=="C"')

In episode 999, the DQN served both customers from a single DC (dcs_9) 100% of the time.

In [57]:
to_customer_999.groupby(["source_name","destination_name"]).size().reset_index()

Unnamed: 0,source_name,destination_name,0
0,dcs_9,c_10,92
1,dcs_9,c_11,92
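
One way to confirm this concentration programmatically is to count the distinct DCs serving each customer. The sketch below rebuilds the (DC, customer) pairs from the two groupby outputs in this notebook; `count_dcs_used` is a hypothetical helper name.

```python
import pandas as pd

def count_dcs_used(pairs: pd.DataFrame) -> pd.Series:
    """Number of distinct source DCs observed for each customer."""
    return pairs.groupby("destination_name")["source_name"].nunique()

# (DC, customer) pairs copied from the episode 1 and episode 999 outputs.
ep1_pairs = pd.DataFrame({
    "source_name": ["dcs_3", "dcs_5", "dcs_6", "dcs_8", "dcs_9"] * 2,
    "destination_name": ["c_10"] * 5 + ["c_11"] * 5,
})
ep999_pairs = pd.DataFrame({
    "source_name": ["dcs_9", "dcs_9"],
    "destination_name": ["c_10", "c_11"],
})

used_ep1 = count_dcs_used(ep1_pairs)
used_ep999 = count_dcs_used(ep999_pairs)
print(used_ep1.to_dict())
print(used_ep999.to_dict())
```

The same helper should work directly on `to_customer_1` and `to_customer_999`, since both frames carry the `source_name` and `destination_name` columns.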


# 3 - Compare the number of big Ms
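If the DQN had learned which DCs are valid for each customer, the number of Big M deliveries (unit cost of 1000) should have dropped between episode 1 and episode 999. The counts below show that it did not.

Total Big M counts: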

In [78]:
display("To customer M summary ep 1")
to_customer_1.groupby(['is_m']).size().reset_index().sort_values(['is_m'])
display("To customer M summary ep 999")
to_customer_999.groupby(['is_m']).size().reset_index().sort_values(['is_m'])

'To customer M summary ep 1'

Unnamed: 0,is_m,0
0,False,2
1,True,182


'To customer M summary ep 999'

Unnamed: 0,is_m,0
0,True,184
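
The two summaries are easier to compare as fractions. The sketch below uses only the counts printed above; the dictionary layout and the `big_m_share` helper are illustrative names.

```python
# Delivery counts copied from the two "To customer M summary" outputs above.
big_m_summary = {
    "ep_1": {"big_m": 182, "not_big_m": 2},
    "ep_999": {"big_m": 184, "not_big_m": 0},
}

def big_m_share(counts: dict) -> float:
    """Fraction of deliveries flagged as Big M."""
    total = counts["big_m"] + counts["not_big_m"]
    return counts["big_m"] / total

shares = {ep: big_m_share(c) for ep, c in big_m_summary.items()}
for ep, share in shares.items():
    print(f"{ep}: {share:.1%} of deliveries are Big M")
```

In the notebook itself, `to_customer_1['is_m'].mean()` would give the same fraction directly, since the mean of a boolean column is the share of `True` values.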


Big M counts per customer and DC:

In [70]:
display("To customer M summary ep 1")
to_customer_1.groupby(['is_m','source_name','destination_name']).size().reset_index().sort_values(['destination_name','is_m'])
display("To customer M summary ep 999")
to_customer_999.groupby(['is_m','source_name','destination_name']).size().reset_index().sort_values(['destination_name','is_m'])

'To customer M summary ep 1'

Unnamed: 0,is_m,source_name,destination_name,0
1,True,dcs_3,c_10,42
3,True,dcs_5,c_10,24
5,True,dcs_6,c_10,2
7,True,dcs_8,c_10,2
8,True,dcs_9,c_10,22
0,False,dcs_8,c_11,2
2,True,dcs_3,c_11,26
4,True,dcs_5,c_11,36
6,True,dcs_6,c_11,2
9,True,dcs_9,c_11,26


'To customer M summary ep 999'

Unnamed: 0,is_m,source_name,destination_name,0
0,True,dcs_9,c_10,92
1,True,dcs_9,c_11,92


# 4 - Total cost
Total customer cost actually went up in the last episode compared to the first, from 91,071,240 to 91,748,000 (about 0.7%).

In [74]:
to_customer_1.customer_cost.sum()
to_customer_999.customer_cost.sum()

91071240

91748000

Unnamed: 0,source_name,destination_name,source_time,destination_time,source_kind,destination_kind,movement_type,transportation_units,transportation_cost,inventory_units,inventory_cost,customer_units,customer_cost,unit_cost,is_m
5,dcs_3,c_10,5,5,DC,C,Delivery,0,0,0,0,638,638000,1000.0,True
6,dcs_3,c_11,5,5,DC,C,Delivery,0,0,0,0,358,358000,1000.0,True
9,dcs_3,c_10,5,5,DC,C,Delivery,0,0,0,0,638,638000,1000.0,True
10,dcs_3,c_11,5,5,DC,C,Delivery,0,0,0,0,358,358000,1000.0,True
18,dcs_3,c_10,6,6,DC,C,Delivery,0,0,0,0,640,640000,1000.0,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
454,dcs_3,c_10,49,49,DC,C,Delivery,0,0,0,0,635,635000,1000.0,True
455,dcs_9,c_10,50,50,DC,C,Delivery,0,0,0,0,637,637000,1000.0,True
456,dcs_9,c_11,50,50,DC,C,Delivery,0,0,0,0,359,359000,1000.0,True
457,dcs_9,c_10,50,50,DC,C,Delivery,0,0,0,0,637,637000,1000.0,True
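
The size of the cost change between the two episodes can be worked out from the two sums printed in this section. This is plain arithmetic on those outputs; the variable names are illustrative.

```python
# Totals copied from the two customer_cost sums above.
cost_ep_1 = 91_071_240
cost_ep_999 = 91_748_000

abs_change = cost_ep_999 - cost_ep_1
rel_change = abs_change / cost_ep_1

print(f"absolute change: {abs_change:,}")
print(f"relative change: {rel_change:.2%}")
```

The increase is under 1%. Whether it reflects worse decisions or simply a different delivered volume between the two episodes is not something these two sums can settle on their own.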


# Conclusions
The DQN is not generalizing correctly, even for two customers with two DCs out of 10. At the very least, it should have learned which DCs are valid for each customer in order to avoid Big M costs.

Could it be that the Big M cost signals are irrelevant compared to the overall size of the network? Could it be some neural net parameter tuning issue? We need to get into the nitty-gritty.
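
One way to start on the first question is to measure how much of an episode's total cost comes from Big M deliveries. The sketch below assumes the column names shown in the movement detail output (`transportation_cost`, `inventory_cost`, `customer_cost`) plus the `is_m` flag. `big_m_cost_share` is a hypothetical helper, and the toy rows are invented for illustration only.

```python
import pandas as pd

COST_COLUMNS = ["transportation_cost", "inventory_cost", "customer_cost"]

def big_m_cost_share(details: pd.DataFrame) -> float:
    """Fraction of total cost that comes from Big M customer deliveries."""
    total_cost = details[COST_COLUMNS].to_numpy().sum()
    if total_cost == 0:
        return float("nan")
    big_m_cost = details.loc[details["is_m"], "customer_cost"].sum()
    return float(big_m_cost / total_cost)

# Toy rows (invented values, not taken from the report).
toy = pd.DataFrame({
    "transportation_cost": [500, 0, 0],
    "inventory_cost": [100, 0, 0],
    "customer_cost": [0, 638000, 1200],
    "is_m": [False, True, False],
})
share = big_m_cost_share(toy)
print(f"Big M share of total cost: {share:.3f}")
```

Running it on `details_1` and `details_999` would show whether the Big M term dominates the episode cost or is drowned out by the other terms. How the reward is actually derived from these costs is not shown in this notebook, so this is only a first diagnostic.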
