# Introduction
In this experiment we test the json generation code that I just implemented. The goal is to:

- Run a standard ReAgent example using `ReAgent_workflow`
- Read the input JSON into pandas, regenerate the JSON for ReAgent, and run the model again
- Check that the model still runs successfully (it should reach a score of 200)

# Run vanilla model
This went swimmingly: the run finished and showed a score of 200. The run is in the `cart_pole_vanilla_run` directory.

# Read jsonlines into pandas
Now we read the jsonlines input data used for `cart_pole_vanilla_run` into pandas. We use the `jsonlines` package to get the json into Python. 

In [2]:
import jsonlines
import os
import pandas as pd

print(os.getcwd())

# Read every record of the JSON Lines file into a list of dicts;
# the context manager closes the file afterwards.
with jsonlines.open('small_example_generated_json.json') as reader:
    json_data = list(reader)

/home/paul/reagent_experiments/02april2020_json_generation_test
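
For reference, JSON Lines is simply one JSON document per line, so the read above can be sketched with the standard library alone. This is a minimal illustration and not how the `jsonlines` package is implemented; `read_json_lines` is a hypothetical helper, and the two abbreviated records only stand in for the real input file.

```python
import io
import json


def read_json_lines(handle):
    """Parse one JSON document per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in handle if line.strip()]


# Two abbreviated records standing in for the real input file.
sample = io.StringIO(
    '{"mdp_id": "0", "sequence_number": 0, "reward": 1.0}\n'
    '{"mdp_id": "0", "sequence_number": 1, "reward": 1.0}\n'
)
records = read_json_lines(sample)
print(len(records), records[1]["sequence_number"])
```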


In [3]:
json_data[0]

{'ds': '2019-01-01',
 'mdp_id': '0',
 'sequence_number': 0,
 'state_features': {'0': 0.008422686718404293,
  '1': -0.042249470949172974,
  '2': 0.02246319130063057,
  '3': -0.020789798349142075},
 'action': '1',
 'reward': 1.0,
 'action_probability': 0.975,
 'possible_actions': ['0', '1'],
 'metrics': {'reward': 1.0}}

Next we need to generate the appropriate Pandas DataFrame. First we make a DataFrame from the json data using `json_normalize`:

In [4]:
df = pd.json_normalize(json_data)
df.head()

Unnamed: 0,ds,mdp_id,sequence_number,action,reward,action_probability,possible_actions,state_features.0,state_features.1,state_features.2,state_features.3,metrics.reward
0,2019-01-01,0,0,1,1.0,0.975,"[0, 1]",0.008423,-0.042249,0.022463,-0.02079,1.0
1,2019-01-01,0,1,1,1.0,0.975,"[0, 1]",0.007578,0.152543,0.022047,-0.306302,1.0
2,2019-01-01,0,2,1,1.0,0.975,"[0, 1]",0.010629,0.347344,0.015921,-0.591951,1.0
3,2019-01-01,0,3,0,1.0,0.025,"[0, 1]",0.017575,0.54224,0.004082,-0.879576,1.0
4,2019-01-01,0,4,1,1.0,0.975,"[0, 1]",0.02842,0.347062,-0.013509,-0.585612,1.0
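
The dotted column names come from the way `json_normalize` flattens nested dictionaries: each nested key is joined to its parent with the separator (`.` by default, configurable through the `sep` argument). A small sketch on a single abbreviated record, shaped like `json_data[0]` above:

```python
import pandas as pd

# One abbreviated record in the same shape as json_data[0] above.
record = {
    "mdp_id": "0",
    "sequence_number": 0,
    "state_features": {"0": 0.008422686718404293, "1": -0.042249470949172974},
    "metrics": {"reward": 1.0},
}

# Nested dicts become columns named '<parent>.<child>'.
flat = pd.json_normalize([record])
print(sorted(flat.columns))
```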


A number of columns hold static or derived values and can be dropped from the DataFrame:

In [5]:
# Note: metrics.reward is set automatically from reward, so it can be dropped as well
df = df.drop(columns=['ds', 'action_probability', 'possible_actions', 'metrics.reward'])
df.head()

Unnamed: 0,mdp_id,sequence_number,action,reward,state_features.0,state_features.1,state_features.2,state_features.3
0,0,0,1,1.0,0.008423,-0.042249,0.022463,-0.02079
1,0,1,1,1.0,0.007578,0.152543,0.022047,-0.306302
2,0,2,1,1.0,0.010629,0.347344,0.015921,-0.591951
3,0,3,0,1.0,0.017575,0.54224,0.004082,-0.879576
4,0,4,1,1.0,0.02842,0.347062,-0.013509,-0.585612


Next we need to rename some columns for the state features:

In [6]:
df = df.rename(columns={'state_features.0': '0',
                        'state_features.1': '1',
                        'state_features.2': '2',
                        'state_features.3': '3'})
df.head()

Unnamed: 0,mdp_id,sequence_number,action,reward,0,1,2,3
0,0,0,1,1.0,0.008423,-0.042249,0.022463,-0.02079
1,0,1,1,1.0,0.007578,0.152543,0.022047,-0.306302
2,0,2,1,1.0,0.010629,0.347344,0.015921,-0.591951
3,0,3,0,1.0,0.017575,0.54224,0.004082,-0.879576
4,0,4,1,1.0,0.02842,0.347062,-0.013509,-0.585612


Finally, we need to set the index:

In [7]:
df = df.set_index(['mdp_id', 'sequence_number', 'action', 'reward'])
df.head()

                                             0         1         2         3
mdp_id sequence_number action reward                                        
0      0               1      1.0     0.008423 -0.042249  0.022463 -0.020790
       1               1      1.0     0.007578  0.152543  0.022047 -0.306302
       2               1      1.0     0.010629  0.347344  0.015921 -0.591951
       3               0      1.0     0.017575  0.542240  0.004082 -0.879576
       4               1      1.0     0.028420  0.347062 -0.013509 -0.585612
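
The normalize, drop, rename, and index steps above can be bundled into one function so they do not have to be repeated for every input file. The sketch below is one way to do that; `records_to_reagent_df` is a hypothetical name, and the hard-coded column lists assume the CartPole layout used in this notebook.

```python
import pandas as pd

PREFIX = "state_features."


def records_to_reagent_df(records):
    """Hypothetical helper bundling the normalize, drop, rename, and index steps."""
    df = pd.json_normalize(records)
    # Drop the columns that are static or derived in this dataset.
    df = df.drop(columns=["ds", "action_probability", "possible_actions", "metrics.reward"])
    # 'state_features.0' -> '0', and so on; other column names are left alone.
    df = df.rename(columns=lambda name: name[len(PREFIX):] if name.startswith(PREFIX) else name)
    return df.set_index(["mdp_id", "sequence_number", "action", "reward"])


# The first record of the input data, as shown by json_data[0].
example = [{
    "ds": "2019-01-01", "mdp_id": "0", "sequence_number": 0,
    "state_features": {"0": 0.008422686718404293, "1": -0.042249470949172974,
                       "2": 0.02246319130063057, "3": -0.020789798349142075},
    "action": "1", "reward": 1.0, "action_probability": 0.975,
    "possible_actions": ["0", "1"], "metrics": {"reward": 1.0},
}]
reagent_df = records_to_reagent_df(example)
print(reagent_df)
```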


## Generate new json input data 
Now that we have the correct DataFrame, we can start to generate new JSON output:

In [8]:
from ReAgent_workflow.process_json import reagent_df_to_json_lines

json_lines_input = reagent_df_to_json_lines(df, 
                         ds_value = '2019-01-01', 
                         mdp_id_var = 'mdp_id',
                         sequence_number_var = 'sequence_number',
                         possible_actions = ['0', '1'], 
                         action_var = 'action',
                         reward_var = 'reward',
                         action_probability = 0.975,
                         progress=False,
                         indent=2)
print(json_lines_input[2])

{
  "state_features": {
    "0": 0.010628562420606613,
    "1": 0.34734421968460083,
    "2": 0.01592136360704899,
    "3": -0.591950535774231
  },
  "ds": "2019-01-01",
  "mdp_id": "0",
  "sequence_number": 2,
  "possible_actions": [
    "0",
    "1"
  ],
  "action": "1",
  "reward": 1.0,
  "metrics": {
    "reward": 1.0
  },
  "action_probability": 0.975
}


Compare this with the reference example from the ReAgent documentation:

    {
        "ds": "2019-01-01",
        "mdp_id": "0",
        "sequence_number": 0,
        "state_features": {
            "0": -0.04456399381160736,
            "1": 0.04653909429907799,
            "2": 0.013269094750285149,
            "3": -0.020998265594244003
        },
        "action": "0",
        "reward": 1.0,
        "action_probability": 0.975,
        "possible_actions": [
            "0",
            "1"
        ],
        "metrics": {
            "reward": 1.0
        }
    }
   
Two observations:

- The order in which the keys are stored in the JSON structure is different. This should not be a problem: JSON objects are unordered by definition, and I assume that ReAgent reads the data by key, not by position.
- `json.dumps` appears to add a `\` before each double quote, which the ReAgent reference data does not have. The escaped form is still valid JSON and the contents of the structure are unchanged, so this should not pose a problem either.

But the ultimate test will be to run ReAgent with the newly generated data. 
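
Both observations can be reproduced in isolation with the standard library. In the sketch below the abbreviated `record` is illustrative only, and attributing the backslashes to a second round of encoding is an assumption rather than something this notebook confirms.

```python
import json

record = {"ds": "2019-01-01", "mdp_id": "0", "metrics": {"reward": 1.0}}

# Same content, different key order: the parsed objects compare equal.
reordered = '{"metrics": {"reward": 1.0}, "mdp_id": "0", "ds": "2019-01-01"}'
same_content = json.loads(reordered) == record

# Encoding once gives plain JSON with unescaped double quotes.
encoded_once = json.dumps(record)
# Encoding that string again wraps it in a JSON string and escapes every inner quote.
encoded_twice = json.dumps(encoded_once)

print(same_content)
print(encoded_once)
print(encoded_twice)
```

If backslashes do show up in a generated file, decoding each line twice (`json.loads(json.loads(line))`) recovers the original record in the double-encoding case.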

# Now for all data
### Generate JSON

In [9]:
json_data = [obj for obj in jsonlines.open('generated_cartpole_data.json')]
df = pd.json_normalize(json_data)
df = df.drop(columns=['ds', 'action_probability', 'possible_actions', 'metrics.reward'])
df = df.rename(columns={'state_features.0': '0', 
                   'state_features.1': '1',
                   'state_features.2': '2',
                   'state_features.3': '3'})
df = df.set_index(['mdp_id', 'sequence_number', 'action', 'reward'])
print(df.head())
json_lines_input = reagent_df_to_json_lines(df, 
                         ds_value = '2019-01-01', 
                         mdp_id_var = 'mdp_id',
                         sequence_number_var = 'sequence_number',
                         possible_actions = ['0', '1'], 
                         action_var = 'action',
                         reward_var = 'reward',
                         action_probability = 0.975,
                         json_path='reagent_workflow_generated_cartpole_data.json')

                                             0         1         2         3
mdp_id sequence_number action reward                                        
0      0               1      1.0     0.008423 -0.042249  0.022463 -0.020790
       1               1      1.0     0.007578  0.152543  0.022047 -0.306302
       2               1      1.0     0.010629  0.347344  0.015921 -0.591951
       3               0      1.0     0.017575  0.542240  0.004082 -0.879576
       4               1      1.0     0.028420  0.347062 -0.013509 -0.585612
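
To make the mapping from the indexed DataFrame to JSON Lines concrete, here is a rough sketch of how such a conversion could work. It is NOT the implementation of `reagent_df_to_json_lines` from `ReAgent_workflow`; `write_json_lines_sketch` and its parameters are hypothetical, and it assumes the index levels appear in the order set above.

```python
import io
import json

import pandas as pd


def write_json_lines_sketch(df, handle, ds_value, possible_actions, action_probability):
    """Hypothetical stand-in, NOT the ReAgent_workflow implementation."""
    # Assumes the index levels are (mdp_id, sequence_number, action, reward)
    # and that every remaining column is a state feature.
    for (mdp_id, sequence_number, action, reward), features in df.iterrows():
        record = {
            "ds": ds_value,
            "mdp_id": str(mdp_id),
            "sequence_number": int(sequence_number),
            "state_features": {str(k): float(v) for k, v in features.items()},
            "action": str(action),
            "reward": float(reward),
            "action_probability": action_probability,
            "possible_actions": possible_actions,
            "metrics": {"reward": float(reward)},
        }
        # One compact JSON document per line, newline-terminated.
        handle.write(json.dumps(record) + "\n")


index = pd.MultiIndex.from_tuples(
    [("0", 0, "1", 1.0), ("0", 1, "1", 1.0)],
    names=["mdp_id", "sequence_number", "action", "reward"],
)
demo_df = pd.DataFrame({"0": [0.008423, 0.007578], "1": [-0.042249, 0.152543]}, index=index)

buffer = io.StringIO()
write_json_lines_sketch(demo_df, buffer, "2019-01-01", ["0", "1"], 0.975)
lines = buffer.getvalue().splitlines()
print(lines[0])
```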


### Quick check, read the JSON

In [10]:
json_data = [obj for obj in jsonlines.open('reagent_workflow_generated_cartpole_data.json')]
json_data[0]['ds']

'2019-01-01'

In [11]:
import json

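# This code mimics the check code from ReAgent_workflow
with open('reagent_workflow_generated_cartpole_data.json') as json_file:
    # Printing the first line consumes it, so the list below holds lines 2-101
    print(next(json_file))
    raw_training_first100 = [json.loads(next(json_file)) for _ in range(100)]
# Index 2 is therefore the fourth record in the file (sequence_number 3)
raw_training_first100[2]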

{"state_features": {"0": 0.008422686718404293, "1": -0.042249470949172974, "2": 0.02246319130063057, "3": -0.020789798349142075}, "ds": "2019-01-01", "mdp_id": "0", "sequence_number": 0, "possible_actions": ["0", "1"], "action": "1", "reward": 1.0, "metrics": {"reward": 1.0}, "action_probability": 0.975}



{'state_features': {'0': 0.017575446516275406,
  '1': 0.5422396659851074,
  '2': 0.004082352388650179,
  '3': -0.8795760273933411},
 'ds': '2019-01-01',
 'mdp_id': '0',
 'sequence_number': 3,
 'possible_actions': ['0', '1'],
 'action': '0',
 'reward': 1.0,
 'metrics': {'reward': 1.0},
 'action_probability': 0.975}

So far, so good. 
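
The spot checks above only look at a few records. A fuller check would compare the original and regenerated files record by record. The sketch below shows one way to do that; `first_mismatch` is a hypothetical helper, and the temporary files with abbreviated records exist only for the demo.

```python
import json
import os
import tempfile


def first_mismatch(path_a, path_b):
    """Return the index of the first differing record, or None if all match.

    Hypothetical helper. Records are compared as parsed objects, so key order is ignored.
    """
    with open(path_a) as file_a, open(path_b) as file_b:
        records_a = [json.loads(line) for line in file_a if line.strip()]
        records_b = [json.loads(line) for line in file_b if line.strip()]
    for i, (a, b) in enumerate(zip(records_a, records_b)):
        if a != b:
            return i
    if len(records_a) != len(records_b):
        # One file is a strict prefix of the other.
        return min(len(records_a), len(records_b))
    return None


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.json")
    reordered = os.path.join(tmp, "reordered.json")
    changed = os.path.join(tmp, "changed.json")
    with open(original, "w") as f:
        f.write('{"mdp_id": "0", "sequence_number": 0}\n{"mdp_id": "0", "sequence_number": 1}\n')
    with open(reordered, "w") as f:
        f.write('{"sequence_number": 0, "mdp_id": "0"}\n{"sequence_number": 1, "mdp_id": "0"}\n')
    with open(changed, "w") as f:
        f.write('{"mdp_id": "0", "sequence_number": 0}\n{"mdp_id": "0", "sequence_number": 7}\n')
    matching = first_mismatch(original, reordered)
    differing = first_mismatch(original, changed)

print(matching, differing)
```

For this notebook the call would be `first_mismatch('generated_cartpole_data.json', 'reagent_workflow_generated_cartpole_data.json')`. A result of `None` would mean the round trip preserved every record exactly, including the floating-point values.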

The final step is to run ReAgent with this newly generated data in `reagent_workflow_generated_cartpole_data.json`. 