# Generation of Sample Input Data for Scalability Analysis
---

This notebook illustrates how to assemble the data used in the scalability analysis of the `respy` function `_full_solution`.  

The first step is to generate **sample input data** from either of the Keane and Wolpin models. We simulated both `kw_94_one` and `kw_97_basic` under a chosen `params` `pd.DataFrame`. Data was extracted for the following periods:
- `"per1"`
- `"per8"`
- `"per18"`
- `"per28"`
- `"per38"`
- `"per48"` (only for `kw_97_basic`)

and saved in a suitable file format (in our case `.pickle` and `.npy`). To minimize the effort during the timing analysis, we recommend saving each model in a separate file. Scalability (timing) analyses of dynamic models require data for several periods, because the computational effort depends on the period being solved.

To rule out artifacts in the data, we generated sample input data for `kw_97_basic` twice and store each sample in a separate file. The following data sets are generated, where `x` stands for either `pickle` or `npy`:

- `kw_94_one_input_params.x`
- `kw_97_basic_one_input_params.x`
- `kw_97_basic_two_input_params.x`
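
Each of these files holds one dictionary keyed by the period labels, so a timing run can loop over the keys. The following is a minimal sketch of that idea, not the actual timing script: `sample_inputs`, the row counts, and `placeholder_solution` are invented for illustration, and the placeholder is not the `respy` function `_full_solution`.

```python
import time

import numpy as np

# Invented stand-in for one data set: one entry per period label, with a
# different (made-up) number of rows in each period.
rng = np.random.default_rng(0)
sample_inputs = {
    f"per{period}": {"wages": rng.random((n_rows, 4))}
    for period, n_rows in [(1, 10), (8, 1_000), (18, 100_000)]
}


def placeholder_solution(wages):
    """Placeholder workload; NOT the respy function `_full_solution`."""
    return np.log1p(wages).max(axis=1)


# Time the placeholder once per period label.
timings = {}
for label, inputs in sample_inputs.items():
    start = time.perf_counter()
    placeholder_solution(**inputs)
    timings[label] = time.perf_counter() - start

print(timings)
```

Because the amount of work depends on the size of the period's inputs, timing only a single period would say little about the other periods.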




In [1]:
from pathlib import Path
import numpy as np
import pickle

%load_ext nb_black


In [2]:
path_out_raw_data = Path("./resources/raw_input_data")
# Specific path of the sample data (available upon request)
path_in_raw_data = Path("../../development")

PERIODS = [1, 8, 18, 28, 38, 48]


In [3]:
kw_94_one_input_params = {}
# Period 48 only exists for kw_97_basic, so the last entry of PERIODS is skipped.
for period in PERIODS[:-1]:
    filename = path_in_raw_data / f"inputs_kw_94_one_per{period}.pickle"
    kw_94_one_input_params[f"per{period}"] = np.load(filename, allow_pickle=True)

# Write both formats once, after all periods have been collected.
with open(path_out_raw_data / "kw_94_one_input_params.pickle", "wb") as outfile:
    pickle.dump(kw_94_one_input_params, outfile)

np.save(
    path_out_raw_data / "kw_94_one_input_params.npy",
    kw_94_one_input_params,
    allow_pickle=True,
)


In [4]:
for num in ["one", "two"]:
    kw_97_basic_input_params = {}

    # NOTE: PERIODS[:-1] skips period 48, although the text above lists it for
    # kw_97_basic. Iterate over PERIODS instead if that file is available.
    for period in PERIODS[:-1]:
        filename = path_in_raw_data / f"inputs_kw_97_basic_{num}_per{period}.pickle"
        kw_97_basic_input_params[f"per{period}"] = np.load(
            filename, allow_pickle=True
        )

    # Write both formats once per sample, after all periods have been collected.
    pickle_path = path_out_raw_data / f"kw_97_basic_{num}_input_params.pickle"
    with open(pickle_path, "wb") as outfile:
        pickle.dump(kw_97_basic_input_params, outfile)

    np.save(
        path_out_raw_data / f"kw_97_basic_{num}_input_params.npy",
        kw_97_basic_input_params,
        allow_pickle=True,
    )


In [5]:
with open(path_out_raw_data / "kw_97_basic_two_input_params.pickle", "rb") as infile:
    input_params_pickle = pickle.load(infile)

# np.save wraps the dictionary in a 0-d object array; .item() unwraps it.
input_params_npy = np.load(
    path_out_raw_data / "kw_97_basic_two_input_params.npy", allow_pickle=True
).item()

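Both files should contain the same data. The sketch below shows one way to check this on a small invented dictionary written to a temporary directory. It assumes that every entry is a NumPy array; `same_content` is a hypothetical helper, and entries of other types (for example a parameter dictionary) would need their own comparison.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Invented example data with the same nesting: period label -> name -> array.
data = {"per1": {"wages": np.arange(8.0).reshape(2, 4)}}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "example.pickle"
    npy_path = Path(tmp_dir) / "example.npy"

    with open(pickle_path, "wb") as handle:
        pickle.dump(data, handle)
    np.save(npy_path, data, allow_pickle=True)

    with open(pickle_path, "rb") as handle:
        from_pickle = pickle.load(handle)
    # np.load returns a 0-d object array that wraps the dictionary.
    raw = np.load(npy_path, allow_pickle=True)
    from_npy = raw.item()


def same_content(left, right):
    """Compare two {period: {name: array}} dictionaries entry by entry."""
    if left.keys() != right.keys():
        return False
    return all(
        left[period].keys() == right[period].keys()
        and all(
            np.array_equal(left[period][name], right[period][name])
            for name in left[period]
        )
        for period in left
    )


print(raw.shape, raw.dtype)
print(same_content(from_pickle, from_npy))
```

The call to `.item()` is needed only for the `.npy` file, because `np.save` stores a dictionary as a zero-dimensional object array.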

### Different data format: Adjustment in the scripts

If, for some reason, `.npy` is preferred, a few lines in the script files have to be changed. First, set `DATA_FORMAT` to `"npy"` in `config.py`. 

In `caller_scalability_analysis.py` the following lines

```python
input_params = pickle.load(open(INPUT_DATA, "rb"))[PERIOD]
pickle.dump(input_params, open(PATH_AUXINPUT_PARAMS, "wb"))
```

have to be replaced by: 

```python
input_params = np.load(INPUT_DATA, allow_pickle=True).item()[PERIOD]
np.save(PATH_AUXINPUT_PARAMS, input_params, allow_pickle=True)
```

In `exec_time_scalability.py` the following line

```python
input_params = pickle.load(open(PATH_AUXINPUT_PARAMS, "rb"))
```

has to be replaced by:

```python
input_params = np.load(PATH_AUXINPUT_PARAMS, allow_pickle=True).item()
```

Finally, if not already done, adjust the imports in those modules: instead of `import pickle` we need `import numpy as np`. 
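
As an alternative to editing the scripts by hand, the format switch could be wrapped in two small helpers that dispatch on the chosen format. This is only a sketch: `save_input_params` and `load_input_params` are hypothetical names that do not exist in the scripts, and the example data is invented.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_input_params(path, input_params, data_format):
    """Hypothetical helper: write `input_params` in the chosen format."""
    if data_format == "pickle":
        with open(path, "wb") as handle:
            pickle.dump(input_params, handle)
    elif data_format == "npy":
        # `path` should end in ".npy"; otherwise np.save appends the suffix.
        np.save(path, input_params, allow_pickle=True)
    else:
        raise ValueError(f"Unsupported data format: {data_format!r}")


def load_input_params(path, data_format):
    """Hypothetical helper: read a dictionary written by `save_input_params`."""
    if data_format == "pickle":
        with open(path, "rb") as handle:
            return pickle.load(handle)
    if data_format == "npy":
        return np.load(path, allow_pickle=True).item()
    raise ValueError(f"Unsupported data format: {data_format!r}")


# Round trip with invented data in a temporary directory.
example = {"per1": {"wages": np.ones((2, 4))}}
loaded = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for data_format in ("pickle", "npy"):
        path = Path(tmp_dir) / f"example.{data_format}"
        save_input_params(path, example, data_format)
        loaded[data_format] = load_input_params(path, data_format)["per1"]
```

With helpers like these, switching formats would only require changing the value passed as `data_format`, for instance the `DATA_FORMAT` setting from `config.py`.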

## Sample Input Data
---

The resulting input data can be accessed via the period keys. In our case, the function `_full_solution` takes the arguments: 
- wages
- nonpecs 
- continuation_values
- period_draws_emax_risk 
- optim_paras
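
Before timing, it can help to check which entries a period contains and how large they are. The sketch below does this for an invented stand-in dictionary: the shapes and the content of `optim_paras` are made up, and `describe` is a hypothetical helper.

```python
import numpy as np

# Invented stand-in for the inputs of one period; shapes and values are made up.
period_inputs = {
    "wages": np.ones((3, 4)),
    "nonpecs": np.zeros((3, 4)),
    "continuation_values": np.ones((3, 4)),
    "period_draws_emax_risk": np.zeros((5, 4)),
    "optim_paras": {"delta": 0.95},
}


def describe(inputs):
    """Map each argument name to its array shape, or to its type name."""
    summary = {}
    for name, value in inputs.items():
        if isinstance(value, np.ndarray):
            summary[name] = value.shape
        else:
            summary[name] = type(value).__name__
    return summary


summary = describe(period_inputs)
for name, info in summary.items():
    print(f"{name}: {info}")
```

The same helper could be applied to a real entry such as `kw_94_one_input_params["per38"]`.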

In [6]:
kw_94_one_input_params["per38"]

{'wages': array([[2.58095537e+04, 1.95650482e+04, 1.00000000e+00, 1.00000000e+00],
        [2.89261121e+04, 2.41369707e+04, 1.00000000e+00, 1.00000000e+00],
        [2.78475425e+04, 2.25051624e+04, 1.00000000e+00, 1.00000000e+00],
        ...,
        [2.06299669e+04, 1.52677799e+04, 1.00000000e+00, 1.00000000e+00],
        [2.40165871e+04, 1.90248253e+04, 1.00000000e+00, 1.00000000e+00],
        [2.79591557e+04, 2.36590267e+04, 1.00000000e+00, 1.00000000e+00]]),
 'nonpecs': array([[    0.,     0., -4000., 17750.],
        [    0.,     0., -4000., 17750.],
        [    0.,     0., -4000., 17750.],
        ...,
        [    0.,     0., -4000., 17750.],
        [    0.,     0., -4000., 17750.],
        [    0.,     0., -4000., 17750.]]),
 'continuation_values': array([[28915.67125505, 28842.92118423, 29783.14135469, 28524.45462067],
        [32999.89824636, 33153.27006676, 34143.3583467 , 32584.09861607],
        [31544.09926129, 31612.3687966 , 32584.09861607, 31132.49391295],
        .
