This notebook presents the various elements of the DTW component of the predictive system. The actual module is stored in the `libdtw.py` file.

## Preliminaries: Data Loading
The `libdtw.py` module contains two functions:
- `load_data(n_to_keep=50, data_path = "data/ope3_26.pickle")`
- `assign_ref(data)`

and one class:
- `Dtw(json_obj = False)`

We first illustrate the usage of the two functions.

`load_data(n_to_keep=50, data_path = "data/ope3_26.pickle")` loads the pickle file located at `data_path`. Its content has to be a dictionary where the keys are the batch IDs and the values are lists of dictionaries representing the PVs (with keys `name`, `start`, `end`, `values`).
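
As an illustration of the expected input format, the sketch below builds a tiny data set and saves it as a pickle file that could be passed as `data_path`. The batch IDs (`101`, `102`), the PV names (`temperature`, `pressure`) and all values are made up; `start` and `end` hold placeholder integers, since their exact type is not specified here.

```python
import os
import pickle
import tempfile

# Hypothetical toy data set: batch ID -> list of PV dictionaries
toy_data = {
    101: [
        {"name": "temperature", "start": 0, "end": 4, "values": [20.0, 21.5, 23.0, 24.0, 24.5]},
        {"name": "pressure", "start": 0, "end": 4, "values": [1.0, 1.1, 1.1, 1.2, 1.2]},
    ],
    102: [
        {"name": "temperature", "start": 0, "end": 2, "values": [19.5, 22.0, 24.0]},
        {"name": "pressure", "start": 0, "end": 2, "values": [1.0, 1.1, 1.2]},
    ],
}

# Write the dictionary to a pickle file (placeholder location in a temp directory)
data_path = os.path.join(tempfile.mkdtemp(), "toy.pickle")
with open(data_path, "wb") as outfile:
    pickle.dump(toy_data, outfile)

# Reading it back yields the same structure
with open(data_path, "rb") as infile:
    loaded = pickle.load(infile)
print(sorted(loaded.keys()))               # [101, 102]
print([pv["name"] for pv in loaded[101]])  # ['temperature', 'pressure']
```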

The function first computes the median batch duration, where the duration of a batch is the number of samples of its first PV. It then keeps the `n_to_keep` batches whose duration is closest to the median, discards the others, and uses the closest one as reference: its ID is stored under the `reference` key of the output dictionary.
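
The selection rule can be checked in isolation on a few durations. The sketch below uses a hypothetical helper, `select_batches`, working on made-up durations rather than on the PV data; as in `load_data()`, ties in the distance from the median are broken by comparing the batch IDs.

```python
import numpy as np

# Hypothetical durations (number of samples of the first PV) per batch ID
durations = {1: 100, 2: 120, 3: 95, 4: 300, 5: 110, 6: 40}


def select_batches(durations, n_to_keep):
    """Sketch of the selection rule of load_data(): keep the n_to_keep batches
    closest to the median duration and use the closest one as reference."""
    median_len = np.median(list(durations.values()))
    # (distance from the median, batch ID); ties are broken by the batch ID
    centered = sorted((abs(l - median_len), _id) for _id, l in durations.items())
    kept = [_id for _, _id in centered[:n_to_keep]]
    return kept, kept[0]


kept, ref = select_batches(durations, n_to_keep=3)
print(kept, ref)  # [1, 5, 3] 1
```

With six batches the median (105) is not the duration of any batch, so batches 1 and 5 are equally distant from it and the tie is resolved by the smaller ID.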

In [None]:
import pickle

import numpy as np


def load_data(n_to_keep=50, data_path="data/ope3_26.pickle"):
    """
    Load the data of operation 3.26, keeping only the n_to_keep batches whose
    duration is closest to the median one. The ID of the batch closest to the
    median is stored under the 'reference' key.
    """
    with open(data_path, "rb") as infile:
        data = pickle.load(infile)

    # Duration of a batch = number of samples of its first PV
    operation_length = [(len(pvs[0]['values']), _id) for _id, pvs in data.items()]
    median_len = np.median([l for l, _id in operation_length])

    # Rank the batches by distance from the median duration and keep the closest ones
    centered = [(abs(l - median_len), _id) for l, _id in operation_length]
    selected = sorted(centered)[:n_to_keep]
    selected_ids = {_id for _, _id in selected}

    # The batch closest to the median is the reference
    med_id = selected[0][1]

    # Drop the batches that were not selected
    for _id in list(data.keys()):
        if _id not in selected_ids:
            data.pop(_id)

    data['reference'] = med_id

    return data

`assign_ref(data)` finds the batch whose duration is closest to the median one and sets it as the `reference` of the data set, returning a shallow copy of `data`. It is useful when the data loaded with `load_data()` is modified (for instance, if batches are removed) before its actual use.
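
The reason for recomputing the reference can be seen on a toy example. The sketch below uses a hypothetical helper, `closest_to_median_id`, that mirrors the logic of `assign_ref()`; the batch IDs and lengths are made up. Note that the `reference` entry has to be skipped, since its value is a batch ID and not a list of PVs.

```python
import numpy as np


def closest_to_median_id(data):
    """Hypothetical helper mirroring assign_ref(): return the ID of the batch
    whose duration (length of its first PV) is closest to the median."""
    lengths = {_id: len(pvs[0]["values"]) for _id, pvs in data.items()
               if _id != "reference"}  # skip the reference-ID entry
    median_len = np.median(list(lengths.values()))
    return min((abs(l - median_len), _id) for _id, l in lengths.items())[1]


def toy_batch(n):
    # Single hypothetical PV with n samples
    return [{"name": "temperature", "start": 0, "end": n - 1, "values": [0.0] * n}]


data = {_id: toy_batch(n) for _id, n in [("a", 50), ("b", 60), ("c", 70), ("d", 200), ("e", 210)]}
data["reference"] = closest_to_median_id(data)
print(data["reference"])  # c

# Removing the short batches moves the median, so the reference changes
for _id in ("a", "b"):
    data.pop(_id)
data["reference"] = closest_to_median_id(data)
print(data["reference"])  # d
```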

In [None]:
from copy import copy

import numpy as np


def assign_ref(data):
    """
    Set as reference the batch whose duration is closest to the median one.
    Returns a shallow copy of data with the 'reference' key (re)assigned.
    """
    data = copy(data)
    # Drop any previous reference entry: its value is a batch ID, not a list of PVs
    data.pop('reference', None)

    # Duration of a batch = number of samples of its first PV
    operation_length = [(len(pvs[0]['values']), _id) for _id, pvs in data.items()]
    median_len = np.median([l for l, _id in operation_length])

    # The batch closest to the median duration becomes the reference
    centered = [(abs(l - median_len), _id) for l, _id in operation_length]
    med_id = min(centered)[1]

    data['reference'] = med_id

    return data

## Dtw class
The `Dtw(json_obj)` class contains all the methods used to set up the DTW algorithm, optimize the variable weights, and compute the alignment. Its methods are listed below, grouped by topic for clarity:

##### data handling
- `__init__()`
- `convert_data_from_json()`
- `add_query`
- `get_scaling_parameters`
- `remove_const_feats`
- `scale_pv`
- `convert_to_mvts`

##### dtw implementation
- `comp_dist_matrix`
- `comp_acc_dist_matrix`
- `comp_acc_element`
- `get_warping_path`
- `call_dtw`
- `dtw`
- `get_ref_prefix_length`
- `itakura`
- `extreme_itakura`
- `check_open_ended`

##### step pattern selection utilities
- `time_distortion`
- `avg_time_distortion`
- `avg_distance`
- `get_p_max`
- `get_global_p_max`

##### variables weights optimization
- `reset_weights`
- `compute_mld`
- `extract_single_feat`
- `weight_optimization_single_batch`
- `weight_optimization_step`
- `optimize_weights`
- `get_weight_variables`

##### visualization
- `distance_cost_plot`
- `plot_weights`
- `plot_by_name`
- `do_warp`
- `plot_warped_curves`

##### misc
- `online_scale`
- `online_query`
- `generate_train_set`

We now examine each method.



### Data handling
`__init__(json_obj=False, random_weights = True, scaling='group')` initializes the `Dtw` class.

- `json_obj` is the dictionary containing the data in the output format of `load_data()`. 
- `random_weights`, if True, initializes the variable weights to randomly chosen values in the [0.1, 1] interval.
- `scaling` is the scaling strategy for the PVs: `group` scales the PVs according to their values in the reference batch, while `single` scales each PV as an individual entity.
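
The two options can be sketched as follows. This is an interpretation, not the code of `scale_pv` or `reset_weights`: it assumes min-max scaling (consistent with the `(min, max)` pairs that `convert_data_from_json` stores for each PV of the reference batch), that `single` uses the range of the series itself, and that random weights are drawn uniformly. The helpers `minmax`, `scale_pv_sketch` and `init_weights_sketch`, as well as the fallback weight of 1.0, are hypothetical.

```python
import numpy as np


def minmax(values, v_min, v_max):
    """Min-max scaling; if v_max == v_min the series is mapped to zeros."""
    values = np.asarray(values, dtype=float)
    if v_max == v_min:
        return np.zeros_like(values)
    return (values - v_min) / (v_max - v_min)


def scale_pv_sketch(values, ref_values, scaling="group"):
    """Hypothetical helper: 'group' reuses the range of the PV in the reference
    batch, 'single' uses the range of the series itself."""
    if scaling == "group":
        return minmax(values, min(ref_values), max(ref_values))
    if scaling == "single":
        return minmax(values, min(values), max(values))
    raise ValueError(f"unknown scaling strategy: {scaling}")


def init_weights_sketch(pv_names, random=True, seed=None):
    """Hypothetical helper: one weight per PV, drawn uniformly from [0.1, 1)
    if random is True, otherwise all set to 1.0 (assumed default)."""
    if not random:
        return {name: 1.0 for name in pv_names}
    rng = np.random.default_rng(seed)
    return dict(zip(pv_names, rng.uniform(0.1, 1.0, size=len(pv_names))))


ref = [10.0, 20.0, 30.0]    # PV values in the reference batch
query = [15.0, 25.0, 35.0]  # same PV in a query batch
print(scale_pv_sketch(query, ref, "group").tolist())   # [0.25, 0.75, 1.25]
print(scale_pv_sketch(query, ref, "single").tolist())  # [0.0, 0.5, 1.0]
weights = init_weights_sketch(["temperature", "pressure"], seed=0)
```

Under this reading, `group` scaling can map query values outside [0, 1] (as in the last query sample), but it keeps a query and the reference on a common scale, whereas `single` scaling hides level differences between batches.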

The initialization structures the data inside the `Dtw` object, removes the features that are constant in the reference batch, filters out the batches that do not contain all the PVs of the reference batch, and finally sets the variable weights.
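
The feature-removal and filtering steps can be sketched on toy batches. The helper `remove_const_feats_sketch` is hypothetical and only follows the description above; in particular, restricting each kept query batch to the PVs shared with the reference is an assumption about what the actual method does.

```python
def remove_const_feats_sketch(reference, queries):
    """Hypothetical helper: drop the PVs that are constant in the reference
    batch, then keep only the query batches containing every remaining
    reference PV (restricted to those PVs)."""
    # Names of the PVs whose values never change in the reference batch
    const = {pv["name"] for pv in reference
             if min(pv["values"]) == max(pv["values"])}
    reference = [pv for pv in reference if pv["name"] not in const]
    ref_names = {pv["name"] for pv in reference}

    filtered = {}
    for _id, batch in queries.items():
        names = {pv["name"] for pv in batch}
        if ref_names <= names:  # the batch has all the reference PVs
            # Keep only the PVs shared with the reference
            filtered[_id] = [pv for pv in batch if pv["name"] in ref_names]
    return reference, filtered


# Toy batches (only the 'name' and 'values' keys are needed here)
reference = [{"name": "temperature", "values": [20.0, 22.0, 25.0]},
             {"name": "valve", "values": [1, 1, 1]},
             {"name": "pressure", "values": [1.0, 1.2, 1.1]}]
queries = {
    "q1": [{"name": "temperature", "values": [19.0, 23.0]},
           {"name": "pressure", "values": [1.1, 1.3]},
           {"name": "valve", "values": [1, 0]}],
    "q2": [{"name": "temperature", "values": [21.0, 24.0]}],  # no pressure PV
}
reference, queries = remove_const_feats_sketch(reference, queries)
print([pv["name"] for pv in reference])  # ['temperature', 'pressure']
print(list(queries))                     # ['q1']
```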

In [None]:
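from copy import deepcopy


def __init__(self, json_obj=False, random_weights=True, scaling='group'):
    """
    Initialization of the class.
    json_obj: data in the output format of load_data() (batch IDs mapped to
              lists of PV dictionaries, plus the 'reference' key)
    random_weights: if True, initialize the variable weights at random
    scaling: PV scaling strategy, 'group' or 'single'
    """
    # Set the scaling strategy first, so that it is defined even without data
    self.scaling = scaling
    if json_obj:
        # Work on a copy so that the caller's data is left untouched
        self.convert_data_from_json(deepcopy(json_obj))
        self.remove_const_feats()
        self.reset_weights(random=random_weights)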

In [None]:
from collections import defaultdict


def convert_data_from_json(self, json_obj):
    """
    Organize the data into self.data, a dictionary containing (among others):
    ref_id: the ID of the reference batch
    reference: reference batch in the usual format (list of dictionaries)
    queries: dictionary in which the keys are the query batches' IDs and the values are
             the actual batches (lists of dictionaries)
    num_queries: number of query batches in the data set
    It also initializes self.data_open_ended and stores in self.scale_params
    the (min, max) of each PV in the reference batch.
    """
    ref_id = json_obj["reference"]
    reference = json_obj[ref_id]
    queries = {key: batch for key, batch in json_obj.items()
               if key != "reference" and key != ref_id}

    self.data = {"ref_id": ref_id,
                 "reference": reference,
                 "queries": queries,
                 "num_queries": len(queries),
                 "warpings": dict(),
                 "distances": dict(),
                 'warp_dist': dict(),
                 "queriesID": list(queries.keys()),
                 "time_distortion": defaultdict(dict),
                 "distance_distortion": defaultdict(dict),
                 'warpings_per_step_pattern': defaultdict(dict),
                 'feat_weights': 1.0}

    self.data_open_ended = {"ref_id": ref_id,
                            "reference": reference,
                            "queries": defaultdict(list),
                            'warp_dist': dict()}

    # Scaling parameters: (min, max) of each PV over the reference batch
    scale_params = dict()
    for pv_dict in self.data['reference']:
        pv_name = pv_dict['name']
        pv_min = min(pv_dict['values'])
        pv_max = max(pv_dict['values'])
        scale_params[pv_name] = (pv_min, pv_max)

    self.scale_params = scale_params