# Welcome to the Parameter Estimation Feature Example

The goal of this notebook is to show ProgPy users how to use the `estimate_params` feature of a `PrognosticsModel`.

First, some background. Parameter estimation is used to tune the parameters of a general model so that its behavior matches the behavior of a specific system. For example, the parameters of the battery model can be tuned so that the model describes the behavior of a specific battery.

Generally, parameter estimation is done by tuning the parameters of the model so that simulation best matches the behavior observed in some available data. In ProgPy, this is done using the `prog_models.PrognosticsModel.estimate_params()` method. This method takes input and output data from one or more runs and uses the `scipy.optimize.minimize` function to estimate the parameters of the model. For more information, refer to our documentation [here](https://nasa.github.io/progpy/prog_models_guide.html#parameter-estimation).
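To make the idea concrete, here is a minimal sketch of the same "simulate, compare, adjust" loop, written without ProgPy so it can run on its own. It is not how `estimate_params()` is implemented: the drag-free `simulate` function, the synthetic measurements, and the brute-force grid search (standing in for `scipy.optimize.minimize`) are all assumptions made for illustration.

```python
# Illustrative only: none of these names are part of ProgPy.

def simulate(thrower_height, throwing_speed, times, g=-9.81):
    """Drag-free height of a thrown object at each time (assumed model)."""
    return [thrower_height + throwing_speed * t + 0.5 * g * t**2 for t in times]

def mean_squared_error(simulated, measured):
    """Average squared difference between simulation and data."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)

# Synthetic "measurements" generated from known parameters
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
measured = simulate(1.83, 40.0, times)

# Try every candidate pair and keep the one with the lowest error
best_params, best_error = None, float('inf')
for i in range(0, 501):        # thrower_height candidates: 0.00 to 5.00
    for j in range(0, 41):     # throwing_speed candidates: 30.0 to 50.0
        candidate = (i / 100, 30 + 0.5 * j)
        error = mean_squared_error(simulate(*candidate, times), measured)
        if error < best_error:
            best_params, best_error = candidate, error

print('Best parameters:', best_params)
print('Error:', best_error)
```

A real optimizer explores the parameter space far more efficiently than a grid, but the structure is the same: a cost function compares simulation with data, and the parameters with the lowest cost win.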

A few definitions:
* __`keys`__ `(list[str])`: Parameter keys to optimize
* __`times`__ `(list[float])`: Array of times for each run
* __`inputs`__ `(list[InputContainer])`: Array of input containers where inputs[x] corresponds to times[x]
* __`outputs`__ `(list[OutputContainer])`: Array of output containers where outputs[x] corresponds to times[x]
* __`method`__ `(str, optional)`: Optimization method. See scipy.optimize.minimize for options
* __`tol`__ `(float, optional)`: Tolerance for termination. Depending on the provided minimization method, specifying tolerance sets solver-specific options to tol
* __`error_method`__ `(str, optional)`: Method to use in calculating error. See calc_error for options
* __`bounds`__ `(tuple or dict, optional)`: Bounds for optimization in format ((lower1, upper1), (lower2, upper2), ...) or {key1: (lower1, upper1), key2: (lower2, upper2), ...}
* __`options`__ `(dict, optional)`: Options passed to optimizer. See scipy.optimize.minimize for options

#### Example 1) Simple Example

Now we will demonstrate the model parameter estimation feature. In this example, we will estimate the parameters of a model from data. In general, the data will be collected from the physical system or from a different model (model surrogacy).

First, we will import a model from the ProgPy Package. For this example we're using the simple ThrownObject model.

In [37]:
from prog_models.models import ThrownObject

Now we can build a model with a best guess for the parameters.

We will guess that our thrower is 20 meters tall. The times, inputs, and outputs collected below clearly show that this is not true, so let's see if parameter estimation can fix it!

In [38]:
m = ThrownObject(thrower_height=20)

Next, we will collect data from the system. Let's pretend we threw the ball once, and collected position measurements.

In [39]:
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
inputs = [{}]*9
outputs = [
    {'x': 1.83},
    {'x': 36.95},
    {'x': 62.36},
    {'x': 77.81},
    {'x': 83.45},
    {'x': 79.28},
    {'x': 65.3},
    {'x': 41.51},
    {'x': 7.91},
]

For this example, we will define specific parameters that we want to estimate.

We can pass the desired parameters to our __keys__ keyword argument.

In [40]:
keys = ['thrower_height', 'throwing_speed']

To really see what `estimate_params()` is doing, we will print the parameters and the error before executing the estimation.

In [41]:
# Printing state before
print('Model configuration before')
for key in keys:
    print("-", key, m.parameters[key])
print(' Error: ', m.calc_error(times, inputs, outputs, dt=1e-4))

Model configuration before
- thrower_height 20
- throwing_speed 40
 Error:  127.89964688122261


Notice that the error is quite high. This indicates that the parameters are not accurate.

Now, we will run `estimate_params()` with the data to correct these parameters.

In [42]:
m.estimate_params(times = times, inputs = inputs, outputs = outputs, keys = keys, dt=0.01)

       message: Optimization terminated successfully.
       success: True
        status: 0
           fun: 0.4911056934178954
             x: [ 7.293e-01  4.284e+01]
           nit: 46
          nfev: 91
 final_simplex: (array([[ 7.293e-01,  4.284e+01],
                       [ 7.293e-01,  4.284e+01],
                       [ 7.292e-01,  4.284e+01]]), array([ 4.911e-01,  4.911e-01,  4.911e-01]))

Now, let's see what the new parameters are after estimation.

In [44]:
print('\nOptimized configuration')
for key in keys:
    print("-", key, m.parameters[key])
print(' Error: ', m.calc_error(times, inputs, outputs, dt=1e-4))


Optimized configuration
- thrower_height 0.7292554347316501
- throwing_speed 42.84041990312872
 Error:  0.5271861742108361


Sure enough, parameter estimation determined that the thrower's height wasn't 20 m. The estimate is instead closer to 0.73 m, a value far more consistent with the data: the error dropped from roughly 127.9 to about 0.53.

#### Example 2) Using Tol

An additional feature of the `estimate_params()` function is the tolerance argument, `tol`. It sets the termination tolerance of the underlying optimizer: the smaller the tolerance, the longer the optimizer keeps refining the parameters before it stops.

In our previous example, note that our total Error was roughly 0.5272 after the `estimate_params()` call. Now, let us see what happens to the parameters when we set a low tolerance and bounds to their respective keys!

First, let us create a more complicated example! In this example, we deliberately start from incorrect values: a thrower_height of 3.1, a throwing_speed of 29, and a g of 10.

In [None]:
m = ThrownObject()
results = m.simulate_to_threshold(save_freq=0.5)
# Resetting parameters to their originally incorrectly set values.
m.parameters['thrower_height'] = 3.1
m.parameters['throwing_speed'] = 29
m.parameters['g'] = 10

# Furthermore, changing our keys values to encompass all incorrectly set parameters
keys = ['thrower_height', 'throwing_speed', 'g']

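Then, we can consider the bounds of our values. Let's say that we know our thrower_height will be some value between -15 and 15, our throwing_speed somewhere between 24 and 42, and the gravity somewhere between -20 and 10. Ranges like these can be supplied through the `bounds` keyword argument; the calls in this example run without it.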

Note that we are calling `simulate_to_threshold()` in the cell above instead of using data that is readily available. Typically, either the data has been collected by the user, or the user can use our simulation features to predict how the system would change over time! More information can be found [here](https://nasa.github.io/progpy/prog_models_guide.html#simulation)!
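The ranges mentioned above can be written in either of the two formats described for `bounds` in the definitions. The sketch below only builds the two structures and checks that the starting guesses fall inside them; it does not call ProgPy, and the variable names `bounds_dict`, `bounds_tuple`, and `initial_guess` are chosen here for illustration.

```python
keys = ['thrower_height', 'throwing_speed', 'g']

# Dictionary form: each bound is looked up by parameter name
bounds_dict = {
    'thrower_height': (-15, 15),
    'throwing_speed': (24, 42),
    'g': (-20, 10),
}

# Tuple form: one (lower, upper) pair per key, in the same order as `keys`
bounds_tuple = tuple(bounds_dict[key] for key in keys)

# The deliberately incorrect starting values used in this example
initial_guess = {'thrower_height': 3.1, 'throwing_speed': 29, 'g': 10}

# Check that every starting value lies within its bounds
for key, (lower, upper) in zip(keys, bounds_tuple):
    inside = lower <= initial_guess[key] <= upper
    print(f'{key}: [{lower}, {upper}] contains {initial_guess[key]}: {inside}')

# Either structure could then be passed along with the data, for example:
# m.estimate_params(times=..., inputs=..., outputs=..., keys=keys, bounds=bounds_dict)
```

The dictionary form is less error-prone when `keys` changes, because the pairing between a parameter and its bounds does not depend on ordering.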

Now that we have all our information, it's time to call our `estimate_params()`!

In [62]:
m.estimate_params(times = results.times, inputs = results.inputs, outputs = results.outputs, keys = keys)

       message: Optimization terminated successfully.
       success: True
        status: 0
           fun: 7.801993286885543e-10
             x: [ 1.830e+00  4.000e+01 -9.810e+00]
           nit: 106
          nfev: 196
 final_simplex: (array([[ 1.830e+00,  4.000e+01, -9.810e+00],
                       [ 1.830e+00,  4.000e+01, -9.810e+00],
                       [ 1.830e+00,  4.000e+01, -9.810e+00],
                       [ 1.830e+00,  4.000e+01, -9.810e+00]]), array([ 7.802e-10,  1.269e-09,  1.517e-09,  2.457e-09]))

In [63]:
print('\nOptimized configuration')
for key in keys:
    print("-", key, m.parameters[key])
print(' Error: ', m.calc_error(results.times, results.inputs, results.outputs))


Optimized configuration
- thrower_height 1.8299724661033296
- throwing_speed 39.999991142242195
- g -9.809984003214833
 Error:  7.801993286885543e-10


Now, let's reset our parameters to their incorrect values and call `estimate_params()` again with a low tolerance. In this case, we pass a value of __1e-9__ to `tol`.

In [64]:
m.parameters['thrower_height'] = 3.1
m.parameters['throwing_speed'] = 29
m.parameters['g'] = 10
m.estimate_params(times = results.times, inputs = results.inputs, outputs = results.outputs, keys = keys, tol=1e-9)
print('\nOptimized configuration')
for key in keys:
    print("-", key, m.parameters[key])
print(' Error: ', m.calc_error(results.times, results.inputs, results.outputs))


Optimized configuration
- thrower_height 1.830000000159591
- throwing_speed 39.99999999968104
- g -9.809999999841754
 Error:  1.8703500551846052e-20


Note, if we were to set a high tolerance, such as 10, our error would consequently be very high!

For more information on how the `tol` feature works, see SciPy's `minimize()` documentation [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html).
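The effect of a termination tolerance can be illustrated with a toy optimizer. The sketch below is an assumption-laden stand-in, not SciPy's behavior: it runs plain gradient descent on a one-parameter squared error and stops as soon as the error improves by less than `tol` between iterations. Real solvers map `tol` onto their own solver-specific options.

```python
def minimize_with_tol(start, tol, step=0.1, max_iter=10_000):
    """Gradient descent on (x - 1.83)**2 that stops once the error
    improves by less than `tol` between iterations (assumed stopping rule)."""
    target = 1.83
    x = start
    error = (x - target) ** 2
    for _ in range(max_iter):
        x_new = x - step * 2 * (x - target)   # step along the negative gradient
        error_new = (x_new - target) ** 2
        improvement = error - error_new
        x, error = x_new, error_new
        if improvement < tol:
            break
    return x, error

# Same starting guess, two very different tolerances
loose_x, loose_error = minimize_with_tol(start=3.1, tol=10)
tight_x, tight_error = minimize_with_tol(start=3.1, tol=1e-9)

print('tol=10   ->', loose_x, loose_error)
print('tol=1e-9 ->', tight_x, tight_error)
```

With the loose tolerance the loop gives up after a single step, leaving a large error; with the tight tolerance it keeps iterating until the estimate is very close to 1.83.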

#### Example 3) Handling Noise with Multiple Runs

In the previous two examples, we demonstrated how to use `estimate_params()` with a clearly defined ThrownObject model. However, we assumed that our system had no noise, which is rarely true of real systems!

In this example, we'll show how `estimate_params()` may not necessarily produce optimal results when handling a system with noise.

In [158]:
m = ThrownObject(process_noise = 5)
results = m.simulate_to_threshold(save_freq=0.5)
# Resetting parameters to their originally incorrectly set values.
m.parameters['thrower_height'] = 3.1
m.parameters['throwing_speed'] = 29
m.parameters['g'] = 10

# Furthermore, changing our keys values to encompass all incorrectly set parameters
keys = ['thrower_height', 'throwing_speed', 'g']

In [159]:
m.estimate_params(times = results.times, inputs = results.inputs, outputs = results.outputs, keys = keys, tol=1e-9)
print('\nOptimized configuration')
for key in keys:
    print("-", key, m.parameters[key])
print(' Error: ', m.calc_error(results.times, results.inputs, results.outputs))


Optimized configuration
- thrower_height -0.22878604854419288
- throwing_speed 37.456713399968095
- g -6.849388595539937
 Error:  4.478373002713862


In [160]:
count = 1
while count <= 10:
    m = ThrownObject(process_noise = 5)
    results = m.simulate_to_threshold(save_freq=0.5)
    # Resetting parameters to their originally incorrectly set values.
    m.parameters['thrower_height'] = 3.1
    m.parameters['throwing_speed'] = 29
    m.parameters['g'] = 10

    m.estimate_params(times = results.times, inputs = results.inputs, outputs = results.outputs, keys = keys, tol=1e-9)
    print(f'Estimate Call Number {count} - Error {m.calc_error(results.times, results.inputs, results.outputs)}')
    count += 1

Estimate Call Number 1 - Error 1.3634043303765429
Estimate Call Number 2 - Error 1.8628084188422855
Estimate Call Number 3 - Error 3.7516413511693907
Estimate Call Number 4 - Error 1.2408446971449003
Estimate Call Number 5 - Error 0.6420996052854577
Estimate Call Number 6 - Error 1.697869897826421
Estimate Call Number 7 - Error 0.9685780910889981
Estimate Call Number 8 - Error 13.121310439877108
Estimate Call Number 9 - Error 1.037720790595792
Estimate Call Number 10 - Error 2.3734203467645654


Behind the scenes, `estimate_params()` applies the `calc_error()` method to each run independently (e.g., Run 0 = (times[0], inputs[0], outputs[0])).

`estimate_params()` creates a structure of 'runs' by taking each index of times, inputs, and outputs and placing them into a tuple.

`runs = [(t, u, z) for t, u, z in zip(times, inputs, outputs)]`
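The grouping can be tried on its own with a small, made-up data set. The numbers below are hypothetical and much shorter than real simulation results; the point is only to show how two runs of different lengths end up as two `(times, inputs, outputs)` tuples.

```python
# Two hypothetical runs of different lengths (values are made up)
times = [[0.0, 0.5, 1.0], [0.0, 0.5]]
inputs = [[{}, {}, {}], [{}, {}]]
outputs = [
    [{'x': 1.83}, {'x': 21.6}, {'x': 38.9}],
    [{'x': 1.83}, {'x': 20.9}],
]

# Each run becomes one (times, inputs, outputs) tuple
runs = [(t, u, z) for t, u, z in zip(times, inputs, outputs)]

for index, (t, u, z) in enumerate(runs):
    # Within a run, every time needs a matching input and output
    assert len(t) == len(u) == len(z)
    print(f'Run {index}: {len(t)} samples, final output {z[-1]}')
```

Note that the outer lists are indexed by run, while the inner lists are indexed by sample within that run.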

In [161]:
runs = [[], [], []]
count = 1
while count <= 2:
    m = ThrownObject(process_noise = 5)
    results = m.simulate_to_threshold(save_freq=0.5)
    # Resetting parameters to their originally incorrectly set values.
    m.parameters['thrower_height'] = 3.1
    m.parameters['throwing_speed'] = 29
    m.parameters['g'] = 10
    
    runs[0].append(results.times)
    runs[1].append(results.inputs)
    runs[2].append(results.outputs)
    count+=1

print(runs)

[[[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0], [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]], [[{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}], [{}, {}, {}, {}, {}, {}, {}, {}]], [[{'x': 1.83}, {'x': 22.654459486480604}, {'x': 37.70311353702431}, {'x': 50.89405789118343}, {'x': 65.482622411559}, {'x': 79.8989962785338}, {'x': 87.75708133725598}, {'x': 93.25853719393618}, {'x': 97.45138519354721}, {'x': 103.97048677716953}, {'x': 104.96897057179955}, {'x': 104.71395470333907}, {'x': 104.0551771159573}], [{'x': 1.83}, {'x': 23.043648825583684}, {'x': 41.49166447140896}, {'x': 57.26935513400038}, {'x': 68.92804661696344}, {'x': 74.71602765251818}, {'x': 78.04692888305398}, {'x': 83.22977152288806}]]]


In [162]:
m.estimate_params(times = runs[0], inputs = runs[1], outputs = runs[2], keys = keys)
print('\nOptimized configuration')
for key in keys:
    print("-", key, m.parameters[key])
print(' Error: ', m.calc_error(runs[0], runs[1], runs[2], dt=1e-4))


Optimized configuration
- thrower_height 4.299458871395924
- throwing_speed 36.37013527024516
- g -6.247809502308351
 Error:  32.12391529391421


The optimization function runs `calc_error()` as a subroutine on each of the runs (more information about `calc_error()` can be found in our Calculating Error Example).

You can also change the metric used to estimate parameters by setting `error_method` to a different `calc_error()` method, e.g., `m.estimate_params(times=times, inputs=inputs, outputs=outputs, keys=keys, dt=0.01, error_method='MAX_E')`.
The default is Mean Squared Error (MSE).
See the `calc_error` method for the list of options.
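To show how the choice of metric changes what "error" means, the sketch below computes two metrics by hand on a few made-up values. These functions are illustrative and are not ProgPy's implementations; `max_error` is assumed here to mean the largest absolute difference.

```python
def mean_squared_error(simulated, measured):
    """Average of the squared differences."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)

def max_error(simulated, measured):
    """Largest absolute difference (assumed meaning of a max-error metric)."""
    return max(abs(s - m) for s, m in zip(simulated, measured))

# Hypothetical simulated and measured positions
simulated = [2.0, 36.0, 63.0]
measured = [1.83, 36.95, 62.36]

mse_value = mean_squared_error(simulated, measured)
max_e_value = max_error(simulated, measured)

print('MSE:  ', mse_value)
print('MAX_E:', max_e_value)
```

A mean-based metric rewards a good fit on average, while a max-based metric penalizes the single worst point, so the two can favor different parameter values.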

Multiple runs are useful in two situations:

* To cover multiple inputs, or a range of inputs at different levels, so that the estimated model works for all runs.
* To handle noise, where a single run is not enough.
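The reason extra runs help with noise can be seen in an idealized sketch. It makes several assumptions that do not hold for the notebook's model: the trajectory is drag-free, the throwing speed and gravity are known, the perturbations are hand-picked rather than random, and the height is estimated in closed form (the mean residual, which minimizes squared error for a constant offset) instead of with an optimizer.

```python
def estimate_height(runs, throwing_speed=40.0, g=-9.81):
    """Least-squares estimate of thrower_height over all runs, assuming a
    drag-free trajectory with known speed and gravity."""
    residuals = [
        x - (throwing_speed * t + 0.5 * g * t**2)
        for times, positions in runs
        for t, x in zip(times, positions)
    ]
    return sum(residuals) / len(residuals)

times = [0.0, 1.0, 2.0, 3.0]
clean = [1.83 + 40.0 * t + 0.5 * -9.81 * t**2 for t in times]
noise = [0.5, -0.2, 0.9, 0.4]   # hand-picked perturbations with mean 0.4

# Two runs whose perturbations happen to be mirror images of each other
run_a = (times, [x + e for x, e in zip(clean, noise)])
run_b = (times, [x - e for x, e in zip(clean, noise)])

height_a = estimate_height([run_a])           # biased high by the noise
height_b = estimate_height([run_b])           # biased low by the noise
height_ab = estimate_height([run_a, run_b])   # perturbations cancel

print(height_a, height_b, height_ab)
```

Real noise never cancels this neatly, but the tendency is the same: each added run dilutes the influence of any single run's perturbations on the estimate.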
