# Speed

## Background



In the previous project, we implemented a function called `square_root_linear_update` that updates a previous state estimate with one new measurement. 

When Kalman filters are used to estimate the parameters of the technology of skill formation, the update step has to be carried out for every observation in a dataset. This can be achieved by calling the above function in a loop. `code/update.py` contains a function called `pandas_batch_update` that does this.

In `pandas_batch_update`, most inputs have one more dimension than before. This was a problem for `root_cov`, which was already a two-dimensional DataFrame and cannot gain a third dimension. Therefore, we used a list of DataFrames.
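
The step from a list of DataFrames to a single array can be illustrated with a minimal sketch. The factor names and the toy values below are made up for illustration; only the resulting shape `(nobs, nstates, nstates)` matters.

```python
import numpy as np
import pandas as pd

factors = ["cognitive", "noncognitive"]  # hypothetical state names

# One (nstates, nstates) DataFrame per observation, as in the pandas version.
root_cov_list = [
    pd.DataFrame(np.eye(2) * (i + 1), index=factors, columns=factors)
    for i in range(3)
]

# Stack the underlying arrays into one 3d array of shape (nobs, nstates, nstates).
root_covs = np.stack([df.to_numpy() for df in root_cov_list])
print(root_covs.shape)  # (3, 2, 2)
```

The labels of the DataFrames are lost in the conversion, so the order of the states has to be kept track of separately.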

Unfortunately, `pandas_batch_update` is extremely slow. Our main task during this project is to make it fast.

To measure the speed of the pandas function and our function, we use the data from Cunha, Heckman and Schennach (2010). The initial state estimate is zero for every individual. The initial covariance is also identical for all individuals and can be constructed from estimated parameters.

For the benchmark we are going to update the initial state estimate with the first measurement of cognitive skills (birthweight). The factor loading for this measurement was normalized to 1. Unfortunately, the parameters of the measurement variance are not reported, so we just fix a value.

For simplicity, we filled all missing observations with the average birthweight. Estimation does not require this, but writing a Kalman update that is both fast and able to handle missing data is too difficult for this assignment.

The speed of our function will be measured on the same data and in the same way as the speed of `pandas_batch_update` is measured in `timing.py`.
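
How `timing.py` measures the runtime is not shown here. One common approach, given only as an assumption, is to call the function several times and report the shortest wall-clock time. The helper `best_runtime` and the stand-in workload are hypothetical.

```python
from time import perf_counter

import numpy as np


def best_runtime(func, *args, repeats=5):
    """Return the shortest wall-clock runtime of func(*args) in seconds."""
    runtimes = []
    for _ in range(repeats):
        start = perf_counter()
        func(*args)
        runtimes.append(perf_counter() - start)
    return min(runtimes)


# Stand-in workload; in timing.py this would be the update function.
data = np.random.default_rng(0).normal(size=(1000, 3))
print(f"{best_runtime(np.cumsum, data):.6f} seconds")
```

Taking the minimum over several runs also keeps one-off costs, such as Numba's compilation on the first call, out of the reported number.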

## Steps


1. Run `timing.py`. It prints the runtime of `pandas_batch_update`, which originally took about 5 seconds on our laptop.

2. Adjust our tests from last time for a function called `fast_batch_update` with the following interface:

    ```python
    def fast_batch_update(states, root_covs, measurements, loadings, meas_var):
        """Update state estimates for a whole dataset.
        
        Let nstates be the number of states and nobs the number of observations.
        
        Args:
            states (np.ndarray): 2d array of size (nobs, nstates)
            root_covs (np.ndarray): 3d array of size (nobs, nstates, nstates)
            measurements (np.ndarray): 1d array of size (nobs)
            loadings (np.ndarray): 1d array of size (nstates)
            meas_var (float): measurement variance (a scalar shared by all observations)
        
        Returns:
            updated_states (np.ndarray): 2d array of size (nobs, nstates)
            updated_root_covs (np.ndarray): 3d array of size (nobs, nstates, nstates)
        
        """
    ```
    

3. Implement the function `fast_batch_update` in the module `update.py` and make it as fast as possible. Extend `timing.py` so that it also measures and prints the runtime of `fast_batch_update`. In some cases, Numba can be used in place of NumPy.
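
As a starting point for steps 2 and 3, the following is a minimal NumPy-only sketch of a vectorized square-root update with the interface from step 2. It is one possible approach, not the reference solution, and it rests on two assumptions: each covariance is factored as `root_cov @ root_cov.T`, and the installed NumPy supports QR decompositions of stacked matrices (version 1.22 or newer). The name `batch_update_sketch` is hypothetical.

```python
import numpy as np


def batch_update_sketch(states, root_covs, measurements, loadings, meas_var):
    """Vectorized square-root Kalman update for one scalar measurement.

    Assumption: cov = root_cov @ root_cov.T for every observation.
    """
    nobs, nstates = states.shape

    # Pre-array per observation, shape (nobs, nstates + 1, nstates + 1):
    # [[sqrt(meas_var),         0          ],
    #  [root_cov.T @ loadings,  root_cov.T ]]
    pre = np.zeros((nobs, nstates + 1, nstates + 1))
    pre[:, 0, 0] = np.sqrt(meas_var)
    root_covs_t = root_covs.transpose(0, 2, 1)
    pre[:, 1:, 0] = root_covs_t @ loadings
    pre[:, 1:, 1:] = root_covs_t

    # Batched QR decomposition; only the triangular factor is needed.
    upper = np.linalg.qr(pre, mode="r")

    # Kalman gain; the sign ambiguity of the QR factor cancels in this ratio.
    gains = upper[:, 0, 1:] / upper[:, [0], 0]
    residuals = measurements - states @ loadings
    updated_states = states + gains * residuals[:, None]

    # The transposed lower-right block is a root of the updated covariance.
    updated_root_covs = upper[:, 1:, 1:].transpose(0, 2, 1)
    return updated_states, updated_root_covs
```

A test for such a function can compare `updated_root_covs[i] @ updated_root_covs[i].T` and `updated_states[i]` with the textbook (non-square-root) Kalman update for each observation. The roots themselves are only unique up to the signs of their columns, so comparing the implied covariances is more robust than comparing the roots directly.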