# Example Capacity Test using Captest

The `capdata` module of the Captest package contains the `CapData` class and a few top-level functions. `CapData` objects hold simulated data from PVsyst (or another simulation tool) or measured data from a DAS or SCADA system, and provide methods for loading, filtering, visualizing, and regressing that data.

This example goes through typical steps of performing a capacity test following the ASTM E2848 standard using the Captest package.

## Imports

In [None]:
%matplotlib inline

import pandas as pd

# import captest as pvc
from captest import capdata as pvc
from bokeh.io import output_notebook, show

# the next two lines are needed to use CapData.scatter_hv in the notebook
import holoviews as hv
hv.extension('bokeh')

# If working offline, the CapData.plot() method may fail.
# Run 'export BOKEH_RESOURCES=inline' at the command line before
# starting the Jupyter notebook.

output_notebook()

## Load and Plot Measured Data

We begin by instantiating a `CapData` object, which we will use to load and store the measured data.  In this example we will calculate reporting conditions from the measured data, so we load and filter the measured data first.

In [None]:
das = pvc.CapData('das')

The `load_data` method by default will look for and attempt to load all files ending with '.csv' in a 'data' folder.  In this case we have a single file and provide the filename, so only the file specified is loaded.

In [None]:
das.load_data(fname='example_meas_data.csv', source='AlsoEnergy')

The `load_data` method loads the data into a pandas DataFrame, which it assigns to the `data` attribute of the `CapData` object.  Here we use the pandas DataFrame `head` method to return the first three rows.

In [None]:
das.data.head(3)

In addition to loading data, by default the `load_data` method calls the `group_columns` method, which attempts to infer the type of measurement recorded in each column of the data.  For each inferred measurement type, `group_columns` creates an abbreviated name and a list of columns that contain measurements of that type. This information is stored in a Python [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) where the abbreviated names are the keys and the corresponding values are the lists of columns.  The dictionary created by `group_columns` is stored in the `column_groups` attribute.

The `review_column_groups` method prints the `column_groups` dictionary in an easy-to-read format, which makes it simpler to check the grouping and to identify which key is linked to which group.

In [None]:
das.review_column_groups()
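
To make the shape of this dictionary concrete, the sketch below builds one by hand. The column names are hypothetical and are not taken from the example data file; only the key style (`'irr-poa-'`, `'temp-amb-'`) follows the keys used in this notebook.

```python
# Hand-built stand-in for a column_groups dictionary.
# The column names below are hypothetical, for illustration only.
column_groups = {
    "irr-poa-": ["Weather Station 1 POA", "Weather Station 2 POA"],
    "temp-amb-": ["Weather Station 1 Temp"],
}

# Each key is an abbreviated measurement type; each value lists the columns of that type.
poa_columns = column_groups["irr-poa-"]

for key, columns in column_groups.items():
    print(key, "->", columns)
```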

The `view` method uses the dictionary stored in the `column_groups` attribute to allow easy access to columns of data of a certain type without renaming columns or typing long column names.  The `column_groups` dictionary also enables much of the functionality of `CapData` methods to perform common capacity testing tasks, like generating scatter plots, filtering data, and performing regressions, with minimal user input.

In [None]:
das.view('irr-poa-').iloc[100:103, :]

pvcaptest does not attempt to determine which columns or groups of columns should be used in the regressions. The link between regression variables and the imported data is made by a dictionary stored in the `regression_cols` attribute.  pvcaptest provides the convenience method `set_regression_cols` for this purpose. `regression_cols` should be set immediately after loading data, because many other `CapData` methods rely on this attribute.

In [None]:
das.set_regression_cols(power='-mtr-', poa='irr-poa-', t_amb='temp-amb-', w_vel='wind--')

Once the regression columns are set, the `rview` method, similar to the `view` method, will return the data for each type of sensor identified in the `column_groups` attribute.  The difference is that you pass `rview` one of the following:
- any of 'power', 'poa', 't_amb', 'w_vel'
- a list of some subset of any of the previous four strings
- 'all' to return data for all four

Here we are again accessing the same POA irradiance data as above with `view`.

In [None]:
das.rview('poa').iloc[100:103, :]
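
The two-step lookup behind this can be sketched with plain dictionaries: a regression variable maps to a group key, and the group key maps to columns. This is only an illustration of the idea, not pvcaptest's implementation; the helper `columns_for` and the column names are hypothetical.

```python
# Regression variable -> group key, using the same keys passed to set_regression_cols above.
regression_cols = {
    "power": "-mtr-",
    "poa": "irr-poa-",
    "t_amb": "temp-amb-",
    "w_vel": "wind--",
}

# Group key -> columns. The column names are hypothetical.
column_groups = {
    "-mtr-": ["Meter kW"],
    "irr-poa-": ["WS1 POA", "WS2 POA"],
    "temp-amb-": ["WS1 Temp"],
    "wind--": ["WS1 Wind"],
}


def columns_for(reg_var):
    """Hypothetical helper: resolve a regression variable to its data columns."""
    return column_groups[regression_cols[reg_var]]


print(columns_for("poa"))
```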

For datasets that have multiple measurements of the same value, like the two POA irradiance measurements in this sample data, these values must be aggregated prior to filtering or regressing the data.  The `agg_sensors` method provides a convenient way to do this for all the groups of measurements in `column_groups` in one step.

The desired aggregations are specified by passing a dictionary to the `agg_map` argument where the keys are groups from `column_groups` and the values are aggregation functions.  Here we are using string functions that are recognized by pandas.  Most of the common aggregation functions (mean, median, max, sum, min, etc.) are available as string functions.  If you would like to apply a different aggregation function, please refer to the pandas documentation for `DataFrame.agg`. By default, the `agg_sensors` method adds a new column to the dataframe in the `data` attribute for the results of each aggregation and overwrites the `data_filtered` attribute with a copy of the new dataframe.

There is also a method, `filter_sensors`, for filtering data based on comparisons between measurements of the same value; it is described below.

In [None]:
das.agg_sensors(agg_map={'-inv-':'sum', 'irr-poa-':'mean', 'temp-amb-':'mean', 'wind--':'mean'},
                inv_sum_vs_power=False)
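
As a rough sketch of what a row-wise aggregation driven by an `agg_map`-style dictionary looks like in plain pandas, consider the example below. The data, the column names, and the name given to the new column are all assumptions made for illustration, and the loop is not pvcaptest's own code.

```python
import pandas as pd

# Hypothetical readings from two POA sensors at three timestamps.
data = pd.DataFrame(
    {"poa_1": [800.0, 810.0, 400.0], "poa_2": [820.0, 790.0, 420.0]}
)
column_groups = {"irr-poa-": ["poa_1", "poa_2"]}
agg_map = {"irr-poa-": "mean"}

for group, func in agg_map.items():
    cols = column_groups[group]
    # axis=1 aggregates across the sensor columns, giving one value per row.
    data[group + func + "-agg"] = data[cols].agg(func, axis=1)

print(data)
```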

The `plot` method creates a group of time series plots that are useful for visually inspecting the imported data.

`plot` uses the structure of the `column_groups` attribute to create a layout of plots.  A single plot is generated for each measurement type, and each column with measurements of that type is plotted as a separate line on the plot.  In this example there are two different weather stations, which each have pyranometers measuring plane of array and global horizontal irradiance. This arrangement of sensors results in two plots which each have two lines.

In [None]:
das.plot(marker='line', width=900, height=250, ncols=1)

## Filtering Measured Data
The `CapData` class provides a number of convenience methods to apply filtering steps as defined in ASTM E2848.  The following section demonstrates the more commonly used filtering steps for removing measured data points.

In [None]:
# Run to overwrite the filtered dataset with a copy of the unfiltered data.
das.reset_filter()

A common first step is to review the scatter plot of the POA irradiance against the power production.  The `scatter` method returns a basic non-interactive version of this plot as shown below.

If you have the optional dependency Holoviews installed, `scatter_hv` will return an interactive scatter plot.  Additionally, `scatter_hv` includes an option to return a timeseries plot of power that is linked to the scatter plot, so points selected in the scatter plot will be highlighted in the time series.

In [None]:
# Requires the optional Holoviews dependency: scatter plot with a linked time series
das.scatter_hv(timeseries=True)

In [None]:
das.scatter()

In this example, we have multiple measurements of the same value from different sensors.  In this case a common first step is to compare measurements from the different sensors and remove data for timestamps where the measurements differ by more than some acceptable threshold.  The `filter_sensors` method provides a convenient way to accomplish this task for the groups of measurements identified as regression values.

In [None]:
das.filter_sensors()
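
The idea of a sensor-comparison filter can be sketched in plain pandas as below. The 5% threshold and the choice of metric (spread between sensors relative to their mean) are assumptions made for this sketch; they are not claimed to be the defaults or the exact logic of `filter_sensors`.

```python
import pandas as pd

# Hypothetical readings from two POA sensors; the second row disagrees badly.
poa = pd.DataFrame(
    {"poa_1": [800.0, 810.0, 400.0], "poa_2": [820.0, 600.0, 405.0]}
)
threshold = 0.05  # assumed acceptable fractional difference

# Spread between the sensors at each timestamp, relative to the row mean.
rel_diff = (poa.max(axis=1) - poa.min(axis=1)) / poa.mean(axis=1)

# Keep only the timestamps where the sensors agree within the threshold.
kept = poa[rel_diff <= threshold]
print(kept)
```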

The `get_summary` method returns a dataframe summarizing the filtering steps that have been applied, the arguments passed to them, the number of points prior to filtering, and the number of points after filtering.

In [None]:
das.get_summary()

The `filter_custom` method provides a means to update the summary data when using filtering functions not defined as `CapData` methods.  The `filter_custom` method accepts any function or method that takes a DataFrame as the first argument and returns the filtered DataFrame with rows removed.  Passed functions can be user-defined or pandas DataFrame methods.

Below, we use the `filter_custom` method with the pandas DataFrame `dropna` method to remove missing data and update the summary data.

In [None]:
das.filter_custom(pd.DataFrame.dropna)
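
A user-defined filter only needs to follow the same contract: take a DataFrame first and return a DataFrame with rows removed. The sketch below shows such a function applied directly to a small, made-up DataFrame; the function name and the `power` column are hypothetical.

```python
import pandas as pd


def drop_negative_power(df):
    """Return the rows of df where the (hypothetical) 'power' column is non-negative."""
    return df[df["power"] >= 0]


# Made-up data to exercise the function on its own.
example = pd.DataFrame(
    {"power": [5.0, -1.0, 7.5], "poa": [300.0, 10.0, 450.0]}
)
filtered = drop_negative_power(example)
print(filtered)
```

A function of this shape, written against your own column names, can be passed to `filter_custom` in place of `pd.DataFrame.dropna`.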

The `filter_irr` method provides a convenient way to remove data based on the irradiance measurements.  Here we use it simply to remove periods of low irradiance.

In [None]:
das.get_summary()

In [None]:
das.filter_irr(200, 2000)
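
In plain pandas, an irradiance window like this amounts to a boolean mask on the irradiance values. The sketch below uses made-up values; treating both bounds as inclusive is an assumption of this sketch.

```python
import pandas as pd

# Made-up POA irradiance values in W/m^2.
irr = pd.Series([50.0, 250.0, 900.0, 2100.0], name="poa")
low, high = 200, 2000

# Inclusive bounds are an assumption made for this illustration.
mask = (irr >= low) & (irr <= high)
kept = irr[mask]
print(kept)
```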

We can re-run the `scatter` method to see the results of the filtering steps.

In [None]:
das.scatter()

The `filter_outliers` method uses scikit-learn's elliptic envelope to remove outlier points.  A future release will include a way to interactively select points to be removed.

In [None]:
das.filter_outliers()

In [None]:
das.scatter()

The `reg_cpt` method performs a regression on the data stored in the `data_filtered` attribute using the regression equation specified by the standard.  The regression equation is stored in the `regression_formula` attribute, as shown below.  Regressions are performed using the statsmodels package.

The `reg_cpt` method has an option to filter data based on the regression results as specified in the standard, as demonstrated below.

In [None]:
das.regression_formula
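
For reference, ASTM E2848 models power as `P = E * (a1 + a2*E + a3*Ta + a4*v)`, where `E` is POA irradiance, `Ta` is ambient temperature, and `v` is wind speed. Expanding the product gives a model that is linear in the coefficients with no intercept, so it can be fit by ordinary least squares. The sketch below demonstrates this with NumPy on synthetic, noise-free data; the coefficient values are made up, and pvcaptest itself uses statsmodels rather than this code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
E = rng.uniform(400, 1000, n)   # POA irradiance, W/m^2 (synthetic)
Ta = rng.uniform(10, 35, n)     # ambient temperature, deg C (synthetic)
v = rng.uniform(0, 8, n)        # wind speed, m/s (synthetic)

# Made-up coefficients used to generate noise-free power values.
true_coefs = np.array([7.0, -0.001, -0.02, 0.05])
P = E * (true_coefs[0] + true_coefs[1] * E + true_coefs[2] * Ta + true_coefs[3] * v)

# P = a1*E + a2*E^2 + a3*E*Ta + a4*E*v, so the design matrix has no intercept column.
X = np.column_stack([E, E**2, E * Ta, E * v])
coefs, *_ = np.linalg.lstsq(X, P, rcond=None)
print(coefs)
```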

In [None]:
das.reg_cpt(filter=True, summary=False)

In [None]:
das.get_summary()

____
#### Calculation of Reporting Conditions

The `rep_cond` method provides a variety of ways to calculate reporting conditions.  When using `rep_cond`, the reporting conditions are always calculated from the data stored in the `data_filtered` attribute.  Refer to the example notebook "Reporting Conditions Examples" for a thorough explanation of the `rep_cond` functionality.  By default the reporting conditions are calculated following the guidance of ASTM E2939-13.

In [None]:
das.rep_cond()
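
Reporting conditions are a single representative value for each of irradiance, ambient temperature, and wind speed, derived from the filtered data. The sketch below computes one such set from made-up data. The statistics chosen here (a 60th-percentile irradiance and the means of temperature and wind speed) are assumptions for illustration only and are not a statement of what `rep_cond` computes by default.

```python
import pandas as pd

# Made-up filtered data.
filtered = pd.DataFrame(
    {
        "poa": [400.0, 500.0, 600.0, 700.0, 800.0, 900.0],
        "t_amb": [20.0, 22.0, 24.0, 26.0, 28.0, 30.0],
        "w_vel": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# One-row DataFrame, so rc['poa'][0] returns the reporting irradiance.
rc = pd.DataFrame(
    {
        "poa": [filtered["poa"].quantile(0.6)],  # assumed statistic
        "t_amb": [filtered["t_amb"].mean()],     # assumed statistic
        "w_vel": [filtered["w_vel"].mean()],     # assumed statistic
    }
)
print(rc)
```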

----

Previously we used the irradiance filter to filter out data below 200 W/m<sup>2</sup>.  The irradiance filter can also be used to filter irradiance based on a percentage band around a reference value.  This approach is shown here to remove data where the irradiance is outside of +/- 50% of the reporting irradiance.

In [None]:
das.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])
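
The arithmetic behind the percentage band is simple: the low and high arguments are treated as fractions and multiplied by the reference value. The helper below is a hypothetical illustration of that calculation, not part of pvcaptest, and the reference irradiance of 800 W/m^2 is made up.

```python
def irradiance_bounds(low, high, ref_val=None):
    """Hypothetical helper: return (low, high) bounds in W/m^2.

    Without ref_val the arguments are used as absolute limits; with ref_val
    they are treated as fractions of the reference irradiance.
    """
    if ref_val is None:
        return low, high
    return low * ref_val, high * ref_val


# A +/- 50% band around a made-up reporting irradiance of 800 W/m^2.
print(irradiance_bounds(0.5, 1.5, ref_val=800.0))
# Absolute limits, as in the earlier low-irradiance filter.
print(irradiance_bounds(200, 2000))
```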

In [None]:
das.scatter()

The regression method is used again, this time without the filter option, to perform the final regression of the measured data. The result of the regression is a statsmodels object containing the regression coefficients and other information generated when performing the regression.  This object is stored in the CapData `ols_model` attribute.

In [None]:
das.reg_cpt()

In [None]:
das.ols_model.params

In [None]:
das.ols_model.pvalues

## Load and Filter PVsyst Data

To load and filter the modeled data, typically from PVsyst, we simply create a new CapData object, load the PVsyst data, and apply the filtering methods as appropriate.

In [None]:
sim = pvc.CapData('sim')

To load PVsyst data we use the `load_data` method with the `load_pvsyst` option set to True.  By default the `load_data` method will search for a csv file that includes 'pvsyst' in the filename in a 'data' directory in the same directory as this file.  If you have saved the PVsyst file in a different location, you can use the `path` and `fname` arguments to load it.

In [None]:
sim.load_data(load_pvsyst=True)

In [None]:
sim.column_groups

In [None]:
sim.set_regression_cols(power='real_pwr--', poa='irr-poa-', t_amb='temp-amb-', w_vel='wind--')

In [None]:
# sim.plot()

In [None]:
# Overwrite the filtered data with a copy of the original unfiltered data
sim.reset_filter()

As a first step we use the `filter_time` method to select a 60 day period of data centered around the measured data.

In [None]:
sim.filter_time(test_date='10/11/1990', days=60)
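
A window centered on a date can be sketched in plain pandas by slicing a DatetimeIndex from half the window before the date to half the window after it. The sketch below uses made-up daily data and reads the test date as October 11, 1990; it illustrates the idea only and is not pvcaptest's implementation.

```python
import pandas as pd

# Made-up daily data for 1990 (real PVsyst output is typically hourly).
sim_index = pd.date_range("1990-01-01", "1990-12-31", freq="D")
sim_data = pd.DataFrame({"poa": range(len(sim_index))}, index=sim_index)

test_date = pd.Timestamp("1990-10-11")
days = 60
half_window = pd.Timedelta(days=days / 2)

# Label-based slicing on a DatetimeIndex includes both endpoints.
window = sim_data.loc[test_date - half_window : test_date + half_window]
print(window.index.min(), window.index.max())
```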

In [None]:
sim.scatter()

In [None]:
sim.filter_irr(200, 930)

In [None]:
sim.scatter()

In [None]:
sim.get_summary()

The `filter_pvsyst` method removes data for times when shade is present or when the 'IL Pmin', 'IL Vmin', 'IL Pmax', or 'IL Vmax' output values are greater than 0.

In [None]:
sim.filter_pvsyst()
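
The inverter-limit part of this filter can be sketched in plain pandas as below. The data are made up, the `E_Grid` column is included only as a placeholder output, and the shade check is left out because it depends on which shading columns the PVsyst export contains.

```python
import pandas as pd

limit_cols = ["IL Pmin", "IL Vmin", "IL Pmax", "IL Vmax"]

# Made-up simulation output; rows 0 and 2 have an inverter loss greater than 0.
sim_data = pd.DataFrame(
    {
        "E_Grid": [100.0, 250.0, 300.0, 280.0],
        "IL Pmin": [5.0, 0.0, 0.0, 0.0],
        "IL Vmin": [0.0, 0.0, 0.0, 0.0],
        "IL Pmax": [0.0, 0.0, 12.0, 0.0],
        "IL Vmax": [0.0, 0.0, 0.0, 0.0],
    }
)

# Flag any row where at least one inverter-limit column is greater than 0.
limited = (sim_data[limit_cols] > 0).any(axis=1)
kept = sim_data[~limited]
print(kept)
```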

In [None]:
sim.filter_irr(0.5, 1.5, ref_val=das.rc['poa'][0])

In [None]:
sim.reg_cpt()

## Results

The `get_summary` and `captest_results_check_pvalues` functions display, respectively, the results of filtering the simulated and measured data and the final capacity test results comparing measured capacity to expected capacity.

In [None]:
pvc.get_summary(das, sim)

In [None]:
pvc.captest_results_check_pvalues(sim, das, 6000, '+/- 7', print_res=True)
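
The core comparison can be sketched as follows: predict power at the reporting conditions from each regression, take the ratio of measured to expected, and check it against the tolerance. Every number below is made up, `predict_power` is a hypothetical helper, and the sketch ignores the other inputs and checks (such as the p-value check) performed by the pvcaptest function.

```python
def predict_power(coefs, poa, t_amb, w_vel):
    """Hypothetical helper: evaluate P = E*(a1 + a2*E + a3*Ta + a4*v)."""
    a1, a2, a3, a4 = coefs
    return poa * (a1 + a2 * poa + a3 * t_amb + a4 * w_vel)


# Made-up reporting conditions and regression coefficients.
rc = {"poa": 800.0, "t_amb": 25.0, "w_vel": 3.0}
measured_coefs = (7.0, -0.001, -0.02, 0.05)
expected_coefs = (7.2, -0.001, -0.02, 0.05)

actual = predict_power(measured_coefs, **rc)
expected = predict_power(expected_coefs, **rc)
ratio = actual / expected

tolerance = 0.07  # a +/- 7% band, written as a fraction
passed = abs(ratio - 1) <= tolerance
print(round(ratio, 4), passed)
```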

Run the lines below to produce a scatter plot overlaying the final measured and PVsyst data.

In [None]:
%%opts Scatter (alpha=0.3)
%%opts Scatter [width=600]
das.scatter_hv().relabel('Measured') * sim.scatter_hv().relabel('PVsyst')