# boxplots
This page contains instructions and documentation for creating plots used to visualize curve ensembles. 

## spaghetti_plot
Plots a random selection of curves. 

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what places are included. 

`geo_values` (`str` or `listlike` or `None`): The geographies to be included. A value or subset of values from the `geo_level` column. If None, then all values will be included.

`geo_column` (`str`, optional): Name of column in original table containing geography identifier. Defaults to 'basin_id'.

`reference_column` (`str`, optional): Name of column in the reference table containing the geography corresponding to data in `geo_column`. Defaults to 'basin_id'.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'value'.

`n` (`int`, optional): Number of curves to plot. Defaults to 25. 

---
### Returns
`fig` (`plotly.graph_objects.Figure`): Plotly Figure containing visualization. 
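
The random selection can be pictured with a small sketch in plain Python. It assumes curves are keyed by a run identifier; `pick_curves` and the sample `ensemble` are hypothetical names used only for illustration and are not part of this library, which is assumed to perform its selection in BigQuery.

```python
import random

def pick_curves(curves, n=25, seed=None):
    """Return a random subset of at most n curves, keyed by run id."""
    rng = random.Random(seed)
    # Sample without replacement; cap n at the number of available curves.
    chosen = rng.sample(sorted(curves), k=min(n, len(curves)))
    return {run_id: curves[run_id] for run_id in chosen}

# Hypothetical ensemble: 100 runs, each a short list of values over time.
ensemble = {run_id: [run_id * t for t in range(5)] for run_id in range(100)}
subset = pick_curves(ensemble, n=25, seed=0)
print(len(subset))
```

Plotting only a subset keeps the figure readable when the ensemble contains many runs.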

---

## functional_boxplot

A functional boxplot uses curve-based statistics that treat each entire curve as a single data point, rather than treating each observation in a curve separately. Always plots the median and interquartile range.

<img src="../images/sample_func_box_plot.png" alt="Sample Functional Boxplot" width="1400" style="margin:auto;"/>

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what places are included. 

`geo_values` (`str` or `listlike` or `None`): The geographies to be included. A value or subset of values from the `geo_level` column. If None, then all values will be included.

`geo_column` (`str`, optional): Name of column in original table containing geography identifier. Defaults to 'basin_id'.

`reference_column` (`str`, optional): Name of column in the reference table containing the geography corresponding to data in `geo_column`. Defaults to 'basin_id'.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'value'.

`num_clusters` (`int`, optional): Number of clusters that curves will be broken into based on `grouping_method`. Defaults to 1. *Note: raising `num_clusters` above one significantly increases runtime.*

`num_features` (`int`, optional): Number of features the kmeans algorithm will use to group curves if `num_clusters` is greater than 1. Must be less than or equal to the number of run_ids in the table.

`grouping_method` (`str`, optional): Method used to group curves. Must be one of: 
- `'mse'` *(default)*: Fixed-time pairwise mean squared error between curves.

- `'abc'`: Fixed-time pairwise area between curves. Also called mean absolute error.  
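
The two grouping distances above can be sketched in plain Python. This assumes both curves are sampled at the same time steps; the function names `mse` and `abc` simply mirror the option strings and are illustrative, not the library's internal implementation.

```python
def mse(curve_a, curve_b):
    """Mean squared error between two curves sampled at the same time steps."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must share the same time steps")
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) / len(curve_a)

def abc(curve_a, curve_b):
    """Area between curves, computed here as the mean absolute error."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must share the same time steps")
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) / len(curve_a)

x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 2.0, 4.0, 3.0]
print(mse(x, y))  # (0 + 1 + 4 + 0) / 4 = 1.25
print(abc(x, y))  # (0 + 1 + 2 + 0) / 4 = 0.75
```

Because `'mse'` squares each difference, it penalizes large gaps between curves more heavily than `'abc'` does.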

`kmeans_table` (`str`, optional): BigQuery table name containing clustering information in 'dataset.table' form. Used when kmeans has already been performed with `delete_data=False`. Allows function to skip costly kmeans algorithm. 

`centrality_method` (`str`, optional): Method used to determine curve centrality within their group. Must be one of:

- `'mse'` *(default)*: Summed fixed-time mean squared error between curves.

- `'abc'`: Summed fixed-time pairwise area between curves. Also called mean absolute error.

- `'mbd'`: Modified band depth. For more information, see [Sun and Genton (2011)](https://www.tandfonline.com/doi/abs/10.1198/jcgs.2011.09224).
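
As an illustration of the `'mbd'` option, the sketch below computes modified band depth using bands formed by pairs of curves. It is a plain-Python approximation of the idea, assuming at least two curves on a shared time grid; `modified_band_depth` is a hypothetical helper, not the library's BigQuery implementation.

```python
from itertools import combinations

def modified_band_depth(curves):
    """Return the modified band depth (bands of two curves) of each curve."""
    n_steps = len(curves[0])
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for curve in curves:
        total = 0.0
        for j, k in pairs:
            # Count the time steps at which this curve lies inside the band
            # spanned by curves j and k.
            inside = sum(
                1
                for t in range(n_steps)
                if min(curves[j][t], curves[k][t])
                <= curve[t]
                <= max(curves[j][t], curves[k][t])
            )
            total += inside / n_steps
        # Average the proportion of time spent inside each band.
        depths.append(total / len(pairs))
    return depths

curves = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
]
depths = modified_band_depth(curves)
# The deepest curve is the most central one and plays the role of the median.
most_central = max(range(len(curves)), key=depths.__getitem__)
print(most_central)
```

Unlike the distance-based options, a higher depth means a more central curve.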

`threshold` (`float`, optional): Number of interquartile ranges a curve may lie from the median before it is considered an outlier. Defaults to 1.5.

`dataset` (`str` or `None`, optional): Name of BigQuery dataset to store intermediate tables. If `None`, then random hash value will be used. Defaults to `None`. 

`delete_data` (`bool`, optional): If True, then intermediate data tables will be deleted. Defaults to False.

---
### Returns
`fig` (`plotly.graph_objects.Figure`): Plotly Figure containing visualization. 

---

## fixed_time_boxplot

A fixed-time boxplot uses fixed-time statistics that rank the points at each time step, and uses those ranks to construct confidence intervals for each time step. Always plots the median and interquartile range.

<img src="../images/sample_ft_box_plot.png" alt="Sample Fixed-Time Boxplot" width="1400" style="margin:auto;"/>
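
The fixed-time idea can be sketched in plain Python: statistics are computed independently at each time step, across all curves. The helper `fixed_time_summary` is hypothetical, and the quantile interpolation method used here is an assumption; the library's BigQuery computation may interpolate differently.

```python
from statistics import median, quantiles

def fixed_time_summary(curves):
    """Return (lower quartile, median, upper quartile) lists, one entry per time step."""
    lower, mid, upper = [], [], []
    # zip(*curves) regroups the data so each tuple holds all values at one time step.
    for values in zip(*curves):
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        lower.append(q1)
        mid.append(median(values))
        upper.append(q3)
    return lower, mid, upper

# Five hypothetical curves, each observed at two time steps.
curves = [
    [1, 10],
    [2, 20],
    [3, 30],
    [4, 40],
    [5, 50],
]
lower, mid, upper = fixed_time_summary(curves)
print(lower, mid, upper)
```

Note that the resulting median line is generally not one of the original curves, since each time step is ranked separately.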

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what places are included. 

`geo_values` (`str` or `listlike` or `None`): The geographies to be included. A value or subset of values from the `geo_level` column. If None, then all values will be included.

`geo_column` (`str`, optional): Name of column in original table containing geography identifier. Defaults to 'basin_id'.

`reference_column` (`str`, optional): Name of column in the reference table containing the geography corresponding to data in `geo_column`. Defaults to 'basin_id'.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'value'.

`num_clusters` (`int`, optional): Number of clusters that curves will be broken into based on `grouping_method`. Defaults to 1. *Note: raising `num_clusters` above one significantly increases runtime.*

`num_features` (`int`, optional): Number of features the kmeans algorithm will use to group curves if `num_clusters` is greater than 1. Must be less than or equal to the number of run_ids in the table.

`grouping_method` (`str`, optional): Method used to group curves. Must be one of: 
- `'mse'` *(default)*: Fixed-time pairwise mean squared error between curves.

- `'abc'`: Fixed-time pairwise area between curves. Also called mean absolute error.  

`kmeans_table` (`str`, optional): BigQuery table name containing clustering information in 'dataset.table' form. Used when kmeans has already been performed with `delete_data=False`. Allows function to skip costly kmeans algorithm. 

`dataset` (`str` or `None`, optional): Name of BigQuery dataset to store intermediate tables. If `None`, then random hash value will be used. Defaults to `None`. 

`delete_data` (`bool`, optional): If True, then intermediate data tables will be deleted. Defaults to False.

`confidence` (`float`, optional): From 0 to 1. Confidence level of interval that will be graphed. Also determines which points are considered outliers. 

`full_range` (`bool`, optional): If True, then mesh will be drawn around entire envelope, including outliers. Defaults to False. 

`outlying_points` (`bool`, optional): If True, then outlying points will be graphed. Defaults to True. 

---
### Returns
`fig` (`plotly.graph_objects.Figure`): Plotly Figure containing visualization. 

---

## fetch_fixed_time_quantiles

Allows calculation of custom fixed-time quantiles. Always fetches median. 

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`confidences` (`list` of `float`): List of confidences to gather, from 0 to 1. For example, entering `.5` will result in the 25th and 75th percentiles being calculated. 

`geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what places are included. 

`geo_values` (`str` or `listlike` or `None`): The geographies to be included. A value or subset of values from the `geo_level` column. If None, then all values will be included. 

`geo_column` (`str`, optional): Name of column in original table containing geography identifier. Defaults to 'basin_id'.

`reference_column` (`str`, optional): Name of column in the reference table containing the geography corresponding to data in `geo_column`. Defaults to 'basin_id'.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'value'.

`num_clusters` (`int`, optional): Number of clusters that curves will be broken into based on `grouping_method`. Defaults to 1. *Note: raising `num_clusters` above one significantly increases runtime.*

`num_features` (`int`, optional): Number of features the kmeans algorithm will use to group curves if `num_clusters` is greater than 1. Must be less than or equal to the number of run_ids in the table.

`grouping_method` (`str`, optional): Method used to group curves. Must be one of: 
- `'mse'` *(default)*: Fixed-time pairwise mean squared error between curves.

- `'abc'`: Fixed-time pairwise area between curves. Also called mean absolute error.  

`kmeans_table` (`str`, optional): BigQuery table name containing clustering information in 'dataset.table' form. Used when kmeans has already been performed with `delete_data=False`. Allows function to skip costly kmeans algorithm. 

`dataset` (`str` or `None`, optional): Name of BigQuery dataset to store intermediate tables. If `None`, then random hash value will be used. Defaults to `None`. 

`delete_data` (`bool`, optional): If True, then intermediate data tables will be deleted. Defaults to False.

---
### Returns
`df` (`pandas.DataFrame`): pandas dataframe containing quantiles and median. 
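
The relationship between a value in `confidences` and the pair of percentiles it produces can be sketched as follows. This assumes a symmetric central interval, which matches the `.5` example given for `confidences`; `confidence_to_quantile_levels` is a hypothetical helper, not a function of this library.

```python
def confidence_to_quantile_levels(confidence):
    """Map a confidence in [0, 1] to the (lower, upper) quantile levels it spans."""
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")
    # Split the excluded probability mass evenly between the two tails.
    tail = (1 - confidence) / 2
    return tail, 1 - tail

for confidence in [0.5, 0.9]:
    low, high = confidence_to_quantile_levels(confidence)
    print(f"{confidence:.2f} -> {low:.3f}, {high:.3f}")
```

A confidence of `.5` therefore corresponds to the 25th and 75th percentiles, and `.9` to the 5th and 95th.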

---