# importation_plots
This page contains instructions and documentation for creating Sankey diagrams, relative risk charts, and area plots using importation data. 

## area_plot
Area plots are designed to show a change in comparable values over time; they compare apples to apples. In this example, we use H1N1 predictions from 2010 and show the origin countries for cases travelling from Asia to Europe.

<img src="../images/sample_area_plot.png" alt="Sample Area Plot" width="1400" style="margin:auto;"/>

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing importation data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`source_geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what sources are included. 

`target_geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what targets are included. 

`source_values` (`str` or `listlike` or `None`, optional): The source(s) to be included. A value or subset of values from the `source_geo_level` column. If None, then all values will be included. Defaults to None. 

`target_values` (`str` or `listlike` or `None`, optional): The target(s) to be included. A value or subset of values from the `target_geo_level` column. If None, then all values will be included. Defaults to None.

`source_column` (`str`, optional): Name of column in original table containing source identifier. Defaults to 'source_basin'.

`target_column` (`str`, optional): Name of column in original table containing target identifier. Defaults to 'target_basin'.

`reference_column` (`str`, optional): Name of column in original table containing the geography corresponding to data in `source_column` and `target_column`. Defaults to 'basin_id'. 

`output_resolution` (`str` or `None`, optional): The name of a column from the reference table. Desired geographical resolution for the area plot. If None, then `target_geo_level` will be used. Defaults to None.

`domestic` (`bool`, optional): Whether or not domestic cases will be included. Defaults to True. *Note: can produce unexpected results.*

`cutoff` (`float`, optional): A fraction from 0 to 1, inclusive. All sources or targets that contribute less than this fraction of cases will be grouped into an 'Other' category. Set to 0 for no 'Other' category. Defaults to 0.05.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'importations'.

`display` (`str`, optional): Whether the sources ('source') or the targets ('target') of the importations will be visualized. Defaults to 'source'.

---
### Returns
`fig` (`plotly.graph_objects.Figure`): Plotly Figure containing visualization. 
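
A call might look like the sketch below. The table names (`my_dataset.h1n1_importations`, `my_dataset.geo_reference`) and the reference-table column names (`continent_name`, `country_name`) are placeholders rather than values shipped with the package, so substitute the ones from your own BigQuery project. The import is deferred into the function so that the argument dictionary can be inspected without credentials.

```python
# Keyword arguments for area_plot. Table and column names below are
# hypothetical placeholders; replace them with your own.
AREA_PLOT_KWARGS = {
    "table_name": "my_dataset.h1n1_importations",   # assumed table
    "reference_table": "my_dataset.geo_reference",  # assumed table
    "source_geo_level": "continent_name",  # assumed reference column
    "target_geo_level": "continent_name",  # assumed reference column
    "source_values": "Asia",
    "target_values": "Europe",
    "output_resolution": "country_name",   # assumed reference column
    "domestic": False,
    "cutoff": 0.05,
    "display": "source",
}


def make_area_plot(client):
    """Build the area plot; `client` is an authenticated bigquery.Client."""
    # Imported here so the package is only required when a plot is drawn.
    from epidemic_intelligence import importation_plots

    return importation_plots.area_plot(client=client, **AREA_PLOT_KWARGS)
```

With credentials configured, `fig = make_area_plot(bigquery.Client())` followed by `fig.show()` should render the chart, where `bigquery` comes from `from google.cloud import bigquery`.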

---

## sankey
Sankey diagrams are designed to show the relative volume of flow from sources to targets. In this example, we use H1N1 predictions from 2010 and show the origin and destination countries for cases travelling from Asia to Europe. 

<img src="../images/sample_sankey_plot.png" alt="Sample Sankey Plot" width="400" style="margin:auto;"/>

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing importation data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`date_range` (`list of str`): Start and end date, inclusive. Dates should be formatted as they are in your original table. 

`source_geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what sources are included. 

`target_geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what targets are included. 

`source_values` (`str` or `listlike` or `None`, optional): The source(s) to be included. A value or subset of values from the `source_geo_level` column. If None, then all values will be included. Defaults to None. 

`target_values` (`str` or `listlike` or `None`, optional): The target(s) to be included. A value or subset of values from the `target_geo_level` column. If None, then all values will be included. Defaults to None.

`source_column` (`str`, optional): Name of column in original table containing source identifier. Defaults to 'source_basin'.

`target_column` (`str`, optional): Name of column in original table containing target identifier. Defaults to 'target_basin'.

`reference_column` (`str`, optional): Name of column in original table containing the geography corresponding to data in `source_column` and `target_column`. Defaults to 'basin_id'. 

`source_resolution` (`str` or `None`, optional): The name of a column from the reference table. Desired geographical resolution for source nodes. If None, then `source_geo_level` will be used. Defaults to None. 

`target_resolution` (`str` or `None`, optional): The name of a column from the reference table. Desired geographical resolution for target nodes. If None, then `target_geo_level` will be used. Defaults to None. 

`domestic` (`bool`, optional): Whether or not domestic cases will be included. Defaults to True. *Note: can produce unexpected results.*

`cutoff` (`float`, optional): A fraction from 0 to 1, inclusive. All sources or targets that contribute less than this fraction of cases will be grouped into an 'Other' category. Set to 0 for no 'Other' category. Defaults to 0.05.

`n_sources` (`int`, optional): The maximum number of source nodes in the Sankey diagram. Only this number of sources minus one will be shown; the rest will be aggregated into 'Other' regardless of `cutoff`. Must be a positive integer.

`n_targets` (`int`, optional): The maximum number of target nodes in the Sankey diagram. Only this number of targets minus one will be shown; the rest will be aggregated into 'Other' regardless of `cutoff`. Must be a positive integer.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'importations'.

---
### Returns
`fig` (`plotly.graph_objects.Figure`): Plotly Figure containing visualization. 
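
The interaction between `cutoff` and `n_sources`/`n_targets` can be illustrated with a small stand-alone sketch. This is not the package's implementation: `group_small_categories` and its `max_nodes` argument are hypothetical names, and the sketch assumes that categories are ranked by total importations and that the cutoff is measured against the grand total.

```python
def group_small_categories(totals, cutoff=0.05, max_nodes=None):
    """Collapse minor categories into 'Other'.

    totals: dict mapping category name -> total importations.
    cutoff: categories whose share of the grand total is below this
        fraction are merged into 'Other' (0 disables this step).
    max_nodes: if given, keep at most max_nodes - 1 named categories
        and merge the remainder into 'Other', regardless of cutoff.
    """
    grand_total = sum(totals.values())
    # Largest contributors first, so truncation keeps the biggest ones.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    kept, other = {}, 0.0
    for name, value in ranked:
        share = value / grand_total if grand_total else 0.0
        over_limit = max_nodes is not None and len(kept) >= max_nodes - 1
        if share < cutoff or over_limit:
            other += value
        else:
            kept[name] = value
    if other > 0:
        kept["Other"] = other
    return kept


# Illustrative numbers only, not real importation data.
example = {"China": 60, "Japan": 25, "India": 11, "Nepal": 4}
by_cutoff = group_small_categories(example, cutoff=0.05)
by_limit = group_small_categories(example, cutoff=0.05, max_nodes=3)
```

In this sketch, `by_cutoff` moves only Nepal (4% of the total) into 'Other', while `by_limit` keeps the two largest sources and folds India and Nepal into 'Other' even though India is above the cutoff.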

---

## relative_risk
Relative risk charts are bar charts designed to compare the relative risk of importation across targets.

<img src="../images/sample_rr_plot.png" alt="Sample Relative Risk Plot" width="600" style="margin:auto;"/>

### Parameters
`client` (`bigquery.Client`): BigQuery client object.

`table_name` (`str`): BigQuery table name containing importation data in 'dataset.table' form. 

`reference_table` (`str`): BigQuery table name containing reference table in 'dataset.table' form.

`date_range` (`list of str`): Start and end date, inclusive. Dates should be formatted as they are in your original table. 

`source_geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what sources are included. 

`target_geo_level` (`str`): The name of a column from the reference table. The geographical level used to determine what targets are included. 

`source_values` (`str` or `listlike` or `None`, optional): The source(s) to be included. A value or subset of values from the `source_geo_level` column. If None, then all values will be included. Defaults to None. 

`target_values` (`str` or `listlike` or `None`, optional): The target(s) to be included. A value or subset of values from the `target_geo_level` column. If None, then all values will be included. Defaults to None.

`source_column` (`str`, optional): Name of column in original table containing source identifier. Defaults to 'source_basin'.

`target_column` (`str`, optional): Name of column in original table containing target identifier. Defaults to 'target_basin'.

`reference_column` (`str`, optional): Name of column in original table containing the geography corresponding to data in `source_column` and `target_column`. Defaults to 'basin_id'. 

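`output_resolution` (`str` or `None`, optional): The name of a column from the reference table. Desired geographical resolution for the relative risk chart. Defaults to None.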

`domestic` (`bool`, optional): Whether or not domestic cases will be included. Defaults to True. *Note: can produce unexpected results.*

`cutoff` (`float`, optional): A fraction from 0 to 1, inclusive. All sources or targets that contribute less than this fraction of cases will be grouped into an 'Other' category. Set to 0 for no 'Other' category. Defaults to 0.05.

`n` (`int`, optional): The maximum number of bars in the relative risk chart. Only this number of targets minus one will be shown; the rest will be aggregated into 'Other' regardless of `cutoff`. Must be a positive integer.

`value` (`str`, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to 'importations'.

---
### Returns
`fig` (`plotly.graph_objects.Figure`): Plotly Figure containing visualization. 
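
Because `date_range` must match the date format of the original table, it can help to build it from `datetime.date` objects. The sketch below assumes the table stores dates as `YYYY-MM-DD` strings; `make_date_range` is a hypothetical helper, and the table and column names are placeholders to be replaced with your own.

```python
from datetime import date


def make_date_range(start, end, fmt="%Y-%m-%d"):
    """Return [start, end] as strings; `fmt` must match the table's dates."""
    if end < start:
        raise ValueError("end date precedes start date")
    return [start.strftime(fmt), end.strftime(fmt)]


# Table and column names are hypothetical placeholders.
RELATIVE_RISK_KWARGS = {
    "table_name": "my_dataset.h1n1_importations",
    "reference_table": "my_dataset.geo_reference",
    "date_range": make_date_range(date(2010, 1, 1), date(2010, 3, 31)),
    "source_geo_level": "continent_name",
    "target_geo_level": "continent_name",
    "source_values": "Asia",
    "target_values": "Europe",
    "n": 10,
}


def make_relative_risk(client):
    """Build the chart; `client` is an authenticated bigquery.Client."""
    # Imported here so the package is only required when a plot is drawn.
    from epidemic_intelligence import importation_plots

    return importation_plots.relative_risk(client=client, **RELATIVE_RISK_KWARGS)
```

If your table uses a different date format, pass a matching `fmt` string to `make_date_range`.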

---

## Fetching data from importation plots
`epidemic_intelligence.importation_plots` offers functions for extracting the data from importation plots into a pandas DataFrame. Each function takes a single parameter, `fig`, which is the Plotly Figure generated by the corresponding graphing function.

---

### fetch_area_plot_data

#### Parameters
`fig` (`plotly.graph_objects.Figure`): Figure object returned by `area_plot`.

#### Returns
`df` (`pandas.DataFrame`): pandas dataframe containing data. 

---

### fetch_sankey_data

#### Parameters
`fig` (`plotly.graph_objects.Figure`): Figure object returned by `sankey`.

#### Returns
`df` (`pandas.DataFrame`): pandas dataframe containing data. 

---

### fetch_relative_risk_data

#### Parameters
`fig` (`plotly.graph_objects.Figure`): Figure object returned by `relative_risk`.

#### Returns
`df` (`pandas.DataFrame`): pandas dataframe containing data. 
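
Since the three fetch functions share the same signature, one small wrapper can export the data behind any of the plots. The helper below is a hypothetical convenience, not part of the package; it relies only on the documented behavior that each fetch function takes a figure and returns a `pandas.DataFrame`.

```python
def export_plot_data(fig, fetch, path):
    """Extract a figure's data with `fetch` and write it to a CSV file.

    fig: Figure returned by area_plot, sankey, or relative_risk.
    fetch: the matching fetch function, e.g. fetch_sankey_data.
    path: destination file path for the CSV.
    """
    df = fetch(fig)
    df.to_csv(path, index=False)  # standard pandas.DataFrame method
    return df
```

For example, `export_plot_data(fig, importation_plots.fetch_sankey_data, "sankey.csv")` would save the data from a figure created by `sankey`.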