---
# Welcome to the interflow Quickstarter!
---
### Introduction
**interflow** is an open-source Python package for collecting, calculating, organizing, and visualizing cross-sectoral resource interdependencies and flows.

This Jupyter Notebook serves as a ready-to-go introduction to the package, using sample water and energy data for United States counties for the year 2015.

For more information on the package, including the user guide, the methodology behind the sample data, and other information, please visit [the interflow documentation](https://pnnl.github.io/interflow/index.html).

To visit the GitHub repository where interflow is stored, [go here](https://github.com/pnnl/interflow).

## Importing the package

Click on the cell below and press Ctrl+Enter to import the interflow package and its modules.

In [None]:
# import the package
import interflow

## Loading sample data
The interflow package comes with sample data for all counties in the United States for the year 2015. To load the sample input data, run the cell below.

In [None]:
# read in the sample data
data_input = interflow.read_sample_data()

## Running the model
Now that our input data is prepared, we can run some or all of it through the model to start collecting, computing, and organizing our water and energy flows.

#### Selecting a Region
The US sample data covers more than 3,000 US counties. The interflow package can run a single county at a time or the entire dataset of counties. Counties are identified here by their Federal Information Processing Standards (FIPS) code rather than by name. The cell below sets the region for analysis to the FIPS code for New York County, NY (36061).

To select other counties from this dataset, any FIPS code can be chosen from the input dataset or retrieved from the list presented here: [County FIPS List](https://pnnl.github.io/interflow/county_list.html)
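County FIPS codes are five digits long: two for the state followed by three for the county. Codes for states numbered below 10 start with a zero, which is lost if the code is ever read as an integer. The sketch below shows one way to keep codes as five-character strings, matching the quoted form used for `'36061'` in this notebook. The helper names `normalize_fips` and `split_fips` are hypothetical and are not part of interflow; whether the sample data always stores regions in this form is an assumption.

```python
def normalize_fips(code):
    """Return a county FIPS code as a five-character, zero-padded string."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a valid county FIPS code: {code!r}")
    return text.zfill(5)


def split_fips(code):
    """Split a county FIPS code into its state (2-digit) and county (3-digit) parts."""
    fips = normalize_fips(code)
    return fips[:2], fips[2:]


# New York County, NY: state 36, county 061
state_part, county_part = split_fips("36061")

# Los Angeles County, CA loses its leading zero when stored as an integer
fips_from_int = normalize_fips(6037)
```

Passing the normalized string as the region avoids a silent mismatch between `6037` and `'06037'`.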

#### Running the model for a single region
Run the cells below to run the model for the selected region.

In [None]:
# set the region equal to the FIPS code for New York County, NY
region = '36061'

In [None]:
# run the model for the selected region
output = interflow.calculate(data=data_input, region_name=region)

## Observing the output dataset
The output dataset is a pandas DataFrame of flow values from a source sector node (columns S1 through S5) to a target sector node (columns T1 through T5), in the indicated units. The number after S or T indicates the level of sector granularity: S1 is the major source sector name, S2 is the subsector/application under that sector for that row, and so on down to S5, the most detailed level. The T columns follow the same pattern for the target.

The cell below shows the first five rows of the output. Each row can be read as the flow value from the S (source) node to the T (target) node for the specified county.

In [None]:
output.head()
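To make the level structure concrete, the sketch below rolls a handful of rows up to a chosen level by keeping only the first `level` source and target names and summing the values. It uses plain dictionaries instead of the real output so that it runs on its own. The `units` and `value` keys, the `sub_a`/`sub_b` placeholder names, and every number are assumptions made for illustration, and only two of the five levels are shown.

```python
from collections import defaultdict

# Hypothetical rows shaped like the output described above (levels 1-2 only).
rows = [
    {"S1": "WSW", "S2": "sub_a", "T1": "PWS", "T2": "sub_a", "units": "mgd", "value": 10.0},
    {"S1": "WSW", "S2": "sub_b", "T1": "PWS", "T2": "sub_a", "units": "mgd", "value": 5.0},
    {"S1": "PWS", "S2": "sub_a", "T1": "RES", "T2": "sub_a", "units": "mgd", "value": 12.0},
    {"S1": "EGS", "S2": "sub_a", "T1": "RES", "T2": "sub_a", "units": "bbtu", "value": 3.0},
]


def aggregate_flows(rows, level):
    """Sum flow values after truncating source and target names to `level`."""
    totals = defaultdict(float)
    for row in rows:
        source = tuple(row[f"S{i}"] for i in range(1, level + 1))
        target = tuple(row[f"T{i}"] for i in range(1, level + 1))
        # Units are part of the key so water and energy are never added together.
        totals[(source, target, row["units"])] += row["value"]
    return dict(totals)


level_1 = aggregate_flows(rows, level=1)  # the two WSW rows merge into one flow
level_2 = aggregate_flows(rows, level=2)  # every row stays distinct
```

With the real DataFrame, the same roll-up would more naturally be a pandas `groupby` over the level columns followed by a sum.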

The US sample data uses acronyms for the major sector names. The table in the appendix of this quickstarter shows the definition of each of these abbreviations.

## Visualizing the flows between sectors
The output dataset provides the values between nodes for both water and energy. On its own, however, it is not an intuitive way to understand the relationships between nodes and how resources pass from one to the next. The visualization tools integrated into the package can help with this.

### Sankey Diagrams
Sankey diagrams show flows between nodes and represent how resources are passed along in a network. The cell below produces two Sankey diagrams from the sample data run output for the indicated region, one for water flows (in million gallons per day) and one for energy flows (in billion British thermal units per day).

Only one region can be shown at a time. To see the Sankey diagrams for a different county, change the county code above and re-run the .calculate() function cell to update the output that is fed into the cell below.

The Sankey diagrams can be produced at different levels of granularity, controlled by the 'output_level' parameter of the .plot_sankey() function. The output_level is set to 1 below, which shows the most aggregated view (major sectors only). Changing this value to any integer from 1 to 5 inclusive splits the flows out to that level of granularity.

For more information on this output, see the [key outputs documentation](https://pnnl.github.io/interflow/user_guide.html#single-unit-sankey-diagrams).
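For comparison with datasets reported in SI units, the two units used in these diagrams can be converted with simple arithmetic. The sketch below assumes the US liquid gallon and the International Table BTU; which BTU definition the sample data uses is not stated here, so treat the energy factor as approximate. The function names are hypothetical and are not part of interflow.

```python
LITERS_PER_US_GALLON = 3.785411784  # US liquid gallon, exact by definition
JOULES_PER_BTU = 1055.05585262      # International Table BTU (assumed definition)


def mgd_to_cubic_meters_per_day(mgd):
    """Convert million gallons per day to cubic meters per day."""
    liters_per_day = mgd * 1_000_000 * LITERS_PER_US_GALLON
    return liters_per_day / 1000  # 1 cubic meter = 1000 liters


def bbtu_to_terajoules(bbtu):
    """Convert billion BTU to terajoules (per day if the input is per day)."""
    joules = bbtu * 1_000_000_000 * JOULES_PER_BTU
    return joules / 1e12  # 1 terajoule = 1e12 joules


water_m3_per_day = mgd_to_cubic_meters_per_day(1.0)  # roughly 3,785 cubic meters per day
energy_tj_per_day = bbtu_to_terajoules(1.0)          # roughly 1.055 terajoules per day
```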

In [None]:
# plot sankey diagrams for water and energy
viz = interflow.plot_sankey(data=output, region_name=region,
                            unit_type1='mgd', unit_type2='bbtu',
                            output_level=1, strip='total')
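As background on what a Sankey diagram needs as input, plotting libraries such as Plotly describe one as a list of node labels plus parallel `source`, `target`, and `value` lists, where sources and targets are positions in the label list. The sketch below builds that structure from a few source-to-target flows. It illustrates the general data layout only and is not a description of how `plot_sankey()` works internally; the function name `build_sankey_links` and the flow values are invented.

```python
def build_sankey_links(flows):
    """Convert (source, target, value) triples into node labels and index-based links."""
    labels = []  # node names in order of first appearance
    index = {}   # node name -> position in `labels`
    links = {"source": [], "target": [], "value": []}
    for source, target, value in flows:
        for name in (source, target):
            if name not in index:
                index[name] = len(labels)
                labels.append(name)
        links["source"].append(index[source])
        links["target"].append(index[target])
        links["value"].append(value)
    return labels, links


# Hypothetical level 1 water flows in mgd
water_flows = [("WSW", "PWS", 15.0), ("PWS", "RES", 12.0), ("PWS", "CVL", 3.0)]
labels, links = build_sankey_links(water_flows)
```

Each node appears once in `labels` no matter how many flows touch it, which is what lets the diagram chain flows through intermediate sectors such as PWS.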

### Stacked Sector Bar Charts
In addition to Sankey diagrams, which show the flows from sector to sector, it's useful to see the flow breakdown within each sector. For more information on this output, see the [key outputs documentation](https://pnnl.github.io/interflow/user_guide.html#single-region-stacked-barcharts-of-sectors).

#### Inflow bar charts
The plot_sector_bar() function shows the breakdown of inflows to or outflows from a sector by its subsectors/applications. Setting inflow equal to True displays the inflow values by subsector for each sector in the specified unit. The units can also be changed: the code below is set to display energy (bbtu) flows for the given county, and changing the 'unit_type' parameter to 'mgd' for the sample data shows the water flow values for the indicated sectors instead.

The sectors shown below are the electricity generation sector (EGS) and the residential sector (RES). To change the list of sectors included for the chosen county, see the acronym list at the end of this notebook.

In [None]:
# create a list of sectors that you want to see a barchart of
sectors = ['EGS', 'RES']

In [None]:
# plot a stacked barchart of inflows to the specified sectors
interflow.plot_sector_bar(data=output, unit_type='bbtu', region_name=region,
                          sector_list=sectors, inflow=True, strip='total')

#### Outflow bar charts
To see where outflows from the sector as a whole end up, set the inflow parameter to False, as shown below.

In [None]:
# plot a stacked barchart of outflows from the specified sectors
interflow.plot_sector_bar(data=output, unit_type='bbtu', region_name=region,
                          sector_list=sectors, inflow=False, strip='total')
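The numbers behind the two charts can be pictured as two different groupings of the same rows. The sketch below follows one reading of the descriptions above: inflows are rows whose target is the sector, split by the receiving subsector, while outflows are rows whose source is the sector, split by where they end up. It uses hypothetical rows with assumed `units` and `value` keys and placeholder subsector names; `sector_breakdown` is not an interflow function, and the real grouping inside `plot_sector_bar()` may differ.

```python
from collections import defaultdict

# Hypothetical rows; all names below level 1 and all values are invented.
rows = [
    {"S1": "EGS", "S2": "sub_a", "T1": "RES", "T2": "sub_a", "units": "bbtu", "value": 3.0},
    {"S1": "EGS", "S2": "sub_a", "T1": "RES", "T2": "sub_b", "units": "bbtu", "value": 2.0},
    {"S1": "RES", "S2": "sub_a", "T1": "ESV", "T2": "sub_a", "units": "bbtu", "value": 1.5},
    {"S1": "RES", "S2": "sub_b", "T1": "REJ", "T2": "sub_a", "units": "bbtu", "value": 3.5},
    {"S1": "PWS", "S2": "sub_a", "T1": "RES", "T2": "sub_a", "units": "mgd", "value": 12.0},
]


def sector_breakdown(rows, sector, unit_type, inflow=True):
    """Total the flows into or out of `sector`, one entry per stacked bar segment."""
    # Inflows: match on the target sector, group by its subsector.
    # Outflows: match on the source sector, group by the destination sector.
    match_key, group_key = ("T1", "T2") if inflow else ("S1", "T1")
    totals = defaultdict(float)
    for row in rows:
        if row["units"] == unit_type and row[match_key] == sector:
            totals[row[group_key]] += row["value"]
    return dict(totals)


res_inflows = sector_breakdown(rows, "RES", "bbtu", inflow=True)
res_outflows = sector_breakdown(rows, "RES", "bbtu", inflow=False)
```

Filtering on the unit first keeps the water row (mgd) out of the energy totals, which mirrors the role of the `unit_type` parameter.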

### Regional Shaded Maps
Now that we've looked at the values across all sectors within a specific region and the values within specific sectors in a region, we can also look at how values compare across all regions.

The .plot_map() function generates a choropleth map where the included regions are shaded according to the value of the chosen flow. In the above visualization examples we've only run the model for one of the 3,000+ regions available. The .plot_map() function will only display regions you give it. Therefore, running the map with our current output would only show one region shaded. To shade all counties in the US, the full run output for all counties needs to be supplied.

To avoid the computation time required to run the model for all 3,000+ counties here, the full output for all counties has been created and stored in the repository files and can be loaded below.

Note that this function also requires a GeoJSON data file, which is included in the package data files for US counties.

For more information on this visualization, visit the [key output documentation](https://pnnl.github.io/interflow/user_guide.html#choropleth-map-displaying-single-flow-values-across-regions).

#### Load map data

In [None]:
# load full sample data output for all counties
full_output = interflow.load_sample_data_output()

# load GeoJSON file of counties
geo = interflow.load_sample_geojson_data()
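A choropleth can only shade a region whose identifier appears both in the data and in the GeoJSON file. The sketch below checks that on a toy GeoJSON `FeatureCollection`. It assumes each feature carries the county FIPS code in its top-level `id` member, which may not be how the packaged file is organized; the polygons are placeholder triangles, not real county boundaries, and `99999` is a deliberately fake code.

```python
import json

# Toy GeoJSON with placeholder geometry; only the structure matters here.
toy_geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "id": "36061", "properties": {"NAME": "New York"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    {"type": "Feature", "id": "06037", "properties": {"NAME": "Los Angeles"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[2, 0], [3, 0], [3, 1], [2, 0]]]}}
  ]
}
"""

toy_geo = json.loads(toy_geojson_text)
feature_ids = {feature["id"] for feature in toy_geo["features"]}

# Regions present in a hypothetical model output
regions_in_output = {"36061", "99999"}

# Regions that could be shaded, and regions with no matching shape
shadeable = sorted(regions_in_output & feature_ids)
unmatched = sorted(regions_in_output - feature_ids)
```

A non-empty `unmatched` list is a quick signal that some regions will be missing from the map.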

#### Generate Choropleth Map
The choropleth map comes with a dropdown menu of flow values from node to node. Selecting a new value will update the map. Additionally, the map can be generated for various levels of data granularity which will update the dropdown menu to reflect this. The map is currently configured to display flow values at level 2 granularity.

In [None]:
# plot flow values in a choropleth map at level 2 granularity
interflow.plot_map(data=full_output, jsonfile=geo, level=2)
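Conceptually, each dropdown entry corresponds to one source-to-target flow, and the map needs a single value per region for that flow. The sketch below picks one level 2 flow out of hypothetical multi-region rows and rescales the values to the 0 to 1 range, the way a continuous color scale would. The `region`, `units`, and `value` keys, the placeholder subsector names, and all numbers are assumptions; neither function is part of interflow.

```python
# Hypothetical level 2 rows for three counties; values are invented.
rows = [
    {"region": "36061", "S1": "WSW", "S2": "sub_a", "T1": "PWS", "T2": "sub_a", "units": "mgd", "value": 10.0},
    {"region": "06037", "S1": "WSW", "S2": "sub_a", "T1": "PWS", "T2": "sub_a", "units": "mgd", "value": 30.0},
    {"region": "17031", "S1": "WSW", "S2": "sub_a", "T1": "PWS", "T2": "sub_a", "units": "mgd", "value": 20.0},
    {"region": "36061", "S1": "PWS", "S2": "sub_a", "T1": "RES", "T2": "sub_a", "units": "mgd", "value": 8.0},
]


def values_by_region(rows, source, target, units):
    """Map each region to its total value for one source-to-target flow."""
    totals = {}
    for row in rows:
        row_source = (row["S1"], row["S2"])
        row_target = (row["T1"], row["T2"])
        if row_source == source and row_target == target and row["units"] == units:
            totals[row["region"]] = totals.get(row["region"], 0.0) + row["value"]
    return totals


def shade_fractions(values):
    """Rescale values to 0..1 so the smallest is lightest and the largest darkest."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {region: (value - low) / span if span else 0.0
            for region, value in values.items()}


flow_values = values_by_region(rows, ("WSW", "sub_a"), ("PWS", "sub_a"), "mgd")
fractions = shade_fractions(flow_values)
```

A region with no row for the selected flow never enters `flow_values`, so it would simply stay unshaded.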

---
## Appendix

### 1. Useful Links

#### [Interflow GitHub Repository](https://github.com/pnnl/interflow)

#### [Interflow Documentation](https://pnnl.github.io/interflow/)

#### [Sample Data Methodology and References](https://pnnl.github.io/interflow/sample_data.html)


### 2. Sample Data Acronym Guide

| Acronym | Description |
| --- | --- |
| AGR | Agriculture Sector |
| CVL | Conveyance Losses |
| COM | Commercial Sector |
| CMP | Consumption/Evaporation |
| EGS | Electricity Generation Supply |
| EPD | Energy Production Demand |
| ESV | Energy Services |
| GWD | Ground Discharge |
| IND | Industrial Sector |
| INX | Discharge to Industrial Sector |
| IRX | Discharge to Irrigation |
| MIN | Mining Sector |
| OCD | Ocean Discharge |
| PRD | Produced Water |
| PWD | Public Water Demand |
| PWI | Public Water Imports |
| PWS | Public Water Supply |
| PWX | Public Water Exports |
| REJ | Rejected Energy |
| RES | Residential Sector |
| SRD | Surface Discharge |
| TRA | Transportation Sector |
| WSW | Water Supply Withdrawals |
| WWD | Wastewater Treatment |
| WWI | Wastewater Imports |
| WWS | Wastewater Supply |
