# Geoprocessing

---

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Geoprocessing" data-toc-modified-id="Geoprocessing-1">Geoprocessing</a></span><ul class="toc-item"><li><span><a href="#Code-from-Notebook-1,-needed-for-in-memory-objects-for-Geoprocessing" data-toc-modified-id="Code-from-Notebook-1,-needed-for-in-memory-objects-for-Geoprocessing-1.1">Code from Notebook 1, needed for in-memory objects for Geoprocessing</a></span></li></ul></li><li><span><a href="#Geoprocessing" data-toc-modified-id="Geoprocessing-2">Geoprocessing</a></span><ul class="toc-item"><li><span><a href="#Point-in-Polygon-Counts-via-Spatial-Join" data-toc-modified-id="Point-in-Polygon-Counts-via-Spatial-Join-2.1">Point-in-Polygon Counts via Spatial Join</a></span><ul class="toc-item"><li><span><a href="#Join-Features---Esri-Description" data-toc-modified-id="Join-Features---Esri-Description-2.1.1">Join Features - Esri Description</a></span></li><li><span><a href="#Next" data-toc-modified-id="Next-2.1.2">Next</a></span></li></ul></li></ul></li></ul></div>

In [46]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [47]:
import glob
import geopandas as gpd
import pandas as pd
import seaborn as sns
import missingno as msno
import matplotlib.pyplot as plt

In [48]:
from tools.tools import read_json, get_current_time
from capstone.etl.viirs_join_basins import viirs_join_basins, compile_basin_data
from capstone.etl.census_parse import parse_census
from capstone.etl.census_retrieval import census_retrieval
from capstone.etl.generate_basins import generate_us_basins
from capstone.etl.eia_retrieval import eia_retrieval
from capstone.etl.eia_parse import eia_parse_county, eia_parse_data

In [49]:
config = read_json('../config.json')

current_date = get_current_time('yyyymmdd')

wd = f"{config['workspace_directory']}/data"

In [50]:
plt.style.use('ggplot')

In [51]:
basin_colors_hex = {  # manually defined dictionary of EIA basin-level standardized colors 
    "Anadarko Region":    "#2BA2CF", 
    "Appalachia Region":  "#769F5D",
    "Bakken Region":      "#F6C432", 
    "Eagle Ford Region":  "#48366B", 
    "Haynesville Region": "#807B8F",
    "Niobrara Region":    "#9D3341",
    "Permian Region":     "#6F4B27",
}

## Code from Notebook 1, needed for in-memory objects for Geoprocessing

In [52]:
census_shp = census_retrieval(f"{wd}/input/census")
census = gpd.read_file(census_shp)
census.columns = [c.lower() for c in census.columns]

eia_xls = eia_retrieval(f"{wd}/input/eia")
eia_cnty = eia_parse_county(eia_xls)
eia_data = eia_parse_data(eia_xls)  # parse the target variable(s) data

census_gdf = parse_census(census_shp)
basins_list, all_basins = generate_us_basins(
    census_gdf,
    eia_cnty,
    f"{wd}/input/basins",
)  # this code creates individual files for basin geographies as well as an all_basins geography file/object.

 parse eia data
    for anadarko region
    for appalachia region
    for bakken region
    for eagle ford region
    for haynesville region
    for niobrara region
    for permian region
generating us basins
    permian region
    appalachia region
    haynesville region
    eagle ford region
    anadarko region
    niobrara region
    bakken region


# Geoprocessing

In [56]:
# list all retrieved VIIRS files for both versions, 2.1c and 3.0

viirs_2_1c_files = glob.glob(f"{wd}/input/viirs21c/*.csv")  # VIIRS 2.1c files
viirs_2_1c_files.sort()  # sort so dates are consecutive for tracking

print(f'Total 2.1c files: {len(viirs_2_1c_files)}')

viirs_3_0_files = glob.glob(f"{wd}/input/viirs30/*.csv")  # VIIRS 3.0 files
viirs_3_0_files.sort()  # sort so dates are consecutive for tracking

print(f'Total 3.0 files: {len(viirs_3_0_files)}')

Total 2.1c files: 2095
Total 3.0 files: 829
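
`glob.glob` returns paths in an arbitrary, filesystem-dependent order, which is why the lists are sorted explicitly. The sketch below assumes file names that embed a zero-padded `yyyymmdd` date; the names are hypothetical placeholders, not the actual VIIRS file names. Under that assumption, a plain lexicographic sort is also a chronological sort:

```python
import glob
import os
import tempfile

# Hypothetical file names; the real VIIRS names are not shown in this notebook.
names = ["viirs_20130103.csv", "viirs_20121231.csv", "viirs_20130101.csv"]

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        # create empty placeholder files
        open(os.path.join(tmp, name), "w").close()

    # glob order is not guaranteed, so sort the result explicitly
    files = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    basenames = [os.path.basename(f) for f in files]

print(basenames)
```

If the dates were not zero-padded, or used a day-first layout, a lexicographic sort would no longer match date order and the date would need to be parsed before sorting.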


## Point-in-Polygon Counts via Spatial Join

This project uses the GeoPandas spatial join (`sjoin`) rather than Esri's Join Features tool, but the Esri illustration below shows the idea well: when a polygon geography (2D or 3D) is intersected with a set of points, the points that fall inside each polygon can be counted.

### Join Features - Esri Description 

> Joins attributes from one layer to another based on spatial, temporal, or attribute relationships, or a combination of those relationships. [https://pro.arcgis.com/en/pro-app/tool-reference/geoanalytics-desktop/join-features.htm](https://pro.arcgis.com/en/pro-app/tool-reference/geoanalytics-desktop/join-features.htm)

[![join](https://pro.arcgis.com/en/pro-app/tool-reference/geoanalytics-desktop/GUID-EB8FA998-105A-4D93-93E3-5FAA1057137D-web.png)](https://pro.arcgis.com/en/pro-app/tool-reference/geoanalytics-desktop/GUID-EB8FA998-105A-4D93-93E3-5FAA1057137D-web.png)


The GeoPandas code below is in `tools.geoprocessing.py` and is used inside `viirs_join_basins(...)` in this project repository:
```python
import geopandas as gpd


def point_in_polygon(point_gdf, poly_gdf):
    return gpd.sjoin(
        point_gdf,
        poly_gdf,
        how="inner",
        op='intersects',  # note: GeoPandas warns here when the CRS of the two frames do not match
    )
```
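
The CRS warning noted in the comment above can usually be avoided by reprojecting one frame to the other's CRS before joining. The following is a minimal sketch rather than the repository's implementation: the helper name `count_points_in_polygons`, the `basin` column, and the toy geometries are all assumptions made for illustration. It also uses `predicate=`, which newer GeoPandas releases (0.10 and later) use in place of `op=`.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon


def count_points_in_polygons(point_gdf, poly_gdf, poly_id_col):
    """Count points per polygon; returns a Series indexed by poly_id_col.

    Hypothetical helper; assumes both frames have a CRS set.
    """
    # Align CRS first so the join compares coordinates in the same system
    if point_gdf.crs != poly_gdf.crs:
        point_gdf = point_gdf.to_crs(poly_gdf.crs)
    joined = gpd.sjoin(
        point_gdf,
        poly_gdf[[poly_id_col, poly_gdf.geometry.name]],
        how="inner",
        predicate="intersects",  # older GeoPandas releases used op= instead
    )
    # Each joined row is one point matched to one polygon
    return joined.groupby(poly_id_col).size()


# Toy data (all values made up): two squares and three points
polys = gpd.GeoDataFrame(
    {"basin": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(2.5, 0.5)],
    crs="EPSG:4326",
)

counts = count_points_in_polygons(points, polys, "basin")
print(counts.to_dict())
```

With `how="inner"`, polygons that contain no points do not appear in the result; reindexing the counts against the full list of polygon IDs with a fill value of 0 would restore them.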

In [58]:
viirs_join_basins( 
    wd,
    all_basins,
    viirs_2_1c_files,
    '21c',
)   # spatially join viirs 2.1c to basins geometries

viirs_join_basins(
    wd,
    all_basins,
    viirs_3_0_files,
    '30',
)  # spatially join viirs 3.0 to basins geometries

selecting viirs for basins 21c
    20130101
    20140101
    20150101
    20160101
    20170101
selecting viirs for basins 30
    20180101
    20190101
    20200101


In [59]:
basins_int_viirs_21c = compile_basin_data(wd, '21c')
basins_int_viirs_30  = compile_basin_data(wd, '30')
# compile_basin_data also saves the master compiled 2.1c and 3.0 files

    20130101
    20140101
    20150101
    20160101
    20170101
    20180101
    20190101
    20200101


In [60]:
print(basins_int_viirs_21c.shape)
print(basins_int_viirs_30.shape)  # check the shapes

(1009001, 129)
(525536, 46)
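
The two compiled tables differ in width (129 vs. 46 columns), so it can help to see which columns they share before combining them. The sketch below is one way to do that; `compare_columns` is a hypothetical helper and the toy column names are made up, since the actual VIIRS schemas are not shown in this notebook.

```python
import pandas as pd


def compare_columns(left, right):
    """Return (shared, only_left, only_right) column-name lists, each sorted."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    return (
        sorted(left_cols & right_cols),
        sorted(left_cols - right_cols),
        sorted(right_cols - left_cols),
    )


# Toy frames with made-up column names standing in for the two compiled tables
df_21c = pd.DataFrame({"date": [], "lat": [], "lon": [], "extra_21c": []})
df_30 = pd.DataFrame({"date": [], "lat": [], "lon": []})

shared, only_21c, only_30 = compare_columns(df_21c, df_30)
print(shared, only_21c, only_30)
```

On the real data this would be called as `compare_columns(basins_int_viirs_21c, basins_int_viirs_30)`.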


### Next

[Feature Engineering and Exploratory Data Analysis of Processed Data](https://git.generalassemb.ly/danielmartinsheehan/capstone/blob/master/notebooks/04_feature_engineernig_and_exploratory_data_analysis_processed_data.ipynb)