# FIREX Campaign Spatial Intersect Exercise

In this exercise we will look at data from the FIREX campaign and compare it with wildfire shapes from the USGS. The `geopandas` lesson page will be a helpful reference: https://rwegener2.github.io/sarp_lessons/lessons/tabular_data/3_geopandas.html

In [3]:
import geopandas as gpd
import pandas as pd
import numpy as np

In [4]:
firex_all = pd.read_csv('../data/firexaq-mrg60-dc8_merge_20190722_R1_thru20190905.csv', skiprows=676)

_Note: This file was originally an `.ict` file, but I used the File Explorer to rename it to a `.csv` file. Pandas can open `.ict` files just fine, but Excel has a hard time with them. Once the same file is renamed from `.ict` to `.csv`, Excel will open it._

In [68]:
firex_all.describe()

| | Fractional_Day | Time_Start | Time_Stop | Day_Of_Year_YANG | Latitude_YANG | Longitude_YANG | MSL_GPS_Altitude_YANG | HAE_GPS_Altitude_YANG | Pressure_Altitude_YANG | Radar_Altitude_YANG | ... | jBrONO2_BrO_NO2_CAFS_HALL | jBrONO2_Br_NO3_CAFS_HALL | jBrCl_Br_Cl_CAFS_HALL | jCHBr3_NoProductsSpecified_CAFS_HALL | smoke_age_HOLMES | smoke_age_corr_HOLMES | smoke_age_rise_HOLMES | smoke_rise_HOLMES | fire_distance_HOLMES | smoke_agemethod_HOLMES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | ... | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 | 9081.0 |
| mean | 226.760197 | 81625.67889 | 81685.67889 | 225.815108 | -1722.146676 | -1869.220822 | 3156.048955 | -2292.452929 | 15359.291967 | 12676.498769 | ... | -330.35958 | -330.358731 | -330.350287 | -330.3598 | -822610.04945 | -822536.417093 | -826164.556287 | -825761.166757 | -800607.89781 | -826244.062768 |
| std | 13.324944 | 11555.139635 | 11555.139635 | 13.39038 | 41942.1806 | 41936.002445 | 42262.242721 | 84775.483466 | 9758.67121 | 20621.489347 | ... | 18173.79062 | 18173.790636 | 18173.790789 | 18173.79 | 386958.865866 | 387131.448999 | 379073.762924 | 379953.825504 | 444493.726463 | 378901.670616 |
| min | 203.7625 | 52470.0 | 52530.0 | 203.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | 508.5333 | -999999.0 | ... | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 |
| 25% | 215.930556 | 73230.0 | 73290.0 | 215.0 | 34.776165 | -116.70067 | 2541.03 | 2382.89 | 7842.78 | 5530.44 | ... | 0.000104 | 0.000588 | 0.006327 | 1.928e-07 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 |
| 50% | 226.048611 | 81930.0 | 81990.0 | 225.0 | 38.273844 | -112.260533 | 4742.48 | 4663.48 | 14643.2 | 10821.5 | ... | 0.000207 | 0.00117 | 0.010991 | 8.174e-07 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 |
| 75% | 238.902778 | 90030.0 | 90090.0 | 238.0 | 45.356357 | -95.753198 | 7715.07 | 7685.27 | 23975.3 | 20293.0 | ... | 0.000256 | 0.001452 | 0.01298 | 1.246e-06 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 | -999999.0 |
| max | 248.965972 | 107670.0 | 107730.0 | 248.0 | 48.971215 | -83.123175 | 12502.5 | 12475.5 | 38978.6 | 39361.4 | ... | 0.000634 | 0.003593 | 0.032215 | 3.133e-06 | 112476.6 | 206980.5 | 796.85 | 9099.08 | 932455.4 | 7.0 |


In [69]:
firex_all = firex_all.replace({-999999: np.nan, -888888: np.nan, -66666: np.nan})
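
The summary table above shows why this replacement matters: a minimum of `-999999.0` appears in most columns, and it pulls the mean latitude down to roughly -1722. Here is a minimal sketch of the same idea on a tiny frame. The name `demo` and all of its values are made up for illustration and are not taken from the FIREX file.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; the values are invented for illustration only.
demo = pd.DataFrame({
    "Latitude": [34.5, -999999.0, 38.5],
    "Longitude": [-116.5, -112.5, -999999.0],
})

# Swap the fill value for NaN so pandas skips it in summary statistics.
cleaned = demo.replace({-999999: np.nan})

print(demo["Latitude"].mean())     # dominated by the fill value
print(cleaned["Latitude"].mean())  # mean of the two real readings only
print(cleaned.isna().sum())        # number of missing values per column
```

Running `firex_all.describe()` again after the replacement is a quick way to check that the minimum values now look physically reasonable.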

## Processing the FIREX Dataset

1. Convert the pandas dataframe into a geodataframe. Be sure to specify the CRS as `epsg:4326`.

2. How many rows and columns are in this dataframe?

3. We have a pretty big dataframe. Let's filter it down to just the following columns:
```
['Fractional_Day', ' Time_Start', ' Time_Stop', ' Day_Of_Year_YANG',
       ' Latitude_YANG', ' Longitude_YANG', ' Pressure_Altitude_YANG', ' Potential_Temp_YANG', 
       ' Sat_Vapor_Press_H2O_YANG', ' Smoke_flag_SCHWARZ', ' BC_mass_90_550_nm_SCHWARZ', 'geometry']
```

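4. To keep filtering, let's filter it down to just the rows from day 203. Note that `Fractional_Day` carries a fractional part (its minimum in the summary above is 203.7625), so select values from 203 up to, but not including, 204 rather than testing for exact equality.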
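
The four steps above can be sketched on a small stand-in frame before trying them on `firex_all`. This is one possible approach, not the official solution. The name `demo`, the ` Extra_Column` column, and every value are hypothetical. The sketch also assumes that "day 203" means rows whose `Fractional_Day` falls between 203 and 204.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical stand-in for firex_all; values are invented, and the column
# names copy the leading spaces used in the list above.
demo = pd.DataFrame({
    "Fractional_Day": [203.76, 203.90, 204.10],
    " Latitude_YANG": [34.8, 38.3, 45.4],
    " Longitude_YANG": [-116.7, -112.3, -95.8],
    " Extra_Column": [1.0, 2.0, 3.0],
})

# Step 1: build point geometries from longitude (x) and latitude (y).
demo_gdf = gpd.GeoDataFrame(
    demo,
    geometry=gpd.points_from_xy(demo[" Longitude_YANG"], demo[" Latitude_YANG"]),
    crs="epsg:4326",
)

# Step 2: .shape gives (rows, columns).
print(demo_gdf.shape)

# Step 3: keep a subset of columns; 'geometry' has to stay in the list.
keep = ["Fractional_Day", " Latitude_YANG", " Longitude_YANG", "geometry"]
subset = demo_gdf[keep]

# Step 4: boolean mask for rows whose fractional day lies in [203, 204).
is_day_203 = (subset["Fractional_Day"] >= 203) & (subset["Fractional_Day"] < 204)
day_203 = subset[is_day_203]
print(len(day_203))
```

Passing longitude first matters here: `points_from_xy` expects x then y, and swapping the two would place the points in the wrong part of the world.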

## Processing the Wildfires Shapefile

1. The `data` folder has a file at `wildfires_2019_usgs/wildfires_2019_usgs.shp` which contains all of the burn areas of fires in 2019. Open that file using `gpd.read_file()`, giving the filepath to that `.shp` file as an argument. Assign the output to a variable called `wildfires`.

2. Look at the new dataframe. How many rows and columns does it have? What is the type of the geometry (Point, Polygon, Multipolygon, Line, etc.)?

3. What is the CRS of this dataframe?

4. If you run `wildfires.plot()` you'll notice that the points are very difficult or impossible to see. To make them easier to see, let's make them larger, or buffer them, by 100,000 m. Use the `.buffer(100000)` method on the dataframe and then plot the result to see the shape of the data more clearly.

This image visually shows what a buffer operation does to points, lines, and polygons.

![Buffers](https://pro.arcgis.com/en/pro-app/2.8/tool-reference/analysis/GUID-267CF0D1-DB92-456F-A8FE-F819981F5467-web.png)
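
To get a feel for what `.buffer()` returns, here is a small sketch using invented points rather than the wildfire file. The name `demo`, the coordinates, and the choice of EPSG:5070 are assumptions made for illustration. Check `wildfires.crs` to see what the real dataset uses.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points in a projected CRS with metre units.
# EPSG:5070 is an assumption for this sketch, not the confirmed CRS of the shapefile.
demo = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(500000, 250000)],
    crs="epsg:5070",
)

print(demo.crs)                 # the CRS attached to the dataframe
print(demo.geom_type.unique())  # geometry types present before buffering

# .buffer() returns a GeoSeries of polygons; the distance is in CRS units.
buffered = demo.buffer(100000)
print(buffered.geom_type.unique())
print(buffered.area)            # close to pi * 100000**2 for each point

# buffered.plot() would draw the enlarged shapes (needs matplotlib).
```

Because the buffer distance is interpreted in the units of the CRS, the same `.buffer(100000)` call on a dataframe in `epsg:4326` would mean 100,000 degrees, which is not what we want.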

5. From that shape we can see that the wildfire data is spread out between the contiguous US (CONUS), Hawai'i, and Alaska. To make our file more manageable, let's cut it down to only the fires in CONUS.

In order to do that we are going to create a new shape that covers the area we want to keep and intersect the two areas. We are going to create a shape that is a box using `shapely`.

In [21]:
from shapely.geometry import box

In [None]:
conus = box(-3000000, -2000000, 2000000, 2000000)  # In units of the dataset

Now that we have a shape `conus` defined, use the `.intersects()` method on the `wildfires` dataframe with `conus` as an argument. This will return a boolean (True/False) Series indicating whether each row in the dataframe intersects the `conus` shape.

Use the boolean Series from the previous step to filter the `wildfires` dataframe. Assign the filtered dataframe to a new variable called `wildfires_conus`.
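
The intersect-then-filter pattern described above can be sketched on a few invented points. The name `demo`, the coordinates, and EPSG:5070 are hypothetical. Only the `conus` box is copied from the cell above.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical fire locations in a projected CRS (metres); EPSG:5070 is assumed.
demo = gpd.GeoDataFrame(
    {"name": ["inside_1", "inside_2", "outside"]},
    geometry=[Point(0, 0), Point(-1500000, 1000000), Point(-6000000, 3000000)],
    crs="epsg:5070",
)

# Same bounding box as conus above: box(minx, miny, maxx, maxy).
conus = box(-3000000, -2000000, 2000000, 2000000)

# .intersects() returns a boolean Series with one value per row.
mask = demo.intersects(conus)
print(mask.tolist())  # [True, True, False]

# Boolean indexing keeps only the rows where the mask is True.
demo_conus = demo[mask]
print(demo_conus["name"].tolist())  # ['inside_1', 'inside_2']
```

This is the same boolean-indexing idiom used to filter ordinary pandas dataframes; the only new part is that the mask comes from a spatial test instead of a comparison on a column.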

6. Plot the new `wildfires_conus` dataframe, again buffering by 100,000 m, to confirm that you have spatially filtered the dataframe.

Nice work! You have used two spatial data operations -- buffer and intersects -- in Python AND used those to spatially filter a dataframe!