# Fire perimeter data retrieval and selection

#### Author: Naomi Moraes
#### Link: https://github.com/nmoraescommit/eds220-hw4/tree/main

## About
- Purpose: The purpose of this notebook is to explore, clean, and analyze the California fire perimeter shapefile, published by CAL FIRE. This is to obtain the Thomas Fire perimeter boundary for use in "hwk4-task2-false-color-MORAES".
- Highlights: Working with this dataset showed how state agencies store fire data and which attributes they consider important to record, such as the start and end dates of each observation. It was also useful practice in storing updated shapefiles, setting up an entirely new project from scratch, and independently learning to access data for a continuous workflow.
- About the data: This dataset is published and maintained by CAL FIRE and was accessed through Data.gov. The statewide fire history geospatial dataset is updated annually each spring with data from the previous fire season, contributed by units across the state and cooperating agencies. According to the CAL FIRE site, the first version was released in May 2015.
- References: 
    - Fire perimeter data: Publisher CAL FIRE. (2024, May 14). State of California - california fire perimeters (all). Catalog. https://catalog.data.gov/dataset/california-fire-perimeters-all-b3436 
    - Assignment Reference and Cleaned Landsat Data Access : Galaz-Garcia, C. (n.d.). Assignment 4. assignment4 – EDS 220 - Working with Environmental Datasets. https://meds-eds-220.github.io/MEDS-eds-220-course/assignments/assignment4.html 

## Set-Up

In this section we will import the libraries and data needed to complete this notebook.

### Import Libraries

In [1]:
# Import libraries
import os
import pandas as pd
import geopandas as gpd

# Display all columns when looking at dataframes
pd.set_option("display.max_columns", None)

### Import Data

In [12]:
# Create data filepath
fp = os.path.join('data','California_Fire_Perimeters_(all).shp')

# Create dataframe for CA fire perimeter shapefile
ca_fire_perimeter = gpd.read_file(fp)

## Explore Data

In this section we will take a preliminary look at the imported fire perimeter data to understand how to extract the Thomas Fire perimeter.

In [3]:
# Check dataframe head
ca_fire_perimeter.head(3)

Unnamed: 0,YEAR_,STATE,AGENCY,UNIT_ID,FIRE_NAME,INC_NUM,ALARM_DATE,CONT_DATE,CAUSE,C_METHOD,OBJECTIVE,GIS_ACRES,COMMENTS,COMPLEX_NA,IRWINID,FIRE_NUM,COMPLEX_ID,DECADES,geometry
0,2023,CA,CDF,SKU,WHITWORTH,4808,2023-06-17,2023-06-17,5,1,1,5.72913,,,{7985848C-0AC2-4BA4-8F0E-29F778652E61},,,2020,"POLYGON ((-13682443.000 5091132.739, -13682445..."
1,2023,CA,LRA,BTU,KAISER,10225,2023-06-02,2023-06-02,5,1,1,13.6024,,,{43EBCC88-B3AC-48EB-8EF5-417FE0939CCF},,,2020,"POLYGON ((-13576727.142 4841226.161, -13576726..."
2,2023,CA,CDF,AEU,JACKSON,17640,2023-07-01,2023-07-02,2,1,1,27.8145,,,{B64E1355-BF1D-441A-95D0-BC1FBB93483B},,,2020,"POLYGON ((-13459243.000 4621236.000, -13458968..."


In [4]:
# Check dataframe tail
ca_fire_perimeter.tail(3)

Unnamed: 0,YEAR_,STATE,AGENCY,UNIT_ID,FIRE_NAME,INC_NUM,ALARM_DATE,CONT_DATE,CAUSE,C_METHOD,OBJECTIVE,GIS_ACRES,COMMENTS,COMPLEX_NA,IRWINID,FIRE_NUM,COMPLEX_ID,DECADES,geometry
22258,0,CA,CCO,MRN,UKNOWN,,1899-12-30,1899-12-30,14,6,1,2927.24,1917-34(Yr Not Report)MarinCo FireChief Garber...,,,,,0,"POLYGON ((-13658666.186 4605853.097, -13658738..."
22259,0,CA,CCO,MRN,UKNOWN,,1899-12-30,1899-12-30,14,6,1,62.0127,1917-34(Yr Not Report)MarinCo FireChief Garber...,,,,,0,"POLYGON ((-13644249.827 4580277.586, -13644243..."
22260,0,CA,CCO,MRN,UKNOWN,,1899-12-30,1899-12-30,14,6,1,40.0137,1917-34(Yr Not Report)MarinCo FireChief Garber...,,,,,0,"POLYGON ((-13640708.376 4580839.378, -13640603..."


In [5]:
# Check columns
ca_fire_perimeter.columns

Index(['YEAR_', 'STATE', 'AGENCY', 'UNIT_ID', 'FIRE_NAME', 'INC_NUM',
       'ALARM_DATE', 'CONT_DATE', 'CAUSE', 'C_METHOD', 'OBJECTIVE',
       'GIS_ACRES', 'COMMENTS', 'COMPLEX_NA', 'IRWINID', 'FIRE_NUM',
       'COMPLEX_ID', 'DECADES', 'geometry'],
      dtype='object')

In [6]:
# Check column datatypes
ca_fire_perimeter.dtypes

YEAR_            int64
STATE           object
AGENCY          object
UNIT_ID         object
FIRE_NAME       object
INC_NUM         object
ALARM_DATE      object
CONT_DATE       object
CAUSE            int64
C_METHOD         int64
OBJECTIVE        int64
GIS_ACRES      float64
COMMENTS        object
COMPLEX_NA      object
IRWINID         object
FIRE_NUM        object
COMPLEX_ID      object
DECADES          int64
geometry      geometry
dtype: object

In [7]:
# Check CRS - and type
ca_fire_perimeter.crs

<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

### Data Exploration Analysis

Exploring the California fire perimeter shapefile imported into this notebook shows that it has 22,261 fire perimeter observations (index labels 0 through 22260), with columns for descriptive attributes including year, fire name, alarm date, and geometry. Several columns are numeric (int64 and float64), but the date columns are stored as generic objects, so I may want to convert them to datetime objects for manipulation. The CRS of this shapefile is a projected coordinate reference system, EPSG:3857 (WGS 84 / Pseudo-Mercator), which is popular for web mapping services.
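
The row count and the non-numeric columns can also be checked programmatically rather than read off the printed output. The following is a minimal sketch on a small made-up table (the `toy` data frame and its values are illustrative assumptions, not the CAL FIRE data); it shows why the last index label is one less than the number of rows.

```python
import pandas as pd

# Toy stand-in for the fire perimeter table (illustrative values only)
toy = pd.DataFrame({
    "YEAR_": [2023, 2023, 0],
    "FIRE_NAME": ["A", "B", "C"],
    "ALARM_DATE": ["2023-06-17", "2023-06-02", "1899-12-30"],
    "GIS_ACRES": [5.7, 13.6, 2927.2],
})

# The default index starts at 0, so the row count is the last label plus one
n_rows = len(toy)
last_label = toy.index[-1]

# Columns that are not numeric, including the date column stored as text
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()

print(n_rows, last_label, non_numeric)
```

The same calls should apply to `ca_fire_perimeter`, since a GeoDataFrame supports the usual pandas methods.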

## Clean Data

In this section, I will convert the column names to lower snake case and the date columns to datetime objects, for ease of future data manipulation.

In [8]:
# Simplify column names: lowercase and replace spaces with underscores
ca_fire_perimeter.columns = (ca_fire_perimeter.columns
                  .str.lower()
                  .str.replace(' ','_')
                )

# Make dates into DateTime object
ca_fire_perimeter.alarm_date = pd.to_datetime(ca_fire_perimeter.alarm_date)
ca_fire_perimeter.cont_date = pd.to_datetime(ca_fire_perimeter.cont_date)
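
As a hedged illustration of the cleaning steps above, the sketch below applies the same renaming and date conversion to a small made-up table. The `toy` data frame is hypothetical, and `errors="coerce"` is an assumption that the notebook itself does not use: it turns unparseable values into `NaT` instead of raising an error.

```python
import pandas as pd

# Toy table with an upper-case name containing a space and one bad date
toy = pd.DataFrame({
    "FIRE NAME": ["THOMAS", "EXAMPLE"],
    "ALARM_DATE": ["2017-12-04", "not recorded"],
})

# Lower snake case: lowercase everything, then replace spaces with underscores
toy.columns = toy.columns.str.lower().str.replace(" ", "_")

# Convert text to datetime; values that cannot be parsed become NaT
toy["alarm_date"] = pd.to_datetime(toy["alarm_date"], errors="coerce")

# Count how many dates could not be parsed
n_missing = int(toy["alarm_date"].isna().sum())
print(toy.dtypes)
print(n_missing)
```

Bracket assignment (`toy["alarm_date"] = ...`) is used here because it also works when a column name is not a valid Python attribute name.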

## Thomas Fire Boundary

Here, I will select the Thomas Fire (2017) boundary and save it as a new geospatial file.

In [9]:
# Select Thomas Fire in 2017
thomas_fire_boundary = ca_fire_perimeter[(ca_fire_perimeter['alarm_date'] > '2016-12-31') & 
                                         (ca_fire_perimeter['alarm_date'] < '2018-01-01') &
                                         (ca_fire_perimeter['fire_name'] == 'THOMAS')]

In [10]:
# View dataframe
thomas_fire_boundary

Unnamed: 0,year_,state,agency,unit_id,fire_name,inc_num,alarm_date,cont_date,cause,c_method,objective,gis_acres,comments,complex_na,irwinid,fire_num,complex_id,decades,geometry
2654,2017,CA,USF,VNC,THOMAS,3583,2017-12-04,2018-01-12,9,7,1,281791.0,CONT_DATE based on Inciweb,,,,,2010,"MULTIPOLYGON (((-13316089.016 4088553.040, -13..."
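
One possible alternative to comparing against two date strings is to filter on the year component of the datetime column. This is a sketch on made-up rows (the `toy` table and the `EXAMPLE` fire are hypothetical), not the selection used in this notebook.

```python
import pandas as pd

# Toy table: same fire name in two different years, plus another 2017 fire
toy = pd.DataFrame({
    "fire_name": ["THOMAS", "EXAMPLE", "THOMAS"],
    "alarm_date": pd.to_datetime(["2017-12-04", "2017-12-05", "2016-07-01"]),
})

# Keep rows whose alarm date falls in 2017 and whose name is THOMAS
mask = (toy["alarm_date"].dt.year == 2017) & (toy["fire_name"] == "THOMAS")
selected = toy[mask]

print(selected)
```

Filtering on both the year and the name matters because fire names can repeat across years.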


In [11]:
# Save dataframe as geospatial file in /data folder
thomas_fire_boundary.to_file('data/thomas_fire_boundary.geojson', driver = 'GeoJSON')

### File Format Explanation

I converted the alarm_date and cont_date variables into datetime objects and wanted them to retain that data type. Because saving as a shapefile would require converting the datetime objects back into strings, I chose to store the new data frame as a GeoJSON file instead.
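
For comparison, the conversion that the shapefile route would require could look roughly like the sketch below. It is not part of this notebook's workflow; the `toy` table is a made-up stand-in for `thomas_fire_boundary` without the geometry column, and the ISO `%Y-%m-%d` format is an assumption.

```python
import pandas as pd

# Toy stand-in for the selected Thomas Fire row (no geometry column)
toy = pd.DataFrame({
    "fire_name": ["THOMAS"],
    "alarm_date": pd.to_datetime(["2017-12-04"]),
    "cont_date": pd.to_datetime(["2018-01-12"]),
})

# Convert each datetime column back to ISO-formatted strings
for col in ["alarm_date", "cont_date"]:
    toy[col] = toy[col].dt.strftime("%Y-%m-%d")

print(toy)
```

After this step the columns hold plain text, so any later date arithmetic would need another `pd.to_datetime` call, which is the extra round trip the GeoJSON choice avoids.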