# Wildfire Incident Data Collection
For wildfire incident data, we used the [National Interagency Fire Center WFIGS (Wildland Fire Interagency Geospatial Services) database](https://data-nifc.opendata.arcgis.com/datasets/nifc::wfigs-wildland-fire-locations-full-history/explore?filters=eyJDb250cm9sRGF0ZVRpbWUiOls5OTg5MDM4NTU2ODcuNjEsMTY3MzA0NDMyMDAwMF19&location=37.093867%2C-117.175606%2C7.00&showTable=true).
The WFIGS data includes point locations for all reported wildland fires in the United States. The extract we downloaded, however, covers only the southwestern portion of the US.

From the WFIGS website:
> The Wildland Fire Interagency Geospatial Services (WFIGS) Group provides authoritative geospatial data products under the interagency Wildland Fire Data Program. Hosted in the National Interagency Fire Center ArcGIS Online Organization (The NIFC Org), WFIGS provides both internal and public facing data, accessible in a variety of formats.
> This service contains all wildland fire incidents from the IRWIN (Integrated Reporting of Wildland Fire Information) integration service and historical data converted to the IRWIN schema.

In [1]:
# Library imports
import pandas as pd

In [2]:
# Load the source data
df = pd.read_csv('../../data/raw/WFIGS_-_Wildland_Fire_Locations_Full_History (3).csv', low_memory=False)
df.head()

| | X | Y | OBJECTID | ABCDMisc | ADSPermissionState | CalculatedAcres | ContainmentDateTime | ControlDateTime | DailyAcres | DiscoveryAcres | ... | IsDispatchComplete | OrganizationalAssessment | StrategicDecisionPublishDate | CreatedOnDateTime_dt | ModifiedOnDateTime_dt | Source | GlobalID | IsCpxChild | CpxName | CpxID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -111.414812 | 40.072836 | 7 | | DEFAULT | | 2019/10/31 16:30:00+00 | 2019/11/05 18:30:00+00 | 170.0 | 0.1 | ... | 0 | Type 4 Incident | | 2019/10/27 00:14:29+00 | 2019/11/13 00:15:39+00 | IRWIN | {BFD53772-94E7-43F0-9D2C-62444A07CA68} | | | |
| 1 | -112.439311 | 34.403275 | 13 | | DEFAULT | | 2019/09/09 17:00:00+00 | 2019/09/09 17:00:00+00 | 0.1 | 0.5 | ... | 0 | | | 2019/09/05 20:14:11+00 | 2019/09/14 19:28:38+00 | IRWIN | {E656CA4D-EECE-4746-AEE3-4D645C4F1F13} | | | |
| 2 | -108.895411 | 40.239896 | 31 | | DEFAULT | | 2019/07/30 18:00:00+00 | 2019/08/03 14:00:00+00 | 90.0 | 1.0 | ... | 0 | | | 2019/07/28 22:52:13+00 | 2019/08/10 18:31:55+00 | IRWIN | {50C0D06E-E3DC-4094-BF22-B9D4B7BA68B1} | | | |
| 3 | -108.552111 | 38.145376 | 35 | | DEFAULT | | 2018/07/28 03:14:59+00 | 2018/07/28 14:39:59+00 | 0.1 | 0.1 | ... | 0 | | | 2018/07/28 17:50:47+00 | 2018/07/29 21:56:13+00 | IRWIN | {496DBF20-6556-490D-8B22-1E79AEEE74C7} | | | |
| 4 | -111.348611 | 33.195755 | 51 | | DEFAULT | | 2020/07/23 05:29:59+00 | 2020/07/23 05:29:59+00 | 8.0 | 2.5 | ... | 0 | | | 2020/07/22 22:56:30+00 | 2020/08/09 00:10:34+00 | IRWIN | {9E1157E6-8784-42A5-9F43-D740C4ED357F} | | | |
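
The date columns in the preview are read as plain strings (dtype `object`) in the form `2019/10/31 16:30:00+00`. A minimal sketch of converting them to timezone-aware timestamps follows. It assumes the trailing `+00` always denotes UTC, and the helper name `parse_wfigs_datetime` is hypothetical, not part of the original notebook.

```python
import pandas as pd

def parse_wfigs_datetime(values: pd.Series) -> pd.Series:
    """Parse strings like '2019/10/31 16:30:00+00' into UTC timestamps."""
    # Assumption: the '+00' suffix always means UTC, so it is stripped and
    # the remaining text is parsed with an explicit format.
    stripped = values.str.replace(r"\+00$", "", regex=True)
    return pd.to_datetime(stripped, format="%Y/%m/%d %H:%M:%S", utc=True)

# Two values copied from the preview plus one missing entry
sample = pd.Series(["2019/10/31 16:30:00+00", "2019/11/05 18:30:00+00", None])
parsed = parse_wfigs_datetime(sample)
print(parsed)
```

Missing entries come back as `NaT`, so columns such as `ContainmentDateTime` that contain gaps can be converted without filtering first.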


In [3]:
# Checking dimensions
df.shape

(33564, 96)
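
One way to check the regional focus of the extract is to look at the coordinate extent of the points. The sketch below assumes `X` holds longitude and `Y` holds latitude in decimal degrees, which matches the values in the preview but is not stated in the source. It uses a small stand-in frame built from the first preview rows instead of the full file.

```python
import pandas as pd

# Stand-in for df: coordinates copied from the first rows of the preview
points = pd.DataFrame({
    "X": [-111.414812, -112.439311, -108.895411, -108.552111, -111.348611],
    "Y": [40.072836, 34.403275, 40.239896, 38.145376, 33.195755],
})

# Minimum and maximum of each coordinate give the bounding box of the points
extent = points[["X", "Y"]].agg(["min", "max"])
print(extent)
```

Running the same aggregation on the full `df` would show whether every point falls inside the expected region.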

In [4]:
# Checking data types and missing values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33564 entries, 0 to 33563
Data columns (total 96 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   X                                33564 non-null  float64
 1   Y                                33564 non-null  float64
 2   OBJECTID                         33564 non-null  int64  
 3   ABCDMisc                         4181 non-null   object 
 4   ADSPermissionState               33564 non-null  object 
 5   CalculatedAcres                  967 non-null    float64
 6   ContainmentDateTime              33113 non-null  object 
 7   ControlDateTime                  33564 non-null  object 
 8   DailyAcres                       33139 non-null  float64
 9   DiscoveryAcres                   32900 non-null  float64
 10  DispatchCenterID                 33380 non-null  object 
 11  EstimatedCostToDate              1167 non-null   float64
 12  FinalFireReportApp

In [5]:
# Dropping columns we do not need, along with columns that are mostly missing values
cols = ['FireBehaviorGeneral', 'FireBehaviorGeneral1', 'FireBehaviorGeneral2', 'FireBehaviorGeneral3', 'FireDepartmentID', 'FireStrategyConfinePercent', 'FireStrategyFullSuppPercent', 
        'FireStrategyMonitorPercent', 'FireStrategyPointZonePercent', 'FSJobCode', 'FSOverrideCode', 'GACC', 'ICS209ReportDateTime', 'ICS209ReportForTimePeriodFrom', 
        'ICS209ReportForTimePeriodTo', 'ICS209ReportStatus', 'IncidentManagementOrganization', 'WFDSSDecisionStatus', 'CreatedBySystem', 'ModifiedBySystem', 'IsDispatchComplete', 
        'OrganizationalAssessment', 'StrategicDecisionPublishDate', 'CreatedOnDateTime_dt', 'ModifiedOnDateTime_dt', 'Source', 'GlobalID', 'IsCpxChild', 'CpxName', 'CpxID', 
        'POOFips', 'POOJurisdictionalAgency', 'POOJurisdictionalUnit', 'POOJurisdictionalUnitParentUnit', 'POOLandownerCategory', 'POOLandownerKind', 'POOLegalDescPrincipalMeridian', 
        'POOLegalDescQtr', 'POOLegalDescQtrQtr', 'POOLegalDescRange', 'POOLegalDescSection', 'POOLegalDescTownship', 'POOPredictiveServiceAreaID', 'POOProtectingAgency', 'POOProtectingUnit']

df = df.drop(columns=cols)
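
The "mostly missing" criterion can be made explicit by computing the share of missing values per column. This is a minimal sketch, not the procedure used to build the list above: the 0.9 threshold and the toy frame are assumptions chosen for illustration, with column names borrowed from the `df.info()` output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; values are illustrative only
toy = pd.DataFrame({
    "OBJECTID": [7, 13, 31, 35, 51],
    "DailyAcres": [170.0, 0.1, 90.0, 0.1, 8.0],
    "CalculatedAcres": [np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values in each column (mean of a boolean mask)
missing_share = toy.isna().mean()

threshold = 0.9  # hypothetical cutoff, not taken from the source
mostly_missing = missing_share[missing_share > threshold].index.tolist()
print(mostly_missing)
```

A list produced this way is only a starting point. Whether a sparse column is actually dispensable still depends on the analysis.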

In [6]:
# Adding an id column (the row index) so rows can be matched with API results
df['id'] = df.index
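
Because `id` is a copy of the row index, it can serve as a join key once API results are available. The sketch below is illustrative only: the `api_results` frame and its `temperature` column are hypothetical stand-ins, since the source does not describe the API output.

```python
import pandas as pd

# Small stand-in for the wildfire frame, with id taken from the index
fires = pd.DataFrame({"OBJECTID": [7, 13, 31]})
fires["id"] = fires.index

# Hypothetical API results keyed by the same id; no row came back for id 1
api_results = pd.DataFrame({"id": [0, 2], "temperature": [12.5, 30.1]})

# Left join keeps every fire record; validate checks that ids are unique
merged = fires.merge(api_results, on="id", how="left", validate="one_to_one")
print(merged)
```

A left join keeps every fire record even when the API returned nothing for it, and `validate="one_to_one"` raises an error if an `id` appears more than once on either side. Since `id` depends on row order, it should be assigned before any sorting or filtering that resets the index.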

In [7]:
# Export of the collected dataset
df.to_csv('../../data/raw/wildfire_all.csv', index=False)