# 1. Data Collection and Data Merging
### (Wildfire data from __[NASA FIRMS](https://firms.modaps.eosdis.nasa.gov/download/)__ and __[Calfire](https://www.fire.ca.gov/incidents#incidentdisclaimer)__, weather data from __[POWER](https://power.larc.nasa.gov/data-access-viewer/)__)
#### Note: all code here was actually run on a local computer to avoid overloading the server.

## 1.1 Combining Raw Data Files

The raw data from NASA FIRMS are separated into individual CSV files, one per year, so all of them must first be combined into a single CSV. While combining, we keep only the fire detections within California, using the following code (output as `'modis_US.csv'`):

In [None]:
import pandas as pd
import os


# List all files in the current folder. (There are too many original CSV files to upload here,
# so this combination was run locally. The same applies to 'weather.csv' below.)
all_files = os.listdir()

# Filter out non-CSV files
csv_files = [f for f in all_files if f.endswith('.csv')]

# Create a list to hold the dataframes
df_list = []

for csv_file in csv_files:
    try:
        # Try reading the file using default UTF-8 encoding
        df = pd.read_csv(csv_file)
    except UnicodeDecodeError:
        try:
            # If UTF-8 fails, try reading the file using UTF-16 encoding with tab separator
            df = pd.read_csv(csv_file, sep='\t', encoding='utf-16')
        except Exception as e:
            print(f"Could not read file {csv_file} because of error: {e}")
            continue
    except Exception as e:
        print(f"Could not read file {csv_file} because of error: {e}")
        continue

    # Keep only presumed vegetation fires (type 0) inside a bounding box around California,
    # whichever encoding the file was read with
    df = df.loc[
        (df['latitude'] >= 32) & (df['latitude'] <= 42)
        & (df['longitude'] >= -124) & (df['longitude'] <= -114)
        & (df['type'] == 0)
    ]
    df_list.append(df)

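# Concatenate all data into one DataFrame
big_df = pd.concat(df_list, ignore_index=True)

# Trim the bounding box to California's approximate outline: keep everything at or west of
# longitude -120, and east of it keep only points west of the line
# longitude = -1.25 * latitude - 71.25, which roughly follows the diagonal Nevada border.
# The strict '>' avoids duplicating rows that lie exactly on longitude -120.
fire_df_1 = big_df.loc[big_df['longitude'] <= -120]
fire_df_2 = big_df.loc[(big_df['longitude'] > -120) & (big_df['longitude'] < -1.25 * big_df['latitude'] - 71.25)]
big_df = pd.concat([fire_df_1, fire_df_2], ignore_index=True)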

# Save the final result to a new CSV file
big_df.to_csv('nasa_usa_fire_data/modis_US.csv', index=False)
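
To see what the boundary filter above keeps and drops, here is a minimal sketch that applies the same bounding box and diagonal line to four test points. The helper name `in_california_approx` and the approximate city coordinates are illustrative assumptions, not part of the original pipeline.

```python
import pandas as pd


def in_california_approx(df: pd.DataFrame) -> pd.Series:
    """Boolean mask using the same bounding box and diagonal line as the merge code."""
    in_box = df['latitude'].between(32, 42) & df['longitude'].between(-124, -114)
    west_of_120 = df['longitude'] <= -120
    # Points west of the line longitude = -1.25 * latitude - 71.25
    west_of_diagonal = df['longitude'] < -1.25 * df['latitude'] - 71.25
    return in_box & (west_of_120 | west_of_diagonal)


# Approximate city coordinates, used only as test points (assumption)
points = pd.DataFrame({
    'name': ['Sacramento', 'Los Angeles', 'Las Vegas', 'Reno'],
    'latitude': [38.58, 34.05, 36.17, 39.53],
    'longitude': [-121.49, -118.24, -115.14, -119.81],
})
points['kept'] = in_california_approx(points)
print(points)
```

Under these assumptions the two California cities are kept and the two Nevada cities are dropped, which suggests the diagonal line is a reasonable stand-in for the state border at this resolution.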

| Attribute | Short Description | Long Description |
|-----------|-------------------|------------------|
| Latitude | Latitude | Center of 1 km fire pixel, but not necessarily the actual location of the fire as one or more fires can be detected within the 1 km pixel. |
| Longitude | Longitude | Center of 1 km fire pixel, but not necessarily the actual location of the fire as one or more fires can be detected within the 1 km pixel. |
| Brightness | Brightness temperature 21 (Kelvin) | Channel 21/22 brightness temperature of the fire pixel measured in Kelvin. |
| Scan | Along Scan pixel size | The algorithm produces 1 km fire pixels, but MODIS pixels get bigger toward the edge of scan. Scan and track reflect actual pixel size. |
| Track | Along Track pixel size | The algorithm produces 1 km fire pixels, but MODIS pixels get bigger toward the edge of scan. Scan and track reflect actual pixel size. |
| Acq_Date | Acquisition Date | Date of MODIS acquisition. |
| Acq_Time | Acquisition Time | Time of acquisition/overpass of the satellite (in UTC). |
| Satellite | Satellite | A = Aqua and T = Terra. |
| Confidence | Confidence (0-100%) | This value is based on a collection of intermediate algorithm quantities used in the detection process. It is intended to help users gauge the quality of individual hotspot/fire pixels. Confidence estimates range between 0 and 100% and are assigned one of the three fire classes (low-confidence fire, nominal-confidence fire, or high-confidence fire). |
| Version | Version (Collection and source) | Version identifies the collection (e.g., MODIS Collection 6.1) and source of data processing: Ultra Real-Time (URT suffix added to collection), Real-Time (RT suffix), Near Real-Time (NRT suffix) or Standard Processing (collection only). For example:<br>"6.1URT" - Collection 6.1 Ultra Real-Time processing.<br>"6.1RT" - Collection 6.1 Real-Time processing.<br>"6.1NRT" - Collection 6.1 Near Real-Time processing.<br>"6.1" - Collection 6.1 Standard processing. |
| Bright_T31 | Brightness temperature 31 (Kelvin) | Channel 31 brightness temperature of the fire pixel measured in Kelvin. |
| FRP | Fire Radiative Power (MW - megawatts) | Depicts the pixel-integrated fire radiative power in MW (megawatts). |
| Type | Inferred hot spot type | 0 = presumed vegetation fire<br>1 = active volcano<br>2 = other static land source<br>3 = offshore |
| DayNight | Day or Night | D = Daytime fire, N = Nighttime fire |
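
As a small illustration of how two of these attributes can be used, the sketch below builds a UTC timestamp from `acq_date` and `acq_time` and labels each detection with a confidence class. It assumes that `acq_time` is stored as an integer in HHMM form and that the class boundaries are roughly 30% and 80%; both are assumptions to check against the FIRMS documentation, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows using the column names from the table above
sample = pd.DataFrame({
    'acq_date': ['2020-08-16', '2020-08-16', '2020-08-17'],
    'acq_time': [5, 1845, 2130],      # assumed HHMM in UTC, e.g. 1845 = 18:45
    'confidence': [12, 55, 97],
})

# Pad the time to four digits ("5" -> "0005") and parse it together with the date
hhmm = sample['acq_time'].astype(str).str.zfill(4)
sample['acq_datetime_utc'] = pd.to_datetime(
    sample['acq_date'] + ' ' + hhmm, format='%Y-%m-%d %H%M'
)

# Assumed class boundaries: low (0-29), nominal (30-79), high (80-100)
sample['confidence_class'] = pd.cut(
    sample['confidence'],
    bins=[-1, 29, 79, 100],
    labels=['low', 'nominal', 'high'],
)
print(sample)
```

A combined timestamp is convenient when detections from both satellites need to be ordered in time, while the class label gives a quick way to drop low-confidence pixels before merging.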

In [10]:
big_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 320144 entries, 0 to 320143
Data columns (total 15 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   latitude    320144 non-null  float64       
 1   longitude   320144 non-null  float64       
 2   brightness  320144 non-null  float64       
 3   scan        320144 non-null  float64       
 4   track       320144 non-null  float64       
 5   acq_date    320144 non-null  datetime64[ns]
 6   acq_time    320144 non-null  int64         
 7   satellite   320144 non-null  object        
 8   instrument  320144 non-null  object        
 9   confidence  320144 non-null  int64         
 10  version     320144 non-null  float64       
 11  bright_t31  320144 non-null  float64       
 12  frp         320144 non-null  float64       
 13  daynight    320144 non-null  object        
 14  type        320144 non-null  int64         
dtypes: datetime64[ns](1), float64(8), int64(3), object(3)

Likewise, we used the same code to merge the individual weather data files (output as `'weather.csv'` in the `California_POWER` folder).
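
The combining step for the weather files might look like the sketch below. The function name `combine_csv_folder`, the demo file names, and the `T2M` column are hypothetical; real POWER downloads may also begin with a header block that would need to be skipped (for example with `skiprows`), which this sketch does not handle.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csv_folder(folder: str, output_path: str) -> pd.DataFrame:
    """Read every CSV file in `folder`, stack them, and write one combined CSV."""
    paths = sorted(glob.glob(os.path.join(folder, '*.csv')))
    frames = [pd.read_csv(path) for path in paths]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)
    return combined


# Demo on two tiny made-up files; the output goes to a separate folder so that
# a second run does not pick up the combined file as an input
with tempfile.TemporaryDirectory() as source_dir, tempfile.TemporaryDirectory() as target_dir:
    for name, lat in [('cell_a.csv', 34.25), ('cell_b.csv', 34.75)]:
        pd.DataFrame({
            'LAT': [lat], 'LON': [-118.25],
            'YEAR': [2020], 'MO': [1], 'DY': [1],
            'T2M': [12.3],   # illustrative weather column
        }).to_csv(os.path.join(source_dir, name), index=False)

    combined_demo = combine_csv_folder(source_dir, os.path.join(target_dir, 'weather.csv'))
    print(combined_demo)
```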

## 1.2 Data Merging

Once we have `'modis_US.csv'` and `'weather.csv'`, along with a complete data set from __CalFire__, we can merge the three into one table (output as `'weather&fire_2000-2022.csv'` in the `fireclassification` folder).

In [None]:
import numpy as np


# read the csv files
fire_df = pd.read_csv('nasa_usa_fire_data/modis_US.csv')
weather_df = pd.read_csv('California_POWER/weather.csv')
calfire_df = pd.read_csv('nasa_usa_fire_data/calfire.csv')

# convert the dates, then snap longitude and latitude to the nearest xx.25 or xx.75 so they line up with the weather grid
# (all longitudes and latitudes in the weather data are evenly spaced and of the form xx.25 or xx.75)
fire_df['acq_date'] = pd.to_datetime(fire_df['acq_date'])
fire_df['latitude_round_2'] = ((fire_df['latitude']+0.25) * 2).round(0)/2 - 0.25
fire_df['longitude_round_2'] = ((fire_df['longitude']+0.25) * 2).round(0)/2 - 0.25
# keep one row per grid cell and day
fire_df = fire_df[['longitude_round_2', 'latitude_round_2', 'acq_date']].drop_duplicates()

# manipulate the data types and column names for merging
weather_df = weather_df.rename(columns={'YEAR': 'year', 'MO': 'month', 'DY': 'day'})
weather_df['date'] = pd.to_datetime(weather_df[['year', 'month', 'day']])
fire_df = fire_df.rename(columns={'latitude_round_2': 'LAT', 'longitude_round_2': 'LON', 'acq_date': 'date'})
df = pd.merge(weather_df, fire_df, on=['LAT', 'LON', 'date'], how='left', indicator='Fire')
# Fire is True where a wildfire detection matched the weather row (same grid cell and date), otherwise False
df['Fire'] = np.where(df.Fire == 'both', True, False)
df.drop_duplicates(inplace=True)

# round the longitude and latitude in calfire dataset in the same format, and rename columns for merging as well
calfire_df['latitude_round_2'] = ((calfire_df['incident_latitude']+0.25) * 2).round(0)/2 - 0.25
calfire_df['longitude_round_2'] = ((calfire_df['incident_longitude']+0.25) * 2).round(0)/2 - 0.25
calfire_df = calfire_df.rename(columns={'latitude_round_2': 'LAT', 'longitude_round_2': 'LON', 'incident_dateonly_created': 'date'})
calfire_df['date'] = pd.to_datetime(calfire_df['date'])
df = pd.merge(df, calfire_df, on=['LAT', 'LON', 'date'], how='left', indicator='indicator')
# fires that are missing from the NASA FIRMS data but present in the CalFire data are marked True as well
df.loc[df['indicator'] == 'both', 'Fire'] = True
df.drop(columns=['indicator'], inplace=True)
df.to_csv('weather&fire_2000-2022.csv', index=False)
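
The grid snapping and the `indicator` merge used above can be checked on a few hand-made values. This is a minimal sketch: the helper name `snap_to_quarter_grid` and the toy frames are illustrative, while the rounding expression and merge keys are the ones from the code above.

```python
import pandas as pd


def snap_to_quarter_grid(values: pd.Series) -> pd.Series:
    """Snap coordinates to the nearest value of the form xx.25 or xx.75."""
    return ((values + 0.25) * 2).round(0) / 2 - 0.25


coords = pd.Series([34.30, 34.60, -120.30, -118.95])
snapped = snap_to_quarter_grid(coords)
print(snapped.tolist())

# Toy weather grid with two cells on one day, and one fire detection in the first cell
weather = pd.DataFrame({
    'LAT': [34.25, 34.75],
    'LON': [-118.75, -118.75],
    'date': pd.to_datetime(['2020-08-16', '2020-08-16']),
})
fires = pd.DataFrame({
    'LAT': [34.25],
    'LON': [-118.75],
    'date': pd.to_datetime(['2020-08-16']),
})

# The indicator column is 'both' where a fire row matched and 'left_only' otherwise
merged = weather.merge(fires, on=['LAT', 'LON', 'date'], how='left', indicator='Fire')
merged['Fire'] = merged['Fire'] == 'both'
print(merged)
```

Coordinates that fall exactly halfway between two grid points (values of the form xx.0 or xx.5) are resolved by NumPy's round-half-to-even rule, so they do not always move in the same direction.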

## 2. Visualisation

We visualise the data by plotting a hotspot heatmap with seaborn, along with several pie charts of wildfire size classes and causes.

### 2.1 Heatmap using seaborn (output as `'california_hotspot.png'`)

In [None]:
import matplotlib.pyplot as plt
import matplotlib.style as style

import seaborn as sns


# read the csv files
fire_df = pd.read_csv('nasa_usa_fire_data/modis_US.csv')

# count the fire detections in each 0.1-degree cell of longitude and latitude
fire_df_3 = fire_df[['longitude', 'latitude']].round(1)
fire_df_3['Count'] = 1
a = fire_df_3.pivot_table(index='latitude', 
                   columns='longitude', 
                   values='Count', 
                   aggfunc='sum')

# plot the heatmap, reversing the rows so that north is at the top
ax = sns.heatmap(a.iloc[::-1])
plt.savefig('california_hotspot.png')
plt.show()

### 2.2 Pie charts (output as `'fire_classification1.png'` and `'fire_classification2.png'`)

In [None]:
# use the wildfire dataset from the Kaggle website
df = pd.read_csv('FW_Veg_Rem_Combined.csv')

# california data (copied so that adding a column does not write to a slice of the original)
df = df.loc[df['state'] == 'CA'].copy()
df['Count'] = 1

# count the fires in each size class
a = df.groupby(['fire_size_class']).Count.count().reset_index()
a = a.set_index('fire_size_class')
# give the count column an empty name so the pie chart has no y-axis label
a.rename(columns={"Count": ""}, inplace=True)

# graph plotting 
a.plot.pie(y='', figsize=(5, 5), autopct='%.1f%%')
plt.title('Fire Classification')
plt.tight_layout()
plt.savefig('fire_classification1.png')

In [None]:
# count the causes of fires within each of the size classes B, C, D, E, F and G
# (rows: size class, columns: cause; a cause that never occurs in a class is NaN)
classes = ['B', 'C', 'D', 'E', 'F', 'G']
df_class = (
    df.loc[df['fire_size_class'].isin(classes)]
    .groupby(['fire_size_class', 'stat_cause_descr']).Count.count()
    .unstack('stat_cause_descr')
)

# graph plotting
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
for i, (idx, row) in enumerate(df_class.iterrows()):
    ax = axes[i // 3, i % 3]
    row = row[row.gt(row.sum() * .01)]
    ax.pie(row, labels=row.index, startangle=30, textprops={'fontsize': 8})
    ax.set_title(f'Class: {idx}')
plt.tight_layout()
fig.subplots_adjust(wspace=.2)
plt.savefig('fire_classification2.png')