# Initial Satellite Data Retrieval

The following dataset was gathered from the [NASA FIRMS website](https://firms.modaps.eosdis.nasa.gov/download/) and covers all fire anomalies recorded between 2015 and 2019 in our target region, Northern California. The initial data cleaning that follows narrows the raw data down to Northern California using latitude and longitude ranges that bound a rectangular area of approximately 80,000 km^2. All anomalies in the final dataframe should be over land and have a confidence rating of at least 75%. This confidence rating measures how sure we can be that the satellite successfully detected a fire anomaly. Note that not all fire anomalies are wildfires.

We will use the resulting dataframe to query our Google API and retrieve satellite images of areas that have experienced fires over the last 5 years. We will then try to use these images to build a CNN that estimates the probability that an area has experienced a wildfire, so that, when fed a test image, it can estimate the probability that the pictured area will also experience a wildfire event.
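To make the image-retrieval step concrete, here is a minimal sketch of how a request URL could be built for one detection. It assumes the Google API in question is the Maps Static API, which the notebook does not state; `build_image_url` is a hypothetical helper, and the `zoom` and `size` values are placeholders rather than settings taken from the project.

```python
from urllib.parse import urlencode

# Assumption: the Maps Static API endpoint; the notebook does not name the API it uses.
STATIC_MAPS_URL = "https://maps.googleapis.com/maps/api/staticmap"


def build_image_url(latitude, longitude, api_key, zoom=12, size="400x400"):
    """Build a satellite-image request URL centred on one detection (hypothetical helper)."""
    params = {
        "center": f"{latitude},{longitude}",
        "zoom": zoom,            # placeholder zoom level
        "size": size,            # placeholder image size in pixels
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_URL}?{urlencode(params)}"


url = build_image_url(39.5, -121.5, api_key="YOUR_API_KEY")
print(url)
```

Each row of the final dataframe would supply one `latitude`/`longitude` pair, and the returned URL could then be fetched with `requests` or `urllib.request`.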

# Importing Necessary Libraries and Packages


In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import requests
import random

import urllib.request
import warnings
warnings.filterwarnings('ignore')

from tqdm import tqdm
import os

The following CSVs were downloaded from https://firms.modaps.eosdis.nasa.gov/country/. This archive contains all fire anomalies recorded over the entire Earth by the MODIS instruments aboard the Terra and Aqua satellites. To get each relevant dataset, I selected the year and the country, the United States, in which our target area (Northern California) is located. Each dataset below therefore contains all the fire anomalies recorded over the US for the labeled year.

In [2]:
df_2015 = pd.read_csv('../data/modis_2015_United_States.csv')
df_2016 = pd.read_csv('../data/modis_2016_United_States.csv')
df_2017 = pd.read_csv('../data/modis_2017_United_States.csv')
df_2018 = pd.read_csv('../data/modis_2018_United_States.csv')
df_2019 = pd.read_csv('../data/modis_2019_United_States.csv')

Let's concatenate all of our dataframes into a single one so we can apply the necessary masks in two or three steps to get the data for our target area.

In [3]:
frames = [df_2015, df_2016, df_2017, df_2018, df_2019]
pre_final = pd.concat(frames)
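
One detail worth knowing about `pd.concat`: by default it keeps each frame's original row labels, so the combined frame contains repeated index values. A minimal sketch on toy frames (the values are made up for illustration, not taken from the FIRMS files) shows the effect and how `ignore_index=True` renumbers the rows instead:

```python
import pandas as pd

# Toy stand-ins for two yearly files; the values are illustrative only.
toy_2015 = pd.DataFrame({'latitude': [19.41, 41.63], 'confidence': [68, 33]})
toy_2016 = pd.DataFrame({'latitude': [39.10, 40.20], 'confidence': [90, 80]})

# Default behaviour: each frame keeps its own row labels.
stacked = pd.concat([toy_2015, toy_2016])
print(stacked.index.tolist())

# ignore_index=True gives the combined frame a fresh consecutive index.
renumbered = pd.concat([toy_2015, toy_2016], ignore_index=True)
print(renumbered.index.tolist())
```

Either form works with the boolean masks applied later; the fresh index only matters if rows are subsequently selected by label.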

In [4]:
pre_final.shape

(643545, 15)

In [5]:
pre_final.head()

| | latitude | longitude | brightness | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_t31 | frp | daynight | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.4104 | -155.2771 | 306.4 | 1.1 | 1.1 | 2015-01-01 | 830 | Terra | MODIS | 68 | 6.2 | 284.0 | 12.1 | N | 2 |
| 1 | 19.4425 | -155.0047 | 324.1 | 1.1 | 1.0 | 2015-01-01 | 830 | Terra | MODIS | 100 | 6.2 | 286.0 | 29.0 | N | 2 |
| 2 | 19.4601 | -154.9925 | 313.0 | 1.1 | 1.0 | 2015-01-01 | 830 | Terra | MODIS | 86 | 6.2 | 288.0 | 16.7 | N | 2 |
| 3 | 19.4087 | -155.2876 | 309.8 | 1.1 | 1.1 | 2015-01-01 | 830 | Terra | MODIS | 78 | 6.2 | 284.0 | 14.8 | N | 2 |
| 4 | 41.6333 | -87.1361 | 301.0 | 1.9 | 1.3 | 2015-01-01 | 1717 | Terra | MODIS | 33 | 6.2 | 270.7 | 22.7 | D | 2 |


## Data Filtering

In [6]:
# mask to limit our dataset to latitudes between 38.0881 and 40.8336
pre_final_2 = pre_final[(pre_final['latitude'] >= 38.0881) & (pre_final['latitude'] <= 40.8336)]

In [7]:
#mask to limit our dataset to longitudes between -123.1208 & -120.2933
pre_final_3 = pre_final_2[(pre_final_2['longitude'] >= -123.1208) & (pre_final_2['longitude'] <= -120.2933)]
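
As a sanity check on the size of this bounding box, its area can be estimated from the latitude and longitude bounds used in the two masks. The sketch below treats the Earth as a sphere of radius 6371 km (an assumption made here, not something the notebook specifies), and `bounding_box_area_km2` is a hypothetical helper name:

```python
import math

# Bounds copied from the two masks in the cells above.
LAT_MIN, LAT_MAX = 38.0881, 40.8336
LON_MIN, LON_MAX = -123.1208, -120.2933

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def bounding_box_area_km2(lat_min, lat_max, lon_min, lon_max, radius_km=EARTH_RADIUS_KM):
    """Area of a latitude/longitude box on a sphere, in km^2."""
    delta_lon = math.radians(lon_max - lon_min)
    # Difference of sines accounts for meridians converging toward the pole.
    band = math.sin(math.radians(lat_max)) - math.sin(math.radians(lat_min))
    return radius_km ** 2 * delta_lon * band


area = bounding_box_area_km2(LAT_MIN, LAT_MAX, LON_MIN, LON_MAX)
print(f"{area:,.0f} km^2")
```

The result should land in the tens of thousands of km^2, the same order of magnitude as the approximate area quoted in the introduction.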

In [10]:
# mask to keep only fire detections with a satellite confidence level of at least 75%
final_wf_df = pre_final_3[(pre_final_3['confidence'] >= 75)]
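
The three masks can also be combined into a single step. The following is a sketch under the same bounds and threshold, run on made-up rows so that it is self-contained; `filter_target_area` and the constant names are hypothetical, not part of the original notebook:

```python
import pandas as pd

# Bounds and threshold taken from the masks above.
LAT_RANGE = (38.0881, 40.8336)
LON_RANGE = (-123.1208, -120.2933)
MIN_CONFIDENCE = 75


def filter_target_area(df):
    """Keep rows inside the bounding box with confidence >= MIN_CONFIDENCE."""
    mask = (
        df['latitude'].between(*LAT_RANGE)      # inclusive on both ends by default
        & df['longitude'].between(*LON_RANGE)
        & (df['confidence'] >= MIN_CONFIDENCE)
    )
    return df[mask]


# Toy rows (made up for illustration): only the first satisfies all three conditions.
toy = pd.DataFrame({
    'latitude':   [39.50, 19.41, 39.50, 39.50],
    'longitude':  [-121.50, -155.28, -121.50, -110.00],
    'confidence': [90, 100, 50, 90],
})
print(filter_target_area(toy))
```

Keeping the bounds in named constants also avoids the comment and the code drifting apart when a value is edited.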

In [11]:
final_wf_df.shape


(10896, 15)
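
The introduction says the final detections should be over land, and the masks above filter only on position and confidence. The FIRMS attribute documentation describes the `type` column as the inferred hot spot type (0 = presumed vegetation fire, 1 = active volcano, 2 = other static land source, 3 = offshore). One possible extra step, sketched here on made-up rows rather than applied to the real data, is to drop offshore detections:

```python
import pandas as pd

OFFSHORE_TYPE = 3  # per the FIRMS attribute table: 3 = offshore

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'latitude': [39.5, 38.2, 40.1],
    'confidence': [90, 85, 95],
    'type': [0, 3, 2],
})

# Keep every detection whose inferred type is not offshore.
on_land = toy[toy['type'] != OFFSHORE_TYPE]
print(on_land)
```

Whether this changes the row count for the real bounding box would need to be checked against `final_wf_df` itself.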

In [None]:
final_wf