# Initial Satellite Data Retrieval

The following dataset was gathered from the [NASA FIRMS website](https://firms.modaps.eosdis.nasa.gov/download/) and, once filtered, covers all fire anomalies between 2015 and 2019 in Northern California. The initial data cleaning that follows narrows the scope of our search to Northern California, using longitude and latitude ranges that bound a roughly square area of approximately 80,000 km^2. Every anomaly in the final dataframe should be over land and have a confidence rating of at least 75%. This confidence rating measures how sure the satellite is that it successfully detected a fire anomaly. Note that not all fire anomalies are wildfires.

We will use the resulting dataframe to query the Google API and retrieve satellite images of locations that have experienced fires over the last 5 years. We will then use these images to build a CNN that estimates the probability that an area has experienced a wildfire and, when fed a test image, the probability that the pictured area will also experience a wildfire event.

# Importing Necessary Libraries and Packages


In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import requests
import random

import urllib.request
import warnings
warnings.filterwarnings('ignore')

from tqdm import tqdm
import os

The following CSVs were downloaded from https://firms.modaps.eosdis.nasa.gov/country/. This archive contains all fire anomalies recorded over the entire earth by the MODIS instruments on the Terra and Aqua satellites. To get each relevant dataset, I selected the year and the country (the United States) in which our target area, Northern California, is located. Each dataset below therefore contains all the fire anomalies recorded over the US in the labeled year.

In [2]:
df_2015 = pd.read_csv('../data/modis_2015_United_States.csv')
df_2016 = pd.read_csv('../data/modis_2016_United_States.csv')
df_2017 = pd.read_csv('../data/modis_2017_United_States.csv')
df_2018 = pd.read_csv('../data/modis_2018_United_States.csv')
df_2019 = pd.read_csv('../data/modis_2019_United_States.csv')

Let's combine all of our dataframes into a single one so that we can apply the necessary masks in two or three steps and extract the data for our target area.

In [3]:
frames = [df_2015, df_2016, df_2017, df_2018, df_2019]
pre_final = pd.concat(frames)

In [4]:
pre_final.shape

(643545, 15)

In [5]:
pre_final.head()

Unnamed: 0,latitude,longitude,brightness,scan,track,acq_date,acq_time,satellite,instrument,confidence,version,bright_t31,frp,daynight,type
0,19.4104,-155.2771,306.4,1.1,1.1,2015-01-01,830,Terra,MODIS,68,6.2,284.0,12.1,N,2
1,19.4425,-155.0047,324.1,1.1,1.0,2015-01-01,830,Terra,MODIS,100,6.2,286.0,29.0,N,2
2,19.4601,-154.9925,313.0,1.1,1.0,2015-01-01,830,Terra,MODIS,86,6.2,288.0,16.7,N,2
3,19.4087,-155.2876,309.8,1.1,1.1,2015-01-01,830,Terra,MODIS,78,6.2,284.0,14.8,N,2
4,41.6333,-87.1361,301.0,1.9,1.3,2015-01-01,1717,Terra,MODIS,33,6.2,270.7,22.7,D,2


## Data Filtering

In [6]:
#mask to limit our dataset to latitudes between 38.0881 and 40.8336

pre_final_2 = pre_final[(pre_final['latitude'] >= 38.0881) & (pre_final['latitude'] <= 40.8336)] 

In [7]:
#mask to limit our dataset to longitudes between -123.1208 & -120.2933
pre_final_3 = pre_final_2[(pre_final_2['longitude'] >= -123.1208) & (pre_final_2['longitude'] <= -120.2933)]
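
As a rough check on the size of the search area, the area of a latitude/longitude box can be computed directly from the bounds used in the two masks above. This is only a sketch: it assumes a spherical Earth with a mean radius of 6371 km, and `bbox_area_km2` is a hypothetical helper that is not part of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def bbox_area_km2(lat_min, lat_max, lon_min, lon_max, radius_km=EARTH_RADIUS_KM):
    """Area of a latitude/longitude box on a sphere, in square kilometres."""
    lat1, lat2 = math.radians(lat_min), math.radians(lat_max)
    dlon = math.radians(lon_max - lon_min)
    # Surface area of a spherical "rectangle": R^2 * (sin(lat2) - sin(lat1)) * dlon
    return radius_km ** 2 * (math.sin(lat2) - math.sin(lat1)) * dlon


# Bounds copied from the latitude and longitude masks in the code cells.
area = bbox_area_km2(38.0881, 40.8336, -123.1208, -120.2933)
print(f"{area:,.0f} km^2")
```

The box narrows toward the north because lines of longitude converge, which is why the formula uses the difference of sines rather than a flat width times height.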

In [8]:
#mask to keep only fire instances with a satellite confidence level of at least 75%
final_wf_df = pre_final_3[(pre_final_3['confidence'] >= 75)].copy() # copy so later edits do not touch pre_final_3

In [9]:
final_wf_df.shape


(10896, 15)
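
The three masks above could also be applied in a single step. The sketch below is one way to do that; `filter_anomalies` and the constant names are hypothetical, and the bounds and threshold are copied from the code cells above. The small example dataframe is made up purely for illustration.

```python
import pandas as pd

# Bounds and threshold copied from the masks in the notebook.
LAT_RANGE = (38.0881, 40.8336)
LON_RANGE = (-123.1208, -120.2933)
MIN_CONFIDENCE = 75


def filter_anomalies(df, lat_range=LAT_RANGE, lon_range=LON_RANGE,
                     min_conf=MIN_CONFIDENCE):
    """Return a copy of df restricted to the bounding box and confidence threshold."""
    mask = (
        df['latitude'].between(*lat_range)      # inclusive on both ends
        & df['longitude'].between(*lon_range)
        & (df['confidence'] >= min_conf)
    )
    return df.loc[mask].copy()


# Made-up rows: only the first one satisfies all three conditions.
example = pd.DataFrame({
    'latitude':   [38.8901, 19.4104, 39.1576],
    'longitude':  [-122.9681, -155.2771, -120.6349],
    'confidence': [81, 68, 50],
})
filtered = filter_anomalies(example)
print(filtered)
```

Returning a copy means later column edits act on an independent dataframe rather than on a view of the original.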

In [10]:
final_wf_df.head()

Unnamed: 0,latitude,longitude,brightness,scan,track,acq_date,acq_time,satellite,instrument,confidence,version,bright_t31,frp,daynight,type
383,38.8901,-122.9681,322.2,1.3,1.1,2015-01-07,2137,Aqua,MODIS,81,6.2,294.9,28.3,D,0
384,38.8884,-122.9837,321.4,1.3,1.1,2015-01-07,2137,Aqua,MODIS,81,6.2,293.4,26.8,D,0
851,39.1576,-120.6349,322.6,3.5,1.8,2015-01-12,2156,Aqua,MODIS,82,6.2,278.8,140.0,D,0
909,39.9387,-120.7503,327.4,1.1,1.0,2015-01-13,2101,Aqua,MODIS,85,6.2,276.4,34.5,D,0
911,39.934,-120.7438,332.2,1.1,1.0,2015-01-13,2101,Aqua,MODIS,88,6.2,277.7,40.3,D,0


In [11]:
# Renaming to reduce my own confusion when we go to query the Google API.
final_wf_df = final_wf_df.rename(columns={'latitude':'lat',
                                          'longitude':'lon',
                                          'acq_date':'date'})

# Setting up for our Google Static Map API Query

Below, I have reduced the final dataframe to the date, latitude, and longitude columns and created a new column, center, which holds the latitude and longitude of each fire instance combined into a single comma-separated string. You may notice that the Google API query does not include the date. This is because the Google Static Maps API does not return historical satellite images, only its most recent image of the queried area. At the beginning of this project, my intention was to query the NASA Earth API for historical satellite images from the day of each fire instance. But the images retrieved were problematic and of low resolution, and thus not very valuable for training a Convolutional Neural Network.

However, I have decided to keep the dates of the fire instances for future work, once this obstacle is overcome. The issues that come with training a CNN on satellite images that do not date from the day of the recorded fire, and what this means for model interpretability, are addressed in the attached ReadMe.

In [12]:
df_fire_final = final_wf_df[['date','lat','lon']].copy() # copy so the column assignments below act on an independent dataframe

In [13]:
# The columns must be converted to strings before we query the API.
# The center column combines latitude and longitude into one 'lat,lon' string.

df_fire_final['date'] = df_fire_final['date'].astype(str) 
df_fire_final['lon'] = df_fire_final['lon'].astype(str) 
df_fire_final['lat'] = df_fire_final['lat'].astype(str)
df_fire_final['center']= df_fire_final[['lat','lon']].agg(','.join, axis = 1)

In [14]:
df_fire_final.head()

Unnamed: 0,date,lat,lon,center
383,2015-01-07,38.8901,-122.9681,"38.8901,-122.9681"
384,2015-01-07,38.8884,-122.9837,"38.8884,-122.9837"
851,2015-01-12,39.1576,-120.6349,"39.1576,-120.6349"
909,2015-01-13,39.9387,-120.7503,"39.9387,-120.7503"
911,2015-01-13,39.934,-120.7438,"39.934,-120.7438"


In [15]:
df_fire_final.dtypes

date      object
lat       object
lon       object
center    object
dtype: object

## Getting the satellite imagery
We now have every fire instance we need to request the images from the Google Static Maps API.

## Setting up download request

In [16]:
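img_size = '350x350' # Image dimensions in pixels (width x height)

img_format = 'jpg' # Image format (not added to the request URL below)

map_scale = '1' # For the scale parameter (not added to the request URL below)

maptype = 'satellite' # Satellite imagery (hard-coded in the URL below)

zoom = '15' # Zoom level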

In [17]:
with open('../google_api/google_key.txt', 'r') as key_file:
    key = key_file.read().strip() # strip the trailing newline so it does not end up in the URL

In [19]:
a = 'https://maps.googleapis.com/maps/api/staticmap?' # Base
b = 'center=' # Center 
# Enter Center
c = '&zoom=' # Zoom
# Enter Zoom
d = '&maptype=satellite' # Map type 
# No need to enter maptype - just keep satellite default
e = '&size=' # Image Size
# Enter image size
f = '&key='
# Enter key

# Creating the URL:
url1 = a + b
url2 = c + zoom + d + e + img_size + f + key
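
The same request URL can be assembled from a dictionary of parameters instead of string fragments. This is a minimal sketch using the standard library's `urlencode`; `build_static_map_url` is a hypothetical helper, the parameter names are the ones used in the cell above, and `'YOUR_KEY'` is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = 'https://maps.googleapis.com/maps/api/staticmap'


def build_static_map_url(center, api_key, zoom=15, size='350x350',
                         maptype='satellite'):
    """Build a Static Maps request URL for one 'lat,lon' center string."""
    params = {
        'center': center,
        'zoom': zoom,
        'maptype': maptype,
        'size': size,
        'key': api_key,
    }
    # safe=',' keeps the comma in 'lat,lon' readable instead of encoding it as %2C.
    return BASE_URL + '?' + urlencode(params, safe=',')


example_url = build_static_map_url('38.8901,-122.9681', api_key='YOUR_KEY')
print(example_url)
```

Keeping the parameters in one place makes it harder to drop a separator or to leave whitespace from the key file in the middle of the URL.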


In [20]:
df_fire_final.shape
df_fire_final

Unnamed: 0,date,lat,lon,center
383,2015-01-07,38.8901,-122.9681,"38.8901,-122.9681"
384,2015-01-07,38.8884,-122.9837,"38.8884,-122.9837"
851,2015-01-12,39.1576,-120.6349,"39.1576,-120.6349"
909,2015-01-13,39.9387,-120.7503,"39.9387,-120.7503"
911,2015-01-13,39.934,-120.7438,"39.934,-120.7438"
...,...,...,...,...
104058,2019-12-04,40.8083,-122.6204,"40.8083,-122.6204"
104060,2019-12-04,38.5259,-120.4962,"38.5259,-120.4962"
105208,2019-12-06,38.872,-120.6605,"38.872,-120.6605"
105209,2019-12-06,38.5313,-120.5913,"38.5313,-120.5913"


In [None]:
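fire_dir = os.path.join(os.path.pardir, 'images', 'fire_images')
os.makedirs(fire_dir, exist_ok=True) # make sure the target folder exists

with tqdm(total=df_fire_final.shape[0]) as pbar:

    for index, row in df_fire_final.iterrows():
        url = url1 + row['center'] + url2
        # Save each image inside fire_dir, named after its center coordinates.
        urllib.request.urlretrieve(url, os.path.join(fire_dir, row['center'] + '.jpg'))
        pbar.update(1)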

 51%|█████     | 5555/10896 [18:55<25:08,  3.54it/s]  
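
A run of this length can be interrupted partway through, so it may help to make the loop restartable. The sketch below skips files that already exist and records failures instead of stopping. `download_images`, `make_url`, and `fetch` are hypothetical names; the demonstration at the bottom uses a stand-in fetch function and a temporary folder, so it makes no network requests.

```python
import os
import tempfile
import time


def download_images(centers, out_dir, make_url, fetch, pause_s=0.0):
    """Download one image per center, skipping files that already exist.

    make_url(center) -> str and fetch(url) -> bytes are supplied by the caller,
    so the loop does not depend on any particular HTTP library.
    Returns the list of centers whose download failed.
    """
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for center in centers:
        target = os.path.join(out_dir, center + '.jpg')
        if os.path.exists(target):
            continue  # already downloaded on a previous run
        try:
            data = fetch(make_url(center))
        except Exception as exc:
            print(f'failed for {center}: {exc}')
            failed.append(center)
            continue
        with open(target, 'wb') as fh:
            fh.write(data)
        time.sleep(pause_s)  # optional pause between requests
    return failed


# Offline demonstration with a stand-in fetch function.
def fake_fetch(url):
    if 'bad' in url:
        raise IOError('simulated network error')
    return b'fake image bytes'


with tempfile.TemporaryDirectory() as tmp_dir:
    failed = download_images(
        ['38.8901,-122.9681', 'bad', '39.1576,-120.6349'],
        tmp_dir,
        lambda c: 'https://example.invalid/?center=' + c,
        fake_fetch,
    )
    saved = sorted(os.listdir(tmp_dir))
```

In the notebook, `make_url` could be `lambda c: url1 + c + url2`, and `fetch` could wrap `requests.get(url, timeout=30)`, call `raise_for_status()` on the response, and return its `content`.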

In [21]:
non_fire_size= 10000

df_fire_final['lat'] = df_fire_final.lat.astype(float) # convert back to floats so we can compute the coordinate bounds
df_fire_final['lon'] = df_fire_final.lon.astype(float)

new_lat = np.random.uniform(low= min(df_fire_final.lat),
                            high = max(df_fire_final.lat),  # random coordinates drawn from the square area
                            size= (non_fire_size,))         # we retrieved the wildfire images from

new_lon = np.random.uniform(low = min(df_fire_final.lon),
                            high = max(df_fire_final.lon),
                            size=(non_fire_size,))

new_coordinates= {'lat':new_lat,'lon':new_lon}

df_non_fire = pd.DataFrame(data = new_coordinates)


df_non_fire['lat'] = df_non_fire['lat'].astype(str)  # converting our new coordinates to strings so we can use them
df_non_fire['lon'] = df_non_fire['lon'].astype(str)  # when we call our API
df_non_fire['center'] = df_non_fire[['lat', 'lon']].agg(','.join, axis = 1) # 'lat,lon' string for each generated point



df_fire_final['lat'] = df_fire_final.lat.astype(str)
df_fire_final['lon'] = df_fire_final.lon.astype(str)

df_non_fire.head()

Unnamed: 0,lat,lon,center
0,40.2086159629523,-120.41976195821712,"40.2086159629523,-120.41976195821711"
1,40.49131352397505,-122.28765191516544,"40.49131352397505,-122.28765191516543"
2,38.83618917065366,-122.9333449538305,"38.83618917065366,-122.9333449538305"
3,38.75950220279785,-121.02896773543937,"38.75950220279785,-121.02896773543937"
4,40.265522585702456,-120.62407977635011,"40.265522585702456,-120.62407977635011"
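
Uniformly random coordinates can land on or near a recorded fire location, which would put a burned site in the non-fire class. The sketch below is one possible refinement, not what the notebook does: it draws points with a fixed seed for reproducibility and rejects any point closer than `min_sep_deg` degrees to a fire point. `sample_non_fire`, the 0.05 degree separation, and the toy coordinates are all assumptions for illustration, and the distance is a crude one measured in degree space.

```python
import numpy as np


def sample_non_fire(fire_lat, fire_lon, n, min_sep_deg=0.01, seed=0,
                    batch=500, max_batches=1000):
    """Draw n random points inside the fire bounding box, each at least
    min_sep_deg degrees away from every recorded fire point."""
    rng = np.random.default_rng(seed)
    fire_lat = np.asarray(fire_lat, dtype=float)
    fire_lon = np.asarray(fire_lon, dtype=float)
    lat_parts, lon_parts, kept = [], [], 0
    for _ in range(max_batches):
        lat = rng.uniform(fire_lat.min(), fire_lat.max(), size=batch)
        lon = rng.uniform(fire_lon.min(), fire_lon.max(), size=batch)
        # Squared distance from each candidate to each fire point (batch x n_fire).
        d2 = (lat[:, None] - fire_lat) ** 2 + (lon[:, None] - fire_lon) ** 2
        ok = d2.min(axis=1) >= min_sep_deg ** 2
        lat_parts.append(lat[ok])
        lon_parts.append(lon[ok])
        kept += int(ok.sum())
        if kept >= n:
            return np.concatenate(lat_parts)[:n], np.concatenate(lon_parts)[:n]
    raise RuntimeError('could not find enough points; try a smaller min_sep_deg')


# Toy fire locations, made up for the demonstration.
fire_lat_demo = [0.0, 1.0, 0.5]
fire_lon_demo = [0.0, 1.0, 0.5]
demo_lat, demo_lon = sample_non_fire(fire_lat_demo, fire_lon_demo, n=50,
                                     min_sep_deg=0.05, seed=42)
```

Candidates are processed in batches so that the distance matrix stays small even with thousands of fire points.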


In [22]:
os.path.join(os.path.pardir,'images','non_fire',)

'..\\images\\non_fire'

In [None]:
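non_fire_dir = os.path.join(os.path.pardir, 'images', 'non_fire')
os.makedirs(non_fire_dir, exist_ok=True) # make sure the target folder exists

with tqdm(total=df_non_fire.shape[0]) as pbar:

    for index, row in df_non_fire.iterrows():
        url = url1 + row['center'] + url2
        # Save each image inside non_fire_dir, named after its center coordinates.
        urllib.request.urlretrieve(url, os.path.join(non_fire_dir, row['center'] + '.jpg'))
        pbar.update(1)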

In [23]:
import cv2 # OpenCV, used below to read and resize the images
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D , MaxPool2D , Flatten , Dropout , BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
from keras.callbacks import ReduceLROnPlateau

In [None]:
labels = ['fire', 'no_fire']
img_size = 150
def get_training_data(data_dir):
    data = [] 
    for label in labels: 
        path = os.path.join(data_dir, label)
        class_num = labels.index(label)
        for img in os.listdir(path):
            try:
                img_arr = cv2.imread(os.path.join(path, img))
                resized_arr = cv2.resize(img_arr, (img_size, img_size)) # Reshaping images to preferred size
                data.append([resized_arr, class_num])
            except Exception as e:
                print(e)
    return np.array(data, dtype=object) # object dtype: each row pairs an image array with an integer label

In [None]:
train = get_training_data('../images/wf_images/train')
test = get_training_data('../images/wf_images/test')
val = get_training_data('../images/wf_images/val')
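
The cell above expects the images to already be sorted into `train`, `test`, and `val` folders, each with one subfolder per label. The notebook does not show that step, so the following is only a sketch of one way it could be done. `split_into_folders`, the 70/15/15 proportions, and the fixed seed are assumptions; the demonstration uses empty placeholder files in a temporary folder.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_into_folders(src_dir, dst_root, label, fractions=(0.7, 0.15, 0.15), seed=0):
    """Copy the .jpg files in src_dir into dst_root/{train,val,test}/label."""
    files = sorted(Path(src_dir).glob('*.jpg'))
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_train = round(len(files) * fractions[0])
    n_val = round(len(files) * fractions[1])
    parts = {
        'train': files[:n_train],
        'val': files[n_train:n_train + n_val],
        'test': files[n_train + n_val:],  # whatever remains
    }
    for split, members in parts.items():
        target = Path(dst_root) / split / label
        target.mkdir(parents=True, exist_ok=True)
        for f in members:
            shutil.copy2(f, target / f.name)
    return {split: len(members) for split, members in parts.items()}


# Demonstration with 20 empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'fire_images'
    src.mkdir()
    for i in range(20):
        (src / f'{i}.jpg').touch()
    counts = split_into_folders(src, Path(tmp) / 'wf_images', 'fire')
    n_train_files = len(list((Path(tmp) / 'wf_images' / 'train' / 'fire').iterdir()))
```

The subfolder name passed as `label` has to match an entry in `labels` (`'fire'` or `'no_fire'`), because `get_training_data` builds its paths from that list.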

In [None]:
classification_check = []
for i in train:
    if(i[1]==0):
        classification_check.append('Fire')
    else:
        classification_check.append('No Fire')
sns.set_style('darkgrid')
sns.countplot(x=classification_check) # pass the labels as x; recent seaborn versions reject a bare positional list

In [None]:
class_check_2 = []
for i in val:
    if(i[1]==0):
        class_check_2.append('Fire')
    else:
        class_check_2.append('No Fire')
sns.set_style('darkgrid')
sns.countplot(x=class_check_2) # pass the labels as x; recent seaborn versions reject a bare positional list

In [None]:
plt.figure(figsize = (5,5))
plt.imshow(train[0][0][..., ::-1]) # cv2 loads images as BGR; reverse the channels to display RGB
plt.title(labels[train[0][1]])

plt.figure(figsize = (5,5))
plt.imshow(train[-1][0][..., ::-1])
plt.title(labels[train[-1][1]])

In [None]:
x_train = []
y_train = []

x_val = []
y_val = []

x_test = []
y_test = []

for feature, label in train:
    x_train.append(feature)
    y_train.append(label)

for feature, label in test:
    x_test.append(feature)
    y_test.append(label)
    
for feature, label in val:
    x_val.append(feature)
    y_val.append(label)
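
The three loops above follow the same pattern, so they could be folded into one helper that also handles the scaling to the 0-1 range. This is a sketch under that assumption; `split_features_labels` is a hypothetical name and the two demo images are synthetic.

```python
import numpy as np


def split_features_labels(pairs):
    """Turn an iterable of (image, label) pairs into X and y arrays."""
    features, labels = zip(*pairs)
    x = np.stack(features).astype('float32') / 255.0  # scale pixels to [0, 1]
    y = np.asarray(labels, dtype='int64')
    return x, y


# Synthetic stand-ins for two loaded images: one all white, one all black.
demo_pairs = [
    (np.full((150, 150, 3), 255, dtype=np.uint8), 0),
    (np.zeros((150, 150, 3), dtype=np.uint8), 1),
]
x_demo, y_demo = split_features_labels(demo_pairs)
print(x_demo.shape, y_demo)
```

`np.stack` raises an error if any image has a different shape, which surfaces a failed resize early instead of at training time.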

In [None]:
train.shape

In [None]:
# Normalize the data
x_train = np.array(x_train) / 255
x_val = np.array(x_val) / 255
x_test = np.array(x_test) / 255

In [None]:
# reshape the data into (samples, height, width, channels) for the CNN
x_train = x_train.reshape(-1, img_size, img_size, 3)
y_train = np.array(y_train)

x_val = x_val.reshape(-1, img_size, img_size, 3)
y_val = np.array(y_val)

x_test = x_test.reshape(-1, img_size, img_size, 3)
y_test = np.array(y_test)

In [None]:
model = Sequential()
model.add(Conv2D(32, (3,3) , strides =1, padding ='same', activation = 'relu', input_shape = (150,150,3)))
model.add(MaxPool2D((2,2) , strides =2, padding ='same'))
model.add(Conv2D(64, (3,3), strides =1, padding = 'same', activation = 'relu'))
model.add(Dropout(0.1))
model.add(MaxPool2D((2,2), strides = 2, padding = 'same'))
model.add(Conv2D(64, (3,3), strides =1, padding = 'same', activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(128 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(Dropout(0.2))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Flatten())
model.add(Dense(units = 128 , activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(units = 1 , activation = 'sigmoid'))
model.compile(optimizer = "rmsprop" , loss = 'binary_crossentropy' , metrics = ['accuracy'])
model.summary()
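
The sizes that `model.summary()` reports can be worked out by hand, which is a useful check on the architecture. The sketch below assumes only what the layer definitions above state: 3x3 convolutions with `'same'` padding and stride 1 keep the spatial size, each 2x2 pooling layer with `'same'` padding and stride 2 rounds the size up to half, and Dropout layers add no parameters. The helper `conv_params` is a hypothetical name.

```python
import math


def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square convolution kernel."""
    return (kernel * kernel * in_channels + 1) * out_channels


size, channels = 150, 3  # input_shape = (150, 150, 3)
total = 0
for filters in (32, 64, 64, 128):  # the four Conv2D layers, each followed by a pool
    total += conv_params(3, channels, filters)
    channels = filters
    size = math.ceil(size / 2)     # MaxPool2D with padding='same', strides=2

flat = size * size * channels      # length of the Flatten output
total += (flat + 1) * 128          # Dense(128)
total += (128 + 1) * 1             # Dense(1) with sigmoid

print(size, flat, total)
```

Almost all of the parameters sit in the first Dense layer, since it connects every flattened feature to 128 units.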

In [None]:
history = model.fit(x_train, y_train, batch_size = 100, epochs= 40, verbose = 1, validation_data = (x_val, y_val))

In [None]:
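# predict_classes was removed from recent Keras releases; threshold the sigmoid output instead
y_pred = (model.predict(x_test) > 0.5).astype('int32').ravel()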

In [None]:
print(classification_report(y_test, y_pred)) #model 1 performance
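
When reading the report, keep in mind that `labels = ['fire', 'no_fire']` assigns fire the class index 0, so the sigmoid output is the probability of no_fire, and the probability of fire is one minus that value. The sketch below illustrates this with NumPy alone; `fire_metrics` is a hypothetical helper and the probabilities and labels are made up.

```python
import numpy as np


def fire_metrics(p_no_fire, y_true, threshold=0.5):
    """Precision and recall for the fire class (label 0) from sigmoid outputs."""
    p_no_fire = np.asarray(p_no_fire, dtype=float).ravel()
    y_true = np.asarray(y_true).ravel()
    y_pred = (p_no_fire > threshold).astype(int)  # 1 = no_fire, 0 = fire
    tp = int(np.sum((y_pred == 0) & (y_true == 0)))  # fire predicted, fire observed
    fp = int(np.sum((y_pred == 0) & (y_true == 1)))  # fire predicted, no fire observed
    fn = int(np.sum((y_pred == 1) & (y_true == 0)))  # fire missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {'p_fire': 1.0 - p_no_fire, 'precision': precision, 'recall': recall}


# Made-up sigmoid outputs and true labels for four images.
demo = fire_metrics([0.1, 0.8, 0.4, 0.3], [0, 0, 1, 1])
print(demo['precision'], demo['recall'])
```

Lowering `threshold` makes the model predict no_fire more often, which tends to raise fire precision at the cost of fire recall.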