The dataset records each fire's Dollar Damage, Size, and Location, along with some information about the climate when the fire occurred. Because it includes Dollar Damage, which is what I aim to predict, this will be a supervised model.

After doing some data exploration, I found that there is far too much variance in Dollar Damage for any kind of regression model to be reliable. So instead, I have split the costs into classes, similar to how wildfire sizes are split into classes.

| Class A | Class B | Class C | Class D  | Class E |
|---------|---------|---------|----------|---------|
| < \$100 | \$100 - \$500 | \$500 - \$2,500 | \$2,500 - \$10,000 | > \$10,000 |

These partitions were chosen so that each class makes up approximately one fifth of the dataset. This was done to prevent any one class from being so common that the model just predicts it every time.
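
As a rough illustration of how such a split could be produced, the sketch below bins a synthetic dollar-damage column with `pd.cut` using the edges from the table, and compares it with `pd.qcut`, which picks its own edges so that each bin holds an equal share of the rows. The synthetic values and the choice to treat each upper edge as exclusive are assumptions made for the example, not properties of the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic, heavily skewed damage values (illustrative only, not the real data).
rng = np.random.default_rng(0)
damage = pd.Series(np.exp(rng.normal(loc=7.0, scale=2.5, size=1000)), name="Dollar Damage")

# Fixed edges taken from the table; right=False makes each bin [low, high).
edges = [0, 100, 500, 2_500, 10_000, np.inf]
labels = ["A", "B", "C", "D", "E"]
cost_class = pd.cut(damage, bins=edges, labels=labels, right=False)

# Quantile-based alternative: five bins with equal row counts.
quantile_class, quantile_edges = pd.qcut(damage, q=5, labels=labels, retbins=True)

# Share of rows per class under each scheme.
print(cost_class.value_counts(normalize=True).sort_index())
print(quantile_class.value_counts(normalize=True).sort_index())
print("Quantile edges:", quantile_edges)
```

Comparing the two sets of shares shows how far a set of round-number edges is from an exactly even split.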

With the data split into classes, a neural network can be used to predict which cost class a fire is likely to fall into.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             mean_squared_error, multilabel_confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

In [4]:
dataset = pd.read_csv("Dollar Damage.csv")
dataset.dropna(inplace=True)

In [5]:
targetFeatures = ["Cost Class"]
inputFeatures = ["Approximate Size (Acres)", "Approximate Latitude", "Approximate Longitude", "Average Temperature In Year In County", "Average Precipitation In Year In County"]

X = dataset[inputFeatures]

# Rescale every input feature to the [0, 1] range.
scaler = MinMaxScaler(feature_range=(0, 1))
X_rescaled = scaler.fit_transform(X)
X = pd.DataFrame(data = X_rescaled, columns = X.columns)

# LabelEncoder assigns integers in sorted label order, so 'A'..'E' become 0..4.
categories = [['A', 'B', 'C', 'D', 'E']]
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(dataset[targetFeatures[0]])

print("Pre-processed data :")
print(X)

print("Pre-processed class :")
print(y)

Pre-processed data :
      Approximate Size (Acres)  Approximate Latitude  Approximate Longitude  \
0                          0.0              0.529839               0.259252   
1                          0.0              0.621105               0.417000   
2                          0.0              0.761188               0.298177   
3                          0.0              0.591350               0.428583   
4                          0.0              1.000000               0.000000   
...                        ...                   ...                    ...   
1478                       1.0              0.210125               0.977571   
1479                       1.0              0.000000               0.903008   
1480                       1.0              0.270632               0.442858   
1481                       1.0              0.367306               0.651723   
1482                       1.0              0.520119               0.373262   

      Average Temperature In Y

The hyperparameters used below were chosen with a grid search (see the `gridSearch` function defined in a later cell).

In [6]:
import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", category=ConvergenceWarning)

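# No random_state is set, so the split (and the scores below) vary from run to run.
data_train, data_test, class_train, class_test = train_test_split(X, y, test_size=0.3)

# Note: with solver='lbfgs', scikit-learn ignores learning_rate_init and batch_size;
# those settings only apply to the 'sgd' and 'adam' solvers.
mlp = MLPClassifier(solver = 'lbfgs', activation = 'logistic',
                    learning_rate_init = 0.2, batch_size = 10, hidden_layer_sizes = (14, 9), max_iter = 200)

mlp.fit(data_train, class_train)

pred = mlp.predict(data_test)

print("Accuracy : ", accuracy_score(class_test, pred))
# Computed on the encoded labels 0..4, so this is the mean squared distance
# between the predicted and the actual class index.
print("Mean Square Error : ", mean_squared_error(class_test, pred))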

pred

Accuracy :  0.42921348314606744
Mean Square Error :  1.8224719101123596


array([3, 2, 1, 0, 2, 0, 4, 0, 1, 4, 0, 2, 1, 3, 3, 1, 4, 0, 4, 1, 1, 1,
       4, 4, 0, 0, 0, 0, 1, 4, 4, 4, 4, 1, 3, 3, 4, 4, 4, 4, 0, 0, 2, 4,
       2, 4, 2, 0, 2, 4, 0, 4, 0, 0, 4, 0, 2, 0, 0, 0, 4, 4, 4, 2, 0, 0,
       1, 0, 1, 4, 4, 1, 0, 1, 4, 3, 4, 3, 4, 0, 0, 0, 4, 1, 1, 2, 1, 3,
       0, 1, 4, 4, 4, 1, 1, 0, 2, 1, 0, 4, 1, 1, 1, 1, 4, 1, 4, 4, 2, 2,
       0, 1, 4, 2, 4, 3, 2, 1, 0, 4, 2, 1, 4, 3, 4, 0, 0, 2, 0, 4, 1, 2,
       3, 0, 4, 1, 4, 2, 2, 1, 4, 0, 3, 1, 1, 0, 1, 4, 4, 4, 0, 0, 1, 1,
       1, 2, 0, 1, 2, 4, 4, 1, 2, 2, 4, 4, 4, 0, 1, 4, 0, 0, 1, 4, 0, 4,
       1, 1, 0, 3, 2, 2, 2, 3, 0, 4, 1, 4, 2, 0, 0, 0, 0, 4, 4, 0, 1, 1,
       0, 1, 2, 4, 1, 3, 1, 4, 4, 1, 0, 4, 3, 1, 0, 0, 2, 4, 3, 2, 1, 4,
       0, 4, 4, 0, 1, 4, 0, 1, 0, 2, 1, 2, 1, 1, 4, 4, 3, 3, 2, 1, 0, 4,
       0, 0, 1, 2, 1, 4, 4, 1, 4, 4, 0, 3, 0, 1, 0, 3, 2, 2, 4, 1, 0, 0,
       4, 1, 1, 4, 4, 1, 2, 3, 4, 0, 3, 4, 2, 0, 0, 4, 1, 3, 0, 1, 4, 4,
       0, 1, 0, 0, 1, 0, 1, 1, 4, 1, 1, 4, 2, 4, 0,
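
To put an accuracy figure like the one printed above in context, it can help to compare it against a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier` on synthetic five-class labels; the data and the variable names (`X_demo`, `y_demo`, `baseline`) are made up for illustration and are not the wildfire dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.random((500, 5))            # stand-in features
y_demo = rng.integers(0, 5, size=500)    # stand-in labels, 0..4 playing the role of A..E

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# Baseline that always predicts the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
base_pred = baseline.predict(X_te)

print("Baseline accuracy :", accuracy_score(y_te, base_pred))
# Because the classes are ordered, the mean absolute difference between
# encoded labels reads as "how many classes off, on average".
print("Baseline mean class distance :", mean_absolute_error(y_te, base_pred))
```

With five roughly balanced classes, a baseline of this kind sits near 0.2, which is the figure a model's accuracy needs to beat.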

In [7]:
print("Weights of the neural network:")
for i, coef in enumerate(mlp.coefs_):
    print(f"Layer {i}:")
    print(coef)

Weights of the neural network:
Layer 0:
[[ -2.79752657   7.59271334   8.1867471   22.82337216 -33.93590325
   -1.03318483  25.75221129 -14.00994495 -11.59485717  15.6373161
  -28.62693651  17.01538701  17.83477055 -10.82617863]
 [ -3.04177843  -1.76559805  -0.06762998   1.20035585   3.30132877
   -1.15914823   2.27504115   2.47355143  -6.58278079   4.52413099
   -2.04999341   2.13125154  -4.76720286  -1.29893221]
 [ -4.64911977   1.42297829  -1.03710965  -1.77969873  -4.06588982
    1.15352245  -0.2922035   -7.29305131   6.21659874  -3.04165122
    0.97465543  -0.3610863    3.94668276  -1.30781773]
 [  3.76131028  -0.28908377  -4.96391339   0.09975069  -2.80785907
   -0.8559079   -2.6845838   -2.99408479   5.70037227  -0.23674395
   -2.61366992   0.2677442    3.23585988  -1.62301765]
 [ -2.16479944   1.72842271   2.18397007   0.54444074  -0.41417801
    2.00210011   1.08576246  -1.97546656   2.52037798   1.65763446
   -2.88867301   3.73310784   0.78281481  -2.32126845]]
Layer 1:
[[  8.

In [8]:
print("Confusion Matrix for each label : ")

# multilabel_confusion_matrix returns one 2x2 matrix per class, laid out as [[TN, FP], [FN, TP]].
for className, matrix in zip(categories[0], multilabel_confusion_matrix(class_test, pred)):
  print(f"Class: {className}")

  tn, fp, fn, tp = matrix.ravel()
  print("         Actual Positive | Actual Negative")
  print("-------------------|-----|----------------")
  print(f"Predicted Positive | {str(tp).rjust(3)} | {str(fp).rjust(3)}")
  print("-------------------|-----|----------------")
  print(f"Predicted Negative | {str(fn).rjust(3)} | {str(tn).rjust(3)}")
  print(f"TP: {tp}, FP: {fp}, FN: {fn}, TN: {tn}\n")

print("Classification Report : ")
print(classification_report(class_test, pred))

Confusion Matrix for each label : 
Class: A
         Actual Positive | Actual Negative
-------------------|-----|----------------
Predicted Positive |  51 |  70
-------------------|-----|----------------
Predicted Negative |  38 | 286
TP: 51, FP: 70, FN: 38, TN: 286

Class: B
         Actual Positive | Actual Negative
-------------------|-----|----------------
Predicted Positive |  32 |  75
-------------------|-----|----------------
Predicted Negative |  54 | 284
TP: 32, FP: 75, FN: 54, TN: 284

Class: C
         Actual Positive | Actual Negative
-------------------|-----|----------------
Predicted Positive |  14 |  40
-------------------|-----|----------------
Predicted Negative |  76 | 315
TP: 14, FP: 40, FN: 76, TN: 315

Class: D
         Actual Positive | Actual Negative
-------------------|-----|----------------
Predicted Positive |   9 |  28
-------------------|-----|----------------
Predicted Negative |  46 | 362
TP: 9, FP: 28, FN: 46, TN: 362

Class: E
         Actual Positive 

In [9]:
def gridSearch():
  max_iterations = 200 * np.arange(1,3)
  # Every (first layer, second layer) pair with 10-20 and 1-10 neurons respectively.
  hidden_layer_siz = [(h1, h2) for h2 in range(1, 11) for h1 in range(10, 21)]

  # Note: learning_rate_init is ignored when solver='lbfgs', so this axis of the
  # grid does not change how the model is trained.
  learning_rates = 0.1 * np.arange(1, 3)

  param_grid = dict(learning_rate_init = learning_rates, hidden_layer_sizes = hidden_layer_siz, max_iter = max_iterations)
  # set model
  mlp = MLPClassifier(solver = 'lbfgs', activation = 'logistic',
                      learning_rate_init = 0.2, batch_size = 10, hidden_layer_sizes = (14, 9), max_iter = 200)

  # For Grid Search
  grid = GridSearchCV(estimator = mlp, param_grid = param_grid)

  # For Random Search
  # grid = RandomizedSearchCV(estimator = mlp, param_distributions = param_grid, n_iter = 10)

  grid.fit(X,y)

  print("Optimal Hyper-parameters : ", grid.best_params_)
  print("Optimal Accuracy : ", grid.best_score_)

#gridSearch()
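
One possible variant of this search, sketched under assumptions: wrapping the scaler and the classifier in a `Pipeline` lets `GridSearchCV` refit the scaler inside each cross-validation fold, and fitting the search on the training split only keeps the held-out rows out of model selection. The synthetic data, the deliberately small grid, and the step names `scale` and `mlp` are illustrative choices, not the settings used in this notebook.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in data: 5 features, 5 classes (not the wildfire dataset).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 5)) * 100.0
y_demo = rng.integers(0, 5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# The scaler is a pipeline step, so it is fitted on each training fold only.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("mlp", MLPClassifier(solver="lbfgs", activation="logistic",
                          max_iter=200, random_state=0)),
])

# Pipeline parameters are addressed as <step name>__<parameter>.
param_grid = {
    "mlp__hidden_layer_sizes": [(10, 5), (14, 9)],
    "mlp__max_iter": [200, 400],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X_tr, y_tr)

print("Best parameters :", search.best_params_)
print("Cross-validated accuracy :", search.best_score_)
print("Held-out accuracy :", search.score(X_te, y_te))
```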

In [10]:
# Weights copied from the mlp.coefs_ output printed above.
in_h1_weights = [
    [-2.79752657, 7.59271334, 8.1867471, 22.82337216, -33.93590325, -1.03318483, 25.75221129, -14.00994495, -11.59485717, 15.6373161, -28.62693651, 17.01538701, 17.83477055, -10.82617863],
    [-3.04177843, -1.76559805, -0.06762998, 1.20035585, 3.30132877, -1.15914823, 2.27504115, 2.47355143, -6.58278079, 4.52413099, -2.04999341, 2.13125154, -4.76720286, -1.29893221],
    [-4.64911977, 1.42297829, -1.03710965, -1.77969873, -4.06588982, 1.15352245, -0.2922035, -7.29305131, 6.21659874, -3.04165122, 0.97465543, -0.3610863, 3.94668276, -1.30781773],
    [3.76131028, -0.28908377, -4.96391339, 0.09975069, -2.80785907, -0.8559079, -2.6845838, -2.99408479, 5.70037227, -0.23674395, -2.61366992, 0.2677442, 3.23585988, -1.62301765],
    [-2.16479944, 1.72842271, 2.18397007, 0.54444074, -0.41417801, 2.00210011, 1.08576246, -1.97546656, 2.52037798, 1.65763446, -2.88867301, 3.73310784, 0.78281481, -2.32126845]
]

h1_h2_weights = [
    [8.67458041, -5.27804294, 5.83462201, -20.43433028, 17.24188817, 18.23829326, 5.35591552, 10.51509487, -9.1273044],
    [-17.32373067, 18.17520612, -1.34150004, 17.19329046, -17.74199433, -15.99170409, -9.31073843, -9.2908081, 14.35362486],
    [5.2843947, -0.71757108, 6.6239344, -20.02125961, 16.01517493, 17.86557641, 3.48908361, 7.55757965, -5.24750163],
    [-14.59277987, 17.29428561, -0.5003475, 12.16526511, -14.96341468, -12.91869762, -7.27799019, -6.4656133, 13.04926496],
    [5.8503572, -6.16049599, -4.43631975, -8.21980866, 8.22887139, 16.69894279, 1.17911154, 7.87468932, -12.96819138],
    [13.52203354, -20.66082543, 1.16690876, -13.7672814, 12.49462291, 16.01026328, 6.43302721, 4.56348477, -15.2380759],
    [-11.73238866, 9.53555925, -1.89082434, 13.58643967, -16.01508179, -13.97526953, -5.51560546, -9.35468979, 10.58992491],
    [-12.63232827, 10.89529658, -6.33362123, 20.96464446, -19.06043045, -14.37991464, -8.07040021, -4.21005953, 3.34529503],
    [7.11017843, -8.81021813, -0.37361308, -14.29762269, 14.39808425, 12.31967206, 3.3635555, 5.01045695, -9.05111866],
    [-4.8484806, 9.05036276, 2.52930578, 1.57428098, -4.32084008, -1.27899243, -3.27380657, 0.2712458, 0.47782956],
    [-0.77486745, -8.54788242, -5.44013951, 5.39469059, -3.70009698, 1.18083617, -1.04047524, -3.11373537, -4.33872957],
    [2.93792209, -0.60307425, 1.88719943, -8.86642404, 5.93562587, 6.87983759, 1.83717431, 2.25556549, -2.89639578],
    [-12.21511657, 7.60591765, -1.69577967, 20.69896663, -20.08666351, -18.10271039, -6.67642113, -11.74725999, 9.46611251],
    [-10.01182236, 4.95538565, -4.67631441, 19.1844779, -17.82026676, -12.53877679, -5.3965819, -5.97266516, 3.4058362]
]

h2_out_weights = [
    [-17.0165142, -13.4486551, -3.55136379, -2.90748329, 37.0649957],
    [7.9798775, 2.41200991, -4.74868442, -3.48916473, -1.61660169],
    [-7.4041459, -7.81822858, 0.998689176, 2.4160574, 10.6512422],
    [7.09587868, 4.45170098, 0.188676466, 0.487402955, -12.310259],
    [5.01292682, 7.2346464, 12.4366904, 1.50213155, -25.5040746],
    [18.6317384, 14.9077309, 5.88398535, -6.77580514, -32.561061],
    [0.00585064702, 2.54040256, 0.125207275, 0.348220119, -2.99556539],
    [-2.88936646, 2.14464445, -6.89806136, -3.5797399, 10.7445089],
    [-17.6486625, -6.33747105, 3.54200847, 5.96305149, 14.7853068]
]

In [12]:
warnings.filterwarnings("ignore", category=RuntimeWarning)

def sigmoid(num):
  return (1/(1+np.exp(-num)))

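# Example fire to classify.
# Caveats: these are raw values, whereas the network was trained on inputs rescaled
# to [0, 1] by `scaler`; the bias terms in mlp.intercepts_ are not applied; and
# MLPClassifier uses a softmax (not a sigmoid) on the output layer for multiclass
# problems. The result below will therefore not generally match mlp.predict.
size = 5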
latitude = 38.544907
longitude = -121.740517
temperature = 50.3
precipitation = 3.64

inputs = [size, latitude, longitude, temperature, precipitation]
hidden1 = [0]*14
hidden2 = [0]*9
outputs = [0]*5

for i in range(14):
  nodeInput = 0
  for j in range(5):
    nodeInput += inputs[j] * in_h1_weights[j][i]
  hidden1[i] = sigmoid(nodeInput)

for i in range(9):
  nodeInput = 0
  for j in range(14):
    nodeInput += hidden1[j] * h1_h2_weights[j][i]
  hidden2[i] = sigmoid(nodeInput)

for i in range(5):
  nodeInput = 0
  for j in range(9):
    nodeInput += hidden2[j] * h2_out_weights[j][i]
  outputs[i] = sigmoid(nodeInput)

print("The values in the output layer are:")
print(outputs)
# Pick the class whose output node has the largest value.
pickedIndex = int(np.argmax(outputs))

costClasses = ['less than $100', '$100 to $500', '$500 to $2,500', '$2,500 to $10,000', 'more than $10,000']
print(f"The fire is predicted to cost {costClasses[pickedIndex]}")

The values in the output layer are:
[0.00030304652126898366, 0.013253709834425196, 0.08844878242158305, 0.8308106451591845, 0.9999964632556898]
The fire is predicted to cost more than $10,000
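
For comparison, here is a sketch of a manual forward pass that reproduces `MLPClassifier.predict_proba`. It is fitted on synthetic data, so its numbers are not comparable to the output above. What it illustrates are the three ingredients scikit-learn's own prediction uses for a multiclass network with logistic hidden units: inputs transformed by the fitted scaler, the bias vectors stored in `intercepts_`, and a softmax on the output layer. The names `scaler_demo`, `model`, and `manual_forward` are hypothetical.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in data with differently scaled columns (not the wildfire dataset).
rng = np.random.default_rng(0)
X_raw = rng.random((300, 5)) * [1000.0, 10.0, 10.0, 40.0, 60.0]
y_demo = rng.integers(0, 5, size=300)

scaler_demo = MinMaxScaler().fit(X_raw)
model = MLPClassifier(solver="lbfgs", activation="logistic",
                      hidden_layer_sizes=(14, 9), max_iter=200, random_state=0)
model.fit(scaler_demo.transform(X_raw), y_demo)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def manual_forward(raw_row):
    # 1. Apply the same scaling the model was trained with.
    a = scaler_demo.transform(np.asarray(raw_row, dtype=float).reshape(1, -1))[0]
    layers = list(zip(model.coefs_, model.intercepts_))
    # 2. Hidden layers: weighted sum plus bias, then the logistic activation.
    for W, b in layers[:-1]:
        a = sigmoid(a @ W + b)
    # 3. Output layer: weighted sum plus bias, then softmax.
    W_out, b_out = layers[-1]
    return softmax(a @ W_out + b_out)

row = X_raw[0]
probs = manual_forward(row)
reference = model.predict_proba(scaler_demo.transform([row]))[0]

print("Manual probabilities :", probs)
print("Matches predict_proba:", np.allclose(probs, reference))
```

Because the softmax outputs sum to 1, they can be read as class probabilities, which is not true of independent sigmoid outputs.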
