# Reproduce Staley (2016) Logistic Regression Results

Following the calculations in the USGS Emergency Debris Flow Assessment database (https://landslides.usgs.gov/hazards/postfire_debrisflow/), the peak 15-minute rainfall intensity I15 (mm/h) is converted into the rainfall accumulated over those 15 minutes, i.e. I15/4 (mm). This is the value stored in the column "Acc015_mm".
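
The conversion above can be sketched in a few lines. The function name and the example intensity are illustrative assumptions, not values taken from the USGS database:

```python
def intensity_to_accumulation(intensity_mm_per_h, duration_min):
    """Convert a mean rainfall intensity (mm/h) over a time window
    into the depth accumulated during that window (mm)."""
    return intensity_mm_per_h * duration_min / 60.0


# Hypothetical example: a peak 15-minute intensity of 24 mm/h
acc015_mm = intensity_to_accumulation(24.0, 15)
print(acc015_mm)  # 6.0, i.e. 24 / 4
```

For a 15-minute window the factor is 15/60 = 1/4, which is where the I15/4 rule comes from.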

In [1]:
import pandas as pd
pd.set_option("max_colwidth", None)

In [2]:
import sklearn
import numpy as np

In [5]:
xl=pd.ExcelFile("ofr20161106_appx-1.xlsx")
desc=xl.parse(xl.sheet_names[0])
modelData=xl.parse(xl.sheet_names[1])

In [6]:
modelData.columns

Index(['Fire Name', 'Year', 'Fire_ID', 'Fire_SegID', 'Database', 'State',
       'UTM_Zone', 'UTM_X', 'UTM_Y', 'Response', 'StormDate', 'GaugeDist_m',
       'StormStart', 'StormEnd', 'StormDur_H', 'StormAccum_mm',
       'StormAvgI_mm/h', 'Peak_I15_mm/h', 'Peak_I30_mm/h', 'Peak_I60_mm/h',
       'ContributingArea_km2', 'PropHM23', 'dNBR/1000', 'KF', 'Acc015_mm',
       'Acc030_mm', 'Acc060_mm'],
      dtype='object')

In [7]:
from sklearn.neighbors import KNeighborsClassifier, DistanceMetric
from sklearn.metrics import accuracy_score, f1_score, jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

In [8]:
def get_scores(trues, preds):
    scores=[func(trues, preds) for func in [accuracy_score, jaccard_score, f1_score]]
    return scores

def get_scoredf(TrTr, TrPr, TeTr, TePr):
    train_scores=get_scores(TrTr, TrPr)
    test_scores=get_scores(TeTr, TePr)
    
    scoredf=pd.DataFrame({"Training": train_scores, "Test": test_scores}, index=["Accuracy", "Jaccard", "F1"])
    return scoredf

In [10]:
usecols=["Acc015_mm",
         "PropHM23",
         "dNBR/1000",
         "KF",
         "Response"]

usecols = usecols + ["Database"]

In [11]:
cdata=modelData[usecols].copy()
cdata=cdata.dropna()
len(cdata)

1243

Adjust unrealistic $K_f$ values (above 0.64) by replacing them with the median. This step is commented out because it is unclear whether Staley (2016) applied it:

In [12]:
#mask=cdata["KF"] > 0.64
#cdata.loc[mask,"KF"] = cdata["KF"].median()

In [13]:
# Compute the interaction terms (predictor x 15-minute accumulation) as in Staley (2016)

cdata["PropHM23_x_i15"] = cdata["PropHM23"] * cdata["Acc015_mm"]
cdata["dNBR_x_i15"] = cdata["dNBR/1000"] * cdata["Acc015_mm"]
cdata["KF_x_i15"] = cdata["KF"] * cdata["Acc015_mm"]

In [14]:
usecols2=["PropHM23_x_i15","dNBR_x_i15", "KF_x_i15"]

In [15]:
trainX=cdata.query("Database == 'Training'")[usecols2]
trainY=cdata.query("Database == 'Training'")["Response"]

testX=cdata.query("Database == 'Test'")[usecols2]
testY=cdata.query("Database == 'Test'")["Response"]

In [16]:
clfl = LogisticRegression(random_state=0, penalty='l2').fit(trainX, trainY)
trainYp=clfl.predict(trainX)
testYp=clfl.predict(testX)

scoredf_lr=get_scoredf(trainY, trainYp, testY, testYp)
scoredf_lr

| | Training | Test |
|---|---|---|
| Accuracy | 0.833129 | 0.647196 |
| Jaccard | 0.411255 | 0.386179 |
| F1 | 0.582822 | 0.557185 |


In [17]:
clfl.coef_[0]

array([0.41588397, 0.65929292, 0.67975971])

In [18]:
params=pd.DataFrame({"Coefficient": clfl.coef_[0]}, index=clfl.feature_names_in_)
params.loc["Intercept", "Coefficient"] = clfl.intercept_[0]  # intercept_ is a length-1 array
params

| | Coefficient |
|---|---|
| PropHM23_x_i15 | 0.415884 |
| dNBR_x_i15 | 0.659293 |
| KF_x_i15 | 0.679760 |
| Intercept | -3.606339 |


The threat score (a.k.a. Jaccard score) of 0.386 on the test set agrees, after rounding, with the score of 0.39 reported by Staley.
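
For reference, the threat score (Jaccard score) of the positive class is TP / (TP + FP + FN), and F1 follows from it as 2J / (1 + J). The sketch below is a plain-Python illustration with hypothetical toy labels; the helper names are assumptions, not part of scikit-learn. The last two lines check the relation against the scores in the table above.

```python
def confusion_counts(trues, preds):
    """Count true positives, false positives and false negatives for class 1."""
    tp = sum(1 for t, p in zip(trues, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(trues, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(trues, preds) if t == 1 and p == 0)
    return tp, fp, fn


def threat_score(trues, preds):
    """Threat score = Jaccard score of the positive class.
    Assumes at least one true or predicted positive."""
    tp, fp, fn = confusion_counts(trues, preds)
    return tp / (tp + fp + fn)


def f1_from_threat_score(ts):
    """F1 expressed through the threat score: F1 = 2J / (1 + J)."""
    return 2 * ts / (1 + ts)


# Hypothetical toy labels, only to exercise the functions
trues = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(trues, preds))  # (3, 1, 1)

# Consistency check with the scores reported in this notebook
print(round(f1_from_threat_score(0.386179), 3))  # 0.557 (test set)
print(round(f1_from_threat_score(0.411255), 3))  # 0.583 (training set)
```

Because F1 is a monotonic function of the threat score, the two metrics always rank models the same way.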

In [19]:
clfl.intercept_, clfl.coef_

(array([-3.60633884]), array([[0.41588397, 0.65929292, 0.67975971]]))

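The intercept and the coefficients are similar, though not identical, to those reported by Staley: -3.63 for the intercept and 0.41, 0.67, and 0.70 for the PropHM23, dNBR/1000, and KF terms, respectively. One possible source of the difference is that scikit-learn's `LogisticRegression` applies L2 regularization with `C=1.0` by default, which may differ from the fitting procedure used in the report.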
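
To show how the fitted parameters are used, the sketch below evaluates the logistic model by hand with the rounded coefficients obtained in this notebook. The function name and the example basin values are hypothetical and serve only as an illustration.

```python
import math

# Coefficients fitted in this notebook (rounded), keyed by feature name
COEFS = {
    "PropHM23_x_i15": 0.415884,
    "dNBR_x_i15": 0.659293,
    "KF_x_i15": 0.679760,
}
INTERCEPT = -3.606339


def debris_flow_probability(prop_hm23, dnbr_1000, kf, acc015_mm):
    """Logistic probability from the three interaction terms.

    Each predictor is multiplied by the 15-minute accumulation (mm),
    mirroring the feature construction used for the fit."""
    x = (
        INTERCEPT
        + COEFS["PropHM23_x_i15"] * prop_hm23 * acc015_mm
        + COEFS["dNBR_x_i15"] * dnbr_1000 * acc015_mm
        + COEFS["KF_x_i15"] * kf * acc015_mm
    )
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical basin: the values below are illustrative, not from the dataset
p = debris_flow_probability(prop_hm23=0.5, dnbr_1000=0.4, kf=0.25, acc015_mm=6.0)
print(f"{p:.3f}")
```

`clfl.predict` labels a row as a debris flow when this probability exceeds 0.5, which is equivalent to the linear predictor being positive.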