# Tree classifier comparison

Notebook to test out different tree-based classifiers and other classifiers on the team's "MLTable1".

The table consists of satellite detections (NASA FIRMS detections), treated as instances, which were merged with weather and soil data from the USDA SCAN network. This table was then merged with WFIGS records of actual wildfires, based on the initial coordinates and the date the fires started. Irrelevant columns were dropped and the table was cleaned of missing values.

Most FIRMS detections were not associated with a major fire. The ML activities are an attempt to train a model to predict whether a FIRMS detection will lead to a fire.

This will support the team's goal: **“Given that a FIRMS (Fire Information for Resource Management System) detection is found, how might we predict if the detection will turn into a wildfire by referencing historical WFIGS (Wildland Fire Interagency Geospatial Services) data?”**

## Sections of notebook:

#### Load the Data
#### Prepare the Data
#### ML Algorithms
- Random Forest
- Extra Trees Classifier 
- Bagging Classifier 
- Decision Trees

In [1]:
#general imports
import pandas as pd
import boto3
import geopy.distance  
import numpy as np
from sklearn.ensemble import BaggingClassifier, ExtraTreesClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

pd.set_option('display.max_columns', 200)
%matplotlib inline 

### Load the Data

In [2]:
#load in the csvs
#TODO For Team: enter the credentials below to run
S3_Key_id=''
S3_Secret_key=''

def pull_data(Key_id, Secret_key, file):
    """
    Function which CJ wrote to pull data from S3 
    """
    BUCKET_NAME = "gtown-wildfire-ds"
    OBJECT_KEY = file
    client = boto3.client(
        's3',
        aws_access_key_id= Key_id,
        aws_secret_access_key= Secret_key)
    obj = client.get_object(Bucket= BUCKET_NAME, Key= OBJECT_KEY) 
    file_df = pd.read_csv(obj['Body'])
    return (file_df)

#Pull in the firms and scan df
file = 'MLTable1.csv'
df = pull_data(S3_Key_id, S3_Secret_key, file)
df.head()

Unnamed: 0.1,Unnamed: 0,brightness,scan,track,confidence,bright_t31,frp,Precipitation Accumulation (in) Start of Day Values,Precipitation Increment (in),Air Temperature Average (degF),Soil Moisture Percent -2in (pct) Start of Day Values,Relative Humidity Enclosure (pct),Wind Speed Average (mph),nearbydetections,FIRE_DETECTED,1,Aqua,Terra,MODIS,Arkansas-White-Red Region,California Region,Great Basin Region,Great Lakes Region,Hawaii Region,Lower Colorado Region,Lower Mississippi Region,Mid Atlantic Region,Missouri Region,New England Region,Ohio Region,Pacific Northwest Region,Rio Grande Region,Souris-Red-Rainy Region,South Atlantic-Gulf Region,Tennessee Region,Texas-Gulf Region,Upper Colorado Region,Upper Mississippi Region
0,1,312.5,1.2,1.1,85,269.1,21.9,11.5,0.0,2.0,36.1,78.0,7.1,0.0,False,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,3,309.8,4.0,1.9,68,289.9,82.0,7.8,0.0,-8.0,4.9,66.0,1.8,5.0,False,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,4,312.8,4.0,1.9,80,287.1,103.7,11.3,0.0,14.0,26.9,79.0,10.5,5.0,False,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,5,310.2,4.0,1.9,70,287.3,83.6,8.4,0.0,11.0,8.0,88.0,4.2,5.0,False,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,10,300.4,1.0,1.0,28,281.6,6.3,12.8,0.0,15.0,33.1,88.0,3.6,23.0,False,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [3]:
#'Unnamed: 0' seems to be a column brought in that we don't want, so drop it
df = df.drop(['Unnamed: 0'], axis=1)
df.head()

Unnamed: 0,brightness,scan,track,confidence,bright_t31,frp,Precipitation Accumulation (in) Start of Day Values,Precipitation Increment (in),Air Temperature Average (degF),Soil Moisture Percent -2in (pct) Start of Day Values,Relative Humidity Enclosure (pct),Wind Speed Average (mph),nearbydetections,FIRE_DETECTED,1,Aqua,Terra,MODIS,Arkansas-White-Red Region,California Region,Great Basin Region,Great Lakes Region,Hawaii Region,Lower Colorado Region,Lower Mississippi Region,Mid Atlantic Region,Missouri Region,New England Region,Ohio Region,Pacific Northwest Region,Rio Grande Region,Souris-Red-Rainy Region,South Atlantic-Gulf Region,Tennessee Region,Texas-Gulf Region,Upper Colorado Region,Upper Mississippi Region
0,312.5,1.2,1.1,85,269.1,21.9,11.5,0.0,2.0,36.1,78.0,7.1,0.0,False,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,309.8,4.0,1.9,68,289.9,82.0,7.8,0.0,-8.0,4.9,66.0,1.8,5.0,False,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,312.8,4.0,1.9,80,287.1,103.7,11.3,0.0,14.0,26.9,79.0,10.5,5.0,False,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,310.2,4.0,1.9,70,287.3,83.6,8.4,0.0,11.0,8.0,88.0,4.2,5.0,False,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,300.4,1.0,1.0,28,281.6,6.3,12.8,0.0,15.0,33.1,88.0,3.6,23.0,False,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [4]:
df.shape

(1396691, 37)

In [5]:
# now get a sample of the df for quicker training
dftest = df.sample(frac=.4) #40% of df sampled 
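`sample` draws a different 40% each time the notebook is rerun, so results from a fresh session are not directly comparable with the outputs recorded here. A minimal sketch of a reproducible draw, assuming a fixed seed is acceptable for these experiments; the toy frame and the seed value 42 are illustrative and not part of the original notebook:

```python
import pandas as pd

# Toy frame standing in for the full MLTable1 dataframe (hypothetical values).
toy_df = pd.DataFrame({"frp": range(100), "FIRE_DETECTED": [False] * 95 + [True] * 5})

# Passing random_state makes the 40% sample identical across runs.
sample_a = toy_df.sample(frac=0.4, random_state=42)
sample_b = toy_df.sample(frac=0.4, random_state=42)

print(len(sample_a))               # 40 rows out of 100
print(sample_a.equals(sample_b))   # True: same rows in the same order
```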

## ML Preparation

In [6]:
#separate the data set into features and labels
X = dftest.drop('FIRE_DETECTED', axis=1)
y = dftest['FIRE_DETECTED']

In [7]:
#train test splitting of data
#common syntax here is to use X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) #random state is a random seed
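The positive class is rare (3,076 of 111,736 test rows in the reports later in this notebook), so a plain random split can shift the class ratio slightly between train and test. A hedged sketch of a stratified split on toy data; `stratify` is a standard `train_test_split` option, while the toy arrays and their 3% positive rate are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 1000 rows with 30 positives (3%), standing in for X and y above.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 4))
y_toy = np.array([1] * 30 + [0] * 970)

# stratify=y_toy keeps the positive fraction the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(y_tr.mean(), y_te.mean())  # positive fraction in each split
```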

In [8]:
#create our scaler to standardize the features
sc = StandardScaler() #runs the standard scaler with default settings. you can refine this, see docs
#fit the scaler on the training data only, then transform it
X_train = sc.fit_transform(X_train) 
#reuse the training mean and variance on the test data rather than refitting on it
X_test = sc.transform(X_test) 
X_train[:5] #take a look at the standardized data

array([[ 0.03516332, -0.48458704, -0.81114768, -0.37643271, -0.3622946 ,
        -0.17980909, -0.43296504, -0.16849641, -0.0392129 , -0.61675596,
        -0.60465439, -1.11080104,  1.68892493,  0.66197535, -0.43027169,
        -0.41757458, -0.66197535, -0.24374335,  1.26017099, -0.20361456,
        -0.08033604, -0.08424934, -0.18533399, -0.146414  , -0.11705362,
        -0.25099072, -0.01395331, -0.14075715, -0.31169126, -0.06660551,
        -0.05530782, -0.33974796, -0.07954229, -0.29866308, -0.17007131,
        -0.20904255],
       [ 0.40373887,  1.53462059,  1.53900449,  1.20585281,  2.05737352,
         0.02744151, -0.39219169, -0.16849641,  0.51969943,  0.13130631,
         0.1105918 ,  0.06181765, -0.43033223, -1.51063028, -0.43027169,
         2.3947818 ,  1.51063028,  4.10267598, -0.79354311, -0.20361456,
        -0.08033604, -0.08424934, -0.18533399, -0.146414  , -0.11705362,
        -0.25099072, -0.01395331, -0.14075715, -0.31169126, -0.06660551,
        -0.05530782, -0.33974
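Standardization statistics are normally learned from the training split and then reused unchanged on the test split, so that both splits share one feature scale. The sketch below checks that behavior on toy arrays (the array names and values are assumptions, not the notebook's data). Tree-based models split on thresholds, so they are usually far less sensitive to this step than distance-based or linear models.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=10.0, scale=2.0, size=(200, 3))
X_test_toy = rng.normal(loc=10.0, scale=2.0, size=(50, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_toy)  # learns mean and std from train
X_test_scaled = scaler.transform(X_test_toy)        # reuses the training statistics

# The test split is scaled with the training mean and standard deviation.
expected = (X_test_toy - X_train_toy.mean(axis=0)) / X_train_toy.std(axis=0)
print(np.allclose(X_test_scaled, expected))  # True
```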

## Begin Running ML Algorithms

In [9]:
def RunModel(model):
    """
    function to run a model on partial df and get the results
    """
    model.fit(X_train, y_train) #fits the model using training data
    pred = model.predict(X_test) #predict the test data now
    print('Report:')
    print(classification_report(y_test, pred))
    print('Confusion matrix:')
    print(confusion_matrix(y_test, pred))
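`RunModel` prints its results, which makes it hard to compare many runs side by side. One possible variation, sketched here under assumptions, returns the positive-class scores as a dictionary instead. The helper name `run_model_metrics`, the toy data from `make_classification`, and the two `max_depth` values are all hypothetical and not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def run_model_metrics(model, X_train, y_train, X_test, y_test):
    """Fit the model and return positive-class precision, recall and F1."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, pred, average="binary", zero_division=0
    )
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy imbalanced data (about 10% positives) standing in for the FIRMS table.
X_toy, y_toy = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Collect one result row per configuration instead of printing each report.
results = {
    f"max_depth={depth}": run_model_metrics(
        DecisionTreeClassifier(max_depth=depth, random_state=0), X_tr, y_tr, X_te, y_te
    )
    for depth in (3, 10)
}
for name, scores in results.items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

Passing the splits as arguments, rather than reading them from globals, also lets the same helper be reused on a different sample of the table.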

#### Random forest classifier

In [42]:
testmodel = RandomForestClassifier(n_estimators=200) #n_estimators is a hyperparameter: the number of trees in the forest
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.23      0.36      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.61      0.67    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108530    130]
 [  2380    696]]
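The 0.98 accuracy in this report is easy to over-read, because almost all test rows are negatives. As a worked check, the figures can be recomputed from the confusion matrix above and compared with a trivial model that always predicts `False`. The variable names are illustrative; the counts are copied from the output above.

```python
# Confusion matrix from the 200-tree random forest run above,
# laid out as [[TN, FP], [FN, TP]] (scikit-learn's convention).
tn, fp = 108530, 130
fn, tp = 2380, 696

total = tn + fp + fn + tp
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tn + tp) / total

# Accuracy of a trivial model that always predicts False.
baseline_accuracy = (tn + fp) / total

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.3f} baseline={baseline_accuracy:.3f}")
```

The forest beats the always-negative baseline by only about half a percentage point of accuracy, so recall and F1 for the `True` class are the more informative numbers when comparing the runs in this notebook.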


In [64]:
testmodel = RandomForestClassifier(n_estimators=100)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.85      0.22      0.35      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.61      0.67    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108539    121]
 [  2407    669]]


In [73]:
testmodel = RandomForestClassifier(n_estimators=50)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.83      0.22      0.35      3076

    accuracy                           0.98    111736
   macro avg       0.90      0.61      0.67    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108520    140]
 [  2398    678]]


In [74]:
testmodel = RandomForestClassifier(n_estimators=300)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.22      0.35      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.61      0.67    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108530    130]
 [  2388    688]]


In [75]:
testmodel = RandomForestClassifier(n_estimators=100, criterion = "entropy")
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.22      0.35      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.61      0.67    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108537    123]
 [  2406    670]]
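All of the forest runs above show high precision (about 0.84) but low recall (about 0.22) for the `True` class, whatever the number of trees or split criterion. One option the notebook does not try is lowering the decision threshold on `predict_proba`, which trades precision for recall. The sketch below runs on toy data; the 0.2 threshold is an arbitrary assumption and would need tuning on a validation split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy imbalanced data (about 5% positives), not the FIRMS table.
X_toy, y_toy = make_classification(
    n_samples=3000, n_features=10, weights=[0.95, 0.05], flip_y=0.02, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_tr, y_tr)

# Column 1 of predict_proba is the estimated probability of the positive class.
proba = forest.predict_proba(X_te)[:, 1]

scores = {}
for threshold in (0.5, 0.2):
    pred = (proba >= threshold).astype(int)
    scores[threshold] = {
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred),
    }
    print(threshold, scores[threshold])
```

Lowering the threshold can only add positive predictions, so recall never decreases; whether the loss in precision is acceptable depends on how costly a missed fire is relative to a false alarm.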


#### Extra Trees Classifier

In [36]:
testmodel = ExtraTreesClassifier(n_estimators=100)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.31      0.45      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.65      0.72    111736
weighted avg       0.98      0.98      0.97    111736

Confusion matrix:
[[108479    181]
 [  2129    947]]


In [37]:
testmodel = ExtraTreesClassifier(n_estimators=200)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.85      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108484    176]
 [  2091    985]]


In [38]:
testmodel = ExtraTreesClassifier(n_estimators=50)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.72    111736
weighted avg       0.98      0.98      0.97    111736

Confusion matrix:
[[108477    183]
 [  2105    971]]


In [39]:
testmodel = ExtraTreesClassifier(n_estimators=200, criterion="entropy")
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108472    188]
 [  2088    988]]


In [40]:
testmodel = ExtraTreesClassifier(n_estimators=300)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108477    183]
 [  2094    982]]


In [41]:
testmodel = ExtraTreesClassifier(n_estimators=300, criterion="entropy")
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.47      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108471    189]
 [  2083    993]]


In [55]:
testmodel = ExtraTreesClassifier(criterion="entropy", max_depth=50)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.31      0.45      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.65      0.72    111736
weighted avg       0.98      0.98      0.97    111736

Confusion matrix:
[[108475    185]
 [  2123    953]]


In [56]:
testmodel = ExtraTreesClassifier(criterion="entropy", max_depth=200)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108466    194]
 [  2086    990]]


In [57]:
testmodel = ExtraTreesClassifier(criterion="entropy", max_depth=1000)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.47      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108470    190]
 [  2085    991]]


In [58]:
testmodel = ExtraTreesClassifier(max_depth=1000)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.72    111736
weighted avg       0.98      0.98      0.97    111736

Confusion matrix:
[[108476    184]
 [  2105    971]]


In [59]:
testmodel = ExtraTreesClassifier(criterion="entropy", max_depth=500)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.73    111736
weighted avg       0.98      0.98      0.98    111736

Confusion matrix:
[[108479    181]
 [  2090    986]]


In [60]:
testmodel = ExtraTreesClassifier(criterion="entropy", max_depth=1500)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.84      0.32      0.46      3076

    accuracy                           0.98    111736
   macro avg       0.91      0.66      0.72    111736
weighted avg       0.98      0.98      0.97    111736

Confusion matrix:
[[108481    179]
 [  2106    970]]
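The Extra Trees runs above vary `n_estimators`, `criterion` and `max_depth` one cell at a time, and the resulting scores differ by amounts that unseeded randomness alone might explain. A hedged sketch of the same sweep as a seeded, cross-validated grid search follows. The toy data, the grid values and the choice of `scoring="f1"` are assumptions for illustration, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Toy imbalanced data (about 10% positives), not the FIRMS table.
X_toy, y_toy = make_classification(
    n_samples=1500, n_features=10, weights=[0.9, 0.1], random_state=0
)

param_grid = {
    "n_estimators": [50, 100],
    "criterion": ["gini", "entropy"],
    "max_depth": [10, None],
}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=0),  # fixed seed so runs differ only by parameters
    param_grid,
    scoring="f1",  # F1 of the positive class, since accuracy is dominated by negatives
    cv=3,
    n_jobs=1,
)
search.fit(X_toy, y_toy)

print(search.best_params_)
print(round(search.best_score_, 3))
```

On the full table, a search like this would be considerably slower than the single fits above, so a smaller grid or a subsample may be needed.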


#### Bagging Classifier 

In [61]:
testmodel = BaggingClassifier()
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.68      0.22      0.34      3076

    accuracy                           0.98    111736
   macro avg       0.83      0.61      0.66    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108332    328]
 [  2385    691]]


In [62]:
testmodel = BaggingClassifier(n_estimators = 20)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.74      0.24      0.36      3076

    accuracy                           0.98    111736
   macro avg       0.86      0.62      0.67    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108403    257]
 [  2346    730]]


In [63]:
testmodel = BaggingClassifier(n_estimators = 50)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.78      0.25      0.38      3076

    accuracy                           0.98    111736
   macro avg       0.88      0.62      0.68    111736
weighted avg       0.97      0.98      0.97    111736

Confusion matrix:
[[108437    223]
 [  2307    769]]


#### Decision Tree Classifier 

In [11]:
testmodel = DecisionTreeClassifier()
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.25      0.35      0.29      3076

    accuracy                           0.95    111736
   macro avg       0.62      0.66      0.64    111736
weighted avg       0.96      0.95      0.96    111736

Confusion matrix:
[[105432   3228]
 [  1987   1089]]


In [12]:
testmodel = DecisionTreeClassifier(criterion = "entropy")
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.27      0.35      0.30      3076

    accuracy                           0.96    111736
   macro avg       0.62      0.66      0.64    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105745   2915]
 [  2008   1068]]


In [13]:
testmodel = DecisionTreeClassifier(splitter="random")
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.28      0.39      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.68      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105541   3119]
 [  1866   1210]]


In [14]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy")
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.29      0.37      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.64      0.67      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105905   2755]
 [  1924   1152]]


In [15]:
testmodel = DecisionTreeClassifier(splitter="random", min_samples_split=3)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.33      0.36      0.34      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.67      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106380   2280]
 [  1965   1111]]


In [16]:
testmodel = DecisionTreeClassifier(splitter="random", min_samples_split=4)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.30      0.36      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.64      0.67      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106125   2535]
 [  1965   1111]]


In [17]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", min_samples_split=3)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.32      0.36      0.34      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.67      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106334   2326]
 [  1960   1116]]


In [18]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", min_samples_split=10)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.99      0.98    108660
        True       0.35      0.29      0.32      3076

    accuracy                           0.97    111736
   macro avg       0.67      0.64      0.65    111736
weighted avg       0.96      0.97      0.96    111736

Confusion matrix:
[[107049   1611]
 [  2192    884]]


In [19]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", min_samples_split=20)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.99      0.98    108660
        True       0.39      0.25      0.31      3076

    accuracy                           0.97    111736
   macro avg       0.68      0.62      0.64    111736
weighted avg       0.96      0.97      0.97    111736

Confusion matrix:
[[107444   1216]
 [  2303    773]]


In [20]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=40)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.29      0.38      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.68      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105733   2927]
 [  1907   1169]]


In [21]:
testmodel = DecisionTreeClassifier(splitter="random", max_depth=10)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.97      1.00      0.99    108660
        True       0.43      0.03      0.05      3076

    accuracy                           0.97    111736
   macro avg       0.70      0.51      0.52    111736
weighted avg       0.96      0.97      0.96    111736

Confusion matrix:
[[108546    114]
 [  2991     85]]


In [22]:
testmodel = DecisionTreeClassifier(splitter="random", max_depth=70)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.28      0.37      0.32      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.67      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105807   2853]
 [  1947   1129]]


In [23]:
testmodel = DecisionTreeClassifier(splitter="random", max_depth=5)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.97      1.00      0.99    108660
        True       1.00      0.00      0.00      3076

    accuracy                           0.97    111736
   macro avg       0.99      0.50      0.49    111736
weighted avg       0.97      0.97      0.96    111736

Confusion matrix:
[[108660      0]
 [  3075      1]]


In [24]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=10)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.97      1.00      0.99    108660
        True       0.50      0.03      0.06      3076

    accuracy                           0.97    111736
   macro avg       0.74      0.52      0.52    111736
weighted avg       0.96      0.97      0.96    111736

Confusion matrix:
[[108556    104]
 [  2971    105]]


In [25]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=40)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.28      0.35      0.31      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.66      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105896   2764]
 [  1986   1090]]


In [26]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=70)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.32      0.40      0.35      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.69      0.67    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105983   2677]
 [  1840   1236]]


In [35]:
testmodel = DecisionTreeClassifier(splitter="random", max_depth=70)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.29      0.38      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.68      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105729   2931]
 [  1898   1178]]


In [32]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=150)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.27      0.35      0.31      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.66      0.64    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105853   2807]
 [  2012   1064]]


In [31]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1500)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.31      0.40      0.35      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.69      0.67    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105959   2701]
 [  1846   1230]]


In [44]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1200)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.29      0.37      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.64      0.67      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105868   2792]
 [  1937   1139]]


In [45]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1700)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.33      0.41      0.37      3076

    accuracy                           0.96    111736
   macro avg       0.66      0.69      0.67    111736
weighted avg       0.97      0.96      0.96    111736

Confusion matrix:
[[106080   2580]
 [  1810   1266]]


In [46]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1500, class_weight={True: 5, False: 1})
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.29      0.37      0.32      3076

    accuracy                           0.96    111736
   macro avg       0.64      0.67      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105913   2747]
 [  1949   1127]]


In [47]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1500, class_weight={True: 10, False: 1})
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.34      0.40      0.37      3076

    accuracy                           0.96    111736
   macro avg       0.66      0.69      0.67    111736
weighted avg       0.97      0.96      0.96    111736

Confusion matrix:
[[106250   2410]
 [  1839   1237]]


In [48]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1700, class_weight={True: 15, False: 1})
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.32      0.37      0.34      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.67      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106250   2410]
 [  1942   1134]]


In [49]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1200, class_weight={True: 15, False: 1})
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.31      0.35      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.66      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106330   2330]
 [  2005   1071]]
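The `class_weight` runs above try weights of 5, 10 and 15 for the `True` class. For reference, scikit-learn's `"balanced"` heuristic sets each weight to `n_samples / (n_classes * class_count)`. The sketch below computes what that would give for a class ratio like the one in the test reports. It assumes, without verifying, that the training split has a similar ratio, and it makes no claim that these weights would improve the scores.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label counts taken from the test-set support in the reports above
# (0 = False, 1 = True); the training split is assumed to be similar.
y_like = np.array([0] * 108660 + [1] * 3076)

classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_like)
weight_by_class = dict(zip(classes.tolist(), weights.tolist()))

print(weight_by_class)
# Relative weight of the positive class compared with the negative class.
print(round(weight_by_class[1] / weight_by_class[0], 1))  # 35.3
```

The same effect can be requested directly with `class_weight="balanced"` on `DecisionTreeClassifier` and the forest classifiers.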


In [51]:
testmodel = DecisionTreeClassifier(splitter="random", max_depth=1700)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.31      0.40      0.35      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.69      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105927   2733]
 [  1851   1225]]


In [52]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1700, min_samples_split=3)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.31      0.35      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.67      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106252   2408]
 [  1988   1088]]


In [53]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1700, min_samples_split=4)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.32      0.36      0.34      3076

    accuracy                           0.96    111736
   macro avg       0.65      0.67      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106340   2320]
 [  1977   1099]]


In [54]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1700, min_samples_split=10)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.99      0.98    108660
        True       0.37      0.30      0.33      3076

    accuracy                           0.97    111736
   macro avg       0.68      0.64      0.66    111736
weighted avg       0.96      0.97      0.97    111736

Confusion matrix:
[[107116   1544]
 [  2163    913]]


In [33]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=2500)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.97      0.98    108660
        True       0.28      0.36      0.32      3076

    accuracy                           0.96    111736
   macro avg       0.63      0.67      0.65    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[105855   2805]
 [  1960   1116]]


In [34]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", max_depth=1000)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.30      0.37      0.33      3076

    accuracy                           0.96    111736
   macro avg       0.64      0.67      0.66    111736
weighted avg       0.96      0.96      0.96    111736

Confusion matrix:
[[106008   2652]
 [  1933   1143]]


In [27]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", min_samples_leaf=2)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      0.98      0.98    108660
        True       0.35      0.31      0.33      3076

    accuracy                           0.97    111736
   macro avg       0.66      0.65      0.65    111736
weighted avg       0.96      0.97      0.96    111736

Confusion matrix:
[[106893   1767]
 [  2129    947]]


In [28]:
testmodel = DecisionTreeClassifier(splitter="random", criterion = "entropy", min_samples_leaf=10)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.51      0.16      0.24      3076

    accuracy                           0.97    111736
   macro avg       0.74      0.58      0.61    111736
weighted avg       0.96      0.97      0.97    111736

Confusion matrix:
[[108195    465]
 [  2587    489]]


In [29]:
testmodel = DecisionTreeClassifier(splitter="random", min_samples_leaf=10)
RunModel(testmodel)

Report:
              precision    recall  f1-score   support

       False       0.98      1.00      0.99    108660
        True       0.46      0.14      0.22      3076

    accuracy                           0.97    111736
   macro avg       0.72      0.57      0.60    111736
weighted avg       0.96      0.97      0.96    111736

Confusion matrix:
[[108151    509]
 [  2637    439]]
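To compare the model families at a glance, the positive-class scores can be recomputed from a few of the confusion matrices printed above and ranked by F1. The four runs chosen here are a hand-picked selection rather than an exhaustive summary, and the labels in the dictionary are informal descriptions of those cells.

```python
# (false positives, true positives) read from the confusion matrices above;
# every run shares the same 3076 actual positives in the test split.
ACTUAL_POSITIVES = 3076
runs = {
    "RandomForest(n_estimators=200)": (130, 696),
    "ExtraTrees(n_estimators=300, entropy)": (189, 993),
    "Bagging(n_estimators=50)": (223, 769),
    "DecisionTree(defaults)": (3228, 1089),
}

def summarize(fp, tp, actual_positives=ACTUAL_POSITIVES):
    """Return (precision, recall, f1) for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / actual_positives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

summary = {name: summarize(fp, tp) for name, (fp, tp) in runs.items()}
ranked = sorted(summary, key=lambda name: summary[name][2], reverse=True)

for name in ranked:
    p, r, f1 = summary[name]
    print(f"{name:40s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Among these four runs, Extra Trees has the highest F1 for the `True` class, while the single decision tree reaches slightly higher recall at the cost of far more false positives.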
