# Welcome  

Notebook Author: Samuel Alter  
Notebook Subject: Capstone Project - Metamodel

BrainStation Winter 2023: Data Science

This notebook is the final stage of the modeling pipeline for this capstone project. It combines the prediction results from the geoanalysis model and the image analysis model as inputs to a new `sklearn` `LogisticRegression` model, which produces the metamodel prediction.
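
As a minimal sketch of the stacking idea described above, the cell below fits a `LogisticRegression` on two probability columns. The data is synthetic, and the column names `geo_prob` and `img_prob`, the noise settings, and the split parameters are assumptions for illustration only, not this project's actual inputs or results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the two base-model outputs (hypothetical values):
# each column is a predicted probability of the "fire" class.
rng = np.random.default_rng(0)
n = 1000
y = np.repeat([0, 1], n // 2)  # 0 = nofire, 1 = fire
geo = np.clip(y * 0.7 + rng.normal(0.15, 0.20, n), 0, 1)
img = np.clip(y * 0.6 + rng.normal(0.20, 0.25, n), 0, 1)

X = pd.DataFrame({"geo_prob": geo, "img_prob": img})

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The metamodel learns how much weight to give each base model.
meta = LogisticRegression()
meta.fit(X_train, y_train)
accuracy = meta.score(X_test, y_test)
print(f"metamodel test accuracy: {accuracy:.3f}")
print("coefficients (geo_prob, img_prob):", meta.coef_[0])
```

The fitted coefficients give a rough sense of how strongly the metamodel leans on each base model's probability.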

# Imports

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import statsmodels.api as sm

from sklearn import svm
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Combine the datasets

The geoanalysis table was assembled by stacking the sources in the order `city`, `farm`, `fire1`, `fire2`, so the ground truth for the rows from each source is:
* `city=0` ("nofire")
* `farm=0` ("nofire")
* `fire1=1` ("fire")
* `fire2=1` ("fire")

The image analysis model, when predicting the class (fire/nofire), should have read in the fire photos first, which means the image analysis prediction table needs to be reordered (flipped) to match the geoanalysis table.

## Read in the geographic analysis prediction table:

In [3]:
geo_prob = pd.read_csv('/Users/sra/Files/projects/brainstation/brainstation_2023_ds_capstone/brainstation_2023_ds_capstone/01_capstone_notebooks/geo_prob.csv')
geo_prob

Unnamed: 0.1,Unnamed: 0,rownum,geo_prob
0,0,0,0.018438
1,1,1,0.022823
2,2,2,0.082352
3,3,3,0.022845
4,4,4,0.103090
...,...,...,...
19831,19831,19831,0.999999
19832,19832,19832,0.999992
19833,19833,19833,0.999990
19834,19834,19834,1.000000


The `nofire` data is listed first (rows 0-9917). Note how the predictions switch at row 9918:

In [6]:
geo_prob.iloc[0:9919,:]

Unnamed: 0.1,Unnamed: 0,rownum,geo_prob
0,0,0,0.018438
1,1,1,0.022823
2,2,2,0.082352
3,3,3,0.022845
4,4,4,0.103090
...,...,...,...
9914,9914,9914,0.000257
9915,9915,9915,0.000010
9916,9916,9916,0.000217
9917,9917,9917,0.000041
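
One hedged way to check where the predictions switch, rather than reading the table by eye, is to threshold the probabilities and look for rows where the predicted class changes. The sketch below uses a toy table with hypothetical values, and the 0.5 cutoff is an assumption. A misclassified row would also show up as a switch, so the result is a hint about the class boundary, not proof of it.

```python
import pandas as pd

# Toy stand-in for the geo_prob table (hypothetical values).
toy_geo = pd.DataFrame({"geo_prob": [0.02, 0.10, 0.03, 0.99, 0.97, 1.00]})

# Threshold at 0.5 (an assumed cutoff) to get a predicted class per row.
pred = (toy_geo["geo_prob"] >= 0.5).astype(int)

# A non-zero difference from the previous row marks a change in predicted class.
switch_rows = pred.index[pred.diff().fillna(0) != 0].tolist()
print(switch_rows)
```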


## Read in the image analysis prediction table:

In [4]:
img_prob = pd.read_csv('/Users/sra/Files/projects/brainstation/brainstation_2023_ds_capstone/brainstation_2023_ds_capstone/01_capstone_notebooks/img_prob_20230408_run02.csv')
img_prob

Unnamed: 0.1,Unnamed: 0,0,1
0,0,0.999983,0.000017
1,1,0.999996,0.000004
2,2,0.999978,0.000022
3,3,0.999705,0.000295
4,4,0.999916,0.000084
...,...,...,...
19831,19831,0.654506,0.345494
19832,19832,0.030230,0.969770
19833,19833,0.188880,0.811120
19834,19834,0.433703,0.566298


As discussed above, `TensorFlow` should have predicted the `fire` images first because that folder sorts alphabetically before `nofire` (`f` comes before `n`). This means the first half of this table needs to be moved to the bottom to match the geoanalysis table.
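
The reordering described here could look roughly like the sketch below, which swaps the two blocks of a toy table with `pd.concat`. The values, the `folder` column, and the split point `n_first` are hypothetical and only illustrate the mechanics. With the real table, the split point would have to equal the number of rows that came from the first folder.

```python
import pandas as pd

# Toy stand-in for the image prediction table (hypothetical values).
# `folder` is an illustrative column marking which folder each row came from.
toy_img = pd.DataFrame({
    "folder": ["fire", "fire", "fire", "nofire", "nofire", "nofire"],
    "0": [0.99, 0.98, 0.97, 0.10, 0.20, 0.05],
    "1": [0.01, 0.02, 0.03, 0.90, 0.80, 0.95],
})

n_first = 3  # assumed number of rows in the first block

# Move the first block to the bottom and rebuild a clean 0..n-1 index.
reordered = pd.concat(
    [toy_img.iloc[n_first:], toy_img.iloc[:n_first]],
    ignore_index=True,
)
print(reordered)
```

Using `ignore_index=True` matters here: without it the old index labels would travel with the rows, and a later index-aligned join against the geoanalysis table would silently undo the reordering.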

Viewing the first half plus one row of the `img_prob` table:

In [7]:
img_prob.iloc[0:9919,:]

Unnamed: 0.1,Unnamed: 0,0,1
0,0,0.999983,0.000017
1,1,0.999996,0.000004
2,2,0.999978,0.000022
3,3,0.999705,0.000295
4,4,0.999916,0.000084
...,...,...,...
9914,9914,0.999812,0.000188
9915,9915,0.999237,0.000763
9916,9916,0.999873,0.000127
9917,9917,0.999935,0.000065


Additionally, the ground truth needs to be added to the table so the metamodel has labels to train on. Because row order is preserved in the tables, the first half of the geoanalysis table is `nofire` (`0`) and the second half is `fire` (`1`).
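
A minimal sketch of adding a position-based label column is shown below on a toy table. The column name `fire` and the block size `n_nofire` are hypothetical. On the real table the block size would need to match the actual number of `nofire` rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the geoanalysis table (hypothetical values),
# with the nofire rows first and the fire rows second.
toy_geo = pd.DataFrame({"geo_prob": [0.02, 0.10, 0.03, 0.99, 0.97, 1.00]})

n_nofire = 3  # assumed number of nofire rows at the top of the table

# Label by row position: 0 for the first block, 1 for the rest.
toy_geo["fire"] = np.where(np.arange(len(toy_geo)) < n_nofire, 0, 1)
print(toy_geo)
```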