## Analyzing & Predicting Validation Data
In this notebook we predict on and analyze the validation dataset prepared in the previous notebook, one step at a time. For the **H2O Driverless AI models** (Model 2 & Model 3), we simply use the GUI to obtain predictions. Alternatively, you can use the MOJO scoring pipeline to load the saved model files directly into Python and obtain predictions there.

- Read more about The Mojo Pipeline [here](http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/scoring-pipeline-cpp.html)
- You can find all the saved-model files [here](https://drive.google.com/drive/folders/1Xw8mD3RfcNjT89-mh-9ZgcDg1vyIurpn?usp=sharing)
- You can find all the csv files used in this notebook [here](https://drive.google.com/drive/folders/1Xw8mD3RfcNjT89-mh-9ZgcDg1vyIurpn?usp=sharing). You'll also find two folders with files that you can use for plotting :)


In [None]:
# Importing the required libraries
import pandas as pd
import numpy as np

We will import the `Model 1 Predictions`, analyze them, and then use the rows predicted as mineralized as the validation dataset for `Model 2`.

In [None]:
# Splitting Model 1 predictions into mineralized ('Y') and non-mineralized ('N') rows
mod1 = pd.read_csv('Model_1_Prediction.csv')
# 'VAL_TYPE.no' is the predicted probability that a location is NOT mineralized
mod1.loc[mod1['VAL_TYPE.no'] >= 0.5, 'Mineralized_predicted'] = 'N'
mod1.loc[mod1['VAL_TYPE.no'] < 0.5, 'Mineralized_predicted'] = 'Y'
mod1_y = mod1.loc[mod1['Mineralized_predicted'] == 'Y']
mod1_n = mod1.loc[mod1['Mineralized_predicted'] == 'N']
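
The thresholding step above can also be written as a single vectorized assignment. The sketch below runs on a small made-up frame; the helper name `label_mineralized` and the `toy` data are hypothetical and not part of the original notebook. Only the column name `VAL_TYPE.no` and the 0.5 cut-off come from the cell above.

```python
import numpy as np
import pandas as pd


def label_mineralized(df, prob_col="VAL_TYPE.no", threshold=0.5):
    """Return a copy of df with a 'Mineralized_predicted' column ('Y' or 'N').

    Hypothetical helper: prob_col is read as the probability of "not mineralized".
    """
    out = df.copy()
    # 'N' when the "not mineralized" probability reaches the threshold, else 'Y'.
    out["Mineralized_predicted"] = np.where(out[prob_col] >= threshold, "N", "Y")
    return out


# Made-up probabilities, for illustration only.
toy = pd.DataFrame({"VAL_TYPE.no": [0.10, 0.50, 0.93, 0.49]})
labelled = label_mineralized(toy)

# Same split as in the notebook cell: predicted mineralized vs. not mineralized.
toy_y = labelled[labelled["Mineralized_predicted"] == "Y"]
toy_n = labelled[labelled["Mineralized_predicted"] == "N"]
```

One difference worth keeping in mind: with `np.where`, a missing probability falls into the `'Y'` branch, whereas the two `.loc` assignments leave the label empty for such rows.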

In [None]:
# Saving the dataframes
mod1_y.to_csv('Validation_Model_2.csv')
mod1_n.to_csv('False_Predicted_Model_1.csv')
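
A note on the saved files: `to_csv` writes the DataFrame index as an extra, unnamed first column by default, and `read_csv` then loads it back as `Unnamed: 0`. Whether that matters for the Driverless AI upload is not something this notebook establishes, so treat the following as an optional sketch on made-up data showing what `index=False` changes.

```python
import io

import pandas as pd

# Made-up frame with a non-default index, standing in for a filtered DataFrame.
toy = pd.DataFrame({"X": [1.0, 2.0], "Y": [3.0, 4.0]}, index=[10, 42])

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# With index=False only the real columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```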

This `Validation_Model_2.csv` file is fed to Model 2 (Driverless AI) to obtain predictions. The predictions are saved as `Model_2_Prediction.csv`, which is imported below to create the validation file used to obtain predictions from **Model 3**.

In [2]:
!ls

 exported			      Model2.zip
 final_files			      mojo
 GeoChem_Data			      mojo.zip
 helper_files			      plots
 markdowns			      res
 Model2				      validation
'Model2_&_Model3_Predictions.ipynb'   Validation_Data_Preparation.ipynb
 Model_2_Prediction.csv		      whl_files


In [61]:
mod_2=pd.read_csv('Model_2_Prediction.csv')
mod_2

Unnamed: 0,X,Y,d8-9s,dem-9s,gravity_1V,gravity_ma,SA_TMI,SA_TMI_VRT,CHEM_CODE.Ag,CHEM_CODE.Al,...,CHEM_CODE.Tl,CHEM_CODE.U,CHEM_CODE.U3O8,CHEM_CODE.V,CHEM_CODE.V2O5,CHEM_CODE.W,CHEM_CODE.WO3,CHEM_CODE.Y,CHEM_CODE.Zn,CHEM_CODE.Zr
0,8.684265e+05,2.360996e+06,2.0,196.02760,-0.00054,-28.80190,336.23224,188.72067,0.026894,0.002476,...,0.000026,0.000815,0.000230,0.004244,0.000157,0.000298,6.658961e-06,0.000042,0.050463,0.000662
1,1.245028e+06,2.389749e+06,8.0,-14.90586,0.00054,-9.64707,-84.02585,-225.24809,0.047426,0.000194,...,0.004543,0.019149,0.000478,0.019978,0.000447,0.010092,1.399078e-03,0.000029,0.013349,0.007553
2,1.418145e+06,2.110826e+06,128.0,37.50758,-0.00047,-14.72168,549.13177,498.79565,0.043687,0.006918,...,0.001704,0.013115,0.003083,0.024879,0.000053,0.000081,6.377340e-05,0.000095,0.033036,0.000095
3,1.113103e+06,2.453488e+06,4.0,47.91328,0.00232,-7.99617,472.27765,67.87169,0.004451,0.000008,...,0.002340,0.010542,0.000049,0.003358,0.000002,0.000131,3.798435e-06,0.000007,0.242410,0.000404
4,5.252406e+05,2.062240e+06,8.0,91.11829,-0.00112,-44.91911,651.75275,597.80988,0.041143,0.002305,...,0.000366,0.006303,0.000530,0.005903,0.000112,0.005092,1.845190e-05,0.000064,0.001698,0.000287
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
341798,9.790181e+05,2.434524e+06,1.0,154.79810,0.00110,-23.56145,64.37791,-108.68669,0.015865,0.005088,...,0.000013,0.000581,0.000035,0.018201,0.000038,0.000073,5.991108e-06,0.000077,0.001022,0.000415
341799,1.026546e+06,2.405579e+06,64.0,117.94060,-0.00026,-13.13198,18.49036,-123.87996,0.052919,0.001173,...,0.000325,0.008174,0.001631,0.019146,0.001954,0.000063,1.032530e-05,0.000149,0.005633,0.000453
341800,1.431375e+06,2.026014e+06,32.0,95.54968,-0.00005,-23.62956,152.30115,326.13589,0.004612,0.000034,...,0.000150,0.000265,0.000137,0.000676,0.000022,0.000002,3.065187e-07,0.000006,0.003448,0.000017
341801,1.153703e+06,2.139232e+06,16.0,100.49280,-0.00070,-19.31586,26.37417,-182.64040,0.012210,0.000571,...,0.002924,0.009468,0.004732,0.024592,0.000116,0.000358,3.970596e-05,0.000076,0.004625,0.003431


In [34]:
# Selecting the chemistry probability columns (everything after the first 8 feature columns)
# .copy() gives an independent DataFrame, which avoids pandas' SettingWithCopyWarning below
mod_2_chem = mod_2.iloc[:, 8:].copy()

# Selecting only the desired columns
mod_2 = mod_2[['X','Y','d8-9s','dem-9s','gravity_1V','gravity_ma','SA_TMI','SA_TMI_VRT']].copy()

In [35]:
# For each row, find the CHEM_CODE.* column with the highest predicted value
mod_2_chem['CHEM_CODE'] = mod_2_chem.idxmax(axis=1)

In [43]:
# Add the name of the highest-valued column, stripping the 'CHEM_CODE.' prefix from every row
# regex=False treats the '.' literally instead of as a regex wildcard
mod_2['CHEM_CODE'] = mod_2_chem['CHEM_CODE'].str.replace('CHEM_CODE.', '', regex=False)


In [44]:
# Save Model 2's file to Validation_Model_3
mod_2.to_csv('Validation_Model_3.csv')

Unnamed: 0,X,Y,d8-9s,dem-9s,gravity_1V,gravity_ma,SA_TMI,SA_TMI_VRT,CHEM_CODE
0,8.684265e+05,2.360996e+06,2.0,196.02760,-0.00054,-28.80190,336.23224,188.72067,Ca
1,1.245028e+06,2.389749e+06,8.0,-14.90586,0.00054,-9.64707,-84.02585,-225.24809,Rb
2,1.418145e+06,2.110826e+06,128.0,37.50758,-0.00047,-14.72168,549.13177,498.79565,Nd
3,1.113103e+06,2.453488e+06,4.0,47.91328,0.00232,-7.99617,472.27765,67.87169,Cu
4,5.252406e+05,2.062240e+06,8.0,91.11829,-0.00112,-44.91911,651.75275,597.80988,Ca
...,...,...,...,...,...,...,...,...,...
341798,9.790181e+05,2.434524e+06,1.0,154.79810,0.00110,-23.56145,64.37791,-108.68669,Ca
341799,1.026546e+06,2.405579e+06,64.0,117.94060,-0.00026,-13.13198,18.49036,-123.87996,Ca
341800,1.431375e+06,2.026014e+06,32.0,95.54968,-0.00005,-23.62956,152.30115,326.13589,Fe
341801,1.153703e+06,2.139232e+06,16.0,100.49280,-0.00070,-19.31586,26.37417,-182.64040,Fe
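
The label-extraction step used for Model 2 (take the highest-valued `CHEM_CODE.*` column per row, then strip the prefix) can be checked on a small frame. The probabilities and the variable names below are made up for illustration; only the `CHEM_CODE.` prefix convention comes from the prediction file shown above.

```python
import pandas as pd

# Made-up per-element probabilities, one row per location.
probs = pd.DataFrame({
    "CHEM_CODE.Ca": [0.70, 0.10, 0.20],
    "CHEM_CODE.Cu": [0.20, 0.80, 0.30],
    "CHEM_CODE.Fe": [0.10, 0.10, 0.50],
})

prefix = "CHEM_CODE."

# Column name holding the row-wise maximum (the first one wins on ties).
top_col = probs.idxmax(axis=1)

# Strip the prefix literally; regex=False keeps '.' from acting as a wildcard.
chem_code = top_col.str.replace(prefix, "", regex=False)

# The winning probability itself, useful for judging how confident each label is.
top_prob = probs.max(axis=1)

print(pd.DataFrame({"CHEM_CODE": chem_code, "top_prob": top_prob}))
```

Keeping `top_prob` next to the label is optional, but it makes it easier to spot rows where the highest probability is still small.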


This `Validation_Model_3.csv` file is then used to obtain predictions from Model 3 (Driverless AI). These are the final predictions, which include:
- The complete list of Mineralized Locations with their co-ordinates.
- The mineral with the highest probability of being found in the location.
- The type of Mineral Deposit.

Although Model 3's predictions need no further processing, let's still use them to analyze the final outcomes.

In [71]:
mod3 = pd.read_csv('Model_3_Prediction.csv')

In [72]:
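# Get the UNIT_PPM_CLASS of each prediction: idxmax picks the class column with the
# highest value, then the 'UNIT_PPM_CLASS.' prefix is stripped (regex=False keeps '.' literal)
mod3['UNIT_PPM_CLASS'] = mod3[['UNIT_PPM_CLASS.HIGH','UNIT_PPM_CLASS.LOW','UNIT_PPM_CLASS.MED']].idxmax(axis=1).str.replace('UNIT_PPM_CLASS.', '', regex=False)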

In [74]:
mod3 = mod3[['X','Y','d8-9s','dem-9s','gravity_1V','gravity_ma','SA_TMI','SA_TMI_VRT','CHEM_CODE', 'UNIT_PPM_CLASS']]

In [76]:
mod3.head()

Unnamed: 0,X,Y,d8-9s,dem-9s,gravity_1V,gravity_ma,SA_TMI,SA_TMI_VRT,CHEM_CODE,UNIT_PPM_CLASS
0,868426.5,2360996.0,2.0,196.0276,-0.00054,-28.8019,336.23224,188.72067,Ca,HIGH
1,1245028.0,2389749.0,8.0,-14.90586,0.00054,-9.64707,-84.02585,-225.24809,Rb,MED
2,1418145.0,2110826.0,128.0,37.50758,-0.00047,-14.72168,549.13177,498.79565,Nd,LOW
3,1113103.0,2453488.0,4.0,47.91328,0.00232,-7.99617,472.27765,67.87169,Cu,LOW
4,525240.6,2062240.0,8.0,91.11829,-0.00112,-44.91911,651.75275,597.80988,Ca,HIGH


In [None]:
# Save the file to Final_Predictions.csv
mod3.to_csv('Final_Predictions.csv')
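
As a starting point for analyzing the final outcomes, the predictions can be summarized by class and by element. The sketch below is only an illustration: the `toy` frame copies the `CHEM_CODE` and `UNIT_PPM_CLASS` values of the five rows shown by `mod3.head()` above, and the variable names are hypothetical. On the real data you would pass `mod3` instead.

```python
import pandas as pd

# The five (CHEM_CODE, UNIT_PPM_CLASS) pairs displayed by mod3.head().
toy = pd.DataFrame({
    "CHEM_CODE": ["Ca", "Rb", "Nd", "Cu", "Ca"],
    "UNIT_PPM_CLASS": ["HIGH", "MED", "LOW", "LOW", "HIGH"],
})

# How many locations fall into each concentration class.
class_counts = toy["UNIT_PPM_CLASS"].value_counts()

# Element-by-class contingency table: rows are elements, columns are classes.
by_element = pd.crosstab(toy["CHEM_CODE"], toy["UNIT_PPM_CLASS"])

print(class_counts)
print(by_element)
```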