# Subhalo Random Forest Classification

## Apply Random Forest Classifier to DMO Subhalo Catalogs

This notebook applies a trained random forest classifier to identify subhalos in dark-matter-only (DMO) zoom-in simulations that are likely to be disrupted in hydrodynamic runs of the same simulations. After loading a DMO subhalo catalog with the appropriate features and a trained random forest classifier, the classifier outputs a disruption probability for each subhalo.

### Imports

In [1]:
import numpy as np
import pandas as pd
import pickle

### Load Data

The user should upload the DMO subhalo catalog they wish to classify. We provide an example corresponding to the code in $\texttt{train.ipynb}$.

In [8]:
Halo023_properties = np.loadtxt('Halo023_properties_example.txt')

Halo023_features = pd.DataFrame(Halo023_properties)
# Name the feature columns (index=str converts the row labels to strings)
Halo023_features = Halo023_features.rename(
    index=str,
    columns={0: "$d_{peri}$",
             1: "$a_{acc}$",
             2: "$V_{acc}$",
             3: "$M_{acc}$",
             4: "$a_{peri}$"})

Halo023_features

| | $d_{peri}$ | $a_{acc}$ | $V_{acc}$ | $M_{acc}$ | $a_{peri}$ |
|---|---|---|---|---|---|
| 0 | 170.616243 | 0.8155 | 54.483148 | 1.713112e+10 | 0.9383 |
| 1 | 157.557844 | 0.9383 | 52.855967 | 1.372161e+10 | 1.0000 |
| 2 | 192.665472 | 0.6910 | 55.918359 | 1.797514e+10 | 0.7849 |
| 3 | 89.381026 | 0.8916 | 41.694235 | 6.440465e+09 | 0.9748 |
| 4 | 50.607471 | 0.8473 | 34.527324 | 3.258255e+09 | 0.9383 |
| 5 | 177.577639 | 0.7750 | 28.082589 | 2.116738e+09 | 0.9030 |
| 6 | 154.628554 | 0.9383 | 25.385856 | 1.967154e+09 | 1.0000 |
| 7 | 134.723775 | 0.8365 | 27.433545 | 1.877738e+09 | 0.9264 |
| 8 | 176.935008 | 0.8473 | 27.854052 | 1.546815e+09 | 1.0000 |
| 9 | 38.838089 | 0.8473 | 32.086552 | 2.179413e+09 | 0.9146 |
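
Before classifying, a quick layout check on the loaded array can catch a catalog with missing columns or non-finite entries. The following is a minimal sketch rather than part of the original notebook: `EXPECTED_COLUMNS` and `check_catalog` are hypothetical names, and the two sample rows are copied from the example output above.

```python
import numpy as np

# Hypothetical helper (not from the original notebook).
# Column names mirror the five features listed in the example catalog.
EXPECTED_COLUMNS = ["d_peri", "a_acc", "V_acc", "M_acc", "a_peri"]


def check_catalog(properties):
    """Return the catalog as a 2-D float array after basic layout checks."""
    # np.atleast_2d turns a single-row catalog (1-D input) into shape (1, n).
    arr = np.atleast_2d(np.asarray(properties, dtype=float))
    if arr.shape[1] != len(EXPECTED_COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(EXPECTED_COLUMNS), arr.shape[1])
        )
    if not np.isfinite(arr).all():
        raise ValueError("catalog contains NaN or infinite values")
    return arr


# Two rows copied from the example catalog shown above.
rows = [
    [170.616243, 0.8155, 54.483148, 1.713112e10, 0.9383],
    [157.557844, 0.9383, 52.855967, 1.372161e10, 1.0000],
]
catalog = check_catalog(rows)
print(catalog.shape)
```

In the notebook itself, the same check could be applied to the array returned by `np.loadtxt` before it is wrapped in a `DataFrame`.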


In [10]:
# Load the finalized model; the context manager closes the file after reading
filename = 'finalized_rf.sav'
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

### Apply Random Forest Classifier

In [16]:
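# Survival and disruption probabilities.
# predict_proba columns follow loaded_model.classes_; column 0 is taken as survival, column 1 as disruption.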
prob = loaded_model.predict_proba(Halo023_features)
surv_prob = prob[:,0]
dest_prob = prob[:,1]
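
scikit-learn orders the columns of `predict_proba` to match the classifier's `classes_` attribute, so indexing by position assumes that the survival label comes first. A hedged sketch of looking the columns up by label instead is given below. Here `split_probabilities` is a hypothetical helper, `classes` and `prob` are illustrative stand-ins for `loaded_model.classes_` and the `predict_proba` output, and the encoding 0 = survive, 1 = disrupt is inferred from how the code above uses the two columns.

```python
import numpy as np


def split_probabilities(prob, classes, surv_label=0, dest_label=1):
    """Pick the survival and disruption columns of `prob` by class label."""
    classes = np.asarray(classes)
    # Position of each label within classes; raises IndexError if a label is absent.
    surv_col = np.flatnonzero(classes == surv_label)[0]
    dest_col = np.flatnonzero(classes == dest_label)[0]
    return prob[:, surv_col], prob[:, dest_col]


# Illustrative values only, not output from the trained model.
classes = np.array([0, 1])
prob = np.array([[0.90, 0.10],
                 [0.30, 0.70],
                 [0.55, 0.45]])

surv_prob, dest_prob = split_probabilities(prob, classes)
print(surv_prob, dest_prob)
```

With the real model, the call would presumably be `split_probabilities(loaded_model.predict_proba(Halo023_features), loaded_model.classes_)`.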

In [21]:
#Most likely classification (50% probability split)
pred = loaded_model.predict(Halo023_features)
Halo023_surviving_features = Halo023_features[pred==0]

In [23]:
print('Survival Fraction: %0.2f' % np.mean(pred == 0))

Survival Fraction: 0.78
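
The 50% split is one choice of cut; a stricter or looser threshold can be applied directly to the disruption probabilities. Below is a minimal sketch using made-up probabilities, not values from the example catalog. The function name `survival_fraction` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def survival_fraction(dest_prob, threshold=0.5):
    """Fraction of subhalos whose disruption probability does not exceed `threshold`."""
    dest_prob = np.asarray(dest_prob, dtype=float)
    # A subhalo is flagged as disrupted only if its probability is strictly above the cut.
    disrupted = dest_prob > threshold
    return float(np.mean(~disrupted))


# Illustrative disruption probabilities (assumed values).
dest_prob = np.array([0.10, 0.45, 0.50, 0.80])

print('Survival Fraction: %0.2f' % survival_fraction(dest_prob))
print('Survival Fraction: %0.2f' % survival_fraction(dest_prob, threshold=0.4))
```

Lowering the threshold flags more subhalos as disrupted, so the survival fraction can only decrease or stay the same.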
