# Kuiper Belt Object Classification
This Jupyter notebook is a complement to Smullen & Volk (2020). Please cite that paper if you use this notebook or any products in this repository.

Using short numerical simulations of observed Kuiper Belt Objects (KBOs), we have trained a Gradient Boosting Classifier to sort KBOs into four populations:
- Resonant: KBOs under the direct dynamical influence of Neptune
- Classical: Primordial KBOs from the formation of the Solar System
- Detached: Stable objects outside the classicals
- Scattering: KBOs with active orbital evolution

## Training the Classifier

This notebook requires, at minimum, NumPy, pandas, and scikit-learn. The helper functions used below are loaded from `simulate_and_parse.ipynb` with `%run`. Running simulations also requires the N-body integration package [Rebound](https://rebound.readthedocs.io/en/latest/), and the Arrokoth example below uses Astropy.
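Before running the notebook, it can help to check which of these packages are installed. The following is a minimal sketch using only the standard library's `importlib.util.find_spec`. The names `check_dependencies`, `REQUIRED`, and `OPTIONAL` are hypothetical, and the required/optional split is an assumption based on the paragraph above.

```python
import importlib.util

# Hypothetical grouping: required for training, optional for the simulation helpers
REQUIRED = ["numpy", "pandas", "sklearn"]
OPTIONAL = ["rebound", "astropy"]


def check_dependencies(required=REQUIRED, optional=OPTIONAL):
    """Map each top-level package name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None
            for name in list(required) + list(optional)}


status = check_dependencies()
for name, found in status.items():
    kind = "required" if name in REQUIRED else "optional"
    print(f"{name:8s} ({kind}): {'found' if found else 'MISSING'}")
```

`find_spec` only locates the package without importing it, so the check is fast and has no side effects.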

In [1]:
import numpy as np
import pandas as pd

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from sklearn.ensemble import GradientBoostingClassifier

%run simulate_and_parse.ipynb

Load the file that contains the features for the 2305 KBOs used in Smullen & Volk (2020).

In [2]:
all_KBOs = pd.read_csv('KBO_features.csv', skipinitialspace=True)
print(all_KBOs.columns)

Index(['MPC ID', 'Securely Classified', 'Class', 'a_i', 'a_f', 'a_min',
       'a_mean', 'a_max', 'a_sigma', 'a_delta', 'adot_min', 'adot_mean',
       'adot_max', 'adot_delta', 'e_i', 'e_f', 'e_min', 'e_mean', 'e_max',
       'e_sigma', 'e_delta', 'edot_min', 'edot_mean', 'edot_max', 'edot_delta',
       'i_i', 'i_f', 'i_min', 'i_mean', 'i_max', 'i_sigma', 'i_delta',
       'idot_min', 'idot_mean', 'idot_max', 'idot_delta', 'Om_i', 'Om_f',
       'Om_min', 'Om_mean', 'Om_max', 'Om_sigma', 'Om_delta', 'Omdot_min',
       'Omdot_mean', 'Omdot_max', 'Omdot_delta', 'o_i', 'o_f', 'o_min',
       'o_mean', 'o_max', 'o_sigma', 'o_delta', 'odot_min', 'odot_mean',
       'odot_max', 'odot_delta'],
      dtype='object')


Take only the securely classified objects to minimize "contamination" of the classifier.

In [3]:
secure_KBOs = all_KBOs[ all_KBOs['Securely Classified']==True ]

Make an array of integer labels for the classifier, along with dictionaries that map class names to labels (`types_dict`) and labels back to names (`int_dict`).

In [4]:
all_types = list( set(secure_KBOs['Class']) )
types_dict = { all_types[i] : i for i in range( len(all_types) ) }
int_dict = { i : all_types[i] for i in range( len(all_types) ) }
classes = secure_KBOs['Class'].map(types_dict)
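The order in which `list(set(...))` returns strings depends on Python's per-process hash randomization. As a result, the integer assigned to each class can change between sessions, which matters if you save the trained classifier and reload it later. Below is a minimal sketch of a deterministic alternative that sorts the class names first. The `class_column` Series is a hypothetical stand-in for `secure_KBOs['Class']`.

```python
import pandas as pd

# Hypothetical stand-in for secure_KBOs['Class']
class_column = pd.Series(["Resonant", "Classical", "Scattering", "Detached", "Resonant"])

# Sorting the unique names gives the same integer labels in every session
all_types = sorted(set(class_column))
types_dict = {name: i for i, name in enumerate(all_types)}
int_dict = {i: name for name, i in types_dict.items()}
classes = class_column.map(types_dict)

print(types_dict)
print(classes.tolist())
```

With this mapping, `print_probs` would list the classes alphabetically rather than in the order shown in this notebook's outputs.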

Split the data into training (70%) and testing (30%) sets, then separate the MPC IDs from the numerical features. The non-feature columns are dropped by reassignment rather than with `inplace=True`, which avoids pandas' `SettingWithCopyWarning` about views vs. copies.

In [5]:
features_train, features_test, classes_train, classes_test = train_test_split(secure_KBOs, classes, test_size=0.3, random_state=30)

# Columns that are identifiers or labels, not features
non_features = ['MPC ID', 'Securely Classified', 'Class']

ids_train = features_train['MPC ID'].to_numpy()
features_train = features_train.drop(non_features, axis=1).to_numpy()

ids_test = features_test['MPC ID'].to_numpy()
features_test = features_test.drop(non_features, axis=1).to_numpy()


Initialize and train the classifier using the tuned hyperparameters from Smullen & Volk (2020).

In [6]:
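# Hyperparameters from Smullen & Volk (2020). scikit-learn 1.1 renamed loss='deviance' to
# loss='log_loss', and 'deviance' was removed in 1.3; use 'log_loss' with newer versions.
classifier = GradientBoostingClassifier( learning_rate=0.1, loss='deviance', max_depth=3, max_features='log2', n_estimators=130, random_state=30 )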
classifier.fit(features_train, classes_train)

GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=3,
                           max_features='log2', max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=130,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=30, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
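If you do not have `KBO_features.csv`, the following self-contained sketch fits the same hyperparameters to synthetic data from scikit-learn's `make_classification`. The data are entirely artificial: 4 classes and 55 features, matching the 55 feature columns listed above. The sketch shows that `predict_proba` returns one column per class, ordered as in `clf.classes_`, which is the ordering the `print_probs` helper below relies on. `loss` is left at its default (multinomial log-loss), which avoids the `'deviance'`/`'log_loss'` naming difference between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the KBO feature table: 4 classes, 55 features
X, y = make_classification(n_samples=600, n_features=55, n_informative=10,
                           n_classes=4, random_state=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=30)

# Same hyperparameters as above, with `loss` left at its default
clf = GradientBoostingClassifier(learning_rate=0.1, max_depth=3, max_features='log2',
                                 n_estimators=130, random_state=30)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # one row per object, one column per class
print(proba.shape)
print(clf.classes_)
```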

Measure the classifier's accuracy on the held-out testing set of KBOs.

In [7]:
classes_predict = classifier.predict( features_test )
print('Classifier is ', accuracy_score(classes_test, classes_predict) * 100, '% accurate on testing set' )

Classifier is  98.8929889298893 % accurate on testing set
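A single accuracy figure can hide how well each population is recovered. The following is a minimal sketch of a per-class breakdown using scikit-learn's `confusion_matrix`. It uses small hypothetical label arrays in place of `classes_test` and `classes_predict`. With the real arrays, when all four labels occur, row and column `k` correspond to class `int_dict[k]`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels; substitute classes_test and classes_predict
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 3, 3, 0])

cm = confusion_matrix(y_true, y_pred)               # rows: true class, columns: predicted class
per_class_recall = cm.diagonal() / cm.sum(axis=1)   # fraction of each true class recovered

print("Overall accuracy:", accuracy_score(y_true, y_pred))
print(cm)
print(per_class_recall)
```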


## Helpful functions for using this classifier to investigate new KBOs
All default arguments below are for the resonant KBO K04VD0X (2004 VX130).

In [8]:
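def print_probs(probs, int_dict):
    '''
    Print the class-membership probabilities of a single object.

    probs    : output of classifier.predict_proba for one object, shape (1, n_classes)
    int_dict : maps integer labels to class names. Column i of probs corresponds to
               label i because predict_proba orders its columns by classifier.classes_,
               which here is [0, 1, ..., n_classes-1].
    '''
    print('This object has the following probabilities of class membership:')
    p = probs[0]
    for i, k in enumerate(int_dict.keys()):
        print(int_dict[k], ':', p[i]*100, '%')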
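A variation that may be easier to read lists the classes from most to least likely. The following is a sketch, not part of the original notebook. `print_probs_sorted` is a hypothetical name, and it takes the column labels explicitly (for example `classifier.classes_`) instead of assuming they are 0, 1, 2, 3. The probabilities in the example call are made up.

```python
import numpy as np


def print_probs_sorted(probs, int_dict, class_labels):
    '''
    Hypothetical variant of print_probs that lists classes from most to least likely.

    probs        : output of predict_proba for ONE object, shape (1, n_classes)
    int_dict     : maps integer labels to class names
    class_labels : integer label of each probability column (e.g. classifier.classes_)
    '''
    p = np.asarray(probs)[0]
    order = np.argsort(p)[::-1]  # column indices, highest probability first
    lines = [f"{int_dict[class_labels[j]]} : {p[j] * 100:.3f} %" for j in order]
    print('This object has the following probabilities of class membership:')
    print('\n'.join(lines))
    return lines


# Made-up probabilities for illustration
example_int_dict = {0: 'Resonant', 1: 'Detached', 2: 'Scattering', 3: 'Classical'}
lines = print_probs_sorted([[0.02, 0.90, 0.03, 0.05]], example_int_dict, [0, 1, 2, 3])
```

With the trained classifier, this would be called as `print_probs_sorted(prediction, int_dict, classifier.classes_)`.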

### Predicting class from file
We provide the example output file `K04VD0X_bf.follow`. `compute_from_file` reads the file named by the argument **`fname`**. The file must contain, at minimum, columns for time, semi-major axis, eccentricity, inclination, longitude of ascending node, and argument of pericenter. If the columns are in a different order, or if there are extra columns, pass the indices of the required columns as a list to **`col_order`**.
The function uses only the first 101 rows of the file. Each row is one simulation output, at times 0, 1E3, 2E3, ..., 99E3, 100E3 years.
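The following sketch shows what reading such a file might look like. It is not the notebook's `compute_from_file`. It assumes the file is plain whitespace-delimited numbers with no header, and that `col_order[k]` is the file column holding the k-th required quantity. Both the `.follow` layout and this interpretation of `col_order` are assumptions, and `load_follow` is a hypothetical name.

```python
import io
import numpy as np

N_OUTPUTS = 101  # outputs at t = 0, 1e3, ..., 1e5 years


def load_follow(source, col_order=(0, 1, 2, 3, 4, 5)):
    '''
    Hypothetical loader: return the first 101 rows of (t, a, e, i, Omega, omega).
    col_order[k] is the file column holding the k-th required quantity.
    '''
    data = np.loadtxt(source, ndmin=2)
    if data.shape[0] < N_OUTPUTS:
        raise ValueError(f"need at least {N_OUTPUTS} rows, found {data.shape[0]}")
    return data[:N_OUTPUTS, list(col_order)]


# Toy file: an extra leading column and 102 rows (the last row is ignored)
t = np.arange(102) * 1e3
toy = np.column_stack([np.zeros(102), t, np.full(102, 39.4), np.full(102, 0.2),
                       np.full(102, 5.0), np.full(102, 100.0), np.full(102, 200.0)])
buf = io.StringIO()
np.savetxt(buf, toy)
buf.seek(0)

elements = load_follow(buf, col_order=(1, 2, 3, 4, 5, 6))
print(elements.shape)
print(elements[-1, 0])
```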

In [9]:
new_features = compute_from_file(fname='K04VD0X_bf.follow') # Load the data and compute features
prediction = classifier.predict_proba(new_features) # Predict the probabilities of class membership for object
print_probs(prediction, int_dict) # Pretty output

Loaded K04VD0X_bf.follow

This object has the following probabilities of class membership:
Resonant : 99.6478632033365 %
Detached : 0.0024340049781761464 %
Scattering : 0.0025545213338736485 %
Classical : 0.34714827035142304 %


### Simulating KBOs
These simulations are run with Rebound using the `mercurius` integrator and an initial timestep of 0.1 years. Default values are for the object K04VD0X. Rebound queries the [JPL Horizons database](https://ssd.jpl.nasa.gov/horizons.cg) for the Sun and the giant planets, so an internet connection is required.

### Predicting class from orbital elements
This runs a Rebound N-body simulation of a KBO with a user-specified orbit. The inputs are **`epoch`**, given as a Julian Date, and the six orbital elements (**a, ecc, inc, Omega, omega, and M**). Semi-major axis is in AU, and i, $\Omega$, $\omega$, and M are in degrees. By default the elements are treated as heliocentric (the Horizons default). Pass _`barycentric=True`_ to use barycentric orbital elements instead.
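The angles above are given in degrees, but Rebound's `Simulation.add` expects orbital angles in radians. Its keyword arguments are `a`, `e`, `inc`, `Omega`, `omega`, and `M`, so a call might look like `sim.add(m=0., a=..., e=..., inc=..., Omega=..., omega=..., M=...)`. The following is a minimal sketch of that preparation step. Whether `compute_from_aei` does exactly this internally is an assumption, `elements_to_radians` is a hypothetical name, and the bound-orbit check is an illustrative choice.

```python
import math


def elements_to_radians(a, ecc, inc, Omega, omega, M):
    '''
    Hypothetical helper: check for a bound orbit and convert the angular elements
    from degrees to radians, returning keywords named as in rebound's Simulation.add.
    '''
    if a <= 0 or not 0 <= ecc < 1:
        raise ValueError("expected a bound orbit: a > 0 and 0 <= ecc < 1")
    inc, Omega, omega, M = (math.radians(x) for x in (inc, Omega, omega, M))
    return dict(a=a, e=ecc, inc=inc, Omega=Omega, omega=omega, M=M)


# The Arrokoth elements used in the example below
el = elements_to_radians(a=44.6694, ecc=0.05209, inc=2.4482,
                         Omega=159.0966, omega=181.4084, M=311.9376)
print(el)
```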

In [10]:
new_features = compute_from_aei() # Pass KBO orbit and epoch, run a simulation, and compute features
prediction = classifier.predict_proba(new_features) # Predict the probabilities of class membership for object
print_probs(prediction, int_dict) # Pretty output

Searching NASA Horizons for 'Sun'... Found: Sun (10).
Searching NASA Horizons for 'Jupiter'... Found: Jupiter Barycenter (5).
Searching NASA Horizons for 'Saturn'... Found: Saturn Barycenter (6).
Searching NASA Horizons for 'Uranus'... Found: Uranus Barycenter (7).
Searching NASA Horizons for 'Neptune'... Found: Neptune Barycenter (8).
---------------------------------
REBOUND version:     	3.12.1
REBOUND built on:    	Mar 14 2020 16:43:01
Number of particles: 	6
Selected integrator: 	mercurius
Simulation time:     	1.0000000000000000e+05
Current timestep:    	0.100000
---------------------------------
<rebound.Particle object, m=1.0 x=-0.001962833101500212 y=0.007656781240012263 z=2.183275309101057e-05 vx=-0.0028796061033907056 vy=0.000237028114926568 vz=7.393244200386153e-05>
<rebound.Particle object, m=0.0009547919152112404 x=-1.2585531676262454 y=-5.145873816680648 z=0.04714476975015494 vx=2.601659934084253 vy=-0.7180348559000759 vz=-0.05302404666678661>

Next is an example for 2014 MU69 (Arrokoth), using orbital elements at today's\* date. Today's JD can be obtained from `datetime` and Astropy, or you can use _`epoch=0`_.

\*The orbital elements were pulled from Horizons on 15 April 2020. The elements are tied to that date, so they may need updating in the future. Otherwise, the classifier will be somewhat less certain.
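If Astropy is not available, the Julian Date can also be computed with the standard library alone. This works because the Unix epoch (1970-01-01 00:00 UTC) is JD 2440587.5. The following is a minimal sketch that ignores leap seconds, which is negligible when choosing a simulation epoch. It also shows how many days have passed since the elements in the footnote were extracted. `julian_date` is a hypothetical name.

```python
from datetime import datetime, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian Date of 1970-01-01 00:00:00 UTC
SECONDS_PER_DAY = 86400.0


def julian_date(dt=None):
    '''Julian Date of a timezone-aware datetime; defaults to the current UTC time.
    Leap seconds are ignored.'''
    if dt is None:
        dt = datetime.now(timezone.utc)
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime, e.g. tzinfo=timezone.utc")
    return UNIX_EPOCH_JD + dt.timestamp() / SECONDS_PER_DAY


jd_now = julian_date()
# Days elapsed since the Arrokoth elements were extracted (15 April 2020)
days_since_elements = jd_now - julian_date(datetime(2020, 4, 15, tzinfo=timezone.utc))
print("Today's JD is", jd_now)
print("Days since elements were extracted:", days_since_elements)
```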

In [11]:
from astropy.time import Time
from datetime import datetime

# Convert the current date and time to a Julian Date (Astropy treats the naive datetime from datetime.now() as UTC)
today = Time(datetime.now()).jd
print("Today's JD is", today)

# Elements extracted 15 April 2020
new_features = compute_from_aei(epoch=today, a=44.6694, ecc=0.05209, inc=2.4482, Omega=159.0966, omega=181.4084, M=311.9376) # Pass KBO orbit and epoch, run a simulation, and compute features
prediction = classifier.predict_proba(new_features) # Predict the probabilities of class membership for object
print_probs(prediction, int_dict) # Pretty output

Today's JD is 2458959.9932914493
Searching NASA Horizons for 'Sun'... Found: Sun (10).
Searching NASA Horizons for 'Jupiter'... Found: Jupiter Barycenter (5).
Searching NASA Horizons for 'Saturn'... Found: Saturn Barycenter (6).
Searching NASA Horizons for 'Uranus'... Found: Uranus Barycenter (7).
Searching NASA Horizons for 'Neptune'... Found: Neptune Barycenter (8).
---------------------------------
REBOUND version:     	3.12.1
REBOUND built on:    	Mar 14 2020 16:43:01
Number of particles: 	6
Selected integrator: 	mercurius
Simulation time:     	1.0000000000000000e+05
Current timestep:    	0.100000
---------------------------------
<rebound.Particle object, m=1.0 x=-0.003471276570798096 y=0.0020692299852362443 z=2.2182464349303e-05 vx=-0.0005862360504089874 vy=-0.0019674428935580334 vz=2.0321571633905386e-05>
<rebound.Particle object, m=0.0009547919152112404 x=5.265844721679079 y=-0.8725861969921078 z=-0.10963063559769906 vx=0.37701416875580473 vy=2.6540851514290904 vz=-0.0185447376

### Predicting class from Horizons identifier
This runs a Rebound N-body simulation of the KBO with the JPL Horizons identifier **`objname`**, at an optional **`epoch`** that defaults to the time at which the simulation is run. Rebound will warn that the object's mass is 0. You can ignore this warning.
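The identifiers used here, such as K04VD0X, are Minor Planet Center packed provisional designations. The outputs below show that Horizons resolves K04VD0X to 2004 VX130 and K13EF4J to 2013 EJ154. The following is a minimal sketch for unpacking such identifiers, which can help when cross-checking objects. `unpack_provisional` is a hypothetical name. The sketch handles only 7-character packed provisional designations, not packed permanent numbers or survey designations.

```python
import string

CENTURY = {"I": 18, "J": 19, "K": 20}
# Packed cycle-count digits: 0-9, then A-Z for 10-35, then a-z for 36-61
CYCLE_DIGITS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def unpack_provisional(packed):
    '''Unpack a 7-character MPC packed provisional designation, e.g. 'K04VD0X' -> '2004 VX130'.'''
    if len(packed) != 7 or packed[0] not in CENTURY:
        raise ValueError(f"not a packed provisional designation: {packed!r}")
    year = CENTURY[packed[0]] * 100 + int(packed[1:3])
    half_month, second_letter = packed[3], packed[6]
    cycle = CYCLE_DIGITS.index(packed[4]) * 10 + int(packed[5])
    return f"{year} {half_month}{second_letter}{cycle if cycle else ''}"


for name in ["K04VD0X", "K13EF4J"]:
    print(name, "->", unpack_provisional(name))
```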

In [12]:
new_features = compute_from_jpl() # Use the default identifier and epoch, run a simulation, and compute features
prediction = classifier.predict_proba(new_features) # Predict the probabilities of class membership for object
print_probs(prediction, int_dict) # Pretty output

Searching NASA Horizons for 'Sun'... Found: Sun (10).
Searching NASA Horizons for 'Jupiter'... Found: Jupiter Barycenter (5).
Searching NASA Horizons for 'Saturn'... Found: Saturn Barycenter (6).
Searching NASA Horizons for 'Uranus'... Found: Uranus Barycenter (7).
Searching NASA Horizons for 'Neptune'... Found: Neptune Barycenter (8).
Searching NASA Horizons for 'NAME=K04VD0X'... Found: (2004 VX130).




---------------------------------
REBOUND version:     	3.12.1
REBOUND built on:    	Mar 14 2020 16:43:01
Number of particles: 	6
Selected integrator: 	mercurius
Simulation time:     	1.0000000000000000e+05
Current timestep:    	0.100000
---------------------------------
<rebound.Particle object, m=1.0 x=-0.001962833101500212 y=0.007656781240012263 z=2.183275309101057e-05 vx=-0.0028796061033907056 vy=0.000237028114926568 vz=7.393244200386153e-05>
<rebound.Particle object, m=0.0009547919152112404 x=-1.2585531676262454 y=-5.145873816680648 z=0.04714476975015494 vx=2.601659934084253 vy=-0.7180348559000759 vz=-0.05302404666678661>
<rebound.Particle object, m=0.0002858856727222417 x=7.503547575836432 y=-6.713111210727221 z=-0.21818183049830997 vx=1.3539882201902362 vy=1.3654584970477095 vz=-0.0741682373727811>
<rebound.Particle object, m=4.36624373583127e-05 x=12.224362025595719 y=15.216310406076719 z=-0.43947379221033056 vx=-1.0754830579656713 vy=0.912728070755838 vz=-0.005944009305805339>

Here's an example for the classical KBO K13EF4J.

In [13]:
new_features = compute_from_jpl(objname='K13EF4J') # Pass KBO identifier (epoch defaults to now), run a simulation, and compute features
prediction = classifier.predict_proba(new_features) # Predict the probabilities of class membership for object
print_probs(prediction, int_dict) # Pretty output

Searching NASA Horizons for 'Sun'... Found: Sun (10).
Searching NASA Horizons for 'Jupiter'... Found: Jupiter Barycenter (5).
Searching NASA Horizons for 'Saturn'... Found: Saturn Barycenter (6).
Searching NASA Horizons for 'Uranus'... Found: Uranus Barycenter (7).
Searching NASA Horizons for 'Neptune'... Found: Neptune Barycenter (8).
Searching NASA Horizons for 'NAME=K13EF4J'... Found: (2013 EJ154).




---------------------------------
REBOUND version:     	3.12.1
REBOUND built on:    	Mar 14 2020 16:43:01
Number of particles: 	6
Selected integrator: 	mercurius
Simulation time:     	1.0000000000000000e+05
Current timestep:    	0.100000
---------------------------------
<rebound.Particle object, m=1.0 x=-0.001962833101500212 y=0.007656781240012263 z=2.183275309101057e-05 vx=-0.0028796061033907056 vy=0.000237028114926568 vz=7.393244200386153e-05>
<rebound.Particle object, m=0.0009547919152112404 x=-1.2585531676262454 y=-5.145873816680648 z=0.04714476975015494 vx=2.601659934084253 vy=-0.7180348559000759 vz=-0.05302404666678661>
<rebound.Particle object, m=0.0002858856727222417 x=7.503547575836432 y=-6.713111210727221 z=-0.21818183049830997 vx=1.3539882201902362 vy=1.3654584970477095 vz=-0.0741682373727811>
<rebound.Particle object, m=4.36624373583127e-05 x=12.224362025595719 y=15.216310406076719 z=-0.43947379221033056 vx=-1.0754830579656713 vy=0.912728070755838 vz=-0.005944009305805339>