# Classifying Pulsars from the High Time Resolution Universe Survey (HTRU2) - Model Analysis & Conclusions

## Overview & Citation

In this code notebook, we analyze the performance of five machine learning algorithms used to classify pulsars from the High Time Resolution Universe Survey, South (HTRU2). The five models were:
* Multiple Logistic Regression
* Decision Tree Classification
* Random Forest Classification
* Support Vector Machine (SVM) Classification
* Deep Neural Network (DNN) Classification

The dataset was retrieved from the UC Irvine Machine Learning Repository at the following link: https://archive.ics.uci.edu/ml/datasets/HTRU2#. The dataset was donated to the UCI Repository by Dr. Robert Lyon of The University of Manchester, United Kingdom. The two papers requested for citation in the description are listed below:

* R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, J. D. Knowles, Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach, Monthly Notices of the Royal Astronomical Society 459 (1), 1104-1123, DOI: 10.1093/mnras/stw656
* R. J. Lyon, HTRU2, DOI: 10.6084/m9.figshare.3080389.v1.

## Comparing the Classification Metrics

In [1]:
import numpy as np
import pandas as pd
pd.read_csv('2020_1127_Model_Comparisons_Data.csv')

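| Unnamed: 0 | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 0 | DNN | 0.98 | 0.90 | 0.87 | 0.88 |
| 1 | Random_Forest | 0.98 | 0.94 | 0.84 | 0.88 |
| 2 | Decision_Tree | 0.97 | 0.83 | 0.84 | 0.83 |
| 3 | SVM | 0.98 | 0.94 | 0.83 | 0.88 |
| 4 | Logistic | 0.98 | 0.94 | 0.81 | 0.87 |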
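
Since accuracy is nearly identical across the five models, it can help to order them by the metrics that actually differ. The following is a minimal sketch, not part of the original notebook: it assumes the values are retyped by hand from the comparison table rather than read from the CSV file, and the variable names `metrics` and `ranked` are illustrative. Sorting by recall first, with F1-score as a tie-breaker, follows the choice of metric explained in the analysis section below.

```python
import pandas as pd

# Values copied by hand from the comparison table (assumption: no CSV needed).
metrics = pd.DataFrame(
    {
        "Model": ["DNN", "Random_Forest", "Decision_Tree", "SVM", "Logistic"],
        "Accuracy": [0.98, 0.98, 0.97, 0.98, 0.98],
        "Precision": [0.90, 0.94, 0.83, 0.94, 0.94],
        "Recall": [0.87, 0.84, 0.84, 0.83, 0.81],
        "F1-Score": [0.88, 0.88, 0.83, 0.88, 0.87],
    }
)

# Rank by recall first, then break ties with the F1-score (both descending).
ranked = metrics.sort_values(
    by=["Recall", "F1-Score"], ascending=False
).reset_index(drop=True)

print(ranked[["Model", "Recall", "F1-Score"]])
```

With this ordering the DNN comes first, and the random forest is placed ahead of the decision tree because its F1-score is higher at the same recall.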


## Analysis & Conclusions

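The challenge of the dataset is to find the model that correctly identifies the most pulsars. Pulsars make up only about 10% of the dataset, so accuracy is not an appropriate statistic for comparing models: a classifier that labels every candidate as a non-pulsar would still score roughly 90% accuracy. Since we want to miss as few real pulsars as possible, our most important metric is recall, the fraction of actual pulsars that the model identifies. Precision, the fraction of instances classified as pulsars that actually *are* pulsars, and the F1-score, which combines the two, serve as secondary checks.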
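
To make the point about class imbalance concrete, here is a small sketch that computes the four metrics from confusion-matrix counts. The counts are hypothetical, chosen only to mimic a test set with about 10% pulsars; they are not the notebook's actual results, and the function name `classification_metrics` is an illustrative helper, not something from the original code.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 1,000 candidates, 100 of which are pulsars (10%).
# A classifier that never predicts "pulsar" finds no pulsars at all...
always_negative = classification_metrics(tp=0, fp=0, fn=100, tn=900)

# ...while a hypothetical classifier that finds 87 of the 100 pulsars
# with 10 false alarms is clearly more useful.
useful = classification_metrics(tp=87, fp=10, fn=13, tn=890)

for name, result in [("always negative", always_negative), ("useful", useful)]:
    print(name, {key: round(value, 2) for key, value in result.items()})
```

The always-negative classifier reaches 0.90 accuracy with a recall of 0, while the useful one has a recall of 0.87; the accuracy gap between them is small even though one of them finds no pulsars, which is why recall is the better basis for comparison here.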

**The deep neural network (DNN) model performed the best**, with a recall of 0.87, precision of 0.90, and F1-score of 0.88. The random forest model tied with the DNN in F1-score (0.88 to two decimal places) but traded recall for precision: its recall was lower at 0.84 and its precision higher at 0.94. The SVM matched the random forest's F1-score and precision with a slightly lower recall of 0.83. Both the DNN and the random forest should be considered for future testing, and further experiments on the DNN should be run to optimize its architecture.