# Pulsar detection 

## Introduction
### Background information

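A pulsar is a highly magnetized, rotating neutron star that emits beams of electromagnetic radiation. As the star rotates, these beams sweep past Earth and are detected as regular pulses. Pulsars are highly sought after by astrophysicists because they provide a rare opportunity to observe matter at nuclear density, albeit indirectly.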



### Question


Can we build an accurate classifier that distinguishes genuine pulsars from spurious pulsar candidates (for example, signals caused by radio-frequency interference or noise)?



### Dataset

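The dataset used is the [HTRU2 Data Set](https://archive.ics.uci.edu/ml/datasets/HTRU2), collected by Dr Robert Lyon, University of Manchester. It contains 17,898 pulsar candidates from the High Time Resolution Universe survey. Each candidate is described by eight continuous features (four statistics of the integrated pulse profile and four of the DM-SNR curve) and a binary class label.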

```r
############################### Preliminary exploratory data analysis ###############################

# loading dependencies
library(tidyverse)
library(tidymodels)

############################### 1 & 2 - showing that the data can be read and tidying the data

# loading the data (the raw file has no header row, so columns are read as X1..X9)
pulsarData <- read_csv("HTRU_2.csv", col_names = FALSE)

colnames(pulsarData) <- c("MeanIntegratedprofile",           # X1
                          "SdIntegratedProfile",             # X2
                          "ExcessKurtosisIntegratedProfile", # X3
                          "SkewnessIntegratedProfile",       # X4
                          "MeanDM_SNRcurve",                 # X5
                          "SdDM_SNRcurve",                   # X6
                          "ExcessKurtosisDM_SNRcurve",       # X7
                          "SkewnessDM_SNRcurve",             # X8
                          "Class")                           # X9

# Class is read as a number (0/1); convert it to a factor for classification
pulsarData <- mutate(pulsarData, Class = as.factor(Class))

############################### Dividing the data into training data and testing data

# 75% training / 25% testing, stratified by Class so both sets keep the same class ratio
pulsarSplit <- initial_split(pulsarData, prop = 0.75, strata = Class)
pulsarTrain <- training(pulsarSplit)
pulsarTesting <- testing(pulsarSplit)

############################### 3 - summarize the data in at least one table

# Chosen features: Mean, SD, ExcessKurtosis and Skewness of DM_SNRcurve (columns 5, 6, 7, 8 respectively)
# Note: mad() returns the median absolute deviation (scaled by 1.4826 by default), not the median
pulsarTrain %>% group_by(Class) %>% summarize(Count = n(),
                                              IQRMeanDM_SNRcurve = IQR(MeanDM_SNRcurve),
                                              MedianSdDM_SNRcurve = mad(SdDM_SNRcurve),
                                              MeanExcessKurtosisDM_SNRcurve = mean(ExcessKurtosisDM_SNRcurve),
                                              MeanSkewnessDM_SNRcurve = mean(SkewnessDM_SNRcurve))

############################### 4 - visualize the data (TO BE COMPLETED)

# Candidate scatterplots of pairs of DM-SNR features (columns 5 & 7, 5 & 6, 7 & 8)
# plot(select(pulsarTrain, 5, 7))
# plot(select(pulsarTrain, 5, 6))
# plot(select(pulsarTrain, 7, 8))
```


[1mRows: [22m[34m17898[39m [1mColumns: [22m[34m9[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[32mdbl[39m (9): X1, X2, X3, X4, X5, X6, X7, X8, X9

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


Class,Count,IQRMeanDM_SNRcurve,MedianSdDM_SNRcurve,MeanExcessKurtosisDM_SNRcurve,MeanSkewnessDM_SNRcurve
<fct>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
0,12172,2.394858,6.446277,8.860264,113.5122
1,1251,66.051839,19.612766,2.771868,18.456
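Section 4 of the exploratory code is still to be completed. One possible option, sketched here under the assumption that ggplot2 is available (it is attached by `library(tidyverse)`) and that `pulsarTrain` is the training set built above, is a scatterplot of two DM-SNR features coloured by `Class`, in the spirit of the commented-out `plot(select(...))` calls. The helper name `plot_feature_pair` is hypothetical.

```r
library(tidyverse)

# Hypothetical helper (not part of the original analysis): scatterplot of two
# columns of `data`, coloured by the Class factor (0 = non-pulsar, 1 = pulsar).
plot_feature_pair <- function(data, x, y) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }}, colour = Class)) +
    geom_point(alpha = 0.3) +   # transparency helps when many points overlap
    labs(colour = "Class (1 = pulsar)")
}

# Example usage (assumes pulsarTrain from the exploratory code):
# plot_feature_pair(pulsarTrain, MeanDM_SNRcurve, ExcessKurtosisDM_SNRcurve)
```

The same call with `SdDM_SNRcurve` or `SkewnessDM_SNRcurve` reproduces the other commented-out pairs.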


## Methods (TO BE COMPLETED)
### Explain how you will conduct your data analysis and which variables/columns you will use (MUST BE RECHECKED)
Classification will be used to determine whether a detected candidate signal can be categorized as a pulsar. The features will come from the DM-SNR curve, which gives the candidate's signal-to-noise ratio as a function of dispersion measure (DM). ***The columns to be used are the following: MeanDM_SNRcurve, SdDM_SNRcurve, ExcessKurtosisDM_SNRcurve, SkewnessDM_SNRcurve.***
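The proposal does not yet fix a classification algorithm. The following is a minimal sketch only. It assumes logistic regression (an illustrative choice, not necessarily the method that will be used), fitted with tidymodels on the four DM-SNR columns listed above. The helper `fit_pulsar_classifier` is hypothetical.

```r
library(tidymodels)

# Hypothetical helper: fits a classifier on the four DM-SNR features only.
# Logistic regression via the "glm" engine is an assumed, illustrative choice.
fit_pulsar_classifier <- function(train) {
  rec <- recipe(Class ~ MeanDM_SNRcurve + SdDM_SNRcurve +
                  ExcessKurtosisDM_SNRcurve + SkewnessDM_SNRcurve,
                data = train) %>%
    step_normalize(all_predictors())   # centre and scale each feature

  spec <- logistic_reg() %>%
    set_engine("glm") %>%
    set_mode("classification")

  workflow() %>%
    add_recipe(rec) %>%
    add_model(spec) %>%
    fit(data = train)
}
```

With the split from the exploratory code, `pulsarFit <- fit_pulsar_classifier(pulsarTrain)` would return a fitted workflow, and `predict(pulsarFit, pulsarTesting)` would return a tibble with a `.pred_class` column. Normalizing is not required for logistic regression. It keeps the recipe reusable, however, if a distance-based model such as k-nearest neighbours is swapped in.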

### Describe at least one way that you will visualize the results (TO BE COMPLETED)
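One way to summarize and visualize the results is a confusion matrix on the testing set. The training table above is imbalanced (12,172 non-pulsars against 1,251 pulsars). A model that always predicts class 0 would therefore already reach about 12172 / 13423 ≈ 90.7% accuracy, so precision and recall for class 1 are more informative than accuracy alone. The sketch below is a minimal version. It assumes yardstick (attached by tidymodels) and a data frame holding the true `Class` and a predicted `.pred_class` column, such as predictions bound to the testing data. The helper `evaluate_predictions` is hypothetical.

```r
library(tidymodels)

# Hypothetical helper: `results` must contain the true labels (Class) and the
# predictions (.pred_class), both factors with levels "0" and "1".
evaluate_predictions <- function(results) {
  confusion <- conf_mat(results, truth = Class, estimate = .pred_class)

  # Class "1" (pulsar) is the second factor level, so event_level = "second"
  # makes precision and recall refer to pulsars rather than non-pulsars.
  class_metrics <- metric_set(accuracy, precision, recall)
  scores <- class_metrics(results, truth = Class, estimate = .pred_class,
                          event_level = "second")

  list(confusion = confusion, scores = scores)
}
```

Calling `autoplot(evaluate_predictions(results)$confusion, type = "heatmap")` then draws the confusion matrix as a heatmap, which could serve as the visualization of the results.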


## Expected outcomes and significance:
### What impact could such findings have?
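An accurate classifier would help astronomers identify genuine pulsars among the large number of candidates produced by surveys, which benefits the study of physics. Because pulsars emit highly regular pulses, a larger catalogue of confirmed pulsars could also improve navigation through the universe: the radiation received from pulsars by a detector could later be used to calculate a precise position in space.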

### What future questions could this lead to?
It could contribute to space travel in the future: scientists could build on this data to improve the precise measurement of distances in outer space for exploration.