## Predicting Pulsars: Mean, Standard Deviation and Kurtosis Analysis

**Logan Chan, Sam Donato, Navneet Bedi, Ahadjon Sultonov**

### Introduction

Pulsars were first discovered in 1967 by astronomers Jocelyn Bell Burnell and Antony Hewish (American Physical Society, 2006). 

The pulsar is a highly magnetized rotating neutron star which cause them to emit beams of radiation, that provide the first indirect evidence for the existence of gravitational waves. Pulsar stars also have the potential to reveal extreme phenomena in neutron star astrophysics (Zhang et al., 2020). 

Thus, in effect, pulsars can be thought of as 'cosmic lighthouses.'

These beams can appear to pulse as the star rotates, but other astronomical phenomena in space can mimic these pulsar signals, which we call spurious signals. The spurious signals can be challenging to identify and seperate from pulsar signals (Gaskill, 2020). 

The goal of this project will be to use variables from the UC Irvine Machine Learning Repository Pulsar Star Dataset to classify whether a star is pulsar or not.

The question we will be addressing is: **Given the mean, standard deviation, and excess kurtosis of the integrated profile, can we predict if a star is pulsar or if it is a spurious signal?**

### Preliminary Data Analysis

In [18]:
## Load libraries
library(tidyverse)
library(repr)
library(tidymodels)

set.seed(18)

## Download dataset from the internet
url <- "https://raw.githubusercontent.com/loganchan26/DSCI-100-project-group10-18/03a96760928865bfb961dc8ea308d3b129a82baf/HTRU_2.csv"
download.file(url, "data/pulsar_data.csv")
pulsar_data <- read_csv("data/pulsar_data.csv")

## rename columns and select columns of interest (no more tidying needed?)
colnames(pulsar_data) <- c("mean_ip", "std_dev_ip", "kurtosis_ip", "skew_ip", "mean_curve", "std_dev_curve", "kurtosis_curve", "skew_curve", "type") 
pulsar_data_selected <- pulsar_data |> select("mean_ip", "std_dev_ip", "kurtosis_ip", "type")
#glimpse(pulsar_data_selected)

## split set into training and testing
pulsar_split <- initial_split(pulsar_data_selected, prop = 0.75, strata = type)
pulsar_training <- training(pulsar_split)
pulsar_testing  <- testing(pulsar_split)

##summarize data into table(s) - mean of each obs type

## visualize data into scatterplot - distribution of each variable


[1mRows: [22m[34m17897[39m [1mColumns: [22m[34m9[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[32mdbl[39m (9): 140.5625, 55.68378214, -0.234571412, -0.699648398, 3.199832776, 19....

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


Rows: 17,897
Columns: 4
$ mean_ip     [3m[90m<dbl>[39m[23m 102.50781, 103.01562, 136.75000, 88.72656, 93.57031, 119.4…
$ std_dev_ip  [3m[90m<dbl>[39m[23m 58.88243, 39.34165, 57.17845, 40.67223, 46.69811, 48.76506…
$ kurtosis_ip [3m[90m<dbl>[39m[23m 0.46531815, 0.32332837, -0.06841464, 0.60086608, 0.5319048…
$ type        [3m[90m<dbl>[39m[23m 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…


### Bibliography

Cheng Jun Zhang, Zhen Hong Shang, Wan Min Chen, Liu Xie, Xiang Hua Miao, A Review of Research on Pulsar Candidate Recognition Based on Machine Learning, Procedia Computer Science, Volume 166, 2020, Pages 534-538, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2020.02.050.

https://www.aps.org/publications/apsnews/200602/history.cfm#:~:text=February%201968%3A%20The%20Discovery%20of,signal%20from%20an%20extraterrestrial%20civilization

https://phys.org/news/2020-06-future-space-cosmic-lighthouses.html 