## DSCI 100 Project Proposal: Classifying star category using temperature, luminosity, radius, and absolute magnitude as predictors.

### Introduction

&emsp;&emsp; Stars are fundamental astronomical objects, each characterized by distinct spectral and physical attributes. Early astronomers grouped stars by constellation and position in the sky. Modern astrophysics instead seeks a unified classification grounded in measurable physical properties. Because stars are such complex bodies, assigning a star to a type can be ambiguous, so classification calls for rigorous analysis of several quantifiable characteristics at once.

**Question:** Can we successfully predict a star's category from its temperature, luminosity, radius, and absolute magnitude?

**Dataset Description**

&emsp;&emsp; For this project we will use the Star Classification dataset provided by the YBI Foundation on [Kaggle](https://www.kaggle.com/code/ybifoundation/stars-classification). The data contain eight variables: absolute temperature (K), luminosity relative to the Sun (L/Lo), radius relative to the Sun (R/Ro), absolute magnitude (Mv), star color, spectral class, star type, and star category. Not all of these variables are useful as predictors, because several of them are categorical labels rather than measured quantities. Our variable selection is explained further in the **Methods** section.


### Preliminary exploratory data analysis

**Setting Up Libraries and Parameters**

```r
# Run this first.
library(tidyverse)
library(tidymodels)

# URL of the raw CSV file used throughout the analysis
dataset_url <- "https://raw.githubusercontent.com/YBIFoundation/Dataset/main/Stars.csv"
```

**Loading and tidying data**

```r
# Load the raw CSV from GitHub
star_raw_data <- read_csv(dataset_url)

# Give every column a syntactic snake_case name, then keep the four
# numeric predictors and the star_category label
star_data <- star_raw_data |>
    rename(temperature = "Temperature (K)",
           luminosity = "Luminosity (L/Lo)",
           radius = "Radius (R/Ro)",
           absolute_magnitude = "Absolute magnitude (Mv)",
           star_type = "Star type",
           star_category = "Star category",
           star_colour = "Star color",
           spectral_class = "Spectral Class") |>
    select(temperature:absolute_magnitude, star_category)

star_data
```

[1mRows: [22m[34m240[39m [1mColumns: [22m[34m8[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m (3): Star category, Star color, Spectral Class
[32mdbl[39m (5): Temperature (K), Luminosity (L/Lo), Radius (R/Ro), Absolute magnitu...

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


| temperature | luminosity | radius | absolute_magnitude | star_category |
|---|---|---|---|---|
| &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;chr&gt; |
| 3068 | 0.002400 | 0.17000 | 16.120 | Brown Dwarf |
| 3042 | 0.000500 | 0.15420 | 16.600 | Brown Dwarf |
| 2600 | 0.000300 | 0.10200 | 18.700 | Brown Dwarf |
| 2800 | 0.000200 | 0.16000 | 16.650 | Brown Dwarf |
| 1939 | 0.000138 | 0.10300 | 20.060 | Brown Dwarf |
| 2840 | 0.000650 | 0.11000 | 16.980 | Brown Dwarf |
| 2637 | 0.000730 | 0.12700 | 17.220 | Brown Dwarf |
| 2600 | 0.000400 | 0.09600 | 17.400 | Brown Dwarf |
| 2650 | 0.000690 | 0.11000 | 17.450 | Brown Dwarf |
| 2700 | 0.000180 | 0.13000 | 16.050 | Brown Dwarf |
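The preview shows that the predictors sit on very different scales: temperature is in the thousands of kelvin, while luminosity and radius for these Brown Dwarf rows are fractions well below 1. If the analysis uses a distance-based classifier such as k-nearest neighbours (an assumption here, not something this proposal has fixed yet), the predictors would need to be standardized first so that temperature does not dominate the distances. The following is a minimal sketch of that step with the tidymodels `recipes` package. It also converts `star_category` to a factor, since tidymodels classification models expect a factor outcome. To keep it runnable offline, it builds a hypothetical `star_sample` tibble from the ten preview rows instead of downloading the CSV; `star_sample`, `star_recipe`, and `scaled_stars` are illustrative names, not part of the original analysis.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical stand-in for star_data: the ten Brown Dwarf rows from the preview
star_sample <- tibble(
  temperature = c(3068, 3042, 2600, 2800, 1939, 2840, 2637, 2600, 2650, 2700),
  luminosity = c(0.002400, 0.000500, 0.000300, 0.000200, 0.000138,
                 0.000650, 0.000730, 0.000400, 0.000690, 0.000180),
  radius = c(0.17000, 0.15420, 0.10200, 0.16000, 0.10300,
             0.11000, 0.12700, 0.09600, 0.11000, 0.13000),
  absolute_magnitude = c(16.120, 16.600, 18.700, 16.650, 20.060,
                         16.980, 17.220, 17.400, 17.450, 16.050),
  star_category = rep("Brown Dwarf", 10)
) |>
  # tidymodels classifiers expect the label as a factor, not a character
  mutate(star_category = as_factor(star_category))

# Divide each predictor by its standard deviation, then subtract its mean,
# so every predictor ends up with mean 0 and standard deviation 1
star_recipe <- recipe(star_category ~ temperature + luminosity + radius + absolute_magnitude,
                      data = star_sample) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

# Estimate the means and standard deviations, then apply them to the same data
scaled_stars <- star_recipe |>
  prep() |>
  bake(new_data = NULL)

scaled_stars
```

On the full dataset, the recipe would normally be estimated on a training split only (for example one created with `initial_split()` from `rsample`), so that the test set's means and standard deviations do not leak into preprocessing.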


### Methods