# Stellar Classification Exploration

### This notebook explores and visualizes the stars in [this](https://www.kaggle.com/datasets/vinesmsuic/star-categorization-giants-and-dwarfs/data) Kaggle dataset

In [1]:
# Import the Pandas package
import pandas as pd

# Import the NumPy package
import numpy as np

### Here we're going to load in the data and check how many rows and columns it has:

In [2]:
# Load the CSV file as a DataFrame and assign it to the variable name stars
stars = pd.read_csv("archive/Star99999_raw.csv")

# Count the number of rows (first number) and columns (second number)
stars.shape

(99999, 6)
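Before doing any arithmetic, it helps to know which columns pandas actually parsed as numbers. The sketch below uses a small hypothetical DataFrame built in memory (not the real CSV) in which `Plx` is stored as space-padded text, an assumption that mirrors the `'   3.54'` cell value inspected later in this notebook. `DataFrame.dtypes` and `select_dtypes` then flag the columns that still need converting.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the first rows of Star99999_raw.csv;
# Plx is deliberately stored as space-padded text (an assumption).
sample = pd.DataFrame({
    "Vmag": [9.10, 9.27, 6.61],
    "Plx": ["   3.54", "  21.90", "   2.81"],
    "SpType": ["F5", "K3V", "B9"],
})

# dtypes reports one dtype per column; a non-numeric dtype means text
print(sample.dtypes)

# Collect every non-numeric column, i.e. the ones that need converting
text_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(text_cols)  # ['Plx', 'SpType']
```

On the real data, `stars.dtypes` gives the same per-column overview without any loop.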

In [3]:
# Exploring the first 5 rows 
stars.head()

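|   | Unnamed: 0 | Vmag | Plx | e_Plx | B-V | SpType |
|---|---|---|---|---|---|---|
| 0 | 0 | 9.1 | 3.54 | 1.39 | 0.482 | F5 |
| 1 | 1 | 9.27 | 21.9 | 3.1 | 0.999 | K3V |
| 2 | 2 | 6.61 | 2.81 | 0.63 | -0.019 | B9 |
| 3 | 3 | 8.06 | 7.75 | 0.97 | 0.37 | F0V |
| 4 | 4 | 8.55 | 2.87 | 1.11 | 0.902 | G8III |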


### Explanation of the [variables](https://vizier.cds.unistra.fr/viz-bin/VizieR-3) we care about

**Vmag** = photometric magnitude in the V band, i.e. the star's [apparent magnitude](https://www.britannica.com/science/photometry-astronomy)

**Plx** = [trigonometric parallax](https://astronomy.swin.edu.au/cosmos/T/Trigonometric+Parallax) in milliarcseconds (mas)

**e_Plx** = standard error of the parallax, in milliarcseconds

**B-V** = color index: the difference between the star's magnitudes in the B (blue) and V (visual) bands

**SpType** = spectral type of the star, in the [Morgan-Keenan](https://astronomy.swin.edu.au/cosmos/m/morgan-keenan+luminosity+class) (MK) classification system
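To make the MK notation concrete, here is a minimal sketch that splits `SpType` strings into a temperature class letter, a numeric subclass and a luminosity class using `Series.str.extract`. The regular expression `MK_PATTERN` is a hypothetical simplification; real catalogue entries (ranges, peculiarity flags, composite types) will not always match fully.

```python
import pandas as pd

# Hypothetical, simplified MK pattern: class letter, optional numeric subclass,
# optional luminosity class. Longer numerals are listed first so that "III"
# is not cut short to "I" and "Iab" is not cut short to "Ia".
MK_PATTERN = (
    r"^(?P<spectral_class>[OBAFGKM])"
    r"(?P<subclass>\d(?:\.\d+)?)?"
    r"(?P<lum_class>Iab|Ia|Ib|III|II|IV|I|VI|V)?"
)

sp_types = pd.Series(["F5", "K3V", "B9", "F0V", "G8III"])  # first five SpType values
parts = sp_types.str.extract(MK_PATTERN)  # one column per named group
print(parts)
```

Entries without a luminosity class, such as `F5`, get a missing value in `lum_class`.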

In [4]:
# Make a copy of the DataFrame so the originally loaded stars data stays untouched
star_copy = stars.copy()
star_copy.head()

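|   | Unnamed: 0 | Vmag | Plx | e_Plx | B-V | SpType |
|---|---|---|---|---|---|---|
| 0 | 0 | 9.1 | 3.54 | 1.39 | 0.482 | F5 |
| 1 | 1 | 9.27 | 21.9 | 3.1 | 0.999 | K3V |
| 2 | 2 | 6.61 | 2.81 | 0.63 | -0.019 | B9 |
| 3 | 3 | 8.06 | 7.75 | 0.97 | 0.37 | F0V |
| 4 | 4 | 8.55 | 2.87 | 1.11 | 0.902 | G8III |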
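Edits to a DataFrame never touch the CSV on disk; what `copy()` actually protects is the in-memory `stars` DataFrame. A small sketch with made-up values showing that `DataFrame.copy()`, which is deep by default, yields an independent object:

```python
import pandas as pd

# Made-up two-row frame standing in for stars
original = pd.DataFrame({"Vmag": [9.10, 9.27], "SpType": ["F5", "K3V"]})

# copy() defaults to deep=True, so the underlying data is duplicated
working = original.copy()
working.loc[0, "Vmag"] = 0.0  # modify only the copy

print(original.loc[0, "Vmag"])  # 9.1, unchanged
print(working.loc[0, "Vmag"])   # 0.0
```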


In [5]:
# We can remove the "Unnamed: 0" column completely, as it only keeps track of star 1, star 2, star 3, etc.
star_copy = star_copy[["Vmag", "Plx", "e_Plx", "B-V", "SpType"]]
star_copy.head()

|   | Vmag | Plx | e_Plx | B-V | SpType |
|---|---|---|---|---|---|
| 0 | 9.1 | 3.54 | 1.39 | 0.482 | F5 |
| 1 | 9.27 | 21.9 | 3.1 | 0.999 | K3V |
| 2 | 6.61 | 2.81 | 0.63 | -0.019 | B9 |
| 3 | 8.06 | 7.75 | 0.97 | 0.37 | F0V |
| 4 | 8.55 | 2.87 | 1.11 | 0.902 | G8III |
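An equivalent way to get rid of the row-number column is `DataFrame.drop`, which names the column to remove instead of listing every column to keep. The sketch assumes the column is labelled `Unnamed: 0` (the label pandas gives an unnamed CSV column, as in the `head()` output above) and uses a hypothetical three-row stand-in for `stars`.

```python
import pandas as pd

# Hypothetical stand-in for stars, including the unnamed row-number column
stars_sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Vmag": [9.10, 9.27, 6.61],
    "SpType": ["F5", "K3V", "B9"],
})

# drop(columns=...) returns a new DataFrame without the listed columns
trimmed = stars_sample.drop(columns=["Unnamed: 0"])
print(trimmed.columns.tolist())  # ['Vmag', 'SpType']
```

Another option, assuming the same file layout, is `pd.read_csv("archive/Star99999_raw.csv", index_col=0)`, which uses that column as the index at load time so it never appears as a data column.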


### Let's work on finding the absolute magnitude of these stars

The relationship between apparent magnitude and absolute magnitude is as follows:

$m - M = 5\log_{10}\left(\frac{d}{10}\right)$

where "m" is the apparent magnitude, "M" is the absolute magnitude, and "d" is the distance from Earth to the star in parsecs
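Solving the relation for the absolute magnitude gives $M = m - 5\log_{10}(d/10)$. A minimal sketch as a Python function (`absolute_magnitude` is a hypothetical helper name), with the distance already in parsecs:

```python
import math

def absolute_magnitude(m, d_pc):
    """Absolute magnitude M from apparent magnitude m and distance d (parsecs).

    Rearranges m - M = 5 * log10(d / 10) into M = m - 5 * log10(d / 10).
    """
    if d_pc <= 0:
        raise ValueError("distance must be positive")
    return m - 5 * math.log10(d_pc / 10)

print(absolute_magnitude(9.1, 10))   # at 10 pc, M equals m: 9.1
print(absolute_magnitude(9.1, 100))  # 10x farther away, so M is 5 magnitudes below m
```

By definition, a star placed at 10 parsecs has the same apparent and absolute magnitude, which makes a handy sanity check.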


The relationship between distance and parallax is:

$d = \frac{1}{p}$

where "p" is the trigonometric parallax in arcseconds
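Putting the two relations together: since `Plx` is given in milliarcseconds, $p = \text{Plx}/1000$ and $d = 1000/\text{Plx}$ parsecs. The sketch below (hypothetical helpers `distance_pc` and `abs_mag_from_parallax`) assumes the parallax values have already been converted to numbers, which the notebook has not done yet at this point. It returns NaN for zero or negative parallaxes, which can occur in noisy measurements and have no meaningful distance. Keep in mind that $1/p$ is only a rough distance estimate when `e_Plx` is large compared with `Plx`.

```python
import numpy as np

def distance_pc(plx_mas):
    """Distance in parsecs from parallax in milliarcseconds (d = 1 / p, p in arcsec)."""
    plx_arcsec = np.asarray(plx_mas, dtype=float) / 1000.0  # mas -> arcsec
    with np.errstate(divide="ignore", invalid="ignore"):
        # d = 1 / p only makes sense for positive parallaxes; others become NaN
        return np.where(plx_arcsec > 0, 1.0 / plx_arcsec, np.nan)

def abs_mag_from_parallax(vmag, plx_mas):
    """M = m - 5 * log10(d / 10), with d derived from the parallax."""
    d = distance_pc(plx_mas)
    return np.asarray(vmag, dtype=float) - 5 * np.log10(d / 10)

vmag = np.array([9.10, 9.27, 6.61])  # first three Vmag values
plx = np.array([3.54, 21.90, 2.81])  # first three Plx values, assumed numeric
print(distance_pc(plx))
print(abs_mag_from_parallax(vmag, plx))
```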

### Before making our calculations, let's check the data type of the parallax column

In [6]:
# Check the first row of the parallax column
star_copy.iloc[0, 1]


'   3.54'

In [7]:
# The type is currently a string, so we can't do numerical operations
type(star_copy.iloc[0, 1])

str

In [8]:
# Remove the spaces from the string, then convert it to a decimal number using float()

example = star_copy.iloc[0, 1]  # the original string cell value
convert_num = ""  # collects the non-space characters of the string

for char in example:
    if char != " ":
        convert_num += char

convert_num = float(convert_num)  # our string value is now a decimal number containing no spaces
print(convert_num)
print(type(convert_num))

3.54
<class 'float'>
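As an aside, Python's built-in `float()` already ignores leading and trailing whitespace, so the loop is not strictly needed for values like this one. The loop above also drops spaces inside the string; `str.replace` does the same in a single call:

```python
raw = "   3.54"  # the padded string seen in the Plx column

print(float(raw))                   # 3.54: float() skips surrounding whitespace
print(float(raw.strip()))           # 3.54: explicit strip, same result
print(float(raw.replace(" ", "")))  # 3.54: removes every space, like the loop
```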


Now let's build a function that iterates over the entire dataset and replaces the old string values with numerical ones that don't contain spaces
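One way to sketch this swaps the character-by-character loop for pandas' vectorized string methods: `Series.str.strip` followed by `pd.to_numeric(..., errors="coerce")`. The helper name `clean_numeric_column` and the sample values are hypothetical, and the blank entry is an assumption about what unparseable cells might look like. With `errors="coerce"`, anything that cannot be parsed becomes NaN instead of raising an error.

```python
import numpy as np
import pandas as pd

def clean_numeric_column(col: pd.Series) -> pd.Series:
    """Strip whitespace from string entries and convert them to floats.

    Blank or otherwise unparseable entries become NaN (errors="coerce").
    """
    return pd.to_numeric(col.str.strip(), errors="coerce")

# Hypothetical sample mimicking padded Plx strings, plus an assumed blank cell
plx_raw = pd.Series(["   3.54", "  21.90", "      "])
plx = clean_numeric_column(plx_raw)
print(plx)
print(plx.dtype)  # float64
```

The same helper could then be applied to any other column that `dtypes` reports as text.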

In [None]:
# First, let's convert all the parallax values to arcseconds: they are currently in milliarcseconds, so divide by 1000