# SDSS DR18 Galaxy Classification
### STARFORMING vs STARBURST Prediction using Machine Learning


# 1. Introduction

Galaxies exhibit a wide range of physical properties, evolutionary stages, and star-formation behaviors. Among these, two important categories are **starforming** galaxies—those forming stars at a steady, sustained rate—and **starburst** galaxies, which undergo brief but extremely intense episodes of star formation. Understanding the differences between these galaxy types provides insight into the physical processes that influence galaxy evolution, including gas accretion, mergers, and environmental interactions.

In this project, I apply machine learning techniques to classify galaxies as either **STARFORMING** or **STARBURST** using photometric and structural data from the Sloan Digital Sky Survey Eighteenth Data Release (SDSS DR18). The dataset contains 100,000 galaxy observations with features such as sky coordinates, photometric magnitudes, band-specific axis ratios, redshift, and other SDSS-derived measurements.

The objective of this work is to build an end-to-end machine learning pipeline that includes exploratory data analysis, preprocessing, feature scaling, model development, and performance evaluation. Two supervised learning models, **Logistic Regression** and a **Random Forest Classifier**, are implemented to determine how well galaxy subclasses can be predicted from their observable properties. This project demonstrates the use of machine learning as a scientific analysis tool and evaluates which of the two modeling approaches performs better for galaxy subclassification.
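The pipeline stages described above can be sketched in a few lines. This is only an illustrative outline on synthetic data from `make_classification`, not the SDSS dataset. The feature count, the 80/20 split, `max_iter=1000`, and `n_estimators=100` are assumptions chosen for the example rather than settings taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the galaxy features (assumption: NOT SDSS data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=42
)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The two model families named in the introduction.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Train each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_scaled))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Fitting the scaler on the training split alone keeps test-set statistics out of the preprocessing step. Random forests do not need scaled inputs, but feeding both models the same scaled features keeps the comparison simple.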


# 2. Imports


In [2]:
# Core libraries
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Sklearn utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, ConfusionMatrixDisplay

# ML Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Misc
import warnings
warnings.filterwarnings("ignore")


# 3. Data Loading

In [3]:
df = pd.read_csv("../data/sdss_100k_galaxy_form_burst.csv")
df.head()


Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,#Table1
objid,specobjid,ra,dec,u,g,r,i,z,modelFlux_u,modelFlux_g,modelFlux_r,modelFlux_i,modelFlux_z,petroRad_u,petroRad_g,petroRad_i,petroRad_r,petroRad_z,petroFlux_u,petroFlux_g,petroFlux_i,petroFlux_r,petroFlux_z,petroR50_u,petroR50_g,petroR50_i,petroR50_r,petroR50_z,psfMag_u,psfMag_r,psfMag_g,psfMag_i,psfMag_z,expAB_u,expAB_g,expAB_r,expAB_i,expAB_z,class,subclass,redshift,redshift_err
1237646587710669400,8175185722644649984,82.0386790197966,0.847177136346427,21.73818,20.26633,19.32409,18.64037,18.23833,2.007378,7.82364,18.63581,34.98175,50.64961,2.969037,4.252946,3.101782,3.46188,3.071923,2.559197,8.499634,30.32594,17.24706,36.44688,1.984029,1.835038,1.438609,1.638081,1.289375,22.58631,20.752,21.66492,20.07646,19.43575,0.09995142,0.3118636,0.2893703,0.270588,0.1871822,GALAXY,STARFORMING,0.06774854,1.485608E-05
1237646588247540577,8175186822156277760,82.138894235229,1.06307163479155,20.66761,19.32016,18.67888,18.24693,18.04122,5.403369,18.70364,33.76298,50.25997,60.73625,2.186902,2.625105,2.678123,2.594866,3.16345,4.333604,18.41877,51.06515,33.32697,62.45336,1.069268,1.278203,1.284687,1.263937,1.318443,21.31284,19.67125,20.23801,19.19277,18.85012,0.3665494,0.5168757,0.5174466,0.5522967,0.6369656,GALAXY,STARFORMING,0.1051184,9.869399E-06
1237646588247540758,8175187097034184704,82.028510297136,1.10400342592331,23.63531,21.19671,19.92297,19.31443,18.68396,0.2956932,3.318924,10.73388,18.80136,33.58972,0.9917983,1.644824,1.801951,1.749696,3.059948,0.1653659,2.800386,17.09313,9.494298,51.73537,0.6636064,0.9471089,0.9957339,0.9873955,1.612933,23.92244,20.6616,21.83267,20.00731,19.42235,0.05,0.4171365,0.5069503,0.5498811,0.3701658,GALAXY,STARFORMING,0.2340893,2.968146E-05
1237648702973083853,332152325571373056,198.544469237915,-1.09705896364626,20.12374,18.4152,17.47202,17.05297,16.72423,8.920645,43.04474,102.6101,150.9426,204.3161,6.625083,4.719598,4.494591,4.777463,4.636094,12.33053,42.82957,149.6309,105.7445,203.8816,3.160263,2.093415,2.023142,2.156205,2.035692,21.34938,18.7764,19.75832,18.38868,18.03204,0.3107628,0.3568271,0.3893448,0.3881598,0.4166596,GALAXY,STARFORMING,0.110825,3.046765E-05
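
The output above has placeholder column names (`Unnamed: 0`, ..., `#Table1`), and the real column names (`objid`, `specobjid`, ...) appear as the first data row. This suggests that the CSV begins with a `#Table1` marker line ahead of the actual header. A minimal sketch of one way to handle it, assuming that layout, is to skip the first line when reading. The example uses a small in-memory sample with a reduced set of columns in place of the real file.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the CSV (assumed layout: a "#Table1" marker
# line, then the real header, then data rows). Only a few columns are shown.
sample = io.StringIO(
    "#Table1\n"
    "objid,ra,dec,subclass,redshift\n"
    "1237646587710669400,82.0386790197966,0.847177136346427,STARFORMING,0.06774854\n"
    "1237648702973083853,198.544469237915,-1.09705896364626,STARFORMING,0.110825\n"
)

# skiprows=1 drops the marker line so the next line is parsed as the header.
df_fixed = pd.read_csv(sample, skiprows=1)

print(df_fixed.columns.tolist())
print(df_fixed.dtypes)
```

For the real file, the same idea would be `pd.read_csv("../data/sdss_100k_galaxy_form_burst.csv", skiprows=1)`. With the header parsed correctly, the numeric columns load as numeric dtypes rather than strings.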
