# 1. Introduction

This project applies machine learning to the task of classifying astronomical objects using real observational data from the Sloan Digital Sky Survey (SDSS). SDSS is one of the largest and most influential sky surveys ever conducted, providing high-quality photometric and spectroscopic measurements for millions of celestial objects. Because object classification plays a central role in astrophysics, supporting studies of galaxy evolution, morphology, and cosmology, this dataset offers an excellent opportunity to explore the effectiveness of modern machine learning algorithms.

In this project, I classify objects from their catalog measurements: brightness in the *u, g, r, i,* and *z* photometric filters, sky position (right ascension and declination), and redshift. The primary task is to assign each object to one of the classes provided in the dataset (e.g., **GALAXY**, **STAR**, or **QSO**), using these physical measurements as input features.
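As a minimal sketch of how the inputs and the label could be separated, the snippet below builds a three-row toy table and splits it into a feature matrix and a target vector. The column names (`ra`, `dec`, `u`, `g`, `r`, `i`, `z`, `redshift`, `class`) are assumptions about how the SDSS export is laid out, and the numeric values are made-up placeholders, not real measurements.

```python
import pandas as pd

# Toy rows with placeholder values (NOT real SDSS measurements).
# Column names are assumed; adjust them to match the actual file.
toy = pd.DataFrame({
    "ra": [183.5, 183.6, 183.7],
    "dec": [0.09, 0.13, 0.12],
    "u": [19.5, 18.6, 19.3],
    "g": [17.0, 17.2, 18.2],
    "r": [15.9, 16.7, 17.5],
    "i": [15.5, 16.5, 17.1],
    "z": [15.2, 16.4, 16.8],
    "redshift": [0.0, 0.12, 1.5],
    "class": ["STAR", "GALAXY", "QSO"],
})

# Hypothetical names for the input columns and the label column.
FEATURE_COLUMNS = ["ra", "dec", "u", "g", "r", "i", "z", "redshift"]
TARGET_COLUMN = "class"

X = toy[FEATURE_COLUMNS]   # feature matrix: one row per object
y = toy[TARGET_COLUMN]     # target vector: the class label per object

print(X.shape)
print(y.value_counts())
```

With the real dataset, the same two selections would apply after loading the file with `pd.read_csv`, provided the column names match.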

To solve this problem, I build a complete machine learning pipeline that includes data exploration, preprocessing, feature scaling, model training, and performance evaluation. I compare two machine learning algorithms, **Logistic Regression** and a **Random Forest Classifier**, to determine which approach better captures the patterns in the data. The results are analyzed and discussed in detail, along with insights into why one model performs better than the other.
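The workflow described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the final pipeline: it uses synthetic data from `make_classification` as a stand-in for the SDSS table, and the split ratio, `max_iter`, `n_estimators`, and `random_state` values are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the SDSS table: 8 numeric features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)

# Stratified split keeps the class proportions similar in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    # Logistic regression benefits from scaled inputs, so the scaler is
    # fitted on the training data only, inside the pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Tree ensembles are insensitive to feature scale, so no scaler here.
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Wrapping the scaler and the classifier in one pipeline is a design choice that avoids leaking test-set statistics into the scaling step.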

This project demonstrates practical experience with the machine learning workflow, including handling real-world data, training multiple models for the same task, evaluating model performance, and interpreting results within the context of a scientific domain.

# 2. Imports


In [3]:
# Core libraries
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Sklearn utilities
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import accuracy_score, confusion_matrix

# ML algorithms (this project compares LogisticRegression and RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Optional: dimensionality reduction / feature visualization
from sklearn.decomposition import PCA

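# Misc
import warnings
# Silence all warnings to keep the notebook output readable.
# Note: this also hides scikit-learn convergence warnings.
warnings.filterwarnings("ignore")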
