# Star Classification Using Color Values

This notebook demonstrates a modular framework for extracting, analyzing, and classifying stars from `.fits` observation files using color indices and machine learning. The approach is inspired by previous research on blue objects and subdwarfs.

## 1. Import Required Libraries

We will use `astropy` for FITS file handling, `numpy` and `pandas` for data manipulation, `scikit-learn` for machine learning, and `matplotlib` for plotting.

In [None]:
import numpy as np
import pandas as pd
from astropy.io import fits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

## 2. Data Extraction

Define a function to extract relevant columns (color, magnitude, errors, etc.) from a FITS file and return a pandas DataFrame.

In [None]:
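def extract_fits_data(fits_path, columns):
    """
    Extract specified columns from a FITS file into a pandas DataFrame.

    Assumes the table lives in the first extension HDU (index 1).
    """
    with fits.open(fits_path) as hdul:
        data = hdul[1].data
        # FITS stores values big-endian; convert each column to native byte
        # order so that pandas operations do not fail on little-endian machines.
        df = pd.DataFrame({
            col: np.asarray(data[col]).astype(data[col].dtype.newbyteorder('='))
            for col in columns
        })
    return df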

## 3. Data Cleaning and Feature Engineering

Remove rows with missing values or with values outside the accepted ranges (magnitudes between 10 and 25, color between -3 and 4), and compute color indices as features for classification.

In [None]:
def clean_and_engineer(df, color_col, mag1_col, mag2_col):
    """
    Clean the DataFrame and compute color indices.
    """
    mask = (
        (~np.isnan(df[color_col])) &
        (~np.isnan(df[mag1_col])) &
        (~np.isnan(df[mag2_col])) &
        (df[mag1_col] > 10) & (df[mag1_col] < 25) &
        (df[mag2_col] > 10) & (df[mag2_col] < 25) &
        (df[color_col] > -3) & (df[color_col] < 4)
    )
    df_clean = df[mask].copy()
    # Derive a g-r color index when both magnitudes are present
    if 'gab' in df_clean.columns and 'rab' in df_clean.columns:
        df_clean['g-r'] = df_clean['gab'] - df_clean['rab']
    return df_clean
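
To see what the mask does, the sketch below applies the same range cuts to a small made-up table. Because comparisons against `NaN` evaluate to `False`, the range checks alone already drop missing values; the explicit `np.isnan` tests in the function mainly make that intent visible. All values here are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows: one valid, one with a NaN color, one too bright, one too red
df = pd.DataFrame({
    "gab":      [18.5, 19.0, 9.5, 20.0],
    "rab":      [18.1, 18.7, 9.8, 15.0],
    "gminr_ab": [0.4, np.nan, -0.3, 5.0],
})

# Same cuts as in clean_and_engineer, without the explicit NaN tests
in_range = (
    (df["gab"] > 10) & (df["gab"] < 25) &
    (df["rab"] > 10) & (df["rab"] < 25) &
    (df["gminr_ab"] > -3) & (df["gminr_ab"] < 4)
)
df_clean = df[in_range].copy()
df_clean["g-r"] = df_clean["gab"] - df_clean["rab"]

print(df_clean)
```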

## 4. Example: Load and Prepare Data

Specify the FITS file and columns of interest. Adjust these as needed for your dataset.

In [None]:
# Example configuration (edit as needed)
fits_file = '/home/osa/Dropbox/PhD/Work/201305-New_colours/int3-8eso2.fits'
columns = ['RUN', 'FIELD', 'gab', 'uab', 'rab', 'gaberr', 'uaberr', 'raberr', 'uming_ab', 'gminr_ab']
color_col = 'gminr_ab'  # or 'uming_ab'
mag1_col = 'gab'
mag2_col = 'rab'

df = extract_fits_data(fits_file, columns)
df_clean = clean_and_engineer(df, color_col, mag1_col, mag2_col)
df_clean.head()

## 5. Prepare Features and Labels

In [None]:
# Placeholder: Random labels for demonstration (replace with real labels)
np.random.seed(42)
df_clean['label'] = np.random.choice(['He-sdO', 'sdB', 'DA', 'QSO', 'sdO', 'binary', 'other'], size=len(df_clean))

# Features: color indices only (magnitudes are not included).
# Note: 'g-r' may largely duplicate color_col when color_col is 'gminr_ab'.
feature_cols = [color_col, 'g-r'] if 'g-r' in df_clean.columns else [color_col]
X = df_clean[feature_cols].values
y = df_clean['label'].values

## 6. Train/Test Split

Split the data into a training set (70%) and a held-out test set (30%).

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
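
When some classes are rare, a purely random split can leave them under-represented in the test set. One common option, not used in the cell above, is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both subsets. A minimal sketch with made-up labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up, imbalanced labels: 70 'sdB' and 30 'DA'
y = np.array(["sdB"] * 70 + ["DA"] * 30)
X = np.arange(len(y)).reshape(-1, 1)  # dummy single feature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Count each class in the test set
classes, test_counts = np.unique(y_test, return_counts=True)
print(dict(zip(classes, test_counts)))
```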

## 7. Machine Learning Classification

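Train a Random Forest classifier and evaluate its performance on the held-out test set. With the random placeholder labels from Section 5, expect scores near chance level (about 1/7 for seven equally likely classes).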

In [None]:
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
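
A classification report is easier to interpret next to a trivial baseline. The sketch below compares a Random Forest against scikit-learn's `DummyClassifier` on synthetic, well-separated colors. The two classes and their color distributions are invented for illustration and are not taken from the data above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Two invented classes with clearly different colors
blue = rng.normal(loc=-0.3, scale=0.1, size=(100, 1))
red = rng.normal(loc=0.5, scale=0.1, size=(100, 1))
X = np.vstack([blue, red])
y = np.array(["blue"] * 100 + ["red"] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
forest_acc = accuracy_score(y_test, forest.predict(X_test))
print(f"baseline: {baseline_acc:.2f}, random forest: {forest_acc:.2f}")
```

If the labels carry no information about the features, as with random placeholder labels, the two scores should end up close to each other.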

## 8. Visualize Color-Magnitude Diagram

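Plot the color-magnitude diagram, coloring each point by its class label. The `label` column holds the placeholder labels from Section 5; substitute the output of `clf.predict` to show predicted classes instead.

In [None]:
# Encode class names as integers for coloring; keep the names for the legend
codes, classes = pd.factorize(df_clean['label'])
plt.figure(figsize=(8,6))
scatter = plt.scatter(
    df_clean[color_col], df_clean[mag1_col],
    c=codes, cmap='tab10', alpha=0.7
)
plt.xlabel(color_col)
plt.ylabel(mag1_col)
plt.title('Color-Magnitude Diagram (colored by class)')
plt.gca().invert_yaxis()  # brighter objects (smaller magnitudes) at the top
handles, _ = scatter.legend_elements()
plt.legend(handles, classes, title="Class")
plt.show()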

## 9. Predict Star Types for New Data

Use the trained classifier to predict star types for new, unlabeled data.

In [None]:
def predict_star_types(new_df, clf, feature_cols):
    """
    Predict star types for new data using the trained classifier.
    """
    X_new = new_df[feature_cols].values
    return clf.predict(X_new)

# Example usage:
# predictions = predict_star_types(new_df, clf, feature_cols)
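
Besides hard class assignments, `RandomForestClassifier` exposes `predict_proba`, which can be used to flag uncertain predictions. The sketch below is one possible extension: the helper name `predict_with_confidence`, the 0.6 threshold, and the toy training data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def predict_with_confidence(new_df, clf, feature_cols, threshold=0.6):
    """Return predicted class, top class probability, and a low-confidence flag."""
    proba = clf.predict_proba(new_df[feature_cols].values)
    top = proba.argmax(axis=1)  # column index of the most probable class
    result = pd.DataFrame({
        "predicted": clf.classes_[top],
        "confidence": proba.max(axis=1),
    }, index=new_df.index)
    result["uncertain"] = result["confidence"] < threshold
    return result


# Toy training set with a single made-up color feature
train = pd.DataFrame({
    "color": [-0.4, -0.3, -0.35, 0.5, 0.6, 0.55],
    "label": ["sdB", "sdB", "sdB", "other", "other", "other"],
})
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(train[["color"]].values, train["label"].values)

new_df = pd.DataFrame({"color": [-0.38, 0.58, 0.1]})
result = predict_with_confidence(new_df, clf, ["color"])
print(result)
```

Rows flagged as uncertain could be set aside for manual inspection rather than assigned a class automatically.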