# Notebook 3: Classification Using Machine Learning / Clasificación mediante aprendizaje automático

In this notebook, we will use a classical machine learning method to classify the astronomical data saved in the previous notebook.


 <hr style="border:2px solid gray">
 
En este Jupyter notebook, utilizaremos un método clásico de aprendizaje automático para clasificar los datos astronómicos guardados en el Jupyter notebook anterior.


---

## Reading the data / Lectura de los datos

First, we'll load the saved image and label data from the NumPy files.


 <hr style="border:2px solid gray">
 
 
En primer lugar, cargaremos los datos guardados de imágenes y etiquetas desde los archivos NumPy.


In [None]:
import numpy as np  # Importing NumPy for numerical operations and array handling

# Load the training images and labels back from the saved NumPy files
train_images = np.load('train_images.npy')  # Load image training data
train_labels = np.load('train_labels.npy')  # Load label training data


print("Training Data loaded successfully from NumPy files.")

In [None]:
print(train_images.shape)

In [None]:
train_labels.shape
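
Before going further, it can help to confirm that the two arrays line up and to see how many images each class has. The sketch below uses small hypothetical stand-in arrays (`demo_images`, `demo_labels`) rather than the real files, and assumes one integer label per image.

```python
import numpy as np

# Hypothetical stand-ins for the loaded arrays (tiny, for illustration only)
demo_images = np.zeros((10, 8, 8, 3), dtype=np.uint8)   # 10 RGB images of 8x8 pixels
demo_labels = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])  # one integer label per image

# The first axis of both arrays should have the same length
assert demo_images.shape[0] == demo_labels.shape[0]

# Count how many images fall in each class
classes, counts = np.unique(demo_labels, return_counts=True)
class_counts = dict(zip(classes.tolist(), counts.tolist()))
print(class_counts)
```

The same two checks can be run on `train_images` and `train_labels` directly.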

---

## Pre-processing / Pre-procesamiento

Now, we will further pre-process the training images to simplify the data.

 <hr style="border:2px solid gray">
 
A continuación, preprocesaremos las imágenes de entrenamiento para simplificar los datos.


### Normalisation / Normalización


In [None]:
# Step 1: Normalize the training data

# Scaling pixel values to be between 0 and 1
train_images_pre = train_images.astype('float32') / 255.0
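
Dividing by 255 maps the data to the range 0 to 1 only if the images are stored as 8-bit values (0 to 255), which is an assumption about the saved files. The toy sketch below shows the effect on a small hypothetical array and how to check the resulting range.

```python
import numpy as np

# Toy 8-bit image data (assumption: the saved images are stored as uint8)
demo = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to float32 first, then divide, so the result stays float32
demo_norm = demo.astype('float32') / 255.0

print(demo_norm.dtype, demo_norm.min(), demo_norm.max())
```

If the minimum and maximum of the real data are not close to 0 and 1 after this step, the images were probably not stored as 8-bit values.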

##### **⚠️ Freeing up Space** / **⚠️ Liberar espacio**

In [None]:
import gc

# Since we will no longer need the original training data (train_images), we can remove it from memory
del train_images

# Force garbage collection to free up memory
gc.collect()

print("train_images removed from memory.")

### Grayscaling / Escala de grises

In [None]:
# Step 2: Convert images to grayscale (from 3 RGB channels to a single channel)

from skimage.color import rgb2gray

# Apply grayscale conversion to each image
train_images_pre = np.array([rgb2gray(image) for image in train_images_pre])
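
Grayscale conversion amounts to a weighted sum of the three colour channels. The NumPy-only sketch below illustrates the idea on a hypothetical batch (`demo_rgb`); the weights are an assumption taken from the luminance formula documented for `rgb2gray`, and this is not a replacement for the library call.

```python
import numpy as np

# Luminance weights for the R, G and B channels
# (assumption: these match the weights documented for skimage's rgb2gray)
weights = np.array([0.2125, 0.7154, 0.0721])

# Hypothetical batch of 2 RGB images, 4x4 pixels, values already in [0, 1]
rng = np.random.default_rng(0)
demo_rgb = rng.random((2, 4, 4, 3))

# Weighted sum over the last (channel) axis: (2, 4, 4, 3) -> (2, 4, 4)
demo_gray = demo_rgb @ weights

print(demo_rgb.shape, '->', demo_gray.shape)
```

Because the weights sum to 1, inputs in the range 0 to 1 stay in that range after conversion.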

### Downscaling / Reducción de escala

In [None]:
# Step 3: Downscale images to 64x64 pixels from 512x512

from skimage.transform import resize

# Resize images to 64x64 pixels
train_images_pre = np.array([resize(image, (64, 64), anti_aliasing=True) for image in train_images_pre])
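
To get a feel for how much information this step discards, the sketch below downscales a hypothetical 512x512 image by averaging each 8x8 block. This block averaging is a simpler stand-in chosen for illustration; it is not the smoothing and interpolation that `resize` performs.

```python
import numpy as np

# Hypothetical 512x512 grayscale image
rng = np.random.default_rng(0)
demo_img = rng.random((512, 512))

factor = 512 // 64  # each output pixel summarises an 8x8 block

# Split each axis into (64 blocks, 8 pixels per block) and average within blocks
demo_small = demo_img.reshape(64, factor, 64, factor).mean(axis=(1, 3))

print(demo_small.shape)                   # (64, 64)
print(demo_img.size // demo_small.size)   # 64 times fewer pixels
```

Each output pixel replaces 64 input pixels, so fine detail such as noise or slight blur is largely averaged away.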

### Visualizing the 5 classes after pre-processing / Visualización de las 5 clases después del pre-procesamiento

In [None]:
print(train_labels)

In [None]:
import matplotlib.pyplot as plt
import numpy as np  # Ensure NumPy is imported

# Set the random seed for reproducibility
np.random.seed(42)

# Class names as a list
class_names = ['Blurry', 'Corrupt', 'Missing Data', 'Noisy', 'Priority']

# Number of images per class to display
num_images_per_class = 5

# Prepare the figure: one row per class, one column per sample image
fig, axes = plt.subplots(nrows=len(class_names), ncols=num_images_per_class, figsize=(15, 10))

for class_index, class_name in enumerate(class_names):
    # Get indices of images belonging to the current class
    indices = np.where(train_labels == class_index)[0]
    # Randomly select image indices using the fixed seed
    selected_indices = np.random.choice(indices, size=num_images_per_class, replace=False)
    for i, img_index in enumerate(selected_indices):
        # Get the corresponding image
        img = train_images_pre[img_index]
        # Access the appropriate axes
        ax = axes[class_index, i]
        # Display the image
        ax.imshow(img, cmap='gray')
        # Turn off axis ticks and labels
        ax.set_xticks([])
        ax.set_yticks([])
        # Remove the frame
        for spine in ax.spines.values():
            spine.set_visible(False)
    # Set the class label on the left of the row
    axes[class_index, 0].set_ylabel(class_name, rotation=90, size='large', labelpad=20, va='center')

# Adjust the subplot parameters to make room for the labels
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.05, hspace=0.05)

plt.show()

After reviewing the pre-processed images, it appears that reducing the size from 512x512 to 64x64 may not have been the best choice. Visually, it has become more challenging to distinguish between the **Priority**, **Noisy**, and **Blurry** categories. Now, let’s apply a machine learning method to classify these images.



 <hr style="border:2px solid gray">
 
Tras revisar las imágenes preprocesadas, parece que reducir el tamaño de 512x512 a 64x64 puede no haber sido la mejor elección. Visualmente, se ha vuelto más difícil distinguir entre las categorías **Priority**, **Noisy** y **Blurry**. Ahora, apliquemos un método de aprendizaje automático para clasificar estas imágenes.



---

## ML Classification / Clasificación por aprendizaje automático

In [None]:
# First, flatten each 64x64 image into a 1-D feature vector (4096 values per image)
num_train_samples = train_images_pre.shape[0]
train_images_pre = train_images_pre.reshape(num_train_samples, -1)
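
The reshape keeps the sample axis and merges the two pixel axes into one, so every image becomes a single row of features. A small sketch on a hypothetical batch (`demo_batch`) makes the resulting layout explicit:

```python
import numpy as np

# Hypothetical batch of 3 grayscale images of 64x64 pixels
demo_batch = np.arange(3 * 64 * 64, dtype=np.float32).reshape(3, 64, 64)

# Keep the sample axis and merge the remaining axes into one
demo_flat = demo_batch.reshape(demo_batch.shape[0], -1)

print(demo_flat.shape)

# Row i of the flat array is image i read row by row
assert np.array_equal(demo_flat[1], demo_batch[1].ravel())
```

Flattening discards the 2-D neighbourhood structure: the classifier sees 4096 independent features, not an image.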

### Train the Stochastic Gradient Descent (SGD) Model / Entrenar el modelo de descenso de gradiente estocástico (SGD)

Stochastic gradient descent (SGD) is an optimization method rather than a model in itself: it trains a model by updating its parameters incrementally, one sample (or small batch) at a time. scikit-learn's `SGDClassifier` uses it to fit linear models efficiently, which is especially useful for large datasets. Because each update uses a single sample rather than the whole training set, the cost per update is far lower than in traditional (full-batch) gradient descent. With `loss='log_loss'`, as in the next cell, the linear model being fitted is a logistic regression.


 <hr style="border:2px solid gray">
 
 
El descenso de gradiente estocástico (SGD) es un método de optimización, no un modelo en sí mismo: entrena un modelo actualizando sus parámetros de forma incremental, con una muestra (o un pequeño lote) cada vez. El `SGDClassifier` de scikit-learn lo utiliza para ajustar modelos lineales de forma eficiente, lo que resulta especialmente útil con grandes conjuntos de datos. Como cada actualización utiliza una sola muestra en lugar de todo el conjunto de entrenamiento, el coste por actualización es mucho menor que en el descenso de gradiente tradicional (por lotes completos). Con `loss='log_loss'`, como en la celda siguiente, el modelo lineal ajustado es una regresión logística.
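
To make the per-sample update concrete, here is a minimal NumPy sketch of SGD for a two-class logistic regression. It is an illustration under stated assumptions, not how `SGDClassifier` is implemented: the toy data, the fixed learning rate `lr` and the number of epochs are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two well-separated 2-D clusters
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, size=(n, 2)),
               rng.normal(+2.0, 1.0, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

w = np.zeros(2)   # weights
b = 0.0           # intercept
lr = 0.1          # learning rate (illustrative value)

for epoch in range(5):
    # Visit the samples in a new random order each epoch
    for i in rng.permutation(len(X)):
        z = X[i] @ w + b
        p = 1.0 / (1.0 + np.exp(-z))   # predicted probability of class 1
        grad = p - y[i]                # derivative of the log loss w.r.t. z
        w -= lr * grad * X[i]          # update from this single sample
        b -= lr * grad

accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"training accuracy: {accuracy:.3f}")
```

Full-batch gradient descent would compute the gradient over all 400 samples before making one update; here each pass over the data makes 400 small updates instead.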


In [None]:
from sklearn.linear_model import SGDClassifier
import pickle

# Create the stochastic gradient descent model
sgd_model = SGDClassifier(loss='log_loss', max_iter=10000, n_jobs=4, random_state=42)

# Fit the model on the training data
sgd_model.fit(train_images_pre, train_labels)

# Save the model to a file
with open('sgd_model.pkl', 'wb') as file:
    pickle.dump(sgd_model, file)

---

##### **⚠️ Freeing up Space** / **⚠️ Liberar espacio**


In [None]:
import gc

# Remove the data from memory
del train_images_pre, train_labels

# Force garbage collection to free up memory
gc.collect()

print("Data removed from memory.")