# Feature Extraction with 1D Convolutional Neural Networks (CNNs)

Hyperspectral imaging is widely used in remote sensing. However, hyperspectral data is typically high-dimensional: each sample is a spectrum measured at many closely spaced wavelengths, which makes it challenging to extract meaningful information and patterns.

In this notebook, we will explore how 1D CNNs can be used to extract features from hyperspectral data, which can then be used for downstream regression tasks. We will walk through the process of loading and preprocessing the data, building a 1D CNN, and performing feature extraction using the trained model.
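To make the building blocks concrete before using Keras, here is a minimal NumPy sketch of what one convolution, ReLU, and max-pooling stage does to a single spectrum. The spectrum values and the difference kernel are made up for illustration; in the actual network the kernel weights are learned during training.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide `kernel` over `x` without padding (cross-correlation, as in Conv1D)."""
    n = len(x) - len(kernel) + 1
    return np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool1d(x, pool_size=2):
    """Non-overlapping max pooling; samples that do not fill a window are dropped."""
    n = len(x) // pool_size
    return x[:n * pool_size].reshape(n, pool_size).max(axis=1)

# Hypothetical 10-band spectrum and a hand-picked kernel (illustration only)
spectrum = np.array([0.18, 0.15, 0.13, 0.12, 0.11, 0.10, 0.10, 0.10, 0.096, 0.097])
kernel = np.array([-1.0, 0.0, 1.0])

activated = np.maximum(conv1d_valid(spectrum, kernel), 0.0)  # ReLU
pooled = max_pool1d(activated, pool_size=2)

print(activated.shape)  # (8,)  -> 10 - 3 + 1
print(pooled.shape)     # (4,)  -> 8 // 2
```

With a kernel of size 3 and no padding, each convolution shortens the sequence by 2, and each pooling stage with `pool_size=2` halves it (rounding down).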

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Flatten, Dense, MaxPooling1D

In [3]:
def load_data(csv_path, feature_col_start, feature_col_end, target_col):
    """
    Load a CSV file into a pandas DataFrame, drop rows containing NaN, and keep only the feature and target columns.

    Parameters:
        csv_path (str): Path to the CSV file to load.
        feature_col_start, feature_col_end (int): Start (inclusive) and end (exclusive) column positions of the features.
        target_col (str): Name of the column to use as target.

    Returns:
        new_df: A DataFrame containing the feature columns followed by the target column.
    """
    # Load CSV into a Pandas DataFrame
    df = pd.read_csv(csv_path)

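    # Drop rows that contain NaN values
    df = df.dropna()

    # Extract the feature columns; .copy() gives an independent DataFrame, so adding
    # the target column below does not trigger pandas' SettingWithCopyWarning
    new_df = df[df.columns[feature_col_start:feature_col_end]].copy()
    new_df[target_col] = df[target_col]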

    return new_df

In [22]:
def feature_selection_1d_conv(df, target_col, K, kernel_size=3, filters=64, epochs=10, test_size=0.2):
    '''
    Extracts K features from a given dataset using a 1D convolutional neural network (CNN).
    The CNN is trained on the training split and then applied to both splits; its K outputs
    replace the original feature columns.

     Parameters:
      --------
        df: a pandas DataFrame containing the input features and target variable.
        target_col: the name of the target variable column in the DataFrame.
        K: the number of features to extract (units in the final Dense layer).
        kernel_size: the size of the kernel used in the convolutional layers.
        filters: the number of filters to use in the convolutional layers.
        epochs: the number of epochs to train the CNN.
        test_size: the fraction of the data held out as the test set.

     Returns:
     --------
        a new DataFrame containing the K extracted features and the target variable.
    '''
    
    # Split the data into training and testing sets
    X = df.drop(columns=[target_col]).values
    y = df[target_col].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

    # Create the 1D convolutional neural network
    model = Sequential()
    # First convolutional layer
    model.add(Conv1D(filters=filters, kernel_size=kernel_size, activation='relu', input_shape=(X_train.shape[1], 1)))
    model.add(MaxPooling1D(pool_size=2))

    # Second convolutional layer
    model.add(Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))

    # Flatten layer to prepare for fully connected layers
    model.add(Flatten())

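    # Fully connected output layer with K units; its outputs are used as the K features.
    # Note: y holds a single value per sample, so every one of the K units is fit to the same target.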
    model.add(Dense(units=K, activation='linear'))
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])

    # Fit the model to the training data
    model.fit(X_train.reshape(X_train.shape[0], X_train.shape[1], 1), y_train, epochs=epochs, verbose=0)

    # Use the model to transform the data
    new_X_train = model.predict(X_train.reshape(X_train.shape[0], X_train.shape[1], 1))
    new_X_test = model.predict(X_test.reshape(X_test.shape[0], X_test.shape[1], 1))

    # Combine the transformed data with the target column to create the new DataFrame
    new_X_train = pd.DataFrame(np.concatenate([new_X_train, y_train.reshape(-1, 1)], axis=1))
    new_X_test = pd.DataFrame(np.concatenate([new_X_test, y_test.reshape(-1, 1)], axis=1))

    # Concatenate the training and testing DataFrames (training rows first)
    new_df = pd.concat([new_X_train, new_X_test], axis=0)
    new_df = new_df.reset_index(drop=True)  # reset the index of the new DataFrame

    # Name the extracted columns Feature_1 ... Feature_K and keep the original target name
    new_cols = ['Feature_' + str(i+1) for i in range(len(new_df.columns)-1)] + [target_col]

    # rename the columns using the new_cols list
    new_df = new_df.rename(columns=dict(zip(new_df.columns, new_cols)))

    return new_df
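Because the function above fits all K output units to the same scalar target, the extracted columns tend to carry nearly the same information. One common alternative, sketched here as an assumption and not as this notebook's method, is to put a single-unit regression head on top of a K-unit hidden layer and read the features from that hidden layer. The function name `build_regressor_and_extractor`, the layer name `features`, and the random data are all hypothetical.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_regressor_and_extractor(n_bands, K, kernel_size=3, filters=64):
    """Return (regressor, extractor); both models share the same layers and weights."""
    inputs = Input(shape=(n_bands, 1))
    x = Conv1D(filters, kernel_size, activation='relu')(inputs)
    x = MaxPooling1D(pool_size=2)(x)
    x = Flatten()(x)
    features = Dense(K, activation='relu', name='features')(x)  # K-dimensional bottleneck
    output = Dense(1, activation='linear')(features)            # single regression output

    regressor = Model(inputs, output)
    regressor.compile(optimizer='adam', loss='mse')
    extractor = Model(inputs, features)  # reuses the trained layers, no separate training
    return regressor, extractor

# Tiny synthetic example (random data, for shape checking only)
rng = np.random.default_rng(0)
X = rng.random((32, 40, 1)).astype('float32')
y = rng.random(32).astype('float32')

regressor, extractor = build_regressor_and_extractor(n_bands=40, K=10, filters=8)
regressor.fit(X, y, epochs=1, verbose=0)
features = extractor.predict(X, verbose=0)
print(features.shape)  # (32, 10)
```

Training only the regressor is enough: the extractor is a second view onto the same layers, so it picks up the learned weights automatically.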

## Example

In [4]:
# Define input parameters
csv_path = '/content/data.csv'
feature_idx_i, feature_idx_f = 16, -2 # start and end column positions of the features
target_col = 'A' # label column (regression)

# Parameters of 1D Convolution
K = 10 # Number of final features
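The feature range above is given as column positions, and a negative end position counts from the right, so `16, -2` means "from the 17th column up to, but not including, the last two". A small sketch on a made-up frame shows the same slicing rule; the column names here are hypothetical and only illustrate the semantics.

```python
import pandas as pd

# Hypothetical layout: two metadata columns, four bands, then the target and one extra column
cols = ['id', 'site', '400.2', '403.09', '405.97', '408.85', 'A', 'extra']
toy = pd.DataFrame([[0] * len(cols)], columns=cols)

start, end = 2, -2  # analogous to feature_idx_i, feature_idx_f
selected = list(toy.columns[start:end])
print(selected)  # ['400.2', '403.09', '405.97', '408.85']
```

Since the slice excludes the trailing columns, the target has to be added back by name, which is what `load_data` does.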

In [5]:
# Load data
data = load_data(csv_path, feature_idx_i, feature_idx_f, target_col)
data.head()


Unnamed: 0,397.32,400.2,403.09,405.97,408.85,411.74,414.63,417.52,420.4,423.29,...,978.88,981.96,985.05,988.13,991.22,994.31,997.4,1000.49,1003.58,A
0,0.179808,0.152106,0.129191,0.115715,0.107613,0.102074,0.101501,0.099727,0.096248,0.096929,...,0.458213,0.464172,0.45852,0.462214,0.467727,0.467549,0.466043,0.471523,0.447471,2.01727
1,0.221156,0.186298,0.160032,0.146194,0.136323,0.128331,0.124891,0.12185,0.116359,0.114495,...,0.71797,0.717748,0.722268,0.726763,0.738159,0.741649,0.739217,0.762054,0.622104,1.872474
2,0.221893,0.185626,0.164002,0.154074,0.146511,0.137888,0.133002,0.13092,0.128935,0.126446,...,0.670528,0.675308,0.669332,0.689363,0.685825,0.698885,0.689815,0.705207,0.580815,2.043818
3,0.162126,0.129779,0.104428,0.089685,0.080833,0.075142,0.068085,0.063978,0.058188,0.054447,...,0.57067,0.574177,0.580435,0.579218,0.582644,0.592902,0.597743,0.609343,0.480618,2.123489
4,0.206857,0.164631,0.137415,0.118823,0.102912,0.09785,0.090029,0.084146,0.07765,0.072445,...,0.602451,0.609186,0.624415,0.62275,0.633371,0.64097,0.649146,0.659158,0.5361,2.122085


In [23]:
New_features_df = feature_selection_1d_conv(data, target_col, K, kernel_size=3, filters=64, epochs=10, test_size=0.2)
New_features_df



Unnamed: 0,Feature_1,Feature_2,Feature_3,Feature_4,Feature_5,Feature_6,Feature_7,Feature_8,Feature_9,Feature_10,A
0,3.170995,3.161355,3.161634,3.134616,3.162695,3.173879,3.138194,3.165236,3.165291,3.150519,2.294082
1,2.589892,2.617995,2.599960,2.624258,2.602822,2.593054,2.620603,2.605740,2.606247,2.616777,4.365390
2,2.709579,2.717774,2.708848,2.705012,2.707155,2.711651,2.701620,2.705066,2.711246,2.710224,2.004546
3,3.255538,3.251079,3.262606,3.259015,3.262858,3.255385,3.262632,3.269674,3.257873,3.257378,4.180234
4,3.376766,3.350069,3.376165,3.347734,3.365410,3.373400,3.352762,3.369678,3.370341,3.353174,1.710769
...,...,...,...,...,...,...,...,...,...,...,...
608,2.330210,2.352972,2.336530,2.359640,2.340025,2.332753,2.351664,2.332545,2.341273,2.348506,1.749088
609,3.028935,3.011060,3.025206,2.999045,3.022367,3.030542,2.999208,3.022764,3.020936,3.009901,3.620706
610,3.797003,3.786373,3.803708,3.782094,3.787907,3.800539,3.790735,3.797657,3.799232,3.788066,7.607726
611,2.462150,2.481224,2.465022,2.480001,2.471847,2.464758,2.475043,2.469917,2.471386,2.477748,4.216203
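The ten extracted columns above look very similar within each row. A quick way to check this is the pairwise correlation between feature columns. The sketch below uses only the first five rows printed above, so it is a rough indication and not a statistic of the full 612-row output.

```python
import numpy as np

# First five rows of Feature_1 ... Feature_10, copied from the output above
rows = np.array([
    [3.170995, 3.161355, 3.161634, 3.134616, 3.162695, 3.173879, 3.138194, 3.165236, 3.165291, 3.150519],
    [2.589892, 2.617995, 2.599960, 2.624258, 2.602822, 2.593054, 2.620603, 2.605740, 2.606247, 2.616777],
    [2.709579, 2.717774, 2.708848, 2.705012, 2.707155, 2.711651, 2.701620, 2.705066, 2.711246, 2.710224],
    [3.255538, 3.251079, 3.262606, 3.259015, 3.262858, 3.255385, 3.262632, 3.269674, 3.257873, 3.257378],
    [3.376766, 3.350069, 3.376165, 3.347734, 3.365410, 3.373400, 3.352762, 3.369678, 3.370341, 3.353174],
])

# Correlation between columns (features); rowvar=False treats each column as a variable
corr = np.corrcoef(rows, rowvar=False)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print(corr.shape)      # (10, 10)
print(off_diag.min())  # smallest pairwise correlation between two features
```

A minimum correlation close to 1 means the features are largely redundant for a downstream regressor. On the full DataFrame the same check can be run with `New_features_df.drop(columns=[target_col]).corr()`.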
