# **Customer Satisfaction Classification**

## **Project Description:**
This project demonstrates the application of machine learning techniques to predict customer satisfaction in the airline industry. 
The goal is to classify customer satisfaction levels based on features such as flight duration, baggage complaints, in-flight service scores, and overall satisfaction ratings using both a Support Vector Machine (SVM) classifier and a Neural Network model built with PyTorch.
The SVM classifier serves as a baseline model, while the PyTorch-based neural network explores the potential of deep learning to improve prediction accuracy.


## **Dataset:**
#### The dataset contains the following columns:
 ##### - Customer ID: Unique identifier for each customer.
 ##### - Airline: The airline that the customer flew with.
 ##### - Flight Duration (hrs): The duration of the flight in hours.
 ##### - Baggage Complaints: The number of baggage complaints filed by the customer.
 ##### - In-flight Services Score: A rating (1-5) of the services provided during the flight.
 ##### - Overall Satisfaction: The overall satisfaction rating (1-5) provided by the customer.
## **Key Steps:**
#### Data Preprocessing:
   - Handling missing values (if any).
   - Encoding categorical features (e.g., airline names).
   - Scaling numerical features to bring them to the same scale.
#### Model Building:
   - Splitting the data into training and testing sets.
   - Building an SVM model to classify customer satisfaction.
#### Model Evaluation:
   - Evaluating the model using accuracy, confusion matrix, classification report, and cross-validation.
#### Key Libraries Used:
   - pandas: For data manipulation and analysis.
   - scikit-learn: For machine learning algorithms and data preprocessing.
   - matplotlib & seaborn: For data visualization.
#### Results:
   - The SVM model was trained and evaluated on the dataset.
   - The classification report provides insights into precision, recall, and F1 score.
   - Accuracy metrics were computed for model evaluation.

### Import Libraries

In [135]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import classification_report, confusion_matrix , accuracy_score
from sklearn.model_selection import GridSearchCV

## Loading Data

In [88]:
data = pd.read_csv("Airlines.csv")

In [89]:
data.head()

Unnamed: 0,Customer ID,Airline,Flight Duration (hrs),Ticket Price (USD),Baggage Complaints,In-flight Services Score (1-5),Overall Satisfaction (1-5)
0,1,Air Algerie,15.0,238,2,1,2
1,2,Qatar Airways,12.8,773,3,3,2
2,3,Emirates,6.5,1203,4,2,1
3,4,Kuwait,10.8,1037,3,4,3
4,5,ITA,7.3,553,2,5,1


In [90]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 7 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Customer ID                     51 non-null     int64  
 1   Airline                         51 non-null     object 
 2   Flight Duration (hrs)           51 non-null     float64
 3   Ticket Price (USD)              51 non-null     int64  
 4   Baggage Complaints              51 non-null     int64  
 5   In-flight Services Score (1-5)  51 non-null     int64  
 6   Overall Satisfaction (1-5)      51 non-null     int64  
dtypes: float64(1), int64(5), object(1)
memory usage: 2.9+ KB


In [91]:
data.describe()

Unnamed: 0,Customer ID,Flight Duration (hrs),Ticket Price (USD),Baggage Complaints,In-flight Services Score (1-5),Overall Satisfaction (1-5)
count,51.0,51.0,51.0,51.0,51.0,51.0
mean,26.0,7.888235,778.392157,2.078431,3.117647,2.882353
std,14.866069,4.195219,388.183826,1.383375,1.336369,1.380537
min,1.0,1.2,201.0,0.0,1.0,1.0
25%,13.5,4.25,451.0,1.0,2.0,2.0
50%,26.0,7.3,633.0,2.0,3.0,3.0
75%,38.5,11.4,1115.5,3.0,4.0,4.0
max,51.0,15.0,1493.0,4.0,5.0,5.0


## Handling with missing data

In [92]:
data = data.ffill()

In [93]:
data.head()

Unnamed: 0,Customer ID,Airline,Flight Duration (hrs),Ticket Price (USD),Baggage Complaints,In-flight Services Score (1-5),Overall Satisfaction (1-5)
0,1,Air Algerie,15.0,238,2,1,2
1,2,Qatar Airways,12.8,773,3,3,2
2,3,Emirates,6.5,1203,4,2,1
3,4,Kuwait,10.8,1037,3,4,3
4,5,ITA,7.3,553,2,5,1


In [94]:
data = data.dropna()

In [95]:
data.head()

Unnamed: 0,Customer ID,Airline,Flight Duration (hrs),Ticket Price (USD),Baggage Complaints,In-flight Services Score (1-5),Overall Satisfaction (1-5)
0,1,Air Algerie,15.0,238,2,1,2
1,2,Qatar Airways,12.8,773,3,3,2
2,3,Emirates,6.5,1203,4,2,1
3,4,Kuwait,10.8,1037,3,4,3
4,5,ITA,7.3,553,2,5,1


In [96]:
duplicates = data.duplicated().sum()

In [97]:
print("\n Number of duplicates is: " , duplicates )


 Number of duplicates is:  0


###  Preprocess the Data

In [98]:
print(data.columns)


Index(['Customer ID', 'Airline', 'Flight Duration (hrs)', 'Ticket Price (USD)',
       'Baggage Complaints', 'In-flight Services Score (1-5)',
       'Overall Satisfaction (1-5)'],
      dtype='object')


In [99]:
X = data.drop('Overall Satisfaction (1-5)', axis=1)

In [100]:
y = data['Overall Satisfaction (1-5)']

In [101]:
categorical_cols = X.select_dtypes(include=['object']).columns
numerical_cols = X.select_dtypes(exclude=['object']).columns

In [102]:
numerical_pipeline = Pipeline(steps=[
    ('scaler', StandardScaler())  ])

In [103]:
categorical_pipeline = Pipeline(steps=[
    ('encoder', OneHotEncoder(handle_unknown='ignore')) 
])


In [104]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_pipeline, numerical_cols),
        ('cat', categorical_pipeline, categorical_cols)
    ])

### Create the SVM Model

In [105]:
svm_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('svm', SVC(kernel='linear')) 
])

###  Split the Data

In [106]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Train the Model

In [107]:
svm_pipeline.fit(X_train, y_train)

### Evaluate the model

In [108]:
y_pred = svm_pipeline.predict(X_test)

In [109]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 18.75%


In [110]:
print('\nClassification Report:')
print(classification_report(y_test,  y_pred , zero_division=1))


Classification Report:
              precision    recall  f1-score   support

           1       0.50      0.33      0.40         3
           2       0.00      0.00      0.00         3
           3       0.50      0.50      0.50         4
           4       0.00      0.00      0.00         2
           5       0.00      0.00      0.00         4

    accuracy                           0.19        16
   macro avg       0.20      0.17      0.18        16
weighted avg       0.22      0.19      0.20        16



In [111]:
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Confusion Matrix:
[[1 2 0 0 0]
 [0 0 1 0 2]
 [0 1 2 1 0]
 [0 1 1 0 0]
 [1 2 0 1 0]]


## PyTorch Model for Comparison

### Convert data to PyTorch tensors

In [136]:
label_encoder = LabelEncoder()

In [139]:
X_test['Airline'] = label_encoder.fit_transform(X_test['Airline'])

In [140]:
X_train = X_train.apply(pd.to_numeric, errors='coerce')
y_train = pd.to_numeric(y_train, errors='coerce')
X_train = X_train.fillna(0)
y_train = y_train.fillna(0)

In [141]:
X_test_numpy = X_test.to_numpy().astype('float32')
X_train_tensor = torch.from_numpy(X_train_numpy)

In [142]:
y_train_numpy = y_train.to_numpy().astype('int64')
y_train_tensor = torch.from_numpy(y_train_numpy)

In [144]:
X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.int64)
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.int64)

In [145]:
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

### Define the neural network model

In [146]:
class SimpleNN(nn.Module):
    def __init__(self, input_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, len(y_train.unique())) 
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

In [147]:
input_size = X_train.shape[1]
model = SimpleNN(input_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [150]:
print(torch.unique(y_batch))

tensor([1, 2, 3, 4, 5])


In [152]:
epochs = 1000  
for epoch in range(epochs):
    model.train()
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        y_batch = y_batch - 1
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
    print(f'[Epoch {epoch+1}/{epochs}], Loss: {loss.item():.4f}')

[Epoch 1/1000], Loss: 1.1598
[Epoch 2/1000], Loss: 0.5586
[Epoch 3/1000], Loss: 1.2614
[Epoch 4/1000], Loss: 1.2258
[Epoch 5/1000], Loss: 1.0383
[Epoch 6/1000], Loss: 0.9075
[Epoch 7/1000], Loss: 1.3768
[Epoch 8/1000], Loss: 0.6528
[Epoch 9/1000], Loss: 1.1266
[Epoch 10/1000], Loss: 1.6788
[Epoch 11/1000], Loss: 0.8429
[Epoch 12/1000], Loss: 1.4471
[Epoch 13/1000], Loss: 1.5924
[Epoch 14/1000], Loss: 2.0791
[Epoch 15/1000], Loss: 0.9123
[Epoch 16/1000], Loss: 1.4962
[Epoch 17/1000], Loss: 1.3479
[Epoch 18/1000], Loss: 1.3351
[Epoch 19/1000], Loss: 1.9201
[Epoch 20/1000], Loss: 1.0147
[Epoch 21/1000], Loss: 1.6239
[Epoch 22/1000], Loss: 1.2797
[Epoch 23/1000], Loss: 1.0624
[Epoch 24/1000], Loss: 0.8426
[Epoch 25/1000], Loss: 1.0917
[Epoch 26/1000], Loss: 0.6625
[Epoch 27/1000], Loss: 1.0681
[Epoch 28/1000], Loss: 1.0838
[Epoch 29/1000], Loss: 2.0818
[Epoch 30/1000], Loss: 1.1751
[Epoch 31/1000], Loss: 1.6542
[Epoch 32/1000], Loss: 0.5214
[Epoch 33/1000], Loss: 1.7159
[Epoch 34/1000], Lo

### Evaluate the model

In [153]:
model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)
    y_pred = torch.argmax(y_pred_tensor, axis=1).numpy()

In [156]:
print("\nNeural Network Classification Report:")
print(classification_report(y_test, y_pred , zero_division = 1))
print(f'Accuracy: {accuracy_score(y_test, y_pred ,) * 100:.2f}%')


Neural Network Classification Report:
              precision    recall  f1-score   support

           1       0.38      1.00      0.55         3
           2       0.00      0.00      0.00         3
           3       0.00      0.00      0.00         4
           4       0.00      0.00      0.00         2
           5       1.00      0.00      0.00         4

    accuracy                           0.19        16
   macro avg       0.28      0.20      0.11        16
weighted avg       0.32      0.19      0.10        16

Accuracy: 18.75%


In [157]:
param_grid = {
    'svm__C': [0.1, 1, 10],
    'svm__gamma': ['scale', 'auto']
}

In [158]:
grid_search = GridSearchCV(svm_pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)

In [159]:
print(f"Best parameters found: {grid_search.best_params_}")

Best parameters found: {'svm__C': 0.1, 'svm__gamma': 'scale'}


In [160]:
y_pred = grid_search.best_estimator_.predict(X_test)
print(classification_report(y_test, y_pred , zero_division=1))

              precision    recall  f1-score   support

           1       1.00      0.00      0.00         3
           2       0.00      0.00      0.00         3
           3       0.60      0.75      0.67         4
           4       1.00      0.00      0.00         2
           5       0.00      0.00      0.00         4

    accuracy                           0.19        16
   macro avg       0.52      0.15      0.13        16
weighted avg       0.46      0.19      0.17        16

