# **Customer Satisfaction Classification**

## **Project Description:**
This project applies a Support Vector Machine (SVM) classifier to predict customer satisfaction in the airline industry.
The goal is to classify each customer's overall satisfaction rating (1-5) from features such as the airline, flight duration, ticket price, baggage complaints, and in-flight service score.
## **Dataset:**
#### The dataset contains the following columns:
   - Customer ID: Unique identifier for each customer.
   - Airline: The airline that the customer flew with.
   - Flight Duration (hrs): The duration of the flight in hours.
   - Ticket Price (USD): The price of the ticket in US dollars.
   - Baggage Complaints: The number of baggage complaints filed by the customer.
   - In-flight Services Score (1-5): A rating (1-5) of the services provided during the flight.
   - Overall Satisfaction (1-5): The overall satisfaction rating (1-5) provided by the customer. This is the prediction target.
## **Key Steps:**
#### Data Preprocessing:
   - Handling missing values (if any).
   - Encoding categorical features (e.g., airline names).
   - Scaling numerical features to bring them to the same scale.
#### Model Building:
   - Splitting the data into training and testing sets.
   - Building an SVM model to classify customer satisfaction.
#### Model Evaluation:
   - Evaluating the model using accuracy, confusion matrix, classification report, and cross-validation.
#### Key Libraries Used:
   - pandas: For data manipulation and analysis.
   - scikit-learn: For machine learning algorithms and data preprocessing.
   - matplotlib & seaborn: For data visualization.
#### Results:
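   - The linear-kernel SVM was trained on 35 of the 51 rows and evaluated on the remaining 16.
   - It reached 43.75% accuracy on that test set; the classification report breaks this down into per-class precision, recall, and F1 score.
   - After a 5-fold grid search over C and gamma, the selected model scored 19% accuracy on the same test set.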

### Import Libraries

In [157]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV

## Loading Data

In [5]:
data = pd.read_csv("Airlines.csv")

In [6]:
data.head()

|   | Customer ID | Airline | Flight Duration (hrs) | Ticket Price (USD) | Baggage Complaints | In-flight Services Score (1-5) | Overall Satisfaction (1-5) |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Air Algerie | 15.0 | 238 | 2 | 1 | 2 |
| 1 | 2 | Qatar Airways | 12.8 | 773 | 3 | 3 | 2 |
| 2 | 3 | Emirates | 6.5 | 1203 | 4 | 2 | 1 |
| 3 | 4 | Kuwait | 10.8 | 1037 | 3 | 4 | 3 |
| 4 | 5 | ITA | 7.3 | 553 | 2 | 5 | 1 |


In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 7 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Customer ID                     51 non-null     int64  
 1   Airline                         51 non-null     object 
 2   Flight Duration (hrs)           51 non-null     float64
 3   Ticket Price (USD)              51 non-null     int64  
 4   Baggage Complaints              51 non-null     int64  
 5   In-flight Services Score (1-5)  51 non-null     int64  
 6   Overall Satisfaction (1-5)      51 non-null     int64  
dtypes: float64(1), int64(5), object(1)
memory usage: 2.9+ KB


In [8]:
data.describe()

|   | Customer ID | Flight Duration (hrs) | Ticket Price (USD) | Baggage Complaints | In-flight Services Score (1-5) | Overall Satisfaction (1-5) |
|---|---|---|---|---|---|---|
| count | 51.0 | 51.0 | 51.0 | 51.0 | 51.0 | 51.0 |
| mean | 26.0 | 7.888235 | 778.392157 | 2.078431 | 3.117647 | 2.882353 |
| std | 14.866069 | 4.195219 | 388.183826 | 1.383375 | 1.336369 | 1.380537 |
| min | 1.0 | 1.2 | 201.0 | 0.0 | 1.0 | 1.0 |
| 25% | 13.5 | 4.25 | 451.0 | 1.0 | 2.0 | 2.0 |
| 50% | 26.0 | 7.3 | 633.0 | 2.0 | 3.0 | 3.0 |
| 75% | 38.5 | 11.4 | 1115.5 | 3.0 | 4.0 | 4.0 |
| max | 51.0 | 15.0 | 1493.0 | 4.0 | 5.0 | 5.0 |


## Handling Missing Data

In [9]:
data = data.ffill()

In [67]:
data.head()

|   | Customer ID | Airline | Flight Duration (hrs) | Ticket Price (USD) | Baggage Complaints | In-flight Services Score (1-5) | Overall Satisfaction (1-5) |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Air Algerie | 15.0 | 238 | 2 | 1 | 2 |
| 1 | 2 | Qatar Airways | 12.8 | 773 | 3 | 3 | 2 |
| 2 | 3 | Emirates | 6.5 | 1203 | 4 | 2 | 1 |
| 3 | 4 | Kuwait | 10.8 | 1037 | 3 | 4 | 3 |
| 4 | 5 | ITA | 7.3 | 553 | 2 | 5 | 1 |


In [11]:
data = data.dropna()

In [68]:
data.head()

|   | Customer ID | Airline | Flight Duration (hrs) | Ticket Price (USD) | Baggage Complaints | In-flight Services Score (1-5) | Overall Satisfaction (1-5) |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Air Algerie | 15.0 | 238 | 2 | 1 | 2 |
| 1 | 2 | Qatar Airways | 12.8 | 773 | 3 | 3 | 2 |
| 2 | 3 | Emirates | 6.5 | 1203 | 4 | 2 | 1 |
| 3 | 4 | Kuwait | 10.8 | 1037 | 3 | 4 | 3 |
| 4 | 5 | ITA | 7.3 | 553 | 2 | 5 | 1 |


In [17]:
duplicates = data.duplicated().sum()

In [21]:
print("\n Number of duplicates is: ", duplicates)


 Number of duplicates is:  0
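
The `data.info()` output earlier reports 51 non-null values in every column, so `ffill()` and `dropna()` leave this dataset unchanged. For data that does contain gaps, a minimal sketch of how the two calls interact is shown here on a toy frame; the values and the name `toy` are hypothetical and not taken from `Airlines.csv`.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with gaps in two columns.
toy = pd.DataFrame({
    "Airline": ["Emirates", None, "ITA", "Kuwait"],
    "Flight Duration (hrs)": [np.nan, 12.8, np.nan, 7.3],
})

# Count missing values per column before touching anything.
missing_before = toy.isna().sum()

# Forward fill copies the previous row's value into each gap.
filled = toy.ffill()
missing_after_ffill = filled.isna().sum()

# A gap in the first row has no previous value, so dropna() removes that row.
cleaned = filled.dropna()

print(missing_before)
print(missing_after_ffill)
print(len(cleaned))
```

Forward filling assumes that neighbouring rows are related, which may not hold for independent customers; an imputer based on the median or the most frequent value is a common alternative.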


###  Preprocess the Data

In [25]:
print(data.columns)


Index(['Customer ID', 'Airline', 'Flight Duration (hrs)', 'Ticket Price (USD)',
       'Baggage Complaints', 'In-flight Services Score (1-5)',
       'Overall Satisfaction (1-5)'],
      dtype='object')


In [149]:
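# Features: every column except the target. Note that 'Customer ID' stays in X
# and is scaled like any other numeric column, although it is only an identifier.
X = data.drop('Overall Satisfaction (1-5)', axis=1)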

In [150]:
y = data['Overall Satisfaction (1-5)']

In [152]:
categorical_cols = X.select_dtypes(include=['object']).columns
numerical_cols = X.select_dtypes(exclude=['object']).columns

In [153]:
numerical_pipeline = Pipeline(steps=[
    ('scaler', StandardScaler())
])

In [154]:
categorical_pipeline = Pipeline(steps=[
    ('encoder', OneHotEncoder(handle_unknown='ignore')) 
])


In [155]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_pipeline, numerical_cols),
        ('cat', categorical_pipeline, categorical_cols)
    ])

### Create the SVM Model

In [134]:
svm_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('svm', SVC(kernel='linear')) 
])

###  Split the Data

In [156]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
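
With five classes and only 51 rows, a purely random split can leave some classes thinly represented in the test set. A minimal sketch of a stratified split follows, using synthetic labels rather than the airline data; `X_toy` and `y_toy` are hypothetical. Stratification needs at least two samples per class, which is assumed here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 50 samples, 5 classes with 10 samples each.
X_toy = np.arange(100).reshape(50, 2)
y_toy = np.repeat([1, 2, 3, 4, 5], 10)

# stratify=y_toy keeps the class proportions the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

labels, counts = np.unique(y_te, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```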

### Train the Model

In [135]:
svm_pipeline.fit(X_train, y_train)

### Evaluate the model

In [162]:
y_pred = svm_pipeline.predict(X_test)

In [163]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 43.75%


In [164]:
print('\nClassification Report:')
print(classification_report(y_test, y_pred, zero_division=1))


Classification Report:
              precision    recall  f1-score   support

           1       0.75      1.00      0.86         3
           2       0.50      0.33      0.40         3
           3       0.33      0.25      0.29         4
           4       0.33      1.00      0.50         2
           5       0.00      0.00      0.00         4

    accuracy                           0.44        16
   macro avg       0.38      0.52      0.41        16
weighted avg       0.36      0.44      0.37        16



In [165]:
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Confusion Matrix:
[[3 0 0 0 0]
 [0 1 0 1 1]
 [0 0 1 3 0]
 [0 0 0 2 0]
 [1 1 2 0 0]]
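
The accuracy and the per-class figures in the classification report can be read directly off this matrix. The sketch below copies the matrix from the output above and recomputes them with NumPy; rows are true classes and columns are predicted classes, following scikit-learn's convention.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class 1..5, columns = predicted class 1..5).
cm = np.array([
    [3, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 3, 0],
    [0, 0, 0, 2, 0],
    [1, 1, 2, 0, 0],
])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Recall divides each diagonal entry by its row total (true count per class).
recall = np.diag(cm) / cm.sum(axis=1)

# Precision divides each diagonal entry by its column total (predicted count per class).
precision = np.diag(cm) / cm.sum(axis=0)

print(f"accuracy:  {accuracy:.4f}")
print("recall:   ", np.round(recall, 2))
print("precision:", np.round(precision, 2))
```

The fourth column shows that six test samples were predicted as class 4 while only two belong to it, which explains that class's low precision despite its perfect recall.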


### Tune Hyperparameters with Grid Search

In [166]:
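# Note: the pipeline's SVC uses kernel='linear', which ignores gamma,
# so both gamma values produce the same model for a given C.
param_grid = {
    'svm__C': [0.1, 1, 10],
    'svm__gamma': ['scale', 'auto']
}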

In [167]:
grid_search = GridSearchCV(svm_pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)

In [168]:
print(f"Best parameters found: {grid_search.best_params_}")

Best parameters found: {'svm__C': 1, 'svm__gamma': 'scale'}


In [169]:
y_pred = grid_search.best_estimator_.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           1       1.00      0.33      0.50         3
           2       0.00      0.00      0.00         3
           3       0.33      0.25      0.29         4
           4       0.20      0.50      0.29         2
           5       0.00      0.00      0.00         4

    accuracy                           0.19        16
   macro avg       0.31      0.22      0.21        16
weighted avg       0.30      0.19      0.20        16
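
One way to make `gamma` a meaningful part of the search is to tune the kernel as well, since `gamma` only affects the `rbf`, `poly`, and `sigmoid` kernels. The following is a sketch under stated assumptions, not the notebook's method: it uses synthetic data from `make_classification` instead of `Airlines.csv`, and the names `X_syn`, `y_syn`, `pipe`, `grid`, and `search` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class problem standing in for the real data.
X_syn, y_syn = make_classification(
    n_samples=120, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("svm", SVC()),
])

# A list of grids: gamma is searched only for the kernel that uses it.
grid = [
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10]},
    {"svm__kernel": ["rbf"], "svm__C": [0.1, 1, 10],
     "svm__gamma": ["scale", "auto"]},
]

# Stratified folds keep class proportions similar across the 5 splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
search.fit(X_syn, y_syn)

print(search.best_params_)
print(f"Mean cross-validated accuracy: {search.best_score_:.3f}")
```

Comparing `best_score_` (the mean cross-validated accuracy) with the held-out test accuracy helps show how much of a gap comes from the small test set rather than from the chosen parameters.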

