Airline-Passenger-Referral-Prediction

Capstone Project (Classification): predict whether a passenger will recommend the airline to their friends.

Problem statement:

The data includes airline reviews from 2006 to 2019 for popular airlines around the world, with multiple-choice and free-text questions. The data was scraped in Spring 2019. The main objective is to predict whether passengers will refer the airline to their friends.

Feature descriptions briefly as follows:

airline: Name of the airline.

overall: Overall score given to the trip, between 1 and 10.

author: Author of the trip

reviewdate: Date of the review

customer review: Review of the customers in free text format

aircraft: Type of the aircraft

travellertype: Type of traveler (e.g. business, leisure)

cabin: Cabin at the flight

date flown: Flight date

seatcomfort: Rated between 1-5

cabin service: Rated between 1-5

foodbev: Rated between 1-5

entertainment: Rated between 1-5

groundservice: Rated between 1-5

valueformoney: Rated between 1-5

recommended: Binary, target variable.

Dataset

EDA

Exploratory data analysis (EDA) is a detailed analysis designed to reveal a data set's underlying structure. It matters for a business because it identifies trends, patterns, and relationships that are not intuitively clear.

Univariate Analysis:

Numerical Features:

Outlier Detection

No outliers were detected in the dataset.
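
The README does not say which outlier rule was applied, so the following is only a minimal sketch of one common choice, the 1.5 x IQR rule. The function name `iqr_outliers` and the sample ratings are made up for illustration and are not taken from the notebook.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ratings on the 1-5 scale used by columns such as seatcomfort.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
print(iqr_outliers(ratings))         # nothing falls outside the fences
print(iqr_outliers(ratings + [50]))  # an impossible rating is flagged
```

Ratings bounded to a fixed 1-5 or 1-10 scale rarely produce values beyond such fences, which fits the finding above.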

Categorical Features:

Spirit Airlines is the most frequent airline in the dataset, followed by American Airlines and United Airlines.

Airbus A320 is the most frequent aircraft in the dataset, followed by Boeing 777 and Airbus A380.

The Bangkok to Hong Kong journey is the most frequent route in the dataset, followed by Bangkok to London and London to New York.

July is the month with the most travel. December is the second most popular month.

The majority of travellers are Solo Leisure in the traveller type column. Economy is the most common class in the cabin column. The recommended column shows only a slight difference between the recommended and not recommended counts.

Bivariate Analysis:

All traveller types strongly prefer economy class. Some Business and Couple Leisure travellers choose business class. First class is the least preferred across all traveller types.

Multivariate Analysis:

Airlines' recent 5-year trend:

Multicollinearity:

Many of the rating variables are strongly correlated with the overall rating column. Therefore, we may ignore those correlated columns and focus just on the overall column in order to simplify the analysis.
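
As a rough illustration of that check, the sketch below computes the Pearson correlation of each rating column with overall and lists the ones above a cutoff. The helper names, the 0.8 threshold, and the sample values are assumptions made up for this example; the notebook may use a different method and cutoff.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def redundant_features(columns, target, threshold=0.8):
    """Names of columns whose |r| with the target column is at least threshold."""
    return [
        name
        for name, values in columns.items()
        if name != target and abs(pearson(values, columns[target])) >= threshold
    ]


# Made-up ratings for six hypothetical reviews.
columns = {
    "overall": [1, 3, 5, 7, 9, 10],
    "seatcomfort": [1, 2, 3, 4, 5, 5],
    "valueformoney": [1, 1, 3, 4, 5, 5],
    "entertainment": [3, 1, 4, 2, 5, 1],
}
print(redundant_features(columns, "overall"))
```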

Natural Language Processing:

Text Cleaning:

Text cleaning is the process of preparing raw text for Natural Language Processing (NLP) so that machines can understand human language. The following approach is used here to clean the text of customer reviews:

Use pos_tag from nltk: POS tagging marks up each word in the text with its part of speech, based on its definition and context.
Remove all characters other than a-z and A-Z.
Convert words to lowercase and split them on spaces.
Remove stopwords using the nltk library.
Lemmatize the reviews with WordNetLemmatizer to get the meaningful words.
Join back the words that were split before.
Initiate the tokenization process.
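
The cleaning steps above can be sketched roughly as follows. To keep the example runnable without downloading NLTK corpora, it uses a tiny stand-in stopword set and an identity lemmatizer; `clean_review`, `wordnet_pos`, and `STOPWORDS` are hypothetical names for illustration, not the notebook's code.

```python
import re

# Tiny stand-in stopword set (assumption); the step above uses nltk's list.
STOPWORDS = {"the", "was", "were", "and", "a", "to", "is", "of"}


def wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet POS letter."""
    return {"J": "a", "V": "v", "R": "r"}.get(treebank_tag[:1], "n")


def clean_review(text, stopwords=STOPWORDS, lemmatize=lambda word: word):
    """Apply the cleaning steps to one review and return the cleaned string."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)   # keep a-z and A-Z only
    words = letters_only.lower().split()             # lowercase, split on whitespace
    kept = [w for w in words if w not in stopwords]  # drop stopwords
    lemmas = [lemmatize(w) for w in kept]            # identity unless a lemmatizer is passed
    return " ".join(lemmas)                          # join the words back together


print(clean_review("The seats were GREAT, and the crew was friendly!!! 10/10"))
```

With NLTK installed and its corpora downloaded, `nltk.corpus.stopwords.words("english")` and `WordNetLemmatizer().lemmatize` could be passed in place of the stand-ins, and `wordnet_pos` shows how tags from `nltk.pos_tag` could be translated into the `pos` argument the lemmatizer accepts.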

Most Frequent words in customer review column:

Model performance:

Confusion matrix of test data
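
For readers unfamiliar with the layout, a confusion matrix simply counts how predictions line up with the true labels. The sketch below does this in plain Python so the arithmetic is visible; it assumes recommended is encoded as 1 and not recommended as 0, and the labels are made up rather than taken from the project's test set.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


# Made-up labels: 1 = recommended, 0 = not recommended (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
print(counts, accuracy)
```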

AUC-ROC Curve:
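
The area under the ROC curve can be read as the probability that a randomly chosen recommended review receives a higher score than a randomly chosen not recommended one. The following is a minimal sketch of that pairwise definition with made-up scores; it assumes a 0/1 encoding of the target, and a library routine would normally be used on real data.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities for four hypothetical reviews.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
print(roc_auc(y_true, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.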

Model performance based on randomly created reviews:

Finally, interpreting the model through SHAP:

Conclusion:

Logistic Regression performed best among all the algorithms tried on this dataset.
