# IMPROVE CUSTOMER SATISFACTION FOR EMIRATES AIRLINE
In the service industry, keeping customer satisfaction high is always among the top priorities. For an airline, how customers view its service, as reflected in the overall rating, is therefore a good measure of performance. Emirates sits in the middle of the range, with an average overall rating of about 6 out of 10, and from 2010 to 2015 that rating did not change much. Improving certain parts of the service should raise customer satisfaction and encourage travellers to fly with Emirates more often or recommend it to their friends. However, Emirates needs to know where investment would be most profitable; that is, it needs to weigh the cost and benefit of raising performance in each service area. The aim of this project is therefore to provide a model that Emirates can use in planning its targeted overall rating.

**Data**

The data comes from the Skytrax review site airlinequality.com and was scraped by GitHub user quankiquanki: https://github.com/quankiquanki/skytrax-reviews-dataset. Skytrax has long been a popular site for customer reviews, collecting reviews and ratings of airlines, lounges, seats and airports. The Emirates subset has 691 observations and 20 columns, which contain information about the reviewers (name, country, date of review, type of traveller), their ratings for each feature, and their comments on the airline. Each attribute is rated from 1 to 5, while the overall rating runs from 1 to 10.

In [9]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import math
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
import statsmodels.api as sm

In [10]:
df = pd.read_csv('airline.csv')
df.head(2)

Unnamed: 0,airline_name,link,title,author,author_country,date,content,aircraft,type_traveller,cabin_flown,route,overall_rating,seat_comfort_rating,cabin_staff_rating,food_beverages_rating,inflight_entertainment_rating,ground_service_rating,wifi_connectivity_rating,value_money_rating,recommended
0,adria-airways,/airline-reviews/adria-airways,Adria Airways customer review,D Ito,Germany,2015-04-10,Outbound flight FRA/PRN A319. 2 hours 10 min f...,,,Economy,,7.0,4.0,4.0,4.0,0.0,,,4.0,1
1,adria-airways,/airline-reviews/adria-airways,Adria Airways customer review,Ron Kuhlmann,United States,2015-01-05,Two short hops ZRH-LJU and LJU-VIE. Very fast ...,,,Business Class,,10.0,4.0,5.0,4.0,1.0,,,5.0,1


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41396 entries, 0 to 41395
Data columns (total 20 columns):
airline_name                     41396 non-null object
link                             41396 non-null object
title                            41396 non-null object
author                           41396 non-null object
author_country                   39805 non-null object
date                             41396 non-null object
content                          41396 non-null object
aircraft                         1278 non-null object
type_traveller                   2378 non-null object
cabin_flown                      38520 non-null object
route                            2341 non-null object
overall_rating                   36861 non-null float64
seat_comfort_rating              33706 non-null float64
cabin_staff_rating               33708 non-null float64
food_beverages_rating            33264 non-null float64
inflight_entertainment_rating    31114 non-null float64
ground_se

In [12]:
df_emi = df[df.airline_name=='emirates'].copy() #copy so later edits do not act on a slice of df
df_emi.head(2)

Unnamed: 0,airline_name,link,title,author,author_country,date,content,aircraft,type_traveller,cabin_flown,route,overall_rating,seat_comfort_rating,cabin_staff_rating,food_beverages_rating,inflight_entertainment_rating,ground_service_rating,wifi_connectivity_rating,value_money_rating,recommended
14773,emirates,/airline-reviews/emirates,Emirates customer review,B Finn,Australia,2015-08-02,Overall we found the experience disappointing....,A380,FamilyLeisure,Economy,Singapore to Paris via Dubai,5.0,3.0,1.0,2.0,3.0,4.0,3.0,3.0,0
14783,emirates,/airline-reviews/emirates,Emirates customer review,Michael Leibman,South Africa,2015-07-28,Flight from Cape Town was late into Dubai due ...,Boeing 777 and A380,Couple Leisure,Business Class,Cape Town to London via Dubai,7.0,3.0,3.0,3.0,4.0,2.0,2.0,3.0,0


In [13]:
df_emi.describe()

Unnamed: 0,overall_rating,seat_comfort_rating,cabin_staff_rating,food_beverages_rating,inflight_entertainment_rating,ground_service_rating,wifi_connectivity_rating,value_money_rating,recommended
count,690.0,691.0,691.0,688.0,688.0,61.0,35.0,691.0,691.0
mean,6.246377,3.596237,3.287988,3.453488,4.190407,3.508197,3.142857,3.609262,0.615051
std,3.088937,1.207675,1.543425,1.351322,1.075096,1.409763,1.497898,1.329978,0.486936
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0
25%,3.0,3.0,2.0,2.0,4.0,3.0,2.0,3.0,0.0
50%,7.0,4.0,4.0,4.0,5.0,4.0,3.0,4.0,1.0
75%,9.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,1.0
max,10.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,1.0
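
The summary above shows a minimum of 0 for `value_money_rating`, although attribute ratings are described as ranging from 1 to 5. One way to count such out-of-scale values per column is sketched below. The helper name `count_out_of_scale` and the toy data are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

def count_out_of_scale(frame, scales):
    """Count non-missing values outside the [low, high] scale of each column."""
    counts = {}
    for column, (low, high) in scales.items():
        values = frame[column].dropna()  # missing values are not counted
        counts[column] = int(((values < low) | (values > high)).sum())
    return counts

# Toy data standing in for the review table (values are made up).
toy = pd.DataFrame({
    "overall_rating": [5.0, 7.0, None, 10.0],
    "value_money_rating": [3.0, 0.0, 4.0, 5.0],
})

# Scales as described in the Data section: 1-5 for attributes, 1-10 overall.
scales = {"overall_rating": (1, 10), "value_money_rating": (1, 5)}
print(count_out_of_scale(toy, scales))
```

On the real data the same call could be made with `df_emi` and the full set of rating columns.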


In [14]:
df_emi.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 691 entries, 14773 to 15808
Data columns (total 20 columns):
airline_name                     691 non-null object
link                             691 non-null object
title                            691 non-null object
author                           691 non-null object
author_country                   690 non-null object
date                             691 non-null object
content                          691 non-null object
aircraft                         50 non-null object
type_traveller                   61 non-null object
cabin_flown                      691 non-null object
route                            61 non-null object
overall_rating                   690 non-null float64
seat_comfort_rating              691 non-null float64
cabin_staff_rating               691 non-null float64
food_beverages_rating            688 non-null float64
inflight_entertainment_rating    688 non-null float64
ground_service_rating            61 non

# Data Wrangling

In the data cleaning step, I do the following:
* Keep only reviews dated 2010 or later
* Drop rows without an overall rating
* Drop columns that are not needed for modelling: airline name, link, title, author, author country, date, content, aircraft type, route, recommended
* Encode missing ratings as 0
* Create dummy variables for traveller type and cabin flown

In [16]:
df_emi=df_emi.assign(date=pd.to_datetime(df_emi['date'])) #convert date strings to datetime (assign returns a new frame)
df_emi=df_emi[df_emi['date']>='2010-01-01'] #keep reviews dated 2010 or later
df_emi.shape

(691, 20)

In [20]:
#drop data points without overall rating
df_clean=df_emi[df_emi['overall_rating'].notnull()]

In [21]:
#drop some attributes that will not be used in modelling (drop returns a new frame, so no inplace edit on a slice)
df_clean=df_clean.drop(['airline_name','link','title','author','author_country','date','content','aircraft','route','recommended'],axis=1)


In [23]:
df_clean.head()

Unnamed: 0,type_traveller,cabin_flown,overall_rating,seat_comfort_rating,cabin_staff_rating,food_beverages_rating,inflight_entertainment_rating,ground_service_rating,wifi_connectivity_rating,value_money_rating
14773,FamilyLeisure,Economy,5.0,3.0,1.0,2.0,3.0,4.0,3.0,3.0
14783,Couple Leisure,Business Class,7.0,3.0,3.0,3.0,4.0,2.0,2.0,3.0
14784,Couple Leisure,Economy,9.0,5.0,4.0,4.0,5.0,4.0,,5.0
14794,Couple Leisure,Business Class,10.0,5.0,5.0,5.0,5.0,4.0,5.0,4.0
14795,FamilyLeisure,Economy,6.0,4.0,3.0,5.0,5.0,3.0,1.0,3.0


In [24]:
#encode missing ratings as 0; fillna returns a new DataFrame, so inplace=True is not combined with the assignment
df_clean=df_clean.fillna({'ground_service_rating':0, 'wifi_connectivity_rating':0,'seat_comfort_rating':0,'cabin_staff_rating':0,'food_beverages_rating':0,'inflight_entertainment_rating':0, 'value_money_rating':0})
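
Encoding a missing rating as 0 places it below the lowest valid score, so summary statistics of sparsely filled columns shift after this step. In the Emirates subset only 61 of 691 rows have a `ground_service_rating`, so most values in that column become 0. The sketch below illustrates the effect on made-up numbers; the toy column values are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Toy column with mostly missing ratings (values are made up).
toy = pd.DataFrame(
    {"ground_service_rating": [4.0, 2.0, np.nan, np.nan, np.nan, np.nan]}
)

# mean() skips NaN by default, so this averages only the two observed ratings.
mean_before = toy["ground_service_rating"].mean()

# fillna with a dict fills only the listed columns and returns a new DataFrame.
filled = toy.fillna({"ground_service_rating": 0})
mean_after = filled["ground_service_rating"].mean()

# Number of values that were encoded as 0.
n_encoded = int(toy["ground_service_rating"].isna().sum())

print(mean_before, mean_after, n_encoded)
```

Keeping `n_encoded` per column alongside the cleaned data makes it easier to judge later how much of a coefficient is driven by the zero encoding.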


In [25]:
#one-hot encode cabin flown and traveller type as dummy (0/1) columns
df_cabin_flown= pd.get_dummies(df_clean['cabin_flown'])
df_clean=pd.concat([df_clean, df_cabin_flown], axis=1)
df_clean.drop(['cabin_flown'], axis=1, inplace=True)

df_type_traveller= pd.get_dummies(df_clean['type_traveller'])
df_clean=pd.concat([df_clean, df_type_traveller], axis=1)
df_clean.drop(['type_traveller'], axis=1, inplace=True)
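
`pd.get_dummies` ignores missing values by default, so a review with no `type_traveller` (only 61 of the 691 Emirates rows have one) gets 0 in every traveller-type column. A small sketch on toy data follows. The rows are made up, and `dummy_na=True` is shown only as an optional alternative, not something the original analysis uses.

```python
import numpy as np
import pandas as pd

# Toy reviews; the second one has no traveller type recorded.
toy = pd.DataFrame({
    "type_traveller": ["Business", np.nan, "Solo Leisure"],
    "overall_rating": [8.0, 6.0, 9.0],
})

# Default behaviour: the NaN row is all-zero across the dummy columns.
dummies = pd.get_dummies(toy["type_traveller"])
print(dummies.astype(int))

# Optional: add an explicit indicator column for missing values.
dummies_with_na = pd.get_dummies(toy["type_traveller"], dummy_na=True)
print(dummies_with_na.astype(int))
```

With the default encoding, an all-zero row therefore means "traveller type unknown" rather than a separate category.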

In [26]:
df_clean.isnull().values.any() #check if there are any missing values in dataframe

False

In [27]:
df_clean.head()

Unnamed: 0,overall_rating,seat_comfort_rating,cabin_staff_rating,food_beverages_rating,inflight_entertainment_rating,ground_service_rating,wifi_connectivity_rating,value_money_rating,Business Class,Economy,First Class,Premium Economy,Business,Couple Leisure,FamilyLeisure,Solo Leisure
14773,5.0,3.0,1.0,2.0,3.0,4.0,3.0,3.0,0,1,0,0,0,0,1,0
14783,7.0,3.0,3.0,3.0,4.0,2.0,2.0,3.0,1,0,0,0,0,1,0,0
14784,9.0,5.0,4.0,4.0,5.0,4.0,0.0,5.0,0,1,0,0,0,1,0,0
14794,10.0,5.0,5.0,5.0,5.0,4.0,5.0,4.0,1,0,0,0,0,1,0,0
14795,6.0,4.0,3.0,5.0,5.0,3.0,1.0,3.0,0,1,0,0,0,0,1,0


In [29]:
df_clean.shape

(690, 16)