# Skytrax Airline User Reviews

#### Table of Contents:
Data cleaning<br>
Exploratory data analysis<br>
NLP on reviewer comments using NLTK and Scikit-learn<br>
Data analysis on airline ratings data<br>

#### Background:
Customer sentiment plays a significant role in how people perceive airlines and may influence whether they recommend the airline to someone else. Additionally, a large number of unfavorable reviews can bring uncomfortable media attention and cause public relations issues. Our goal is to use Natural Language Processing (NLP) on reviewer comments in the Skytrax Airlines User Reviews dataset to predict if a reviewer would recommend an airline based on what they wrote.<br>
<br>
Skytrax data can be found here: https://github.com/quankiquanki/skytrax-reviews-dataset

## Setup and data cleaning
Importing packages, reading in data

In [None]:
# import the packages we need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math as math
import seaborn as sns
%matplotlib inline

In [None]:
# use pandas to import the airline skytrax data that you saved to your computer
airline=pd.read_csv(r'C:\Users\Laura\Documents\Learning Python\Skytrax airline data\data\airline.csv', sep=',')

In [None]:
#check out the data to see what it looks like
airline.head()

In [None]:
#taking a look at the columns
airline.info()

In [None]:
#subset your data to just the variables we need 
airline=airline[['airline_name', 'title', 'author', 'author_country', 'date', 'content', 
                   'cabin_flown', 'overall_rating', 'value_money_rating', 'recommended']]

In [None]:
airline.info()

In [None]:
#convert date to datetime, check out min and max
airline['date'] = pd.to_datetime(airline['date'])
print(airline['date'].min())  #looks at our dates
print(airline['date'].max())

In [None]:
# several of these variables have missing values, so we drop the incomplete rows
print("we are now dropping missings and saving in a new dataset called airline_completecase")
# .copy() makes an independent DataFrame, so the column assignments below do not trigger SettingWithCopyWarning
airline_completecase = airline.dropna().copy()
#how many rows?
print('we have '+ str(len(airline_completecase)) + ' complete case observations in this subset')

#convert overall rating and value money rating to int
airline_completecase['overall_rating']=airline_completecase['overall_rating'].astype(int)
airline_completecase['value_money_rating']=airline_completecase['value_money_rating'].astype(int)

In [None]:
#let's make a No/Yes variable for recommended
airline_completecase['rec'] = np.where(airline_completecase['recommended']==1, 'Yes', 'No')

In [None]:
#add a column holding the character length of each review
airline_completecase['length'] = airline_completecase['content'].apply(len)

In [None]:
#information about our dataset
airline_completecase.info()

In [None]:
#information about our dataset 
airline_completecase.nunique().sort_values()

In [None]:
airline_completecase.describe()

In [None]:
#check out airlines complete case 
airline_completecase.head()

### Train/Test Split

The test set is 30% of the dataset (10410 obs) and the training set is the rest (24269 obs). Note: the 30% is set explicitly with `test_size=0.3`; scikit-learn's default test size is 25%.

In [None]:
#Split data into train/test

from sklearn.model_selection import train_test_split

msg_train, msg_test, label_train, label_test = \
train_test_split(airline_completecase['content'], airline_completecase['rec'], test_size=0.3)

print('training set: '  + str(len(msg_train)) + ' obs')
print('test set: '+ str(len(msg_test)) + ' obs' )
print('combined: ' + str(len(msg_train) + len(msg_test))  + ' obs')

In [None]:
msg_train.head()

In [None]:
msg_test.head()

## Exploratory Data Analysis

We will do some exploratory data analysis to see what we can learn about the relationship between the reviews and whether or not the reviewer recommends the airline.

In [None]:
#countplot of recommended (0=no, 1=yes)
sns.set(style="darkgrid")
ax=sns.countplot(x='recommended', data=airline_completecase)
ax.set_title("Recommend Airline?")
ax.set_xticklabels(["No","Yes"])
ax.set_xlabel(" ")
ax.set_ylabel("Number of Reviews")

In [None]:
#overall rating by recommended or not; a strong relationship is expected
sns.set(style="darkgrid")
g = sns.FacetGrid(airline_completecase, col="recommended", margin_titles=True)
# bin edges centered on the integer ratings, so 9 and 10 do not share a bar
bins = np.arange(-0.5, 11, 1)
g.map(plt.hist, "overall_rating", color="steelblue", bins=bins)
axes = g.axes.flatten()
axes[0].set_title("Not Recommended")
axes[1].set_title("Recommended")
axes[0].set_ylabel("Count")
for ax in axes:
    ax.set_xlabel("Overall Rating")

In [None]:
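#value for money rating by recommended or not; a strong relationship is expected
g = sns.FacetGrid(airline_completecase, col="recommended", margin_titles=True)
# bin edges centered on the integer ratings, so 4 and 5 do not share a bar
bins = np.arange(-0.5, 6, 1)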
g.map(plt.hist, "value_money_rating", color="steelblue", bins=bins)
axes = g.axes.flatten()
axes[0].set_title("Not Recommended")
axes[1].set_title("Recommended")
axes[0].set_ylabel("Count")
for ax in axes:
    ax.set_xlabel("Value Money Rating")

In [None]:
airline_completecase['recommended'].value_counts()

In [None]:
#boxplots of overall and value money rating by whether or not the reviewer recommended the airline
f, axes = plt.subplots(1, 2, figsize=(8, 4))
sns.boxplot(x=airline_completecase['recommended'], y=airline_completecase['overall_rating'],  ax=axes[0])
axes[0].set_title("Median Overall Rating")
axes[0].set_xticklabels(["No","Yes"])
axes[0].set_xlabel("Recommend Airline?")
axes[0].set_ylabel("Overall Rating")

sns.boxplot(x=airline_completecase['recommended'], y=airline_completecase['value_money_rating'],  ax=axes[1])
axes[1].set_title("Median Value Money Rating")
axes[1].set_xticklabels(["No","Yes"])
axes[1].set_xlabel("Recommend Airline?")
axes[1].set_ylabel("Value for Money Rating")

plt.tight_layout()

In [None]:
#median ratings by recommended
airline_completecase.groupby('recommended').median(numeric_only=True)  #skip text columns, which have no median

In [None]:
#mean ratings by recommended
airline_completecase.groupby('recommended').mean(numeric_only=True)  #skip text columns, which have no mean

In [None]:
#length of review vs recommended or not
g = sns.FacetGrid(airline_completecase, col="recommended", margin_titles=True, height=6)
bins = 100
g.map(plt.hist, "length", color="steelblue", bins=bins)
axes = g.axes.flatten()
axes[0].set_title("Not Recommended")
axes[1].set_title("Recommended")
axes[0].set_ylabel("Count")
for ax in axes:
    ax.set_xlabel("Length of Review")

In [None]:
#exploratory look 
airline_completecase.groupby('recommended').describe()

## NLP on reviewer comments with NLTK

We will analyze the reviews and see if they predict whether the reviewer recommends the airline.

Special thanks to the Python for Data Science and Machine Learning Bootcamp by Jose Portilla at Udemy.com, since I used that lecture to set this up<br>
<br>
Class link here: https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/learn/v4/overview

In [None]:
#get set up for NLP using NLTK
import nltk
import matplotlib.pyplot as plt
import string
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline


In [None]:
msg_train.head()

In [None]:
print(msg_train.iloc[111])  #check out a few full reviews to get a feel for what they look like
print("")
print(msg_train.iloc[535])
print("")
print(msg_train.iloc[904])
print("")
print(msg_train.iloc[5678])

### Text Pre-processing
Our data is all in text format (strings). We need to convert it into numerical features before a model can work with it. The simplest option is the bag-of-words approach, where each unique word in a text is represented by a number. We will convert the raw messages into vectors.<br>
<br>

We will start by writing a function that splits a message into its individual words and returns them as a list. Along the way it removes punctuation using Python's string library and removes common words using the NLTK stopwords list.
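
As a quick illustration of those steps, here is a minimal sketch on a made-up review. The tiny `SAMPLE_STOPWORDS` set and the `tokenize` name are assumptions for illustration only; the notebook itself relies on the full NLTK English stopword list.

```python
import string

# Hypothetical, deliberately tiny stopword set; NLTK's English list is much longer.
SAMPLE_STOPWORDS = {"the", "was", "and", "a", "but", "were"}

def tokenize(text):
    """Strip punctuation, split on whitespace, and drop stopwords."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    return [w for w in no_punct.split() if w.lower() not in SAMPLE_STOPWORDS]

sample = "The flight was late, but the crew were friendly!"
tokens = tokenize(sample)
print(tokens)
```

Only the content-bearing words survive, which is what the vectorizer will later count.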

In [None]:
import string
print("string imported")
print(string.punctuation)

In [None]:
#show some stopwords
from nltk.corpus import stopwords
stopwords.words('english')[0:10] # Show some stop words

In [None]:
#putting together a function to apply to our airlines data

def text_process(mess):
    """
    Takes in a string of text, then performs the following:
    1. Remove all punctuation
    2. Remove all stopwords
    3. Returns a list of the cleaned text
    """
    # Check characters to see if they are in punctuation
    nopunc = [char for char in mess if char not in string.punctuation]

    # Join the characters again to form the string.
    nopunc = ''.join(nopunc)
    
    # Build the stopword set once per call (set lookups are fast), then drop any stopwords
    stop_words = set(stopwords.words('english'))
    return [word for word in nopunc.split() if word.lower() not in stop_words]

### Vectorization: converting each message into a vector that machine learning models can understand.

This will be done in three steps using the bag-of-words model:<br>
1. Count how many times a word occurs in each message (known as term frequency)<br>
2. Weigh the counts, so that frequent tokens get lower weight (inverse document frequency)<br>
3. Normalize the vectors to unit length, to abstract from the original text length (L2 norm)<br>
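
Steps 1 and 3 can be sketched in plain Python on two made-up, already tokenized reviews. This is only an illustration under those assumptions: the toy documents and helper names are hypothetical, and the weighting of step 2 is left out here.

```python
import math
from collections import Counter

# Two hypothetical reviews, already split into tokens
docs = [
    ["great", "flight", "great", "crew"],
    ["late", "flight"],
]

# Step 1: raw term counts per document (the bag of words)
counts = [Counter(doc) for doc in docs]

# Shared vocabulary, sorted so every vector uses the same column order
vocab = sorted({word for doc in docs for word in doc})

def to_vector(counter):
    """Turn a Counter into a count vector aligned with vocab."""
    return [counter.get(word, 0) for word in vocab]

def l2_normalize(vec):
    """Step 3: scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

vectors = [to_vector(c) for c in counts]
unit_vectors = [l2_normalize(v) for v in vectors]

print(vocab)
print(vectors)
```

After normalization both reviews have length 1, so a long review no longer dominates a short one simply by containing more words.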

### TF-IDF: applying term weighting and normalization

We will now apply TF-IDF (term frequency-inverse document frequency), a weighting scheme often used in information retrieval and text mining. A term's weight rises the more often it appears in a given document and falls the more documents in the corpus contain it.<br>
<br>
TF: Term Frequency, which measures how frequently a term occurs in a document. Since documents differ in length, a term may appear many more times in a long document than in a short one. Thus, the term frequency is often divided by the document length (i.e. the total number of terms in the document) as a way of normalization:<br>

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document).<br>

IDF: Inverse Document Frequency, which measures how important a term is. While computing TF, all terms are considered equally important. However, certain terms, such as "is", "of", and "that", may appear many times but carry little importance. Thus we need to weigh down the frequent terms while scaling up the rare ones, by computing the following:<br>

IDF(t) = log_e(Total number of documents / Number of documents with term t in it).<br>
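
The two formulas can be worked through on a toy corpus. This is a minimal sketch: the three documents and the function names are made up for illustration. Note also that scikit-learn's `TfidfTransformer` uses a smoothed variant of IDF by default, so its numbers will differ from this textbook version.

```python
import math

# Hypothetical corpus of three tokenized reviews
docs = [
    ["the", "crew", "was", "friendly"],
    ["the", "flight", "was", "late"],
    ["the", "seat", "was", "broken"],
]

def tf(term, doc):
    """Occurrences of term in doc divided by the document length."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Natural log of (number of documents / documents containing term).

    Assumes the term occurs in at least one document.
    """
    n_containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / n_containing)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

common = tf_idf("the", docs[0], docs)   # appears in every document
rare = tf_idf("crew", docs[0], docs)    # appears in one document only
print(common, rare)
```

"the" occurs in every document, so its IDF is log(1) = 0 and its weight vanishes, while "crew" keeps a positive weight because it is specific to one review.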

### Training a model: Naive Bayes classifier algorithm

Now that everything is represented as a vector, we can finally start to train our classifier. The Naive Bayes classifier is a good place to start.
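
To make the idea concrete, here is a from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not scikit-learn's implementation: the function names and the toy training reviews are assumptions for illustration, and it works on raw word counts rather than TF-IDF weights.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for tokens in docs for w in tokens}
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_lik": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model

def predict_nb(model, tokens):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, params in model.items():
        # Words never seen during training are skipped
        scores[label] = params["log_prior"] + sum(
            params["log_lik"][w] for w in tokens if w in params["log_lik"]
        )
    return max(scores, key=scores.get)

# Hypothetical tokenized reviews and their recommendation labels
train_docs = [
    ["great", "crew", "comfortable"],
    ["friendly", "crew", "great", "food"],
    ["late", "rude", "staff"],
    ["lost", "luggage", "late"],
]
train_labels = ["Yes", "Yes", "No", "No"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["great", "food"]))
print(predict_nb(model, ["late", "luggage"]))
```

The "naive" part is the assumption that words are independent given the class, which is why the score is just a sum of per-word log-likelihoods plus the class prior.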

### Building a data pipeline
We will fit our model and then predict on the test set, using scikit-learn's pipeline capabilities to store the whole workflow as a single object.

In [None]:
#pipeline to do it all, same processes as before (this is identical to the same section on 
#NLP in the Udemy course jupyter notebook)

from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
])

print('pipeline built')

In [None]:
#start with the training data
pipeline.fit(msg_train,label_train)
print('done yay!')

In [None]:
#now use the test data
predictions = pipeline.predict(msg_test)

print('done yay!')

In [None]:
#see how well it did (classification_report expects the true labels first, then the predictions)
from sklearn.metrics import classification_report
print(classification_report(label_test, predictions))

### Some thoughts

Precision: True positives/(True positives + false positives)<br>
Recall: True positives/(True positives + false negatives)<br>
<br>
Precision for the "Yes" predictions is fairly high, while precision for the "No" predictions is noticeably lower. High precision for "Yes" matters most in this case, because it means few false positives. If a model like this were used to forecast revenue from airline ticket sales, false positives could lead an airline to overestimate demand and miss its revenue targets, which is a problem for its bottom line.
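
The two definitions above can be checked by hand on a small example. This is a sketch with made-up labels, and `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall for one class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical true labels and predictions for six reviews
y_true = ["Yes", "Yes", "Yes", "Yes", "No", "No"]
y_pred = ["Yes", "Yes", "Yes", "Yes", "Yes", "No"]

yes_p, yes_r = precision_recall(y_true, y_pred, "Yes")
no_p, no_r = precision_recall(y_true, y_pred, "No")
print(yes_p, yes_r)
print(no_p, no_r)
```

Argument order matters: passing the predictions where the true labels belong swaps precision and recall for every class.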

In [None]:
#let's look at our results
print(predictions[0:10]) # first 10 predictions from the pipeline
print(label_test.iloc[0:10].values)  # the true labels for the same 10 reviews

In [None]:
#let's do a confusion matrix
from sklearn.metrics import confusion_matrix

cnf_matrix = confusion_matrix(label_test, predictions)  #prediction matrix
cnf_matrix  #print it out

In [None]:
labels = ['No', 'Yes']

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111)
cax = ax.matshow(cnf_matrix)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
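# pin one tick per class before labelling, so the labels line up with the matrix cells
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)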
plt.xlabel('Predicted')
plt.ylabel('True')

### Data analysis on airline ratings data
We can also analyze the structured data that is present here. Since so much data is missing for many of the ratings, we will focus on the overall rating, value for money rating, cabin flown, and whether the airline was recommended.

In [None]:
#back to airline_completecase 
#which author countries are represented?
countries_list=airline_completecase.author_country.unique()
countries_list

In [None]:
#number of reviews for each cabin flown
airline_completecase['cabin_flown'].value_counts()

### Some exploratory figures

In [None]:
#cabin flown 
ax=sns.countplot(x='cabin_flown', data=airline_completecase)
ax.set_title("Number of Reviews by Cabin Flown")
ax.set_xlabel("Cabin Flown")
ax.set_ylabel("Number of Reviews")

In [None]:
# histogram of overall ratings (histplot replaces the deprecated distplot in seaborn 0.11+)
ax=sns.histplot(airline_completecase.overall_rating, bins=10, kde=False)
ax.set_title("Histogram of Overall Ratings")
ax.set_xlabel("Overall Rating")
ax.set_ylabel("Number of Reviews")

In [None]:
#overall rating by recommended or not; a strong relationship is expected
sns.set(style="darkgrid")
g = sns.FacetGrid(airline_completecase, col="recommended", margin_titles=True)
# bin edges centered on the integer ratings, so 9 and 10 do not share a bar
bins = np.arange(-0.5, 11, 1)
g.map(plt.hist, "overall_rating", color="steelblue", bins=bins)
axes = g.axes.flatten()
axes[0].set_title("Not Recommended")
axes[1].set_title("Recommended")
axes[0].set_ylabel("Count")
for ax in axes:
    ax.set_xlabel("Overall Rating")

In [None]:
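#value for money rating by recommended or not; a strong relationship is expected
sns.set(style="darkgrid")
g = sns.FacetGrid(airline_completecase, col="recommended", margin_titles=True)
# bin edges centered on the integer ratings, so 4 and 5 do not share a bar
bins = np.arange(-0.5, 6, 1)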
g.map(plt.hist, "value_money_rating", color="steelblue", bins=bins)
axes = g.axes.flatten()
axes[0].set_title("Not Recommended")
axes[1].set_title("Recommended")
axes[0].set_ylabel("Count")
for ax in axes:
    ax.set_xlabel("Value Money Rating")

In [None]:
#overall rating by cabin flown 
sns.set(style="darkgrid")
g = sns.FacetGrid(airline_completecase, col="cabin_flown", margin_titles=True)
# bin edges centered on the integer ratings, so each rating gets its own bar
bins = np.arange(-0.5, 11, 1)
g.map(plt.hist, "overall_rating", color="steelblue", bins=bins)
# title each panel from the data instead of hard-coding names, since panel order follows the data
g.set_titles(col_template="{col_name}")
g.set_axis_labels("Overall Rating", "Count")

In [None]:
#value for money rating by cabin flown 
sns.set(style="darkgrid")
g = sns.FacetGrid(airline_completecase, col="cabin_flown", margin_titles=True)
# bin edges centered on the integer ratings, so each rating gets its own bar
bins = np.arange(-0.5, 6, 1)
g.map(plt.hist, "value_money_rating", color="steelblue", bins=bins)
# title each panel from the data instead of hard-coding names, since panel order follows the data
g.set_titles(col_template="{col_name}")
g.set_axis_labels("Value Money Rating", "Count")

### What airlines are most represented here?

In [None]:
#groupby carrier to look at how things rank by carrier
airlines_carrier=airline_completecase['airline_name'].value_counts()
airlines_carrier

In [None]:
airlines_carrier[airlines_carrier>100] 

In [None]:
#median by airline carriers (numeric columns only, since text columns have no median)
airlines_grouped_median=airline_completecase.groupby('airline_name').median(numeric_only=True)
#preliminary rank by carrier, but we will need to restrict to those with 100+ reviews only 
airlines_grouped_median.sort_values(by='overall_rating', ascending=False)

### It looks like we will need to subset our data to the more popular airlines with >100 reviews 

In [None]:
#subset only by airlines with >100 reviews 
airlines_carrier=airline_completecase['airline_name'].value_counts()  #counts of airline carrier
popular_airlines=airlines_carrier[airlines_carrier>100].index  #names of airlines with >100 reviews
airlines_subset=airlines_grouped_median.loc[popular_airlines]  #select those rows by label
airlines_subset.sort_values(by='overall_rating', ascending=False)   #median overall ratings, highest first


In [None]:
#print out the  list  of airlines that have >100 reviews
#we will use this for the dictionary to generate country of origin
pd.options.display.max_seq_items = 150
print(str(airlines_carrier[airlines_carrier>100].index))  

### Dictionary containing airlines and country of origin (built manually outside of Python, since there isn't a consistent rule that a computer could apply)

In [None]:
airline_dict={
    'british-airways' : 'United Kingdom' ,
    'united-airlines' : 'United States' ,
    'air-canada-rouge' : 'Canada' ,
    'emirates' : 'United Arab Emirates' ,
    'american-airlines' : 'United States' ,
    'lufthansa' : 'Germany' ,
    'qantas-airways' : 'Australia' ,
    'jet-airways' : 'India' ,
    'ryanair' : 'Ireland' ,
    'etihad-airways' : 'United Arab Emirates' ,
    'cathay-pacific-airways' : 'Hong Kong' ,
    'qatar-airways' : 'Qatar' ,
    'air-canada' : 'Canada' ,
    'turkish-airlines' : 'Turkey' ,
    'malaysia-airlines' : 'Malaysia' ,
    'virgin-atlantic-airways' : 'United Kingdom' ,
    'singapore-airlines' : 'Singapore' ,
    'china-southern-airlines' : 'China' ,
    'air-france' : 'France' ,
    'delta-air-lines' : 'United States' ,
    'easyjet' : 'United Kingdom' ,
    'aer-lingus' : 'Ireland' ,
    'norwegian' : 'Norway' ,
    'sunwing-airlines' : 'Canada' ,
    'thomson-airways' : 'United Kingdom' ,
    'virgin-australia' : 'Australia' ,
    'thai-airways' : 'Thailand' ,
    'garuda-indonesia' : 'Indonesia' ,
    'klm-royal-dutch-airlines' : 'Netherlands' ,
    'finnair' : 'Finland' ,
    'swiss-international-air-lines' : 'Switzerland' ,
    'southwest-airlines' : 'United States' ,
    'thomas-cook-airlines' : 'United Kingdom' ,
    'allegiant-air' : 'United States' ,
    'tap-portugal' : 'Portugal' ,
    'iberia' : 'Spain' ,
    'asiana-airlines' : 'South Korea' ,
    'jet2-com' : 'United Kingdom' ,
    'korean-air' : 'South Korea' ,
    'eva-air' : 'Taiwan' ,
    'air-transat' : 'Canada' ,
    'air-india' : 'India' ,
    'alitalia' : 'Italy' ,
    'airasia' : 'Malaysia' ,
    'spirit-airlines' : 'United States' ,
    'jetstar-airways' : 'Australia' ,
    'air-new-zealand' : 'New Zealand' ,
    'srilankan-airlines' : 'Sri Lanka' ,
    'sas-scandinavian-airlines' : 'Sweden' ,
    'china-eastern-airlines' : 'China' ,
    'air-berlin' : 'Germany' ,
    'ana-all-nippon-airways' : 'Japan' ,
    'vietnam-airlines' : 'Vietnam' ,
    'us-airways' : 'United States' ,
    'austrian-airlines' : 'Austria' ,
    'royal-brunei-airlines' : 'Brunei' ,
    'south-african-airways' : 'South Africa' ,
    'lan-airlines' : 'Chile' ,
    'philippine-airlines' : 'Philippines' ,
    'icelandair' : 'Iceland' ,
    'flybe' : 'United Kingdom' ,
    'vueling-airlines' : 'Spain' ,
    'monarch-airlines' : 'United Kingdom' ,
    'aeroflot-russian-airlines' : 'Russian Federation' ,
    'aegean-airlines' : 'Greece' ,
    'wizz-air' : 'Hungary' ,
    'frontier-airlines' : 'United States' ,
    'virgin-america' : 'United States' ,
    'air-china' : 'China' ,
    'hawaiian-airlines' : 'United States' ,
    'alaska-airlines' : 'United States' ,
    'tigerair' : 'Australia' ,
    'egyptair' : 'Egypt' ,
    'china-airlines' : 'Taiwan' ,
    'brussels-airlines' : 'Belgium' ,
    'bangkok-airways' : 'Thailand' ,
    'tam-airlines' : 'Brazil' ,
    'ethiopian-airlines' : 'Ethiopia' ,
    'oman-air' : 'Oman' ,
    'japan-airlines' : 'Japan' ,
    'airasia-x' : 'Malaysia' ,
    'fiji-airways' : 'Fiji' ,
    'royal-jordanian-airlines' : 'Jordan' ,
    'aerolineas-argentinas' : 'Argentina' ,
    'gulf-air' : 'Bahrain' ,
    'lot-polish-airlines' : 'Poland' ,
    'kenya-airways' : 'Kenya' ,
    'avianca' : 'Colombia' ,
    'continental-airlines' : 'United States' ,
    'condor-airlines' : 'Germany' ,
    'air-europa' : 'Spain' ,
    'dragonair' : 'Hong Kong' ,
    'bmi-british-midland-international' : 'United Kingdom' ,
    'scoot' : 'Singapore' ,
    'silkair' : 'Singapore' ,
    'royal-air-maroc' : 'Morocco' ,
    'el-al-israel-airlines' : 'Israel' ,
    'aeromexico' : 'Mexico' ,
    'saudi-arabian-airlines' : 'Saudi Arabia' ,
    'pegasus-airlines' : 'Turkey' ,
    'indigo-airlines' : 'India' ,
    'air-astana' : 'Kazakhstan' ,
    'copa-airlines' : 'Panama' ,
    'kuwait-airways' : 'Kuwait'}

In [None]:
#assign a country of origin to each review by mapping airline_name through airline_dict
#airlines that are not in the dictionary get a missing airline_country; these are the ones excluded from the final set
#we map onto the full airline_completecase dataset so the later steps can be rerun from it
airline_completecase['airline_country']=airline_completecase['airline_name'].map(airline_dict)

In [None]:
#check this out to see if it checks out
airline_completecase.tail()

In [None]:
#check out the ones for BA to see that this checks out
airline_completecase[airline_completecase['airline_name']=='british-airways']

In [None]:
#check out the ones for adria to see that this checks out : Adria should not have a country assigned 
airline_completecase[airline_completecase['airline_name']=='adria-airways']

In [None]:
#drop the rows without an airline_country, i.e. airlines that are not in airline_dict
print("we are now dropping reviews from airlines with 100 or fewer reviews from airline_completecase and saving in a new dataset called airline_short")
airline_short = airline_completecase.dropna(subset=['airline_country'])
#how many?
print('we have '+ str(len(airline_short)) + ' complete case observations in this subset')

In [None]:
airline_short.info()

In [None]:
airline_short.nunique().sort_values()

In [None]:
#now let's sort by countries!!
#median by airline country of origin (numeric columns only)
airlines_median=airline_short.groupby('airline_country').median(numeric_only=True)
#rank countries by median overall rating, highest first
medsort=airlines_median.sort_values(by='overall_rating', ascending=False)
medsort

In [None]:
airline_short['airline_country'].value_counts().head(10)

In [None]:
f, ax = plt.subplots(figsize=(12, 50))
sns.barplot(x="overall_rating", y=medsort.index, data=medsort,
            label="Overall", color="b")
# Add a title and informative axis labels
ax.set_title('Median Ratings by Airline Country of Origin')
ax.set(xlim=(0, 10), ylabel="Country of Airline",
       xlabel="Rating")
plt.savefig("ratingsbycountry.png", dpi=300)

In [None]:
#do just for the US

#now let us subset by US carriers only (median of numeric columns)
us_airlines=airline_short[airline_short['airline_country']=="United States"].groupby('airline_name').median(numeric_only=True)
#sort by descending rating 
us_ranked=us_airlines.sort_values(by='overall_rating', ascending=False)
us_ranked

In [None]:
f, ax = plt.subplots(figsize=(12, 18))
sns.barplot(x="overall_rating", y=us_ranked.index, data=us_ranked,
            label="Overall", color="b")
# Add a title and informative axis labels
ax.set_title('Median Ratings of US Airlines')
ax.set(xlim=(0, 10), ylabel="Airline",
       xlabel="Rating")
plt.savefig("usratingsbycarrier.png")

In [None]:
#to get percent recommended
#recommended is a dichotomous variable coded 0/1, so its mean is the proportion of reviewers who would 
#recommend this airline
rec_frac=airline_short[airline_short['airline_country']=="United States"].groupby('airline_name').mean(numeric_only=True)  
rec_sort=rec_frac.sort_values(by='recommended', ascending=False)
rec_sort['recommended']

In [None]:
#multiply recommended by 100 for percentage
rec_sort['rec_pct']=rec_sort['recommended']*100

f, ax = plt.subplots(figsize=(6, 18))
sns.barplot(x="rec_pct", y=rec_sort.index, data=rec_sort,
            label="Overall", color="b")
# Add a title and informative axis labels
ax.set_title('Percent of Reviewers Recommending US Airlines')
ax.set(xlim=(0, 100), ylabel="Airline",
       xlabel="Percent that would Recommend")
plt.savefig("usrecs_bycarrier.png")