# Buzzdiggr Pitch to Egypt Air EA
## Introduction
Social media listening and sentiment analysis provide vital information to a business that wants to grow its revenue, retain its customers and deliver a better customer experience. Airlines have turned to the crowds on social media to make use of this information.

This use case explores Buzzdiggr's pitch to the renowned Egypt Air, Egypt's main commercial airline. Like many airlines, Egypt Air is ***interested in understanding how it is perceived online, and in improving its consumer offerings to compete with other airlines based on the voice of the customer.*** Using Twitter mention data, I will explore insights that can suggest where Egypt Air should focus and what it should look for.

## Libraries and dataset prep

In [None]:
# Load the main libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rc
import seaborn as sb
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from collections import Counter
import re
from wordcloud import WordCloud, STOPWORDS
import nltk

%matplotlib inline

In [None]:
# Turn off pandas' SettingWithCopyWarning (chained assignment)
pd.set_option('mode.chained_assignment', None)
# To see all columns in my dataset
pd.set_option('display.max_columns', 500)
# Silence remaining warnings (np.warnings is not available in recent NumPy versions, so use the standard library module)
import warnings
warnings.filterwarnings('ignore')
# Load the dataset into a pandas DataFrame; I will look at its structure during wrangling
df = pd.read_csv('airlines-extract.csv')
# Download the NLTK 'words' corpus (an English word list)
nltk.download('words')

In [None]:
# First look at the data
df.head(3)

In [None]:
# A bit of cleaning: lower-case the column names and remove spaces,
# parse the date columns, and add a dummy 'count' column to help with the plots
df.columns = df.columns.str.lower().str.replace(' ', '', regex=False)
df['mentioncreationdate'] = pd.to_datetime(df['mentioncreationdate'])
df['authoraccountcreationdate'] = pd.to_datetime(df['authoraccountcreationdate'])
df['count'] = 1

In [None]:
# Define the function to remove URLs from the mention texts
def remove_url(txt):
    """Remove URLs from a text string.

    Note that the regex also strips every character that is not an ASCII
    letter, digit, space or tab (punctuation, @, #, emoji, Arabic script).

    Parameters
    ----------
    txt : str
        A text string that you want to parse and remove URLs from.

    Returns
    -------
    str
        The same text with URLs (and non-alphanumeric characters) removed
        and whitespace collapsed.
    """
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())

# Clean the full text (missing texts become empty strings so remove_url does not fail)
df['mentionfulltext'] = df['mentionfulltext'].fillna('').str.lower()
df['mentionfulltext'] = [remove_url(tweet) for tweet in df['mentionfulltext']]
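A quick check of what `remove_url` actually does may help here. The sketch below re-defines the same regex and runs it on two made-up tweets (the sample strings are hypothetical). Besides URLs, the pattern drops punctuation, `@`/`#` symbols, emoji and Arabic script, which matters for an Egyptian airline whose audience may tweet in Arabic. `remove_url_only` is a hypothetical alternative, not part of the notebook, that strips URLs alone.

```python
import re

def remove_url(txt):
    # Same pattern as the notebook: drop any character outside [0-9A-Za-z space tab]
    # (first group) or any URL-like token such as https://t.co/... (second group)
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())

def remove_url_only(txt):
    # Hypothetical variant: remove URL-like tokens but keep every other character
    return " ".join(re.sub(r"\w+://\S+", "", txt).split())

# Hypothetical sample tweets
samples = [
    "Lost my bag!! @EgyptAir https://t.co/abc123 #fail",
    "Great crew 👍 رحلة ممتازة",
]

for s in samples:
    print(repr(remove_url(s)), "|", repr(remove_url_only(s)))
```

If Arabic mentions are important for the analysis, a URL-only cleaner (plus an Arabic-aware sentiment source) would keep that text instead of silently discarding it.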

## First EDA on the dataset
A first look at the data.
### Twitter mentions per airline: how many times has Egypt Air been mentioned?

In [None]:
fig = px.pie(df, values= 'count', names='brand')
fig.show()

Egypt Air has been mentioned 13,511 times, the second lowest among the four airlines and slightly less than a quarter of Lufthansa's mentions. At first look, Egypt Air's presence on Twitter seems weak. However, that alone **cannot** tell whether Egypt Air is perceived as bad, good or somewhere in between. The next step is to look at the sentiment analysis.

Next, let's look at how the mentions are spread across sentiments.

In [None]:
# Show the spread of sentiments for each brand and see where Egypt Air stands
sent_1 = df.groupby(['brand', 'sentiment'])['mentionid'].count()
base = sb.color_palette("GnBu_d", n_colors=3)
sent_1.unstack().plot(kind = 'barh', stacked = True, color = base, figsize = (10,5));
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize = 'large');
plt.ylabel('Airline');
plt.xlabel('Tweets');

Neutral tweets mention the airline without expressing a clear positive or negative emotion towards it; official announcements also fall into this group. Neutral tweets make up most of the mentions.

Overall, tweets with negative emotions towards the airlines outnumber positive ones. The sentiment analysis below looks at the sentiment ratio (positive : negative tweets) and Egypt Air's social media score, to deepen the social listening analysis.
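The positive : negative ratio can be computed for every brand in one pass with `groupby` and `unstack`. This is a minimal sketch on a hypothetical toy DataFrame that only mimics the notebook's `brand`, `sentiment` and `mentionid` columns; on the real data you would pass `df` instead. The helper name `sentiment_ratio` is illustrative.

```python
import pandas as pd

# Hypothetical toy mentions; the notebook itself uses the full `df`
toy = pd.DataFrame({
    "brand": ["EgyptAir"] * 5 + ["Emirates"] * 4,
    "sentiment": ["Positive", "Negative", "Negative", "Negative", "Neutral",
                  "Positive", "Positive", "Negative", "Neutral"],
    "mentionid": range(9),
})

def sentiment_ratio(frame):
    """Positive : negative mention ratio per brand (neutral mentions ignored)."""
    counts = (frame[frame["sentiment"] != "Neutral"]
              .groupby(["brand", "sentiment"])["mentionid"].count()
              .unstack(fill_value=0))
    return counts["Positive"] / counts["Negative"]

ratios = sentiment_ratio(toy)
print(ratios)
```

A ratio below 1 means negative mentions outnumber positive ones. A brand with zero negative mentions would yield `inf`, so you may want to guard against that on small subsets.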

### Sentiment Analysis

In [None]:
# Remove the neutral sentiment
a =  df.loc[df['sentiment'] != 'Neutral']
sent_2 = a.groupby(['sentiment'], as_index = False)['mentionid'].count()

labels = sent_2['sentiment']
values = sent_2['mentionid']

# Use `hole` to create a donut-like pie chart (Yes, I like donuts more)
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.6)])
fig.show()

In [None]:
test = df[df['sentiment'] != 'Neutral'].groupby(['brand','sentiment'], as_index = False)['mentioninteractions'].sum()
# Convert to a list to read off each brand's [Negative, Positive] interaction sums used in the pie charts below
test2 = test.values.tolist()

labels = ['Negative','Positive']

# Create subplots, using 'domain' type for pie charts
specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]]
fig = make_subplots(rows=2, cols=2, specs=specs)

# Define pie charts; values are the [Negative, Positive] interaction sums read from test2
fig.add_trace(go.Pie(labels=labels, values=[170680,17783], title='Egypt Air', title_position = 'bottom center', hole = 0.6), 1, 1)
fig.add_trace(go.Pie(labels=labels, values=[6274356, 936221], title='Emirates', title_position = 'bottom center', hole = 0.6), 1, 2)
fig.add_trace(go.Pie(labels=labels, values=[3369308, 2142353], title='Lufthansa', title_position = 'bottom center', hole = 0.6), 2, 1)
fig.add_trace(go.Pie(labels=labels, values=[230678, 16407], title='Saudi Airlines', title_position = 'bottom center', hole = 0.6), 2, 2)

# Tune layout and hover info
fig.update_traces(hoverinfo='label+percent', textinfo='none')

fig.show()

Across all four airlines, negative tweets slightly outnumber positive ones, making up 53.9% of the non-neutral tweets (neutral tweets are excluded from this analysis).

A closer look does not bode well for Egypt Air: only **9.4%** of the interactions on its non-neutral mentions went to positive tweets. It seems that customers have had unpleasant experiences with Egypt Air; at this point, its social media presence is both weak and negative.
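The 9.4% figure can be reproduced from the Negative/Positive interaction sums hard-coded in the pie charts above (the numbers below are copied from that cell). Reshaping them into a long table shaped like `test` and pivoting also avoids copying numbers by hand; the same pivot should work on the real `test` frame, assuming its brand labels match the ones used here.

```python
import pandas as pd

# Negative/Positive interaction sums as hard-coded in the pie charts above
interactions = pd.DataFrame({
    "brand": ["EgyptAir", "EgyptAir", "Emirates", "Emirates",
              "Lufthansa", "Lufthansa", "Saudi Airlines", "Saudi Airlines"],
    "sentiment": ["Negative", "Positive"] * 4,
    "mentioninteractions": [170680, 17783, 6274356, 936221,
                            3369308, 2142353, 230678, 16407],
})

# One row per brand, one column per sentiment
pivot = interactions.pivot(index="brand", columns="sentiment",
                           values="mentioninteractions")

# Share of non-neutral interactions that landed on positive mentions, in %
positive_share = (pivot["Positive"] / pivot.sum(axis=1) * 100).round(1)
print(positive_share)

# The [Negative, Positive] list for each pie chart, derived instead of typed in
pie_values = {brand: row[["Negative", "Positive"]].tolist()
              for brand, row in pivot.iterrows()}
```

On these numbers, Lufthansa has the highest positive share (38.9%) and Saudi Airlines (6.6%) sits slightly below Egypt Air, which puts the 9.4% in context.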

## Egypt Air's social media score on Twitter
### Social media score (SMS) against the other airlines: how well is Egypt Air doing?
#### Sentiment Score
##### Egypt Air and its competitors

In [None]:
# Excluding neutrals
# Egypt Air
egy_pos = len(df[(df['brand'] == 'EgyptAir') & (df['sentiment'] == 'Positive')])
egy_neg = len(df[(df['brand'] == 'EgyptAir') & (df['sentiment'] == 'Negative')])
egy_tot = len(df[(df['brand'] == 'EgyptAir') & (df['sentiment'] != 'Neutral')])
egy_air = df[df['brand'] == 'EgyptAir']
egy_pos_sent = (egy_pos/(egy_tot))*100
egy_neg_sent = (egy_neg/(egy_tot))*100
egy_net_sent = (egy_pos_sent - egy_neg_sent)
egy_reach = (egy_air.authorid.nunique()/egy_air.mentionid.count())*100
egy_mention = len(egy_air)

#Emirates
emir_pos = len(df[(df['brand'] == 'Emirates') & (df['sentiment'] == 'Positive')])
emir_neg = len(df[(df['brand'] == 'Emirates') & (df['sentiment'] == 'Negative')])
emir_tot = len(df[(df['brand'] == 'Emirates') & (df['sentiment'] != 'Neutral')])
emir_air = df[df['brand'] == 'Emirates']
emir_pos_sent = (emir_pos/(emir_tot))*100
emir_neg_sent = (emir_neg/(emir_tot))*100
emir_net_sent = (emir_pos_sent - emir_neg_sent)
emir_reach = (emir_air.authorid.nunique()/emir_air.mentionid.count())*100
emir_mention = len(emir_air)

#Lufthansa
luft_pos = len(df[(df['brand'] == 'Lufthansa') & (df['sentiment'] == 'Positive')])
luft_neg = len(df[(df['brand'] == 'Lufthansa') & (df['sentiment'] == 'Negative')])
luft_tot = len(df[(df['brand'] == 'Lufthansa') & (df['sentiment'] != 'Neutral')])
luft_air = df[df['brand'] == 'Lufthansa']
luft_pos_sent = (luft_pos/(luft_tot))*100
luft_neg_sent = (luft_neg/(luft_tot))*100
luft_net_sent = (luft_pos_sent - luft_neg_sent)
luft_reach = (luft_air.authorid.nunique()/luft_air.mentionid.count())*100
luft_mention = len(luft_air)

# Saudi Airlines
saudi_pos = len(df[(df['brand'] == 'Saudi Airlines') & (df['sentiment'] == 'Positive')])
saudi_neg = len(df[(df['brand'] == 'Saudi Airlines') & (df['sentiment'] == 'Negative')])
saudi_tot = len(df[(df['brand'] == 'Saudi Airlines') & (df['sentiment'] != 'Neutral')])
saudi_air = df[df['brand'] == 'Saudi Airlines']
saudi_pos_sent = (saudi_pos/(saudi_tot))*100
saudi_neg_sent = (saudi_neg/(saudi_tot))*100
saudi_net_sent = (saudi_pos_sent - saudi_neg_sent)
saudi_reach = (saudi_air.authorid.nunique()/saudi_air.mentionid.count())*100
saudi_mention = len(saudi_air)

In [None]:
data = {'EgyptAir': [egy_net_sent, egy_reach, egy_mention], 'Emirates': [emir_net_sent, emir_reach, emir_mention],
       'Lufthansa': [luft_net_sent, luft_reach, luft_mention], 'SaudiAirlines': [saudi_net_sent, saudi_reach, saudi_mention]}

pd.DataFrame.from_dict(data, orient='index',
                       columns=['Net Sentiment', 'Reach', 'Mention'])

Egypt Air needs to do much better on Twitter, despite a reach of 44.2%. Its net sentiment score confirms what the pie charts showed: customers are sharing negative experiences.
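The four near-identical blocks above compute the same three metrics per airline: net sentiment (positive % minus negative % of non-neutral mentions), reach (unique authors ÷ mentions × 100, as in the code above) and the mention count. Below is a minimal sketch of a single `groupby`-based helper; the helper name `social_media_scores` is hypothetical, and the toy data only mimics the `brand`, `sentiment`, `authorid` and `mentionid` columns.

```python
import pandas as pd

def social_media_scores(frame):
    """Net sentiment, reach and mentions per brand, using the notebook's formulas."""
    non_neutral = frame[frame["sentiment"] != "Neutral"]
    # Percentage of positive and negative mentions among the non-neutral ones
    shares = (non_neutral.groupby("brand")["sentiment"]
              .value_counts(normalize=True).unstack(fill_value=0) * 100)
    for col in ("Positive", "Negative"):
        if col not in shares:  # a sentiment may be absent in small subsets
            shares[col] = 0.0
    grouped = frame.groupby("brand")
    return pd.DataFrame({
        "Net Sentiment": shares["Positive"] - shares["Negative"],
        # Reach uses all mentions, including neutral ones, as in the original cell
        "Reach": grouped["authorid"].nunique() / grouped["mentionid"].count() * 100,
        "Mention": grouped.size(),
    })

# Hypothetical toy mentions; real use would pass `df`
toy = pd.DataFrame({
    "brand": ["EgyptAir"] * 5 + ["Emirates"] * 3,
    "sentiment": ["Positive", "Negative", "Negative", "Negative", "Neutral",
                  "Positive", "Positive", "Negative"],
    "authorid": [1, 1, 2, 3, 4, 5, 6, 7],
    "mentionid": range(8),
})

scores = social_media_scores(toy)
print(scores.round(1))
```

Computing all brands at once keeps the formulas in one place, so a change to the reach definition cannot drift between airlines.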

## What Twitter says about Egypt Air
### Postflight and Inflight
The dataset is tagged with two categories, ***postflight*** and ***inflight***, each containing topics of interest. The main measure is the sentiment (excluding neutral) of each tag, weighted by the number of interactions its mentions received.

In [None]:
# Keyword patterns per topic, in priority order: np.select keeps the FIRST
# matching topic, like the original nested where calls
# (pd.np was removed in pandas 2.0, so numpy is used directly)
inflight_topics = [
    ("entertainment", "entertainment|movie|music"),
    ("toilet", "toilet"),
    ("service", "service"),
    ("smell", "smell|odor|odour"),
    ("passenger", "passenger"),
    ("crew", "crew|aircrew|cabincrew|attendant"),
    ("comfort", "comfort|seat|seating|space"),
    ("food/drink", "food|drink|beverage|water|juice|foodie"),
    ("trip", "trip"),
]
postflight_topics = [
    ("boarding", "board|boarding|check|checkin|onboard"),
    ("luggage", "baggage|luggage|lost"),
    ("nuisance", "cancel|cancellation|cancelation|overbook|delay"),
    ("reception", "lounge|reception"),
    ("price", "price|cost|charge|money"),
    ("compensation", "compensation"),
    ("booking", "book|online"),
    ("transit", "connection|layover|transit"),
    ("offer", "offer|special offer"),
]

def tag_topics(text, topics):
    """Return the first matching topic label per text, or the string 'NaN'."""
    conditions = [text.str.contains(pattern, na=False) for _, pattern in topics]
    labels = [label for label, _ in topics]
    return np.select(conditions, labels, default="NaN")

df['inflight'] = tag_topics(df['mentionfulltext'], inflight_topics)
df['postflight'] = tag_topics(df['mentionfulltext'], postflight_topics)

egy_air = df[(df['brand'] == 'EgyptAir') & (df['sentiment'] != 'Neutral')]
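Because the topic patterns are plain substrings and the first match wins, a tweet can land in an unexpected topic. The sketch below illustrates this on hypothetical tweets (not from the dataset) with an ordered subset of the postflight patterns, and compares it with a whole-word variant. The `whole_words` option and the `tag` helper are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical lower-cased mentions (not from the dataset)
texts = pd.Series([
    "my luggage was lost and the flight got delayed",
    "saw their ad on facebook",
    "the dashboard app kept crashing",
    "great crew and tasty food",
])

# Ordered subset of the notebook's postflight patterns: earlier entries win
topics = [
    ("boarding", "board|boarding|check|checkin|onboard"),
    ("luggage", "baggage|luggage|lost"),
    ("nuisance", "cancel|cancellation|cancelation|overbook|delay"),
    ("booking", "book|online"),
]

def tag(series, topics, whole_words=False):
    """Label each text with the first topic whose pattern matches, else 'NaN'."""
    conditions = []
    for _, pattern in topics:
        # Optionally wrap the alternatives in word boundaries to skip substrings
        regex = rf"\b(?:{pattern})\b" if whole_words else pattern
        conditions.append(series.str.contains(regex, regex=True, na=False).to_numpy())
    labels = [label for label, _ in topics]
    return np.select(conditions, labels, default="NaN").tolist()

substring_tags = tag(texts, topics)
word_tags = tag(texts, topics, whole_words=True)
print(substring_tags)
print(word_tags)
```

With substrings, "facebook" is tagged as booking and "dashboard" as boarding; word boundaries remove those false positives but also stop "book" from matching "booked", so the pattern lists would need explicit word forms. The first tweet is tagged luggage even though it also mentions a delay, because luggage comes first in the list.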

In [None]:
# Inflight Data
egy_in = egy_air[(egy_air['inflight'] != 'NaN')]
inflight_int = egy_in.groupby(['inflight','sentiment'], as_index = False)['mentioninteractions'].sum()
# Postflight Data
egy_post = egy_air[(egy_air['postflight'] != 'NaN')]
postflight_int = egy_post.groupby(['postflight','sentiment'], as_index = False)['mentioninteractions'].sum()

In [None]:
fig = px.bar(inflight_int, y='mentioninteractions', x='inflight',color = 'sentiment', barmode = 'stack',)
fig.show()

Most inflight topics, despite low interaction counts, show more positive than negative sentiment. Notably, inflight service draws the most positive interactions, suggesting that customers have mostly been satisfied with the service.

Egypt Air should look into maintaining this level of service, which could attract more customers.

In [None]:
fig = px.bar(postflight_int, y='mentioninteractions', x='postflight',color = 'sentiment', barmode = 'stack')
fig.show()

Looking at the postflight experience, interactions are still weak, but they show some interesting insights about the topics. In terms of boarding (check-in, boarding and so on), reviews are nearly split. This might indicate that Egypt Air does not offer a standout boarding procedure. It could look into digital boarding procedures and other ways to make boarding easier for its customers.

Most customers seem to have a hard time booking their flights (online or offline), possibly due to server connection issues or a slow booking process.

Luggage is a critical topic: nearly 90% of the interactions on luggage mentions are negative. The complaints can concern lost, damaged or stolen luggage, among other issues. A compensation policy could help tackle this experience. Moreover, although handling luggage in and out of the planes may not be entirely under the airline's control, Egypt Air can maintain a reliable level of luggage handling inside airports.

Egypt Air needs to handle cancelled flights better and be able to reimburse customers, or reduce the number of overbooked flights, perhaps by creating an overlapping schedule for high-demand destinations.

A final note on price: even with as few as 44 interactions, the feedback was still positive.
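The luggage figure is a share of interactions, as plotted in the stacked bars, rather than a count of customers. A minimal sketch of computing that share for every postflight topic from a frame shaped like `postflight_int`; the numbers below are hypothetical and are not the notebook's results.

```python
import pandas as pd

# Hypothetical grouped sums shaped like `postflight_int` (topic, sentiment, interactions)
postflight_int = pd.DataFrame({
    "postflight": ["luggage", "luggage", "boarding", "boarding", "price"],
    "sentiment": ["Negative", "Positive", "Negative", "Positive", "Positive"],
    "mentioninteractions": [90, 10, 51, 49, 12],
})

# Topics as rows, sentiments as columns; missing combinations count as 0
by_topic = postflight_int.pivot_table(index="postflight", columns="sentiment",
                                      values="mentioninteractions",
                                      aggfunc="sum", fill_value=0)

# Percentage of each topic's interactions that sit on negative mentions
negative_share = (by_topic["Negative"] / by_topic.sum(axis=1) * 100).round(1)
print(negative_share.sort_values(ascending=False))
```

Ranking topics by this share, next to their absolute interaction totals, helps separate topics that are both loud and negative from small ones whose percentages rest on few interactions.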

### Postflight/Inflight demographics
#### Gender data analysis

In [None]:
test = df.groupby(['inflight','sentiment','authorgender','brand'], as_index = False)['mentioninteractions'].sum()

test = test[(test['authorgender'] != 'Unknown') & (test['sentiment'] != 'Neutral') & (test['inflight'] != 'NaN') & 
            (test['brand']=='EgyptAir') & (test['authorgender'] != 'organization')]
fig = px.treemap(test, path=['authorgender', 'sentiment', 'inflight'], values='mentioninteractions')
fig.show()

In [None]:
test = df.groupby(['postflight','sentiment','authorgender','brand'], as_index = False)['mentioninteractions'].sum()

test = test[(test['authorgender'] != 'Unknown') & (test['sentiment'] != 'Neutral') & (test['postflight'] != 'NaN') & 
            (test['brand']=='EgyptAir') & (test['authorgender'] != 'organization')]
fig = px.treemap(test, path=['authorgender', 'sentiment', 'postflight'], values='mentioninteractions')
fig.show()

Nearly 3/4 of the interactions come from male authors (excluding unknown genders and organizations). For the inflight category, the most positive topic among both genders is inflight service. Female authors also shared experiences beyond service: their second most common topic is entertainment, as it is for male authors. For both genders, the "trip" topic was mostly tweeted as a bad experience.

The main negative postflight topic is luggage for both genders. Males and females also share the 2nd and 3rd negative topics (booking and nuisance). As stated before, Egypt Air needs to improve customer experience in boarding, booking and luggage; improvements there would yield a better experience and more exposure.

Promoting the brand among female audiences is also important, as female interaction with Egypt Air's brand is weak. More customer engagement would increase the volume of female interactions (positive ones, of course).

### Countries with most interactions

In [None]:
test = df.groupby(['inflight','sentiment','authorcountry','brand'], as_index = False)['mentioninteractions'].sum()

test = test[(test['authorcountry'] != 'Unknown') & (test['sentiment'] != 'Neutral') & (test['inflight'] != 'NaN') & 
            (test['brand']=='EgyptAir')]
fig = px.treemap(test, path=['authorcountry', 'sentiment', 'inflight'], values='mentioninteractions')
fig.show()

In [None]:
test = df.groupby(['postflight','sentiment','authorcountry','brand'], as_index = False)['mentioninteractions'].sum()

test = test[(test['authorcountry'] != 'Unknown') & (test['sentiment'] != 'Neutral') & (test['postflight'] != 'NaN') & 
            (test['brand']=='EgyptAir')]
fig = px.treemap(test, path=['authorcountry', 'sentiment', 'postflight'], values='mentioninteractions')
fig.show()

The final look is at the brand's reach in other countries: originally an Egyptian company, Egypt Air operates in many countries.

Passengers from Ireland had the best inflight experience, as all tweets originating there were positive on the service topic. For Egyptian passengers, too, service was on top. Note that a fully positive result is likely a limitation of the data rather than a robust finding; promoting the brand in other countries is highly recommended to reach more customers (and thus have more interactions with these audiences on Twitter).

For postflight, posts from Nigeria show that customers had an unsatisfying time, with boarding, luggage and booking being the topics heard on Twitter. Despite some positives along the way, this is still an indication that brand promotion overseas is very important. An opportunity to seize here is that customers from many different countries mention the airline, which makes these regions worth exploring.
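The four treemap cells for gender and country repeat the same group-and-filter steps. A minimal sketch of a hypothetical helper, `treemap_frame`, that builds those frames for any topic category and demographic column; the toy rows and the "male"/"female" labels are assumptions, not values taken from the dataset. The same frame also gives per-gender interaction shares, the kind of number behind the "nearly 3/4" observation.

```python
import pandas as pd

def treemap_frame(frame, category, dimension, brand="EgyptAir",
                  exclude=("Unknown", "organization")):
    """Sum interactions per (category, sentiment, dimension, brand) for one brand,
    dropping neutral mentions, untagged 'NaN' topics and excluded dimension values."""
    grouped = frame.groupby([category, "sentiment", dimension, "brand"],
                            as_index=False)["mentioninteractions"].sum()
    keep = ((grouped["brand"] == brand)
            & (grouped["sentiment"] != "Neutral")
            & (grouped[category] != "NaN")
            & ~grouped[dimension].isin(list(exclude)))
    return grouped[keep]

# Hypothetical toy rows; real use would pass `df`
toy = pd.DataFrame({
    "brand": ["EgyptAir"] * 5 + ["Emirates"],
    "sentiment": ["Positive", "Negative", "Positive", "Neutral", "Negative", "Positive"],
    "authorgender": ["male", "male", "female", "female", "Unknown", "male"],
    "inflight": ["service", "trip", "service", "service", "trip", "service"],
    "mentioninteractions": [30, 15, 15, 100, 50, 999],
})

gender_frame = treemap_frame(toy, "inflight", "authorgender")
# Share of the remaining interactions per gender
share = (gender_frame.groupby("authorgender")["mentioninteractions"].sum()
         / gender_frame["mentioninteractions"].sum())
print(share)
```

The result can go straight into `px.treemap(gender_frame, path=["authorgender", "sentiment", "inflight"], values="mentioninteractions")`; for the country treemaps, call it with `dimension="authorcountry"` and `exclude=("Unknown",)` to match the original filters.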

## Etihad Airways mentions

In [None]:
# 'etihad' also matches 'etihadairways' and 'etihad airways', so one pattern is enough
df['etihadmentions'] = df['mentionfulltext'].str.contains('etihad', na=False)

In [None]:
etihad = df[df['etihadmentions']]

In [None]:
#etihad.to_csv('etihad_mentions.csv')

## Acknowledgements
- [This link](https://awario.com/blog/airline-industry-social-listening-report/) was the main influence.
- For the reach calculations I used this [link](https://www.dummies.com/education/internet-basics/how-to-understand-social-mention-metrics/).
- [Brandwatch](https://www.brandwatch.com/blog/how-airlines-can-use-social-listening-to-boost-online-reputation/) is a guide to some of the topic metrics I am trying to represent; I used the same annotation to categorize topics (postflight and inflight).