# Exploratory Analysis Notebook

#### This notebook demonstrates the tools needed for exploratory analysis of the tweet data: searching for specific users, topics and hashtags, then applying descriptive analytics (e.g. frequency analysis, descriptive statistics and temporal frequency).
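Frequency analysis and descriptive statistics can be sketched with plain pandas. This is not the `trt_API` implementation. The toy DataFrame is hypothetical and only mirrors the notebook's `tweet` and `followers` columns.

```python
import pandas as pd

# Hypothetical rows standing in for the notebook's tweet DataFrame
tweets = pd.DataFrame({
    "tweet": [
        "Delayed again #travel #delta",
        "Great crew today #travel",
        "Lost my bag #easyjet #travel",
    ],
    "followers": [520, 279, 1246],
})

# Frequency analysis: extract every hashtag, normalise case, and count occurrences
hashtags = (
    tweets["tweet"]
    .str.findall(r"#\w+")  # list of hashtags per tweet
    .explode()             # one row per hashtag
    .dropna()              # tweets without hashtags become NaN after explode
    .str.lower()
)
hashtag_counts = hashtags.value_counts()

# Descriptive statistics: count, mean, std, min, quartiles and max of follower counts
stats = tweets["followers"].describe()

print(hashtag_counts)
print(stats)
```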

In [2]:
import trt_API.process as proc

## Variables for Analysis

In [4]:
# Set the path to the parent directory containing all Tweets of interest
# Use a wildcard (i.e., *) if directory contains subdirectories
DIRECTORY = './../../../airline-tweets/*'
# Set to True to keep only English-language tweets
ENGLISH = False
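How the trailing `*` in `DIRECTORY` is resolved depends on `proc.loadTweetObjects`, which the notebook does not show. Assuming it uses glob-style matching like Python's `glob` module, the pattern expands to every entry directly under the parent directory. The directory names below are hypothetical.

```python
import glob
import os
import tempfile

# Throwaway directory tree mimicking a tweet archive split into subdirectories
root = tempfile.mkdtemp()
for sub in ("2019-07-05", "2019-07-12"):
    os.makedirs(os.path.join(root, sub))

# A trailing '*' matches every entry directly under root (here, the two subdirectories)
matches = sorted(glob.glob(os.path.join(root, "*")))
print([os.path.basename(m) for m in matches])  # ['2019-07-05', '2019-07-12']
```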

## Load Tweets and Generate DataFrame

In [5]:
tweet_objects = proc.loadTweetObjects(DIRECTORY)
df = proc.convertTweetsToDataframe(tweet_objects, ENGLISH)

Loaded utf-8 df.
Initial size: 115132
Dropping duplicates...
Final size: 87074
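The load step reports dropping duplicates (115,132 rows down to 87,074). A minimal pandas equivalent, assuming duplicates are rows that share a tweet `id` (the library may compare other columns), looks like this; the rows are made up.

```python
import pandas as pd

# Hypothetical raw rows: id 101 was collected twice
raw = pd.DataFrame({
    "id": [101, 102, 101],
    "tweet": ["Flight delayed again", "Great crew today", "Flight delayed again"],
})

print("Initial size:", len(raw))
# Keep the first occurrence of each tweet id and renumber the index
deduped = raw.drop_duplicates(subset="id", keep="first").reset_index(drop=True)
print("Final size:", len(deduped))
```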


## Remove Retweets

In [7]:
cldf = proc.removeRetweets(df)

Removed 672 duplicates.


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['RT'][df.tweet.astype(str).str[0:2] == 'RT'] = df.tweet.str.split(':',expand=True).iloc[:,0]
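The warning above comes from chained assignment inside the library (`df['RT'][mask] = ...`). The sketch below performs the same retweet tagging with a single `.loc` assignment, which is the pattern pandas recommends for avoiding `SettingWithCopyWarning`. Filling non-retweets with the string `'None'` is an assumption that mirrors the `'None'` strings the notebook checks in `original_tweet`.

```python
import pandas as pd

df = pd.DataFrame({
    "tweet": [
        "RT @BexxFrancois: trying to really understand how a flight gets delayed",
        "Suitcase been left at Gatwick",
    ]
})
df["RT"] = "None"  # assumed placeholder for tweets that are not retweets

# Boolean mask of retweets: text beginning with 'RT'
is_rt = df["tweet"].astype(str).str.startswith("RT")
# One .loc assignment instead of df['RT'][mask] = ...; keep the text before the first ':'
df.loc[is_rt, "RT"] = df.loc[is_rt, "tweet"].str.split(":", n=1).str[0]
print(df["RT"].tolist())  # ['RT @BexxFrancois', 'None']
```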


## Extract Potential Cashtags

In [8]:
ctdf = proc.extractPossibleCashtags(df)

Total potential Cashtags: 53
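The notebook does not show how `proc.extractPossibleCashtags` decides what counts as a potential cashtag. One plausible sketch assumes a cashtag is `$` followed by one to six letters, so dollar amounts such as `$200` are excluded. The pattern and sample tweets are hypothetical.

```python
import pandas as pd

tweets = pd.Series([
    "Watching $AAL and $DAL after the delays",
    "Refund cost me $200 and two days",
    "No tickers here",
])

# Hypothetical rule: '$' + 1-6 letters, ending at a word boundary
cashtags = tweets.str.findall(r"\$[A-Za-z]{1,6}\b").explode().dropna()
print(sorted(cashtags.unique()))  # ['$AAL', '$DAL']
```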


## Remove Noisy Tweets

In [9]:
# Tweets often attach popular hashtags to unrelated topics.
# Words that mark such noise can be used to filter those tweets out;
# enter them in the noisy_terms list below.
noisy_terms = ['#GoldenGlobes']
cldf = proc.removeNoisyTerms(df, noisy_terms)

Removed 0 noisy terms.
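A pandas sketch of filtering on `noisy_terms`, assuming a tweet counts as noisy when its text contains any listed term, ignoring case. `proc.removeNoisyTerms` may match differently; the sample tweets are made up.

```python
import re

import pandas as pd

df = pd.DataFrame({"tweet": [
    "Best dressed at the #GoldenGlobes! #united",
    "Flight delayed three hours @united",
]})
noisy_terms = ["#GoldenGlobes"]

# Combine the terms into one pattern; re.escape keeps characters such as '#' or '.' literal
pattern = "|".join(re.escape(term) for term in noisy_terms)
is_noisy = df["tweet"].str.contains(pattern, case=False, regex=True, na=False)
cleaned = df[~is_noisy]
print(f"Removed {int(is_noisy.sum())} noisy tweet(s).")
```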


In [10]:
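# Print up to the first 70 tweets, preferring the full original text of retweets
for i in range(min(70, len(cldf))):
    if cldf.original_tweet.iloc[i] == 'None':
        print(cldf.tweet.iloc[i] + '\n\n')
    else:
        print(cldf.original_tweet.iloc[i] + '\n\n')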

trying to really understand how a flight gets delayed from 725p to 945p , because of catering?? @Delta  stuck @ JFK. STILL.


Mt. Fuji… it’s time. Summit season is upon us. Are you trying to make a trip to the top this year? https://t.co/4xt8pQanMe


D-Day has arrived. The @FAIreland Cerebral Palsy squad take off with @Ryanair to the @ifcpf World Cup in Seville. Thanks to Ryanair for their support. 🇮🇪🇮🇪🇮🇪 #COYBIG @YouBoysInGreen @CRISClubs @ParalympicsIRE https://t.co/TtuyJq5zDt


Suitcase been left at Gatwick, day 2 on holiday and still no contact or updates, what is going on??? @easyJet


It’s 2019 and you’re telling me @SpiritAirlines seriously thought it was a good idea to ask Sis to leave her PAID seat for this heifer?! https://t.co/QnCXE6rEWv


Wow! Black passenger on @SpiritAirlines told to move from seat she purchased after white women refuses to sit next to her.


Just forgot how terrible @easyJet is; all the flights I’ve take
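The print loop can also be written without positional indexing: build the display text once with `Series.where`, then print the first 70 entries (`head` simply returns fewer rows if the frame is shorter). The toy `cldf` below is hypothetical and only reproduces the two columns the loop reads.

```python
import pandas as pd

cldf = pd.DataFrame({
    "tweet": ["RT @oisin76: D-Day has arrived...", "Suitcase been left at Gatwick"],
    "original_tweet": ["D-Day has arrived. The @FAIreland Cerebral Palsy squad...", "None"],
})

# Keep original_tweet where it holds real text; fall back to tweet where it is the string 'None'
display_text = cldf["original_tweet"].where(cldf["original_tweet"] != "None", cldf["tweet"])
for text in display_text.head(70):
    print(text, end="\n\n\n")
```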

In [18]:
cldf.head(20)

| Unnamed: 0 | date | followers | username | location | tweet | id | original_tweet | RT |
|---|---|---|---|---|---|---|---|---|
| 0 | Jul 05 00:52:15 2019\t0 | 520 | claimcompanies | | RT @BexxFrancois: trying to really understand ... | 1146944848173383681 | trying to really understand how a flight gets ... | RT @BexxFrancois |
| 4 | Jul 05 03:32:40 2019\t0 | 279 | krishnayana95 | Bali, Indonesia | RT @FlyANA_official: Mt. Fuji… it’s time. ... | 1146985218349289473 | Mt. Fuji… it’s time. Summit season is upon... | RT @FlyANA_official |
| 8 | Jul 05 07:21:16 2019\t0 | 1246 | CaraCentre_ie | IT Tralee, Kerry, Ireland | RT @oisin76: D-Day has arrived. The @FAIreland... | 1147042747406266368 | D-Day has arrived. The @FAIreland Cerebral Pal... | RT @oisin76 |
| 16 | Jul 05 09:23:27 2019\t0 | 172 | harryhutson16 | | RT @Joesmithhx: Suitcase been left at Gatwick,... | 1147073495861518337 | Suitcase been left at Gatwick, day 2 on holida... | RT @Joesmithhx |
| 28 | Jul 05 15:05:55 2019\t0 | 2457 | MomentsWithMani | | RT @CollegeSistas: It’s 2019 and you’re te... | 1147159680386572289 | It’s 2019 and you’re telling me @SpiritAir... | RT @CollegeSistas |
| 32 | Jul 05 16:12:03 2019\t0 | 507 | dreaxdreaaaaa | in my own world | RT @tharealmelissa: Wow! Black passenger on @S... | 1147176323384782848 | Wow! Black passenger on @SpiritAirlines told t... | RT @tharealmelissa |
| 34 | Jul 05 17:55:49 2019\t0 | 5620 | Flight_Refunds | London, UK | RT @BelenSZ1985: Just forgot how terrible @eas... | 1147202437146718209 | Just forgot how terrible @easyJet is; all the ... | RT @BelenSZ1985 |
| 36 | Jul 05 18:56:36 2019\t0 | 443 | RespectMyBlk | Virginia | RT @CollegeSistas: It’s 2019 and you’re te... | 1147217733777547264 | It’s 2019 and you’re telling me @SpiritAir... | RT @CollegeSistas |
| 40 | Jul 12 00:02:00 2019\t0 | 107 | sukila | Bolton, CT USA | RT @MichelleKnefel3: @JoshuaPotash @AmericanAi... | 1149468917368123392 | @JoshuaPotash @AmericanAir what on earth is th... | RT @MichelleKnefel3 |
| 46 | Jul 12 01:45:58 2019\t0 | 2740 | natashasamani | Udupi, India | RT @airlineguys: Oooo! Just saw the new @unite... | 1149495081457479680 | Oooo! Just saw the new @united livery in perso... | RT @airlineguys |
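The `date` and `RT` columns support the temporal frequency and frequency analyses mentioned in the introduction. A sketch on made-up rows, assuming date strings shaped like `Jul 05 00:52:15 2019`; the trailing `\t0` in the displayed values is undocumented and would have to be removed before parsing.

```python
import pandas as pd

# Hypothetical rows mirroring the 'date' and 'RT' columns shown above
sample = pd.DataFrame({
    "date": ["Jul 05 00:52:15 2019", "Jul 05 15:05:55 2019", "Jul 12 00:02:00 2019"],
    "RT": ["RT @BexxFrancois", "RT @CollegeSistas", "RT @CollegeSistas"],
})

# Parse 'Mon DD HH:MM:SS YYYY' strings into timestamps
sample["date"] = pd.to_datetime(sample["date"], format="%b %d %H:%M:%S %Y")

# Temporal frequency: tweets per calendar day; days with no tweets get a count of 0
per_day = sample.set_index("date").resample("D").size()

# Frequency analysis: accounts retweeted most often
top_sources = sample["RT"].value_counts()

print(per_day)
print(top_sources)
```

Resampling rather than grouping keeps empty days in the series, which matters when the counts are plotted as a timeline.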


In [14]:
# Select the columns at positions 5 and 6 (iloc slices exclude the end position)
odf = cldf.iloc[:,5:7]
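`iloc[:, 5:7]` selects by position, and the slice end is exclusive. Assuming `cldf` has the columns printed by `head()` above, positions 5 and 6 are `tweet` and `id`. Selecting by label returns the same columns and keeps working if the column order changes. The empty frame below only reproduces those column names.

```python
import pandas as pd

# Column names copied from the head() output (assumed to match cldf)
columns = ["Unnamed: 0", "date", "followers", "username", "location",
           "tweet", "id", "original_tweet", "RT"]
frame = pd.DataFrame(columns=columns)

# Positional slice: positions 5 and 6
print(list(frame.iloc[:, 5:7].columns))     # ['tweet', 'id']
# Label-based selection of the same columns
print(list(frame[["tweet", "id"]].columns))  # ['tweet', 'id']
```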

In [16]:
# Write the selected columns to CSV (the index is also written as the first column by default)
odf.to_csv('../sample_airline.csv')
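By default `DataFrame.to_csv` also writes the index, and reading such a file back without `index_col` yields an `Unnamed: 0` column, a likely origin of the column of that name in the `head()` output above. A small in-memory demonstration with a hypothetical one-row frame:

```python
import io

import pandas as pd

odf = pd.DataFrame(
    {"tweet": ["Suitcase been left at Gatwick"], "id": [1147073495861518337]},
    index=[16],
)

buffer = io.StringIO()
odf.to_csv(buffer)            # the index is written as an unnamed first column
buffer.seek(0)
naive = pd.read_csv(buffer)   # that column comes back as 'Unnamed: 0'
print(list(naive.columns))    # ['Unnamed: 0', 'tweet', 'id']

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # treat the first column as the index
print(list(restored.columns))                # ['tweet', 'id']
```

Passing `index=False` to `to_csv` omits the index entirely when it is not needed.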