In [None]:
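import datetime as dt
from itertools import cycle, islice

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

data_file = '../input/Tweets.csv'
# Drop the trailing UTC offset (last 6 characters) before parsing the timestamp
parser = lambda x: dt.datetime.strptime(x[:-6], '%Y-%m-%d %H:%M:%S')
# Column 12 is tweet_created
tweets = pd.read_csv(data_file, index_col = 'tweet_id',
                     parse_dates=[12], date_parser = parser)
pd.options.display.max_rows = 8
tweets[['airline_sentiment','airline', 'retweet_count', 
        'text', 'tweet_created']]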

# Results

## Tweets over the week

The figure below shows the distribution of tweets by day of week, with one panel per sentiment and one colored curve per airline.  From top to bottom, the panels show neutral, positive, and negative sentiment.  Day of week 0 is Monday, and day of week 6 is Sunday.

For all airlines except American and Southwest, the number of complaints peaks on Sunday and slowly declines throughout the week, while the number of positive comments peaks on Tuesday.  In contrast, tweets about American Airlines peak on Monday and then decline sharply in the middle of the week.  For Southwest, complaints tend to peak on Tuesdays and Saturdays.

In [None]:
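# Day of week of each tweet (0 = Monday, 6 = Sunday)
tweets['dow'] = tweets.tweet_created.dt.dayofweek

# One panel per sentiment, one density curve per airline.
# `height` replaces the `size` argument accepted by seaborn < 0.9.
g = sb.FacetGrid(tweets, row = 'airline_sentiment', 
                 hue = 'airline', legend_out = True,
                 aspect = 4, height = 2.5)
g.map(sb.distplot, 'dow', hist = False)
g.add_legend()
g.axes.flat[0].set_xlim(0,6)
g.axes.flat[2].set_xlabel('Day of Week')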
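
The smoothed curves above can be cross-checked against raw counts. The following is a minimal sketch, assuming a small hand-made frame (`sample`, hypothetical) with the same column names as `tweets`; it tabulates tweets per day of week for each sentiment and airline and reads off the busiest day. It is an illustration only, not part of the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for `tweets`; the real data comes from Tweets.csv.
sample = pd.DataFrame({
    'airline': ['United', 'United', 'Delta', 'Delta', 'United'],
    'airline_sentiment': ['negative', 'negative', 'positive',
                          'negative', 'neutral'],
    'tweet_created': pd.to_datetime([
        '2015-02-22 10:00:00',  # Sunday
        '2015-02-22 12:30:00',  # Sunday
        '2015-02-24 09:15:00',  # Tuesday
        '2015-02-23 18:45:00',  # Monday
        '2015-02-18 08:00:00',  # Wednesday
    ]),
})
sample['dow'] = sample.tweet_created.dt.dayofweek  # 0 = Monday, 6 = Sunday

# Rows: (sentiment, airline); columns: day of week; values: tweet counts
dow_counts = pd.crosstab([sample.airline_sentiment, sample.airline],
                         sample.dow)

# Day of week with the most tweets for each (sentiment, airline) pair
peak_day = dow_counts.idxmax(axis=1)
print(dow_counts)
print(peak_day)
```

Applied to the full `tweets` frame, the same two calls would give the exact counts behind each curve, which is useful because a kernel density estimate over seven discrete values can blur neighboring days together.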

## Tweet sentiment by airline

In the following figure, we study how people tweet and retweet about different airlines.  From top to bottom, the panels show, for each airline and sentiment, the number of tweets, the number of retweets, and the ratio of retweets to tweets, which we call the retweet efficiency.  Bars are colored red, blue, or green for negative, neutral, or positive sentiment, respectively.

Twitter users love to hate all of the airlines except Virgin America, and United is clearly the most reviled.  Interestingly, the retweet efficiency depends on the airline.  Tweets about United, American, and US Airways are more likely to be retweeted if they are negative, while tweets about Southwest, Delta, and Virgin America are more likely to be retweeted if they are positive.  Anecdotally, this trend matches this author's personal opinion of airline quality.  We speculate that Twitter users retweet comments that they personally agree with.  Consequently, we suggest that the retweet efficiency could be a useful metric for gauging public opinion.

In [None]:
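# Group tweets by airline, then by sentiment (negative, neutral, positive)
groups = tweets.groupby([tweets.airline, 
                         tweets.airline_sentiment])

# Total retweets per (airline, sentiment) group
retweet_table = groups.retweet_count.sum()
# Repeat red, blue, green so each sentiment keeps its color across airlines
my_colors = list(islice(cycle(['r', 'b', 'g']), 
                        None, len(retweet_table)))
fig, ax = plt.subplots(3, sharex = True)
# Number of tweets per group (counting the non-null `name` column)
groups.count().name.plot(kind = 'bar', color = 
                         my_colors, title = 
                         '# of Tweets', ax = ax[0])

retweet_table.plot(kind = 'bar', color= my_colors, 
                   title = '# of Retweets', ax = ax[1])
# Retweet efficiency: retweets per tweet in each group
(retweet_table/groups.count().name).plot(
    kind = 'bar', color = my_colors, 
    title = 'Retweet Efficiency', ax = ax[2])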
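
To make the retweet efficiency concrete, here is a minimal sketch on a small hand-made frame (`sample`, hypothetical, with made-up retweet counts) that uses the same column names as `tweets`. It collects the three plotted quantities into one table instead of three bar charts; the column names `tweets`, `retweets`, and `retweet_efficiency` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `tweets`; the values are invented for illustration.
sample = pd.DataFrame({
    'airline': ['United', 'United', 'United', 'Delta', 'Delta'],
    'airline_sentiment': ['negative', 'negative', 'positive',
                          'positive', 'positive'],
    'retweet_count': [3, 1, 0, 2, 0],
})

grouped = sample.groupby(['airline', 'airline_sentiment']).retweet_count

# Number of tweets and total retweets per (airline, sentiment) group
summary = grouped.agg(['count', 'sum']).rename(
    columns={'count': 'tweets', 'sum': 'retweets'})

# Retweet efficiency: retweets per tweet
summary['retweet_efficiency'] = summary.retweets / summary.tweets
print(summary)
```

In this toy data the two negative United tweets collect four retweets, giving an efficiency of 2.0, while the single positive United tweet is never retweeted and scores 0.0.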
