# Twitter API for Airline Sentiment Analysis
Author: Matthew Huh


The objective of this program is to retrieve tweets about each airline from Twitter. Because the Twitter search API only returns tweets from roughly the past week, the data is time-sensitive and a given run cannot be reproduced later. The output of this program is a CSV file that serves as the test set for the main project. To see how this dataset is used, please view the project below.

Sentiment Analysis using Airline Tweets
https://github.com/mhuh22/Thinkful/blob/master/Bootcamp/Unit%207/Sentiment%20Analysis%20using%20Airline%20Tweets.ipynb

### Packages

In [1]:
import tweepy
import twitter_credentials
from textblob import TextBlob
import pandas as pd
import time
import os

In [2]:
# Entering credentials to utilize Tweepy

auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)

api = tweepy.API(auth)

In [3]:
class listener(tweepy.StreamListener):

    def on_data(self, data, time_limit=1):
        # Record the arrival time and the time limit (the limit is not enforced below)
        self.start_time = time.time()
        self.limit = time_limit
        try:
            print(data)

#             tweet = data.split(', "text":"')[1].split('","source')[0]
#             print(tweet)

#             saveThis = str(time.time()) + '::' + tweet

            # Append the raw payload to a local file, one message per line
            with open('twitDB.csv', 'a') as saveFile:
                saveFile.write(data)
                saveFile.write('\n')
            return True
        except BaseException as e:
            # Bind the exception to `e` so it can be reported
            print('failed ondata', str(e))
            time.sleep(5)

    def on_error(self, status):
        print(status)
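
The commented-out lines in `on_data` pull the tweet text out of the raw payload by splitting the string, which breaks as soon as the field order or spacing changes. Since the streaming payload is JSON, a minimal sketch of a sturdier approach is to parse it with the standard `json` module. The helper name `extract_text` and the sample payload are illustrative assumptions, not part of the original notebook.

```python
import json


def extract_text(raw_data):
    """Return the 'text' field of one raw stream message, or None if it has none.

    Hypothetical helper: assumes each message is a single JSON document.
    """
    try:
        payload = json.loads(raw_data)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    # Messages that are not tweets simply have no 'text' key
    return payload.get('text')


# Made-up payload that mimics the shape of a tweet, for illustration only
sample = '{"created_at": "placeholder", "text": "Flight delayed, \\"again\\"", "source": "web"}'
print(extract_text(sample))
```

Returning `None` instead of raising lets the caller skip messages without a `text` field rather than failing on them.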

In [4]:
# Create dataframe to store twitter information
tweet_df = pd.DataFrame(columns=['airline', 'text'])

# A list of the official airline accounts
airlines = ['AmericanAir', 'delta_airline4', 'SouthwestAir', 'USAirways', 'united', 'VirginAirline']

In [7]:
# Access data for each airline and append it to the dataframe
# (Technically works, but only returned 75 results, far fewer than the
# requested 1000; the standard search API caps `count` at 100 per request)
for airline in airlines:
    airline_tweets = api.search(airline, 
                                count=1000)
    for tweet in airline_tweets:
        tweet_df = tweet_df.append({'airline': airline, 'text':tweet.text}, ignore_index=True)
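
The loop above grows the dataframe one row at a time with `DataFrame.append`, which copies the frame on every call and was removed in pandas 2.0. A minimal sketch of an alternative is to collect plain dictionaries in a list and build the dataframe once. The function name `tweets_to_frame` and the stand-in tweet objects are assumptions made for illustration; real results from `api.search` would be passed in the same way, since only the `.text` attribute is used.

```python
from types import SimpleNamespace

import pandas as pd


def tweets_to_frame(results_by_airline):
    """Build one DataFrame from a {airline: iterable of tweet-like objects} mapping."""
    rows = []
    for airline, tweets in results_by_airline.items():
        for tweet in tweets:
            rows.append({'airline': airline, 'text': tweet.text})
    # Construct the frame once instead of appending row by row
    return pd.DataFrame(rows, columns=['airline', 'text'])


# Stand-in objects with a .text attribute, used in place of real API results
fake_results = {
    'AmericanAir': [SimpleNamespace(text='sample tweet one'),
                    SimpleNamespace(text='sample tweet two')],
    'united': [SimpleNamespace(text='sample tweet three')],
}
demo_df = tweets_to_frame(fake_results)
print(demo_df.shape)
```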

In [11]:
tweet_df.shape

(708, 2)

In [12]:
tweet_df.head()

Unnamed: 0,airline,text
0,AmericanAir,@AmericanAir Must admit. Pretty disappointed i...
1,AmericanAir,"RT jonnajarian ""Chicago pilots👨‍✈️👩‍✈️ America..."
2,AmericanAir,RT @RoseBartu: yasssss!!! @americanair #upgrad...
3,AmericanAir,RT @TheRealTea_Ling: Are there any cyber Monda...
4,AmericanAir,RT @jonnajarian: What a class act: Chicago pil...


In [10]:
# Save extracted data to a local directory
# (to_csv overwrites an existing file, so no explicit removal is needed;
# the target directory is created if it does not exist yet)
os.makedirs('airline_tweets', exist_ok=True)
tweet_df.to_csv('airline_tweets/test_set.csv', encoding='utf-8')
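
Because `to_csv` writes the dataframe index as an unnamed first column by default, reading the file back without further options produces an extra `Unnamed: 0` column like the one in the table output above. The sketch below shows the round trip on a small made-up dataframe, using an in-memory buffer instead of the real `airline_tweets/test_set.csv` file. Passing `index_col=0` to `read_csv` is one way to restore the original columns.

```python
import io

import pandas as pd

# Small made-up frame with the same columns as tweet_df
demo_df = pd.DataFrame({
    'airline': ['AmericanAir', 'united'],
    'text': ['sample tweet one', 'sample tweet two'],
})

# Write to an in-memory buffer; the index becomes an unnamed first column
buffer = io.StringIO()
demo_df.to_csv(buffer)
csv_text = buffer.getvalue()

# Default read: the index column comes back as 'Unnamed: 0'
with_extra = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index to recover the original layout
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(restored.columns))
```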

### Note: Accessing this file from another location

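This notebook requires a separate file named twitter_credentials.py, importable from the notebook's directory, with the following format:

CONSUMER_KEY = '<your consumer key>'
CONSUMER_SECRET = '<your consumer secret>'
ACCESS_TOKEN = '<your access token>'
ACCESS_TOKEN_SECRET = '<your access token secret>'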

Find your credentials here after logging in:
https://apps.twitter.com/app/15976800/keys
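
Before authenticating, it can help to confirm that all four values in twitter_credentials.py are actually filled in. The following is a minimal sketch of such a check. The helper `missing_credentials` is hypothetical, and the demo uses a stand-in object rather than the real module so that it runs without any credentials.

```python
from types import SimpleNamespace

# The four names expected in twitter_credentials.py
REQUIRED_NAMES = ('CONSUMER_KEY', 'CONSUMER_SECRET',
                  'ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')


def missing_credentials(module):
    """Return the required names that are absent or blank on `module`."""
    return [name for name in REQUIRED_NAMES
            if not str(getattr(module, name, '') or '').strip()]


# Stand-in for the imported module, with one blank and one absent value
demo = SimpleNamespace(CONSUMER_KEY='abc', CONSUMER_SECRET='', ACCESS_TOKEN='def')
print(missing_credentials(demo))
```

With the real file in place, the same call would be `missing_credentials(twitter_credentials)` after `import twitter_credentials`, and an empty list means every value is set.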