# Assignment Explanation

In this assignment, you'll create a Python script to perform a sentiment analysis of the Twitter activity of various news outlets, and to present your findings visually.

Your final output should provide a visualized summary of the sentiments expressed in the last 100 tweets sent out by the following news organizations: __BBC, CBS, CNN, Fox, and the New York Times__.

Your final Jupyter notebook must:

* Pull the last 100 tweets from each outlet.
* Perform a sentiment analysis with the compound, positive, neutral, and negative scoring for each tweet.
* Pull into a DataFrame the tweet's source account, its text, its date, and its compound, positive, neutral, and negative sentiment scores.
* Export the data in the DataFrame into a CSV file.
* Save PNG images for each plot.
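
As a rough sketch of the DataFrame and CSV requirements above: the snippet below builds a DataFrame with the seven required columns and writes it to disk. The tweet text is placeholder text, the output file name `news_tweet_sentiment.csv` is an assumption made for illustration, and the scores are copied from two rows of the sample output later in this notebook.

```python
import pandas as pd

# One dictionary per tweet. The "text" values are placeholders, not real tweets.
records = [
    {"handle": "BBC", "text": "placeholder tweet text 1",
     "date": "Sun Mar 04 20:11:35 +0000 2018",
     "compound": 0.4019, "positive": 0.184, "neutral": 0.726, "negative": 0.089},
    {"handle": "BBC", "text": "placeholder tweet text 2",
     "date": "Sun Mar 04 19:00:05 +0000 2018",
     "compound": 0.8468, "positive": 0.386, "neutral": 0.614, "negative": 0.0},
]

# Fix the column order explicitly so the CSV layout is predictable
columns = ["handle", "text", "date", "compound", "positive", "neutral", "negative"]
tweets_df = pd.DataFrame(records, columns=columns)

# index=False keeps the row index out of the file; without it, reading the
# CSV back produces an extra "Unnamed: 0" column
csv_path = "news_tweet_sentiment.csv"  # hypothetical file name
tweets_df.to_csv(csv_path, index=False)

# Read the file back to confirm the round trip preserved the layout
round_trip = pd.read_csv(csv_path)
print(round_trip.columns.tolist())
```

Passing `index=False` is a design choice: the row index carries no information here, so leaving it out keeps the CSV limited to the columns the assignment asks for.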

As final considerations:

* Use the Matplotlib and Seaborn libraries.
* Include a written description of three observable trends based on the data. 
* Include proper labeling of your plots, including plot titles (with date of analysis) and axes labels.
* Include an exported Markdown version of your Notebook called `README.md` in your GitHub repository.


Hints, requirements, and considerations:

* You may find it helpful to organize your code in function(s), then call them.
* If you're not yet familiar with creating functions in Python, here is a tutorial you may wish to consult: [https://www.tutorialspoint.com/python/python_functions.htm](https://www.tutorialspoint.com/python/python_functions.htm).



## Copyright

Coding Boot Camp (C) 2017. All Rights Reserved.

# NewsVADERPy: News Mood Analysis

- Trend 1
- Trend 2
- Trend 3

In [4]:
# ----------------------------------------------------------------------
# Step 1: Import necessary modules and environment (which contains
# Twitter API keys) and set up Twitter API authentication and the VADER
# Sentiment Analyzer
# ----------------------------------------------------------------------
# import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tweepy

# import environment, then import API keys from environment
import os
consumer_key = os.environ['twitter_consumer_key']
consumer_secret = os.environ['twitter_consumer_secret']
access_token = os.environ['twitter_access_token']
access_token_secret = os.environ['twitter_access_token_secret']

# import + initialize VADER
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())

In [32]:
# ----------------------------------------------------------------------
# Step 2: Create function to parse and clean tweet data and return 
# sentiment information based on VADER
# ----------------------------------------------------------------------
def parseTweets(targetNewsOrg_list, numTweets):
    """
    This function takes in two arguments:
        1) a list of Twitter handles (list of str), and
        2) the number of (most recent) tweets to analyze per handle (int).

    It returns a DataFrame with one row per tweet and these columns:
        - "handle": source account (str)
        - "date": the tweet's created_at timestamp (str)
        - "compound": value (float)
        - "positive": value (float)
        - "neutral": value (float)
        - "negative": value (float)
    """
    # create an empty list to store dictionaries
    results_list = []

    # ----------------------------------------------------------------------
    # Step 2.5:
    # - For each news org, request pages of up to 10 tweets until
    #   numTweets have been collected.
    # - For each page:
    #   - iterate through its tweets,
    #   - analyze each with VADER, then
    #   - add the results to the list.
    # ----------------------------------------------------------------------

    # loop through each organization in the list to pull tweets
    for handle in targetNewsOrg_list:
        # reset paging for each org: no max id yet, nothing collected
        oldest_tweet = None
        collected = 0

        # keep requesting pages until we have the requested numTweets
        while collected < numTweets:
            # ask only for what is still needed, at most 10 per request
            params = {"screen_name": handle, "count": min(10, numTweets - collected)}
            if oldest_tweet is not None:
                params["max_id"] = oldest_tweet
            tweet_list = api.user_timeline(**params)

            # stop early if the timeline has no more tweets to return
            if not tweet_list:
                break

            # iterate over each tweet in the page to run analysis
            for tweet in tweet_list:
                tweetAnalysis = analyzer.polarity_scores(tweet['text'])

                # add dictionary holding results to results list
                results_list.append({
                    "handle": handle,
                    "date": tweet['created_at'],
                    "compound": tweetAnalysis['compound'],
                    "positive": tweetAnalysis['pos'],
                    "neutral": tweetAnalysis['neu'],
                    "negative": tweetAnalysis['neg'],
                })
                collected += 1

            # set max id to one below the oldest tweet on this page so the
            # next request does not return it again
            oldest_tweet = int(tweet_list[-1]['id_str']) - 1

    return pd.DataFrame(results_list)
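
The `max_id` paging used in Step 2 can be tried without calling the Twitter API. The sketch below is a minimal illustration, not part of the notebook: `fetch_page` is a hypothetical stand-in for `api.user_timeline` that serves an in-memory, newest-first timeline, and `collect_recent` and `fake_timeline` are made-up names. It relies on the API's documented behavior that `max_id` returns tweets with an ID less than or equal to the given value.

```python
def fetch_page(timeline, count, max_id=None):
    """Hypothetical stand-in for api.user_timeline.

    Returns up to `count` tweets, newest first, whose id is <= max_id.
    """
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def collect_recent(timeline, num_tweets, page_size=10):
    """Collect the `num_tweets` most recent tweets, one page at a time."""
    collected = []
    max_id = None
    while len(collected) < num_tweets:
        # Request only what is still missing, capped at the page size
        wanted = min(page_size, num_tweets - len(collected))
        page = fetch_page(timeline, wanted, max_id)
        if not page:
            break  # the timeline is exhausted
        collected.extend(page)
        # max_id is inclusive, so step one below the oldest tweet seen
        max_id = page[-1]["id"] - 1
    return collected


# Fake timeline of 35 tweets, newest (highest id) first
fake_timeline = [{"id": i, "text": f"tweet {i}"} for i in range(1035, 1000, -1)]

recent = collect_recent(fake_timeline, 25)
print(len(recent), recent[0]["id"], recent[-1]["id"])  # prints: 25 1035 1011
```

Subtracting one from the oldest ID matters because `max_id` is inclusive. Without it, the last tweet of each page would come back again as the first tweet of the next page.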

In [33]:
# ----------------------------------------------------------------------
# Step 3: Call API, get tweets, and parse tweets into a dataframe+CSV
# ----------------------------------------------------------------------

# create list of target news organizations' Twitter handles
targetNewsOrg_list = ["BBC","CNN"]

# define number of tweets we want to pull from each org
numTweets = 10

# create DataFrame holding the sentiment results for every tweet analyzed
completeResults_df = parseTweets(targetNewsOrg_list, numTweets)

completeResults_df

Unnamed: 0,compound,date,handle,negative,neutral,positive
0,0.4019,Sun Mar 04 20:11:35 +0000 2018,BBC,0.089,0.726,0.184
1,0.8468,Sun Mar 04 19:00:05 +0000 2018,BBC,0.0,0.614,0.386
2,0.4215,Sun Mar 04 18:59:39 +0000 2018,BBC,0.0,0.882,0.118
3,-0.3182,Sun Mar 04 18:00:09 +0000 2018,BBC,0.099,0.901,0.0
4,0.4588,Sun Mar 04 16:57:46 +0000 2018,BBC,0.0,0.889,0.111
5,0.6369,Sun Mar 04 16:30:12 +0000 2018,BBC,0.0,0.811,0.189
6,0.3382,Sun Mar 04 15:30:29 +0000 2018,BBC,0.0,0.87,0.13
7,0.1779,Sun Mar 04 15:00:06 +0000 2018,BBC,0.0,0.876,0.124
8,0.0498,Sun Mar 04 14:03:03 +0000 2018,BBC,0.17,0.65,0.18
9,0.802,Sun Mar 04 13:46:48 +0000 2018,BBC,0.0,0.702,0.298


The first plot will be a scatter plot with the following features:

- It shows the sentiment of the last 100 tweets sent out by each news organization on a scale from -1.0 to 1.0, where 0 expresses a neutral sentiment, -1.0 the most negative sentiment possible, and +1.0 the most positive sentiment possible.
- Each plot point reflects the compound sentiment of one tweet.
- Plot points are sorted by their relative timestamp.
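
One possible shape for this scatter plot is sketched below, using plain Matplotlib and the ten BBC rows from the Step 3 output as sample data. The `tweets_ago` column, the title and axis wording, and the file name `sentiment_scatter.png` are assumptions made for illustration. With the full dataset the same loop would draw one series per news organization.

```python
import io
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The ten BBC rows shown in the Step 3 output (compound score, date, handle)
csv_text = """compound,date,handle
0.4019,Sun Mar 04 20:11:35 +0000 2018,BBC
0.8468,Sun Mar 04 19:00:05 +0000 2018,BBC
0.4215,Sun Mar 04 18:59:39 +0000 2018,BBC
-0.3182,Sun Mar 04 18:00:09 +0000 2018,BBC
0.4588,Sun Mar 04 16:57:46 +0000 2018,BBC
0.6369,Sun Mar 04 16:30:12 +0000 2018,BBC
0.3382,Sun Mar 04 15:30:29 +0000 2018,BBC
0.1779,Sun Mar 04 15:00:06 +0000 2018,BBC
0.0498,Sun Mar 04 14:03:03 +0000 2018,BBC
0.802,Sun Mar 04 13:46:48 +0000 2018,BBC
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse Twitter's created_at strings so rows can be ordered by time
df["date"] = pd.to_datetime(df["date"], format="%a %b %d %H:%M:%S %z %Y")

# Newest first, then number each handle's tweets 1, 2, 3, ... ("tweets ago")
df = df.sort_values("date", ascending=False)
df["tweets_ago"] = df.groupby("handle").cumcount() + 1

fig, ax = plt.subplots(figsize=(8, 5))
for handle, group in df.groupby("handle"):
    ax.scatter(group["tweets_ago"], group["compound"],
               label=handle, edgecolors="black", alpha=0.8)

# Reverse the x-axis so the oldest tweets sit on the left
ax.set_xlim(df["tweets_ago"].max() + 1, 0)
ax.set_ylim(-1.0, 1.0)
ax.set_title(f"Sentiment Analysis of Media Tweets ({date.today():%m/%d/%Y})")
ax.set_xlabel("Tweets Ago")
ax.set_ylabel("Tweet Polarity (compound score)")
ax.legend(title="Media Sources")

fig.savefig("sentiment_scatter.png", dpi=150, bbox_inches="tight")
plt.close(fig)
```

Numbering tweets per handle, rather than plotting raw timestamps, keeps outlets that tweet at very different rates on a shared x-axis.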


In [None]:
# ----------------------------------------------------------------------
# Step 4: Generate first plot: scatterplot of last 100 tweets showing 
# compound sentiment and sorted by relative timestamp
# ----------------------------------------------------------------------

The second plot will be a bar plot visualizing the overall sentiments of the last 100 tweets from each organization.
For this plot, you will again aggregate the compound sentiments analyzed by VADER.
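
A minimal sketch of that aggregation follows, assuming "overall sentiment" means the mean compound score per handle. That reading, the variable names, and the file name `sentiment_bar.png` are assumptions. The sample data is the ten BBC compound scores from the Step 3 output, so only one bar is drawn here. The full dataset would give one bar per outlet.

```python
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Compound scores of the ten BBC tweets shown in the Step 3 output
compound_by_handle = {
    "BBC": [0.4019, 0.8468, 0.4215, -0.3182, 0.4588,
            0.6369, 0.3382, 0.1779, 0.0498, 0.802],
}

# Flatten into one row per tweet, matching the layout of the results DataFrame
rows = [{"handle": handle, "compound": score}
        for handle, scores in compound_by_handle.items()
        for score in scores]
df = pd.DataFrame(rows)

# Aggregate: mean compound score for each news organization
overall = df.groupby("handle")["compound"].mean()

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(overall.index), overall.values, edgecolor="black")
ax.axhline(0, color="black", linewidth=0.8)  # reference line at neutral

# Label each bar with its value, above positive bars and below negative ones
for bar, value in zip(bars, overall.values):
    ax.text(bar.get_x() + bar.get_width() / 2, value, f"{value:+.2f}",
            ha="center", va="bottom" if value >= 0 else "top")

ax.set_ylim(-1.0, 1.0)
ax.set_title(f"Overall Media Sentiment Based on Twitter ({date.today():%m/%d/%Y})")
ax.set_xlabel("News Organization")
ax.set_ylabel("Mean Compound Score")

fig.savefig("sentiment_bar.png", dpi=150, bbox_inches="tight")
plt.close(fig)
```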



In [None]:
# ----------------------------------------------------------------------
# Step 5: Generate second plot: bar plot showing overall compound 
# sentiment in the last 100 tweets
# ----------------------------------------------------------------------