# Crisis Sentiment Analysis

This activity is a mini-project in which students create a data visualization dashboard. They analyze the sentiment and tone of news articles about the 2008 financial crisis that were published over the last month. Students will retrieve the news articles from the News API; by default, the developer account gives access to news articles up to a month old.

In this activity, students will use their new sentiment analysis skills in combination with some of the skills they have already mastered, such as Pandas, PyViz, Plotly Express, and PyViz Panel.

This Jupyter notebook is a sandbox where students will carry out the sentiment analysis tasks and create the charts before assembling the dashboard.

In [None]:
# Initial imports
import os
from pathlib import Path
import pandas as pd
import numpy as np
import hvplot.pandas
import nltk
from wordcloud import WordCloud
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from newsapi import NewsApiClient
from ibm_watson import ToneAnalyzerV3
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
import plotly.express as px
import matplotlib.pyplot as plt
import matplotlib as mpl
import panel as pn

plt.style.use("seaborn-whitegrid")
pn.extension("plotly")


## Instructions

### Fetching the Latest News Mentions About the Crisis of 2008

Using the News API, get the news articles in English about the financial crisis of 2008 by passing the keywords `"financial AND crisis AND 2008"` in the `q` parameter. Set `page_size=100` to retrieve up to 100 news articles to analyze.
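The request described above could be sketched as follows, assuming the `get_everything()` method of the `newsapi-python` client. The helper name `fetch_crisis_news` is hypothetical; it takes an already-created client so that the API key stays outside the function.

```python
def fetch_crisis_news(client, page_size=100):
    """Request English-language articles matching the crisis keywords.

    `client` is assumed to be a `newsapi.NewsApiClient` instance (or any
    object exposing a compatible `get_everything()` method).
    """
    return client.get_everything(
        q="financial AND crisis AND 2008",
        language="en",
        page_size=page_size,
    )
```

With a real key, this might be called as `crisis_news_en = fetch_crisis_news(NewsApiClient(api_key=news_api))`. The response is a dictionary in which `totalResults` holds the total number of matches and `articles` holds the fetched page of articles.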

In [None]:
# Retrieve the News API key
news_api = os.getenv("news_api")



In [None]:
# Create the newsapi client



In [None]:
# Fetch the news articles about the financial crisis of 2008 in English


# Show the total number of news articles



### Creating a VADER Sentiment Scoring Function

Use the VADER sentiment analyzer from `NLTK` to score the sentiment polarity of the 100 news articles you fetched. Start by downloading the `vader_lexicon`, which is required to initialize the VADER sentiment analyzer.

In [None]:
# Download/Update the VADER Lexicon
nltk.download("vader_lexicon")



In [None]:
# Initialize the VADER sentiment analyzer



To score the VADER sentiment, create a function named `get_sentiment_scores(text, date, source, url)` that receives four parameters:

* `text` is the text whose sentiment will be scored.
* `date` is the date the news article was published, in the format `YYYY-MM-DD`.
* `source` is the name of the news article's source.
* `url` is the URL that points to the article.

The `get_sentiment_scores()` function should return a Python dictionary with the scoring results. This dictionary will be used in the next section to create a DataFrame; its structure is the following:

* `date` is the date passed as a parameter to the function.
* `text` is the text passed as a parameter to the function.
* `source` is the source passed as a parameter to the function.
* `url` is the URL passed as a parameter to the function.
* `compound` is the compound score from the VADER sentiment analyzer.
* `pos` is the positive score from the VADER sentiment analyzer.
* `neu` is the neutral score from the VADER sentiment analyzer.
* `neg` is the negative score from the VADER sentiment analyzer.
* `normalized` is the normalized score based on the `compound` result. Its value should be `1` for positive sentiment, `-1` for negative sentiment, and `0` for neutral sentiment.

This is an example of the function's return value:

```python
{'date': '2019-06-24',
'text': '\nMore than a decade since the global economic meltdown of 2008
    devastated lives across the world, no one who caused the crisis has
    been held responsible.\n\n"The 2008 financial crisis displayed what
    the world now identifies as financial contagion," says Philip J Baker,
    the former managing partner of a US-based \nhedge fund that collapsed
    during the financial crisis.\n\nDespite this, "zero Wall Street chief
    executives have been to prison, even though there is today absolutely
    no doubt that Wall Street executives and politicians \nwere complicit
    in creating the crisis," he says. \n\nBaker was among the few
    relatively smaller players imprisoned for the part they played.\n\n
    In July 2009, he was arrested in  Germany and extradited to the
    United States where he faced federal court on charges of fraud and
    financial crimes.\n\nHe pled guilty and was sentenced to 20 years
    in prison for costing some 900 investors about $294mn worldwide.
    He served eight years in jail and is now on \nparole and advocates
    against financial crime.\n',
'source': 'aljazeera',
'url': 'https://www.aljazeera.com/programmes/specialseries/2019/06/men-stole-world-2008-financial-crisis-190611124411311.html',
'compound': -0.9911,
'pos': 0.048,
'neu': 0.699,
'neg': 0.254,
'normalized': -1}
```
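The dictionary structure and the normalization rule could be sketched like this. The helper names `normalize_compound` and `build_sentiment_record` are hypothetical, and the `0.05` cut-off is an assumption (a commonly used VADER convention), since the activity does not fix a threshold. The scores are passed in as a dictionary so that the sketch runs without downloading the lexicon.

```python
def normalize_compound(compound, threshold=0.05):
    """Map a compound score to 1 (positive), -1 (negative) or 0 (neutral).

    The 0.05 threshold is an assumption, not part of the activity.
    """
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0


def build_sentiment_record(scores, text, date, source, url):
    """Combine VADER scores with the article metadata in one dictionary.

    `scores` is assumed to be the dictionary returned by
    `SentimentIntensityAnalyzer().polarity_scores(text)`, which has the
    keys "compound", "pos", "neu" and "neg".
    """
    return {
        "date": date,
        "text": text,
        "source": source,
        "url": url,
        "compound": scores["compound"],
        "pos": scores["pos"],
        "neu": scores["neu"],
        "neg": scores["neg"],
        "normalized": normalize_compound(scores["compound"]),
    }


# Scores copied by hand from the example return value, used here instead
# of calling the analyzer.
example = build_sentiment_record(
    {"compound": -0.9911, "pos": 0.048, "neu": 0.699, "neg": 0.254},
    text="No one who caused the crisis has been held responsible.",
    date="2019-06-24",
    source="aljazeera",
    url="https://example.com/article",  # placeholder URL
)
print(example["normalized"])
```

Inside `get_sentiment_scores()`, the `scores` argument would come from the initialized analyzer's `polarity_scores(text)` call.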

In [None]:
# Define a function to get the sentiment scores
def get_sentiment_scores(text, date, source, url):
    sentiment_scores = {}

    return sentiment_scores



### Creating the News Articles' Sentiments DataFrame

In this section you have to create a DataFrame that is going to be used to plot the sentiment analysis results. Using a `for-loop`, iterate across all the news articles you fetched to create the DataFrame structure; define an empty list to append the sentiment scoring results for each news article and create the DataFrame using the list as data source.

Once you create the DataFrame do the following:

* Sort the DataFrame rows by the `date` column.
* Define the `date` column as the DataFrame index.
* Save the DataFrame as a CSV file so it can be used when creating the sentiment analysis dashboard.
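The three steps above can be sketched with a small hand-made list. The records, the variable name `news_df`, and the file name `news_vader.csv` are illustrative assumptions, and a temporary directory is used so that the sketch leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical scoring results standing in for the real `sentiments_data` list
sentiments_data = [
    {"date": "2019-06-24", "source": "source-b", "compound": -0.99, "normalized": -1},
    {"date": "2019-06-20", "source": "source-a", "compound": 0.42, "normalized": 1},
]

# Build the DataFrame from the list of dictionaries
news_df = pd.DataFrame(sentiments_data)

# Sort the rows by date, then use the date column as the index
news_df = news_df.sort_values(by="date")
news_df = news_df.set_index("date")

# Save the result as a CSV file (the file name is an assumption)
csv_path = Path(tempfile.mkdtemp()) / "news_vader.csv"
news_df.to_csv(csv_path)
```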

In [None]:
# Empty list to store the DataFrame structure
sentiments_data = []

# Loop through all the news articles
for article in crisis_news_en["articles"]:
    try:
        # Get sentiment scoring using the get_sentiment_scores() function
        
    except AttributeError:
        pass

# Create a DataFrame with the news articles' data and their sentiment scoring results

# Sort the DataFrame rows by date

# Define the date column as the DataFrame's index



In [None]:
# Save the news articles DataFrame with VADER Sentiment scoring as a CSV file



### Creating the Average Sentiment Chart

Use `hvPlot` to create a chart with two lines that compares the average `compound` and `normalized` sentiment scores over the last month.
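One possible way to prepare the data for this chart is to average both scores per day. The sketch below uses a small hand-made DataFrame indexed by `date`; the names `news_df` and `avg_sentiment` are assumptions.

```python
import pandas as pd

# Hypothetical scored articles, indexed by publication date
news_df = pd.DataFrame(
    {
        "date": ["2019-06-20", "2019-06-20", "2019-06-21"],
        "compound": [0.5, -0.1, -0.8],
        "normalized": [1, -1, -1],
    }
).set_index("date")

# Average the two scores for every date in the index
avg_sentiment = news_df[["compound", "normalized"]].groupby(level="date").mean()
print(avg_sentiment)
```

With `hvplot.pandas` imported, `avg_sentiment.hvplot.line()` should then draw one line per column.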

In [None]:
# Define the average sentiment DataFrame



In [None]:
# Create the two-line chart



### Creating the Sentiment Distribution Chart

Based on the `normalized` sentiment score, create a bar chart using `hvPlot` that shows the number of negative, neutral and positive news articles. This chart represents the overall sentiment distribution.
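Counting the articles per sentiment class could look like the sketch below. The sample scores, the label mapping, and the column names `sentiment` and `articles` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical normalized scores for six articles
news_df = pd.DataFrame({"normalized": [1, -1, -1, 0, 1, -1]})
labels = {-1: "Negative", 0: "Neutral", 1: "Positive"}

# Count each class; reindex so all three classes appear in a fixed order
counts = news_df["normalized"].value_counts().reindex([-1, 0, 1], fill_value=0)

sentiment_dist = pd.DataFrame(
    {
        "sentiment": [labels[value] for value in counts.index],
        "articles": counts.values,
    }
)
print(sentiment_dist)
```

With `hvplot.pandas` imported, `sentiment_dist.hvplot.bar(x="sentiment", y="articles")` should produce the bar chart.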

In [None]:
# Define the sentiment distribution DataFrame



In [None]:
# Create the sentiment distribution bar chart



### Getting the Top 10 Positive and Negative News Articles

In this section you have to create two DataFrames: one with the top 10 positive news articles according to the `compound` score, and another with the top 10 negative news articles. Refer to the [`hvplot.table()` documentation](https://hvplot.pyviz.org/user_guide/Plotting.html#Tables) to create two tables presenting the following columns of these news articles:

* Date
* Source
* Text
* URL
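A minimal sketch of the selection step, assuming the DataFrame is indexed by `date` and has lowercase `source`, `text`, `url`, and `compound` columns. The twelve sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical scored articles, indexed by date
news_df = pd.DataFrame(
    {
        "date": [f"2019-06-{day:02d}" for day in range(1, 13)],
        "source": ["source-a", "source-b"] * 6,
        "text": [f"article {i}" for i in range(12)],
        "url": [f"https://example.com/{i}" for i in range(12)],
        "compound": [-0.9, -0.7, -0.5, -0.3, -0.1, 0.0,
                     0.1, 0.3, 0.5, 0.7, 0.9, 0.95],
    }
).set_index("date")

columns = ["date", "source", "text", "url"]

# Ten highest and ten lowest compound scores, with date back as a column
top_positive = news_df.nlargest(10, "compound").reset_index()[columns]
top_negative = news_df.nsmallest(10, "compound").reset_index()[columns]
```

With `hvplot.pandas` imported, each result can then be passed to `hvplot.table()`, for example `top_positive.hvplot.table(columns=columns)`.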

In [None]:
# Getting Top 10 positive news articles



In [None]:
# Create a table with hvplot



In [None]:
# Getting Top 10 negative news articles



In [None]:
# Create a table with hvplot



### Creating the Sentiment Distribution by News Article's Source

In this section, use `hvPlot` to create a bar chart that presents the distribution of negative, neutral and positive news according to the `normalized` score; the results should be grouped by `source`.
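The grouping could be sketched as follows. The sample rows and the label names are assumptions; the result has one row per source and one column per sentiment class.

```python
import pandas as pd

# Hypothetical normalized scores from two sources
news_df = pd.DataFrame(
    {
        "source": ["source-a", "source-a", "source-b", "source-b", "source-b"],
        "normalized": [1, -1, -1, -1, 0],
    }
)

# Count articles per (source, class) pair and spread the classes into columns
by_source = (
    news_df.groupby(["source", "normalized"])
    .size()
    .unstack(fill_value=0)
    .rename(columns={-1: "Negative", 0: "Neutral", 1: "Positive"})
)
print(by_source)
```

With `hvplot.pandas` imported, `by_source.hvplot.bar(stacked=True, rot=90)` should draw one stacked bar per source.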

In [None]:
# Create the sentiment distribution by news articles' source DataFrame



In [None]:
# Create the sentiment distribution by news articles' source bar chart



### Creating the Word Clouds

In this section you will create two word clouds, one using the bag-of-words method and another using TF-IDF.

#### Bag-of-Words' Word Cloud

Use the `CountVectorizer` class from `sklearn` to create a word cloud with the 20 words that have the highest counts. Save the DataFrame with the top 20 words as a CSV file named `top_words_data.csv` for later use when creating the dashboard.

In [None]:
# Creating the CountVectorizer instance defining the stopwords in English to be ignored



In [None]:
# Getting the tokenization and occurrence counting



In [None]:
# Retrieve unique words list

# Get the last 100 words (just as a sample)



In [None]:
# Getting the bag of words as DataFrame


# Sorting words by 'Word_Count' in descending order



In [None]:
# Get top 20 words with the highest counts



In [None]:
# Save the top words DataFrame



In [None]:
# Create a string list of terms to generate the bag-of-words word cloud



In [None]:
# Create the bag-of-words word cloud



#### TF-IDF Word Cloud

Use the `TfidfVectorizer` class from `sklearn` to create a word cloud with the 20 words that have the highest TF-IDF frequency. Save the DataFrame with the top 20 words as a CSV file named `top_wors_tfidf_data.csv` for later use when creating the dashboard.

In [None]:
# Getting the TF-IDF



In [None]:
# Retrieve words list from corpus

# Get the last 100 words (just as a sample)



In [None]:
# Creating a DataFrame Representation of the TF-IDF results

# Sorting words by 'Frequency' in descending order



In [None]:
# Get 20 top words



In [None]:
# Save the top words TF-IDF DataFrame



In [None]:
# Create a string list of terms to generate the tf-idf word cloud



In [None]:
# Create the tf-idf word cloud



## Challenge: Radar Chart with Tone Analysis

In this challenge section, you have to use Plotly Express and IBM Watson Tone Analyzer to create a radar chart presenting the tone of all the news articles that you retrieved.

Refer to the [polar coordinates chart demo](https://plot.ly/python/plotly-express/#polar-coordinates) and the [Plotly Express reference documentation](https://www.plotly.express/plotly_express/#plotly_express.scatter_polar) to learn more about how to create this chart.

In [None]:
# Get the Tone Analyzer API Key and URL
tone_api = os.getenv("tone_api")
tone_url = os.getenv("tone_url")



In [None]:
# Initialize Tone Analyzer Client



To create the radar chart, you need to score the tone of each article and retrieve the `document_tone`. Create a function named `get_tone(text, url)` that receives two parameters and gets the tone score for a particular article.

* `text` is the content of the article.
* `url` is the URL pointing to the article.

The `get_tone()` function will use the `tone()` method from the `ToneAnalyzerV3` class to score the article's tone. Remember that for each document (or text), the `tone()` method of IBM Watson Tone Analyzer [scores one or more overall document tones](https://cloud.ibm.com/apidocs/tone-analyzer#analyze-general-tone-get). You can also get an empty result if no tones were scored. This function should return a dictionary with the first document tone's score, using the following structure:

* `score` is the `score` of the first `tone` from the `document_tone`.
* `tone_id` is the `tone_id` of the first `tone`.
* `tone_name` is the `tone_name` of the first `tone`.
* `text` is the text passed as a parameter.
* `url` is the URL passed as a parameter.

This is an example of the function's return value:

```python
{'score': 0.616581,
'tone_id': 'sadness',
'tone_name': 'Sadness',
'text': '\nMore than a decade since the global economic meltdown of 2008
    devastated lives across the world, no one who caused the crisis has
    been held responsible.\n\n"The 2008 financial crisis displayed what
    the world now identifies as financial contagion," says Philip J Baker,
    the former managing partner of a US-based \nhedge fund that collapsed
    during the financial crisis.\n\nDespite this, "zero Wall Street chief
    executives have been to prison, even though there is today absolutely
    no doubt that Wall Street executives and politicians \nwere complicit
    in creating the crisis," he says. \n\nBaker was among the few
    relatively smaller players imprisoned for the part they played.\n\n
    In July 2009, he was arrested in  Germany and extradited to the
    United States where he faced federal court on charges of fraud and
    financial crimes.\n\nHe pled guilty and was sentenced to 20 years
    in prison for costing some 900 investors about $294mn worldwide.
    He served eight years in jail and is now on \nparole and advocates
    against financial crime.\n',
'url': 'https://www.aljazeera.com/programmes/specialseries/2019/06/men-stole-world-2008-financial-crisis-190611124411311.html'}
```
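Extracting the first document tone could be sketched as below. The helper name `first_document_tone` is hypothetical, and `result` is assumed to be the dictionary obtained from the analyzer's response (for example through `get_result()`). The sample payload is hand-written to match the structure described above; it is not real service output.

```python
def first_document_tone(result, text, url):
    """Return the first document tone as a flat dictionary, or None.

    `result` is assumed to look like
    {"document_tone": {"tones": [{"score": ..., "tone_id": ..., "tone_name": ...}]}}.
    """
    tones = result.get("document_tone", {}).get("tones", [])
    if not tones:
        # No tone was scored for this document
        return None

    first = tones[0]
    return {
        "score": first["score"],
        "tone_id": first["tone_id"],
        "tone_name": first["tone_name"],
        "text": text,
        "url": url,
    }


# Hand-written payload mirroring the example return value
sample_result = {
    "document_tone": {
        "tones": [{"score": 0.616581, "tone_id": "sadness", "tone_name": "Sadness"}]
    }
}
record = first_document_tone(sample_result, "sample text", "https://example.com/article")
print(record["tone_name"])
```

Returning `None` for an empty result lets the calling loop skip articles without a scored tone instead of relying on an exception.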

In [None]:
# Create a function to analyze the text's tone with the 'tone()' method of IBM Watson Tone Analyzer.
def get_tone(text, url):
    try:
        
    except:
        pass



Create a DataFrame with the tone scoring from all the news articles. Use an empty list to create the DataFrame's structure and a `for-loop` to iterate across all the news articles to score their tone using the `get_tone()` function.

In [None]:
# Create an empty list to create the DataFrame's structure

# Iterate across all the news articles to score their tone.
print(f"Analyzing tone from {crisis_news_df.shape[0]} articles...")
for index, row in crisis_news_df.iterrows():
    try:
        print("*", end="")
        # Get news article's tone
        
    except:
        pass
print("\nDone :-)")



In [None]:
# Create the DataFrame containing the news articles and their tone scoring results.



Save the DataFrame as a CSV file named `tone_data.csv` for later use when creating the dashboard.

Create a radar chart using the `scatter_polar()` method from Plotly Express as follows:

* Use the `score` column for the `r` and `color` parameters.
* Use the `tone_name` column for the `theta` parameter.
* Use the `url` column for the `hover_data` parameter.
* Define a `title` for the chart.

In [None]:
# Create the radar chart

