# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
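# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer
%matplotlib inline

# Load environment variables from a local .env file, if one exists
load_dotenv()

# VADER analyzer used for all sentiment scoring below
analyzer = SentimentIntensityAnalyzer()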

In [2]:
# Retrieve the News API key
api_key = os.getenv("news_api")

In [3]:
# Create the newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
bitcoin = newsapi.get_everything(q="bitcoin", language="en", sort_by="relevancy")
bitcoin.keys()

dict_keys(['status', 'totalResults', 'articles'])
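
The `articles` value is a list of dictionaries, one per article. The sketch below uses a made-up response with the same three top-level keys to show how article text could be pulled out while skipping entries that have no content. The sample articles, their dates, and the helper name `extract_contents` are illustrative assumptions, not real News API output.

```python
# Hypothetical response shaped like the dict returned above (illustrative values only)
sample_response = {
    "status": "ok",
    "totalResults": 3,
    "articles": [
        {"publishedAt": "2020-10-21T12:00:00Z", "content": "First example article."},
        {"publishedAt": "2020-10-22T08:30:00Z", "content": None},
        {"publishedAt": "2020-10-23T17:45:00Z", "content": "Second example article."},
    ],
}


def extract_contents(response):
    """Return the text of every article that actually has content."""
    return [a["content"] for a in response["articles"] if a.get("content")]


contents = extract_contents(sample_response)
print(len(contents), "of", len(sample_response["articles"]), "articles have content")
```

Filtering out empty content up front is one way to avoid passing `None` to the sentiment analyzer later on.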

In [5]:
# Fetch the Ethereum news articles
etheruem = newsapi.get_everything(q="ethereum", language="en", sort_by="relevancy")
etheruem.keys()

dict_keys(['status', 'totalResults', 'articles'])

In [6]:
def sentiment_analizyer(articles_list):
    """Return a DataFrame of VADER sentiment scores, one row per article."""
    sentiments = []
    for article in articles_list:
        try:
            text = article["content"]
            scores = analyzer.polarity_scores(text)
            sentiments.append({
                "text": text,
                "positive": scores["pos"],
                "negative": scores["neg"],
                "neutral": scores["neu"],
                "compound": scores["compound"],
            })
        except AttributeError:
            # Skip articles whose content is missing (None)
            continue
    # Put the score columns first and the article text last
    cols = ["compound", "negative", "neutral", "positive", "text"]
    return pd.DataFrame(sentiments, columns=cols)

In [7]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_df = sentiment_analizyer(bitcoin["articles"])
bitcoin_df.head()

| | compound | negative | neutral | positive | text |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | PayPal has partnered with cryptocurrency compa... |
| 1 | 0.2263 | 0.0 | 0.951 | 0.049 | Two days ago, about $1 billion worth of bitcoi... |
| 2 | 0.6808 | 0.0 | 0.838 | 0.162 | PayPal is rolling out cryptocurrency support l... |
| 3 | -0.7184 | 0.154 | 0.846 | 0.0 | The Financial Crimes Enforcement Network (FinC... |
| 4 | 0.25 | 0.0 | 0.941 | 0.059 | 2018’s jokes are 2020’s reality. I’m speaking,... |


In [8]:
# Create the ethereum sentiment scores DataFrame
etheruem_df = sentiment_analizyer(etheruem["articles"])
etheruem_df.head()

| | compound | negative | neutral | positive | text |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | PayPal has partnered with cryptocurrency compa... |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | Breitling is partnering with Arianee to issue ... |
| 2 | 0.6808 | 0.0 | 0.838 | 0.162 | PayPal is rolling out cryptocurrency support l... |
| 3 | -0.4215 | 0.132 | 0.783 | 0.085 | Portions of the global economy melted down in ... |
| 4 | 0.0 | 0.0 | 1.0 | 0.0 | FILE PHOTO: A worker pushing a trolley walks w... |


In [9]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

| | compound | negative | neutral | positive |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.01132 | 0.04995 | 0.8982 | 0.0518 |
| std | 0.46116 | 0.078992 | 0.092502 | 0.069552 |
| min | -0.7184 | 0.0 | 0.716 | 0.0 |
| 25% | -0.2789 | 0.0 | 0.82675 | 0.0 |
| 50% | 0.0 | 0.0 | 0.9285 | 0.0205 |
| 75% | 0.2558 | 0.07725 | 1.0 | 0.07175 |
| max | 0.8225 | 0.215 | 1.0 | 0.229 |


In [10]:
# Describe the Ethereum Sentiment
etheruem_df.describe()

| | compound | negative | neutral | positive |
|---|---|---|---|---|
| count | 16.0 | 16.0 | 16.0 | 16.0 |
| mean | 0.168438 | 0.016188 | 0.922813 | 0.061062 |
| std | 0.366023 | 0.037754 | 0.09189 | 0.08394 |
| min | -0.4215 | 0.0 | 0.703 | 0.0 |
| 25% | 0.0 | 0.0 | 0.8615 | 0.0 |
| 50% | 0.0 | 0.0 | 0.942 | 0.023 |
| 75% | 0.416125 | 0.0 | 1.0 | 0.0835 |
| max | 0.9468 | 0.132 | 1.0 | 0.297 |


### Questions:

Q: Which coin had the highest mean positive score?

A: **Ethereum, with a mean positive score of 0.061 versus 0.052 for Bitcoin, a difference of about 0.009.**

Q: Which coin had the highest compound score?

A: **Ethereum, with a maximum compound score of 0.9468 versus 0.8225 for Bitcoin.**

Q: Which coin had the highest positive score?

A: **Ethereum, with a maximum positive score of 0.297 versus 0.229 for Bitcoin.**
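
The same comparisons can be made programmatically rather than by reading the tables. The sketch below copies the three relevant values for each coin from the `describe()` outputs above into a plain dictionary and picks the coin with the larger value per metric. The names `summary` and `winners` are illustrative assumptions.

```python
# Values copied from the describe() outputs above
summary = {
    "bitcoin": {"mean_positive": 0.0518, "max_compound": 0.8225, "max_positive": 0.229},
    "ethereum": {"mean_positive": 0.061062, "max_compound": 0.9468, "max_positive": 0.297},
}

# For each metric, find the coin with the larger value
metrics = ["mean_positive", "max_compound", "max_positive"]
winners = {
    metric: max(summary, key=lambda coin: summary[coin][metric])
    for metric in metrics
}

for metric, coin in winners.items():
    print(f"{metric}: {coin} ({summary[coin][metric]})")
```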

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords
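
The three steps can be sketched on a single sentence using only the standard library. This is a simplified stand-in, not the NLTK approach used in the cells below: it splits on whitespace instead of calling `word_tokenize`, and `DEMO_STOPWORDS`, `simple_tokens`, and the sample sentence are illustrative assumptions.

```python
from string import punctuation

# Assumption: a tiny illustrative stopword list; the notebook uses NLTK's English list
DEMO_STOPWORDS = {"the", "is", "a", "of", "to", "and", "in", "with"}


def simple_tokens(text, stopwords=DEMO_STOPWORDS):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    lowered = text.lower()                                             # 1. lowercase
    no_punct = lowered.translate(str.maketrans("", "", punctuation))   # 2. remove punctuation
    return [word for word in no_punct.split() if word not in stopwords]  # 3. remove stopwords


sample = "PayPal is rolling out cryptocurrency support, with Bitcoin and Ethereum."
demo_tokens = simple_tokens(sample)
print(demo_tokens)
```

Lowercasing before the stopword check matters here: a capitalized "The" would otherwise slip past a lowercase stopword list.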

In [11]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re
import nltk
from nltk.tokenize import RegexpTokenizer

In [12]:
# Expand the default stopwords list with punctuation tokens
sw_1 = set(stopwords.words('english'))
sw_2 = {",", ".", "[", "]", "?", "(", ")", "-", "$"}
sw = sw_1.union(sw_2)

In [35]:
sentence_tokenized = [sent_tokenize(i) for i in bitcoin_df["text"]]
# --------------------------------------------------------------------------------------------------    
word_tokenized = []
for story in sentence_tokenized:
    words = []
    for sent in story:
        words = words + word_tokenize(sent)
    word_tokenized.append(words)
# -----------------------------------------------------------------------------------------------    
lower = []
for i in word_tokenized:
    words = []
    for c in i:
        if c not in sw:
            words.append(c.lower())
    lower.append(words)
# -----------------------------------------------------------------------------------------------    
re_sw = []
for i in lower:
    sw_re =[]
    for c in i:
        if c not in sw:
            sw_re.append(c)
    re_sw.append(sw_re)
tokens = re_sw 

In [37]:
len(tokens)

20

In [13]:
from nltk.stem import WordNetLemmatizer

In [14]:
# Lemmatize each token to its root form, one list of lemmas per article
lemmatizer = WordNetLemmatizer()
lemmatized = [[lemmatizer.lemmatize(word) for word in story] for story in re_sw]

In [40]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenize an iterable of documents into lists of lowercase, stopword-free tokens."""
    tokens = []
    for doc in text:
        # Split the document into sentences, then each sentence into words
        words = [word for sent in sent_tokenize(doc) for word in word_tokenize(sent)]
        # Lowercase each word, then drop stopwords and the punctuation tokens in `sw`
        tokens.append([word.lower() for word in words if word.lower() not in sw])
    return tokens


In [41]:
len(tokenizer(bitcoin_df["text"]))

20

In [None]:
# Create a new tokens column for bitcoin
bitcoin_df["tokens"] = tokenizer(bitcoin_df["text"])

In [None]:
# Create a new tokens column for ethereum
etheruem_df["tokens"] = tokenizer(etheruem_df["text"])

---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
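
As a rough illustration of step 1, bigrams are just consecutive token pairs, so they can be produced with `zip` and counted with `Counter`. The sketch below uses that standard-library approach in place of `nltk.ngrams`; the helper name `bigram_counts` and the demo tokens are illustrative assumptions.

```python
from collections import Counter


def bigram_counts(token_lists):
    """Count consecutive token pairs, keeping each article's pairs separate."""
    counts = Counter()
    for tokens in token_lists:
        # zip(tokens, tokens[1:]) pairs every token with its successor
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up tokens standing in for two tokenized articles
demo_token_lists = [
    ["bitcoin", "price", "rises"],
    ["bitcoin", "price", "falls"],
]
demo_bigrams = bigram_counts(demo_token_lists)
print(demo_bigrams.most_common(1))
```

Counting per article avoids creating a spurious bigram from the last word of one article and the first word of the next.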

In [None]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)
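
`Counter` needs a flat sequence of hashable items, so a list of per-article token lists has to be flattened before it is passed to `token_count`. A minimal sketch, assuming made-up tokens; `token_count` is repeated here only so the block runs on its own.

```python
from collections import Counter
from itertools import chain


def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)


# Made-up tokens standing in for two tokenized articles
demo_token_lists = [
    ["bitcoin", "paypal", "bitcoin"],
    ["paypal", "bitcoin", "crypto"],
]

# Flatten the list of lists into one list of words before counting
flat_tokens = list(chain.from_iterable(demo_token_lists))
top_two = token_count(flat_tokens, N=2)
print(top_two)
```

Passing the nested list directly would raise a `TypeError`, because lists are not hashable.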

In [None]:
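# Build a word-frequency table from the Bitcoin tokens (`tokens` from the tokenizer cell above)
word_counts = Counter(word for story in tokens for word in story)
money_news_df = pd.DataFrame(word_counts.most_common(), columns=["Word", "Frequency"])
money_news_df = money_news_df.sort_values(by=["Frequency"], ascending=False)
# Keep the words that appear between 10 and 30 times
top_words = money_news_df[(money_news_df["Frequency"] >= 10) & (money_news_df["Frequency"] <= 30)]
top_words.head(10)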

In [None]:
# Get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Get the top 10 words for Ethereum
# YOUR CODE HERE!

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news.

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Join the terms into one space-separated string to generate the word cloud
terms_list = " ".join(top_words["Word"].tolist())

# Create the word cloud
wordcloud = WordCloud(colormap="RdYlBu").generate(terms_list)
plt.imshow(wordcloud)
plt.axis("off")
fontdict = {"fontsize": 20, "fontweight": "bold"}
plt.title("Money News Word Cloud", fontdict=fontdict)
plt.show()


In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

# Named Entity Recognition

In this section, you will run named entity recognition on the news for both coins and visualize the tags using spaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Optional - download a language model for spaCy
!python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [None]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [None]:
# Concatenate all of the ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!