# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?
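
As a rough sketch of how descriptive statistics can answer these questions, the snippet below compares two small score tables with `describe()`. The numbers are made-up placeholders, not real News API results, and the names `bitcoin_scores`, `ethereum_scores`, and `higher` are illustrative assumptions. Only the column names match the sentiment DataFrame built in this notebook.

```python
import pandas as pd

# Made-up scores for illustration only; real values come from the fetched articles.
bitcoin_scores = pd.DataFrame({
    "Compound": [0.50, -0.20, 0.10],
    "Positive": [0.20, 0.00, 0.10],
    "Negative": [0.00, 0.15, 0.05],
})
ethereum_scores = pd.DataFrame({
    "Compound": [0.30, 0.00, -0.40],
    "Positive": [0.12, 0.06, 0.00],
    "Negative": [0.00, 0.00, 0.25],
})

# describe() returns count, mean, std, min, quartiles, and max for each column.
btc_stats = bitcoin_scores.describe()
eth_stats = ethereum_scores.describe()

def higher(stat, column):
    """Name the coin with the larger value of `stat` for `column`."""
    values = pd.Series({
        "bitcoin": btc_stats.loc[stat, column],
        "ethereum": eth_stats.loc[stat, column],
    })
    return values.idxmax()

print("Highest mean positive score:", higher("mean", "Positive"))
print("Highest negative score:", higher("max", "Negative"))
print("Highest positive score:", higher("max", "Positive"))
```

Reading the `mean` and `max` rows of each `describe()` table by eye gives the same answers; the helper only makes the comparison explicit.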

In [52]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from datetime import datetime, timedelta
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from newsapi.newsapi_client import NewsApiClient
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

In [53]:
# Load environment variables (including the News API key) from the .env file
load_dotenv()


In [54]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=os.environ["NEWSAPI"])


In [64]:
# Wrap get_everything in a helper so the same query settings are reused for each keyword
def get_articles(keyword):
    """Fetch the first page of English-language articles matching the keyword."""
    articles = newsapi.get_everything(
        q=keyword,
        language="en",
        sort_by="relevancy",
        page=1,
    )
    return articles

In [77]:
# Fetch the Bitcoin news articles
bitcoin_headlines = get_articles("bitcoin")


In [68]:
# Fetch the Ethereum news articles
ethereum_headlines = get_articles('ethereum')

In [91]:
def sentiment_analyzer(headlines):
    """Return a DataFrame of VADER sentiment scores, one row per article."""
    sentiments = []
    for article in headlines['articles']:
        try:
            sentiment = analyzer.polarity_scores(article['content'])
            sentiments.append({
                "Text": article['content'],
                "Compound": sentiment['compound'],
                "Positive": sentiment['pos'],
                "Negative": sentiment['neg'],
                "Neutral": sentiment['neu']
            })
        except AttributeError:
            # Skip articles whose content is missing (None)
            pass
    # Build the DataFrame once, after every article has been scored
    df = pd.DataFrame(sentiments, columns=["Compound", "Negative", "Neutral", "Positive", "Text"])
    return df

In [92]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_df = sentiment_analyzer(bitcoin_headlines)
bitcoin_df.head()


In [None]:
# Create the Ethereum sentiment scores DataFrame


In [None]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!

In [None]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!

### Questions:

Q: Which coin had the highest mean positive score?

A: 

Q: Which coin had the highest compound score?

A: 

Q: Which coin had the highest positive score?

A: 

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove punctuation
3. Remove stopwords
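
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's finished tokenizer: the function name `clean_tokens` and the tiny `DEMO_STOPWORDS` set are assumptions made for the example. In the notebook you would more likely pass `set(stopwords.words("english"))` from `nltk.corpus`, which requires the NLTK stopwords corpus to be downloaded. Lemmatization is left out here because `WordNetLemmatizer` needs the WordNet corpus.

```python
import re
from string import punctuation

# Small placeholder stopword set (an assumption for this sketch).
DEMO_STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}

def clean_tokens(text, stop_words=DEMO_STOPWORDS):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    # 1. Lowercase each word
    lowered = text.lower()
    # 2. Replace every punctuation character with a space so that
    #    words joined by punctuation ("rising,and") are still separated
    no_punct = re.sub(f"[{re.escape(punctuation)}]", " ", lowered)
    # Split on whitespace to get the candidate words
    words = no_punct.split()
    # 3. Remove stopwords
    return [word for word in words if word not in stop_words]

print(clean_tokens("The price of Bitcoin is rising, and Ethereum follows!"))
# ['price', 'bitcoin', 'rising', 'ethereum', 'follows']
```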

In [None]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [None]:
# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [None]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Create a list of the words

    # Convert the words to lowercase
    
    # Remove the punctuation
    
    # Remove the stop words
    
    # Lemmatize Words into root words
    
    return tokens


In [None]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [None]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
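
To make the two steps concrete, here is a small sketch on a hypothetical token list (the `tokens` values are invented for illustration). It builds bigrams with `zip` as a standard-library stand-in; NLTK's `ngrams(tokens, n=2)` yields the same tuples. `Counter.most_common` then ranks both the bigrams and the single words.

```python
from collections import Counter

# Hypothetical token list standing in for one coin's cleaned tokens.
tokens = ["bitcoin", "price", "rises", "bitcoin", "price", "falls", "bitcoin"]

# Pair each token with its successor to form the bigrams (N = 2).
bigrams = list(zip(tokens, tokens[1:]))

# Count how often each bigram and each single word occurs.
bigram_counts = Counter(bigrams)
word_counts = Counter(tokens)

print(bigram_counts.most_common(1))  # [(('bitcoin', 'price'), 2)]
print(word_counts.most_common(2))    # [('bitcoin', 3), ('price', 2)]
```

With real article text, replace the `1` and `2` with `10` to get the top 10 entries.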

In [None]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Get the top 10 words for Ethereum
# YOUR CODE HERE!

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
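# The seaborn styles were renamed in matplotlib 3.6; on older versions use 'seaborn-whitegrid'
plt.style.use('seaborn-v0_8-whitegrid')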
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

# Named Entity Recognition

In this section, you will run named entity recognition on the text for both coins and visualize the tags using spaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Optional - download a language model for spaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!