# News Headlines Sentiment

Use the news api to pull the latest news articles for bitcoin and ethereum and create a DataFrame of sentiment scores for each coin. 

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from datetime import datetime, timedelta
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from newsapi.newsapi_client import NewsApiClient
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

In [2]:
# Read your api key environment variable
load_dotenv()
# initiate SentimentIntensityAnalyzer
sent_analyzer = SentimentIntensityAnalyzer()


In [3]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=os.environ["NEWSAPI"])


In [37]:
# kp: to avoid repeating code create function to get_everything for fetching news articles by keyword 
def get_headlines(keyword):
    """
    Utilizes newsapi.get_everything() to pull today's headlines and return a list of headlines.
    """
    all_headlines = []
    date = datetime.today()
    articles = newsapi.get_everything(
        q=keyword,
        from_param=date,
        language="en",
        sort_by="relevancy",
        page=1)
    headlines = []
    for i in range(0, len(articles["articles"])):
        headlines.append(articles["articles"][i]["title"])
    all_headlines.append(headlines)
    return all_headlines

In [61]:
def test_all(keyword):
    """
    pull headlines from keyword (ex: 'bitcoin')
    analyzes sentiment
    creates dataframe with sentiments listed & headline
    """
    df_columns = ['Compound', 'Negative', 'Neutral', 'Positive', 'Text']
    date = datetime.today()
    articles = newsapi.get_everything(
        q=keyword,
        from_param=date,
        language="en",
        sort_by="relevancy",
        page=1)
    headlines = [article['title'] for article in articles['articles']]
    #figure out how to name a dataframe with keyword 
    keyword_df = pd.DataFrame(columns=df_columns)
    keyword_df['Text'] = pd.Series(headlines)
    """for text in keyword_df:
        for index, row in text.iterrows():
            scores = analyzer.polarity_scores(row.Text)
            row.Compound = scores['compound']
            row.Negative = scores['neg']
            row.Neutral = scores['neu']
            row.Positive = scores['pos']
    for text in keyword_df:
        numeric_col_labels = text.columns.drop('Text')
        text[numeric_col_labels] = text[numberic_col_labels].apply(pd.to_numeric, errors='coerce')"""
    return keyword_df

In [62]:
test_all('bitcoin')

Unnamed: 0,Compound,Negative,Neutral,Positive,Text
0,,,,,Crypto fans rejoice: Bitcoin rallies to the br...
1,,,,,Closing Time for Bitcoin’s Iconic Room 77 – ‘A...
2,,,,,Video: Fresh Competition Is Shaping The Bitcoi...
3,,,,,FinCEN Fines Bitcoin-Mixing CEO $60M in Landma...
4,,,,,US Alleges Top Russian Cyber Hackers Tried to ...
5,,,,,Market Wrap: Bitcoin Bounces to $11.8K; Over 1...
6,,,,,Market Wrap: Bitcoin Bounces to $11.8K; Over 1...
7,,,,,3iQ’s The Bitcoin Fund Offers Trading Denomina...
8,,,,,Bitcoin price holds strong amid negative news ...
9,,,,,"Institutional Bitcoin longs at record-high, he..."


In [38]:
def sentiment_analyzer(headlines):
    """
    Takes headlines from get_articles(keyword), analyzes sentiment, and returns
    DataFrame with sentiments listed by [Compound, Negative, Neutral, Postive, Headline]
    """
    sentiments = []
    for article in headlines['articles']:
        try:
            sentiment = analyzer.polarity_scores(article['title'])
            sentiments.append({
                "Text": article['title'],
                "Compound": sentiment['compound'],
                "Positive": sentiment['pos'],
                "Negative": sentiment['neg'],
                "Neutral": sentiment['neu']
            })
        except AttributeError:
            pass
        df = pd.DataFrame(sentiments, columns = ["Compound", "Negative", "Neutral", "Positive", ""])
        return df

In [39]:
# kp: to avoid repeating code create function to get_everything for fetching news articles based on keyword
def get_articles(keyword):
    """
    Utilizes newsapi.get_everything() to pull today's headlines and return all information
    """
    articles = newsapi.get_everything(
        q=keyword,
        language="en",
        sort_by="relevancy",
        page=1,
        )
    return articles

In [41]:
# Fetch the Bitcoin news articles
bitcoin_headlines = get_headlines('bitcoin')
bitcoin_headlines

[['Bitcoin Approaches $12,000 After Snapping Equities Correlation',
  'Crypto fans rejoice: Bitcoin rallies to the brink of $12,000',
  'Closing Time for Bitcoin’s Iconic Room 77 – ‘And That’s OK,’ Says Owner',
  'Video: Fresh Competition Is Shaping The Bitcoin Mining Hardware And Pool Landscape',
  'Introducing CBPI: A New Way To Measure Bitcoin Network Electrical Consumption',
  'FinCEN Fines Bitcoin-Mixing CEO $60M in Landmark Crackdown on Helix, Coin Ninja',
  'US Alleges Top Russian Cyber Hackers Tried to Cover Digital Tracks With Bitcoin',
  'Market Wrap: Bitcoin Bounces to $11.8K; Over 10K BTC Locked in Harvest Finance',
  'Market Wrap: Bitcoin Bounces to $11.8K; Over 10K BTC Locked in Harvest Finance',
  '3iQ’s The Bitcoin Fund Offers Trading Denominated in Canadian Dollars',
  'Bitcoin price holds strong amid negative news blitz, says CoinShares report',
  'Institutional Bitcoin longs at record-high, hedge funds short — CME data',
  'Price analysis 10/19: BTC, ETH, XRP, BCH, B

In [36]:
# Fetch the Ethereum news articles
ethereum_headlines = get_articles('ethereum')

In [26]:
# Create the Bitcoin sentiment scores DataFrame
sentiment_analyzer(bitcoin_headlines)

Unnamed: 0,Compound,Negative,Neutral,Positive,Text
0,0.0,0.0,1.0,0.0,'One day everyone will use China's digital cur...


In [None]:
# Create the ethereum sentiment scores DataFrame


In [None]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!

In [None]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!

### Questions:

Q: Which coin had the highest mean positive score?

A: 

Q: Which coin had the highest compound score?

A: 

Q. Which coin had the highest positive score?

A: 

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords

In [None]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [None]:
# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [None]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Create a list of the words

    # Convert the words to lowercase
    
    # Remove the punctuation
    
    # Remove the stop words
    
    # Lemmatize Words into root words
    
    return tokens


In [None]:
# Create a new tokens column for bitcoin
# YOUR CODE HERE!

In [None]:
# Create a new tokens column for ethereum
# YOUR CODE HERE!

---

# NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [None]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Get the top 10 words for Ethereum
# YOUR CODE HERE!

# Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

# Named Entity Recognition

In this section, you will build a named entity recognition model for both coins and visualize the tags using SpaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Optional - download a language model for SpaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [None]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [None]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!