# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER sentiment analyzer (needs the NLTK 'vader_lexicon' data)
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline
load_dotenv()

True

In [2]:
# Read your api key environment variable
# YOUR CODE HERE!
api_key = os.getenv("NEWS_API")

In [3]:
# Create a newsapi client
# YOUR CODE HERE!
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
# YOUR CODE HERE!
Bitcoin_headlines = newsapi.get_everything(
    q="Bitcoin",
    language="en",
    sort_by="relevancy",
)
print(f"total articles about Bitcoin: {Bitcoin_headlines['totalResults']}")
Bitcoin_headlines["articles"][0]

total articles about Bitcoin: 4600


{'source': {'id': None, 'name': 'Lifehacker.com'},
 'author': 'Mike Winters on Two Cents, shared by Mike Winters to Lifehacker',
 'title': 'Is the New Visa Bitcoin Rewards Card Worth It?',
 'description': 'Visa\xa0has partnered with cryptocurrency startup BlockFi to offer the first rewards credit card that pays out in Bitcoin rather than cash, but is it worth applying for? Unless you’re extremely bullish on cryptocurrency and don’t mind getting seriously dinged fo…',
 'url': 'https://twocents.lifehacker.com/is-the-new-visa-bitcoin-rewards-card-worth-it-1845803159',
 'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/a2650t4nr8r2uyujbnfu.png',
 'publishedAt': '2020-12-03T22:00:00Z',
 'content': 'Visa\xa0has partnered with cryptocurrency startup BlockFi to offer the first rewards credit card that pays out in Bitcoin rather than cash, but is it worth applying for? Unless youre extremely bullish o… [+2239 chars]'}
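`totalResults` counts every match (4600 here), but a single `get_everything` call returns only one page of articles, which is why the sentiment DataFrames built below hold just 19 and 20 rows. A minimal sketch of paging through more results, assuming the client accepts the `page` and `page_size` keyword arguments as `newsapi-python`'s `NewsApiClient.get_everything` does; `fetch_articles` is a hypothetical helper, not part of the assignment, and your NewsAPI plan may cap how deep you can page.

```python
from typing import Any, Dict, List


def fetch_articles(client: Any, query: str, max_pages: int = 1,
                   page_size: int = 100) -> List[Dict[str, Any]]:
    """Collect up to max_pages pages of English results for one query.

    `client` is assumed to expose get_everything(...) like newsapi.NewsApiClient.
    """
    articles: List[Dict[str, Any]] = []
    for page in range(1, max_pages + 1):
        response = client.get_everything(
            q=query,
            language="en",
            sort_by="relevancy",
            page=page,
            page_size=page_size,
        )
        batch = response.get("articles", [])
        articles.extend(batch)
        # A short (or empty) page means there are no further results to request
        if len(batch) < page_size:
            break
    return articles
```

In this notebook it would be called as `fetch_articles(newsapi, "Bitcoin", max_pages=2)`, and the returned list can replace `Bitcoin_headlines["articles"]` in the loops below.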

In [5]:
# Fetch the Ethereum news articles
# YOUR CODE HERE!
Ethereum_headlines = newsapi.get_everything(
    q="Ethereum",
    language="en",
    sort_by="relevancy",
)
print(f"total articles about Ethereum: {Ethereum_headlines['totalResults']}")
Ethereum_headlines["articles"][0]

total articles about Ethereum: 1243


{'source': {'id': 'reuters', 'name': 'Reuters'},
 'author': 'Tom Wilson',
 'title': 'Smaller digital coins soar as bitcoin powers on towards record high - Reuters UK',
 'description': 'Digital currencies Ethereum and XRP soared on Monday, gaining momentum as bitcoin powered on towards its all-time high.',
 'url': 'https://in.reuters.com/article/us-crypto-currencies-idUKKBN2831RI',
 'urlToImage': 'https://static.reuters.com/resources/r/?m=02&d=20201123&t=2&i=1542157677&r=LYNXMPEGAM0XD&w=800',
 'publishedAt': '2020-11-23T14:16:00Z',
 'content': 'FILE PHOTO: Representation of the Ethereum virtual currency standing on the PC motherboard is seen in this illustration picture, February 3, 2018. REUTERS/Dado Ruvic/Illustration\r\nLONDON (Reuters) - … [+1237 chars]'}

In [6]:
# Create the Bitcoin sentiment scores DataFrame
# YOUR CODE HERE!
bitcoin_sentiments = []

for article in Bitcoin_headlines["articles"]:
    try:
        sentiment = analyzer.polarity_scores(article["content"])
        bitcoin_sentiments.append({
            "Date": article["publishedAt"][:10],
            "Text": article["content"],
            "Compound": sentiment["compound"],
            "Positive": sentiment["pos"],
            "Negative": sentiment["neg"],
            "Neutral": sentiment["neu"],
        })
    except AttributeError:
        # Skip articles whose content is missing (None), which VADER cannot score
        pass

# Create DataFrame
btc_sent_df = pd.DataFrame(bitcoin_sentiments)

# Reorder DataFrame columns
cols = ["Date", "Compound", "Negative", "Neutral", "Positive", "Text"]
btc_sent_df = btc_sent_df[cols]

btc_sent_df.head()

|   | Date | Compound | Negative | Neutral | Positive | Text |
|---|------|----------|----------|---------|----------|------|
| 0 | 2020-12-03 | 0.6369 | 0.0 | 0.838 | 0.162 | Visa has partnered with cryptocurrency startup... |
| 1 | 2020-11-20 | 0.2023 | 0.0 | 0.95 | 0.05 | In November 2017, after an absolutely massive,... |
| 2 | 2020-12-06 | 0.0 | 0.0 | 1.0 | 0.0 | Unlike ‘conventional’ cryptocurrencies, a cent... |
| 3 | 2020-11-25 | 0.4404 | 0.075 | 0.773 | 0.152 | If youve been watching the crypto markets over... |
| 4 | 2020-12-09 | 0.0 | 0.0 | 1.0 | 0.0 | Six years after the launch of the Mexico-based... |


In [7]:
# Create the Ethereum sentiment scores DataFrame

ethereum_sentiments = []

for article in Ethereum_headlines["articles"]:
    try:
        sentiment = analyzer.polarity_scores(article["content"])
        ethereum_sentiments.append({
            "Date": article["publishedAt"][:10],
            "Text": article["content"],
            "Compound": sentiment["compound"],
            "Positive": sentiment["pos"],
            "Negative": sentiment["neg"],
            "Neutral": sentiment["neu"],
        })
    except AttributeError:
        # Skip articles whose content is missing (None), which VADER cannot score
        pass

# Create DataFrame
eth_sent_df = pd.DataFrame(ethereum_sentiments)

# Reorder DataFrame columns
cols = ["Date", "Compound", "Negative", "Neutral", "Positive", "Text"]
eth_sent_df = eth_sent_df[cols]

eth_sent_df.head()

|   | Date | Compound | Negative | Neutral | Positive | Text |
|---|------|----------|----------|---------|----------|------|
| 0 | 2020-11-23 | 0.0 | 0.0 | 1.0 | 0.0 | FILE PHOTO: Representation of the Ethereum vir... |
| 1 | 2020-11-23 | 0.0 | 0.0 | 1.0 | 0.0 | FILE PHOTO: Representation of the Ethereum vir... |
| 2 | 2020-11-23 | 0.4215 | 0.0 | 0.912 | 0.088 | LONDON (Reuters) - Digital currencies Ethereum... |
| 3 | 2020-12-07 | 0.1779 | 0.0 | 0.948 | 0.052 | NEW YORK (Reuters) - Institutional investors p... |
| 4 | 2020-12-07 | 0.1779 | 0.0 | 0.948 | 0.052 | NEW YORK (Reuters) - Institutional investors p... |
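The Bitcoin and Ethereum cells repeat the same loop, and the Ethereum preview appears to contain the same Reuters story twice (rows 0 and 1, and rows 3 and 4, have identical previews and scores). Repeats like these are counted twice in the means and word frequencies that follow. A sketch of a shared helper that skips missing content and exact-duplicate texts, assuming duplicates should be dropped; `score_articles` is a hypothetical name, and `score` stands in for `analyzer.polarity_scores` so the helper does not depend on the VADER lexicon being installed.

```python
from typing import Any, Callable, Dict, Iterable, List


def score_articles(articles: Iterable[Dict[str, Any]],
                   score: Callable[[str], Dict[str, float]]) -> List[Dict[str, Any]]:
    """Build one sentiment row per unique, non-empty article text."""
    rows: List[Dict[str, Any]] = []
    seen = set()
    for article in articles:
        text = article.get("content")
        # Skip articles with no content and exact repeats of an earlier text
        if not text or text in seen:
            continue
        seen.add(text)
        sentiment = score(text)
        # Keys are inserted in the notebook's column order
        rows.append({
            "Date": article["publishedAt"][:10],
            "Compound": sentiment["compound"],
            "Negative": sentiment["neg"],
            "Neutral": sentiment["neu"],
            "Positive": sentiment["pos"],
            "Text": text,
        })
    return rows
```

With it, each DataFrame becomes a one-liner such as `eth_sent_df = pd.DataFrame(score_articles(Ethereum_headlines["articles"], analyzer.polarity_scores))`. On an existing DataFrame, `eth_sent_df.drop_duplicates(subset="Text")` has a similar effect.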


In [8]:
# Describe the Bitcoin Sentiment
btc_sent_df.describe()

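|       | Compound | Negative | Neutral | Positive |
|-------|----------|----------|---------|----------|
| count | 19.0 | 19.0 | 19.0 | 19.0 |
| mean | 0.194484 | 0.003947 | 0.945316 | 0.050737 |
| std | 0.266427 | 0.017206 | 0.077066 | 0.069277 |
| min | 0.0 | 0.0 | 0.773 | 0.0 |
| 25% | 0.0 | 0.0 | 0.8975 | 0.0 |
| 50% | 0.0 | 0.0 | 1.0 | 0.0 |
| 75% | 0.4117 | 0.0 | 1.0 | 0.1025 |
| max | 0.765 | 0.075 | 1.0 | 0.174 |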


In [9]:
# Describe the Ethereum Sentiment
eth_sent_df.describe()

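|       | Compound | Negative | Neutral | Positive |
|-------|----------|----------|---------|----------|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.10292 | 0.02575 | 0.91655 | 0.0577 |
| std | 0.299156 | 0.054482 | 0.104965 | 0.076181 |
| min | -0.4939 | 0.0 | 0.672 | 0.0 |
| 25% | 0.0 | 0.0 | 0.878 | 0.0 |
| 50% | 0.0 | 0.0 | 0.948 | 0.052 |
| 75% | 0.2263 | 0.01175 | 1.0 | 0.08725 |
| max | 0.8779 | 0.196 | 1.0 | 0.318 |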


### Questions:

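Q: Which coin had the highest mean positive score?

A: Ethereum, with a mean positive score of 0.0577 (Bitcoin: 0.0507).

Q: Which coin had the highest compound score?

A: Ethereum, with a maximum compound score of 0.8779 (Bitcoin: 0.765). Bitcoin does have the higher mean compound score (0.1945 vs. 0.1029).

Q: Which coin had the highest positive score?

A: Ethereum, with a maximum positive score of 0.318 (Bitcoin: 0.174).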
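The answers above can also be computed rather than read off the two `describe()` tables by eye. A stdlib-only sketch; `compare_coins` is a hypothetical helper that takes each coin's score columns as plain lists (with pandas, `btc_sent_df["Positive"].tolist()` produces such a list).

```python
from statistics import mean
from typing import Callable, Dict, List

Columns = Dict[str, List[float]]


def compare_coins(scores: Dict[str, Columns]) -> Dict[str, str]:
    """Name the coin that wins each of the three questions.

    `scores` maps coin name -> column name ("Positive", "Compound") -> per-article scores.
    """
    def winner(stat: Callable[[Columns], float]) -> str:
        # Pick the coin whose columns give the largest value of `stat`
        return max(scores, key=lambda coin: stat(scores[coin]))

    return {
        "highest mean positive": winner(lambda cols: mean(cols["Positive"])),
        "highest compound": winner(lambda cols: max(cols["Compound"])),
        "highest positive": winner(lambda cols: max(cols["Positive"])),
    }
```

In the notebook the input would be `{"Bitcoin": {"Positive": btc_sent_df["Positive"].tolist(), "Compound": btc_sent_df["Compound"].tolist()}, "Ethereum": {...}}`, built the same way from `eth_sent_df`.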

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove punctuation
3. Remove stopwords

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
from nltk.corpus import reuters, stopwords
import nltk

nltk.download('stopwords')
nltk.download('reuters')
nltk.download('punkt')


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\jacio\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package reuters to
[nltk_data]     C:\Users\jacio\AppData\Roaming\nltk_data...
[nltk_data]   Package reuters is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\jacio\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [12]:
# Expand the default stopwords list if necessary
# YOUR CODE HERE!
addl_stopwords = [',', '', 'https', 'http', 'btc', 'bitcoin', 'eth', 'ethereum']

In [13]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text: lowercase, strip punctuation, lemmatize, drop stop words."""

    # Split the text into word tokens
    words = word_tokenize(text)

    # Convert the words to lowercase
    words = [word.lower() for word in words]

    # Remove punctuation (any non-letter); punctuation-only tokens become ''
    regex = re.compile("[^a-zA-Z ]")
    words = [regex.sub("", word) for word in words]

    # Lemmatize words into their root forms
    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(word) for word in words]

    # Remove stop words, including the '' left behind by punctuation tokens
    sw = set(stopwords.words("english") + addl_stopwords)
    words = [word for word in words if word not in sw]

    return words
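In the function above, stop words are removed only after lemmatizing, so `has` is first lemmatized to `ha` (WordNet treats it as a plural noun) and then slips past the stop-word filter; this is why `ha` appears in the Bitcoin token lists and top-10 words. A sketch of one fix is to filter stop words both before and after lemmatizing. `clean_tokens` is a hypothetical helper; it takes already-tokenized words and receives the lemmatizer as a plain function so it can be tested without NLTK's WordNet data.

```python
import re
from typing import Callable, Iterable, List, Set

# Anything that is not a lowercase letter (applied after lowercasing)
_NON_LETTERS = re.compile("[^a-z]")


def clean_tokens(words: Iterable[str], stop_words: Set[str],
                 lemmatize: Callable[[str], str]) -> List[str]:
    """Lowercase and strip non-letters, then drop stop words before AND after lemmatizing."""
    tokens = [_NON_LETTERS.sub("", word.lower()) for word in words]
    # First pass: drop empty strings and stop words in their surface form (e.g. "has")
    tokens = [t for t in tokens if t and t not in stop_words]
    # Lemmatize the survivors, then filter again in case a lemma is itself a stop word
    lemmas = [lemmatize(t) for t in tokens]
    return [t for t in lemmas if t and t not in stop_words]
```

In the notebook this would be called as `clean_tokens(word_tokenize(text), set(stopwords.words("english") + addl_stopwords), WordNetLemmatizer().lemmatize)`. Rerunning with it would change the token and frequency outputs shown below.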


In [15]:
import nltk
from nltk import wordnet
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\jacio\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [16]:
# Create a new tokens column for bitcoin
# YOUR CODE HERE!
btc_sent_df["Tokens"] = btc_sent_df["Text"].apply(tokenizer)
btc_sent_df.head()

|   | Date | Compound | Negative | Neutral | Positive | Text | Tokens |
|---|------|----------|----------|---------|----------|------|--------|
| 0 | 2020-12-03 | 0.6369 | 0.0 | 0.838 | 0.162 | Visa has partnered with cryptocurrency startup... | [visa, ha, partnered, cryptocurrency, startup,... |
| 1 | 2020-11-20 | 0.2023 | 0.0 | 0.95 | 0.05 | In November 2017, after an absolutely massive,... | [november, absolutely, massive, twomonth, rall... |
| 2 | 2020-12-06 | 0.0 | 0.0 | 1.0 | 0.0 | Unlike ‘conventional’ cryptocurrencies, a cent... | [unlike, conventional, cryptocurrencies, centr... |
| 3 | 2020-11-25 | 0.4404 | 0.075 | 0.773 | 0.152 | If youve been watching the crypto markets over... | [youve, watching, crypto, market, past, week, ... |
| 4 | 2020-12-09 | 0.0 | 0.0 | 1.0 | 0.0 | Six years after the launch of the Mexico-based... | [six, year, launch, mexicobased, crypotcurrenc... |


In [17]:
# Create a new tokens column for ethereum
# YOUR CODE HERE!
eth_sent_df["Tokens"] = eth_sent_df["Text"].apply(tokenizer)
eth_sent_df.head()

|   | Date | Compound | Negative | Neutral | Positive | Text | Tokens |
|---|------|----------|----------|---------|----------|------|--------|
| 0 | 2020-11-23 | 0.0 | 0.0 | 1.0 | 0.0 | FILE PHOTO: Representation of the Ethereum vir... | [file, photo, representation, virtual, currenc... |
| 1 | 2020-11-23 | 0.0 | 0.0 | 1.0 | 0.0 | FILE PHOTO: Representation of the Ethereum vir... | [file, photo, representation, virtual, currenc... |
| 2 | 2020-11-23 | 0.4215 | 0.0 | 0.912 | 0.088 | LONDON (Reuters) - Digital currencies Ethereum... | [london, reuters, digital, currency, xrp, soar... |
| 3 | 2020-12-07 | 0.1779 | 0.0 | 0.948 | 0.052 | NEW YORK (Reuters) - Institutional investors p... | [new, york, reuters, institutional, investor, ... |
| 4 | 2020-12-07 | 0.1779 | 0.0 | 0.948 | 0.052 | NEW YORK (Reuters) - Institutional investors p... | [new, york, reuters, institutional, investor, ... |


---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [18]:
from collections import Counter
from nltk import ngrams

In [19]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!
btc_bigram_counts = [Counter(ngrams(tokens, n=2)) for tokens in btc_sent_df.Tokens]
dict(btc_bigram_counts[0].most_common(10))

{('visa', 'ha'): 1,
 ('ha', 'partnered'): 1,
 ('partnered', 'cryptocurrency'): 1,
 ('cryptocurrency', 'startup'): 1,
 ('startup', 'blockfi'): 1,
 ('blockfi', 'offer'): 1,
 ('offer', 'first'): 1,
 ('first', 'reward'): 1,
 ('reward', 'credit'): 1,
 ('credit', 'card'): 1}

In [20]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!
eth_bigram_counts = [Counter(ngrams(tokens, n=2)) for tokens in eth_sent_df.Tokens]
dict(eth_bigram_counts[0].most_common(10))

{('file', 'photo'): 1,
 ('photo', 'representation'): 1,
 ('representation', 'virtual'): 1,
 ('virtual', 'currency'): 1,
 ('currency', 'standing'): 1,
 ('standing', 'pc'): 1,
 ('pc', 'motherboard'): 1,
 ('motherboard', 'seen'): 1,
 ('seen', 'illustration'): 1,
 ('illustration', 'picture'): 1}
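The two cells above build one `Counter` per article and display only the first article's bigrams, so every count shown is 1. To rank bigrams across all of a coin's articles, the per-article counts can be pooled. A stdlib sketch using `zip(tokens, tokens[1:])`, which yields the same adjacent pairs as `nltk.ngrams(tokens, 2)`; `top_bigrams` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_bigrams(token_lists: Iterable[List[str]],
                n: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Count bigrams within each article, pool the counts, and return the n most common."""
    total: Counter = Counter()
    for tokens in token_lists:
        # Zipping each list separately keeps pairs from spanning two articles
        total.update(zip(tokens, tokens[1:]))
    return total.most_common(n)
```

For example, `top_bigrams(btc_sent_df.Tokens)` returns the ten most frequent Bitcoin bigrams with their pooled counts.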

In [21]:
# Define a token_count function to get the top 10 words for each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [22]:
# Get the top 10 words for Bitcoin
# YOUR CODE HERE!
btc_all_tokens = [token for tokens in btc_sent_df.Tokens for token in tokens]
btc_token_count = token_count(btc_all_tokens)
btc_token_count

[('char', 19),
 ('reuters', 11),
 ('currency', 10),
 ('photo', 8),
 ('representation', 8),
 ('virtual', 8),
 ('file', 7),
 ('illustration', 7),
 ('reutersdado', 7),
 ('ha', 6)]

In [23]:
# Get the top 10 words for Ethereum
# YOUR CODE HERE!
eth_all_tokens = [token for tokens in eth_sent_df.Tokens for token in tokens]
eth_token_count = token_count(eth_all_tokens)
eth_token_count

[('char', 20),
 ('reuters', 17),
 ('photo', 10),
 ('currency', 10),
 ('representation', 9),
 ('virtual', 9),
 ('illustration', 8),
 ('reutersdado', 8),
 ('file', 7),
 ('seen', 7)]
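`char` tops both lists because the `content` field returned by NewsAPI is truncated and ends in a marker such as `… [+2239 chars]` (visible in the sample articles above); the tokenizer keeps `chars` and lemmatizes it to `char`. A sketch of stripping that marker before scoring or tokenizing, assuming it always takes the `[+N chars]` form seen here; `strip_truncation_marker` is a hypothetical helper.

```python
import re

# An optional ellipsis, then "[+<digits> chars]" at the very end of the text
_TRUNCATION = re.compile(r"\s*…?\s*\[\+\d+ chars\]\s*$")


def strip_truncation_marker(text: str) -> str:
    """Remove a trailing '… [+N chars]' marker, if present."""
    return _TRUNCATION.sub("", text)
```

Applying it, for example with `btc_sent_df["Text"].apply(strip_truncation_marker)` before tokenizing, removes `char` from the counts; applying it before `analyzer.polarity_scores` keeps the marker out of the sentiment input as well.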

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news.

In [27]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [1]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!
# Join all Bitcoin article texts into one space-separated corpus
btc_corpus = " ".join(btc_sent_df.Text)
# Tokenize the corpus, then rejoin the tokens with spaces for WordCloud
long_string = " ".join(tokenizer(btc_corpus))
wc = WordCloud(collocations=False).generate(long_string)
plt.title("Bitcoin Word Cloud", fontsize=50, fontweight="bold")
plt.imshow(wc)
plt.axis("off")
plt.show()

In [31]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
# Join all Ethereum article texts into one space-separated corpus
eth_corpus = " ".join(eth_sent_df.Text)
# Tokenize the corpus, then rejoin the tokens with spaces for WordCloud
long_string = " ".join(tokenizer(eth_corpus))
wc = WordCloud(collocations=False).generate(long_string)
plt.title("Ethereum Word Cloud", fontsize=50, fontweight="bold")
plt.imshow(wc)
plt.axis("off")
plt.show()

# Named Entity Recognition

In this section, you will use a pretrained spaCy model to tag named entities in the news for both coins and visualize the tags.

In [1]:
import spacy
from spacy import displacy
import matplotlib.pyplot as plt 

In [2]:
# Download the English language model for spaCy (needed once; loading fails without it)
# If spacy.load still cannot find it afterwards, restart the kernel
!python -m spacy download en_core_web_sm

In [3]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [40]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!
btc_corpus = " ".join(btc_sent_df.Text)
btc_corpus

In [41]:
# Run the NER processor on all of the text
# YOUR CODE HERE!
btc_ner = nlp(btc_corpus)

# Add a title to the document
# YOUR CODE HERE!
btc_ner.user_data["title"] = "Bitcoin NER"

In [42]:
# Render the visualization
# YOUR CODE HERE!
displacy.render(btc_ner, style="ent")

In [43]:
# List all Entities
# YOUR CODE HERE!
btc_ents = {ent.text for ent in btc_ner.ents}
btc_ents
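A `set` of entity strings drops both how often each entity appears and what type spaCy assigned to it. A sketch that tallies `(text, label)` pairs instead, relying on spaCy `Span` objects exposing `.text` and `.label_`; `entity_counts` is a hypothetical helper and the optional `labels` filter is an illustrative addition.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def entity_counts(ents: Iterable, labels: Optional[Set[str]] = None,
                  n: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n most common (entity text, entity label) pairs.

    Accepts spaCy's doc.ents; if `labels` is given (e.g. {"ORG"}), keep only those types.
    """
    counts = Counter(
        (ent.text, ent.label_)
        for ent in ents
        if labels is None or ent.label_ in labels
    )
    return counts.most_common(n)
```

For example, `entity_counts(btc_ner.ents, labels={"ORG", "PERSON"})` lists the organizations and people mentioned most often in the Bitcoin news; the same call works for `eth_ner.ents`.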

---

## Ethereum NER

In [44]:
# Concatenate all of the ethereum text together
# YOUR CODE HERE!
eth_corpus = " ".join(eth_sent_df.Text)
eth_corpus

In [45]:
# Run the NER processor on all of the text
# YOUR CODE HERE!
eth_ner = nlp(eth_corpus)

# Add a title to the document
# YOUR CODE HERE!
eth_ner.user_data["title"] = "Ethereum NER"

In [46]:
# Render the visualization
# YOUR CODE HERE!
displacy.render(eth_ner, style="ent")

In [47]:
# List all Entities
# YOUR CODE HERE!
eth_ents = {ent.text for ent in eth_ner.ents}
eth_ents