# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from newsapi.newsapi_client import NewsApiClient
# VADER needs its lexicon; run nltk.download('vader_lexicon') once if it is missing
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

In [2]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv('NEWS_API_KEY')

In [3]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
headlines_btc = newsapi.get_everything(
    q='bitcoin OR BTC',
    language='en',
    page_size=100,
    sort_by='relevancy'
)
print(f"Total Results: {headlines_btc['totalResults']}")

Total Results: 8451


In [5]:
# Fetch the Ethereum news articles
headlines_eth = newsapi.get_everything(
    q='ethereum OR ETH',
    language='en',
    page_size=100,
    sort_by='relevancy'
)
print(f"Total Results: {headlines_eth['totalResults']}")
headlines_eth["articles"][0]

Total Results: 3110


{'source': {'id': 'the-verge', 'name': 'The Verge'},
 'author': 'Adi Robertson',
 'title': 'India will reportedly introduce bill to make owning cryptocurrency illegal',
 'description': 'India’s legislature is reportedly considering a near-total ban on private cryptocurrencies like Bitcoin or Ethereum, including owning the virtual currency. The government has discussed plans for a national digital currency as an alternative.',
 'url': 'https://www.theverge.com/2021/3/15/22332677/india-cryptocurrency-trading-mining-possession-ban-law-report',
 'urlToImage': 'https://cdn.vox-cdn.com/thumbor/IdgNJaOIQBsN8QbQcH2MDU6sAUA=/0x243:2040x1311/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/10432811/mdoying_180308_2373_0091still.jpg',
 'publishedAt': '2021-03-15T22:25:02Z',
 'content': 'One of the strictest crackdowns worldwide\r\nPhoto by Michele Doying / The Verge\r\nIndia is reportedly moving forward with a sweeping ban on cryptocurrencies. According to Reuters, the countrys legislat…

In [6]:
# Create the Bitcoin sentiment scores DataFrame
btc_sent = []

for article in headlines_btc["articles"]:
    try:
        text = article['content']
        date = article['publishedAt']
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment['compound']
        pos = sentiment['pos']
        neu = sentiment['neu']
        neg = sentiment['neg']

        btc_sent.append({
            'text': text,
            'date': date,
            'compound': compound,
            'positive': pos,
            'negative': neg,
            'neutral': neu,
        })
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass

btc_df = pd.DataFrame(btc_sent)
cols = ['date', 'text', 'compound', 'positive', 'negative', 'neutral']
btc_df = btc_df[cols]
btc_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-03-24T08:10:09Z | The inevitable has happened: You can now purch... | 0.3182 | 0.065 | 0.0 | 0.935 |
| 1 | 2021-04-01T10:05:00Z | This story originally appeared on MarketBeatAs... | 0.6357 | 0.124 | 0.043 | 0.834 |
| 2 | 2021-03-31T14:00:00Z | Whether youre looking to make a larger investm... | 0.0772 | 0.039 | 0.0 | 0.961 |
| 3 | 2021-04-06T06:13:00Z | This article was translated from our Spanish e... | -0.34 | 0.0 | 0.07 | 0.93 |
| 4 | 2021-04-08T11:00:00Z | Opinions expressed by Entrepreneur contributor... | 0.0 | 0.0 | 0.0 | 1.0 |


In [7]:
# Create the Ethereum sentiment scores DataFrame
eth_sent = []

for article in headlines_eth["articles"]:
    try:
        text = article['content']
        date = article['publishedAt']
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment['compound']
        pos = sentiment['pos']
        neu = sentiment['neu']
        neg = sentiment['neg']

        eth_sent.append({
            'text': text,
            'date': date,
            'compound': compound,
            'positive': pos,
            'negative': neg,
            'neutral': neu,
        })
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass

eth_df = pd.DataFrame(eth_sent)
cols = ['date', 'text', 'compound', 'positive', 'negative', 'neutral']
eth_df = eth_df[cols]
eth_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-03-15T22:25:02Z | One of the strictest crackdowns worldwide\r\nP... | -0.5574 | 0.0 | 0.11 | 0.89 |
| 1 | 2021-03-31T14:00:00Z | Whether youre looking to make a larger investm... | 0.0772 | 0.039 | 0.0 | 0.961 |
| 2 | 2021-03-15T13:51:11Z | Famed auction house Christies just sold its fi... | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 2021-04-08T13:00:00Z | Lately Ive taken greatly to this epaper tablet... | -0.3041 | 0.094 | 0.145 | 0.761 |
| 4 | 2021-03-29T15:46:35Z | Payment card network Visa has announced that t... | 0.0 | 0.0 | 0.0 | 1.0 |
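
The Bitcoin and Ethereum cells above repeat the same loop. One possible way to remove the duplication is a small helper that takes the article list and a scoring callable. This is only a sketch: `build_sentiment_rows` and the stand-in scorer `fake_scores` are hypothetical names, the sample articles are toy data, and unlike the original loop it also skips empty strings.

```python
def build_sentiment_rows(articles, score_fn):
    """Return one dict per article with its date, text, and sentiment scores."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles with missing or empty content
            continue
        scores = score_fn(text)
        rows.append({
            "date": article.get("publishedAt"),
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


def fake_scores(text):
    """Hypothetical stand-in that returns VADER-shaped keys."""
    return {"compound": 0.0, "pos": 0.0, "neg": 0.0, "neu": 1.0}


# Toy data for illustration only
articles = [
    {"content": "Some article text", "publishedAt": "2021-01-01T00:00:00Z"},
    {"content": None, "publishedAt": "2021-01-02T00:00:00Z"},
]
rows = build_sentiment_rows(articles, fake_scores)
print(len(rows))  # the article with no content is skipped
```

In the notebook, the scorer would be `analyzer.polarity_scores`, for example `pd.DataFrame(build_sentiment_rows(headlines_eth["articles"], analyzer.polarity_scores))`.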


In [8]:
# Describe the Bitcoin Sentiment
btc_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.147772 | 0.05978 | 0.02441 | 0.91581 |
| std | 0.366833 | 0.06846 | 0.045813 | 0.079966 |
| min | -0.7579 | 0.0 | 0.0 | 0.66 |
| 25% | 0.0 | 0.0 | 0.0 | 0.865 |
| 50% | 0.0 | 0.0495 | 0.0 | 0.9295 |
| 75% | 0.445 | 0.09225 | 0.039 | 1.0 |
| max | 0.908 | 0.34 | 0.198 | 1.0 |


In [9]:
# Describe the Ethereum Sentiment
eth_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.137317 | 0.06569 | 0.03064 | 0.90367 |
| std | 0.40126 | 0.071525 | 0.051318 | 0.081532 |
| min | -0.91 | 0.0 | 0.0 | 0.664 |
| 25% | -0.057625 | 0.0 | 0.0 | 0.85325 |
| 50% | 0.0772 | 0.056 | 0.0 | 0.92 |
| 75% | 0.4588 | 0.097 | 0.0625 | 0.9535 |
| max | 0.8506 | 0.27 | 0.299 | 1.0 |


### Questions:

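Q: Which coin had the highest mean positive score?

A: ETH (mean positive score of 0.06569, versus 0.05978 for BTC)

Q: Which coin had the highest compound score?

A: BTC (maximum compound score of 0.908, versus 0.8506 for ETH)

Q: Which coin had the highest positive score?

A: BTC (maximum positive score of 0.34, versus 0.27 for ETH)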
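
The comparisons can be checked directly against the two `describe()` tables. The sketch below copies the relevant `mean` and `max` values from those tables into a plain dictionary; the `stats` layout and the `leader` helper are hypothetical names introduced for illustration.

```python
# Values copied from the describe() outputs above
stats = {
    "BTC": {"mean_positive": 0.05978, "max_compound": 0.908,
            "max_positive": 0.34, "max_negative": 0.198},
    "ETH": {"mean_positive": 0.06569, "max_compound": 0.8506,
            "max_positive": 0.27, "max_negative": 0.299},
}


def leader(metric):
    """Return the coin with the larger value for the given metric."""
    return max(stats, key=lambda coin: stats[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive", "max_negative"):
    print(metric, "->", leader(metric))
```

With the DataFrames in memory, the same values are available as, for example, `btc_df['positive'].mean()` and `eth_df['compound'].max()`.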

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Expand the default stopwords list if necessary
# Not expanded here: the tokenizer below uses NLTK's default English stopwords

In [12]:
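# Complete the tokenizer function
def tokenizer(text):
    """Strip non-letters, tokenize, lemmatize, lowercase, and remove stopwords."""
    lemmatizer = WordNetLemmatizer()
    sw = set(stopwords.words('english'))

    # Delete every character that is not a letter or a space
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)

    words = word_tokenize(re_clean)
    # Lemmatize with the default (noun) part of speech
    lem = [lemmatizer.lemmatize(word) for word in words]
    # Lowercase and drop stopwords
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output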
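
Because the regex deletes every non-letter character, line breaks are deleted too, which fuses words such as `worldwide\r\nPhoto` into `worldwidephoto`. A minimal sketch of one alternative replaces each run of non-letters with a space instead. It uses only the standard library: `simple_tokenizer` is a hypothetical name, `SMALL_STOPWORDS` is a tiny hand-written stand-in for NLTK's stopword list, and no lemmatization is performed.

```python
import re

# Tiny stand-in stopword set (assumption); the notebook uses NLTK's English list
SMALL_STOPWORDS = {"the", "of", "one", "by", "a", "is", "on"}

NON_LETTERS = re.compile(r"[^a-zA-Z]+")


def simple_tokenizer(text, stopwords=SMALL_STOPWORDS):
    """Lowercase, drop non-letters, and remove stopwords (no lemmatization)."""
    # Replace each run of non-letter characters with a space so that
    # words separated only by a line break are not fused together
    cleaned = NON_LETTERS.sub(" ", text)
    return [w for w in cleaned.lower().split() if w not in stopwords]


sample = "One of the strictest crackdowns worldwide\r\nPhoto by Michele Doying"
print(simple_tokenizer(sample))
```

The same substitution (`regex.sub(' ', text)` with a space as the replacement) could be applied inside the NLTK-based `tokenizer`, although the token columns and counts shown in this notebook were produced with the original version.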


In [13]:
# Create a new tokens column for bitcoin
btc_df['tokens'] = btc_df['text'].apply(tokenizer)
btc_df.head()

| | date | text | compound | positive | negative | neutral | tokens |
|---|---|---|---|---|---|---|---|
| 0 | 2021-03-24T08:10:09Z | The inevitable has happened: You can now purch... | 0.3182 | 0.065 | 0.0 | 0.935 | [inevitable, ha, happened, purchase, tesla, ve... |
| 1 | 2021-04-01T10:05:00Z | This story originally appeared on MarketBeatAs... | 0.6357 | 0.124 | 0.043 | 0.834 | [story, originally, appeared, marketbeatas, cr... |
| 2 | 2021-03-31T14:00:00Z | Whether youre looking to make a larger investm... | 0.0772 | 0.039 | 0.0 | 0.961 | [whether, youre, looking, make, larger, invest... |
| 3 | 2021-04-06T06:13:00Z | This article was translated from our Spanish e... | -0.34 | 0.0 | 0.07 | 0.93 | [article, wa, translated, spanish, edition, us... |
| 4 | 2021-04-08T11:00:00Z | Opinions expressed by Entrepreneur contributor... | 0.0 | 0.0 | 0.0 | 1.0 | [opinions, expressed, entrepreneur, contributo... |


In [14]:
# Create a new tokens column for ethereum
eth_df['tokens'] = eth_df['text'].apply(tokenizer)
eth_df.head()

| | date | text | compound | positive | negative | neutral | tokens |
|---|---|---|---|---|---|---|---|
| 0 | 2021-03-15T22:25:02Z | One of the strictest crackdowns worldwide\r\nP... | -0.5574 | 0.0 | 0.11 | 0.89 | [one, strictest, crackdown, worldwidephoto, mi... |
| 1 | 2021-03-31T14:00:00Z | Whether youre looking to make a larger investm... | 0.0772 | 0.039 | 0.0 | 0.961 | [whether, youre, looking, make, larger, invest... |
| 2 | 2021-03-15T13:51:11Z | Famed auction house Christies just sold its fi... | 0.0 | 0.0 | 0.0 | 1.0 | [famed, auction, house, christies, sold, first... |
| 3 | 2021-04-08T13:00:00Z | Lately Ive taken greatly to this epaper tablet... | -0.3041 | 0.094 | 0.145 | 0.761 | [lately, ive, taken, greatly, epaper, tablet, ... |
| 4 | 2021-03-29T15:46:35Z | Payment card network Visa has announced that t... | 0.0 | 0.0 | 0.0 | 1.0 | [payment, card, network, visa, ha, announced, ... |


---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
def bigram(tokens):
    # dict() keeps only one pair per first token (the last one seen)
    bigrams = dict(ngrams(tokens, n=2))
    return bigrams
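
To keep every pair, including repeated ones, the bigrams can be stored as a list of tuples and counted with `Counter`. The sketch below is an illustration on toy tokens, not the notebook's method: `bigram_list` is a hypothetical name, and it uses `zip` from the standard library in place of `nltk.ngrams` so that it runs without NLTK.

```python
from collections import Counter


def bigram_list(tokens):
    """Return all adjacent token pairs, keeping duplicates and order."""
    return list(zip(tokens, tokens[1:]))


# Toy tokens for illustration only
tokens = ["bitcoin", "price", "rises", "bitcoin", "price", "falls"]
pairs = bigram_list(tokens)
as_dict = dict(pairs)  # collapses pairs that share a first token

print(len(pairs))    # 5
print(len(as_dict))  # 3
print(Counter(pairs).most_common(1))  # [(('bitcoin', 'price'), 2)]
```

With NLTK, `list(ngrams(tokens, n=2))` yields the same list of tuples.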

In [17]:
# Generate the Bitcoin N-grams where N=2
btc_df['bigrams'] = btc_df['tokens'].apply(bigram)
btc_df.head()

| | date | text | compound | positive | negative | neutral | tokens | bigrams |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-24T08:10:09Z | The inevitable has happened: You can now purch... | 0.3182 | 0.065 | 0.0 | 0.935 | [inevitable, ha, happened, purchase, tesla, ve... | {'inevitable': 'ha', 'ha': 'happened', 'happen... |
| 1 | 2021-04-01T10:05:00Z | This story originally appeared on MarketBeatAs... | 0.6357 | 0.124 | 0.043 | 0.834 | [story, originally, appeared, marketbeatas, cr... | {'story': 'originally', 'originally': 'appeare... |
| 2 | 2021-03-31T14:00:00Z | Whether youre looking to make a larger investm... | 0.0772 | 0.039 | 0.0 | 0.961 | [whether, youre, looking, make, larger, invest... | {'whether': 'youre', 'youre': 'looking', 'look... |
| 3 | 2021-04-06T06:13:00Z | This article was translated from our Spanish e... | -0.34 | 0.0 | 0.07 | 0.93 | [article, wa, translated, spanish, edition, us... | {'article': 'wa', 'wa': 'summer', 'translated'... |
| 4 | 2021-04-08T11:00:00Z | Opinions expressed by Entrepreneur contributor... | 0.0 | 0.0 | 0.0 | 1.0 | [opinions, expressed, entrepreneur, contributo... | {'opinions': 'expressed', 'expressed': 'entrep... |


In [18]:
# Generate the Ethereum N-grams where N=2
eth_df['bigrams'] = eth_df['tokens'].apply(bigram)
eth_df.head()

| | date | text | compound | positive | negative | neutral | tokens | bigrams |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-15T22:25:02Z | One of the strictest crackdowns worldwide\r\nP... | -0.5574 | 0.0 | 0.11 | 0.89 | [one, strictest, crackdown, worldwidephoto, mi... | {'one': 'strictest', 'strictest': 'crackdown',... |
| 1 | 2021-03-31T14:00:00Z | Whether youre looking to make a larger investm... | 0.0772 | 0.039 | 0.0 | 0.961 | [whether, youre, looking, make, larger, invest... | {'whether': 'youre', 'youre': 'looking', 'look... |
| 2 | 2021-03-15T13:51:11Z | Famed auction house Christies just sold its fi... | 0.0 | 0.0 | 0.0 | 1.0 | [famed, auction, house, christies, sold, first... | {'famed': 'auction', 'auction': 'house', 'hous... |
| 3 | 2021-04-08T13:00:00Z | Lately Ive taken greatly to this epaper tablet... | -0.3041 | 0.094 | 0.145 | 0.761 | [lately, ive, taken, greatly, epaper, tablet, ... | {'lately': 'ive', 'ive': 'taken', 'taken': 'gr... |
| 4 | 2021-03-29T15:46:35Z | Payment card network Visa has announced that t... | 0.0 | 0.0 | 0.0 | 1.0 | [payment, card, network, visa, ha, announced, ... | {'payment': 'card', 'card': 'network', 'networ... |


In [19]:
# Define a token_count function that returns the top N words for a coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [20]:
# Get the top 10 words for Bitcoin
token_count(btc_df['tokens'].sum(), N=10)

[('char', 98),
 ('bitcoin', 60),
 ('reuters', 39),
 ('ha', 27),
 ('march', 17),
 ('new', 16),
 ('tesla', 15),
 ('photo', 15),
 ('cryptocurrency', 14),
 ('wa', 13)]

In [21]:
# Get the top 10 words for Ethereum
token_count(eth_df['tokens'].sum(), N=10)

[('char', 97),
 ('nft', 23),
 ('digital', 21),
 ('cryptocurrency', 20),
 ('million', 18),
 ('token', 18),
 ('reuters', 17),
 ('nonfungible', 15),
 ('ethereum', 14),
 ('ha', 14)]

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
from wordcloud import WordCloud
import matplotlib as mpl
import matplotlib.pyplot as plt

# Note: matplotlib 3.6+ renamed this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
wc = WordCloud()
# WordCloud.generate expects a single string, so join the tokens with spaces
img = wc.generate(' '.join(btc_df['tokens'].sum()))
plt.imshow(img)

In [None]:
# Generate the Ethereum word cloud
# Use the Ethereum tokens (eth_df), joined into a single string
img2 = wc.generate(' '.join(eth_df['tokens'].sum()))
plt.imshow(img2)

# Named Entity Recognition

In this section, you will run named entity recognition on the text for both coins and visualize the tags using spaCy.

In [22]:
import spacy
from spacy import displacy

In [23]:
# Optional - download a language model for spaCy
!python -m spacy download en_core_web_sm

[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [24]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [25]:
# Concatenate all of the bitcoin text together
btc_text = btc_df['text'].sum()
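
Summing a column of strings concatenates them with no separator, so the end of one article runs straight into the start of the next; entities such as `'+1656 chars]Fifteen years'` in the entity list further down are a symptom of this. A minimal sketch of one alternative is shown here on toy strings. `join_articles` and `TRUNCATION_MARKER` are hypothetical names, and the `[+N chars]` marker format is an assumption inferred from the article content shown earlier.

```python
import re

# Matches a trailing marker such as "[+1656 chars]" (format assumed from the data)
TRUNCATION_MARKER = re.compile(r"\s*\[\+\d+ chars\]\s*$")


def join_articles(texts, separator=" "):
    """Strip the trailing truncation marker from each text, then join them."""
    cleaned = [TRUNCATION_MARKER.sub("", t) for t in texts if t]
    return separator.join(cleaned)


# Toy strings for illustration only
texts = [
    "First article text [+1656 chars]",
    "Second article text",
]
print(join_articles(texts))  # First article text Second article text
```

In the notebook this could be called as `join_articles(btc_df['text'])`, since iterating over a pandas Series yields its values. The entity lists shown in this notebook were produced with the original `.sum()` concatenation.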


In [26]:
# Run the NER processor on all of the text
btc_doc = nlp(btc_text)

# Add a title to the document
btc_doc.user_data['title'] = 'Bitcoin NER'

In [27]:
# Render the visualization
displacy.render(btc_doc, style='ent')

In [28]:
# List all Entities
print([ent.text for ent in btc_doc.ents])

['Tesla', 'Bitcoin', 'Elon Musk', 'Wednesday', 'MarketBeatAs', 'Bitcoin’s', 'the past year', 'Bitcoin', 'Bitcoin, Ethereum', 'Bitcoin Cash', 'Spanish', 'AI', 'Entrepreneur', 'the summer', 'Entrepreneur', 'Bitcoin', 'earlier this year', 'Tesla', 'Bitcoin', 'Tesla', 'Elon Musk', 'Musk', 'Tesla', 'March 26', 'Chipotle Mexican Grill', 'NYSE', 'CMG', '100,000', '100,000', 'National Burrito Day', 'April 1', 'Fidelity, Coinbase', 'Tuesday', 'earlier this year', 'roughly $1.5 billion', 'early February', 'SEC', 'Burrito Day', 'April Fools Day', 'this year', 'tomorrow', 'Pollo Loco', 'Photo', 'Michele Doying', 'India', 'Reuters', '+1656 chars]Fifteen years', 'Twitter', 'Jack Dorsey', 'first', 'nearly $3 million', 'NFT', 'Sina Estavi', 'Bridge', 'last years', 'Twitter', 'Graham Ivan Clark', 'Twitters', 'India', 'Reuters', 'a big year', 'Robinhood', 'today', 'Christine Brown', 'Robinhoods', 'Infection', 'a Remote Code Execution', '22, 2021', 'MarketBeat\r\n', '2021', 'millions', 'Funko', 'NFT', 'T

---

## Ethereum NER

In [29]:
# Concatenate all of the ethereum text together
eth_text = eth_df['text'].sum()

In [30]:
# Run the NER processor on all of the text
eth_doc = nlp(eth_text)

# Add a title to the document
eth_doc.user_data['title'] = 'Ethereum NER'

In [31]:
# Render the visualization
displacy.render(eth_doc, style='ent')

In [32]:
# List all Entities
print([ent.text for ent in eth_doc.ents])

['One', 'Photo', 'Michele Doying', 'India', 'Reuters', 'Bitcoin, Ethereum', 'Bitcoin Cash', 'Christies', 'first', '$69 million', '5,000', 'Apple', 'Amazon', 'iPad', 'Visa', 'USD Coin', 'Ethereum', 'Crypto.com', 'first', 'NFT', 'Ethereum', '19, 2021', 'Entrepreneur', 'Two', 'Ethereum', 'NFT', 'recent weeks', 'chars]MetaMask', 'one', 'Ethereum', 'September 2020', 'Entrepreneur', 'Bitcoin', 'about $5.7 million', 'a big year', 'Robinhood', 'today', 'Christine Brown', 'Robinhoods', 'above $2,700', 'Kraken', 'Kraken', 'Pete Humiston', 'NFT', 'Ethereum', 'more than $224 million', '2021', 'OpenSea', '35%', 'March', 'Kraken', 'Dado Ruvic', 'Reuters', 'above $2,700', '26, 2021', 'Spanish', 'AI', 'New York Times', 'daily', 'Segal', 'Burn Alpha', 'Mirror', '0.1', 'ETH', 'yesterday', 'evening', '25', 'ETH', 'millions of dollars', '1990', 'millions of dollars', '36.32', 'April 3', '90%', 'Fidelity, Coinbase', 'Tuesday', 'Tesla', 'Bitcoin', 'Elon Musk', 'Wednesday', 'ETH Zurich', 'Empa', 'hours', 'Be