# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

load_dotenv()
%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\alexm\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
api_key = os.getenv("news_api")


In [3]:
# Create a newsapi client
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key=api_key)
newsapi

<newsapi.newsapi_client.NewsApiClient at 0x2475841ea30>

In [4]:
# Fetch the Bitcoin news articles
btc_headlines = newsapi.get_everything(
    q="Bitcoin AND bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy",
)
btc_headlines['articles'][0]

{'source': {'id': 'the-verge', 'name': 'The Verge'},
 'author': 'Elizabeth Lopatto',
 'title': 'If you’re a Russian YouTuber, how do you get paid now?',
 'description': 'Russian creators are shut off from the global financial system. Some of them are turning to cryptocurrency.',
 'url': 'https://www.theverge.com/2022/3/17/22982122/russia-youtube-crypto-creators-pay-ruble',
 'urlToImage': 'https://cdn.vox-cdn.com/thumbor/MG_NhB7wSIBIl3S_LG-y-r7iPmg=/0x215:3000x1786/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/9442221/cryptocurrency_0004__00000_.jpg',
 'publishedAt': '2022-03-17T13:33:43Z',
 'content': 'When Russia invaded Ukraine, Niki Proshin was already a year into making a living as a vlogger — he had a YouTube channel, a TikTok channel, and an Instagram. He also ran an online Russian club for a… [+5883 chars]'}

In [5]:
# Show the total number of matching articles reported by the API
# (at most page_size of them are returned per request)

print(f"Total articles: {btc_headlines['totalResults']}")

Total articles: 7370


In [6]:
# Fetch the Ethereum news articles
eth_headlines = newsapi.get_everything(
    q="Ethereum AND ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy",
)
eth_headlines['articles'][0]

{'source': {'id': 'wired', 'name': 'Wired'},
 'author': 'Shanti Escalante-De Mattei',
 'title': 'Web3 Threatens to Segregate Our Online Lives',
 'description': 'Governance tokens seem like a tantalizing solution to content moderation struggles. They only give the appearance of democracy.',
 'url': 'https://www.wired.com/story/web3-governance-tokens-cryptocurrency-content-moderation/',
 'urlToImage': 'https://media.wired.com/photos/621d66c7ea3b8f283853aa29/191:100/w_1280,c_limit/Web3-Threatens-to-Segregate-Our-Online-Lives.jpg',
 'publishedAt': '2022-03-01T14:00:00Z',
 'content': 'In February, shit hit the fan in the usual way: An old tweet resurfaced. Brantly Millegan, director of operations at Ethereum Name Service (ENS), a web3 business, had written the following in May 201… [+3096 chars]'}

In [7]:
# Show the total number of matching articles reported by the API
# (at most page_size of them are returned per request)
print(f"Total articles: {eth_headlines['totalResults']}")

Total articles: 3565


In [8]:
# Transform the response dictionary into a DataFrame
btc_df = pd.DataFrame.from_dict(btc_headlines["articles"])

btc_df.head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': 'the-verge', 'name': 'The Verge'} | Elizabeth Lopatto | If you’re a Russian YouTuber, how do you get p... | Russian creators are shut off from the global ... | https://www.theverge.com/2022/3/17/22982122/ru... | https://cdn.vox-cdn.com/thumbor/MG_NhB7wSIBIl3... | 2022-03-17T13:33:43Z | When Russia invaded Ukraine, Niki Proshin was ... |
| 1 | {'id': None, 'name': 'Slashdot.org'} | EditorDavid | Why Isn't Bitcoin Booming? | "Bitcoin was seen by many of its libertarian-l... | https://news.slashdot.org/story/22/03/12/05412... | https://a.fsdn.com/sd/topics/bitcoin_64.png | 2022-03-12T18:34:00Z | "Bitcoin was seen by many of its libertarian-l... |
| 2 | {'id': 'reuters', 'name': 'Reuters'} | | CRYPTOVERSE-Bitcoin could be laid low by miner... | Bitcoin miners are feeling the heat - and the ... | https://www.reuters.com/markets/europe/cryptov... | https://www.reuters.com/resizer/9nBpgfg7pSfpPQ... | 2022-02-22T06:17:00Z | Feb 22 (Reuters) - Bitcoin miners are feeling ... |
| 3 | {'id': 'reuters', 'name': 'Reuters'} | | Cryptoverse: Bitcoin gains conflict currency c... | Bitcoin has leapt since Russia's invasion of U... | https://www.reuters.com/markets/europe/cryptov... | https://www.reuters.com/pf/resources/images/re... | 2022-03-01T06:10:00Z | March 1 (Reuters) - Bitcoin has leapt since Ru... |
| 4 | {'id': 'wired', 'name': 'Wired'} | Gian M. Volpicelli | War Is Calling Crypto’s ‘Neutrality’ Into Ques... | War in Ukraine and Western sanctions against R... | https://www.wired.com/story/crypto-russia-ukra... | https://media.wired.com/photos/6226a83bd53a49d... | 2022-03-08T12:00:00Z | Whose side is cryptocurrency on? If you had as... |


In [9]:
# Create the Bitcoin sentiment scores DataFrame

_sentiments = []

for article in btc_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)  #the VADER sentiment scores are retrieved
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        _sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass
# Create DataFrame
btc_sent_df = pd.DataFrame(_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
btc_sent_df = btc_sent_df[cols]

btc_sent_df.head()    

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-03-17 | When Russia invaded Ukraine, Niki Proshin was ... | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 2022-03-12 | "Bitcoin was seen by many of its libertarian-l... | -0.7713 | 0.0 | 0.169 | 0.831 |
| 2 | 2022-02-22 | Feb 22 (Reuters) - Bitcoin miners are feeling ... | -0.1779 | 0.046 | 0.067 | 0.887 |
| 3 | 2022-03-01 | March 1 (Reuters) - Bitcoin has leapt since Ru... | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2022-03-08 | Whose side is cryptocurrency on? If you had as... | -0.3182 | 0.055 | 0.091 | 0.854 |


In [10]:
# Transform the response dictionary into a DataFrame

eth_df = pd.DataFrame.from_dict(eth_headlines["articles"])

eth_df.head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': 'wired', 'name': 'Wired'} | Shanti Escalante-De Mattei | Web3 Threatens to Segregate Our Online Lives | Governance tokens seem like a tantalizing solu... | https://www.wired.com/story/web3-governance-to... | https://media.wired.com/photos/621d66c7ea3b8f2... | 2022-03-01T14:00:00Z | In February, shit hit the fan in the usual way... |
| 1 | {'id': 'business-insider', 'name': 'Business I... | prosen@insider.com (Phil Rosen) | Coinbase earnings show trading of ethereum and... | Ethereum trading volume increased from 15% to ... | https://markets.businessinsider.com/news/curre... | https://i.insider.com/62190267d0009b001904bd96... | 2022-02-25T17:02:30Z | Coinbase reported that the share of trading vo... |
| 2 | {'id': 'the-verge', 'name': 'The Verge'} | Elizabeth Lopatto | How Ukrainians are fundraising in cryptocurrency | Millions of dollars of cryptocurrency have flo... | https://www.theverge.com/2022/2/26/22952357/uk... | https://cdn.vox-cdn.com/thumbor/teEVxppIZ_JTW-... | 2022-02-26T20:29:04Z | Illustration by James Bareham / The Verge\r\n\... |
| 3 | {'id': None, 'name': 'Entrepreneur'} | Masha Prusso | What You Need to Know About Ethereum's Role in... | This now-seven-year-old decentralized and open... | https://www.entrepreneur.com/article/417850 | https://assets.entrepreneur.com/content/3x2/20... | 2022-03-03T16:00:00Z | It seems that in 2022, you cant escape from th... |
| 4 | {'id': 'wired', 'name': 'Wired'} | Omar L. Gallaga | How People Actually Make Money From Cryptocurr... | Power traders use “staking” and “yield farming... | https://www.wired.com/story/how-to-make-money-... | https://media.wired.com/photos/622bcc6ef48a924... | 2022-03-13T13:00:00Z | If it sounds too good to be true, youre not wr... |


In [11]:
# Create the Ethereum sentiment scores DataFrame

_sentiments = []

for article in eth_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)  #the VADER sentiment scores are retrieved
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        _sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass
# Create DataFrame
eth_sent_df = pd.DataFrame(_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
eth_sent_df = eth_sent_df[cols]

eth_sent_df.head()    

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-03-01 | In February, shit hit the fan in the usual way... | -0.3182 | 0.059 | 0.093 | 0.848 |
| 1 | 2022-02-25 | Coinbase reported that the share of trading vo... | 0.6705 | 0.188 | 0.0 | 0.812 |
| 2 | 2022-02-26 | Illustration by James Bareham / The Verge\r\n\... | -0.4588 | 0.0 | 0.083 | 0.917 |
| 3 | 2022-03-03 | It seems that in 2022, you cant escape from th... | -0.1326 | 0.0 | 0.044 | 0.956 |
| 4 | 2022-03-13 | If it sounds too good to be true, youre not wr... | 0.834 | 0.236 | 0.05 | 0.713 |
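
The Bitcoin and Ethereum loops above differ only in their input, so one way to avoid the duplication is a small helper. The sketch below is an assumption for illustration, not part of the original notebook: `build_sentiment_rows` and `fake_scores` are hypothetical names, and the stub scorer stands in for `analyzer.polarity_scores` so the example runs without NLTK or an API key.

```python
def build_sentiment_rows(articles, score_fn):
    """Return one row per article that has content, scored by score_fn."""
    rows = []
    for article in articles:
        text = article.get("content")
        if text is None:
            # Articles without content cannot be scored, so skip them
            continue
        scores = score_fn(text)
        rows.append({
            "date": article["publishedAt"][:10],
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


def fake_scores(text):
    """Hypothetical stand-in scorer that returns VADER-style keys."""
    return {"compound": 0.0, "pos": 0.0, "neg": 0.0, "neu": 1.0}


# Made-up articles shaped like the API response entries
sample_articles = [
    {"publishedAt": "2022-03-17T13:33:43Z", "content": "Example article text."},
    {"publishedAt": "2022-03-12T18:34:00Z", "content": None},
]
rows = build_sentiment_rows(sample_articles, fake_scores)
print(rows)
```

In the notebook itself, the equivalent call would presumably pass `btc_headlines["articles"]` (or `eth_headlines["articles"]`) and `analyzer.polarity_scores`, then wrap the result in `pd.DataFrame`.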


In [12]:
# Describe the Bitcoin Sentiment
btc_sent_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.071776 | 0.0715 | 0.0474 | 0.88109 |
| std | 0.429123 | 0.069213 | 0.057759 | 0.082574 |
| min | -0.7783 | 0.0 | 0.0 | 0.694 |
| 25% | -0.2736 | 0.0 | 0.0 | 0.8375 |
| 50% | 0.0 | 0.0655 | 0.0 | 0.8915 |
| 75% | 0.4068 | 0.099 | 0.083 | 0.93525 |
| max | 0.91 | 0.301 | 0.187 | 1.0 |


In [13]:
# Describe the Ethereum Sentiment
eth_sent_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.14938 | 0.07732 | 0.04087 | 0.88178 |
| std | 0.431127 | 0.070577 | 0.060132 | 0.083832 |
| min | -0.9136 | 0.0 | 0.0 | 0.688 |
| 25% | 0.0 | 0.0 | 0.0 | 0.83525 |
| 50% | 0.1779 | 0.069 | 0.0 | 0.882 |
| 75% | 0.5106 | 0.126 | 0.06525 | 0.943 |
| max | 0.8625 | 0.29 | 0.312 | 1.0 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum had the highest mean positive score: 0.0773, versus 0.0715 for Bitcoin.

Q: Which coin had the highest compound score?

A: Bitcoin had the highest compound score: 0.91, versus 0.8625 for Ethereum.

Q: Which coin had the highest positive score?

A: Bitcoin had the highest positive score: 0.301, versus 0.29 for Ethereum.
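
These answers can be read directly off the two `describe()` outputs. As a sketch, the snippet below copies the relevant summary values from those outputs into a plain dictionary and picks the coin with the larger value for each metric. The dictionary layout and the `top_coin` helper are assumptions made for illustration.

```python
# Summary values copied from the describe() outputs above
summary = {
    "Bitcoin": {"mean_positive": 0.0715, "max_compound": 0.91, "max_positive": 0.301},
    "Ethereum": {"mean_positive": 0.07732, "max_compound": 0.8625, "max_positive": 0.29},
}


def top_coin(metric):
    """Return the coin with the largest value for the given metric."""
    return max(summary, key=lambda coin: summary[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive"):
    print(f"{metric}: {top_coin(metric)}")
```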

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stop words.
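
The three steps can be sketched with the standard library alone. This is an illustration only: `simple_tokenize` and the tiny `STOP_WORDS` set are hypothetical stand-ins for NLTK's `word_tokenize` and its English stop word list, chosen so the example runs without any downloads. It also skips lemmatization.

```python
import re

# Tiny hypothetical stop word list; NLTK's English list is much longer
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}


def simple_tokenize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    lowered = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return [word for word in letters_only.split() if word not in STOP_WORDS]


tokens = simple_tokenize("Bitcoin is the talk of the town, and Ethereum is rising!")
print(tokens)  # ['bitcoin', 'talk', 'town', 'ethereum', 'rising']
```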

In [14]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\alexm\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [15]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

In [16]:
# Create a list of stopwords
print(stopwords.words('english'))
# Expand the default stopwords list if necessary

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

In [None]:
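# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""

    # Remove the punctuation from text (keep only letters and spaces)
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)

    # Create a tokenized list of the words
    words = word_tokenize(re_clean)

    # Lemmatize words into root words
    lem = [lemmatizer.lemmatize(word) for word in words]

    # Convert the words to lowercase and remove the stop words
    sw = set(stopwords.words('english'))
    tokens = [word.lower() for word in lem if word.lower() not in sw]

    return tokens

In [None]:
# Create a new tokens column for Bitcoin
btc_sent_df["tokens"] = btc_sent_df["text"].apply(tokenizer)
btc_sent_df.head()

In [None]:
# Create a new tokens column for Ethereum
eth_sent_df["tokens"] = eth_sent_df["text"].apply(tokenizer)
eth_sent_df.head()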

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
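
A bigram is a pair of adjacent tokens. The sketch below builds bigrams with `zip` instead of `nltk.ngrams` (an assumption made so the example has no dependencies) and counts them with `Counter`. The token list is made up for illustration.

```python
from collections import Counter

# Made-up token list for illustration
tokens = ["bitcoin", "price", "rise", "bitcoin", "price", "fall"]

# Pair each token with its successor to form bigrams (N = 2)
bigrams = list(zip(tokens, tokens[1:]))
bigram_counts = Counter(bigrams)

# Single-word frequencies use the same Counter on the raw tokens
word_counts = Counter(tokens)

print(bigram_counts.most_common(1))  # [(('bitcoin', 'price'), 2)]
print(word_counts.most_common(2))    # [('bitcoin', 2), ('price', 2)]
```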

In [17]:
from collections import Counter
from nltk import ngrams

In [18]:
def process_text(text):
    """Clean, tokenize, lemmatize, and lowercase text, dropping stop words."""
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output

In [19]:
# Join the article text for each coin into one string before processing;
# passing the raw API response dictionary raises a TypeError
btc_processed = process_text(" ".join(btc_sent_df["text"]))
eth_processed = process_text(" ".join(eth_sent_df["text"]))
print(btc_processed)

In [20]:
# Generate the Bitcoin N-grams where N=2
btc_gram_counts = Counter(ngrams(btc_processed, n=2))
print(dict(btc_gram_counts))

In [None]:
# Generate the Ethereum N-grams where N=2
eth_gram_counts = Counter(ngrams(eth_processed, n=2))
print(dict(eth_gram_counts))

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin
print(token_count(btc_processed))

In [None]:
# Use token_count to get the top 10 words for Ethereum
print(token_count(eth_processed))

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
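# Generate the Bitcoin word cloud from the combined article text
btc_wc = WordCloud().generate(" ".join(btc_sent_df["text"]))
plt.imshow(btc_wc)

In [None]:
# Generate the Ethereum word cloud from the combined article text
eth_wc = WordCloud().generate(" ".join(eth_sent_df["text"]))
plt.imshow(eth_wc)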

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [21]:
import spacy
from spacy import displacy

In [23]:
# Download the language model for spaCy
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.2.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl (13.9 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.2.0
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [25]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
btc_text = " ".join(btc_sent_df["text"])

In [None]:
# Run the NER processor on all of the text
btc_doc = nlp(btc_text)

# Add a title to the document
btc_doc.user_data["title"] = "Bitcoin NER"

In [None]:
# Render the visualization
displacy.render(btc_doc, style="ent")

In [None]:
# List all Entities
for ent in btc_doc.ents:
    print(ent.text, ent.label_)

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
eth_text = " ".join(eth_sent_df["text"])

In [None]:
# Run the NER processor on all of the text
eth_doc = nlp(eth_text)

# Add a title to the document
eth_doc.user_data["title"] = "Ethereum NER"

In [None]:
# Render the visualization
displacy.render(eth_doc, style="ent")

In [None]:
# List all Entities
for ent in eth_doc.ents:
    print(ent.text, ent.label_)

---