# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?
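
One way to read the answers straight from the data, rather than by comparing two `describe()` tables by eye, is to put the relevant statistics side by side. The sketch below uses small made-up DataFrames (`btc_demo` and `eth_demo` are hypothetical stand-ins for the real sentiment DataFrames), so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical scores standing in for the real Bitcoin and Ethereum DataFrames
btc_demo = pd.DataFrame({"positive": [0.10, 0.00, 0.30], "negative": [0.00, 0.20, 0.05]})
eth_demo = pd.DataFrame({"positive": [0.15, 0.15, 0.16], "negative": [0.10, 0.00, 0.00]})

frames = {"bitcoin": btc_demo, "ethereum": eth_demo}

# One row per coin, one column per statistic the questions ask about
summary = pd.DataFrame({
    "mean_positive": {coin: df["positive"].mean() for coin, df in frames.items()},
    "max_negative": {coin: df["negative"].max() for coin, df in frames.items()},
    "max_positive": {coin: df["positive"].max() for coin, df in frames.items()},
})

# idxmax() returns, for each statistic, the coin with the largest value
winners = summary.idxmax()
print(summary)
print(winners)
```

With the real data, the same pattern should work by swapping in the two sentiment DataFrames built later in this section.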

In [19]:
# Initial imports
import os
from pathlib import Path

import nltk
import pandas as pd
from dotenv import load_dotenv
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon and load variables from the local .env file
nltk.download('vader_lexicon')
load_dotenv()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\lucas\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [20]:
analyzer = SentimentIntensityAnalyzer()

In [21]:
# Read your api key environment variable
api_key = os.getenv("news_api")
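
`os.getenv` returns `None` when the variable is missing, and the failure then only surfaces later, when the client makes a request. A small guard can fail early with a clearer message. This is a sketch only: `require_env` and the `demo_news_api` variable are hypothetical names, not part of the notebook.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demo with a throwaway variable so the example runs anywhere
os.environ["demo_news_api"] = "demo-key"
print(require_env("demo_news_api"))
```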

In [22]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [24]:
# Fetch the Bitcoin news articles
btc_headlines = newsapi.get_everything(
    q="bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

# Show sample article
btc_headlines["articles"][0]


{'source': {'id': 'wired', 'name': 'Wired'},
 'author': 'Khari Johnson',
 'title': 'Why Not Use Self-Driving Cars as Supercomputers?',
 'description': 'Autonomous vehicles use the equivalent of 200 laptops to get around. Some want to tap that computing power to decode viruses or mine bitcoin.',
 'url': 'https://www.wired.com/story/use-self-driving-cars-supercomputers/',
 'urlToImage': 'https://media.wired.com/photos/60f081b4c147fe7a1a367362/191:100/w_1280,c_limit/Business-Autonomous-Vehicles-Supercomputers-1201885684.jpg',
 'publishedAt': '2021-07-19T11:00:00Z',
 'content': 'Like Dogecoin devotees, the mayor of Reno, and the leaders of El Salvador, Aldo Baoicchi is convinced cryptocurrency is the future. The CEO and founder of Canadian scooter maker Daymak believes this … [+4116 chars]'}

In [25]:
# Fetch the Ethereum news articles
eth_headlines = newsapi.get_everything(
    q="ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

# Show sample article
eth_headlines["articles"][0]

{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
 'author': 'Connie Loizos',
 'title': 'Crypto investors like Terraform Labs so much, they’re committing $150 million to its ‘ecosystem’',
 'description': 'There are many blockchain platforms competing for investors’ and developers’ attention right now, from the big daddy of them all, Ethereum, to so-called “Ethereum Killers” like Solana, which we wrote about in May. Often, these technologies are seen as so prom…',
 'url': 'http://techcrunch.com/2021/07/16/crypto-investors-like-terraform-labs-so-much-theyre-committing-150-million-to-its-ecosystem/',
 'urlToImage': 'https://techcrunch.com/wp-content/uploads/2020/06/GettyImages-1174590894.jpg?w=667',
 'publishedAt': '2021-07-16T16:00:55Z',
 'content': 'There are many blockchain platforms competing for investors’ and developers’ attention right now, from the big daddy of them all, Ethereum, to so-called “Ethereum Killers” like Solana, which we wrote… [+2563 chars]'}

In [26]:
# Create the Bitcoin sentiment scores DataFrame
btc_sentiments = []

for article in btc_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btc_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
btc_df = pd.DataFrame(btc_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
btc_df = btc_df[cols]

btc_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-07-19 | Like Dogecoin devotees, the mayor of Reno, and... | 0.6908 | 0.178 | 0.0 | 0.822 |
| 1 | 2021-07-05 | Filed under:\r\nThe supply chain attack has re... | -0.5719 | 0.111 | 0.184 | 0.705 |
| 2 | 2021-07-05 | image copyrightGetty Images\r\nThe gang behind... | -0.6124 | 0.0 | 0.143 | 0.857 |
| 3 | 2021-07-23 | To get a roundup of TechCrunchs biggest and mo... | 0.624 | 0.127 | 0.0 | 0.873 |
| 4 | 2021-07-14 | While retail investors grew more comfortable b... | 0.7264 | 0.164 | 0.0 | 0.836 |


In [27]:
# Create the Ethereum sentiment scores DataFrame
eth_sentiments = []

for article in eth_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
eth_df = pd.DataFrame(eth_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
eth_df = eth_df[cols]

eth_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-07-16 | There are many blockchain platforms competing ... | 0.3612 | 0.075 | 0.0 | 0.925 |
| 1 | 2021-07-14 | While retail investors grew more comfortable b... | 0.7264 | 0.164 | 0.0 | 0.836 |
| 2 | 2021-07-02 | Bitcoin and Ethereum\r\nYuriko Nakao\r\nEther ... | 0.3612 | 0.11 | 0.041 | 0.849 |
| 3 | 2021-07-17 | "Anthony Di Iorio, a co-founder of the Ethereu... | 0.6369 | 0.157 | 0.0 | 0.843 |
| 4 | 2021-07-05 | Ether holders have "staked" more than $13 bill... | 0.7717 | 0.194 | 0.0 | 0.806 |
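
The Bitcoin and Ethereum cells above repeat the same loop, so the logic could be pulled into one helper. The sketch below is one possible shape, not the notebook's own code: `build_sentiment_rows` and `demo_scorer` are hypothetical names, and the stub scorer stands in for `analyzer.polarity_scores` so the example runs without the VADER lexicon. It skips articles with no `content` explicitly instead of relying on `except AttributeError`.

```python
def build_sentiment_rows(articles, score_fn):
    """Turn a list of article dicts into rows of sentiment scores.

    score_fn is any callable that returns a dict with the keys
    "compound", "pos", "neu", and "neg" for a piece of text.
    """
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles that carry no content to score
            continue
        scores = score_fn(text)
        rows.append({
            "date": article["publishedAt"][:10],
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


def demo_scorer(text):
    """Hypothetical stand-in for analyzer.polarity_scores (fixed scores)."""
    return {"compound": 0.5, "pos": 0.2, "neu": 0.8, "neg": 0.0}


demo_articles = [
    {"content": "Ether climbs.", "publishedAt": "2021-07-16T16:00:55Z"},
    {"content": None, "publishedAt": "2021-07-17T09:00:00Z"},
]
rows = build_sentiment_rows(demo_articles, demo_scorer)
print(rows)
```

With the notebook's objects, the call would presumably look like `build_sentiment_rows(btc_headlines["articles"], analyzer.polarity_scores)`, and the result can be passed to `pd.DataFrame`.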


In [28]:
# Describe the Bitcoin Sentiment
btc_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.046143 | 0.04557 | 0.03417 | 0.92025 |
| std | 0.374973 | 0.060143 | 0.059515 | 0.08208 |
| min | -0.8271 | 0.0 | 0.0 | 0.653 |
| 25% | -0.0516 | 0.0 | 0.0 | 0.86775 |
| 50% | 0.0 | 0.0 | 0.0 | 0.9265 |
| 75% | 0.30155 | 0.08025 | 0.06725 | 1.0 |
| max | 0.8658 | 0.276 | 0.287 | 1.0 |


In [29]:
# Describe the Ethereum Sentiment
eth_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.152496 | 0.05401 | 0.02183 | 0.92417 |
| std | 0.340071 | 0.059799 | 0.04313 | 0.070174 |
| min | -0.8126 | 0.0 | 0.0 | 0.714 |
| 25% | 0.0 | 0.0 | 0.0 | 0.86425 |
| 50% | 0.0 | 0.042 | 0.0 | 0.927 |
| 75% | 0.46105 | 0.09475 | 0.04125 | 1.0 |
| max | 0.7717 | 0.194 | 0.249 | 1.0 |


### Questions:

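Q: Which coin had the highest mean positive score?

A: Ethereum (mean positive score of 0.054, versus 0.046 for Bitcoin).

Q: Which coin had the highest compound score?

A: Bitcoin (maximum compound score of 0.8658, versus 0.7717 for Ethereum). Ethereum has the higher mean compound score, 0.152 versus 0.046.

Q: Which coin had the highest positive score?

A: Bitcoin (maximum positive score of 0.276, versus 0.194 for Ethereum).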

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.
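
The three steps above can be sketched without any downloads. This version swaps in a plain `str.split` and a tiny hand-written stopword set (`DEMO_STOPWORDS` is an assumption for illustration) in place of NLTK's `word_tokenize` and its English stopword list, so it shows the order of operations rather than the notebook's exact tokenizer.

```python
from string import punctuation

# Tiny illustrative stopword set; NLTK's English list is much longer
DEMO_STOPWORDS = {"the", "of", "and", "a", "is", "to", "in"}


def simple_tokenize(text, stop_words=DEMO_STOPWORDS):
    """Lowercase the text, strip punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", punctuation))
    return [word for word in no_punct.split() if word not in stop_words]


tokens = simple_tokenize("The mayor of Reno, and the leaders of El Salvador.")
print(tokens)
```

Lowercasing before the stopword check matters: a capitalized "The" would otherwise slip past a lowercase stopword list.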

In [41]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import reuters
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import nltk
import re

In [42]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Download the NLTK resources used in this section
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('reuters')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\lucas\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\lucas\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package reuters to
[nltk_data]     C:\Users\lucas\AppData\Roaming\nltk_data...
[nltk_data]   Package reuters is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\lucas\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [88]:
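# Pull the article text for each coin.
# Series.to_string() returns the Series' printed form (row index plus text
# truncated with "..."), so every later step that uses *_text_only works on
# shortened snippets rather than the full article content.
btc_text = btc_df['text']
btc_text_only = btc_text.to_string()
eth_text = eth_df['text']
eth_text_only = eth_text.to_string()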
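
If the full article text is wanted instead of the printed form, joining the Series values is one option. A minimal sketch with a hypothetical two-row Series (`demo_text` is an assumption, standing in for a real `text` column):

```python
import pandas as pd

# Hypothetical stand-in for a DataFrame's 'text' column
demo_text = pd.Series([
    "Bitcoin rallied on Monday after a volatile weekend of trading.",
    "Ether holders have staked more than $13 billion.",
])

# to_string() renders the printed form, so the row labels become part of the text
printed = demo_text.to_string()

# Joining the values keeps only the article text
joined = " ".join(demo_text)

print(printed)
print(joined)
```

Switching the notebook to the joined form would change every downstream result (tokens, counts, and entities), so the outputs shown in this document would need to be regenerated.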

In [89]:
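# Split each article into sentences (one list of sentences per article).
# Iterate over the Series of articles, not over the characters of a string.
btc_tokenized = [sent_tokenize(article) for article in btc_text]
eth_tokenized = [sent_tokenize(article) for article in eth_text]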

In [96]:
btc_word_tokenized = []

for story in btc_tokenized:
    words = []
    for sent in story:
        words = words + word_tokenize(sent)
    btc_word_tokenized.append(words)

In [97]:
eth_word_tokenized = []

for story in eth_tokenized:
    words = []
    for sent in story:
        words = words + word_tokenize(sent)
    eth_word_tokenized.append(words)

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
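
As a dependency-free illustration of both steps, bigrams are just consecutive token pairs, which `zip` can produce (this yields the same pairs as `nltk.ngrams(tokens, 2)`), and `Counter` handles the frequency count. The token list below is made up for the example.

```python
from collections import Counter

# Hypothetical cleaned tokens, standing in for the output of the text-processing step
tokens = ["virtual", "currency", "bitcoin", "virtual", "currency", "ethereum"]

# Pair each token with its successor to form the N=2 n-grams
bigrams = list(zip(tokens, tokens[1:]))

# Count how often each bigram and each single word occurs
top_bigrams = Counter(bigrams).most_common(1)
top_words = Counter(tokens).most_common(2)

print(top_bigrams)
print(top_words)
```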

In [62]:
from collections import Counter
from nltk import ngrams

In [98]:
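def process_text(doc):
    """Clean, tokenize, and lemmatize `doc`, then drop English stopwords."""
    sw = set(stopwords.words('english'))
    # Keep only letters and spaces in the text passed in as `doc`
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', doc)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    # Lowercase each token and filter out stopwords
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output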

In [100]:
# Clean and tokenize the Bitcoin text
btc_processed = process_text(btc_text_only)
print(btc_processed)

['like', 'dogecoin', 'devotee', 'mayor', 'reno', 'filed', 'underrnthe', 'supply', 'chain', 'attack', 'ha', 'image', 'copyrightgetty', 'imagesrnthe', 'gang', 'behind', 'get', 'roundup', 'techcrunchs', 'biggest', 'mo', 'retail', 'investor', 'grew', 'comfortable', 'b', 'longtime', 'techcrunch', 'reader', 'know', 'well', 'mich', 'representation', 'virtual', 'currency', 'bitcoin', 'representation', 'virtual', 'cryptocurrency', 'reuters', 'staffrnfile', 'photo', 'representation', 'james', 'martincnetrna', 'uk', 'man', 'wa', 'arrested', 'reuters', 'staffrnjune', 'reuters', 'bitcoi', 'representation', 'virtual', 'currency', 'bitcoin', 'reuters', 'staffrnjune', 'reuters', 'bitcoi', 'reuters', 'staffrnfile', 'photo', 'representati', 'representation', 'virtual', 'currency', 'bitcoin', 'representations', 'cryptocurrencies', 'bitcoin', 'e', 'opinions', 'expressed', 'entrepreneur', 'contributor', 'opinions', 'expressed', 'entrepreneur', 'contributor', 'representations', 'virtual', 'currency', 'bitco

In [101]:
# Clean and tokenize the Ethereum text
eth_processed = process_text(eth_text_only)
print(eth_processed)


In [104]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [105]:
# Use token_count to get the top 10 words for Bitcoin
token_count(btc_processed)

[('reuters', 26),
 ('bitcoin', 24),
 ('representation', 11),
 ('virtual', 9),
 ('cryptocurrency', 9),
 ('representations', 8),
 ('july', 8),
 ('currency', 6),
 ('e', 6),
 ('ha', 5)]

In [106]:
# Use token_count to get the top 10 words for Ethereum
token_count(eth_processed)


---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [111]:
from wordcloud import WordCloud
from nltk.util import ngrams
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [112]:
# Generate the Bitcoin word cloud
# WordCloud.generate() expects a single string, so join the tokens first
btc_wc = WordCloud().generate(" ".join(btc_processed))
plt.imshow(btc_wc)

In [110]:
# Generate the Ethereum word cloud
# Use the Ethereum tokens, joined into a single string
eth_wc = WordCloud().generate(" ".join(eth_processed))
plt.imshow(eth_wc)

---
## 3. Named Entity Recognition

In this section, you will run named entity recognition (NER) on the Bitcoin and Ethereum text with spaCy's pretrained `en_core_web_sm` model, then visualize the tags with displaCy.
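
Beyond the visual rendering, it can help to list the entities grouped by label. The sketch below works on plain `(text, label)` pairs so it runs without a spaCy model. The pairs are hand-written for illustration and are not model output, and `group_entities` is a hypothetical helper. With a processed spaCy document, the pairs would come from `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict


def group_entities(pairs):
    """Group entity texts by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Hand-written example pairs (not produced by a model)
demo_pairs = [
    ("Reno", "GPE"),
    ("El Salvador", "GPE"),
    ("Daymak", "ORG"),
    ("Reno", "GPE"),
]
entities_by_label = group_entities(demo_pairs)
print(entities_by_label)
```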

In [115]:
import spacy
from spacy import displacy

In [113]:
# Download the language model for spaCy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.1.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0-py3-none-any.whl (13.6 MB)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [116]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [117]:
# Show the concatenated Bitcoin text (built earlier with to_string())
btc_text_only

'0     Like Dogecoin devotees, the mayor of Reno, and...\n1     Filed under:\\r\\nThe supply chain attack has re...\n2     image copyrightGetty Images\\r\\nThe gang behind...\n3     To get a roundup of TechCrunchs biggest and mo...\n4     While retail investors grew more comfortable b...\n5     As longtime TechCrunch readers know well, Mich...\n6     A representation of virtual currency Bitcoin i...\n7     A representation of the virtual cryptocurrency...\n8     By Reuters Staff\\r\\nFILE PHOTO: Representation...\n9     James Martin/CNET\\r\\nA UK man was arrested in ...\n10    By Reuters Staff\\r\\nJune 25 (Reuters) - Bitcoi...\n11    A representations of virtual currency Bitcoin ...\n12    By Reuters Staff\\r\\nJune 25 (Reuters) - Bitcoi...\n13    By Reuters Staff\\r\\nFILE PHOTO: A representati...\n14    A representation of virtual currency bitcoin i...\n15    Representations of cryptocurrencies Bitcoin, E...\n16    Opinions expressed by Entrepreneur contributor...\n17    Opinions e

In [118]:
# Run the NER processor on all of the text
btc_doc = nlp(btc_text_only)
# Add a title to the document
btc_doc.user_data["title"] = "Bitcoin NER"
displacy.render(btc_doc, style='ent')

---

### Ethereum NER

In [122]:
# Show the concatenated Ethereum text (built earlier with to_string())
eth_text_only

'0     There are many blockchain platforms competing ...\n1     While retail investors grew more comfortable b...\n2     Bitcoin and Ethereum\\r\\nYuriko Nakao\\r\\nEther ...\n3     "Anthony Di Iorio, a co-founder of the Ethereu...\n4     Ether holders have "staked" more than $13 bill...\n5     Ether is the cryptocurrency of the ethereum ne...\n6     Major upgrades to the ethereum network could h...\n7     You’ve likely seen the headlines surrounding t...\n8     While the ambitions of crypto investors have s...\n9     Personal Finance Insider writes about products...\n10    This article was translated from our Spanish e...\n11    By Reuters Staff\\r\\nJune 25 (Reuters) - Bitcoi...\n12    By Reuters Staff\\r\\nJune 25 (Reuters) - Bitcoi...\n13    After a successful testnet deployement, the Lo...\n14    By Reuters Staff\\r\\nFILE PHOTO: Representation...\n15    At this point the average Hackaday reader is l...\n16    A representations of virtual currency Bitcoin ...\n17    A representati

In [123]:
# Run the NER processor on all of the text
eth_doc = nlp(eth_text_only)
# Add a title to the document
eth_doc.user_data["title"] = "Ethereum NER"
displacy.render(eth_doc, style='ent')


---