# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [News API](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
load_dotenv()
%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\ugokh\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
apiKey = os.getenv("NEWS_API_KEY")

In [3]:
# Create a newsapi client
from newsapi import NewsApiClient
newsApi = NewsApiClient(api_key=apiKey)

In [4]:
# Fetch the Bitcoin news articles
bitcoinArticles = newsApi.get_everything(q="bitcoin", language="en", sort_by="relevancy")

In [5]:
# Fetch the Ethereum news articles
ethereumArticles = newsApi.get_everything(q="ethereum", language="en", sort_by="relevancy")

In [6]:
# Create the Bitcoin sentiment scores DataFrame
btcSentimentScore = []

for article in bitcoinArticles["articles"]:
    try:
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btcSentimentScore.append({
            "text": text,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
        })
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass
    
#btcSentimentScore
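
The Bitcoin and Ethereum loops differ only in their input, so the scoring step could be factored into one helper. The following is a minimal sketch, not the notebook's own code: `build_sentiment_rows` and `fake_score` are hypothetical names, and the scorer is passed in as a callable so the example runs without NLTK. In the notebook, the scorer would be `analyzer.polarity_scores`.

```python
def build_sentiment_rows(articles, score):
    """Return one dict of sentiment scores per article that has text."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles with missing or empty content instead of scoring None
            continue
        scores = score(text)
        rows.append({
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


# Stand-in scorer for illustration only (assumption); it mimics the keys
# returned by VADER's polarity_scores but always reports neutral text.
def fake_score(text):
    return {"compound": 0.0, "pos": 0.0, "neu": 1.0, "neg": 0.0}


demo_articles = [{"content": "Bitcoin rose today."}, {"content": None}]
rows = build_sentiment_rows(demo_articles, fake_score)
print(len(rows))  # 1
```

With the real analyzer, `pd.DataFrame(build_sentiment_rows(bitcoinArticles["articles"], analyzer.polarity_scores))` would yield the same columns as the loop above.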

In [7]:
# Create the BTC DataFrame and reorder its columns for readability (the same is done for ETH below)
btcDf = pd.DataFrame(btcSentimentScore)
colReorder = ["compound", "negative", "neutral", "positive", "text"]
btcDf = btcDf[colReorder]

In [8]:
btcDf

| | compound | negative | neutral | positive | text |
|---|---|---|---|---|---|
| 0 | -0.5574 | 0.098 | 0.902 | 0.0 | New York lawmakers have passed a bill\r\n that... |
| 1 | 0.0772 | 0.0 | 0.964 | 0.036 | Now, even though there are a number of women-f... |
| 2 | -0.0516 | 0.061 | 0.882 | 0.056 | A Bitcoin mining site powered by otherwise los... |
| 3 | -0.1027 | 0.04 | 0.96 | 0.0 | You can now reportedly pay for your burritos a... |
| 4 | 0.34 | 0.0 | 0.928 | 0.072 | Image source, Getty Images\r\nThe value of Bit... |
| 5 | 0.3818 | 0.052 | 0.833 | 0.114 | As a kid, I remember when my father tried to u... |
| 6 | 0.3182 | 0.04 | 0.883 | 0.077 | Customers at Chipotle will now be able to pay ... |
| 7 | 0.7506 | 0.0 | 0.807 | 0.193 | If youve ever felt like introducing some Vegas... |
| 8 | -0.4404 | 0.241 | 0.557 | 0.202 | Cryptocurrency mixers are sometimes used to he... |
| 9 | -0.4767 | 0.103 | 0.897 | 0.0 | Photo Illustration by Grayson Blackmon / The V... |


In [9]:
# Create the Ethereum sentiment scores DataFrame
ethSentimentScore = []

for article in ethereumArticles["articles"]:
    try:
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        ethSentimentScore.append({
            "text": text,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
        })
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass
    
ethSentimentScore

[{'text': "Meta has revealed more of how NFTs will work on Instagram. In the US-based test, you can show what you've bought or created for free by connecting your Instagram account to a compatible digital walle… [+1223 chars]",
  'compound': 0.6486,
  'positive': 0.135,
  'negative': 0.0,
  'neutral': 0.865},
 {'text': 'GameStop has officially thrown itself headlong into the web3 vipers nest with a new app release, though its hard to say whether its proposed population of gamers and game developers will take up the … [+3255 chars]',
  'compound': -0.1027,
  'positive': 0.0,
  'negative': 0.04,
  'neutral': 0.96},
 {'text': 'When Bored Ape Yacht Club creators Yuga Labs announced its Otherside NFT collection would launch on April 30, it was predicted by many to be the biggest NFT launch ever. Otherside is an upcoming Bore… [+6669 chars]',
  'compound': -0.2732,
  'positive': 0.0,
  'negative': 0.055,
  'neutral': 0.945},
 {'text': 'GameStop is going all-in on crypto. The video game retai

In [10]:
# Create the ETH DataFrame and reorder its columns
ethDf = pd.DataFrame(ethSentimentScore)

ethDf = ethDf[colReorder]

ethDf

| | compound | negative | neutral | positive | text |
|---|---|---|---|---|---|
| 0 | 0.6486 | 0.0 | 0.865 | 0.135 | Meta has revealed more of how NFTs will work o... |
| 1 | -0.1027 | 0.04 | 0.96 | 0.0 | GameStop has officially thrown itself headlong... |
| 2 | -0.2732 | 0.055 | 0.945 | 0.0 | When Bored Ape Yacht Club creators Yuga Labs a... |
| 3 | 0.128 | 0.0 | 0.954 | 0.046 | GameStop is going all-in on crypto. The video ... |
| 4 | -0.5574 | 0.098 | 0.902 | 0.0 | New York lawmakers have passed a bill\r\n that... |
| 5 | 0.0258 | 0.0 | 0.966 | 0.034 | DAVOS, Switzerland, May 25 (Reuters) - Ethereu... |
| 6 | 0.6908 | 0.0 | 0.822 | 0.178 | Editorial IndependenceWe want to help you make... |
| 7 | -0.6908 | 0.178 | 0.822 | 0.0 | 40 days ago Bitcoin sold for $47,454. It's pri... |
| 8 | -0.3818 | 0.085 | 0.847 | 0.069 | When Nvidia launched its Ampere Lite Hash Rate... |
| 9 | -0.2732 | 0.063 | 0.937 | 0.0 | May 4 (Reuters) - Bitcoin rose 5.7% to $39,862... |


In [11]:
# Describe the Bitcoin Sentiment
btcDf.describe()

| | compound | negative | neutral | positive |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.09726 | 0.0687 | 0.8777 | 0.05365 |
| std | 0.371016 | 0.055068 | 0.087077 | 0.061442 |
| min | -0.5574 | 0.0 | 0.557 | 0.0 |
| 25% | -0.386825 | 0.04 | 0.858 | 0.0 |
| 50% | -0.18795 | 0.0645 | 0.8885 | 0.0415 |
| 75% | 0.156025 | 0.0875 | 0.92425 | 0.0775 |
| max | 0.7506 | 0.241 | 0.964 | 0.202 |


In [12]:
# Describe the Ethereum Sentiment
ethDf.describe()

| | compound | negative | neutral | positive |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.050445 | 0.04645 | 0.9133 | 0.04035 |
| std | 0.399819 | 0.048242 | 0.050061 | 0.054699 |
| min | -0.6908 | 0.0 | 0.822 | 0.0 |
| 25% | -0.3818 | 0.0 | 0.87475 | 0.0 |
| 50% | -0.1279 | 0.045 | 0.924 | 0.0 |
| 75% | 0.181 | 0.077 | 0.95425 | 0.07275 |
| max | 0.6908 | 0.178 | 1.0 | 0.178 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Bitcoin, with a mean positive score of about 0.054 (Ethereum: about 0.040).

Q: Which coin had the highest compound score?

A: Bitcoin, with a maximum compound score of 0.7506 (Ethereum: 0.6908).

Q: Which coin had the highest positive score?

A: Bitcoin, with a maximum positive score of 0.202 (Ethereum: 0.178).
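
The comparisons behind these answers can be checked mechanically. The sketch below copies the relevant values from the two `describe()` tables above into a plain dictionary and picks the larger one per metric. The `summary` structure and the `leader` helper are illustrative names, not part of the notebook.

```python
# Values copied from the describe() output for each coin
summary = {
    "BTC": {"mean_positive": 0.05365, "max_compound": 0.7506, "max_positive": 0.202},
    "ETH": {"mean_positive": 0.04035, "max_compound": 0.6908, "max_positive": 0.178},
}


def leader(metric):
    """Return the coin with the largest value for the given metric."""
    return max(summary, key=lambda coin: summary[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive"):
    coin = leader(metric)
    print(f"{metric}: {coin} ({summary[coin][metric]})")
```

In the notebook itself, the same numbers are available directly, for example `btcDf["positive"].mean()` and `ethDf["compound"].max()`.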

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stop words.

In [13]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [14]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a set of stop words
sw = set(stopwords.words('english'))

# Expand the default stop word list if necessary
stopWordsExtra = set()  # empty for now; note that {} would create a dict, not a set
sw = sw.union(stopWordsExtra)
sw

{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r

In [15]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove everything except letters and spaces (punctuation, digits, line breaks)
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
   
    # Create a tokenized list of the words
    tknzdWrds = word_tokenize(re_clean)
    
    # Lemmatize words into root words
    lem  = [lemmatizer.lemmatize(word) for word in tknzdWrds]
   
    # Convert the words to lowercase
    # Remove the stop words
    tokens = [word.lower() for word in lem if word.lower() not in sw]
   
    return tokens
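
Two artifacts in the token output appear to come from the order of operations in `tokenizer`: deleting line breaks outright glues neighbouring words together (`imagesthe`), and lemmatizing before the stop word check lets `has` through as `ha`. The sketch below shows one possible reordering. It is an illustration under assumptions, not the notebook's tokenizer: `tokenize_sketch` and `DEMO_STOPWORDS` are hypothetical names, the stop word set is a tiny stand-in for NLTK's list in `sw`, and the lemmatizer is an injectable callable that defaults to a no-op so the example runs without NLTK.

```python
import re

# Tiny illustrative stop word set (assumption); the notebook uses `sw`.
DEMO_STOPWORDS = {"has", "the", "a", "of", "how", "will", "on", "more"}


def tokenize_sketch(text, stop_words=DEMO_STOPWORDS, lemmatize=lambda w: w):
    """Lowercase, strip non-letters, drop stop words, then lemmatize."""
    # Replace every run of non-letters with a space so that
    # "Images\r\nThe" becomes "images the" rather than "imagesthe".
    cleaned = re.sub(r"[^a-z]+", " ", text.lower())
    words = cleaned.split()
    # Filter stop words before lemmatizing, so "has" is removed
    # instead of being reduced to "ha" and slipping past the filter.
    kept = [w for w in words if w not in stop_words]
    return [lemmatize(w) for w in kept]


print(tokenize_sketch("Meta has revealed more of how NFTs will work on Instagram."))
# ['meta', 'revealed', 'nfts', 'work', 'instagram']
```

In the notebook, the call would pass `stop_words=sw` and `lemmatize=lemmatizer.lemmatize`. Note that replacing apostrophes with spaces splits contractions (`you've` becomes `you` and `ve`) instead of merging them.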

In [16]:
# Create a new tokens column for Bitcoin
btcDf['tokens'] = btcDf.text.apply(tokenizer)
btcDf

| | compound | negative | neutral | positive | text | tokens |
|---|---|---|---|---|---|---|
| 0 | -0.5574 | 0.098 | 0.902 | 0.0 | New York lawmakers have passed a bill\r\n that... | [new, york, lawmaker, passed, bill, would, tem... |
| 1 | 0.0772 | 0.0 | 0.964 | 0.036 | Now, even though there are a number of women-f... | [even, though, number, womenfocused, crypto, s... |
| 2 | -0.0516 | 0.061 | 0.882 | 0.056 | A Bitcoin mining site powered by otherwise los... | [bitcoin, mining, site, powered, otherwise, lo... |
| 3 | -0.1027 | 0.04 | 0.96 | 0.0 | You can now reportedly pay for your burritos a... | [reportedly, pay, burrito, taco, bitcoin, digi... |
| 4 | 0.34 | 0.0 | 0.928 | 0.072 | Image source, Getty Images\r\nThe value of Bit... | [image, source, getty, imagesthe, value, bitco... |
| 5 | 0.3818 | 0.052 | 0.833 | 0.114 | As a kid, I remember when my father tried to u... | [kid, remember, father, tried, use, broom, han... |
| 6 | 0.3182 | 0.04 | 0.883 | 0.077 | Customers at Chipotle will now be able to pay ... | [customers, chipotle, able, pay, burrito, cryp... |
| 7 | 0.7506 | 0.0 | 0.807 | 0.193 | If youve ever felt like introducing some Vegas... | [youve, ever, felt, like, introducing, vegasst... |
| 8 | -0.4404 | 0.241 | 0.557 | 0.202 | Cryptocurrency mixers are sometimes used to he... | [cryptocurrency, mixer, sometimes, used, help,... |
| 9 | -0.4767 | 0.103 | 0.897 | 0.0 | Photo Illustration by Grayson Blackmon / The V... | [photo, illustration, grayson, blackmon, verge... |


In [17]:
# Create a new tokens column for Ethereum
ethDf['tokens'] = ethDf.text.apply(tokenizer)
ethDf

| | compound | negative | neutral | positive | text | tokens |
|---|---|---|---|---|---|---|
| 0 | 0.6486 | 0.0 | 0.865 | 0.135 | Meta has revealed more of how NFTs will work o... | [meta, ha, revealed, nfts, work, instagram, us... |
| 1 | -0.1027 | 0.04 | 0.96 | 0.0 | GameStop has officially thrown itself headlong... | [gamestop, ha, officially, thrown, headlong, w... |
| 2 | -0.2732 | 0.055 | 0.945 | 0.0 | When Bored Ape Yacht Club creators Yuga Labs a... | [bored, ape, yacht, club, creator, yuga, labs,... |
| 3 | 0.128 | 0.0 | 0.954 | 0.046 | GameStop is going all-in on crypto. The video ... | [gamestop, going, allin, crypto, video, game, ... |
| 4 | -0.5574 | 0.098 | 0.902 | 0.0 | New York lawmakers have passed a bill\r\n that... | [new, york, lawmaker, passed, bill, would, tem... |
| 5 | 0.0258 | 0.0 | 0.966 | 0.034 | DAVOS, Switzerland, May 25 (Reuters) - Ethereu... | [davos, switzerland, may, reuters, ethereums, ... |
| 6 | 0.6908 | 0.0 | 0.822 | 0.178 | Editorial IndependenceWe want to help you make... | [editorial, independencewe, want, help, make, ... |
| 7 | -0.6908 | 0.178 | 0.822 | 0.0 | 40 days ago Bitcoin sold for $47,454. It's pri... | [day, ago, bitcoin, sold, price, drop, third, ... |
| 8 | -0.3818 | 0.085 | 0.847 | 0.069 | When Nvidia launched its Ampere Lite Hash Rate... | [nvidia, launched, ampere, lite, hash, rate, l... |
| 9 | -0.2732 | 0.063 | 0.937 | 0.0 | May 4 (Reuters) - Bitcoin rose 5.7% to $39,862... | [may, reuters, bitcoin, rose, wednesday, addin... |


---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
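
For intuition, a bigram is just a token paired with its successor, so the two steps above can be sketched with the standard library alone. This is an illustration with a made-up token list, not the notebook's approach, which uses `nltk.ngrams`; `bigram_counts` is a hypothetical helper name.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs (n-grams with n = 2)."""
    # zip pairs each token with the one after it: (t0, t1), (t1, t2), ...
    return Counter(zip(tokens, tokens[1:]))


# Made-up tokens for illustration
tokens = ["bitcoin", "mining", "site", "bitcoin", "mining", "operation"]

counts = bigram_counts(tokens)
print(counts.most_common(1))  # [(('bitcoin', 'mining'), 2)]

# Word frequencies use the same Counter on the single tokens
print(Counter(tokens).most_common(2))
```

When the article texts are concatenated with `str.cat()` and no separator, the last word of one article fuses with the first word of the next (tokens such as `charsnow`); passing `sep=" "` to `str.cat` would likely keep them apart.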

In [18]:
from collections import Counter
from nltk import ngrams

In [51]:
# Generate the Bitcoin N-grams where N=2
btcNgram = ngrams(tokenizer(btcDf.text.str.cat()), n=2)

btcNgramCount = Counter(btcNgram)

In [46]:
print(dict(btcNgramCount))

{('new', 'york'): 1, ('york', 'lawmaker'): 1, ('lawmaker', 'passed'): 1, ('passed', 'bill'): 1, ('bill', 'would'): 1, ('would', 'temporarily'): 1, ('temporarily', 'ban'): 1, ('ban', 'new'): 1, ('new', 'bitcoin'): 1, ('bitcoin', 'mining'): 2, ('mining', 'operation'): 1, ('operation', 'early'): 1, ('early', 'friday'): 1, ('friday', 'state'): 1, ('state', 'senator'): 1, ('senator', 'voted'): 1, ('voted', 'pas'): 1, ('pas', 'legislation'): 1, ('legislation', 'bound'): 1, ('bound', 'desk'): 1, ('desk', 'charsnow'): 1, ('charsnow', 'even'): 1, ('even', 'though'): 1, ('though', 'number'): 1, ('number', 'womenfocused'): 1, ('womenfocused', 'crypto'): 1, ('crypto', 'space'): 1, ('space', 'odeniran'): 1, ('odeniran', 'say'): 1, ('say', 'woman'): 1, ('woman', 'still'): 1, ('still', 'underrepresented'): 1, ('underrepresented', 'ive'): 1, ('ive', 'space'): 1, ('space', 'im'): 1, ('im', 'black'): 1, ('black', 'person'): 1, ('person', 'woman'): 1, ('woman', 'b'): 1, ('b', 'charsa'): 1, ('charsa', 'bi

In [50]:
# Generate the Ethereum N-grams where N=2
ethNgram = ngrams(tokenizer(ethDf.text.str.cat()), n=2)
ethNgramCount = Counter(ethNgram)
print(dict(ethNgramCount))

{('meta', 'ha'): 1, ('ha', 'revealed'): 1, ('revealed', 'nfts'): 1, ('nfts', 'work'): 1, ('work', 'instagram'): 1, ('instagram', 'usbased'): 1, ('usbased', 'test'): 1, ('test', 'show'): 1, ('show', 'youve'): 1, ('youve', 'bought'): 1, ('bought', 'created'): 1, ('created', 'free'): 1, ('free', 'connecting'): 1, ('connecting', 'instagram'): 1, ('instagram', 'account'): 1, ('account', 'compatible'): 1, ('compatible', 'digital'): 1, ('digital', 'walle'): 1, ('walle', 'charsgamestop'): 1, ('charsgamestop', 'ha'): 1, ('ha', 'officially'): 1, ('officially', 'thrown'): 1, ('thrown', 'headlong'): 1, ('headlong', 'web'): 1, ('web', 'viper'): 1, ('viper', 'nest'): 1, ('nest', 'new'): 1, ('new', 'app'): 1, ('app', 'release'): 1, ('release', 'though'): 1, ('though', 'hard'): 1, ('hard', 'say'): 1, ('say', 'whether'): 1, ('whether', 'proposed'): 1, ('proposed', 'population'): 1, ('population', 'gamers'): 1, ('gamers', 'game'): 1, ('game', 'developer'): 1, ('developer', 'take'): 1, ('take', 'charswhe

In [21]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [55]:
# Use token_count to get the top 10 words for Bitcoin
btcTopTen = token_count(tokenizer(btcDf.text.str.cat()), 10)
btcTopTen

[('bitcoin', 10),
 ('new', 7),
 ('cryptocurrency', 7),
 ('world', 6),
 ('week', 6),
 ('reuters', 5),
 ('biggest', 4),
 ('charsmay', 4),
 ('token', 4),
 ('would', 3)]

In [58]:
# Use token_count to get the top 10 words for Ethereum
ethTopTen = token_count(tokenizer(ethDf.text.str.cat()), 10)

ethTopTen

[('cryptocurrency', 11),
 ('bitcoin', 7),
 ('ha', 6),
 ('world', 5),
 ('digital', 4),
 ('nft', 4),
 ('biggest', 4),
 ('reuters', 4),
 ('market', 4),
 ('char', 4)]

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [24]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [25]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [26]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
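
One way the two cells above might be filled in is sketched below. It is a sketch under assumptions rather than the intended solution: `cloud_frequencies` and `draw_cloud` are hypothetical helper names, and the plotting libraries are imported lazily so the frequency step can run on its own. The `WordCloud` size arguments are arbitrary choices.

```python
from collections import Counter


def cloud_frequencies(tokens, top_n=50):
    """Map the top_n most common tokens to their counts."""
    return dict(Counter(tokens).most_common(top_n))


def draw_cloud(frequencies, title):
    """Render a word cloud from a token -> count mapping."""
    # Imported here so cloud_frequencies works without plotting libraries
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400).generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.title(title)
    plt.show()


# Hypothetical usage in the notebook (names follow the cells above):
# draw_cloud(cloud_frequencies(tokenizer(btcDf.text.str.cat())), "Bitcoin Word Cloud")
# draw_cloud(cloud_frequencies(tokenizer(ethDf.text.str.cat())), "Ethereum Word Cloud")
```

Feeding the cloud with tokens from `tokenizer` rather than raw text keeps stop words out of the picture; noise tokens such as `char` could be added to `stopWordsExtra` first.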

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [27]:
import spacy
from spacy import displacy

In [28]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [29]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [30]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [31]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [32]:
# Render the visualization
# YOUR CODE HERE!

In [33]:
# List all Entities
# YOUR CODE HERE!
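
The four Bitcoin NER steps above might look roughly like the sketch below, and the Ethereum cells would follow the same pattern. This is an illustration, not the intended solution: `list_entities` and `run_ner` are hypothetical helper names, and spaCy is imported lazily so the entity-listing helper can be read and run in isolation.

```python
def list_entities(doc):
    """Return (text, label) pairs for every entity found in `doc`."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def run_ner(text, title, model="en_core_web_sm"):
    """Run spaCy NER on `text` and attach a title for displaCy."""
    # Imported here so list_entities can be used without spaCy installed
    import spacy

    nlp = spacy.load(model)
    doc = nlp(text)
    # displaCy reads an optional title from doc.user_data
    doc.user_data["title"] = title
    return doc


# Hypothetical usage in the notebook (names follow the cells above):
# btcText = btcDf.text.str.cat(sep=" ")
# btcDoc = run_ner(btcText, "Bitcoin NER")
# displacy.render(btcDoc, style="ent")
# list_entities(btcDoc)
```

Since `nlp` is already loaded in an earlier cell, calling `nlp(btcText)` directly would avoid loading the model a second time.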

---

### Ethereum NER

In [34]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [35]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [36]:
# Render the visualization
# YOUR CODE HERE!

In [37]:
# List all Entities
# YOUR CODE HERE!

---