# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?
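The `compound` column compared below is VADER's single summary score. VADER sums the valence of each word (after its rule-based adjustments) and squashes the total into [-1, 1] with `x / sqrt(x**2 + alpha)`. The sketch below writes out that normalization step, assuming VADER's default `alpha = 15`. The function is written here for illustration rather than imported from NLTK.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash a summed VADER valence into [-1, 1] (sketch, assuming alpha=15)."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp defensively; for finite scores |norm_score| is already below 1
    return max(-1.0, min(1.0, norm_score))


# Larger raw valence sums approach +1 or -1 but never reach them
for raw in (0.0, 1.0, 4.0, -4.0, 20.0):
    print(raw, round(normalize(raw), 4))
```

Because the curve flattens quickly, a handful of strongly positive words is enough to push `compound` close to 1, which is worth keeping in mind when comparing maxima between coins.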

In [92]:
# Initial imports
import numpy as np
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Harrison\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [70]:
# Read your api key environment variable
load_dotenv("key.env")
api_key = os.getenv("NEWS_API")

In [71]:
# Create a newsapi client
#!pip install newsapi-python
from newsapi.newsapi_client import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

In [72]:
# Fetch the Bitcoin news articles
Bitcoin_news_en = newsapi.get_everything(
    q="Bitcoin",
    language="en",
    sort_by="relevancy"
)

In [73]:
# Fetch the Ethereum news articles
Ethereum_news_en = newsapi.get_everything(
    q="Ethereum",
    language="en",
    sort_by="relevancy"
)
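The `describe()` output further down reports a count of 20 articles per coin, i.e. one page of results. `NewsApiClient.get_everything` accepts a `page_size` argument, so a helper like the hypothetical `fetch_articles` below can request a larger page. It assumes 100 is the largest page size NewsAPI allows and that your API plan permits it; treat it as a sketch.

```python
from typing import Any, Dict, List


def fetch_articles(client: Any, query: str, page_size: int = 100) -> List[Dict[str, Any]]:
    """Return the `articles` list for `query` from a NewsApiClient-like object.

    `fetch_articles` is a hypothetical helper; `page_size` is forwarded to
    `get_everything` (100 is assumed to be NewsAPI's maximum page size).
    """
    response = client.get_everything(
        q=query,
        language="en",
        sort_by="relevancy",
        page_size=page_size,
    )
    # Fall back to an empty list if the response carries no articles
    return response.get("articles", [])
```

Usage with the client created above: `bitcoin_articles = fetch_articles(newsapi, "Bitcoin")`.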

In [74]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_sentiments = []

for article in Bitcoin_news_en["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        bitcoin_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
bitcoin_df = bitcoin_df[cols]

bitcoin_df.head()

|   | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-05-12 | Just weeks after Tesla started accepting Bitco... | 0.3818 | 0.071 | 0.0 | 0.929 |
| 1 | 2021-06-09 | El Salvador's President Nayib Bukele has made ... | 0.8402 | 0.282 | 0.0 | 0.718 |
| 2 | 2021-05-12 | Image: Tesla\r\n\n \n\n Tesla has stopped acce... | 0.4939 | 0.134 | 0.05 | 0.816 |
| 3 | 2021-06-09 | El Salvador has become the first country in th... | 0.128 | 0.043 | 0.0 | 0.957 |
| 4 | 2021-05-19 | Illustration by Alex Castro / The Verge\r\n\n ... | 0.0 | 0.0 | 0.0 | 1.0 |


In [75]:
# Create the Ethereum sentiment scores DataFrame
ethereum_sentiments = []

for article in Ethereum_news_en["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        ethereum_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
ethereum_df = pd.DataFrame(ethereum_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
ethereum_df = ethereum_df[cols]

ethereum_df.head()

|   | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-05-12 | Vitalik Buterin, the creator of Ethereum, on W... | 0.2263 | 0.06 | 0.0 | 0.94 |
| 1 | 2021-05-15 | Solana isn’t known yet outside of the crypto c... | 0.4019 | 0.083 | 0.0 | 0.917 |
| 2 | 2021-05-19 | Bitcoin, Ethereum and a host of Altcoins suffe... | -0.2023 | 0.066 | 0.087 | 0.847 |
| 3 | 2021-05-20 |  | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2021-05-31 | A representation of virtual currency Ethereum ... | 0.0 | 0.0 | 0.0 | 1.0 |
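The Bitcoin and Ethereum cells repeat the same loop. A sketch of one reusable function, `sentiment_frame` (a hypothetical name), that takes any article list plus any object with a `polarity_scores` method, such as the `analyzer` created at the top of the notebook. Unlike the loops above, which rely on `except AttributeError`, it skips articles whose `content` is not a string explicitly.

```python
from typing import Any, Dict, Iterable, List

import pandas as pd


def sentiment_frame(articles: Iterable[Dict[str, Any]], analyzer: Any) -> pd.DataFrame:
    """Score each article's `content` and return one row per article."""
    rows: List[Dict[str, Any]] = []
    for article in articles:
        text = article.get("content")
        # Skip missing content instead of catching the analyzer's error
        if not isinstance(text, str):
            continue
        scores = analyzer.polarity_scores(text)
        rows.append({
            "date": article["publishedAt"][:10],  # keep YYYY-MM-DD only
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    columns = ["date", "text", "compound", "positive", "negative", "neutral"]
    # Passing `columns` fixes the order and still works when `rows` is empty
    return pd.DataFrame(rows, columns=columns)
```

Usage: `bitcoin_df = sentiment_frame(Bitcoin_news_en["articles"], analyzer)` and `ethereum_df = sentiment_frame(Ethereum_news_en["articles"], analyzer)`.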


In [76]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

|   | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.27852 | 0.09775 | 0.0355 | 0.8668 |
| std | 0.470441 | 0.079796 | 0.051382 | 0.07632 |
| min | -0.7627 | 0.0 | 0.0 | 0.718 |
| 25% | 0.096 | 0.05575 | 0.0 | 0.819 |
| 50% | 0.3818 | 0.071 | 0.0 | 0.876 |
| 75% | 0.577875 | 0.15525 | 0.068 | 0.929 |
| max | 0.8455 | 0.282 | 0.18 | 1.0 |


In [77]:
# Describe the Ethereum Sentiment
ethereum_df.describe()

|   | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.01149 | 0.0453 | 0.0432 | 0.8615 |
| std | 0.373397 | 0.050086 | 0.0784 | 0.218677 |
| min | -0.8689 | 0.0 | 0.0 | 0.0 |
| 25% | -0.236725 | 0.0 | 0.0 | 0.83825 |
| 50% | 0.0 | 0.0555 | 0.0 | 0.9245 |
| 75% | 0.2263 | 0.068 | 0.07125 | 0.96175 |
| max | 0.6705 | 0.188 | 0.286 | 1.0 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Bitcoin. Its mean positive score (0.09775) is higher than Ethereum's (0.0453).

Q: Which coin had the highest compound score?

A: Bitcoin. Its maximum compound score (0.8455) is higher than Ethereum's (0.6705).

Q: Which coin had the highest positive score?

A: Bitcoin. Its maximum positive score (0.282) is higher than Ethereum's (0.188).
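The same comparisons can be computed rather than read off the `describe()` tables. A minimal sketch, assuming each DataFrame has the `compound` and `positive` columns built above; `compare_coins` is a hypothetical helper that returns one row per statistic and one column per coin.

```python
import pandas as pd


def compare_coins(frames: dict) -> pd.DataFrame:
    """Tabulate the statistics the three questions ask about, one column per coin."""
    # Outer dict keys become columns, inner dict keys become the row index
    return pd.DataFrame({
        coin: {
            "mean positive": df["positive"].mean(),
            "max compound": df["compound"].max(),
            "max positive": df["positive"].max(),
        }
        for coin, df in frames.items()
    })
```

Usage: `compare_coins({"Bitcoin": bitcoin_df, "Ethereum": ethereum_df}).idxmax(axis=1)` names the winning coin for each statistic.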

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.
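The three steps can be seen without any NLTK downloads. The sketch below uses a toy stopword set (an assumption; the notebook uses `stopwords.words('english')`) and also strips the `… [+768 chars]` truncation marker that NewsAPI appends to each `content` field, visible in the lowercased output further down. Left in place, that marker contributes a "chars" token to every article. `simple_tokens` and `TRUNCATION_MARKER` are hypothetical names.

```python
import re
import string

# NewsAPI's truncation marker, e.g. "… [+768 chars]" (pattern is an assumption
# based on the article text shown in this notebook)
TRUNCATION_MARKER = re.compile(r"…?\s*\[\+\d+ chars\]")


def simple_tokens(text: str, stopwords: set) -> list:
    """Lowercase, strip punctuation, and drop stopwords (no NLTK needed)."""
    text = TRUNCATION_MARKER.sub(" ", text)
    # Step 1: lowercase every word
    text = text.lower()
    # Step 2: remove ASCII punctuation; curly quotes such as “ ” are not covered
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Step 3: split on whitespace and drop stopwords
    return [word for word in text.split() if word not in stopwords]


toy_stopwords = {"the", "as", "for", "a", "it"}
print(simple_tokens("Tesla accepted Bitcoin as payment, for a while… [+768 chars]", toy_stopwords))
```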

In [134]:
# `re` ships with Python's standard library, so it is imported rather than pip-installed
import re
import string
nltk.download('reuters')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation

[nltk_data] Downloading package reuters to
[nltk_data]     C:\Users\Harrison\AppData\Roaming\nltk_data...
[nltk_data]   Package reuters is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Harrison\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Harrison\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Harrison\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [135]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a set of English stopwords (expand it here if necessary)
sw = set(stopwords.words('english'))

# Preview: lowercase each Bitcoin article. Stopwords are removed word by word
# in the tokenizer below, not here.
first_result = [article.lower() for article in bitcoin_df["text"]]
first_result

['just weeks after tesla started accepting bitcoin as currency for cars, elon musk revealed in a tweet that it will "suspend" the effort. according to the release (tesla does not appear to have a funct… [+768 chars]',
 "el salvador's president nayib bukele has made good on his promise to adopt bitcoin as legal tender. officials in the central american country's congress voted to accept the cryptocurrency by a majori… [+1414 chars]",
 'image: tesla\r\n\n \n\n tesla has stopped accepting bitcoin as payment for its cars out of concern that it will contribute to greater consumption of fossil fuels, according to a statement ceo elon musk tw… [+853 chars]',
 'el salvador has become the first country in the world to recognize the cryptocurrency bitcoin as legal currency, according to president nayib bukele in a tweet on wednesday. citizens will be able to … [+3840 chars]',
 'illustration by alex castro / the verge\r\n\n \n\n cryptocurrency exchange coinbase is experiencing a “partial” outage 

In [138]:
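# Complete the tokenizer function
def tokenizer(text):
    """Strip non-letters, lowercase, lemmatize, and drop stopwords from one article."""
    sw = set(stopwords.words('english'))
    # Keep letters and whitespace only (removes punctuation and digits);
    # keeping whitespace stops words on either side of "\r\n" from fusing
    regex = re.compile(r"[^a-zA-Z\s]")
    re_clean = regex.sub('', text)
    words = word_tokenize(re_clean)
    # Lowercase before lemmatizing: WordNet lemmas are lowercase
    lem = [lemmatizer.lemmatize(word.lower()) for word in words]
    output = [word for word in lem if word not in sw]
    return output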

In [140]:
# Create a new tokens column for Bitcoin
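bitcoin_df["tokens"] = bitcoin_df["text"].apply(tokenizer)
# Flatten every article's tokens into one list for the n-gram and frequency analysis
BT = [token for tokens in bitcoin_df["tokens"] for token in tokens]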

In [141]:
# Create a new tokens column for Ethereum
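ethereum_df["tokens"] = ethereum_df["text"].apply(tokenizer)
# Flatten every article's tokens into one list for the n-gram and frequency analysis
ET = [token for tokens in ethereum_df["tokens"] for token in tokens]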

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
bitcoin_counts = Counter(ngrams(BT, n=2))
# Display the ten most common bigrams
dict(bitcoin_counts.most_common(10))

In [17]:
# Generate the Ethereum N-grams where N=2
ethereum_counts = Counter(ngrams(ET, n=2))
# Display the ten most common bigrams
dict(ethereum_counts.most_common(10))
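Running `ngrams` over a single list that pools every article's tokens also counts pairs that straddle the end of one article and the start of the next. A sketch that counts bigrams within each article instead, assuming a column holding one token list per article. `bigram_counts` is a hypothetical helper, and `zip(tokens, tokens[1:])` stands in for `nltk.ngrams(tokens, 2)`, producing the same consecutive pairs.

```python
from collections import Counter
from typing import Iterable, List


def bigram_counts(token_lists: Iterable[List[str]]) -> Counter:
    """Count bigrams within each article so no pair spans two articles."""
    counts: Counter = Counter()
    for tokens in token_lists:
        # Pair each token with its successor, like nltk.ngrams(tokens, 2)
        counts.update(zip(tokens, tokens[1:]))
    return counts
```

Usage, assuming the per-article tokens column is named `tokens`: `bigram_counts(bitcoin_df["tokens"]).most_common(10)`.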

In [18]:
# Function token_count generates the top N words (default 10) for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
token_count(BT)

In [20]:
# Use token_count to get the top 10 words for Ethereum
token_count(ET)

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [145]:
!pip install wordcloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [146]:
# Generate the Bitcoin word cloud (WordCloud.generate expects one string, so join the tokens)
bwc = WordCloud().generate(' '.join(BT))
plt.imshow(bwc)
plt.axis("off")

In [147]:
# Generate the Ethereum word cloud (WordCloud.generate expects one string, so join the tokens)
ewc = WordCloud().generate(' '.join(ET))
plt.imshow(ewc)
plt.axis("off")

---
## 3. Named Entity Recognition

In this section, you will apply spaCy's pretrained named entity recognition model (`en_core_web_sm`) to the Bitcoin and Ethereum text, then visualize the tags with displaCy.

In [150]:
!pip install spacy
import spacy
from spacy import displacy


In [151]:
# Download the language model for SpaCy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl (13.7 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.0.0
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')




In [152]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [160]:
# Concatenate all of the Bitcoin text together
Bitcoin_Conc = ' '.join(bitcoin_df['text'])
Bitcoin_Conc

'Just weeks after Tesla started accepting Bitcoin as currency for cars, Elon Musk revealed in a tweet that it will "suspend" the effort. According to the release (Tesla does not appear to have a funct… [+768 chars] El Salvador\'s President Nayib Bukele has made good on his promise to adopt Bitcoin as legal tender. Officials in the Central American country\'s congress voted to accept the cryptocurrency by a majori… [+1414 chars] Image: Tesla\r\n\n \n\n Tesla has stopped accepting bitcoin as payment for its cars out of concern that it will contribute to greater consumption of fossil fuels, according to a statement CEO Elon Musk tw… [+853 chars] El Salvador has become the first country in the world to recognize the cryptocurrency bitcoin as legal currency, according to President Nayib Bukele in a tweet on Wednesday. Citizens will be able to … [+3840 chars] Illustration by Alex Castro / The Verge\r\n\n \n\n Cryptocurrency exchange Coinbase is experiencing a “partial” outage this morning fo

In [163]:
# Run the NER processor on all of the text
btc_doc = nlp(Bitcoin_Conc)

In [164]:
# Render the visualization
displacy.render(btc_doc, style='ent')

In [168]:
# List all Entities
print([ent.text for ent in btc_doc.ents])

['Just weeks', 'Tesla', 'Bitcoin', 'Elon Musk', "El Salvador's", 'Nayib Bukele', 'Central American', 'Tesla', 'Elon Musk', 'El Salvador', 'first', 'Nayib Bukele', 'Wednesday', 'Citizens', 'Alex Castro', 'The Verge\r\n\n \n\n Cryptocurrency', 'Coinbase', 'this morning', 'Coinbase', 'Bin', 'Last week', 'Musk', 'Dogecoin', 'Mark Zuckerberg', 'two', 'Max', 'Bitcoin', 'Elon Musk', 'March', 'Musk', 'Tesla', 'Earlier this year', 'EV', '1.5', 'Elon Musk', 'Bitcoin', 'Bitcoin', 'Altcoins', 'Tuesday night', 'Wednesday', 'morning', 'months', 'hundreds of billions', 'hours-long', 'this morning', 'May 19th', 'US', 'El Salvador', 'first', 'Bitcoin', 'Miami', 'Florida', 'last weekend', 'covid-19', 'Larry Cermak', 'Last week', 'Disney', 'This week', 'Tec', 'PayPals', 'Wednesday', 'Jose Fernandez da Pontethe', 'more than 5,500 miles', 'the United States', 'Mary-Ann RussonBusiness', 'BBC News', 'US', 'Donald Trump', 'Fox Business', 'Bitcoin', 'US', 'Photo', 'Michele Doying', 'Verge', 'Iran', 'Last week'
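Some entities above carry raw-text debris, for example `'The Verge\r\n\n \n\n Cryptocurrency'`, where a photo credit and the article body were tagged as one span. A hypothetical pre-processing step, `clean_for_ner`, that removes NewsAPI's `… [+N chars]` truncation marker and collapses whitespace runs before the text is passed to `nlp`. Whether spaCy then separates such names depends on the model, so treat this as a sketch.

```python
import re

# NewsAPI's truncation marker, e.g. "… [+768 chars]" (assumed pattern)
TRUNCATION_MARKER = re.compile(r"…?\s*\[\+\d+ chars\]")


def clean_for_ner(text: str) -> str:
    """Drop the truncation marker and collapse runs of whitespace."""
    text = TRUNCATION_MARKER.sub(" ", text)
    # Runs such as "\r\n\n \n\n" become a single space
    return re.sub(r"\s+", " ", text).strip()
```

Usage: `btc_doc = nlp(' '.join(clean_for_ner(t) for t in bitcoin_df['text']))`.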

---

### Ethereum NER

In [169]:
# Concatenate all of the Ethereum text together
Ethereum_Conc = ' '.join(ethereum_df['text'])
Ethereum_Conc

'Vitalik Buterin, the creator of Ethereum, on Wednesday donated Ethereum and meme coins worth $1.5 billion in one of the largest-ever individual philanthropy efforts.\r\nButerin transferred 500 ETH and … [+1667 chars] Solana isn’t known yet outside of the crypto community. But insiders think the blockchain platform is interesting for a wide variety of reasons, beginning with its amiable founder, Anatoly Yakovenko,… [+7156 chars] Bitcoin, Ethereum and a host of Altcoins suffered massive drops Tuesday night and Wednesday morning, erasing months of gains and hundreds of billions in market cap. The overall crypto market shrunk m… [+1182 chars]  A representation of virtual currency Ethereum is seen in front of a stock graph in this illustration taken February 19, 2021. REUTERS/Dado Ruvic/Illustration/File PhotoCryptocurrency Ethereum extende… [+1099 chars] GPU shortages and inflated prices have become a byproduct of the growth of cryptomining. Needless to say, that\'s bad news for the gamer

In [170]:
# Run the NER processor on all of the text
eth_doc = nlp(Ethereum_Conc)

In [171]:
# Render the visualization
displacy.render(eth_doc, style='ent')

In [172]:
# List all Entities
print([ent.text for ent in eth_doc.ents])

['Vitalik Buterin', 'Ethereum', 'Wednesday', 'Ethereum', '$1.5 billion', 'Buterin', '500', 'ETH', 'Solana', 'Anatoly Yakovenko', 'Altcoins', 'Tuesday night', 'Wednesday', 'morning', 'months', 'hundreds of billions', 'Ethereum', 'February 19, 2021', 'GPU', 'Alex Castro', 'The Verge\r\n\n \n\n Cryptocurrency', 'Coinbase', 'this morning', 'Coinbase', 'Bin', 'Spanish', 'AI', 'Last May', 'Buterin', '27', '99.95%', 'Carl Beekhuizen', 'the Ethereum Foundation', 'Beekhuizen', 'tomorrow', 'Norton', 'Norton', 'Norton Crypto', 'Bitcoin', 'U.S. Dollar', 'May 26, 2020', 'Dado Ruvic/File PhotoBitcoin', 'last week', 'one-day', 'March last year', 'Wednesday', '$1 trillion', 'Wednesday', 'Entrepreneur', 'March 2021', 'more than one million', 'SafeMoon', 'about one hundred billion trillion dollars', 'decades', 'Ill', 'Jacks', 'CriddleTechnology', 'Kim Catdarshian', 'Ethereum', 'Spanish', 'AI', 'This week', 'March 2020']
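Both entity lists repeat names (Tesla, Elon Musk, Buterin, Coinbase), so a frequency tally summarizes them better than the raw list. A sketch built on spaCy's `Doc.ents`, where each span exposes `.text` and `.label_`; the default labels `ORG`, `PERSON`, and `GPE` are an assumption about which entity types are interesting here, and `top_entities` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple


def top_entities(
    doc: Any,
    labels: Iterable[str] = ("ORG", "PERSON", "GPE"),
    n: int = 10,
) -> List[Tuple[str, int]]:
    """Return the n most frequent entity strings whose label is in `labels`."""
    wanted = set(labels)
    return Counter(ent.text for ent in doc.ents if ent.label_ in wanted).most_common(n)
```

Usage: `top_entities(btc_doc)` and `top_entities(eth_doc)`.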


---