# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?
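
The Bitcoin and Ethereum score tables are built with the same loop, so one option is to factor it into a helper. The sketch below is only an illustration: `build_sentiment_rows`, `fake_scorer`, and `sample_articles` are hypothetical names that do not appear in the notebook, and the stand-in scorer exists only so the example runs without NLTK. Unlike the notebook cells, this version skips articles whose `content` is missing or empty instead of scoring them.

```python
def build_sentiment_rows(articles, score_fn):
    """Return one dict of sentiment scores per article that has text."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:  # assumed: content may be None or an empty string
            continue
        scores = score_fn(text)
        rows.append({
            "Date": article["publishedAt"][:10],
            "Text": text,
            "Compound": scores["compound"],
            "Positive": scores["pos"],
            "Negative": scores["neg"],
            "Neutral": scores["neu"],
        })
    return rows


# Stand-in scorer with made-up values so the sketch runs without NLTK.
def fake_scorer(text):
    return {"compound": 0.0, "pos": 0.0, "neg": 0.0, "neu": 1.0}


sample_articles = [
    {"publishedAt": "2021-05-12T22:17:08Z", "content": "Some article text"},
    {"publishedAt": "2021-05-20T00:00:00Z", "content": None},
]
rows = build_sentiment_rows(sample_articles, fake_scorer)
print(rows)
```

In the notebook, the scorer would be `analyzer.polarity_scores` and the result could be passed to `pd.DataFrame(...)`.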

In [1]:
# Initial imports
import os
import pandas as pd
import nltk
from dotenv import load_dotenv
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Load environment variables (including the News API key) from .env
load_dotenv()

# Download the VADER lexicon and create the sentiment analyzer
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/samuelarciniega/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
# YOUR CODE HERE!
api_key = os.getenv("news_api")
# Confirm the key loaded without printing the secret itself
print(f"News API key loaded: {api_key is not None}")

News API key loaded: True


In [3]:
# Create a newsapi client
# YOUR CODE HERE!
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
# YOUR CODE HERE!
bitcoin_news = newsapi.get_everything(
    q="bitcoin AND Bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy",
)
print(f"Total articles about Bitcoin: {bitcoin_news['totalResults']}")
bitcoin_news["articles"][0]

Total articles about Bitcoin: 12407


{'source': {'id': 'engadget', 'name': 'Engadget'},
 'author': 'https://www.engadget.com/about/editors/richard-lawler',
 'title': "Tesla 'suspends' Bitcoin car purchases citing environmental impact",
 'description': "You can't buy a Tesla with Bitcoin anymore..",
 'url': 'https://www.engadget.com/elon-musk-bitcoin-221708146.html',
 'urlToImage': 'https://s.yimg.com/os/creatr-uploaded-images/2021-05/a0f90c30-b36f-11eb-aff6-04fb28cf2f4b',
 'publishedAt': '2021-05-12T22:17:08Z',
 'content': 'Just weeks after Tesla started accepting Bitcoin as currency for cars, Elon Musk revealed in a tweet that it will "suspend" the effort. According to the release (Tesla does not appear to have a funct… [+768 chars]'}

In [5]:
# Fetch the Ethereum news articles
# YOUR CODE HERE!
ethereum_news = newsapi.get_everything(
    q="ethereum AND Ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy",
)
# Report the Ethereum result count (the original printed the Bitcoin count here)
print(f"Total articles about Ethereum: {ethereum_news['totalResults']}")
ethereum_news["articles"][0]


{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
 'author': 'Manish Singh',
 'title': 'Vitalik Buterin donates $1 billion worth of ‘meme coins’ to India Covid Relief Fund',
 'description': 'Vitalik Buterin, the creator of Ethereum, on Wednesday donated Ethereum and “meme coins” worth $1.5 billion in one of the largest-ever individual philanthropy efforts. Buterin transferred 500 ETH and over 50 trillion SHIB (Shiba Inu), a meme coin, worth aroun…',
 'url': 'http://techcrunch.com/2021/05/12/vitalik-buterin-donates-1-billion-worth-of-meme-coins-to-india-covid-relief-fund/',
 'urlToImage': 'https://techcrunch.com/wp-content/uploads/2017/09/vitalik-buterin-147a2566.jpg?w=600',
 'publishedAt': '2021-05-12T22:46:10Z',
 'content': 'Vitalik Buterin, the creator of Ethereum, on Wednesday donated Ethereum and meme coins worth $1.5 billion in one of the largest-ever individual philanthropy efforts.\r\nButerin transferred 500 ETH and … [+1667 chars]'}

In [6]:
# Create the Bitcoin sentiment scores DataFrame
# YOUR CODE HERE!
bitcoin_sentiments = []
for article in bitcoin_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]

        bitcoin_sentiments.append({
            "Text": text,
            "Date": date,
            "Compound":compound,
            "Positive": pos,
            "Negative": neg,
            "Neutral": neu
        })
    except AttributeError:
        pass
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

columns = ["Date", "Text", "Compound", "Positive", "Negative", "Neutral"]
bitcoin_df = bitcoin_df[columns]

bitcoin_df.head()


|   | Date | Text | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-05-12 | Just weeks after Tesla started accepting Bitco... | 0.3818 | 0.071 | 0.0 | 0.929 |
| 1 | 2021-05-12 | Image: Tesla\r\n\n \n\n Tesla has stopped acce... | 0.4939 | 0.134 | 0.05 | 0.816 |
| 2 | 2021-05-19 | Illustration by Alex Castro / The Verge\r\n\n ... | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 2021-05-11 | Mark Zuckerberg posted a picture of his two, f... | 0.8455 | 0.217 | 0.0 | 0.783 |
| 4 | 2021-05-17 | Last week, whenElon Musk tweeted that he had s... | 0.4754 | 0.075 | 0.0 | 0.925 |


In [7]:
# Create the Ethereum sentiment scores DataFrame
# YOUR CODE HERE!
ethereum_sentiments = []
for article in ethereum_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]

        ethereum_sentiments.append({
            "Text": text,
            "Date": date,
            "Compound":compound,
            "Positive": pos,
            "Negative": neg,
            "Neutral": neu
        })
    except AttributeError:
        pass
ethereum_df = pd.DataFrame(ethereum_sentiments)

columns = ["Date", "Text", "Compound", "Positive", "Negative", "Neutral"]
ethereum_df = ethereum_df[columns]

ethereum_df.head()

|   | Date | Text | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|---|---|
| 0 | 2021-05-12 | Vitalik Buterin, the creator of Ethereum, on W... | 0.2263 | 0.06 | 0.0 | 0.94 |
| 1 | 2021-05-15 | Solana isn’t known yet outside of the crypto c... | 0.5499 | 0.106 | 0.0 | 0.894 |
| 2 | 2021-05-19 | Bitcoin, Ethereum and a host of Altcoins suffe... | -0.2023 | 0.066 | 0.087 | 0.847 |
| 3 | 2021-05-20 |  | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2021-05-31 | A representation of virtual currency Ethereum ... | 0.0 | 0.0 | 0.0 | 1.0 |


In [8]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!
bitcoin_df.describe()

|   | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.025316 | 0.053 | 0.04602 | 0.90101 |
| std | 0.40231 | 0.064962 | 0.05317 | 0.078485 |
| min | -0.7627 | 0.0 | 0.0 | 0.677 |
| 25% | -0.2732 | 0.0 | 0.0 | 0.84675 |
| 50% | 0.0 | 0.0405 | 0.0445 | 0.9135 |
| 75% | 0.35045 | 0.078 | 0.0795 | 0.95875 |
| max | 0.8455 | 0.275 | 0.203 | 1.0 |


In [9]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!
ethereum_df.describe()

|   | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.098011 | 0.06079 | 0.0351 | 0.8941 |
| std | 0.366118 | 0.062322 | 0.055123 | 0.119086 |
| min | -0.8689 | 0.0 | 0.0 | 0.0 |
| 25% | -0.102375 | 0.0 | 0.0 | 0.85075 |
| 50% | 0.0 | 0.06 | 0.0 | 0.9195 |
| 75% | 0.4019 | 0.09725 | 0.069 | 0.95525 |
| max | 0.7783 | 0.246 | 0.286 | 1.0 |


### Questions:

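Q: Which coin had the highest mean positive score?

A: Ethereum (mean positive 0.061, versus 0.053 for Bitcoin)

Q: Which coin had the highest compound score?

A: Bitcoin (maximum compound 0.8455, versus 0.7783 for Ethereum)

Q: Which coin had the highest negative score?

A: Ethereum (maximum negative 0.286, versus 0.203 for Bitcoin)

Q: Which coin had the highest positive score?

A: Bitcoin (maximum positive 0.275, versus 0.246 for Ethereum)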
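
Answers to questions like these can be checked directly against the `describe()` output. The sketch below copies the relevant `mean` and `max` values from the two summary tables in this notebook; the `summary` dictionary and the `leader` helper are hypothetical names introduced only for illustration.

```python
# Summary values copied from the describe() outputs in this notebook.
summary = {
    "Bitcoin": {
        "mean_positive": 0.053,
        "max_positive": 0.275,
        "max_negative": 0.203,
        "max_compound": 0.8455,
    },
    "Ethereum": {
        "mean_positive": 0.06079,
        "max_positive": 0.246,
        "max_negative": 0.286,
        "max_compound": 0.7783,
    },
}


def leader(stat):
    """Return the coin with the largest value for the given statistic."""
    return max(summary, key=lambda coin: summary[coin][stat])


for stat in ("mean_positive", "max_compound", "max_negative", "max_positive"):
    print(f"{stat}: {leader(stat)}")
```

With the DataFrames in memory, the same comparison could be made from `bitcoin_df.describe()` and `ethereum_df.describe()` rather than from hand-copied numbers.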

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.
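
The three steps above can be sketched with the standard library alone. This is a simplified illustration, not the notebook's tokenizer: `simple_tokenize` is a hypothetical name, the `STOPWORDS` set is a tiny hand-picked stand-in for NLTK's English list, and no lemmatization is performed.

```python
import re

# Tiny illustrative stopword set (hypothetical); the notebook uses NLTK's English list.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "is", "it", "that", "as", "after"}


def simple_tokenize(text):
    """Lowercase, strip non-letters, split on whitespace, and drop stopwords."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z ]", " ", lowered)
    return [word for word in letters_only.split() if word not in STOPWORDS]


tokens = simple_tokenize(
    "Just weeks after Tesla started accepting Bitcoin as currency for cars."
)
print(tokens)
```

Lowercasing before the stopword check matters because stopword lists are typically lowercase, so "The" would otherwise slip through.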

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Instantiate the lemmatizer
# YOUR CODE HERE!
lemmatizer = WordNetLemmatizer()
# Create a list of stopwords
# YOUR CODE HERE!
sw = set(stopwords.words('english'))
# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [12]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""

    # Replace punctuation, digits, and other non-letter characters with spaces
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub(' ', text)

    # Lowercase the text and split it into word tokens
    words = word_tokenize(re_clean.lower())

    # Lemmatize each word to its root form
    lem = [lemmatizer.lemmatize(word) for word in words]

    # Remove the stop words (tokens are already lowercase at this point)
    tokens = [word for word in lem if word not in sw]

    return tokens
tokenizer(bitcoin_df.iloc[0]["Text"])

['week',
 'tesla',
 'started',
 'accepting',
 'bitcoin',
 'currency',
 'car',
 'elon',
 'musk',
 'revealed',
 'tweet',
 'suspend',
 'effort',
 'according',
 'release',
 'tesla',
 'doe',
 'appear',
 'funct',
 'char']

In [13]:
# Create a new tokens column for Bitcoin
bitcoin_df["tokens"] = bitcoin_df["Text"].apply(tokenizer)
# YOUR CODE HERE!
bitcoin_df.head()

|   | Date | Text | Compound | Positive | Negative | Neutral | tokens |
|---|---|---|---|---|---|---|---|
| 0 | 2021-05-12 | Just weeks after Tesla started accepting Bitco... | 0.3818 | 0.071 | 0.0 | 0.929 | [week, tesla, started, accepting, bitcoin, cur... |
| 1 | 2021-05-12 | Image: Tesla\r\n\n \n\n Tesla has stopped acce... | 0.4939 | 0.134 | 0.05 | 0.816 | [image, tesla, tesla, ha, stopped, accepting, ... |
| 2 | 2021-05-19 | Illustration by Alex Castro / The Verge\r\n\n ... | 0.0 | 0.0 | 0.0 | 1.0 | [illustration, alex, castro, verge, cryptocurr... |
| 3 | 2021-05-11 | Mark Zuckerberg posted a picture of his two, f... | 0.8455 | 0.217 | 0.0 | 0.783 | [mark, zuckerberg, posted, picture, two, frank... |
| 4 | 2021-05-17 | Last week, whenElon Musk tweeted that he had s... | 0.4754 | 0.075 | 0.0 | 0.925 | [last, week, whenelon, musk, tweeted, spoken, ... |


In [14]:
# Create a new tokens column for Ethereum
ethereum_df["tokens"] = ethereum_df["Text"].apply(tokenizer)
# YOUR CODE HERE!
ethereum_df.head()

|   | Date | Text | Compound | Positive | Negative | Neutral | tokens |
|---|---|---|---|---|---|---|---|
| 0 | 2021-05-12 | Vitalik Buterin, the creator of Ethereum, on W... | 0.2263 | 0.06 | 0.0 | 0.94 | [vitalik, buterin, creator, ethereum, wednesda... |
| 1 | 2021-05-15 | Solana isn’t known yet outside of the crypto c... | 0.5499 | 0.106 | 0.0 | 0.894 | [solana, known, yet, outside, crypto, communit... |
| 2 | 2021-05-19 | Bitcoin, Ethereum and a host of Altcoins suffe... | -0.2023 | 0.066 | 0.087 | 0.847 | [bitcoin, ethereum, host, altcoins, suffered, ... |
| 3 | 2021-05-20 |  | 0.0 | 0.0 | 0.0 | 0.0 | [] |
| 4 | 2021-05-31 | A representation of virtual currency Ethereum ... | 0.0 | 0.0 | 0.0 | 1.0 | [representation, virtual, currency, ethereum, ... |


---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
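
As a rough illustration of what a bigram count involves, the sketch below pairs each token with its successor inside every article and tallies the pairs. It uses only the standard library rather than NLTK; `bigram_counts` and the sample `docs` are hypothetical and not taken from the notebook's data.

```python
from collections import Counter


def bigram_counts(token_lists):
    """Count adjacent token pairs within each token list."""
    counts = Counter()
    for tokens in token_lists:
        # zip pairs each token with the one that follows it
        counts.update(zip(tokens, tokens[1:]))
    return counts


docs = [
    ["tesla", "suspends", "bitcoin", "payments"],
    ["tesla", "suspends", "orders"],
]
counts = bigram_counts(docs)
print(counts.most_common(1))
```

Counting per article keeps the last word of one article from being paired with the first word of the next.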

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!
# Build bigrams from each article's token list (iterating the DataFrame itself only yields column names)
btc_ngrams = Counter(
    bigram for tokens in bitcoin_df["tokens"] for bigram in ngrams(tokens, n=2)
)
print(btc_ngrams.most_common(10))


In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!
# Build bigrams from each article's token list (iterating the DataFrame itself only yields column names)
eth_ngrams = Counter(
    bigram for tokens in ethereum_df["tokens"] for bigram in ngrams(tokens, n=2)
)
print(eth_ngrams.most_common(10))


In [23]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!
# Flatten the per-article token lists into one list of words before counting
bitcoin_words = [word for tokens in bitcoin_df["tokens"] for word in tokens]
bitcoin_10 = token_count(bitcoin_words, N=10)
print(bitcoin_10)


In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!
# Flatten the per-article token lists into one list of words before counting
ethereum_words = [word for tokens in ethereum_df["tokens"] for word in tokens]
ethereum_10 = token_count(ethereum_words, N=10)
print(ethereum_10)


---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.
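
`WordCloud.generate` takes a single string of text, so a column of token lists has to be flattened and joined first. The sketch below shows only that preparation step with the standard library; `tokens_to_text` and `sample` are hypothetical names, and the word cloud call itself is left as a comment because it needs the `wordcloud` package.

```python
def tokens_to_text(token_lists):
    """Join a sequence of token lists into one space-separated string."""
    return " ".join(token for tokens in token_lists for token in tokens)


# Made-up token lists; an empty list stands for an article with no text.
sample = [["bitcoin", "tesla"], [], ["bitcoin", "musk"]]
text = tokens_to_text(sample)
print(text)

# In the notebook, this string would then be passed on, for example:
#   cloud = WordCloud().generate(text)
#   plt.imshow(cloud)
```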

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!
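# WordCloud.generate expects a single string, so join every article's tokens first
bitcoin_cloud_text = " ".join(word for tokens in bitcoin_df["tokens"] for word in tokens)
bitcoin_word_cloud = WordCloud().generate(bitcoin_cloud_text)
plt.imshow(bitcoin_word_cloud)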

In [23]:
# Generate the Ethereum word cloud
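# YOUR CODE HERE!
ethereum_word_cloud = WordCloud().generate(" ".join(word for tokens in ethereum_df["tokens"] for word in tokens))
plt.imshow(ethereum_word_cloud)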

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.
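
Beyond the visualization, it can help to summarize which entity types dominate the text. The sketch below counts entity labels on any object that exposes a spaCy-style `.ents` sequence whose items have `text` and `label_` attributes. `entity_label_counts` and `fake_doc` are hypothetical: the stand-in object and its entities are made up so the example runs without a language model.

```python
from collections import Counter
from types import SimpleNamespace


def entity_label_counts(doc):
    """Count entity labels on any object exposing spaCy-style `.ents`."""
    return Counter(ent.label_ for ent in doc.ents)


# Stand-in for a spaCy Doc (made-up entities, no model required).
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Tesla", label_="ORG"),
    SimpleNamespace(text="Elon Musk", label_="PERSON"),
    SimpleNamespace(text="Coinbase", label_="ORG"),
])
label_counts = entity_label_counts(fake_doc)
print(label_counts)
```

In the notebook, the same function could be called on the `doc` returned by `nlp(...)`.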

In [28]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [29]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [30]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!
bitcoin_text = " ".join(bitcoin_df["Text"])
bitcoin_text

n market cap. The overall crypto market shrunk m… [+1182 chars] Three years after its inception, crypto financial service provider Babel Finance is racking up fundings and partnerships from major institutional investors. The startup said Monday that it has closed… [+2610 chars] Now isn\'t the greatest time to be a cryptocurrency trader. Coinbase suffered an hours-long outage this morning (May 19th) that hindered transactions on the exchange. The company had pinpointed the ma… [+909 chars] In spite of the environmental and regulatory ills that generally come with crypto as a currency, PayPals bitcoin ambitions keep on ramping up. On Wednesday, Jose Fernandez da Pontethe companys VP of … [+2193 chars] Hello friends, and welcome back to Week in Review!\r\nLast week, I wrote about tech taking on Disney. This week, I’m talking about the search for a new crypto messiah.\r\nIf youre reading this on the Tec… [+7741 chars] When it comes to ransomware, you don\'t always get what you pay for.\xa0

In [31]:
# Run the NER processor on all of the text
# YOUR CODE HERE!
doc = nlp(bitcoin_text)
# Add a title to the document
doc.user_data["title"] = "BTC NER"
# YOUR CODE HERE!

In [34]:
# Render the visualization
# YOUR CODE HERE!
displacy.render(doc, style="ent")

In [30]:
# List all Entities
# YOUR CODE HERE!
for ent in doc.ents:
    print(ent.text, ent.label_)

---

### Ethereum NER

In [35]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!
ethereum_text = " ".join(ethereum_df["Text"])
ethereum_text

siaPac\r\nBitcoin has long been the dominant cryptocurrency, but recently Ethereum\'s native token, ether, has emerged as more than just a clear number two. \r\nIn 2021… [+7281 chars] Ethereum will reduce its energy consumption by 99.95% following its transition to proof-of-stake, according to a new blog post from Carl Beekhuizen of the Ethereum Foundation. Beekhuizen estimated th… [+738 chars] This new mining feature is called \'Norton Crypto\' and will be rolling out tomorrow to Norton 360 users enrolled in Norton\'s early adopter program. When Norton Crypto is enabled, the software will use… [+631 chars] "We are building a team" the page declares, stating: "We welcome exceptional engineers (solidity, react, python), designers, gamers, marketers, and community leaders. If you want to join our team, se… [+478 chars] Representations of virtual currency Bitcoin are placed on U.S. Dollar banknotes in this illustration taken May 26, 2020. REUTERS/Dado Ruvic/File PhotoBitcoin hit record ou

In [36]:
# Run the NER processor on all of the text
# YOUR CODE HERE!
doc = nlp(ethereum_text)
# Add a title to the document
doc.user_data["title"] = "ETH NER"
# YOUR CODE HERE!

In [37]:
# Render the visualization
# YOUR CODE HERE!
displacy.render(doc, style="ent")

In [34]:
# List all Entities
# YOUR CODE HERE!
for ent in doc.ents:
    print(ent.text, ent.label_)

---