# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the News API to pull the latest news articles for Bitcoin and Ethereum, and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Anirban\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Load environment variables and read api key environment variable
load_dotenv()
api_key = os.getenv("NEWS_API_KEY")
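
`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable (None) and an empty string
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value
```

It could be used as `api_key = require_env("NEWS_API_KEY")` after `load_dotenv()`.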

In [3]:
# Create a newsapi client
from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

I checked the number of Bitcoin news articles for 2020 and for 2019, and the API returned only one article for each year.
I also checked the top headlines and again got only one article; adding Canada as a filter returned no news at all. I therefore pulled all English-language news on Bitcoin, as shown below.

In [4]:
# Fetch the Bitcoin news articles
Btc_articles = newsapi.get_everything(q="Bitcoin OR bitcoin OR BITCOIN", language="en", page_size=100, sort_by="relevancy")
# Show the total number of articles and a sample article
print(f"Total Bitcoin articles in English: {Btc_articles['totalResults']}")
Btc_articles["articles"][0]

Total Bitcoin articles in English: 4058


{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
 'author': 'Jonathan Shieber',
 'title': 'Casa pivots to provide self-custody services to secure bitcoin',
 'description': 'Casa, a Colorado-based provider of bitcoin security services, is launching a managed service allowing customers to buy and hold their own bitcoin, rather than using an external custodian like Coinbase. “With self-custody using Casa it’s impossible to be hacke…',
 'url': 'http://techcrunch.com/2020/08/06/casa-pivots-to-provide-self-custody-services-to-secure-bitcoin/',
 'urlToImage': 'https://techcrunch.com/wp-content/uploads/2019/06/GettyImages-1050523528.jpg?w=600',
 'publishedAt': '2020-08-06T18:25:29Z',
 'content': 'Casa, a Colorado-based provider of bitcoin security services, is launching a managed service allowing customers to buy and hold their own bitcoin, rather than using an external custodian like Coinbas… [+1571 chars]'}

In [5]:
# Fetch the Ethereum news articles
Eth_articles = newsapi.get_everything(q="Ethereum OR ethereum OR ETHEREUM", language="en", page_size=100, sort_by="relevancy")
# Show the total number of articles and a sample article
print(f"Total Ethereum articles in English: {Eth_articles['totalResults']}")
Eth_articles["articles"][0]

Total Ethereum articles in English: 1369


{'source': {'id': 'mashable', 'name': 'Mashable'},
 'author': 'Stan Schroeder',
 'title': 'Crypto wallet MetaMask finally launches on iOS and Android, and it supports Apple Pay',
 'description': "If you've interacted with cryptocurrencies in the past couple of years, there's a good chance you've used MetaMask. It's a cryptocurrency wallet in the form of a browser extension that supports Ethereum and its ecosystem, making it easy to connect with a dece…",
 'url': 'https://mashable.com/article/metamask-ios-android/',
 'urlToImage': 'https://mondrian.mashable.com/2020%252F09%252F02%252Ffd%252Fe724b5edb4b644dba45958e17ad591e1.6b9c6.png%252F1200x630.png?signature=xIKBM112GVhTA9mUq0DRjCVGWSE=',
 'publishedAt': '2020-09-02T16:00:00Z',
 'content': "If you've interacted with cryptocurrencies in the past couple of years, there's a good chance you've used MetaMask. It's a cryptocurrency wallet in the form of a browser extension that supports Ether… [+2291 chars]"}

In [6]:
# Create the Bitcoin sentiment scores DataFrame
btc_sentiments = []

for article in Btc_articles["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btc_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        # Skip articles whose content is missing (polarity_scores fails on None)
        pass
    
# Create DataFrame
btc_df = pd.DataFrame(btc_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
btc_df = btc_df[cols]

btc_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2020-08-06 | Casa, a Colorado-based provider of bitcoin sec... | 0.5994 | 0.149 | 0.0 | 0.851 |
| 1 | 2020-08-06 | The question still remained, though, whether a... | -0.0516 | 0.065 | 0.071 | 0.864 |
| 2 | 2020-08-23 | “The COVID-19 pandemic has resulted in a mass ... | 0.2732 | 0.063 | 0.0 | 0.937 |
| 3 | 2020-08-07 | In what appears to be a massive coordinated st... | -0.128 | 0.0 | 0.046 | 0.954 |
| 4 | 2020-08-17 | LONDON (Reuters) - Bitcoin jumped to its highe... | 0.3818 | 0.069 | 0.0 | 0.931 |


In [7]:
# Create the Ethereum sentiment scores DataFrame
eth_sentiments = []

for article in Eth_articles["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        # Skip articles whose content is missing (polarity_scores fails on None)
        pass
    
# Create DataFrame
eth_df = pd.DataFrame(eth_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
eth_df = eth_df[cols]
eth_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2020-09-02 | If you've interacted with cryptocurrencies in ... | 0.7506 | 0.209 | 0.0 | 0.791 |
| 1 | 2020-08-17 | TL;DR: The Mega Blockchain Mastery Bundle is o... | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 2020-08-26 | LONDON (Reuters) - It sounds like a surefire b... | 0.7579 | 0.181 | 0.0 | 0.819 |
| 3 | 2020-08-25 | NEW YORK (Reuters) - Brooklyn-based technology... | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2020-08-19 | An outspoken Bitcoin whale who rarely shows af... | -0.2677 | 0.045 | 0.074 | 0.881 |
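
The Bitcoin and Ethereum cells above repeat the same loop. As a sketch of how the two could share one helper, the function below takes the article list and a scoring function as arguments. The name `score_articles` is hypothetical, and the explicit check for empty content stands in for the `try`/`except AttributeError` used above, on the assumption that the error comes from articles whose `content` is `None`.

```python
from typing import Callable, Dict, Iterable, List


def score_articles(
    articles: Iterable[dict],
    scorer: Callable[[str], Dict[str, float]],
) -> List[dict]:
    """Build one sentiment row per article, skipping articles with no content."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:  # assumed cause of the AttributeError handled above
            continue
        scores = scorer(text)
        rows.append({
            "date": article["publishedAt"][:10],  # keep only YYYY-MM-DD
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows
```

With the objects defined earlier, this could be called as `pd.DataFrame(score_articles(Btc_articles["articles"], analyzer.polarity_scores))`. The dictionary keys are inserted in the display order, so no column reordering is needed afterwards.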


In [8]:
# Describe the Bitcoin Sentiment
btc_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 98.0 | 98.0 | 98.0 | 98.0 |
| mean | 0.167855 | 0.082837 | 0.0495 | 0.867694 |
| std | 0.438851 | 0.057982 | 0.065199 | 0.076577 |
| min | -0.8658 | 0.0 | 0.0 | 0.588 |
| 25% | -0.06945 | 0.03175 | 0.0 | 0.8195 |
| 50% | 0.2714 | 0.09 | 0.0 | 0.88 |
| 75% | 0.507 | 0.1375 | 0.085 | 0.91 |
| max | 0.7901 | 0.222 | 0.294 | 1.0 |


In [9]:
# Describe the Ethereum Sentiment
eth_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 94.0 | 94.0 | 94.0 | 94.0 |
| mean | 0.096049 | 0.066298 | 0.044649 | 0.889043 |
| std | 0.435816 | 0.069708 | 0.067242 | 0.085338 |
| min | -0.91 | 0.0 | 0.0 | 0.689 |
| 25% | -0.0708 | 0.0 | 0.0 | 0.8315 |
| 50% | 0.0 | 0.065 | 0.0 | 0.9035 |
| 75% | 0.447375 | 0.095 | 0.07475 | 0.94975 |
| max | 0.8519 | 0.311 | 0.309 | 1.0 |


# Questions:

Q: Which coin had the highest mean positive score?

A: Bitcoin's mean positive score is 0.082837, which is higher than Ethereum's 0.066298.

Q: Which coin had the highest compound score?

A: Ethereum's maximum compound score is 0.8519, which is higher than Bitcoin's 0.7901.

Q: Which coin had the highest positive score?

A: Ethereum's highest positive score is 0.311, which is 0.089 higher than Bitcoin's 0.222.
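
The answers above can be checked directly from the `describe()` outputs. The sketch below hard-codes the relevant values from those two tables; the dictionary layout and the `leader` helper are illustrative only.

```python
# Summary values copied from the btc_df.describe() and eth_df.describe() outputs
stats = {
    "Bitcoin": {"mean_positive": 0.082837, "max_compound": 0.7901,
                "max_positive": 0.222, "max_negative": 0.294},
    "Ethereum": {"mean_positive": 0.066298, "max_compound": 0.8519,
                 "max_positive": 0.311, "max_negative": 0.309},
}


def leader(metric):
    """Return the coin with the largest value for the given metric."""
    return max(stats, key=lambda coin: stats[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive", "max_negative"):
    print(f"{metric}: {leader(metric)}")

# Difference between the two maximum positive scores
gap = round(stats["Ethereum"]["max_positive"] - stats["Bitcoin"]["max_positive"], 3)
print(f"Ethereum's max positive score exceeds Bitcoin's by {gap}")
```

In a live notebook the same comparison could be made without copying numbers, for example with `btc_df["positive"].max()` and `eth_df["positive"].max()`.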

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
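# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()
# Expand the default stopwords list if necessary.
# The tokenizer strips hyphens and lowercases each word before filtering,
# so the add-ons must be lowercase and contain letters only.
sw_addons = {'profanities', 'antisemitic', 'homophobic', 'phishing', 'cybercriminals', 'hacker', 'mastermind'}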

In [12]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text: strips non-letters, lowercases, and drops stopwords."""

    # Remove every character that is not a letter or a space (punctuation, digits)
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)

    # Create a list of the words
    words = word_tokenize(re_clean)

    # Build the full stopword set once instead of once per word
    sw = set(stopwords.words('english')).union(sw_addons)

    # Lowercase each word and keep it only if it is not a stopword.
    # The words are not lemmatized, so plural forms such as "chars" are kept as-is.
    tokens = [word.lower() for word in words if word.lower() not in sw]
    return tokens
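
The sample articles shown earlier end with a truncation marker such as `… [+1571 chars]`, and the tokenizer turns each marker into the word `chars`, which is why that word tops the frequency tables later in this notebook. The sketch below removes the marker before tokenizing. The helper name `strip_truncation_marker` and the exact pattern are assumptions based on the two sample articles, not a documented News API format.

```python
import re

# Matches a trailing marker like "… [+1571 chars]" (pattern inferred from the samples)
TRUNCATION_MARKER = re.compile(r"…?\s*\[\+\d+ chars\]\s*$")


def strip_truncation_marker(text: str) -> str:
    """Remove a trailing "[+N chars]" marker so "chars" is not counted as a word."""
    return TRUNCATION_MARKER.sub("", text)


sample = "rather than using an external custodian like Coinbas… [+1571 chars]"
print(strip_truncation_marker(sample))
```

It could be applied as `btc_df["text"].apply(strip_truncation_marker).apply(tokenizer)`. The cut-off final word (here `Coinbas`) still remains in the text.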

In [14]:
# Create a new tokens column for Bitcoin
btc_df["tokens"] = btc_df["text"].apply(tokenizer)
btc_df.head()
# Save the dataframe as a csv file
btc_df.to_csv("btc_df.csv")

In [15]:
# Create a new tokens column for Ethereum
eth_df["tokens"] = eth_df["text"].apply(tokenizer)
eth_df.head()
# Save the dataframe as a csv file
eth_df.to_csv("eth_df.csv")

---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [16]:
from collections import Counter
from nltk import ngrams

In [None]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [23]:
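# Generate the Bitcoin N-grams where N=2
# NOTE: the loop only flattens the per-article token lists, so the counts that
# follow are for single words; true bigrams would need ngrams(tokens, n=2).
btc_ngram = []

for tokens in btc_df["tokens"]:
    btc_ngram += tokens
# Show the top 10 entries in a DataFrame (labelled "bigrams", but these are single words)
pd.DataFrame(token_count(btc_ngram, 10), columns=['Bitcoin Top 10 bigrams', 'count'])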

| | Bitcoin Top 10 bigrams | count |
|---|---|---|
| 0 | chars | 97 |
| 1 | bitcoin | 72 |
| 2 | satoshi | 40 |
| 3 | nakaboto | 40 |
| 4 | wireless | 24 |
| 5 | charging | 24 |
| 6 | today | 22 |
| 7 | every | 21 |
| 8 | another | 21 |
| 9 | edition | 21 |
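
The table above lists single words because the token lists were concatenated rather than paired. A minimal sketch of an actual bigram count follows; the function name `bigram_counts` is hypothetical. It pairs adjacent tokens within each article so that no bigram spans two articles.

```python
from collections import Counter


def bigram_counts(token_lists, top_n=10):
    """Count adjacent token pairs within each article and return the top_n."""
    counts = Counter()
    for tokens in token_lists:
        # zip(tokens, tokens[1:]) yields the same pairs as nltk.ngrams(tokens, 2)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(top_n)


demo = [["satoshi", "nakaboto", "today"], ["satoshi", "nakaboto"]]
print(bigram_counts(demo, top_n=2))
```

In the notebook this could be called as `bigram_counts(btc_df["tokens"])` or `bigram_counts(eth_df["tokens"])`.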


In [20]:
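# Generate the Ethereum N-grams where N=2
# NOTE: the loop only flattens the per-article token lists, so the counts that
# follow are for single words; true bigrams would need ngrams(tokens, n=2).
eth_ngram = []

for tokens in eth_df["tokens"]:
    eth_ngram += tokens
# Show the top 10 entries in a DataFrame (labelled "bigrams", but these are single words)
pd.DataFrame(token_count(eth_ngram, 10), columns=['Ethereum Top 10 bigrams', 'count'])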

| | Ethereum Top 10 bigrams | count |
|---|---|---|
| 0 | chars | 94 |
| 1 | ethereum | 64 |
| 2 | bitcoin | 24 |
| 3 | blockchain | 18 |
| 4 | network | 16 |
| 5 | defi | 16 |
| 6 | decentralized | 16 |
| 7 | cryptocurrency | 12 |
| 8 | finance | 12 |
| 9 | new | 11 |


In [21]:
# Get the top 10 words for Bitcoin
pd.DataFrame(token_count(btc_ngram), columns=['Bitcoin Top 10 words', 'count'])

| | Bitcoin Top 10 words | count |
|---|---|---|
| 0 | chars | 97 |
| 1 | bitcoin | 72 |
| 2 | satoshi | 40 |
| 3 | nakaboto | 40 |
| 4 | wireless | 24 |
| 5 | charging | 24 |
| 6 | today | 22 |
| 7 | every | 21 |
| 8 | another | 21 |
| 9 | edition | 21 |


In [22]:
# Get the top 10 words for Ethereum
pd.DataFrame(token_count(eth_ngram), columns=['Ethereum Top 10 words', 'count'])

| | Ethereum Top 10 words | count |
|---|---|---|
| 0 | chars | 94 |
| 1 | ethereum | 64 |
| 2 | bitcoin | 24 |
| 3 | blockchain | 18 |
| 4 | network | 16 |
| 5 | defi | 16 |
| 6 | decentralized | 16 |
| 7 | cryptocurrency | 12 |
| 8 | finance | 12 |
| 9 | new | 11 |


---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
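
One way the two placeholder cells above might be filled in is sketched below. The helper names `tokens_to_text` and `plot_wordcloud` are hypothetical, the image size is arbitrary, and the plotting libraries are imported inside the function so the text-joining step can be used on its own.

```python
def tokens_to_text(token_lists):
    """Join per-article token lists into one space-separated string."""
    return " ".join(token for tokens in token_lists for token in tokens)


def plot_wordcloud(token_lists, title):
    """Render a word cloud for the given tokens (needs wordcloud and matplotlib)."""
    # Imported here so tokens_to_text works without the plotting stack installed
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400).generate(tokens_to_text(token_lists))
    plt.imshow(cloud)
    plt.axis("off")
    plt.title(title)
    plt.show()
```

With the DataFrames built earlier, the calls would be `plot_wordcloud(btc_df["tokens"], "Bitcoin Word Cloud")` and `plot_wordcloud(eth_df["tokens"], "Ethereum Word Cloud")`.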

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Optional - download a language model for SpaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!
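
The placeholder cells above (and the matching Ethereum cells) could be completed along the following lines. This is a sketch under assumptions: the helper names `combine_texts` and `run_ner` are hypothetical, and the `en_core_web_sm` model is assumed to be downloaded already. spaCy is imported inside the function so the concatenation step can run without it.

```python
def combine_texts(texts):
    """Concatenate article texts into one string, skipping missing values."""
    return " ".join(t for t in texts if isinstance(t, str))


def run_ner(texts, title, model="en_core_web_sm"):
    """Run spaCy NER over the combined text and return (doc, entities)."""
    import spacy  # imported lazily; the model must already be installed

    nlp = spacy.load(model)
    doc = nlp(combine_texts(texts))
    doc.user_data["title"] = title  # displacy uses this as the heading
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return doc, entities
```

For Bitcoin, `doc, entities = run_ner(btc_df["text"], "Bitcoin NER")` followed by `displacy.render(doc, style="ent")` would render the visualization, and `entities` holds the list of entity texts and labels. The same call with `eth_df["text"]` covers Ethereum.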

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---