# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

In [2]:
# Load the .env file, then read your API key environment variable
load_dotenv()
api_key = os.getenv('news_api')
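
`os.getenv` returns `None` when a variable is missing, and that usually surfaces later as a confusing client error. A minimal sketch of an early check follows; the helper name `require_env` and the placeholder key are assumptions for illustration, not part of the original notebook.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demo only: set a placeholder so the sketch runs on its own.
# In the notebook the real value would come from the .env file.
os.environ.setdefault("news_api", "demo-key-not-real")
api_key = require_env("news_api")
```

Failing at this point makes a missing key obvious before any request is sent.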



In [3]:
# Create a newsapi client
from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
btc_headlines = newsapi.get_everything(
    q="bitcoin",
    language="en",
    sort_by="relevancy"
)

In [5]:
# Fetch the Ethereum news articles
eth_headlines = newsapi.get_everything(
    q="ethereum",
    language="en",
    sort_by="relevancy"
)

In [6]:
eth_headlines 

{'status': 'ok',
 'totalResults': 2773,
 'articles': [{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
   'author': 'Lucas Matney',
   'title': 'Offchain Labs raises $120 million to hide Ethereum’s shortcomings with its Arbitrum product',
   'description': 'As the broader crypto world enjoys a late summer surge in enthusiasm, more and more blockchain developers who have taken the plunge are bumping into the blaring scaling issues faced by decentralized apps on the Ethereum blockchain. The popular network has see…',
   'url': 'http://techcrunch.com/2021/08/31/offchain-labs-raises-120-million-to-hide-ethereums-shortcomings-with-arbitrum-scaling-product/',
   'urlToImage': 'https://techcrunch.com/wp-content/uploads/2021/08/Image-from-iOS-5.jpg?w=533',
   'publishedAt': '2021-08-31T12:30:39Z',
   'content': 'As the broader crypto world enjoys a late summer surge in enthusiasm, more and more blockchain developers who have taken the plunge are bumping into the blaring scaling issues fa

In [7]:
# Create the Bitcoin sentiment scores DataFrame
sentiments = []

for article in btc_headlines["articles"]:
    try:
        text = article["content"]
        results = analyzer.polarity_scores(text)

        sentiments.append({
            "text": text,
            "Compound": results["compound"],
            "Positive": results["pos"],
            "Negative": results["neg"],
            "Neutral": results["neu"],
        })
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass

btc = pd.DataFrame(sentiments)
btc.head(20)

| | text | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|---|
| 0 | PayPal will now allow users outside the U.S. t... | 0.4215 | 0.098 | 0.0 | 0.902 |
| 1 | A recently-installed Bitcoin ATM.\r\n\n \n\n A... | 0.1779 | 0.052 | 0.0 | 0.948 |
| 2 | The government of El Salvador purchased at lea... | 0.128 | 0.046 | 0.0 | 0.954 |
| 3 | Retailers are increasingly accepting cryptocur... | 0.6187 | 0.153 | 0.0 | 0.847 |
| 4 | PayPal is bringing the ability to buy, hold an... | 0.6908 | 0.161 | 0.0 | 0.839 |
| 5 | By Joe TidyCyber reporter \r\nTaxi driver Chri... | -0.296 | 0.053 | 0.114 | 0.833 |
| 6 | New York (CNN Business)It's a volatile day for... | 0.2023 | 0.074 | 0.054 | 0.872 |
| 7 | 8. Were just days into September, and its time... | 0.0 | 0.0 | 0.0 | 1.0 |
| 8 | A representation of cryptocurrency Bitcoin is ... | 0.0 | 0.0 | 0.0 | 1.0 |
| 9 | PayPal is to allow users in the UK to buy, hol... | 0.4215 | 0.095 | 0.0 | 0.905 |


In [8]:
# Create the Ethereum sentiment scores DataFrame
sentiments = []

for article in eth_headlines["articles"]:
    try:
        text = article["content"]
        results = analyzer.polarity_scores(text)

        sentiments.append({
            "text": text,
            "Compound": results["compound"],
            "Positive": results["pos"],
            "Negative": results["neg"],
            "Neutral": results["neu"],
        })
    except AttributeError:
        # Skip articles whose content is missing (None)
        pass

eth = pd.DataFrame(sentiments)
eth.head(20)

| | text | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|---|
| 0 | As the broader crypto world enjoys a late summ... | 0.7351 | 0.167 | 0.0 | 0.833 |
| 1 | PayPal will now allow users outside the U.S. t... | 0.4215 | 0.098 | 0.0 | 0.902 |
| 2 | PayPal is bringing the ability to buy, hold an... | 0.6908 | 0.161 | 0.0 | 0.839 |
| 3 | One of the most unusual cryptocurrency heists ... | -0.1027 | 0.0 | 0.043 | 0.957 |
| 4 | Vitalik Buterin, founder of ethereum, during T... | 0.0 | 0.0 | 0.0 | 1.0 |
| 5 | Justin Sullivan/Getty Images\r\nCitigroup is a... | 0.3182 | 0.076 | 0.0 | 0.924 |
| 6 | LONDON, Aug 25 (Reuters) - Tags for identifyin... | 0.4404 | 0.136 | 0.0 | 0.864 |
| 7 | Solana's SOL token surged above $100 for the f... | 0.4588 | 0.081 | 0.0 | 0.919 |
| 8 | PayPal launched its crypto services in the UKi... | 0.2263 | 0.094 | 0.0 | 0.906 |
| 9 | More than $144 million worth of ether has been... | 0.2263 | 0.048 | 0.0 | 0.952 |
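
The Bitcoin and Ethereum cells repeat the same scoring loop, so it could be factored into one helper. The sketch below is one possible shape, not the notebook's own code: `build_sentiment_rows` and `fake_scores` are hypothetical names, and the stand-in scorer exists only so the example runs without the VADER lexicon. Unlike the original loop, this version also skips empty strings.

```python
def build_sentiment_rows(articles, score_fn):
    """Return one row of sentiment scores per article that has text content."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles with missing (None) or empty content
            continue
        scores = score_fn(text)
        rows.append({
            "text": text,
            "Compound": scores["compound"],
            "Positive": scores["pos"],
            "Negative": scores["neg"],
            "Neutral": scores["neu"],
        })
    return rows


# Stand-in scorer (assumption, for demonstration only); in the notebook,
# analyzer.polarity_scores would be passed instead.
def fake_scores(text):
    return {"compound": 0.0, "pos": 0.0, "neg": 0.0, "neu": 1.0}


sample_articles = [{"content": "Bitcoin rallies."}, {"content": None}]
rows = build_sentiment_rows(sample_articles, fake_scores)
print(len(rows))
```

In the notebook this might be called as `pd.DataFrame(build_sentiment_rows(btc_headlines["articles"], analyzer.polarity_scores))`, and likewise for Ethereum.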


In [9]:
# Describe the Bitcoin Sentiment
btc.describe()

| | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.154425 | 0.06185 | 0.02145 | 0.9167 |
| std | 0.307322 | 0.046084 | 0.037859 | 0.049952 |
| min | -0.5719 | 0.0 | 0.0 | 0.833 |
| 25% | 0.0 | 0.044 | 0.0 | 0.89775 |
| 50% | 0.128 | 0.0505 | 0.0 | 0.9075 |
| 75% | 0.4068 | 0.09425 | 0.043 | 0.951 |
| max | 0.6908 | 0.161 | 0.115 | 1.0 |


In [10]:
# Describe the Ethereum Sentiment
eth.describe()

| | Compound | Positive | Negative | Neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.19373 | 0.0672 | 0.0241 | 0.9087 |
| std | 0.403785 | 0.065592 | 0.071127 | 0.077148 |
| min | -0.8934 | 0.0 | 0.0 | 0.688 |
| 25% | 0.0 | 0.0 | 0.0 | 0.8925 |
| 50% | 0.2263 | 0.062 | 0.0 | 0.9225 |
| 75% | 0.445 | 0.095 | 0.0 | 0.9525 |
| max | 0.8126 | 0.219 | 0.312 | 1.0 |


### Questions:

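Q: Which coin had the highest mean positive score?

A: Ethereum, by a small margin: 0.0672 versus 0.06185 for Bitcoin.

Q: Which coin had the highest compound score?

A: Ethereum, with a maximum compound score of 0.8126 (Bitcoin: 0.6908).

Q: Which coin had the highest positive score?

A: Ethereum, with a maximum positive score of 0.219 (Bitcoin: 0.161).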
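
These comparisons can be checked mechanically. The sketch below hard-codes the relevant values from the two `describe()` outputs above; the dictionary layout and the `leader` helper are assumptions made for illustration.

```python
# Values copied from the btc.describe() and eth.describe() outputs
summary = {
    "Bitcoin": {"mean_positive": 0.06185, "max_compound": 0.6908, "max_positive": 0.161},
    "Ethereum": {"mean_positive": 0.0672, "max_compound": 0.8126, "max_positive": 0.219},
}


def leader(metric):
    """Return the coin with the largest value for the given metric."""
    return max(summary, key=lambda coin: summary[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive"):
    print(metric, "->", leader(metric))
```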

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.

In [11]:
#Install nltk module and wordlist in terminal

!pip install --user -U nltk



In [12]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [13]:
# Instantiate the lemmatizer
wnl = WordNetLemmatizer()

# Create a set of stopwords
stop = set(stopwords.words('english'))

# Expand the default stopwords set if necessary
stop.update({"u", "it'", "'s", "n't", "…", "`", "``", "char", "''"})

In [14]:
btc['text']

0     PayPal will now allow users outside the U.S. t...
1     A recently-installed Bitcoin ATM.\r\n\n \n\n A...
2     The government of El Salvador purchased at lea...
3     Retailers are increasingly accepting cryptocur...
4     PayPal is bringing the ability to buy, hold an...
5     By Joe TidyCyber reporter \r\nTaxi driver Chri...
6     New York (CNN Business)It's a volatile day for...
7     8. Were just days into September, and its time...
8     A representation of cryptocurrency Bitcoin is ...
9     PayPal is to allow users in the UK to buy, hol...
10    T-Mobile will offer two years of free identity...
11    Posted \r\nEl Zonte, El Salvador, home to 'Bit...
12    Aug 27 (Reuters) - The first cryptocurrency AT...
13    Aug 27 (Reuters) - The first cryptocurrency AT...
14    Posted \r\nEl Zonte, El Salvador, home to 'Bit...
15    Twitter's latest beta update introduces suppor...
16    Bitcoin plunged as much as 17 per cent to its ...
17    Plunges as much as 17 per cent to US$43,05

In [15]:
# Complete the tokenizer function
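
The cell above is left empty in the source, so what follows is only a hedged sketch of a tokenizer covering the three listed steps. It deliberately swaps NLTK's `word_tokenize` for a regular-expression split so that it runs without downloading any NLTK corpora. The `DEMO_STOPWORDS` set and the optional `lemmatize` parameter are assumptions for illustration.

```python
import re

# Assumption: a tiny stand-in stopword set so the sketch runs without NLTK data.
# In the notebook, the `stop` set built earlier would be passed instead.
DEMO_STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "it", "char"}


def tokenizer(text, stop_words=DEMO_STOPWORDS, lemmatize=None):
    """Lowercase the text, strip punctuation and digits, and drop stopwords."""
    # Replace every character that is not a letter or whitespace with a space
    cleaned = re.sub(r"[^a-zA-Z\s]", " ", text)
    words = cleaned.lower().split()
    if lemmatize is not None:
        # Optional hook, e.g. wnl.lemmatize from the lemmatizer cell
        words = [lemmatize(word) for word in words]
    return [word for word in words if word not in stop_words]


print(tokenizer("PayPal is bringing the ability to buy Bitcoin."))
```

In the notebook this could be called as `tokenizer(text, stop_words=stop, lemmatize=wnl.lemmatize)`, and a tokens column might then be created with `btc["tokens"] = btc["text"].apply(tokenizer)`.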







In [16]:
# Create a new tokens column for Bitcoin


In [17]:
# Create a new tokens column for Ethereum


---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
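
For N = 2, an n-gram is simply a pair of adjacent tokens. The standard-library sketch below shows the idea and should produce the same pairs as `nltk.ngrams(tokens, 2)`; the `bigrams` helper and the sample token list are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, i.e. n-grams with N = 2."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical token list, standing in for real tokenizer output
tokens = ["paypal", "allow", "users", "buy", "bitcoin", "paypal", "allow"]
bigram_counts = Counter(bigrams(tokens))
print(bigram_counts.most_common(1))
```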

In [18]:
from collections import Counter
from nltk import ngrams

In [19]:
# Generate the Bitcoin N-grams where N=2
N = 2
grams = ngrams(tokenizer(btc.text.str.cat(sep=" ")), N)
Counter(grams).most_common(20)

NameError: name 'tokenizer' is not defined

In [None]:
# Generate the Ethereum N-grams where N=2
N = 2
grams = ngrams(tokenizer(eth.text.str.cat(sep=" ")), N)
Counter(grams).most_common(20)

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin


In [None]:
# Use token_count to get the top 10 words for Ethereum
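
The two cells above are empty in the source. As a hedged illustration of how `token_count` behaves, here is a self-contained run on a hypothetical token list; in the notebook the input would be the tokenized article text for each coin.

```python
from collections import Counter


def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)


# Hypothetical tokens, standing in for real tokenizer output
sample_tokens = ["bitcoin", "paypal", "bitcoin", "crypto", "paypal", "bitcoin"]
top_two = token_count(sample_tokens, N=2)
print(top_two)
```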


---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
from wordcloud import WordCloud
import matplotlib as mpl
import matplotlib.pyplot as plt

# Note: matplotlib 3.6+ renames this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the function for word cloud
def wordcloud(text, title=""):
    df_cloud = WordCloud(width=500, colormap='RdYlBu').generate(text)
    plt.imshow(df_cloud)
    plt.axis("off")
    fontdict = {"fontsize": 48, "fontweight" : "bold"}
    plt.title(title, fontdict=fontdict)
    plt.show()

In [None]:
# Generate the BTC word cloud calling the function 
wordcloud(btc.text.str.cat(sep=" "), title="Bitcoin Word Cloud")

In [None]:
# Generate the Ethereum word cloud calling the function 
wordcloud(eth.text.str.cat(sep=" "), title="Ethereum Word Cloud")

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together


In [None]:
# Run the NER processor on all of the text


# Add a title to the document


In [None]:
# Render the visualization


In [None]:
# List all Entities


---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together


In [None]:
# Run the NER processor on all of the text


# Add a title to the document


In [None]:
# Render the visualization


In [None]:
# List all Entities


---