# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?
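Each of these questions compares one summary statistic (a column mean or maximum) across the two coins. The following is a minimal stdlib sketch, assuming each coin's scores are held as a list of per-article dicts with the same `positive`/`compound` keys that the notebook's `_sentiments` records use. The helper name `highest_by` is hypothetical.

```python
from statistics import mean


def highest_by(coin_scores, column, stat):
    """Return (coin, value) for the coin whose `column` has the largest `stat`.

    coin_scores maps a coin name to a list of per-article score dicts,
    e.g. {"Bitcoin": [{"positive": 0.147, ...}, ...], "Ethereum": [...]}.
    stat is any function over a list of numbers, such as mean or max.
    """
    # Reduce each coin's column to a single summary number
    summary = {
        coin: stat([record[column] for record in records])
        for coin, records in coin_scores.items()
    }
    # Pick the coin with the largest summary value (first one wins on ties)
    best = max(summary, key=summary.get)
    return best, summary[best]
```

Question 1 corresponds to `highest_by(scores, "positive", mean)`, question 2 to `highest_by(scores, "compound", max)`, and question 3 to `highest_by(scores, "positive", max)`. `btc_sent_df.to_dict("records")` produces the per-article list shape that the sketch expects.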

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon (no-op if it is already up to date)
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

load_dotenv()
%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\alexm\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your NewsAPI key from the news_api environment variable
api_key = os.getenv("news_api")


In [3]:
# Create a newsapi client
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key=api_key)
newsapi

<newsapi.newsapi_client.NewsApiClient at 0x2906ac19490>

In [4]:
# Fetch the Bitcoin news articles
btc_headlines = newsapi.get_everything(q="Bitcoin AND bitcoin",
                                       language="en",
                                       page_size=100,
                                       sort_by='relevancy')
btc_headlines['articles'][0]

{'source': {'id': None, 'name': 'New York Times'},
 'author': 'Katie Benner',
 'title': 'Justice Dept. Announces Raft of Changes Meant to Deter Cyberthreats',
 'description': 'The moves came a week after the department made its largest financial seizure ever, confiscating over $3.6 billion worth of Bitcoin stolen in a 2016 hacking.',
 'url': 'https://www.nytimes.com/2022/02/17/us/politics/justice-department-cybersecurity.html',
 'urlToImage': 'https://static01.nyt.com/images/2022/02/17/us/politics/17dc-justice/merlin_199612353_b05bfb07-3da8-404e-8a75-221181e5d014-facebookJumbo.jpg',
 'publishedAt': '2022-02-17T23:51:49Z',
 'content': 'Even in cyberspace, the Department of Justice is able to use a tried and true investigative technique, following the money, Ms. Monaco said. Its what led us to Al Capone in the 30s. It helped us dest… [+1176 chars]'}

In [5]:
# Show the total number of articles matching the query (only the first 100, per page_size, were retrieved)

print(f"Total articles: {btc_headlines['totalResults']}")

Total articles: 7621


In [6]:
# Fetch the Ethereum news articles
eth_headlines = newsapi.get_everything(q="Ethereum AND ethereum",
                                       language="en",
                                       page_size=100,
                                       sort_by='relevancy')
eth_headlines['articles'][0]

{'source': {'id': None, 'name': 'Investorplace.com'},
 'author': 'InvestorPlace',
 'title': 'The Market Has Spoken, and It Says Ethereum Is Valuable',
 'description': 'Technical analysis isn’t a perfect tool, but it may point the way for Ethereum Ethereum\xa0(ETH-USD) continues to be a volatile crypto investment. Crypto is volatile by nature — I’m not setting it apart from the asset class. It\xa0 has clear catalysts as well as mu…',
 'url': 'https://investorplace.com/2022/02/the-market-has-spoken-and-it-says-ethereum-matters/',
 'urlToImage': 'https://images.readwrite.com/wp-content/uploads/2021/12/The-Role-of-Crypto-in-The-Fintech-Industry-and-The-Wider-Economy.jpg',
 'publishedAt': '2022-02-17T17:10:12Z',
 'content': 'Technical analysis isnt a perfect tool, but it may point the way for Ethereum\r\nEthereum\xa0(ETH-USD\r\n) continues to be a volatile crypto investment. Crypto is volatile by nature Im not setting it apart … [+3612 chars]'}

In [7]:
# Show the total number of articles matching the query (only the first 100, per page_size, were retrieved)
print(f"Total articles: {eth_headlines['totalResults']}")

Total articles: 3546


In [8]:
# Transform the response's list of articles into a DataFrame
btc_df = pd.DataFrame(btc_headlines["articles"])

btc_df.head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': None, 'name': 'New York Times'} | Katie Benner | Justice Dept. Announces Raft of Changes Meant ... | The moves came a week after the department mad... | https://www.nytimes.com/2022/02/17/us/politics... | https://static01.nyt.com/images/2022/02/17/us/... | 2022-02-17T23:51:49Z | Even in cyberspace, the Department of Justice ... |
| 1 | {'id': None, 'name': 'Slashdot.org'} | EditorDavid | Why Isn't Bitcoin Booming? | "Bitcoin was seen by many of its libertarian-l... | https://news.slashdot.org/story/22/03/12/05412... | https://a.fsdn.com/sd/topics/bitcoin_64.png | 2022-03-12T18:34:00Z | "Bitcoin was seen by many of its libertarian-l... |
| 2 | {'id': 'reuters', 'name': 'Reuters'} | | CRYPTOVERSE-Bitcoin could be laid low by miner... | Bitcoin miners are feeling the heat - and the ... | https://www.reuters.com/markets/europe/cryptov... | https://www.reuters.com/resizer/9nBpgfg7pSfpPQ... | 2022-02-22T06:17:00Z | Feb 22 (Reuters) - Bitcoin miners are feeling ... |
| 3 | {'id': 'reuters', 'name': 'Reuters'} | | Cryptoverse: Bitcoin gains conflict currency c... | Bitcoin has leapt since Russia's invasion of U... | https://www.reuters.com/markets/europe/cryptov... | https://www.reuters.com/pf/resources/images/re... | 2022-03-01T06:10:00Z | March 1 (Reuters) - Bitcoin has leapt since Ru... |
| 4 | {'id': None, 'name': 'CNET'} | Julian Dossett | Cryptocurrency Donations Pour Into Ukraine. Th... | Nonfungible Tidbits: All the bitcoin, cryptocu... | https://www.cnet.com/personal-finance/crypto/c... | https://www.cnet.com/a/img/ZqMWuhVOxJ8FqLi8Vs9... | 2022-03-05T14:45:07Z | Getty\r\nWelcome to Nonfungible Tidbits. Our f... |


In [9]:
# Create the Bitcoin sentiment scores DataFrame

_sentiments = []

for article in btc_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]  # keep only the YYYY-MM-DD part
        sentiment = analyzer.polarity_scores(text)  # VADER sentiment scores

        _sentiments.append({
            "text": text,
            "date": date,
            "compound": sentiment["compound"],
            "positive": sentiment["pos"],
            "negative": sentiment["neg"],
            "neutral": sentiment["neu"],
        })
    except AttributeError:
        # polarity_scores raises AttributeError when an article's content is None
        pass

# Create DataFrame
btc_sent_df = pd.DataFrame(_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
btc_sent_df = btc_sent_df[cols]

btc_sent_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-02-17 | Even in cyberspace, the Department of Justice ... | 0.7351 | 0.147 | 0.0 | 0.853 |
| 1 | 2022-03-12 | "Bitcoin was seen by many of its libertarian-l... | -0.7713 | 0.0 | 0.169 | 0.831 |
| 2 | 2022-02-22 | Feb 22 (Reuters) - Bitcoin miners are feeling ... | -0.1779 | 0.046 | 0.067 | 0.887 |
| 3 | 2022-03-01 | March 1 (Reuters) - Bitcoin has leapt since Ru... | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2022-03-05 | Getty\r\nWelcome to Nonfungible Tidbits. Our f... | -0.6808 | 0.074 | 0.187 | 0.739 |


In [10]:
# Transform the response's list of articles into a DataFrame
eth_df = pd.DataFrame(eth_headlines["articles"])

eth_df.head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': None, 'name': 'Investorplace.com'} | InvestorPlace | The Market Has Spoken, and It Says Ethereum Is... | Technical analysis isn’t a perfect tool, but i... | https://investorplace.com/2022/02/the-market-h... | https://images.readwrite.com/wp-content/upload... | 2022-02-17T17:10:12Z | Technical analysis isnt a perfect tool, but it... |
| 1 | {'id': 'wired', 'name': 'Wired'} | Shanti Escalante-De Mattei | Web3 Threatens to Segregate Our Online Lives | Governance tokens seem like a tantalizing solu... | https://www.wired.com/story/web3-governance-to... | https://media.wired.com/photos/621d66c7ea3b8f2... | 2022-03-01T14:00:00Z | In February, shit hit the fan in the usual way... |
| 2 | {'id': 'business-insider', 'name': 'Business I... | insider@insider.com (Adam Morgan McCarthy) | Colorado will accept crypto for payment of sta... | But the state of Colorado won't hold ethereum ... | https://markets.businessinsider.com/news/curre... | https://i.insider.com/620d0171da5ac00018fe85d9... | 2022-02-16T15:36:57Z | People in Colorado will be able to pay their s... |
| 3 | {'id': 'business-insider', 'name': 'Business I... | prosen@insider.com (Phil Rosen) | Coinbase earnings show trading of ethereum and... | Ethereum trading volume increased from 15% to ... | https://markets.businessinsider.com/news/curre... | https://i.insider.com/62190267d0009b001904bd96... | 2022-02-25T17:02:30Z | Coinbase reported that the share of trading vo... |
| 4 | {'id': 'the-verge', 'name': 'The Verge'} | Elizabeth Lopatto | How Ukrainians are fundraising in cryptocurrency | Millions of dollars of cryptocurrency have flo... | https://www.theverge.com/2022/2/26/22952357/uk... | https://cdn.vox-cdn.com/thumbor/teEVxppIZ_JTW-... | 2022-02-26T20:29:04Z | Illustration by James Bareham / The Verge\r\n\... |


In [11]:
# Create the Ethereum sentiment scores DataFrame

_sentiments = []

for article in eth_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]  # keep only the YYYY-MM-DD part
        sentiment = analyzer.polarity_scores(text)  # VADER sentiment scores

        _sentiments.append({
            "text": text,
            "date": date,
            "compound": sentiment["compound"],
            "positive": sentiment["pos"],
            "negative": sentiment["neg"],
            "neutral": sentiment["neu"],
        })
    except AttributeError:
        # polarity_scores raises AttributeError when an article's content is None
        pass

# Create DataFrame
eth_sent_df = pd.DataFrame(_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
eth_sent_df = eth_sent_df[cols]

eth_sent_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-02-17 | Technical analysis isnt a perfect tool, but it... | -0.2498 | 0.0 | 0.059 | 0.941 |
| 1 | 2022-03-01 | In February, shit hit the fan in the usual way... | -0.3182 | 0.059 | 0.093 | 0.848 |
| 2 | 2022-02-16 | People in Colorado will be able to pay their s... | -0.1027 | 0.0 | 0.036 | 0.964 |
| 3 | 2022-02-25 | Coinbase reported that the share of trading vo... | 0.6705 | 0.188 | 0.0 | 0.812 |
| 4 | 2022-02-26 | Illustration by James Bareham / The Verge\r\n\... | -0.4588 | 0.0 | 0.083 | 0.917 |
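The Bitcoin and Ethereum cells above run the same loop. One way to remove that duplication is a single helper. This is a sketch under the assumption that the scorer is passed in as a callable (in this notebook, `analyzer.polarity_scores`) that returns a dict with `compound`, `pos`, `neg` and `neu` keys. `sentiment_records` is a hypothetical name. It also skips articles whose `content` is missing explicitly, instead of relying on `except AttributeError`.

```python
def sentiment_records(articles, score):
    """Build one sentiment record per NewsAPI article.

    articles: the "articles" list from a get_everything response.
    score: a callable such as VADER's analyzer.polarity_scores that returns
           a dict with "compound", "pos", "neg" and "neu" keys.
    """
    records = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles whose content is None or empty
            continue
        scores = score(text)
        records.append({
            "date": article["publishedAt"][:10],  # keep only YYYY-MM-DD
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return records
```

In the notebook this could be used as `pd.DataFrame(sentiment_records(btc_headlines["articles"], analyzer.polarity_scores))`. Because dicts keep insertion order, the columns already come out in the `date, text, compound, positive, negative, neutral` order, so the separate reordering step is no longer needed.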


In [12]:
# Describe the Bitcoin Sentiment
btc_sent_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.060608 | 0.07068 | 0.05046 | 0.87884 |
| std | 0.439524 | 0.070591 | 0.06324 | 0.087367 |
| min | -0.8957 | 0.0 | 0.0 | 0.627 |
| 25% | -0.2732 | 0.0 | 0.0 | 0.83 |
| 50% | 0.0 | 0.062 | 0.0165 | 0.8905 |
| 75% | 0.4068 | 0.099 | 0.083 | 0.94075 |
| max | 0.91 | 0.301 | 0.265 | 1.0 |


In [13]:
# Describe the Ethereum Sentiment
eth_sent_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.156885 | 0.08026 | 0.04093 | 0.87879 |
| std | 0.43207 | 0.069516 | 0.061892 | 0.08474 |
| min | -0.9136 | 0.0 | 0.0 | 0.627 |
| 25% | -0.0129 | 0.0 | 0.0 | 0.836 |
| 50% | 0.1779 | 0.072 | 0.0 | 0.8785 |
| 75% | 0.5106 | 0.12525 | 0.0645 | 0.935 |
| max | 0.8625 | 0.29 | 0.312 | 1.0 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum had the highest mean positive score: 0.080, versus 0.071 for Bitcoin.

Q: Which coin had the highest compound score?

A: Bitcoin had the highest compound score: 0.91, versus 0.86 for Ethereum.

Q: Which coin had the highest positive score?

A: Bitcoin had the highest positive score: 0.301, versus 0.290 for Ethereum.

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.
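The three steps can be illustrated without NLTK. This sketch uses a regular expression in place of `word_tokenize`. It also uses a tiny hypothetical stopword set as a stand-in for `stopwords.words('english')`. It shows the order of operations rather than reproducing the notebook's exact tokens.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list
STOPWORDS = {"the", "of", "a", "an", "and", "is", "to", "in"}


def simple_tokens(text, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, split into words and drop stopwords."""
    # 1. Lowercase so "Bitcoin" and "bitcoin" count as the same word
    text = text.lower()
    # 2. Replace every run of non-letters (punctuation, digits, \r\n) with a space
    text = re.sub(r"[^a-z]+", " ", text)
    # Split on whitespace to get the word tokens
    words = text.split()
    # 3. Remove stopwords
    return [word for word in words if word not in stopwords]
```

For example, `simple_tokens("Bitcoin is the KING of crypto!")` returns `['bitcoin', 'king', 'crypto']`. Replacing non-letters with a space, rather than deleting them, keeps words that were separated only by `\r\n` or a hyphen from being glued together.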

In [17]:
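from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re
nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('wordnet')    # WordNet data used by WordNetLemmatizer
nltk.download('stopwords')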

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\alexm\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

In [None]:
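# Create a list of stopwords
sw = set(stopwords.words('english'))

# Expand the default stopwords list if necessary: NewsAPI ends truncated content
# with a "[+N chars]" marker, so "char"/"chars" would otherwise show up in almost every article
sw = sw.union({'char', 'chars'})

In [None]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text into lowercase, lemmatized words without stopwords."""
    # Remove the punctuation (and any other non-letter characters) from text,
    # replacing it with a space so words separated only by "\r\n" stay apart
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub(' ', text)

    # Create a tokenized list of the words
    words = word_tokenize(re_clean)

    # Convert the words to lowercase (before lemmatizing, so capitalized
    # words map to the same root)
    words = [word.lower() for word in words]

    # Lemmatize words into root words
    lem = [lemmatizer.lemmatize(word) for word in words]

    # Remove the stop words
    tokens = [word for word in lem if word not in sw]

    return tokens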

In [None]:
# Create a new tokens column for Bitcoin
btc_sent_df["tokens"] = btc_sent_df["text"].apply(tokenizer)
btc_sent_df.head()

In [None]:
# Create a new tokens column for Ethereum
eth_sent_df["tokens"] = eth_sent_df["text"].apply(tokenizer)
eth_sent_df.head()

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
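Both steps come down to counting. Below is a stdlib-only sketch of what `nltk.ngrams(tokens, 2)` and `Counter.most_common` do here. The helper names `bigrams` and `top_words` are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return consecutive token pairs, like list(nltk.ngrams(tokens, 2))."""
    # Pair each token with the one that follows it
    return list(zip(tokens, tokens[1:]))


def top_words(tokens, n=10):
    """Return the n most frequent single tokens with their counts."""
    return Counter(tokens).most_common(n)
```

`Counter(bigrams(btc_processed)).most_common(10)` gives the ten most frequent word pairs. `top_words(btc_processed)` gives the ten most frequent single words, which is what the "top 10 words" question asks for.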

In [18]:
from collections import Counter
from nltk import ngrams

In [19]:
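def process_text(text):
    """Cleans, tokenizes, lemmatizes, and removes stopwords from a string."""
    sw = set(stopwords.words('english'))
    # Replace every non-letter character with a space so words separated by "\r\n" stay apart
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub(' ', text)
    words = word_tokenize(re_clean)
    # Lowercase before lemmatizing so capitalized words map to the same root
    lem = [lemmatizer.lemmatize(word.lower()) for word in words]
    output = [word for word in lem if word not in sw]
    return output

In [20]:
# process_text expects a string, not the raw API response dict,
# so join each coin's article text into one string first
btc_processed = process_text(" ".join(btc_sent_df["text"]))
eth_processed = process_text(" ".join(eth_sent_df["text"]))
print(btc_processed)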

In [None]:
# Generate the Bitcoin N-grams where N=2
btc_gram_counts = Counter(ngrams(btc_processed, n=2))
print(dict(btc_gram_counts))

In [None]:
# Generate the Ethereum N-grams where N=2
eth_gram_counts = Counter(ngrams(eth_processed, n=2))
print(dict(eth_gram_counts))

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count."""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin
print(token_count(btc_processed, 10))

In [None]:
# Use token_count to get the top 10 words for Ethereum
print(token_count(eth_processed, 10))

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.
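A word cloud sizes each word by how often it occurs. `WordCloud.generate()` re-tokenizes the string it receives and applies the library's own stopword list. If you would rather reuse tokens you have already cleaned, one option is to count them yourself and pass the counts to `WordCloud.generate_from_frequencies()`. Below is a minimal sketch of the counting step. The helper `word_frequencies` and its `max_words`/`min_count` parameters are hypothetical.

```python
from collections import Counter


def word_frequencies(tokens, max_words=200, min_count=1):
    """Count tokens and keep the most frequent ones as a {word: count} dict.

    The result can be passed to WordCloud.generate_from_frequencies().
    """
    counts = Counter(tokens)
    # Keep at most max_words entries, dropping words seen fewer than min_count times
    return {
        word: count
        for word, count in counts.most_common(max_words)
        if count >= min_count
    }
```

With the notebook's variables, this would look like `WordCloud().generate_from_frequencies(word_frequencies(btc_processed))`.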

In [14]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
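# WordCloud.generate expects a single string, so join the processed Bitcoin tokens
btc_wc = WordCloud().generate(" ".join(btc_processed))
plt.imshow(btc_wc)
plt.axis("off")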

In [None]:
# Generate the Ethereum word cloud
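# WordCloud.generate expects a single string, so join the processed Ethereum tokens
eth_wc = WordCloud().generate(" ".join(eth_processed))
plt.imshow(eth_wc)
plt.axis("off")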

---
## 3. Named Entity Recognition

In this section, you will apply spaCy's pretrained named entity recognition (NER) model to the Bitcoin and Ethereum text, then visualize the tags with spaCy's displaCy visualizer.
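After the NER processor has run, each entity in `doc.ents` exposes its text and label through `ent.text` and `ent.label_`. The small sketch below assumes those pairs have been collected into a list such as `[(ent.text, ent.label_) for ent in btc_doc.ents]`. It summarizes the most frequent entities for each label. `top_entities` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def top_entities(pairs, n=3):
    """Group (entity_text, label) pairs by label and keep the n most common texts."""
    by_label = defaultdict(Counter)
    for text, label in pairs:
        # Count how often each entity text appears under its label
        by_label[label][text] += 1
    return {label: counter.most_common(n) for label, counter in by_label.items()}
```

Grouping by label makes it easier to compare, for example, which organizations (`ORG`) or places (`GPE`) dominate each coin's coverage than scanning the flat entity list.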

In [None]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
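btc_text = " ".join(btc_sent_df["text"])

In [None]:
# Run the NER processor on all of the text
btc_doc = nlp(btc_text)

# Add a title to the document
btc_doc.user_data["title"] = "Bitcoin NER"

In [None]:
# Render the visualization
displacy.render(btc_doc, style="ent", jupyter=True)

In [None]:
# List all Entities
for ent in btc_doc.ents:
    print(ent.text, ent.label_)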

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
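eth_text = " ".join(eth_sent_df["text"])

In [None]:
# Run the NER processor on all of the text
eth_doc = nlp(eth_text)

# Add a title to the document
eth_doc.user_data["title"] = "Ethereum NER"

In [None]:
# Render the visualization
displacy.render(eth_doc, style="ent", jupyter=True)

In [None]:
# List all Entities
for ent in eth_doc.ents:
    print(ent.text, ent.label_)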

---