# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\antho\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable

# YOUR CODE HERE!
load_dotenv()
alpaca_api_key = os.getenv('ALPACA_API_KEY')
alpaca_secret_key = os.getenv('ALPACA_SECRET_KEY')
api_key = os.getenv('NEWS_API_KEY')

In [3]:
# Check api keys
print(type(alpaca_api_key))
print(type(alpaca_secret_key))
print(type(api_key))

<class 'str'>
<class 'str'>
<class 'str'>


In [4]:
# Create a newsapi client
# YOUR CODE HERE!
newsapi = NewsApiClient(api_key=api_key)

In [5]:
# Fetch the Bitcoin news articles
# YOUR CODE HERE!
bitcoin_news_en = newsapi.get_everything(
    q="bitcoin",
    language="en"
)

# Show the total number of news
bitcoin_news_en["totalResults"]

7288

In [6]:
# Fetch the Ethereum news articles
# YOUR CODE HERE!
ethereum_news_en = newsapi.get_everything(
    q="ethereum",
    language="en"
)

# Show the total number of news
ethereum_news_en["totalResults"]

3544

In [7]:
# Create the Bitcoin sentiment scores DataFrame
# YOUR CODE HERE!

#Function to create the bitcoin and ethereum dataframes
def create_df(news, language):
    articles = []
    for article in news:
        try:
            title = article["title"]
            description = article["description"]
            text = article["content"]
            date = article["publishedAt"][:10]

            articles.append({
                "title": title,
                "description": description,
                "text": text,
                "date": date,
                "language": language
            })
        except AttributeError as ae:
            pass

    return pd.DataFrame(articles)

In [8]:
# Bitcoin sentiment scores Dataframe
bitcoin_en_df = create_df(bitcoin_news_en["articles"], "en")

# Ethereum sentiment scores DataFrame
ethereum_en_df = create_df(ethereum_news_en["articles"], "en")

In [9]:
""" Create sentiment score function """

""" BITCOIN """

def get_sentiment(score):
    """
    Calculates the sentiment based on the compound score.
    """
    result = 0  # Neutral by default
    if score >= 0.05:  # Positive
        result = 1
    elif score <= -0.05:  # Negative
        result = -1

    return result

# Sentiment scores dictionaries
title_sent = {
    "title_compound": [],
    "title_pos": [],
    "title_neu": [],
    "title_neg": [],
    "title_sent": [],
}
text_sent = {
    "text_compound": [],
    "text_pos": [],
    "text_neu": [],
    "text_neg": [],
    "text_sent": [],
}

# Get sentiment for the text and the title
for index, row in bitcoin_en_df.iterrows():
    try:
        # Sentiment scoring with VADER
        title_sentiment = analyzer.polarity_scores(row["title"])
        title_sent["title_compound"].append(title_sentiment["compound"])
        title_sent["title_pos"].append(title_sentiment["pos"])
        title_sent["title_neu"].append(title_sentiment["neu"])
        title_sent["title_neg"].append(title_sentiment["neg"])
        title_sent["title_sent"].append(get_sentiment(title_sentiment["compound"]))

        text_sentiment = analyzer.polarity_scores(row["text"])
        text_sent["text_compound"].append(text_sentiment["compound"])
        text_sent["text_pos"].append(text_sentiment["pos"])
        text_sent["text_neu"].append(text_sentiment["neu"])
        text_sent["text_neg"].append(text_sentiment["neg"])
        text_sent["text_sent"].append(get_sentiment(text_sentiment["compound"]))
    except AttributeError:
        pass

# Attaching sentiment columns to the News DataFrame
title_sentiment_df = pd.DataFrame(title_sent)
text_sentiment_df = pd.DataFrame(text_sent)
bitcoin_en_df = bitcoin_en_df.join(title_sentiment_df).join(text_sentiment_df)

In [10]:
""" ETHEREUM """

# Sentiment scores dictionaries
title_sent = {
    "title_compound": [],
    "title_pos": [],
    "title_neu": [],
    "title_neg": [],
    "title_sent": [],
}
text_sent = {
    "text_compound": [],
    "text_pos": [],
    "text_neu": [],
    "text_neg": [],
    "text_sent": [],
}

# Get sentiment for the text and the title
for index, row in ethereum_en_df.iterrows():
    try:
        # Sentiment scoring with VADER
        title_sentiment = analyzer.polarity_scores(row["title"])
        title_sent["title_compound"].append(title_sentiment["compound"])
        title_sent["title_pos"].append(title_sentiment["pos"])
        title_sent["title_neu"].append(title_sentiment["neu"])
        title_sent["title_neg"].append(title_sentiment["neg"])
        title_sent["title_sent"].append(get_sentiment(title_sentiment["compound"]))

        text_sentiment = analyzer.polarity_scores(row["text"])
        text_sent["text_compound"].append(text_sentiment["compound"])
        text_sent["text_pos"].append(text_sentiment["pos"])
        text_sent["text_neu"].append(text_sentiment["neu"])
        text_sent["text_neg"].append(text_sentiment["neg"])
        text_sent["text_sent"].append(get_sentiment(text_sentiment["compound"]))
    except AttributeError:
        pass

# Attaching sentiment columns to the News DataFrame
title_sentiment_df = pd.DataFrame(title_sent)
text_sentiment_df = pd.DataFrame(text_sent)
ethereum_en_df = ethereum_en_df.join(title_sentiment_df).join(text_sentiment_df)
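
The Bitcoin and Ethereum cells above repeat the same scoring loop. One possible way to factor it out is sketched below. The helper name `score_texts`, its `prefix` argument, and the `fake_scorer` stand-in are hypothetical and not part of the notebook; `scorer` is assumed to be any callable that returns a dict with `compound`, `pos`, `neu`, and `neg` keys, such as `analyzer.polarity_scores`. Unlike the loops above, a missing text yields a row of `None` values rather than being skipped, so the output stays aligned with the input row by row.

```python
def get_sentiment(score):
    """Map a compound score to 1 (positive), -1 (negative), or 0 (neutral)."""
    if score >= 0.05:
        return 1
    if score <= -0.05:
        return -1
    return 0


def score_texts(texts, scorer, prefix):
    """Score each text and return one dict per input, in input order.

    `scorer` is assumed to behave like VADER's `polarity_scores`.
    Missing (None) texts produce a row of None values instead of being
    skipped, so the output always has the same length as the input.
    """
    keys = ("compound", "pos", "neu", "neg")
    rows = []
    for text in texts:
        if text is None:
            row = {f"{prefix}_{key}": None for key in keys}
            row[f"{prefix}_sent"] = None
        else:
            scores = scorer(text)
            row = {f"{prefix}_{key}": scores[key] for key in keys}
            row[f"{prefix}_sent"] = get_sentiment(scores["compound"])
        rows.append(row)
    return rows


# Stand-in scorer so the sketch runs without NLTK; in the notebook this
# would be `analyzer.polarity_scores`.
def fake_scorer(text):
    compound = 0.5 if "gain" in text else -0.5 if "loss" in text else 0.0
    return {"compound": compound, "pos": 0.0, "neu": 1.0, "neg": 0.0}


rows = score_texts(["Bitcoin gains", None, "Heavy loss"], fake_scorer, "text")
print(rows)
```

In the notebook, something like `pd.DataFrame(score_texts(bitcoin_en_df["text"], analyzer.polarity_scores, "text"))` could then be joined to the article DataFrame as before.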

In [23]:
bitcoin_en_df.head()

|   | title | description | text | date | language | title_compound | title_pos | title_neu | title_neg | title_sent | text_compound | text_pos | text_neu | text_neg | text_sent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | If you’re a Russian YouTuber, how do you get p... | Russian creators are shut off from the global ... | When Russia invaded Ukraine, Niki Proshin was ... | 2022-03-17 | en | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 |
| 1 | Why Isn't Bitcoin Booming? | "Bitcoin was seen by many of its libertarian-l... | "Bitcoin was seen by many of its libertarian-l... | 2022-03-12 | en | 0.0 | 0.0 | 1.0 | 0.0 | 0 | -0.7713 | 0.0 | 0.831 | 0.169 | -1 |
| 2 | CRYPTOVERSE-Bitcoin could be laid low by miner... | Bitcoin miners are feeling the heat - and the ... | Feb 22 (Reuters) - Bitcoin miners are feeling ... | 2022-02-22 | en | -0.2732 | 0.0 | 0.792 | 0.208 | -1 | -0.1779 | 0.046 | 0.887 | 0.067 | -1 |
| 3 | Cryptoverse: Bitcoin gains conflict currency c... | Bitcoin has leapt since Russia's invasion of U... | March 1 (Reuters) - Bitcoin has leapt since Ru... | 2022-03-01 | en | 0.0258 | 0.247 | 0.515 | 0.237 | 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 |
| 4 | War Is Calling Crypto’s ‘Neutrality’ Into Ques... | War in Ukraine and Western sanctions against R... | Whose side is cryptocurrency on? If you had as... | 2022-03-08 | en | -0.5994 | 0.0 | 0.606 | 0.394 | -1 | -0.3182 | 0.055 | 0.854 | 0.091 | -1 |


In [12]:
ethereum_en_df.head()

|   | title | description | text | date | language | title_compound | title_pos | title_neu | title_neg | title_sent | text_compound | text_pos | text_neu | text_neg | text_sent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Web3 Threatens to Segregate Our Online Lives | Governance tokens seem like a tantalizing solu... | In February, shit hit the fan in the usual way... | 2022-03-01 | en | -0.3818 | 0.0 | 0.698 | 0.302 | -1 | -0.3182 | 0.059 | 0.848 | 0.093 | -1 |
| 1 | Coinbase earnings show trading of ethereum and... | Ethereum trading volume increased from 15% to ... | Coinbase reported that the share of trading vo... | 2022-02-25 | en | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0.6705 | 0.188 | 0.812 | 0.0 | 1 |
| 2 | How Ukrainians are fundraising in cryptocurrency | Millions of dollars of cryptocurrency have flo... | Illustration by James Bareham / The Verge\r\n\... | 2022-02-26 | en | 0.0 | 0.0 | 1.0 | 0.0 | 0 | -0.4588 | 0.0 | 0.917 | 0.083 | -1 |
| 3 | How People Actually Make Money From Cryptocurr... | Power traders use “staking” and “yield farming... | If it sounds too good to be true, youre not wr... | 2022-03-13 | en | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0.834 | 0.236 | 0.713 | 0.05 | 1 |
| 4 | What You Need to Know About Ethereum's Role in... | This now-seven-year-old decentralized and open... | It seems that in 2022, you cant escape from th... | 2022-03-03 | en | 0.0 | 0.0 | 1.0 | 0.0 | 0 | -0.1326 | 0.0 | 0.956 | 0.044 | -1 |


In [13]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!
bitcoin_en_df.describe()

|   | title_compound | title_pos | title_neu | title_neg | title_sent | text_compound | text_pos | text_neu | text_neg | text_sent |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.02669 | 0.07375 | 0.8298 | 0.0964 | 0.0 | -0.028205 | 0.04765 | 0.90215 | 0.0501 | -0.1 |
| std | 0.260131 | 0.108064 | 0.188622 | 0.146133 | 0.725476 | 0.401889 | 0.045747 | 0.075156 | 0.062982 | 0.91191 |
| min | -0.5994 | 0.0 | 0.515 | 0.0 | -1.0 | -0.7713 | 0.0 | 0.739 | 0.0 | -1.0 |
| 25% | -0.0193 | 0.0 | 0.72825 | 0.0 | -0.25 | -0.26705 | 0.0 | 0.8525 | 0.0 | -1.0 |
| 50% | 0.0 | 0.0 | 0.8595 | 0.0 | 0.0 | 0.0 | 0.0515 | 0.913 | 0.019 | 0.0 |
| 75% | 0.057625 | 0.16425 | 1.0 | 0.21375 | 0.25 | 0.32895 | 0.07575 | 0.9525 | 0.085 | 1.0 |
| max | 0.3612 | 0.301 | 1.0 | 0.438 | 1.0 | 0.6369 | 0.152 | 1.0 | 0.187 | 1.0 |


In [17]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!
ethereum_en_df.describe()

|   | title_compound | title_pos | title_neu | title_neg | title_sent | text_compound | text_pos | text_neu | text_neg | text_sent |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.05706 | 0.04855 | 0.91975 | 0.0317 | 0.15 | 0.190925 | 0.07875 | 0.89145 | 0.02975 | 0.3 |
| std | 0.232918 | 0.082891 | 0.102651 | 0.083104 | 0.67082 | 0.401422 | 0.085383 | 0.10092 | 0.04006 | 0.801315 |
| min | -0.3818 | 0.0 | 0.698 | 0.0 | -1.0 | -0.5267 | 0.0 | 0.692 | 0.0 | -1.0 |
| 25% | 0.0 | 0.0 | 0.86375 | 0.0 | 0.0 | 0.0 | 0.0 | 0.841 | 0.0 | 0.0 |
| 50% | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0258 | 0.0665 | 0.9105 | 0.0 | 0.5 |
| 75% | 0.201725 | 0.09775 | 1.0 | 0.0 | 1.0 | 0.514625 | 0.1205 | 1.0 | 0.06125 | 1.0 |
| max | 0.4588 | 0.25 | 1.0 | 0.302 | 1.0 | 0.834 | 0.249 | 1.0 | 0.115 | 1.0 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum had the highest mean positive score, based on the article text: 0.07875, versus 0.04765 for Bitcoin.

Q: Which coin had the highest compound score?

A: Ethereum had the highest compound score, with a maximum text compound score of 0.834.

Q: Which coin had the highest positive score?

A: Ethereum had the highest positive score, with a maximum text positive score of 0.249.
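
The answers above can be read off the two `describe()` outputs. As a rough cross-check, the sketch below hard-codes the relevant `text_*` statistics from those outputs and picks the coin with the larger value for each question. The dictionary layout and the `best_coin` helper are illustrative assumptions, not part of the notebook.

```python
# Text-based statistics copied from the describe() outputs above.
stats = {
    "Bitcoin": {"mean_pos": 0.04765, "max_compound": 0.6369, "max_pos": 0.152},
    "Ethereum": {"mean_pos": 0.07875, "max_compound": 0.834, "max_pos": 0.249},
}


def best_coin(stats, metric):
    """Return (coin, value) for the coin with the highest value of `metric`."""
    coin = max(stats, key=lambda name: stats[name][metric])
    return coin, stats[coin][metric]


for metric in ("mean_pos", "max_compound", "max_pos"):
    coin, value = best_coin(stats, metric)
    print(f"{metric}: {coin} ({value})")
```

The title-based columns tell a slightly different story: Bitcoin's maximum `title_pos` (0.301) exceeds Ethereum's (0.25), so the answers depend on which columns are compared.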

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.
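
The three steps listed above can be illustrated with the standard library alone. This is only a minimal sketch: the `simple_tokens` name and the tiny `STOPWORDS` set are assumptions made for the example, whereas the notebook itself relies on NLTK's tokenizer and its full English stopword list.

```python
import string

# Tiny illustrative stopword set (an assumption; NLTK's English list is far longer).
STOPWORDS = {"the", "is", "a", "of", "and", "to", "in"}


def simple_tokens(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOPWORDS]


print(simple_tokens("The price of Bitcoin is rising, and Ethereum follows!"))
```

Splitting on whitespace is cruder than `word_tokenize`, but it keeps the order of operations easy to follow.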

In [18]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [21]:
# Instantiate the lemmatizer
# YOUR CODE HERE!
lemmatizer = WordNetLemmatizer()

# Complete the `clean_text` function, expanding the default stopwords list
# YOUR CODE HERE!
def clean_text(article):
    
    # Define a set of stopwords using `stopwords.words()`
    sw = set(stopwords.words('english'))
    
    # Create custom stopwords and merge them into the default set once
    sw_addons = {'said', 'sent', 'found', 'including', 'today', 'announced', 'week', 'basically', 'also'}
    sw = sw.union(sw_addons)

    # Define the regex parameters (keep only letters and spaces)
    regex = re.compile("[^a-zA-Z ]")

    # Apply regex parameters to article
    re_clean = regex.sub('', article)

    # Apply `word_tokenize` to the regex scrubbed text
    words = word_tokenize(re_clean)

    # Create list of lower-case words that are not in the stopword set
    output = [word.lower() for word in words if word.lower() not in sw]
    
    # Return the final list
    return output

In [18]:
# Complete the tokenizer function
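def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text (keep only letters and spaces)
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
   
    # Create a tokenized list of the words
    words = word_tokenize(re_clean)
    
    # Convert the words to lowercase
    words = [word.lower() for word in words]
    
    # Lemmatize words into root words
    lemmas = [lemmatizer.lemmatize(word) for word in words]
   
    # Remove the stop words
    sw = set(stopwords.words('english'))
    tokens = [word for word in lemmas if word not in sw]
    
    return tokens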

In [19]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [20]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
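
Both tasks above reduce to counting: bigrams are adjacent token pairs, and the top words are the most frequent tokens. The sketch below uses `zip` in place of `nltk.ngrams` so that it runs without NLTK. The helper names `bigrams` and `top_n` and the sample tokens are made up for illustration.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, mimicking `nltk.ngrams(tokens, n=2)`."""
    return list(zip(tokens, tokens[1:]))


def top_n(items, n=10):
    """Return the `n` most frequent items with their counts."""
    return Counter(items).most_common(n)


# Hypothetical tokens standing in for one coin's cleaned article text.
tokens = ["bitcoin", "price", "falls", "bitcoin", "price", "rises"]
print(top_n(bigrams(tokens), n=1))
print(top_n(tokens, n=2))
```

Because `Counter` accepts any hashable item, the same `top_n` helper works for single words and for bigram tuples.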

In [21]:
from collections import Counter
from nltk import ngrams

In [22]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [24]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [25]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [26]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.
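
A word cloud is essentially a picture of word frequencies, so one way to approach it is to count tokens first and hand the counts to the plotting library. In the sketch below, `cloud_frequencies`, `draw_cloud`, and the sample token lists are hypothetical names and data. `draw_cloud` assumes the `wordcloud` and `matplotlib` packages are installed and imports them lazily, so the counting helper works on its own.

```python
from collections import Counter


def cloud_frequencies(token_lists):
    """Flatten per-article token lists into one {word: count} mapping."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return dict(counts)


def draw_cloud(frequencies, title):
    """Render a word cloud; needs the `wordcloud` and `matplotlib` packages."""
    # Imported here so the frequency helper works without plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400).generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


# Hypothetical token lists standing in for a per-article `tokens` column.
sample = [["bitcoin", "miner"], ["bitcoin", "price"]]
freqs = cloud_frequencies(sample)
print(freqs)
```

Calling `draw_cloud(freqs, "Bitcoin Word Cloud")` would then display the figure in a notebook.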

In [27]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Note: Matplotlib 3.6+ renamed this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [28]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [29]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will run named entity recognition on the Bitcoin and Ethereum text, then visualize the tags using spaCy.
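
The overall flow is the same for both coins: join the article text, run it through the spaCy pipeline, then inspect `doc.ents`. The sketch below is one possible shape for that flow. The `run_ner` and `entity_counts` helpers are hypothetical, and the `fake_doc` object with its entity labels is invented stand-in data (not real model output) so that the counting step runs without spaCy installed.

```python
from collections import Counter
from types import SimpleNamespace


def entity_counts(doc):
    """Count (text, label) pairs over `doc.ents`."""
    return Counter((ent.text, ent.label_) for ent in doc.ents)


def run_ner(texts, title, model="en_core_web_sm"):
    """Join texts, run spaCy NER, and tag the doc with a title."""
    import spacy  # imported lazily; requires spaCy and the named model

    nlp = spacy.load(model)
    doc = nlp(" ".join(text for text in texts if text))
    doc.user_data["title"] = title  # displaCy shows this title when rendering
    return doc


# Stand-in object with the same `.ents` shape as a spaCy Doc (invented data).
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Reuters", label_="ORG"),
    SimpleNamespace(text="Ukraine", label_="GPE"),
    SimpleNamespace(text="Reuters", label_="ORG"),
])
counts = entity_counts(fake_doc)
print(counts.most_common())
```

With a real document, for example `doc = run_ner(bitcoin_en_df["text"], "Bitcoin NER")`, calling `displacy.render(doc, style="ent")` would draw the highlighted entities.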

In [30]:
import spacy
from spacy import displacy

In [31]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [32]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [33]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [34]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [35]:
# Render the visualization
# YOUR CODE HERE!

In [36]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [37]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [38]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [39]:
# Render the visualization
# YOUR CODE HERE!

In [40]:
# List all Entities
# YOUR CODE HERE!

---