# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [36]:
# Initial imports
import os
import re
import string
from datetime import datetime, timedelta

import nltk
import pandas as pd
from dotenv import load_dotenv
from newsapi.newsapi_client import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon and create the sentiment analyzer
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\erikl\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
load_dotenv()

True

In [3]:
# Read your api key environment variable
api_key = os.getenv('NEWS_API_KEY')

In [4]:
type(api_key)

str

In [5]:
# Create a newsapi client
newsapi = NewsApiClient(api_key)

In [6]:
type(newsapi)

newsapi.newsapi_client.NewsApiClient

In [7]:
newsapi

<newsapi.newsapi_client.NewsApiClient at 0x1ff903a8c48>

In [8]:
# Find the current date and time in New York
current_date = pd.Timestamp.now(tz="America/New_York").isoformat()
print(current_date)

2021-09-20T19:10:05.603956-04:00


In [9]:
# Find the date and time 24 hours ago
past24hr_date = (pd.Timestamp.now(tz="America/New_York") - timedelta(hours=24)).isoformat()
print(past24hr_date)

2021-09-19T19:10:09.037112-04:00


In [10]:
# Checking for the correct datetime format:
test_date = datetime.strptime(current_date[:19], "%Y-%m-%dT%H:%M:%S")
print(test_date)

2021-09-20 19:10:05
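The same 24-hour window can be built with only the standard library. This is a minimal sketch, not the notebook's method: `news_window` and `ISO_FORMAT` are illustrative names, and it assumes the timestamps should be expressed in UTC using the `%Y-%m-%dT%H:%M:%S` format checked above.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"

def news_window(now=None, hours=24):
    """Return (start, end) strings covering the last `hours` hours."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    return start.strftime(ISO_FORMAT), now.strftime(ISO_FORMAT)

# A fixed timestamp keeps the example reproducible
fixed_now = datetime(2021, 9, 20, 23, 10, 5, tzinfo=timezone.utc)
window_start, window_end = news_window(fixed_now)
print(window_start, window_end)
```

Computing both ends from a single `now` value avoids the few seconds of drift that appear when the two timestamps are created in separate cells.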


In [15]:
# Create a function for fetching news:
def get_articles(keyword):
    """Fetch articles about `keyword` from the past 24 hours and return them as a DataFrame."""
    columns = ['Headlines', 'Date_Time', 'URL', 'Description', 'Content']
    rows = []
    # Parse the ISO strings created above (the timezone offset is dropped)
    date = datetime.strptime(current_date[:19], "%Y-%m-%dT%H:%M:%S")
    end_date = datetime.strptime(past24hr_date[:19], "%Y-%m-%dT%H:%M:%S")
    print(f"Fetching news about '{keyword}'")
    print("*" * 30)
    if date > end_date:
        print(f"retrieving news from: {date}")
        articles = newsapi.get_everything(
            q=keyword,
            from_param=str(end_date),
            to=str(date),
            language="en",
            sort_by="relevancy",
            page=1,
        )
        # Collect the fields of interest from each article
        for article in articles["articles"]:
            rows.append([
                article["title"],
                article["publishedAt"],
                article["url"],
                article["description"],
                article["content"],
            ])

    return pd.DataFrame(rows, columns=columns)
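The flattening step inside `get_articles` can be tried without calling the API. The sketch below assumes that some fields in a response may be missing or null and substitutes an empty string, so that later text processing does not fail on `None`. `articles_to_rows` and `sample_payload` are hypothetical names, and the payload is made up for illustration.

```python
def articles_to_rows(payload):
    """Flatten a NewsAPI-style response dict into a list of row dicts."""
    rows = []
    for article in payload.get("articles", []):
        rows.append({
            "Headlines": article.get("title") or "",
            "Date_Time": article.get("publishedAt") or "",
            "URL": article.get("url") or "",
            "Description": article.get("description") or "",
            "Content": article.get("content") or "",
        })
    return rows

# Made-up payload for illustration only (not real API output)
sample_payload = {
    "status": "ok",
    "articles": [
        {
            "title": "Example headline",
            "publishedAt": "2021-09-20T12:00:00Z",
            "url": "https://example.com/a",
            "description": None,
            "content": "Example body",
        },
    ],
}
article_rows = articles_to_rows(sample_payload)
print(article_rows[0]["Description"] == "")
```

A list of dicts like `article_rows` can be passed straight to `pd.DataFrame(article_rows)`.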

## Analyzing Bitcoin Sentiment:

In [16]:
# Fetch the Bitcoin news articles
btc_articles_df = get_articles("bitcoin")

Fetching news about 'bitcoin'
******************************
retrieving news from: 2021-09-20 19:10:05


In [17]:
btc_articles_df.shape

(20, 5)

In [18]:
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content
0,"El Salvador buys 150 more bitcoins, president ...",2021-09-20T12:24:00Z,https://www.reuters.com/business/finance/el-sa...,"El Salvador has bought 150 more bitcoins, Pres...",A representation of cryptocurrency Bitcoin is ...
1,"Bitcoin Falls Below $43,000 as Global Market R...",2021-09-20T12:58:57Z,https://ca.finance.yahoo.com/news/bitcoin-fall...,"<ol><li>Bitcoin Falls Below $43,000 as Global ...",(Bloomberg) -- Cryptocurrency prices slumped a...
2,"Bitcoin slides below $45,000 in a broad crypto...",2021-09-20T11:44:54Z,https://markets.businessinsider.com/news/curre...,Some analysts attributed the sudden dip to emb...,Yuriko Nakao/Getty Images\r\nBitcoin fell belo...
3,7 Best Cryptos to Buy During Altcoin Season,2021-09-20T15:28:43Z,https://investorplace.com/2021/09/7-best-crypt...,I’ve been on the sidelines with cryptocurrenci...,Ive been on the sidelines with cryptocurrencie...
4,"Following SEC lawsuit threat, Coinbase cancels...",2021-09-20T17:25:32Z,http://techcrunch.com/2021/09/20/following-sec...,Coinbase efforts to play hardball with the Sec...,Coinbase efforts to play hardball with the Sec...


In [20]:
btc_articles_df.loc[0]['Content']

'A representation of cryptocurrency Bitcoin is seen in this illustration taken August 6, 2021. REUTERS/Dado Ruvic/IllustrationSept 20 (Reuters) - El Salvador has bought 150 more bitcoins, President Na… [+485 chars]'

In [28]:
btc_articles_df.loc[0]['Description']

"El Salvador has bought 150 more bitcoins, President Nayib Bukele announced, taking the Central American country's holdings of the volatile cryptocurrency to 700 coins."

In [29]:
btc_articles_df.loc[0]['Headlines']

'El Salvador buys 150 more bitcoins, president says - Reuters'

In [25]:
btc_articles_df.loc[3]['Content']

'Ive been on the sidelines with cryptocurrencies, but its an asset class that is fascinating for many reasons. For many crypto investors, the conversation about best cryptos begins and ends with\xa0Bitco… [+8204 chars]'

In [26]:
btc_articles_df.loc[3]['Description']

'I’ve been on the sidelines with cryptocurrencies, but it’s an asset class that is fascinating for many reasons. For many crypto investors, the conversation about best cryptos begins and ends with\xa0Bitcoin\xa0(CCC:BTC-USD). However, it’s becoming increasingly appa…'

In [27]:
btc_articles_df.loc[3]['Headlines']

'7 Best Cryptos to Buy During Altcoin Season'

### Clean the Text:

In [53]:
def clean_text(text):
    """Remove every character that is not an ASCII letter or a space."""
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
    return re_clean

In [54]:
test_clean = clean_text(btc_articles_df.loc[3]['Description'])

In [55]:
type(test_clean)

str

In [56]:
test_clean

'Ive been on the sidelines with cryptocurrencies but its an asset class that is fascinating for many reasons For many crypto investors the conversation about best cryptos begins and ends withBitcoinCCCBTCUSD However its becoming increasingly appa'
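The output above glues some words together (`withBitcoinCCCBTCUSD`) because the non-breaking spaces and punctuation between them are deleted outright. One possible variant, sketched here under the assumption that word boundaries matter more than preserving contractions, replaces each run of non-letters with a single space. `clean_text_spaced` is an illustrative name, not part of the notebook.

```python
import re

def clean_text_spaced(text):
    """Replace each run of non-letter characters with one space."""
    letters_only = re.sub(r"[^a-zA-Z]+", " ", text)
    return letters_only.strip()

sample = "ends with\xa0Bitcoin\xa0(CCC:BTC-USD). However, it’s becoming"
cleaned_sample = clean_text_spaced(sample)
print(cleaned_sample)
# ends with Bitcoin CCC BTC USD However it s becoming
```

The trade-off is visible in the result: `it’s` is split into `it s`, so apostrophes may need separate handling before this step.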

In [58]:
# Lambda function of "clean_text()" for use with apply
text_cleaner = lambda x: clean_text(x)

In [59]:
# Review updated text:
btc_clean = pd.DataFrame(btc_articles_df['Description'].apply(text_cleaner))
btc_clean

Unnamed: 0,Description
0,El Salvador has bought more bitcoins Presiden...
1,olliBitcoin Falls Below as Global Market Rout...
2,Some analysts attributed the sudden dip to emb...
3,Ive been on the sidelines with cryptocurrencie...
4,Coinbase efforts to play hardball with the Sec...
5,Blockchain is a digital database that secures ...
6,The offshore Chinese yuan weakened versus the ...
7,The total crypto complex was down over billio...
8,FOREXEvergrande worries drag risk FX lower dol...
9,It is not a surprise that cryptocurrency trans...


In [62]:
# Function to get compound sentiments
def get_compound_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    compound = sentiment["compound"]
    return compound
get_compoundScore = lambda x: get_compound_sentiment(x)

In [63]:
btc_articles_df['Compound Sentiment'] = btc_articles_df['Description'].apply(get_compoundScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment
0,"El Salvador buys 150 more bitcoins, president ...",2021-09-20T12:24:00Z,https://www.reuters.com/business/finance/el-sa...,"El Salvador has bought 150 more bitcoins, Pres...",A representation of cryptocurrency Bitcoin is ...,0.0
1,"Bitcoin Falls Below $43,000 as Global Market R...",2021-09-20T12:58:57Z,https://ca.finance.yahoo.com/news/bitcoin-fall...,"<ol><li>Bitcoin Falls Below $43,000 as Global ...",(Bloomberg) -- Cryptocurrency prices slumped a...,-0.4404
2,"Bitcoin slides below $45,000 in a broad crypto...",2021-09-20T11:44:54Z,https://markets.businessinsider.com/news/curre...,Some analysts attributed the sudden dip to emb...,Yuriko Nakao/Getty Images\r\nBitcoin fell belo...,-0.3612
3,7 Best Cryptos to Buy During Altcoin Season,2021-09-20T15:28:43Z,https://investorplace.com/2021/09/7-best-crypt...,I’ve been on the sidelines with cryptocurrenci...,Ive been on the sidelines with cryptocurrencie...,0.9413
4,"Following SEC lawsuit threat, Coinbase cancels...",2021-09-20T17:25:32Z,http://techcrunch.com/2021/09/20/following-sec...,Coinbase efforts to play hardball with the Sec...,Coinbase efforts to play hardball with the Sec...,0.5574


In [64]:
# Function to get positive sentiments
def get_positive_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    positive = sentiment["pos"]
    return positive
get_positiveScore = lambda x: get_positive_sentiment(x)

In [65]:
btc_articles_df['Positive Sentiment'] = btc_articles_df['Description'].apply(get_positiveScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment
0,"El Salvador buys 150 more bitcoins, president ...",2021-09-20T12:24:00Z,https://www.reuters.com/business/finance/el-sa...,"El Salvador has bought 150 more bitcoins, Pres...",A representation of cryptocurrency Bitcoin is ...,0.0,0.0
1,"Bitcoin Falls Below $43,000 as Global Market R...",2021-09-20T12:58:57Z,https://ca.finance.yahoo.com/news/bitcoin-fall...,"<ol><li>Bitcoin Falls Below $43,000 as Global ...",(Bloomberg) -- Cryptocurrency prices slumped a...,-0.4404,0.0
2,"Bitcoin slides below $45,000 in a broad crypto...",2021-09-20T11:44:54Z,https://markets.businessinsider.com/news/curre...,Some analysts attributed the sudden dip to emb...,Yuriko Nakao/Getty Images\r\nBitcoin fell belo...,-0.3612,0.0
3,7 Best Cryptos to Buy During Altcoin Season,2021-09-20T15:28:43Z,https://investorplace.com/2021/09/7-best-crypt...,I’ve been on the sidelines with cryptocurrenci...,Ive been on the sidelines with cryptocurrencie...,0.9413,0.283
4,"Following SEC lawsuit threat, Coinbase cancels...",2021-09-20T17:25:32Z,http://techcrunch.com/2021/09/20/following-sec...,Coinbase efforts to play hardball with the Sec...,Coinbase efforts to play hardball with the Sec...,0.5574,0.108


In [66]:
# Function to get neutral sentiments
def get_neutral_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    neutral = sentiment["neu"]
    return neutral
get_neutralScore = lambda x: get_neutral_sentiment(x)

In [67]:
btc_articles_df['Neutral Sentiment'] = btc_articles_df['Description'].apply(get_neutralScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment
0,"El Salvador buys 150 more bitcoins, president ...",2021-09-20T12:24:00Z,https://www.reuters.com/business/finance/el-sa...,"El Salvador has bought 150 more bitcoins, Pres...",A representation of cryptocurrency Bitcoin is ...,0.0,0.0,1.0
1,"Bitcoin Falls Below $43,000 as Global Market R...",2021-09-20T12:58:57Z,https://ca.finance.yahoo.com/news/bitcoin-fall...,"<ol><li>Bitcoin Falls Below $43,000 as Global ...",(Bloomberg) -- Cryptocurrency prices slumped a...,-0.4404,0.0,0.923
2,"Bitcoin slides below $45,000 in a broad crypto...",2021-09-20T11:44:54Z,https://markets.businessinsider.com/news/curre...,Some analysts attributed the sudden dip to emb...,Yuriko Nakao/Getty Images\r\nBitcoin fell belo...,-0.3612,0.0,0.865
3,7 Best Cryptos to Buy During Altcoin Season,2021-09-20T15:28:43Z,https://investorplace.com/2021/09/7-best-crypt...,I’ve been on the sidelines with cryptocurrenci...,Ive been on the sidelines with cryptocurrencie...,0.9413,0.283,0.717
4,"Following SEC lawsuit threat, Coinbase cancels...",2021-09-20T17:25:32Z,http://techcrunch.com/2021/09/20/following-sec...,Coinbase efforts to play hardball with the Sec...,Coinbase efforts to play hardball with the Sec...,0.5574,0.108,0.892


In [68]:
# Function to get negative sentiments
def get_negative_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    negative = sentiment["neg"]
    return negative
get_negativeScore = lambda x: get_negative_sentiment(x)

In [69]:
btc_articles_df['Negative Sentiment'] = btc_articles_df['Description'].apply(get_negativeScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,"El Salvador buys 150 more bitcoins, president ...",2021-09-20T12:24:00Z,https://www.reuters.com/business/finance/el-sa...,"El Salvador has bought 150 more bitcoins, Pres...",A representation of cryptocurrency Bitcoin is ...,0.0,0.0,1.0,0.0
1,"Bitcoin Falls Below $43,000 as Global Market R...",2021-09-20T12:58:57Z,https://ca.finance.yahoo.com/news/bitcoin-fall...,"<ol><li>Bitcoin Falls Below $43,000 as Global ...",(Bloomberg) -- Cryptocurrency prices slumped a...,-0.4404,0.0,0.923,0.077
2,"Bitcoin slides below $45,000 in a broad crypto...",2021-09-20T11:44:54Z,https://markets.businessinsider.com/news/curre...,Some analysts attributed the sudden dip to emb...,Yuriko Nakao/Getty Images\r\nBitcoin fell belo...,-0.3612,0.0,0.865,0.135
3,7 Best Cryptos to Buy During Altcoin Season,2021-09-20T15:28:43Z,https://investorplace.com/2021/09/7-best-crypt...,I’ve been on the sidelines with cryptocurrenci...,Ive been on the sidelines with cryptocurrencie...,0.9413,0.283,0.717,0.0
4,"Following SEC lawsuit threat, Coinbase cancels...",2021-09-20T17:25:32Z,http://techcrunch.com/2021/09/20/following-sec...,Coinbase efforts to play hardball with the Sec...,Coinbase efforts to play hardball with the Sec...,0.5574,0.108,0.892,0.0
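The four helper functions above each call `polarity_scores` again on the same text. Since one call already returns all four values, they can be collected in a single pass. This is a sketch under that assumption; `score_all` and `fake_scorer` are hypothetical names, and the stand-in scorer exists only so the example runs without NLTK.

```python
def score_all(texts, scorer):
    """Score each text once and return one dict of renamed scores per text."""
    column_names = {
        "compound": "Compound Sentiment",
        "pos": "Positive Sentiment",
        "neu": "Neutral Sentiment",
        "neg": "Negative Sentiment",
    }
    rows = []
    for text in texts:
        scores = scorer(text)  # one call yields all four values
        rows.append({column: scores[key] for key, column in column_names.items()})
    return rows

# Stand-in scorer with fixed values; in the notebook,
# pass analyzer.polarity_scores instead.
def fake_scorer(text):
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}

score_rows = score_all(["El Salvador has bought 150 more bitcoins"], fake_scorer)
print(score_rows)
```

The resulting list of dicts can be turned into a DataFrame and joined onto the articles by index.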


In [74]:
# Save BTC Articles data to CSV:
btc_articles_df.to_csv("btc_articles_df.csv")

## Ethereum Sentiment Analysis

In [71]:
# Fetch the Ethereum news articles
eth_articles_df = get_articles("ethereum")

Fetching news about 'ethereum'
******************************
retrieving news from: 2021-09-20 19:10:05


NewsAPIException: {'status': 'error', 'code': 'rateLimited', 'message': 'You have made too many requests recently. Developer accounts are limited to 100 requests over a 24 hour period (50 requests available every 12 hours). Please upgrade to a paid plan if you need more requests.'}
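The error message states that developer accounts are limited to 100 requests per 24 hours, so re-running fetch cells during development can exhaust the quota. One way to reduce repeat requests is to cache each result on disk and reuse it. The sketch below uses only the standard library; `load_or_fetch` and `fake_fetch` are hypothetical names, and `fetch` stands for any function that returns JSON-serializable articles.

```python
import json
import tempfile
from pathlib import Path

def load_or_fetch(cache_path, fetch):
    """Return cached articles if the file exists; otherwise fetch and cache them."""
    cache_file = Path(cache_path)
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    articles = fetch()  # only reached when nothing is cached yet
    cache_file.write_text(json.dumps(articles), encoding="utf-8")
    return articles

# Stand-in fetcher that records how often it is called
calls = []
def fake_fetch():
    calls.append(1)
    return [{"title": "Example headline"}]

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "eth_articles.json"
    first = load_or_fetch(cache, fake_fetch)
    second = load_or_fetch(cache, fake_fetch)

print(len(calls))
```

The second call is served from the file, so the fetcher runs only once.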

In [72]:
# Create the Bitcoin sentiment scores DataFrame
btc_sentiment_df = btc_articles_df[['Compound Sentiment', 'Positive Sentiment', 'Neutral Sentiment', 'Negative Sentiment']].copy(deep=True)
btc_sentiment_df

Unnamed: 0,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,0.0,0.0,1.0,0.0
1,-0.4404,0.0,0.923,0.077
2,-0.3612,0.0,0.865,0.135
3,0.9413,0.283,0.717,0.0
4,0.5574,0.108,0.892,0.0
5,0.1655,0.113,0.887,0.0
6,-0.7717,0.0,0.806,0.194
7,-0.2023,0.117,0.743,0.14
8,-0.6369,0.157,0.337,0.506
9,-0.8081,0.0,0.799,0.201


In [7]:
# Create the Ethereum sentiment scores DataFrame
# YOUR CODE HERE!

In [73]:
# Describe the Bitcoin Sentiment
btc_sentiment_df.describe()

Unnamed: 0,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
count,20.0,20.0,20.0,20.0
mean,-0.176505,0.0692,0.80605,0.12475
std,0.507021,0.082157,0.148498,0.133025
min,-0.8081,0.0,0.337,0.0
25%,-0.59985,0.0,0.7385,0.0
50%,-0.28175,0.037,0.8225,0.089
75%,0.080075,0.12025,0.8955,0.1895
max,0.9413,0.283,1.0,0.506


In [9]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!

### Questions:

Q: Which coin had the highest mean positive score?

A: 

Q: Which coin had the highest compound score?

A: 

Q: Which coin had the highest positive score?

A: 
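Once scores exist for both coins, the comparisons behind these questions reduce to a mean and a maximum per coin. The sketch below shows that logic with placeholder numbers only; `compare_scores`, `coin_a`, and `coin_b` are hypothetical, and the values are not results from this notebook.

```python
from statistics import mean

def compare_scores(scores_by_coin):
    """Return (coin with highest mean, coin with highest single score)."""
    highest_mean = max(scores_by_coin, key=lambda coin: mean(scores_by_coin[coin]))
    highest_max = max(scores_by_coin, key=lambda coin: max(scores_by_coin[coin]))
    return highest_mean, highest_max

# Placeholder values for illustration only; substitute each coin's
# 'Positive Sentiment' column once both DataFrames exist.
placeholder_scores = {
    "coin_a": [0.2, 0.2, 0.2],
    "coin_b": [0.0, 0.0, 0.3],
}
comparison = compare_scores(placeholder_scores)
print(comparison)
```

With these placeholders the two answers differ: `coin_a` has the higher mean while `coin_b` has the higher single score, which is why the questions are asked separately.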

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
sw = set(stopwords.words('english'))

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [12]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
   
    # Create a tokenized list of the words
    words = word_tokenize(re_clean)
    
    # Lemmatize words into root words
    lem = [lemmatizer.lemmatize(word) for word in words]
   
    # Convert the words to lowercase & Remove the stop words
    output = [word.lower() for word in lem if word.lower() not in sw]
    
    tokens = ' '.join(output)
    
    return tokens

Note: the tokens column is added to `btc_articles_df`, the DataFrame that holds the Bitcoin article text.

In [13]:
# Create a new tokens column for Bitcoin
btc_articles_df['tokens'] = btc_articles_df['Description'].apply(tokenizer)

In [14]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
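For N = 2, an n-gram is a pair of adjacent tokens, so the idea can be illustrated without NLTK by zipping a token list with itself shifted by one. This is a minimal sketch with made-up tokens; `bigrams` is an illustrative helper, not the NLTK function.

```python
from collections import Counter

def bigrams(tokens):
    """Return adjacent token pairs (n-grams with N = 2)."""
    return list(zip(tokens, tokens[1:]))

sample_tokens = ["el", "salvador", "bought", "bitcoin", "el", "salvador"]
pairs = bigrams(sample_tokens)
top_pairs = Counter(pairs).most_common(1)
print(top_pairs)
# [(('el', 'salvador'), 2)]
```

Each pair is a tuple, which is hashable, so `Counter` can tally the pairs directly.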

In [76]:
from collections import Counter
from nltk import ngrams
import spacy

# Load the English language model for spaCy
nlp = spacy.load("en_core_web_sm")

In [None]:
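def create_bigrams(text):
    """Join each pair of adjacent words in a space-separated string with an underscore."""
    # Split into words first; passing the raw string to ngrams() would pair up characters
    bigrams = ngrams(text.split(), 2)
    output = ['_'.join(i) for i in bigrams]
    return ' '.join(output)

In [16]:
# Generate the Bitcoin N-grams where N=2
btc_bigrams = btc_articles_df['Description'].apply(tokenizer).apply(create_bigrams)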

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)
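Because `tokenizer` above returns a single space-joined string, it is worth checking what `Counter` receives. Counting a string tallies its characters, while counting a list of words tallies the words. A small sketch with a made-up token string:

```python
from collections import Counter

tokens_text = "bitcoin price bitcoin market"

# Counting the string itself tallies characters, not words
char_counts = Counter(tokens_text)

# Splitting first gives word-level counts
word_counts = Counter(tokens_text.split())

print(word_counts.most_common(1))
# [('bitcoin', 2)]
```

Under this assumption, a call such as `token_count(tokens.split())` gives word frequencies, whereas `token_count(tokens)` would return the most common letters.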

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [78]:
# Get the most frequent adjective, noun, or proper noun in a text, with its count.
def most_freq_words(text):
    """
    Find the most frequent adjective, noun, or proper noun in the text.
    Args:  text (string):  The text to analyze
    Returns:  most_common_word (list):  A list holding the top (word, count) tuple
    """
    # Tokenize the text and parse each token
    doc = nlp(text)
    
    # Create a list with all the adjectives, nouns, and proper nouns in the text
    words = [token.text.lower() for token in doc if token.pos_ in ('ADJ', 'PROPN', 'NOUN')]
    
    # Retrieve the most frequent word in the `words` list using Counter
    most_common_word = Counter(words).most_common(1)
    
    return most_common_word

count_words = lambda x: most_freq_words(x)

In [79]:
# Get the most common word for each article
word_count = btc_articles_df['Description'].apply(count_words)

In [83]:
# Display Sample
print(word_count[:10])

0              [(el, 1)]
1         [(bitcoin, 2)]
2        [(analysts, 1)]
3            [(many, 2)]
4        [(exchange, 2)]
5      [(blockchain, 1)]
6        [(offshore, 1)]
7           [(total, 1)]
8           [(forex, 1)]
9    [(transactions, 2)]
Name: Description, dtype: object


In [81]:
type(word_count)

pandas.core.series.Series

Use the `most_common()` method of `Counter` to fetch the 10 most frequent words in the articles.
`most_common()` returns a Python list that can be stored in the variable `most_frequent_words`.

In [84]:
# Retrieve the most frequent words (word_count holds lists, which Counter cannot hash, so merge the pairs):
totals = Counter()
for pairs in word_count:
    totals.update(dict(pairs))
most_frequent_words = totals.most_common(10)
print(most_frequent_words)
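`Counter` stores each item it counts as a dictionary key, so every item must be hashable. Each element of `word_count` is a list of `(word, count)` tuples, and lists are not hashable, which is why passing the Series directly raises `TypeError`. The sketch below reproduces this with made-up data in the same shape and shows one way around it.

```python
from collections import Counter

# Made-up per-article results in the same shape as word_count
per_article = [[("el", 1)], [("bitcoin", 2)], [("bitcoin", 2)]]

try:
    Counter(per_article)  # each element is a list, which cannot be a dict key
    failed = False
except TypeError:
    failed = True

# Pulling the word out of each (word, count) pair gives hashable strings
top_words = Counter(word for pairs in per_article for word, _ in pairs)
print(failed, top_words.most_common(1))
# True [('bitcoin', 2)]
```

Here each word is counted once per article in which it was the top word; adding up the per-article counts instead is an equally reasonable choice.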

In [None]:
# Review functionality of the most common word for each article:


In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# On matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.
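When listing entities, it can help to group them by label and drop repeats. The sketch below works on plain `(text, label)` pairs, which with spaCy can be produced as `[(ent.text, ent.label_) for ent in doc.ents]`. `group_entities` is a hypothetical helper, and the sample pairs are made up rather than model output.

```python
from collections import defaultdict

def group_entities(entities):
    """Group (text, label) pairs by label, keeping first-seen order without repeats."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

# Made-up pairs for illustration only
sample_entities = [("El Salvador", "GPE"), ("Reuters", "ORG"), ("El Salvador", "GPE")]
entity_groups = group_entities(sample_entities)
print(entity_groups)
# {'GPE': ['El Salvador'], 'ORG': ['Reuters']}
```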

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---