# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk as nltk
nltk.download('vader_lexicon')
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\antho\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable

# YOUR CODE HERE!
load_dotenv()
alpaca_api_key = os.getenv('ALPACA_API_KEY')
alpaca_secret_key = os.getenv('ALPACA_SECRET_KEY')
api_key = os.getenv('NEWS_API_KEY')

In [3]:
# Check api keys
print(type(alpaca_api_key))
print(type(alpaca_secret_key))
print(type(api_key))

<class 'str'>
<class 'str'>
<class 'str'>


In [4]:
# Create a newsapi client
# YOUR CODE HERE!
newsapi = NewsApiClient(api_key=api_key)

In [5]:
# Fetch the Bitcoin news articles
# YOUR CODE HERE!
bitcoin_news_en = newsapi.get_everything(
    q="bitcoin",
    language="en"
)

# Show the total number of news
bitcoin_news_en["totalResults"]

7667

In [6]:
# Fetch the Ethereum news articles
# YOUR CODE HERE!
ethereum_news_en = newsapi.get_everything(
    q="ethereum",
    language="en"
)

# Show the total number of news
ethereum_news_en["totalResults"]

3772

In [7]:
# Create the Bitcoin sentiment scores DataFrame
# YOUR CODE HERE!

#Function to create the bitcoin and ethereum dataframes
def create_df(news, language):
    articles = []
    for article in news:
        try:
            title = article["title"]
            description = article["description"]
            text = article["content"]
            date = article["publishedAt"][:10]

            articles.append({
                "title": title,
                "description": description,
                "text": text,
                "date": date,
                "language": language
            })
        except AttributeError as ae:
            pass

    return pd.DataFrame(articles)

In [8]:
# Bitcoin sentiment scores Dataframe
bitcoin_en_df = create_df(bitcoin_news_en["articles"], "en")

# Ethereum sentiment scores DataFrame
ethereum_en_df = create_df(ethereum_news_en["articles"], "en")
ethereum_en_df

Unnamed: 0,title,description,text,date,language
0,Web3 Threatens to Segregate Our Online Lives,Governance tokens seem like a tantalizing solu...,"In February, shit hit the fan in the usual way...",2022-03-01,en
1,Coinbase earnings show trading of ethereum and...,Ethereum trading volume increased from 15% to ...,Coinbase reported that the share of trading vo...,2022-02-25,en
2,How Ukrainians are fundraising in cryptocurrency,Millions of dollars of cryptocurrency have flo...,Illustration by James Bareham / The Verge\r\n\...,2022-02-26,en
3,Vitalik Buterin talks about the problems of cr...,The founder of Ethereum confessed his concerns...,His name is Vitalik Buterin and after dedicati...,2022-03-22,en
4,What You Need to Know About Ethereum's Role in...,This now-seven-year-old decentralized and open...,"It seems that in 2022, you cant escape from th...",2022-03-03,en
5,How People Actually Make Money From Cryptocurr...,Power traders use “staking” and “yield farming...,"If it sounds too good to be true, youre not wr...",2022-03-13,en
6,'The Goal Of Crypto Is Not To Play Games With ...,An anonymous reader shares a report: Non-fungi...,Non-fungible tokens have risen in interest and...,2022-03-22,en
7,NFT sales fall to $237 million over the past w...,Total NFT sales volume hit $23 billion over th...,The cryptocurrency boom over the past few year...,2022-03-02,en
8,Politicians Show Their Increasing Interest In ...,A dispatch from a dizzying week at one of Nort...,A version of this article was published in TIM...,2022-02-24,en
9,"Biden to Feds: Figure Out This Crypto Thing, Stat",Joe Biden is dipping his toes into crypto. On ...,Joe Biden is dipping his toes into crypto. On ...,2022-03-09,en


In [9]:
""" Create sentiment score function """

""" BITCOIN """

def get_sentiment(score):
    """
    Calculates the sentiment based on the compound score.
    """
    result = 0  # Neutral by default
    if score >= 0.05:  # Positive
        result = 1
    elif score <= -0.05:  # Negative
        result = -1

    return result

# Sentiment scores dictionaries
title_sent = {
    "title_compound": [],
    "title_pos": [],
    "title_neu": [],
    "title_neg": [],
    "title_sent": [],
}
text_sent = {
    "text_compound": [],
    "text_pos": [],
    "text_neu": [],
    "text_neg": [],
    "text_sent": [],
}

# Get sentiment for the text and the title
for index, row in bitcoin_en_df.iterrows():
    try:
        # Sentiment scoring with VADER
        title_sentiment = analyzer.polarity_scores(row["title"])
        title_sent["title_compound"].append(title_sentiment["compound"])
        title_sent["title_pos"].append(title_sentiment["pos"])
        title_sent["title_neu"].append(title_sentiment["neu"])
        title_sent["title_neg"].append(title_sentiment["neg"])
        title_sent["title_sent"].append(get_sentiment(title_sentiment["compound"]))

        text_sentiment = analyzer.polarity_scores(row["text"])
        text_sent["text_compound"].append(text_sentiment["compound"])
        text_sent["text_pos"].append(text_sentiment["pos"])
        text_sent["text_neu"].append(text_sentiment["neu"])
        text_sent["text_neg"].append(text_sentiment["neg"])
        text_sent["text_sent"].append(get_sentiment(text_sentiment["compound"]))
    except AttributeError:
        pass

# Attaching sentiment columns to the News DataFrame
title_sentiment_df = pd.DataFrame(title_sent)
text_sentiment_df = pd.DataFrame(text_sent)
bitcoin_en_df = bitcoin_en_df.join(title_sentiment_df).join(text_sentiment_df)

In [10]:
""" ETHEREUM """

# Sentiment scores dictionaries
title_sent = {
    "title_compound": [],
    "title_pos": [],
    "title_neu": [],
    "title_neg": [],
    "title_sent": [],
}
text_sent = {
    "text_compound": [],
    "text_pos": [],
    "text_neu": [],
    "text_neg": [],
    "text_sent": [],
}

# Get sentiment for the text and the title
for index, row in ethereum_en_df.iterrows():
    try:
        # Sentiment scoring with VADER
        title_sentiment = analyzer.polarity_scores(row["title"])
        title_sent["title_compound"].append(title_sentiment["compound"])
        title_sent["title_pos"].append(title_sentiment["pos"])
        title_sent["title_neu"].append(title_sentiment["neu"])
        title_sent["title_neg"].append(title_sentiment["neg"])
        title_sent["title_sent"].append(get_sentiment(title_sentiment["compound"]))

        text_sentiment = analyzer.polarity_scores(row["text"])
        text_sent["text_compound"].append(text_sentiment["compound"])
        text_sent["text_pos"].append(text_sentiment["pos"])
        text_sent["text_neu"].append(text_sentiment["neu"])
        text_sent["text_neg"].append(text_sentiment["neg"])
        text_sent["text_sent"].append(get_sentiment(text_sentiment["compound"]))
    except AttributeError:
        pass

# Attaching sentiment columns to the News DataFrame
title_sentiment_df = pd.DataFrame(title_sent)
text_sentiment_df = pd.DataFrame(text_sent)
ethereum_en_df = ethereum_en_df.join(title_sentiment_df).join(text_sentiment_df)

In [11]:
bitcoin_en_df['text'][0]

'When Russia invaded Ukraine, Niki Proshin was already a year into making a living as a vlogger — he had a YouTube channel, a TikTok channel, and an Instagram. He also ran an online Russian club for a… [+5883 chars]'

In [12]:
ethereum_en_df.head()

Unnamed: 0,title,description,text,date,language,title_compound,title_pos,title_neu,title_neg,title_sent,text_compound,text_pos,text_neu,text_neg,text_sent
0,Web3 Threatens to Segregate Our Online Lives,Governance tokens seem like a tantalizing solu...,"In February, shit hit the fan in the usual way...",2022-03-01,en,-0.3818,0.0,0.698,0.302,-1,-0.3182,0.059,0.848,0.093,-1
1,Coinbase earnings show trading of ethereum and...,Ethereum trading volume increased from 15% to ...,Coinbase reported that the share of trading vo...,2022-02-25,en,0.0,0.0,1.0,0.0,0,0.6705,0.188,0.812,0.0,1
2,How Ukrainians are fundraising in cryptocurrency,Millions of dollars of cryptocurrency have flo...,Illustration by James Bareham / The Verge\r\n\...,2022-02-26,en,0.0,0.0,1.0,0.0,0,-0.4588,0.0,0.917,0.083,-1
3,Vitalik Buterin talks about the problems of cr...,The founder of Ethereum confessed his concerns...,His name is Vitalik Buterin and after dedicati...,2022-03-22,en,-0.1027,0.11,0.762,0.129,-1,0.0,0.0,1.0,0.0,0
4,What You Need to Know About Ethereum's Role in...,This now-seven-year-old decentralized and open...,"It seems that in 2022, you cant escape from th...",2022-03-03,en,0.0,0.0,1.0,0.0,0,-0.1326,0.0,0.956,0.044,-1


In [13]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!
bitcoin_en_df.describe()

Unnamed: 0,title_compound,title_pos,title_neu,title_neg,title_sent,text_compound,text_pos,text_neu,text_neg,text_sent
count,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0
mean,-0.047845,0.0708,0.8252,0.10395,0.0,0.11179,0.06535,0.8959,0.0388,0.3
std,0.252077,0.101993,0.188772,0.144677,0.725476,0.416309,0.052391,0.074144,0.060936,0.864505
min,-0.5994,0.0,0.515,0.0,-1.0,-0.7713,0.0,0.739,0.0,-1.0
25%,-0.0193,0.0,0.615,0.0,-0.25,-0.032,0.0,0.8525,0.0,-0.25
50%,0.0,0.0,0.8595,0.0,0.0,0.2231,0.0715,0.889,0.0,1.0
75%,0.057625,0.16725,1.0,0.21,0.25,0.430825,0.09275,0.9525,0.052,1.0
max,0.296,0.247,1.0,0.438,1.0,0.6369,0.171,1.0,0.187,1.0


In [14]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!
ethereum_en_df.describe()

Unnamed: 0,title_compound,title_pos,title_neu,title_neg,title_sent,text_compound,text_pos,text_neu,text_neg,text_sent
count,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0
mean,-0.008795,0.0387,0.91755,0.0438,-0.05,0.233365,0.0854,0.88995,0.0246,0.3
std,0.198917,0.074482,0.104636,0.086559,0.686333,0.428281,0.092623,0.104349,0.037614,0.801315
min,-0.3818,0.0,0.698,0.0,-1.0,-0.5267,0.0,0.682,0.0,-1.0
25%,-0.025675,0.0,0.84475,0.0,-0.25,0.0,0.0,0.841,0.0,0.0
50%,0.0,0.0,1.0,0.0,0.0,0.20095,0.0685,0.9105,0.0,0.5
75%,0.0,0.02275,1.0,0.02825,0.0,0.528675,0.1415,1.0,0.0485,1.0
max,0.4588,0.25,1.0,0.302,1.0,0.8676,0.27,1.0,0.115,1.0


### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereium had the highest mean positive score at 0.078750

Q: Which coin had the highest compound score?

A: Ethereium had the highestest compound score at 0.834

Q. Which coin had the highest positive score?

A:  Ethereium had the highestest positive score of 0.249

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [15]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [16]:
# Instantiate the lemmatizer
# YOUR CODE HERE!
lemmatizer = WordNetLemmatizer()

In [17]:
# Create a list of stopwords
# YOUR CODE HERE!

mystopwords = ['said', 'sent', 'found', 'including', 'today', 'announced', 'week', 'basically', 'also']

# Create a list of stopwords
stop = stopwords.words('english')

# Expand the default stopwords list if necessary
stop.append("u")
stop.append("it")
stop.append("!")
stop.append("(")
stop.append(")")
stop.append("/")
stop.append("/")
stop.append("-")
            
stop = set(stop)


# Expand the default stopwords list if necessary
# YOUR CODE HERE!

stopwords_expanded = ()

In [18]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [19]:
# Complete the tokenizer function
def tokenizer(text):
    
    """Tokenizes text."""
    
    # Create a tokenized list of the words
    words = word_tokenize(text)
                          
    # Remove the punctuation from text 
    words = list(filter(lambda t: t not in punctuation, words))
    
    # Lemmatize words into root word
    words = [lemmatizer.lemmatize(word) for word in words]
    
    # Convert the words to lowercase
    words = list(filter(lambda w: w.lower(), words))
             
    # Remove the stop words 
    output = [word for word in ['description'] if not word in stopwords.words()]
    
    # # Lowercase
    # output = [word.lower() for word in ['description'] if word.lower() not in words]
    
    return words

In [20]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

bitcoin_en_df['token_text'] = bitcoin_en_df.text.apply(tokenizer)

In [21]:
bitcoin_en_df

Unnamed: 0,title,description,text,date,language,title_compound,title_pos,title_neu,title_neg,title_sent,text_compound,text_pos,text_neu,text_neg,text_sent,token_text
0,"If you’re a Russian YouTuber, how do you get p...",Russian creators are shut off from the global ...,"When Russia invaded Ukraine, Niki Proshin was ...",2022-03-17,en,0.0,0.0,1.0,0.0,0,0.0,0.0,1.0,0.0,0,"[When, Russia, invaded, Ukraine, Niki, Proshin..."
1,Why Isn't Bitcoin Booming?,"""Bitcoin was seen by many of its libertarian-l...","""Bitcoin was seen by many of its libertarian-l...",2022-03-12,en,0.0,0.0,1.0,0.0,0,-0.7713,0.0,0.831,0.169,-1,"[``, Bitcoin, wa, seen, by, many, of, it, libe..."
2,Cryptoverse: Remember when bitcoin was 'anonym...,Bitcoin just isn't anonymous enough for a grow...,March 22 (Reuters) - Bitcoin just isn't anonym...,2022-03-22,en,0.0,0.0,1.0,0.0,0,0.6369,0.171,0.829,0.0,1,"[March, 22, Reuters, Bitcoin, just, is, n't, a..."
3,Cryptoverse: Bitcoin gains conflict currency c...,Bitcoin has leapt since Russia's invasion of U...,March 1 (Reuters) - Bitcoin has leapt since Ru...,2022-03-01,en,0.0258,0.247,0.515,0.237,0,0.0,0.0,1.0,0.0,0,"[March, 1, Reuters, Bitcoin, ha, leapt, since,..."
4,War Is Calling Crypto’s ‘Neutrality’ Into Ques...,War in Ukraine and Western sanctions against R...,Whose side is cryptocurrency on? If you had as...,2022-03-08,en,-0.5994,0.0,0.606,0.394,-1,-0.3182,0.055,0.854,0.091,-1,"[Whose, side, is, cryptocurrency, on, If, you,..."
5,Cryptocurrency Donations Pour Into Ukraine. Th...,"Nonfungible Tidbits: All the bitcoin, cryptocu...",Getty\r\nWelcome to Nonfungible Tidbits. Our f...,2022-03-05,en,0.0,0.0,1.0,0.0,0,-0.6808,0.074,0.739,0.187,-1,"[Getty, Welcome, to, Nonfungible, Tidbits, Our..."
6,Is the US Developing a Digital Dollar? This We...,"Nonfungible Tidbits: All the bitcoin, cryptocu...",Here's what happened this week in the crypto w...,2022-03-19,en,0.2023,0.13,0.87,0.0,1,0.4588,0.081,0.919,0.0,1,"[Here, 's, what, happened, this, week, in, the..."
7,Bitcoin and Ether Are Helping Fund Ukraine's R...,Nearly $20 million has been raised in cryptocu...,Nurphoto/Getty\r\nAs Russia launched an invasi...,2022-02-28,en,0.296,0.216,0.784,0.0,1,0.128,0.048,0.952,0.0,1,"[Nurphoto/Getty, As, Russia, launched, an, inv..."
8,Cryptocurrencies in a time of war - Reuters.com,Cryptocurrencies have been close to the headli...,"LONDON, March 4 (Reuters) - Cryptocurrencies h...",2022-03-04,en,-0.5994,0.0,0.562,0.438,-1,-0.128,0.0,0.954,0.046,-1,"[LONDON, March, 4, Reuters, Cryptocurrencies, ..."
9,"Cryptoverse: Bitcoin's scared of commitment, M...",Bitcoin loves flirting with the mainstream. Bu...,March 15 (Reuters) - Bitcoin loves flirting wi...,2022-03-15,en,-0.0772,0.226,0.522,0.252,-1,0.3182,0.098,0.864,0.038,1,"[March, 15, Reuters, Bitcoin, love, flirting, ..."


In [22]:
# # Create a new tokens column for Ethereum
# # YOUR CODE HERE!

ethereum_en_df['token_text'] = ethereum_en_df.text.apply(tokenizer)
ethereum_en_df.head()

Unnamed: 0,title,description,text,date,language,title_compound,title_pos,title_neu,title_neg,title_sent,text_compound,text_pos,text_neu,text_neg,text_sent,token_text
0,Web3 Threatens to Segregate Our Online Lives,Governance tokens seem like a tantalizing solu...,"In February, shit hit the fan in the usual way...",2022-03-01,en,-0.3818,0.0,0.698,0.302,-1,-0.3182,0.059,0.848,0.093,-1,"[In, February, shit, hit, the, fan, in, the, u..."
1,Coinbase earnings show trading of ethereum and...,Ethereum trading volume increased from 15% to ...,Coinbase reported that the share of trading vo...,2022-02-25,en,0.0,0.0,1.0,0.0,0,0.6705,0.188,0.812,0.0,1,"[Coinbase, reported, that, the, share, of, tra..."
2,How Ukrainians are fundraising in cryptocurrency,Millions of dollars of cryptocurrency have flo...,Illustration by James Bareham / The Verge\r\n\...,2022-02-26,en,0.0,0.0,1.0,0.0,0,-0.4588,0.0,0.917,0.083,-1,"[Illustration, by, James, Bareham, The, Verge,..."
3,Vitalik Buterin talks about the problems of cr...,The founder of Ethereum confessed his concerns...,His name is Vitalik Buterin and after dedicati...,2022-03-22,en,-0.1027,0.11,0.762,0.129,-1,0.0,0.0,1.0,0.0,0,"[His, name, is, Vitalik, Buterin, and, after, ..."
4,What You Need to Know About Ethereum's Role in...,This now-seven-year-old decentralized and open...,"It seems that in 2022, you cant escape from th...",2022-03-03,en,0.0,0.0,1.0,0.0,0,-0.1326,0.0,0.956,0.044,-1,"[It, seems, that, in, 2022, you, cant, escape,..."


---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [25]:
from collections import Counter
from nltk import ngrams
import pandas as pd


# tokens = bitcoin_en_df['token_text'][0]
# tokens 

In [31]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!


token_text_df = bitcoin_en_df['token_text']
token_text_df = pd.DataFrame(token_text_df)

print(type(token_text_df))
print(type(bitcoin_en_df))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


In [32]:
token_text_df.head()

In [51]:
N = 2
grams = ngrams(tokenizer(bitcoin_en_df.text.str.cat()), N)
Counter(grams).most_common(20)

[(('in', 'the'), 4),
 (('March', '22'), 4),
 (('22', 'Reuters'), 4),
 (('Reuters', 'Bitcoin'), 4),
 (('of', 'the'), 4),
 (('char', 'March'), 3),
 (('with', 'the'), 3),
 (('since', 'Russia'), 3),
 (('Russia', "'s"), 3),
 (('Welcome', 'to'), 3),
 (('this', 'week'), 3),
 (('a', 'the'), 3),
 (('char', 'SAN'), 3),
 (('SAN', 'SALVADOR'), 3),
 (('SALVADOR', 'March'), 3),
 (('Reuters', 'El'), 3),
 (('El', 'Salvador'), 3),
 (('bitcoin-backed', 'bond'), 3),
 (('Russia', 'invaded'), 2),
 (('invaded', 'Ukraine'), 2)]

In [47]:
N = 2
grams = ngrams(tokenizer(ethereum_en_df.text.str.cat()), N)
Counter(grams).most_common(20)

[(('of', 'the'), 8),
 (('over', 'the'), 7),
 (('char', 'The'), 5),
 (('of', 'this'), 3),
 (('The', 'cryptocurrency'), 3),
 (('cryptocurrency', 'boom'), 3),
 (('boom', 'over'), 3),
 (('the', 'past'), 3),
 (('past', 'few'), 3),
 (('few', 'year'), 3),
 (('year', 'ha'), 3),
 (('ha', 'helped'), 3),
 (('helped', 'propel'), 3),
 (('propel', 'a'), 3),
 (('a', 'newer'), 3),
 (('newer', 'market'), 3),
 (('market', 'to'), 3),
 (('to', 'record'), 3),
 (('record', 'height'), 3),
 (('height', 'digital'), 3)]

In [50]:
bitcoin_en_df.text.str.cat()

'When Russia invaded Ukraine, Niki Proshin was already a year into making a living as a vlogger — he had a YouTube channel, a TikTok channel, and an Instagram. He also ran an online Russian club for a… [+5883 chars]"Bitcoin was seen by many of its libertarian-leaning fans as a kind of doomsday insurance," argues a columnist in the New York Times, "a form of \'digital gold\' that would be a source of stability as … [+3914 chars]March 22 (Reuters) - Bitcoin just isn\'t anonymous enough for a growing cohort of crypto users who are seeking greater seclusion.\r\nA volatile class of crypto known as privacy coins, created with the p… [+4776 chars]March 1 (Reuters) - Bitcoin has leapt since Russia\'s invasion of Ukraine, bolstered by people in those countries looking to store and move money in anonymous and decentralised crypto.\r\nBitcoin tradin… [+3955 chars]Whose side is cryptocurrency on? If you had asked Satoshi Nakamoto, the pseudonymous person (or persons) who created the Bitcoin platfo

In [35]:
# Define preprocess function

def process_text(doc):
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', doc)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output

 # Define the counter function
def word_counter(token_text_df): 
    # Combine all articles in corpus into one large string
    big_string = ' '.join(token_text_df)
    processed = process_text(big_string)
    top_10 = dict(Counter(processed).most_common(20))
    return pd.DataFrame(list(top_10.items()), columns=['word', 'count'])

In [36]:
word_counter(token_text_df)

Unnamed: 0,word,count
0,tokentext,1


In [37]:
def bigram_counter(token_text_df):
    big_string = ' '.join(token_text_df)
    processed = process_text(big_string)
    bigrams = ngrams(processed, n=2)
    top_10 = dict(Counter(bigrams).most_common(10))
    return pd.DataFrame(list(top_10.items()), columns=['bigram','count'])

In [39]:
bigram_counter(token_text_df)

Unnamed: 0,bigram,count


In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

ethereum_en_df['token_text'].value_counts()
word_counter(ethereum_en_df)

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using SpaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---