# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import re
import string
from datetime import datetime, timedelta

import nltk
import pandas as pd
from dotenv import load_dotenv
from newsapi.newsapi_client import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon and create the sentiment analyzer
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\erikl\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
load_dotenv()

True

In [3]:
# Read your api key environment variable
api_key = os.getenv('NEWS_API_KEY')

In [4]:
type(api_key)

str

In [5]:
# Create a newsapi client
newsapi = NewsApiClient(api_key)

In [6]:
type(newsapi)

newsapi.newsapi_client.NewsApiClient

In [7]:
newsapi

<newsapi.newsapi_client.NewsApiClient at 0x1f5d9cf8b88>

In [8]:
# Find the current date and time (New York time zone)
current_date = pd.Timestamp.now(tz="America/New_York").isoformat()
print(current_date)

2021-09-25T01:11:08.629439-04:00


In [9]:
# Find the date and time 24 hours ago (New York time zone)
past24hr_date = (pd.Timestamp.now(tz="America/New_York") - timedelta(hours=24)).isoformat()
print(past24hr_date)

2021-09-24T01:11:11.433209-04:00


In [10]:
# Checking for the correct datetime format:
test_date = datetime.strptime(current_date[:19], "%Y-%m-%dT%H:%M:%S")
print(test_date)

2021-09-25 01:11:08
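
The two timestamps above come from separate clock reads, so the window between them is a few seconds short of 24 hours. A minimal sketch of deriving both ends from one reference time, using only the standard library. The helper name `news_window` and the choice of UTC are assumptions made for illustration; they are not part of the original notebook.

```python
from datetime import datetime, timedelta, timezone

def news_window(now=None, hours=24):
    """Return (start, end) strings covering the last `hours` hours.

    `news_window` is a hypothetical helper; UTC is assumed so the result
    does not depend on the machine's local time zone.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    # Drop microseconds to get the YYYY-MM-DDTHH:MM:SS shape checked above
    fmt = "%Y-%m-%dT%H:%M:%S"
    return start.strftime(fmt), end.strftime(fmt)

start, end = news_window(datetime(2021, 9, 25, 5, 11, 8, tzinfo=timezone.utc))
print(start, end)
```

Because both strings come from the same `end` value, the window is exactly `hours` long regardless of how long the cells take to run.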


In [11]:
# Create a function for fetching news:
def get_articles(keyword):
    """Fetch English-language articles about `keyword` published in the last 24 hours."""
    columns = ['Headlines', 'Date_Time', 'URL', 'Description', 'Content']
    rows = []
    date = datetime.strptime(current_date[:19], "%Y-%m-%dT%H:%M:%S")
    end_date = datetime.strptime(past24hr_date[:19], "%Y-%m-%dT%H:%M:%S")
    print(f"Fetching news about '{keyword}'")
    print("*" * 30)
    if date > end_date:
        print(f"retrieving news from: {date}")
        articles = newsapi.get_everything(
            q=keyword,
            from_param=str(end_date),
            to=str(date),
            language="en",
            sort_by="relevancy",
            page=1,
        )
        # Keep only the fields used later in the analysis
        for article in articles["articles"]:
            rows.append({
                'Headlines': article["title"],
                'Date_Time': article["publishedAt"],
                'URL': article["url"],
                'Description': article["description"],
                'Content': article["content"],
            })

    article_df = pd.DataFrame(rows, columns=columns)
    return article_df
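
If a field such as `description` comes back missing or null, the string functions used later (regex cleaning, sentiment scoring) would fail on `None`. The sketch below shows one way to flatten article dictionaries defensively. The helper name `articles_to_rows` and the empty-string default are assumptions for illustration, not part of the original notebook.

```python
def articles_to_rows(articles):
    """Flatten NewsAPI-style article dicts into rows with the notebook's column names.

    Missing or null fields become empty strings (an assumption made here so
    that later string functions never receive None).
    """
    fields = {
        "Headlines": "title",
        "Date_Time": "publishedAt",
        "URL": "url",
        "Description": "description",
        "Content": "content",
    }
    return [
        {column: article.get(key) or "" for column, key in fields.items()}
        for article in articles
    ]

# Hypothetical sample record with a null description and no content field
sample = [{"title": "Example headline", "publishedAt": "2021-09-24T16:22:55Z",
           "url": "https://example.com/a", "description": None}]
rows = articles_to_rows(sample)
print(rows[0])
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` to get the same five columns as above.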

## Analyzing Bitcoin Sentiment:

In [12]:
# Fetch the Bitcoin news articles
btc_articles_df = get_articles("bitcoin")

Fetching news about 'bitcoin'
******************************
retrieving news from: 2021-09-25 01:11:08


In [13]:
btc_articles_df.shape

(20, 5)

In [47]:
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...,-0.3549,0.07,0.789,0.14
1,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...,0.7351,0.146,0.827,0.027
2,Old coal plant is now mining bitcoin for a uti...,2021-09-24T20:30:09Z,https://arstechnica.com/tech-policy/2021/09/ol...,Bitcoin is breathing new life into another ail...,19 with 16 posters participating\r\nBitcoins m...,0.0,0.0,1.0,0.0
3,Marathon Digital and other crypto-linked stock...,2021-09-24T13:58:00Z,https://markets.businessinsider.com/news/stock...,China says bitcoin and ether cannot be used as...,Bitcoin\r\nDan Kitwood/ Getty Images\r\nShares...,-0.296,0.0,0.901,0.099
4,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,...",-0.9022,0.0,0.706,0.294


In [15]:
btc_articles_df.loc[0]['Content']

'Its the countrys latest crackdown on digital currencies\r\nIllustration by Alex Castro / The Verge\r\nThe Peoples Bank of China, the countrys central bank, said Friday that cryptocurrency transactions ar… [+1461 chars]'

In [16]:
btc_articles_df.loc[0]['Description']

'China’s central bank on Friday said cryptocurrency transactions in the country are illegal, banning all transactions. It said cryptocurrencies like bitcoin and Ethereum are not legal tender and can’t be circulated.'

In [17]:
btc_articles_df.loc[0]['Headlines']

'China’s central bank bans cryptocurrency transactions to avoid ‘risks’'

In [18]:
btc_articles_df.loc[3]['Content']

'Bitcoin\r\nDan Kitwood/ Getty Images\r\nShares of bitcoin miners and other stocks tied to the cryptocurrency space slid Friday after China said all crypto-related transactions are illegal, with the count… [+1677 chars]'

In [19]:
btc_articles_df.loc[3]['Description']

"China says bitcoin and ether cannot be used as currency in the world's second-largest economy, sending bitcoin and crypto-linked stocks lower."

In [20]:
btc_articles_df.loc[3]['Headlines']

'Marathon Digital and other crypto-linked stocks drop after China declares cryptocurrency transactions illegal'

### Clean the Text:

In [21]:
def clean_text(text):
    """Remove every character that is not an ASCII letter or a space."""
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
    return re_clean

In [22]:
test_clean = clean_text(btc_articles_df.loc[3]['Description'])

In [23]:
type(test_clean)

str

In [24]:
test_clean

'China says bitcoin and ether cannot be used as currency in the worlds secondlargest economy sending bitcoin and cryptolinked stocks lower'

In [25]:
# Alias of "clean_text()" for use with apply
text_cleaner = clean_text

In [26]:
# Review updated text:
btc_clean = pd.DataFrame(btc_articles_df['Description'].apply(text_cleaner))
btc_clean

Unnamed: 0,Description
0,Chinas central bank on Friday said cryptocurre...
1,Twitter will now allow people to tip their fav...
2,Bitcoin is breathing new life into another ail...
3,China says bitcoin and ether cannot be used as...
4,olliBitcoin dumps after Chinese condemnation ...
5,Bitcoin fell nearly on Friday after Chinas ce...
6,Chinas hostility towards crypto stretches back...
7,Chinas moves to crack down on bitcoin trading ...
8,The Central Bank of China announced that it wi...
9,Central bank says all crypto currencies includ...
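
Row 4 above starts with `olliBitcoin` because the regex deletes the angle brackets of `<ol><li>` but keeps the tag names. A possible variant that strips tags and normalizes whitespace before removing non-letters is sketched below. The name `clean_description` is hypothetical, and this is one option under those assumptions rather than the notebook's method.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")            # anything that looks like a complete HTML tag
NON_LETTER_RE = re.compile(r"[^a-zA-Z ]")  # same character class as clean_text

def clean_description(text):
    """Strip HTML tags and entities, then keep only letters and single spaces."""
    no_tags = TAG_RE.sub(" ", text)
    unescaped = html.unescape(no_tags)
    # str.split() treats \r, \n and non-breaking spaces as whitespace
    normalized = " ".join(unescaped.split())
    letters_only = NON_LETTER_RE.sub("", normalized)
    # Collapse double spaces left where a symbol stood between two spaces
    return " ".join(letters_only.split())

print(clean_description("<ol><li>Bitcoin dumps after Chinese condemnation | Kitco News\xa0\xa0Kitco NEWS\r\n</li>"))
```

A tag that is cut off before its closing `>` is not matched by `TAG_RE`, so its letters would still survive; truncated descriptions may need extra handling.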


In [48]:
# Function to get compound sentiments
def get_compound_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    compound = sentiment["compound"]
    return compound
get_compoundScore = lambda x: get_compound_sentiment(x)

In [49]:
btc_articles_df['Compound Sentiment'] = btc_articles_df['Description'].apply(get_compoundScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...,-0.3549,0.07,0.789,0.14
1,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...,0.7351,0.146,0.827,0.027
2,Old coal plant is now mining bitcoin for a uti...,2021-09-24T20:30:09Z,https://arstechnica.com/tech-policy/2021/09/ol...,Bitcoin is breathing new life into another ail...,19 with 16 posters participating\r\nBitcoins m...,0.0,0.0,1.0,0.0
3,Marathon Digital and other crypto-linked stock...,2021-09-24T13:58:00Z,https://markets.businessinsider.com/news/stock...,China says bitcoin and ether cannot be used as...,Bitcoin\r\nDan Kitwood/ Getty Images\r\nShares...,-0.296,0.0,0.901,0.099
4,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,...",-0.9022,0.0,0.706,0.294


In [50]:
# Function to get positive sentiments
def get_positive_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    positive = sentiment["pos"]
    return positive
get_positiveScore = lambda x: get_positive_sentiment(x)

In [51]:
btc_articles_df['Positive Sentiment'] = btc_articles_df['Description'].apply(get_positiveScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...,-0.3549,0.07,0.789,0.14
1,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...,0.7351,0.146,0.827,0.027
2,Old coal plant is now mining bitcoin for a uti...,2021-09-24T20:30:09Z,https://arstechnica.com/tech-policy/2021/09/ol...,Bitcoin is breathing new life into another ail...,19 with 16 posters participating\r\nBitcoins m...,0.0,0.0,1.0,0.0
3,Marathon Digital and other crypto-linked stock...,2021-09-24T13:58:00Z,https://markets.businessinsider.com/news/stock...,China says bitcoin and ether cannot be used as...,Bitcoin\r\nDan Kitwood/ Getty Images\r\nShares...,-0.296,0.0,0.901,0.099
4,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,...",-0.9022,0.0,0.706,0.294


In [52]:
# Function to get neutral sentiments
def get_neutral_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    neutral = sentiment["neu"]
    return neutral
get_neutralScore = lambda x: get_neutral_sentiment(x)

In [53]:
btc_articles_df['Neutral Sentiment'] = btc_articles_df['Description'].apply(get_neutralScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...,-0.3549,0.07,0.789,0.14
1,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...,0.7351,0.146,0.827,0.027
2,Old coal plant is now mining bitcoin for a uti...,2021-09-24T20:30:09Z,https://arstechnica.com/tech-policy/2021/09/ol...,Bitcoin is breathing new life into another ail...,19 with 16 posters participating\r\nBitcoins m...,0.0,0.0,1.0,0.0
3,Marathon Digital and other crypto-linked stock...,2021-09-24T13:58:00Z,https://markets.businessinsider.com/news/stock...,China says bitcoin and ether cannot be used as...,Bitcoin\r\nDan Kitwood/ Getty Images\r\nShares...,-0.296,0.0,0.901,0.099
4,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,...",-0.9022,0.0,0.706,0.294


In [54]:
# Function to get negative sentiments
def get_negative_sentiment(content):
    sentiment = analyzer.polarity_scores(content)
    negative = sentiment["neg"]
    return negative
get_negativeScore = lambda x: get_negative_sentiment(x)

In [55]:
btc_articles_df['Negative Sentiment'] = btc_articles_df['Description'].apply(get_negativeScore)
btc_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...,-0.3549,0.07,0.789,0.14
1,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...,0.7351,0.146,0.827,0.027
2,Old coal plant is now mining bitcoin for a uti...,2021-09-24T20:30:09Z,https://arstechnica.com/tech-policy/2021/09/ol...,Bitcoin is breathing new life into another ail...,19 with 16 posters participating\r\nBitcoins m...,0.0,0.0,1.0,0.0
3,Marathon Digital and other crypto-linked stock...,2021-09-24T13:58:00Z,https://markets.businessinsider.com/news/stock...,China says bitcoin and ether cannot be used as...,Bitcoin\r\nDan Kitwood/ Getty Images\r\nShares...,-0.296,0.0,0.901,0.099
4,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,...",-0.9022,0.0,0.706,0.294
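
The four scoring functions above each call `analyzer.polarity_scores` separately, so every description is scored four times. A minimal sketch of scoring once and mapping the result to the four column names follows. `SCORE_COLUMNS` and `sentiment_columns` are hypothetical names, and `fake_scorer` is a stand-in so the example runs without NLTK.

```python
# Mapping from VADER's score keys to the column names used in this notebook
SCORE_COLUMNS = {
    "compound": "Compound Sentiment",
    "pos": "Positive Sentiment",
    "neu": "Neutral Sentiment",
    "neg": "Negative Sentiment",
}

def sentiment_columns(text, score_fn):
    """Score `text` once and return the four scores keyed by column name.

    `score_fn` is any callable returning a dict with the keys 'compound',
    'pos', 'neu' and 'neg', for example `analyzer.polarity_scores`.
    """
    scores = score_fn(text)
    return {column: scores[key] for key, column in SCORE_COLUMNS.items()}

def fake_scorer(text):
    """Stand-in scorer for illustration only; it ignores the text."""
    return {"neg": 0.14, "neu": 0.789, "pos": 0.07, "compound": -0.3549}

print(sentiment_columns("any text", fake_scorer))
```

In the notebook this could be applied as `btc_articles_df['Description'].apply(lambda t: pd.Series(sentiment_columns(t, analyzer.polarity_scores)))`, which yields a four-column DataFrame that can be joined back onto `btc_articles_df`.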


In [36]:
# Save BTC Articles data to CSV:
btc_articles_df.to_csv(f"btc_articles_df_{current_date[:10]}.csv")

## Ethereum Sentiment Analysis

In [37]:
# Fetch the Ethereum news articles
eth_articles_df = get_articles("ethereum")

Fetching news about 'ethereum'
******************************
retrieving news from: 2021-09-25 01:11:08


In [38]:
eth_articles_df.shape

(20, 5)

In [39]:
eth_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...
1,China declares cryptocurrency transactions ill...,2021-09-24T16:08:57Z,https://abcnews.go.com/Business/china-declares...,"Bitcoin, Ethereum and others dipped on Friday ...","Bitcoin, Ethereum and other cryptocurrencies d..."
2,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...
3,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,..."
4,Bitcoin dives after China declares all crypto ...,2021-09-24T13:45:23Z,https://thenextweb.com/news/china-declares-all...,Did you know Hard Fork is taking the stage on ...,Virtual currency-related business activities a...


In [40]:
eth_articles_df.loc[0]['Content']

'Its the countrys latest crackdown on digital currencies\r\nIllustration by Alex Castro / The Verge\r\nThe Peoples Bank of China, the countrys central bank, said Friday that cryptocurrency transactions ar… [+1461 chars]'

In [41]:
eth_articles_df.loc[0]['Description']

'China’s central bank on Friday said cryptocurrency transactions in the country are illegal, banning all transactions. It said cryptocurrencies like bitcoin and Ethereum are not legal tender and can’t be circulated.'

### Clean the Text

In [42]:
# Using functions created for Cleaning BTC text:
eth_clean = pd.DataFrame(eth_articles_df['Description'].apply(text_cleaner))
eth_clean

Unnamed: 0,Description
0,Chinas central bank on Friday said cryptocurre...
1,Bitcoin Ethereum and others dipped on Friday a...
2,Twitter will now allow people to tip their fav...
3,olliBitcoin dumps after Chinese condemnation ...
4,Did you know Hard Fork is taking the stage on ...
5,China says bitcoin and ether cannot be used as...
6,Some cryptocurrencies are down over since the...
7,China has reiterated its crackdown on cryptocu...
8,The Central Bank of China announced that it wi...
9,Chinas central bank has declared all transacti...


## Create the Ethereum sentiment scores DataFrame

In [44]:
# Apply the sentiment-score functions defined above to the Ethereum descriptions
eth_articles_df['Compound Sentiment'] = eth_articles_df['Description'].apply(get_compoundScore)
eth_articles_df['Positive Sentiment'] = eth_articles_df['Description'].apply(get_positiveScore)
eth_articles_df['Neutral Sentiment'] = eth_articles_df['Description'].apply(get_neutralScore)
eth_articles_df['Negative Sentiment'] = eth_articles_df['Description'].apply(get_negativeScore)
eth_articles_df.head()

Unnamed: 0,Headlines,Date_Time,URL,Description,Content,Compound Sentiment,Positive Sentiment,Neutral Sentiment,Negative Sentiment
0,China’s central bank bans cryptocurrency trans...,2021-09-24T16:22:55Z,https://www.theverge.com/2021/9/24/22691472/ch...,China’s central bank on Friday said cryptocurr...,Its the countrys latest crackdown on digital c...,-0.3549,0.07,0.789,0.14
1,China declares cryptocurrency transactions ill...,2021-09-24T16:08:57Z,https://abcnews.go.com/Business/china-declares...,"Bitcoin, Ethereum and others dipped on Friday ...","Bitcoin, Ethereum and other cryptocurrencies d...",0.0,0.0,1.0,0.0
2,Twitter will allow people to tip their favorit...,2021-09-24T12:14:00Z,https://techncruncher.blogspot.com/2021/09/twi...,Twitter will now allow people to tip their fav...,Twitter will now allow people to tip their fav...,0.7351,0.146,0.827,0.027
3,Bitcoin dumps after Chinese condemnation | Kit...,2021-09-24T10:14:00Z,https://www.kitco.com/news/2021-09-24/Bitcoin-...,<ol><li>Bitcoin dumps after Chinese condemnati...,"Editor's Note: With so much market volatility,...",-0.9022,0.0,0.706,0.294
4,Bitcoin dives after China declares all crypto ...,2021-09-24T13:45:23Z,https://thenextweb.com/news/china-declares-all...,Did you know Hard Fork is taking the stage on ...,Virtual currency-related business activities a...,0.7263,0.13,0.842,0.028


In [45]:
# Save ETH Articles data to CSV:
eth_articles_df.to_csv(f"eth_articles_df_{current_date[:10]}.csv")

In [56]:
# Describe the Bitcoin Sentiment
btc_articles_df.describe()

| | Compound Sentiment | Positive Sentiment | Neutral Sentiment | Negative Sentiment |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.288055 | 0.02335 | 0.866 | 0.1106 |
| std | 0.484851 | 0.045966 | 0.096175 | 0.098305 |
| min | -0.9022 | 0.0 | 0.698 | 0.0 |
| 25% | -0.70835 | 0.0 | 0.81225 | 0.02025 |
| 50% | -0.45615 | 0.0 | 0.861 | 0.122 |
| 75% | 0.0 | 0.01075 | 0.92575 | 0.14625 |
| max | 0.7351 | 0.146 | 1.0 | 0.302 |


In [57]:
# Describe the Ethereum Sentiment
eth_articles_df.describe()

| | Compound Sentiment | Positive Sentiment | Neutral Sentiment | Negative Sentiment |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.11211 | 0.04725 | 0.8605 | 0.09215 |
| std | 0.495794 | 0.07298 | 0.098021 | 0.093939 |
| min | -0.9022 | 0.0 | 0.704 | 0.0 |
| 25% | -0.40175 | 0.0 | 0.785 | 0.0 |
| 50% | -0.2506 | 0.0 | 0.866 | 0.088 |
| 75% | 0.1101 | 0.07175 | 0.92575 | 0.1235 |
| max | 0.7351 | 0.227 | 1.0 | 0.296 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum, with a mean positive score of 0.04725 (Bitcoin: 0.02335).

Q: Which coin had the highest compound score?

A: Ethereum and Bitcoin share the same highest compound score, 0.7351. Bitcoin had the highest negative score, 0.302 (Ethereum: 0.296).

Q: Which coin had the highest positive score?

A: Ethereum, with 0.227 (Bitcoin: 0.146).
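
The comparisons behind these answers can be checked mechanically. The sketch below hard-codes the relevant values from the two `describe()` outputs above; the dictionary layout and the helper name `leaders` are assumptions for illustration.

```python
# Values copied from the two describe() outputs above
stats = {
    "Bitcoin":  {"mean_pos": 0.02335, "max_compound": 0.7351, "max_pos": 0.146, "max_neg": 0.302},
    "Ethereum": {"mean_pos": 0.04725, "max_compound": 0.7351, "max_pos": 0.227, "max_neg": 0.296},
}

def leaders(stats, key):
    """Return the coin(s) with the largest value for `key` (ties are kept)."""
    best = max(coin[key] for coin in stats.values())
    return [name for name, coin in stats.items() if coin[key] == best]

for key in ("mean_pos", "max_compound", "max_pos", "max_neg"):
    print(key, leaders(stats, key))
```

With the live DataFrames, the same values are available through expressions such as `btc_articles_df['Positive Sentiment'].mean()` and `eth_articles_df['Positive Sentiment'].max()`.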

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [58]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [59]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
sw = set(stopwords.words('english'))

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [73]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)
   
    # Create a tokenized list of the words
    words = word_tokenize(re_clean)
    
    # Lemmatize words into root words
    lem = [lemmatizer.lemmatize(word) for word in words]
   
    # Convert the words to lowercase & Remove the stop words
    output = [word.lower() for word in lem if word.lower() not in sw]
    
    tokens = ' '.join(output)
    
    return tokens

# The prompts below ask for a new tokens column, but which DataFrame should that column be added to?

In [75]:
# Turn the Description column into a list of text:
btc_text = btc_articles_df['Description'].to_list()
btc_text[0]

'China’s central bank on Friday said cryptocurrency transactions in the country are illegal, banning all transactions. It said cryptocurrencies like bitcoin and Ethereum are not legal tender and can’t be circulated.'

In [79]:
# Join all descriptions into one string (a single join is enough; no loop is needed)
btc_combined_text = ' '.join(btc_text)

btc_combined_text

'China’s central bank on Friday said cryptocurrency transactions in the country are illegal, banning all transactions. It said cryptocurrencies like bitcoin and Ethereum are not legal tender and can’t be circulated. Twitter will now allow people to tip their favorite content creators with bitcoin and will also launch a fund to pay some users who host audio chat rooms on its Spaces feature, the company said on Thursday.The company also said it will test new ways to help u… Bitcoin is breathing new life into another ailing power plant. China says bitcoin and ether cannot be used as currency in the world\'s second-largest economy, sending bitcoin and crypto-linked stocks lower. <ol><li>Bitcoin dumps after Chinese condemnation | Kitco News\xa0\xa0Kitco NEWS\r\n</li><li>Bitcoin falls as China deems all cryptocurrency transactions illegal\xa0\xa0Financial Post\r\n</li><li>China turns the screws on crypto, Bitcoin stumbles\xa0\xa0The Globe and Mail\r\n</li><li… Bitcoin fell nearly 5% on Frida

In [81]:
# Create a new tokens column for Bitcoin
btc_tokens = tokenizer(btc_combined_text)
print(btc_tokens)

chinas central bank friday said cryptocurrency transaction country illegal banning transaction said cryptocurrencies like bitcoin ethereum legal tender cant circulated twitter allow people tip favorite content creator bitcoin also launch fund pay user host audio chat room spaces feature company said thursdaythe company also said test new way help u bitcoin breathing new life another ailing power plant china say bitcoin ether used currency world secondlargest economy sending bitcoin cryptolinked stock lower ollibitcoin dump chinese condemnation kitco newskitco newslilibitcoin fall china deems cryptocurrency transaction illegalfinancial postlilichina turn screw crypto bitcoin stumblesthe globe maillili bitcoin fell nearly friday chinas central bank said would crack cryptocurrency trading banning overseas exchange providing service mainland investor chinas hostility towards crypto stretch back country banned bank handling bitcoin transaction chinas move crack bitcoin trading dealt another

In [82]:
# Create a new tokens column for Ethereum
# Turn the Description column into a list of text:
eth_text = eth_articles_df['Description'].to_list()
# Join all descriptions into one string (a single join is enough; no loop is needed)
eth_combined_text = ' '.join(eth_text)

eth_tokens = tokenizer(eth_combined_text)
print(eth_tokens)

chinas central bank friday said cryptocurrency transaction country illegal banning transaction said cryptocurrencies like bitcoin ethereum legal tender cant circulated bitcoin ethereum others dipped friday news beijing twitter allow people tip favorite content creator bitcoin also launch fund pay user host audio chat room spaces feature company said thursdaythe company also said test new way help u ollibitcoin dump chinese condemnation kitco newskitco newslilibitcoin fall china deems cryptocurrency transaction illegalfinancial postlilichina turn screw crypto bitcoin stumblesthe globe maillili know hard fork taking stage sept oct together amazing lineup expert explore future crypto tnw conference secure ticket cryptocurrencies nosedived friday chinas central b china say bitcoin ether used currency world secondlargest economy sending bitcoin cryptolinked stock lower cryptocurrencies since news brokecryptocurrency trader set rocky day chinas central bank announced cryptocurrencyrelated tr

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
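
Both steps operate on a sequence of word tokens. `nltk.ngrams` pairs up the elements of whatever sequence it receives, so a space-joined string has to be split into words first; otherwise the pairs are single characters. The sketch below uses only the standard library, with `zip` standing in for `nltk.ngrams`. The helper names `word_bigrams` and `top_n` and the sample string are assumptions for illustration.

```python
from collections import Counter

def word_bigrams(token_string):
    """Split a space-joined token string and pair each word with its successor."""
    words = token_string.split()
    return list(zip(words, words[1:]))

def top_n(items, n=10):
    """Return the n most common items with their counts."""
    return Counter(items).most_common(n)

# Hypothetical token string in the same shape as the tokenizer output
sample = "central bank said bitcoin central bank said crypto"
print(top_n(word_bigrams(sample), n=2))
print(top_n(sample.split(), n=3))
```

The same `top_n` call works for single words and for bigram tuples, because `Counter` accepts any hashable items.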

In [83]:
from collections import Counter
from nltk import ngrams
import spacy

# Load the English language model for spaCy
nlp = spacy.load("en_core_web_sm")

In [84]:
def create_bigrams(text):
    """Return the word-level bigrams of a space-joined token string."""
    # ngrams() pairs up the elements of the sequence it is given, so split the
    # string into words first; a raw string would yield character pairs
    bigrams = ngrams(text.split(), 2)
    output = ['_'.join(i) for i in bigrams]
    return ' '.join(output)

In [85]:
# Generate the Bitcoin N-grams where N=2
btc_bigrams = create_bigrams(btc_tokens)
btc_bigrams

'chinas_central central_bank bank_friday friday_said said_cryptocurrency cryptocurrency_transaction transaction_country country_illegal illegal_banning banning_transaction transaction_said said_cryptocurrencies cryptocurrencies_like like_bitcoin bitcoin_ethereum ethereum_legal legal_tender tender_cant cant_circulated circulated_twitter twitter_allow allow_people people_tip tip_favorite favorite_content content_creator creator_bitcoin bitcoin_also also_launch launch_fund fund_pay pay_user

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Function token_count returns the N most common tokens for a given coin (pass N=10 for the top 10)
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [78]:
# Get the most frequent adjective, noun, or proper noun in a text, together with its count.
def most_freq_words(text):
    """
    This function finds the most frequent adjective, noun, or proper noun in the text.
    Args:  text (string):  The text to analyze
    Returns:  most_common_word (list):  A list holding the (word, count) pair of the most frequent word (empty if none is found)
    """
    # Tokenizes text and parses each token
    doc = nlp(text)
    
    # Creates a list with all the adjectives, nouns, and proper nouns in the text
    words = [token.text.lower() for token in doc if token.pos_ in ('ADJ', 'PROPN', 'NOUN')]
    
    # Retrieves the most frequent word in the `words` list using Counter
    most_common_word = Counter(words).most_common(1)
    
    return most_common_word

count_words = lambda x: most_freq_words(x)

In [79]:
# Create a list most common words
word_count = btc_articles_df['Description'].apply(count_words)

In [83]:
# Display Sample
print(word_count[:10])

0              [(el, 1)]
1         [(bitcoin, 2)]
2        [(analysts, 1)]
3            [(many, 2)]
4        [(exchange, 2)]
5      [(blockchain, 1)]
6        [(offshore, 1)]
7           [(total, 1)]
8           [(forex, 1)]
9    [(transactions, 2)]
Name: Description, dtype: object


In [81]:
type(word_count)

pandas.core.series.Series

#### Use the "most_common()" method of Counter to fetch the 10 most frequent words in the articles.
The "most_common()" method returns a Python list that can be stored in the variable most_frequent_words.

In [84]:
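# Retrieve the most frequent words; word_count holds a list of (word, count) pairs per article, so flatten it before counting
totals = Counter()
for pairs in word_count:
    totals.update(dict(pairs))
most_frequent_words = totals.most_common(10)
print(most_frequent_words)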

In [None]:
# Review functionality of the most common word for each article:


In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news.

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')
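
Once a text has been processed with `nlp(...)`, the recognized entities are available on `doc.ents`, each with a `text` and a `label_` attribute. The sketch below summarizes them with a `Counter`. The helper name `entity_summary` is hypothetical, and `fake_doc` only mimics the two attributes used so the example runs without spaCy installed.

```python
from collections import Counter
from types import SimpleNamespace

def entity_summary(doc):
    """Count (text, label) pairs for the entities of a spaCy-style Doc."""
    return Counter((ent.text, ent.label_) for ent in doc.ents)

# Stand-in for a spaCy Doc, for illustration only. In the notebook this
# would be something like:
#   doc = nlp(btc_combined_text)
#   doc.user_data["title"] = "Bitcoin NER"
#   displacy.render(doc, style="ent")
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="China", label_="GPE"),
    SimpleNamespace(text="Friday", label_="DATE"),
    SimpleNamespace(text="China", label_="GPE"),
])

print(entity_summary(fake_doc).most_common(2))
```

Counting pairs rather than bare strings keeps the same text apart when the model tags it with different labels in different contexts.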

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---