
Analysis of the sentiment of the latest news articles on Bitcoin and Ethereum using sentiment analysis, natural language processing and named entity recognition.

sarahm44/crypto-sentiment-analysis

Crypto Sentiment Analysis

Overview

In this repository, I applied natural language processing (NLP) to gauge the sentiment of the latest news articles featuring Bitcoin and Ethereum. I also applied fundamental NLP techniques to better understand other factors that may affect the coin prices, such as common words and phrases and the organizations and entities mentioned in the articles.

I completed the following tasks:

  1. Sentiment Analysis
  2. Natural Language Processing
  3. Named Entity Recognition

All of this work is contained in this JupyterLab notebook.

Sentiment Analysis

I used the News API (newsapi) to pull the latest news articles on Bitcoin and Ethereum and created a DataFrame of sentiment scores for each coin.
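The README does not name the sentiment scorer, but the positive and compound scores it reports match the fields returned by VADER's `polarity_scores`, so the sketch below assumes a VADER-like scorer. It turns a list of article texts into one row of scores per article. The helper name `build_sentiment_rows` is hypothetical, and `score_fn` can be any function returning a dict with `neg`, `neu`, `pos` and `compound` keys.

```python
from typing import Callable, Dict, List


# Hypothetical helper: builds one row of sentiment scores per article.
# score_fn is assumed to behave like VADER's polarity_scores, returning a
# dict with "neg", "neu", "pos" and "compound" keys.
def build_sentiment_rows(
    texts: List[str],
    score_fn: Callable[[str], Dict[str, float]],
) -> List[Dict[str, object]]:
    rows = []
    for text in texts:
        scores = score_fn(text)
        # Rename VADER's short keys to readable column names.
        rows.append(
            {
                "text": text,
                "compound": scores["compound"],
                "positive": scores["pos"],
                "negative": scores["neg"],
                "neutral": scores["neu"],
            }
        )
    return rows
```

With NLTK installed and its `vader_lexicon` resource downloaded, `SentimentIntensityAnalyzer().polarity_scores` from `nltk.sentiment.vader` can be passed as `score_fn`, and `pandas.DataFrame(rows)` turns the result into a DataFrame of the kind described here.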

Bitcoin Sentiment

I created the Bitcoin sentiment scores dataframe:

See Bitcoin sentiment below:

Ethereum Sentiment

I created the Ethereum sentiment scores dataframe:

See Ethereum sentiment below:

Some observations:

  • Ethereum had the higher mean positive score.
  • Ethereum had the higher mean compound score.
  • Bitcoin had the higher max compound score.
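Comparisons like these come from per-coin summary statistics. The minimal sketch below computes the mean positive, mean compound and max compound scores from rows of scores and reports which coin leads on a given statistic. The function names `summarize` and `leader` are hypothetical; in pandas, `DataFrame.describe()` reports the same mean and max values.

```python
from statistics import mean
from typing import Dict, List


# Hypothetical helper: the summary statistics compared in the observations.
def summarize(rows: List[Dict[str, float]]) -> Dict[str, float]:
    return {
        "mean_positive": mean(r["positive"] for r in rows),
        "mean_compound": mean(r["compound"] for r in rows),
        "max_compound": max(r["compound"] for r in rows),
    }


# Hypothetical helper: returns the coin with the largest value for one statistic.
def leader(summaries: Dict[str, Dict[str, float]], stat: str) -> str:
    return max(summaries, key=lambda coin: summaries[coin][stat])
```

Calling `leader({"bitcoin": summarize(btc_rows), "ethereum": summarize(eth_rows)}, "max_compound")` would name the coin with the higher max compound score.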

Natural Language Processing

In this section, I used NLTK and Python to tokenize text, find n-gram counts, and create word clouds for both coins.

Tokenize

I used NLTK and Python to tokenize the text for each coin. I completed the following:

  1. Changed each word to lowercase.
  2. Removed punctuation.
  3. Removed stop words.
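The three steps above can be sketched in plain Python as follows. This is not the notebook's code: it splits on whitespace rather than calling NLTK's `word_tokenize`, and it uses a short hypothetical stop-word set in place of NLTK's English stop-word list (`stopwords.words("english")`).

```python
import string
from typing import Iterable, List, Set

# Hypothetical, deliberately small stop-word set for illustration only.
STOP_WORDS: Set[str] = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "for"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize(text: str, stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    stop = set(stop_words)
    # 1. Lowercase the text.
    lowered = text.lower()
    # 2. Remove punctuation.
    no_punct = lowered.translate(_PUNCT_TABLE)
    # 3. Split on whitespace and drop stop words.
    return [word for word in no_punct.split() if word not in stop]
```

For example, `tokenize("The price of Bitcoin is rising!")` returns `["price", "bitcoin", "rising"]`. Applying a function like this to each article's text, for instance with pandas `Series.apply`, is one way to produce a "Tokens" column.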

See relevant code below:

I then added a "Tokens" column containing the tokenized text to each coin's DataFrame:

N-grams

Next, I looked at the n-grams and word frequencies for each coin.

I did the following:

  1. Used NLTK to produce the n-grams for N = 2 (bigrams).
  2. Listed the top 10 words for each coin.
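Both steps can be sketched with the standard library, assuming each article has already been tokenized into a list of strings. The `zip` below yields the same bigram tuples as `nltk.ngrams(tokens, 2)`, and NLTK's `FreqDist` offers a similar `most_common` method. The function names are hypothetical; bigrams are counted per article so that no pair spans two articles.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bigram_counts(docs: Iterable[List[str]]) -> Counter:
    counts: Counter = Counter()
    for tokens in docs:
        # Pair each token with its successor within a single article.
        counts.update(zip(tokens, tokens[1:]))
    return counts


def top_words(docs: Iterable[List[str]], n: int = 10) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(tokens)
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)
```

Passing the "Tokens" values for one coin to `top_words(..., 10)` gives that coin's ten most frequent words.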

See the bigram (N = 2) counts below:

See the code and results for the top 10 words for each coin below:

Word Clouds

Finally, I generated a word cloud for each coin to summarize its news coverage.

See Bitcoin word cloud:

See Ethereum word cloud:

Named Entity Recognition

In this section, I built a named entity recognition (NER) model for both coins and visualized the entity tags using spaCy.

See Bitcoin NER:

See Ethereum NER:
