In this repository, I applied natural language processing (NLP) to gauge the sentiment of the latest news articles featuring Bitcoin and Ethereum. I also applied fundamental NLP techniques to explore other factors that may relate to coin prices, such as common words and phrases and the organizations and entities mentioned in the articles.
I completed the following tasks:
- Sentiment Analysis
- Natural Language Processing
- Named Entity Recognition
The full analysis is contained in this JupyterLab notebook.
I used the News API to pull the latest news articles for Bitcoin and Ethereum, then created a DataFrame of sentiment scores for each coin.
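As a rough illustration of this step, the sketch below turns a list of article dictionaries into rows of sentiment scores. It assumes the scores come from NLTK's VADER analyzer, which the text above does not name; the `pos` and `compound` keys it returns match the positive and compound scores discussed in this README. The function names `score_articles` and `make_vader_analyzer` are hypothetical, and the `"content"` key follows the article format returned by the News API.

```python
def score_articles(articles, analyzer):
    """Return one row of sentiment scores per article.

    `articles` is a list of dicts with a "content" key (News API format).
    `analyzer` is any object with a VADER-style polarity_scores(text) method.
    """
    rows = []
    for article in articles:
        # Some articles come back with no content; treat that as empty text.
        text = article.get("content") or ""
        scores = analyzer.polarity_scores(text)
        rows.append({
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "neutral": scores["neu"],
            "negative": scores["neg"],
        })
    return rows


def make_vader_analyzer():
    """Build a VADER analyzer (assumption: VADER was the scorer used)."""
    # Imported lazily so score_articles stays usable without NLTK installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(rows)` to get one DataFrame per coin.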
I created the Bitcoin sentiment scores DataFrame:
See Bitcoin sentiment below:
I created the Ethereum sentiment scores DataFrame:
See Ethereum sentiment below:
Some observations:
- Ethereum had the higher mean positive score.
- Ethereum had the higher mean compound score.
- Bitcoin had the higher maximum compound score.
In this section, I used NLTK and Python to tokenize text, find n-gram counts, and create word clouds for both coins.
To tokenize the text for each coin, I completed the following steps:
- Changed each word to lowercase.
- Removed punctuation.
- Removed stop words.
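The three steps above can be sketched as follows. This is a standard-library stand-in rather than the notebook's NLTK code: it splits on whitespace instead of using an NLTK tokenizer, and `DEMO_STOP_WORDS` is a tiny illustrative list, not NLTK's English stop-word corpus. The function name `tokenize` is hypothetical.

```python
import string

# Small illustrative stop-word list (an assumption). With NLTK available,
# set(nltk.corpus.stopwords.words("english")) is the usual replacement.
DEMO_STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "on"}


def tokenize(text, stop_words=DEMO_STOP_WORDS):
    """Lowercase the text, remove punctuation, and drop stop words."""
    lowered = text.lower()
    # Delete every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    words = no_punct.split()
    return [word for word in words if word not in stop_words]


print(tokenize("The price of Bitcoin is rising, and fast!"))
# ['price', 'bitcoin', 'rising', 'fast']
```

Applied to a DataFrame, something like `df["Tokens"] = df["text"].apply(tokenize)` would add the token column, assuming the article text lives in a column named `text`.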
See relevant code below:
I then added a "Tokens" column containing the tokenized text to the DataFrame:
Then I looked at the n-grams and word frequencies for each coin.
I completed the following:
- Used NLTK to produce the n-grams for N = 2 (bigrams).
- Listed the top 10 words for each coin.
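Both counts above reduce to a `collections.Counter` over the token list. The sketch below builds the n-grams with `zip` so that it runs without extra packages; `nltk.ngrams(tokens, n=2)` yields the same tuples. The function names `ngram_counts` and `top_words` are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    """Count each run of n consecutive tokens (bigrams when n=2)."""
    # zip over n shifted copies of the list yields the n-gram tuples.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(ngrams)


def top_words(tokens, k=10):
    """Return the k most frequent tokens with their counts."""
    return Counter(tokens).most_common(k)


tokens = ["bitcoin", "price", "bitcoin", "price", "rises"]
print(ngram_counts(tokens).most_common(1))  # [(('bitcoin', 'price'), 2)]
print(top_words(tokens, k=2))               # [('bitcoin', 2), ('price', 2)]
```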
See the n-gram counts for N = 2 below:
See below the code and results for the top 10 words for each coin:
Finally, I generated a word cloud for each coin to summarize its news coverage.
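One possible way to draw such a cloud is sketched below, assuming the `wordcloud` and `matplotlib` packages, neither of which is named in this README. The function names `word_frequencies` and `draw_word_cloud` and the image size are illustrative choices.

```python
from collections import Counter


def word_frequencies(tokens):
    """Map each token to the number of times it appears."""
    return dict(Counter(tokens))


def draw_word_cloud(tokens, title):
    """Render a word cloud sized by token frequency."""
    # Imported lazily so word_frequencies works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400).generate_from_frequencies(
        word_frequencies(tokens)
    )
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()
```

Calling `draw_word_cloud(all_bitcoin_tokens, "Bitcoin Word Cloud")` with the combined token list for a coin would display the figure; `all_bitcoin_tokens` is a placeholder name.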
See Bitcoin word cloud:
See Ethereum word cloud:
In this section, I built a named entity recognition (NER) model for both coins and visualized the tags using spaCy.
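A minimal sketch of the entity extraction step is shown below. It assumes spaCy's small English pipeline, `en_core_web_sm`, which this README does not specify, and the function names `load_pipeline` and `extract_entities` are hypothetical.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline (the model name is an assumption)."""
    # Imported lazily so extract_entities can be used with any callable.
    import spacy

    return spacy.load(model_name)


def extract_entities(text, nlp):
    """Return (entity text, entity label) pairs found in the text.

    `nlp` is a loaded spaCy pipeline, or any callable that returns an
    object with an `ents` attribute.
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

For the visualization itself, `spacy.displacy.render(doc, style="ent")` highlights the tagged entities inline when run in a notebook.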
See Bitcoin NER:
See Ethereum NER: