Sentiment analysis for financial news

Sentiment analysis, also known as opinion mining, refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information from source materials (Wikipedia). In practice, financial market analysts make predictions on the stock market based on opinions and events reported in the news. A simple example for the financial sector is the task of assigning positive, negative or neutral sentiment values to words: words such as “good”, “benefit”, “positive” and “growth” are tagged with positive scores, while words such as “risk”, “fall”, “bankruptcy” and “loss” are tagged with negative scores.

Table of contents

  • Motivation
  • File Descriptions
  • Results
  • Tools
  • Technologies
  • Contact

Motivation

Create tools for Text Analysis.

As a data scientist for an insurance company, I found myself working on text data.

Text is unstructured data that can provide a lot of information, and running a statistical analysis on it allows us to extract some of that information.

The idea is to use data analysis to help us identify words, or combinations of words, for a better understanding of the market.

  • What are the most common words?
  • Are there significant words?
  • How can we use N-grams?

Identifying the most common words allows us to discover useful information, informing conclusions and supporting decision-making. However, some common words can be considered stop words: if they appear in every label with similar proportions, they most likely do not provide information.
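As an illustration of the N-gram question above, here is a minimal sketch (not part of this repository) of counting the most frequent n-grams with a plain Counter; the whitespace tokenisation and the example column name are simplifying assumptions.

from collections import Counter

def top_ngrams(texts, n=2, top_k=20):
  """Count the most frequent n-grams over an iterable of documents.
  Tokenisation by a simple whitespace split is a simplifying assumption."""
  counts = Counter()
  for doc in texts:
    tokens = doc.lower().split()
    # zip over n shifted copies of the token list to build n-gram tuples
    counts.update(zip(*[tokens[i:] for i in range(n)]))
  return counts.most_common(top_k)

# Hypothetical usage: data is a DataFrame with a 'text' column of headlines.
# print(top_ngrams(data['text'], n=2))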

File Descriptions

In this repository, you will find:

Results

The main findings of the code can be found in the accompanying post.

Tools

Wordcloud

A word cloud function is available, built on the WordCloud package.

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_word_cloud(data, text='text', label=None, save=True):
  """Inputs: dataset, text column, optional label column.
  Output: a word cloud for the whole corpus and one per label."""
  # Word cloud for the whole corpus
  corpus = " ".join(post for post in data[text])
  word_cloud = WordCloud(stopwords=STOPWORDS).generate(corpus)
  plt.figure()
  plt.imshow(word_cloud)
  plt.title('All corpus')
  plt.axis("off")
  if save:
    # Save before plt.show(), otherwise the saved figure is blank
    plt.savefig('wordcloud.png', dpi=300)
  plt.show()
  if label is not None:
    # One word cloud per label value
    labels = data[label].unique()
    for i, current_label in enumerate(labels):
      corpus = " ".join(post for post, lab in zip(data[text], data[label]) if lab == current_label)
      word_cloud = WordCloud(stopwords=STOPWORDS).generate(corpus)
      plt.figure(i)
      plt.imshow(word_cloud)
      plt.title('{}'.format(current_label))
      plt.axis("off")
      plt.show()

Figure: word cloud for negative sentiment
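A minimal usage sketch; the file name 'financial_news.csv' and the column names 'text' and 'sentiment' are assumptions for illustration, not files shipped with this repository.

import pandas as pd

# Hypothetical dataset of headlines with a text column and a sentiment label column
data = pd.read_csv('financial_news.csv')
plot_word_cloud(data, text='text', label='sentiment', save=True)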

Word Frequencies

A word frequencies function is available. It prints the most common words for each label, and in some cases the most common words can be removed as stop words.

from collections import Counter
import numpy as np

def word_frequencies(data, word):
  """Print the 20 words that appear in the largest number of documents.
  data[word] is expected to contain a list of tokens for each row."""
  c_unique = Counter()
  for ind in data.index:
    # set() so each word is counted at most once per document (document frequency)
    c_unique.update(Counter(set(data.loc[ind][word])))

  print('First 20 common words:\n')
  for token, count in c_unique.most_common(20):
    print(token, '-->', 'appeared in', count, 'documents out of {} documents i.e.'.format(len(data)),
          np.round(100 * count / len(data), 2), '%')

Figure: most common words

Figure: word frequencies per label
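A minimal usage sketch for word_frequencies, assuming the documents have already been tokenised into a list-of-words column; the column name 'tokens' is an assumption for illustration.

# Hypothetical tokenised column: one list of lower-cased words per headline
data['tokens'] = data['text'].str.lower().str.split()
word_frequencies(data, 'tokens')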

Technologies

Languages

The project is created with Python 3.6.9.

Dependencies

Contact

  • Mail: isaaccohensabban_at_gmail_dot_com
