
poltextlab/sentiment_hun


sentiment_hun

Possibilities and limitations of a lexicon-based sentiment analysis of Hungarian political news

Draft version

Features

  • Word embedding: a Word2vec model trained with the parameters presented in the draft
  • Sentiment dictionary: assigns sentiment values to the texts of a given corpus using a given dictionary and the methods presented in the draft

Usage

Word Embedding

Run Word_embedding_w2v.py!

  • Give the path of the folder containing all Excel files of the embedding corpus!
  • Give the name of the column that holds the text to embed. NOTE: this column must have the same name in every Excel file!
  • The Word2vec model is initialised with the parameters given in the draft.
  • Choose one of two options:
    • One: embed a list of positive and negative words; the result is an Excel file containing all the embeddings.
    • Two: embed a single word and write the result to a .txt file.
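The corpus-reading steps above can be sketched as follows. This is a minimal sketch, not the code of Word_embedding_w2v.py: it assumes each Excel file has already been read into a list of row dictionaries (for example with pandas `read_excel` followed by `DataFrame.to_dict(orient="records")`), and the helper names `collect_texts` and `tokenize` as well as the simple regex tokenizer are hypothetical. The output, one token list per text, is the input shape that gensim's `Word2Vec(sentences=...)` accepts; the actual training parameters are the ones given in the draft and are not reproduced here.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical helpers illustrating the corpus step; not the repository's code.
TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware, so letters like á, ő, ű are kept


def collect_texts(files: Dict[str, List[dict]], text_column: str) -> List[str]:
    """Gather the text column from every file's rows.

    Raises KeyError if a row lacks the column, which is why the column name
    must be identical in every Excel file.
    """
    texts: List[str] = []
    for file_name, rows in files.items():
        for row in rows:
            if text_column not in row:
                raise KeyError(f"{file_name}: no column named {text_column!r}")
            value = row[text_column]
            # Skip empty cells (pandas reads them as NaN floats, not strings).
            if isinstance(value, str) and value.strip():
                texts.append(value)
    return texts


def tokenize(texts: Iterable[str]) -> List[List[str]]:
    """Lowercase each text and split it into word tokens, one list per text."""
    return [TOKEN_RE.findall(text.lower()) for text in texts]
```

A missing or differently named column stops the run with a KeyError naming the offending file, rather than silently embedding an incomplete corpus.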

Sentiment dictionary

Requirements: magyarlanc

Run MAIN_sentiment_dictionary.py!

  • Input the name of the Excel file to analyse!
  • Input the name of the column containing the ids of the articles or texts. Each row in the Excel file must have a unique id!
  • Input the content column, i.e. the column holding the main text of each Excel row!
  • Input the location of the dictionaries, i.e. the exact path of the folder where they are stored!
  • Input the positive dictionary: the name of your .txt file of positive words, each written on a separate line!
  • Input the negative dictionary: the name of your .txt file of negative words, each written on a separate line!
  • Choose one of the four analysis modes:
    • 'One: Simple' = After preprocessing, a brute-force search looks up the tokens in the positive and negative dictionaries. Each matching token counts +1 or -1 respectively.
    • 'Two' = 'Simple' with the "hungarian_2" stoplist applied in addition.
    • 'Three: Sentiment-score' = After the search, each entry is scored with the measures below and then labelled:
      • sentiment_value: the result of the brute-force search
      • ossz_sentiment: the total number of words carrying a sentiment value
      • sentiment_threshold: ossz_sentiment / number of all tokens in the entry
      • sentiment_nullify: the ratio between the negative and positive words in the entry
      • Labelling rules:
        • negative if sentiment_value < 0 and (sentiment_threshold > 0.1 or sentiment_nullify < 0.95)
        • positive if sentiment_value > 0 and (sentiment_threshold > 0.1 or sentiment_nullify < 0.95)
        • neutral if sentiment_threshold < 0.1 or sentiment_nullify > 0.95
    • 'Four' = 'Sentiment-score' with the "hungarian_2" stoplist applied in addition.
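The four modes can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation in MAIN_sentiment_dictionary.py: the helper names (`parse_dictionary`, `count_hits`, `sentiment_scores`, `classify`) are hypothetical; tokens are assumed to be already preprocessed (e.g. lemmatised with magyarlanc); the dictionaries are held in Python sets, which find the same matches as a linear brute-force scan; stopwords are assumed to be removed before the tokens are counted; `sentiment_nullify` is assumed to be the smaller of the two hit counts divided by the larger; and because the labelling rules as written can overlap, the polar rules are checked first and everything else is neutral. The contents of the "hungarian_2" stoplist are not reproduced; the stoplist is simply passed in as a set.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# A minimal sketch, not the repository's MAIN_sentiment_dictionary.py.


def parse_dictionary(lines: Iterable[str]) -> Set[str]:
    """Turn the lines of a one-word-per-line .txt dictionary into a set."""
    return {line.strip().lower() for line in lines if line.strip()}


def count_hits(
    tokens: Iterable[str],
    positive: Set[str],
    negative: Set[str],
    stoplist: Optional[Set[str]] = None,
) -> Tuple[int, int, int]:
    """Return (positive_hits, negative_hits, token_count) for one entry.

    With a stoplist (modes 'Two' and 'Four'), stopwords are dropped before
    both the dictionary lookup and the token count (ASSUMPTION).
    """
    pos_hits = neg_hits = kept = 0
    for token in tokens:
        if stoplist is not None and token in stoplist:
            continue
        kept += 1
        if token in positive:
            pos_hits += 1  # +1 per positive token
        if token in negative:
            neg_hits += 1  # -1 per negative token (subtracted below)
    return pos_hits, neg_hits, kept


def sentiment_scores(pos_hits: int, neg_hits: int, token_count: int) -> Dict[str, float]:
    """Compute the measures listed for mode 'Three'."""
    sentiment_value = pos_hits - neg_hits
    ossz_sentiment = pos_hits + neg_hits  # tokens carrying any sentiment
    threshold = ossz_sentiment / token_count if token_count else 0.0
    larger = max(pos_hits, neg_hits)
    # ASSUMPTION: ratio taken as smaller/larger count so it lies in [0, 1].
    nullify = min(pos_hits, neg_hits) / larger if larger else 0.0
    return {
        "sentiment_value": sentiment_value,
        "ossz_sentiment": ossz_sentiment,
        "sentiment_threshold": threshold,
        "sentiment_nullify": nullify,
    }


def classify(scores: Dict[str, float], threshold_cut: float = 0.1, nullify_cut: float = 0.95) -> str:
    """Label an entry; polar rules are checked first (ASSUMPTION), else neutral."""
    value = scores["sentiment_value"]
    strong = scores["sentiment_threshold"] > threshold_cut or scores["sentiment_nullify"] < nullify_cut
    if value < 0 and strong:
        return "negative"
    if value > 0 and strong:
        return "positive"
    return "neutral"
```

For instance, with an illustrative negative dictionary that contains "nem", the tokens ["nem", "jó", "hír"] give one positive and one negative hit and are labelled neutral; with "nem" on the stoplist, the same entry is labelled positive.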

The output is an Excel file named "sentiment.xlsx" in the output folder, together with a brief overview of the sentiment label chosen for each row of the input Excel file.

The packages used in both programs belong to their rightful owners!

Dependencies and credits:

Orsolya Ring, Martina Katalin Szabó, Csenge Guba, Bendegúz Váradi, István Üveges: Approaches to Sentiment Analysis of Hungarian Political News at Sentence Level with Dictionary-based Method and with Machine Learning (Under review)

The research was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program.
