## What is the sentiment of an English letter?
This is a fun experiment that explores the sentiment (polarity) of each of the 26 letters of the English alphabet. The polarity is computed from a sentiment lexicon; this experiment uses the SocialSent lexicon from Stanford.
* Once the polarity of each letter is known, it can be used to score words and phrases. Note that this score directly reflects the scores of the individual letters in the word or phrase and does not necessarily reflect its meaning.
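
Stated as formulas, the scoring implemented in this notebook can be summarized as follows. The symbols $P$, $N$, $s(c)$ and $S(w)$ are notation introduced here for illustration; they do not come from the lexicon or the code. $P$ and $N$ are the collections of lexicon words with a score above and below zero, $c$ is a letter, and $w$ is a word or phrase with characters $w_1, \dots, w_{|w|}$.

$$
s(c) = \left|\{\, v \in P : v \text{ starts with } c \,\}\right| - \left|\{\, v \in N : v \text{ starts with } c \,\}\right|
$$

$$
S(w) = \frac{1}{|w|} \sum_{i=1}^{|w|} s(w_i), \qquad \text{with } s(w_i) = 0 \text{ when } w_i \text{ is not one of the 26 letters}
$$

The result $S(w)$ is rounded to two decimal places.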

In [1]:
import pickle
import os
import string
import pandas as pd

Data set: https://nlp.stanford.edu/projects/socialsent/

In [21]:
sentiment_lexicon_file = "./socialsent_hist_freq/frequent_words/2000.tsv"
# The file has no header row, so columns are addressed by position (0: word, 1: score)
sentiment_lexicon = pd.read_csv(sentiment_lexicon_file, sep='\t', header=None, encoding='utf-8')

In [3]:
def build_polarity(sentiment_lexicon):
    """
    Returns a dictionary that holds the polarity score of each of the 26 letters of the English alphabet.
    A letter's score is the number of positive lexicon words that start with it minus the number of negative
    lexicon words that start with it. For this purpose, the lexicon scores are first re-mapped to 1 (positive)
    for values greater than zero and -1 (negative) for values less than zero.
    :param sentiment_lexicon: A dataframe of the polarity lexicon (column 0: word, column 1: score).
    :return: A dictionary of size 26 with an integer score for each letter. A value less than zero indicates
    negative polarity and a value greater than zero indicates positive polarity. The larger the absolute value,
    the stronger the polarity.
    """
    # The explicit copy guards against pandas' SettingWithCopyWarning on the assignments below
    sentiment_lexicon = sentiment_lexicon.dropna().copy()
    sentiment_lexicon.loc[sentiment_lexicon[1] > 0, 1] = 1
    sentiment_lexicon.loc[sentiment_lexicon[1] < 0, 1] = -1
    positive_words = sentiment_lexicon.loc[sentiment_lexicon[1] == 1.0][0].str.lower().values.tolist()
    negative_words = sentiment_lexicon.loc[sentiment_lexicon[1] == -1.0][0].str.lower().values.tolist()
    alphabets = list(string.ascii_lowercase)
    # Count the positive words that start with each letter
    alpha_pos_score = {}
    for each_alphabet in alphabets:
        alpha_pos_score[each_alphabet] = sum(1 for each_word in positive_words if each_word.startswith(each_alphabet))
    # Count the negative words that start with each letter
    alpha_neg_score = {}
    for each_alphabet in alphabets:
        alpha_neg_score[each_alphabet] = sum(1 for each_word in negative_words if each_word.startswith(each_alphabet))
    # Polarity of a letter = positive count - negative count
    alphabet_polarity = dict()
    for each_alphabet in alphabets:
        alphabet_polarity[each_alphabet] = alpha_pos_score[each_alphabet] - alpha_neg_score[each_alphabet]
    return alphabet_polarity
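
To make the counting concrete, here is a minimal sketch of the same idea on a tiny made-up lexicon, using only the standard library. The helper name `letter_polarity_from_pairs` and the words and scores in `toy_lexicon` are assumptions for illustration; they are not taken from the SocialSent data.

```python
from collections import Counter
import string

def letter_polarity_from_pairs(pairs):
    """Hypothetical helper: (word, score) pairs -> {letter: positive count - negative count}."""
    pos = Counter()
    neg = Counter()
    for word, score in pairs:
        if not word:
            continue
        first = word[0].lower()
        if score > 0:
            pos[first] += 1
        elif score < 0:
            neg[first] += 1
        # A score of exactly zero counts for neither side
    return {letter: pos[letter] - neg[letter] for letter in string.ascii_lowercase}

# Made-up words and scores, for illustration only
toy_lexicon = [("peace", 1.7), ("pretty", 0.9), ("pain", -2.1),
               ("dull", -0.4), ("dire", -1.3), ("sun", 0.2)]
toy_polarity = letter_polarity_from_pairs(toy_lexicon)
print(toy_polarity["p"], toy_polarity["d"], toy_polarity["s"], toy_polarity["z"])  # 1 -2 1 0
```

Only the sign of each score matters here, and only the first letter of each word is counted, so a letter's polarity says nothing about words that merely contain it.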

In [4]:
def compute_phrase_score(phrase, alphabet_polarity):
    """
    Calculates a numerical score for the phrase based on the alphabet_polarity computed from the sentiment lexicon.
    A negative value indicates negative polarity, whereas a positive value indicates positive polarity. The larger
    the magnitude, the stronger the phrase's polarity in that direction.
    :param phrase: A string of characters for which the score is computed.
    :param alphabet_polarity: A dictionary holding the polarity score of each of the 26 English letters.
    :return: A numerical score indicating the polarity of the phrase as calculated from alphabet_polarity. The
    score is normalized by the phrase length, which counts every character, including spaces and punctuation.
    An empty phrase scores 0.0.
    """
    if not phrase:
        # Avoid a ZeroDivisionError on an empty phrase
        return 0.0
    polarity_score = 0
    for each_letter in phrase:
        # Characters without an entry (digits, spaces, punctuation) contribute nothing
        polarity_score += alphabet_polarity.get(each_letter.lower(), 0)
    phrase_score = round(polarity_score / len(phrase), 2)
    return phrase_score
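
Because the score is divided by the full phrase length, spaces and punctuation lower the magnitude of a multi-word phrase. The sketch below shows one possible variant that averages over scored letters only. The name `score_letters_only` and the values in `toy_polarity` are made up for illustration; this is not part of the original experiment.

```python
def score_letters_only(phrase, alphabet_polarity):
    """Hypothetical variant: average the letter scores over scored characters only."""
    scores = [alphabet_polarity[ch] for ch in phrase.lower() if ch in alphabet_polarity]
    if not scores:
        # No scorable characters (empty string, digits, punctuation only)
        return 0.0
    return round(sum(scores) / len(scores), 2)

# Made-up letter scores, for illustration only
toy_polarity = {"a": 4, "b": -2, "c": 1}

# The letter scores sum to 4 - 2 + 1 = 3 in every case below
print(round(3 / len("a b c"), 2))                  # dividing by all 5 characters: 0.6
print(score_letters_only("a b c", toy_polarity))   # dividing by the 3 letters: 1.0
print(score_letters_only("abc", toy_polarity))     # 1.0
```

With this variant, "a b c" and "abc" receive the same score, which may or may not be desirable depending on how phrases are compared.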

In [5]:
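def save_word_polarity(path, alphabet_polarity):
    """Pickles the letter-polarity dictionary to 'alphabet_polarity.pkl' inside the directory `path`."""
    with open(os.path.join(path, 'alphabet_polarity.pkl'), 'wb') as f:
        pickle.dump(alphabet_polarity, f, pickle.HIGHEST_PROTOCOL)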

In [6]:
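def load_word_polarity(path):
    """Loads the letter-polarity dictionary from 'alphabet_polarity.pkl' inside the directory `path`."""
    with open(os.path.join(path, 'alphabet_polarity.pkl'), 'rb') as f:
        return pickle.load(f)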
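
A quick way to check the save/load pair is a round trip through a temporary directory. The sketch below repeats the same `pickle` calls inline on a made-up dictionary (`toy_polarity` is an assumption for illustration), so it runs without the lexicon file.

```python
import os
import pickle
import tempfile

# Made-up letter scores, for illustration only
toy_polarity = {"a": 4, "b": -2}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "alphabet_polarity.pkl")
    # Write the dictionary, then read it back from the same file
    with open(pkl_path, "wb") as f:
        pickle.dump(toy_polarity, f, pickle.HIGHEST_PROTOCOL)
    with open(pkl_path, "rb") as f:
        restored = pickle.load(f)

print(restored == toy_polarity)  # True
```

Since the dictionary only maps single-character strings to integers, a text format such as JSON would also be able to store it.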

In [7]:
alphabet_polarity = build_polarity(sentiment_lexicon)

In [8]:
save_word_polarity("./",alphabet_polarity)

## Some random tests

In [9]:
alpha_polarity = load_word_polarity("./")
test_word = "happy"
print("Polarity of word '%s' is %s" %(test_word,str(compute_phrase_score(test_word, alpha_polarity))))

Polarity of word 'happy' is 56.0


In [11]:
test_word = "hero"
print("Polarity of word '%s' is %s" %(test_word,str(compute_phrase_score(test_word, alpha_polarity))))

Polarity of word 'hero' is 19.25


In [13]:
test_word = "die"
print("Polarity of word '%s' is %s" %(test_word,str(compute_phrase_score(test_word, alpha_polarity))))

Polarity of word 'die' is -8.67


In [24]:
test_word = "idle"
print("Polarity of word '%s' is %s" %(test_word,str(compute_phrase_score(test_word, alpha_polarity))))

Polarity of word 'idle' is -1.25


In [19]:
test_word = "love"
print("Polarity of word '%s' is %s" %(test_word,str(compute_phrase_score(test_word, alpha_polarity))))

Polarity of word 'love' is 11.5


## What are the most positive and negative English letters?

In [20]:
import operator
sorted_dict = sorted(alpha_polarity.items(), key=operator.itemgetter(1))
print("Most positive alphabets are: ", sorted_dict[-3:])
print("Most negative alphabets are: ", sorted_dict[:3])

Most positive alphabets are:  [('m', 71), ('s', 99), ('p', 116)]
Most negative alphabets are:  [('i', -25), ('d', -14), ('k', -3)]
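
When only the extremes are needed, `heapq.nlargest` and `heapq.nsmallest` can pick them without sorting the whole dictionary. The sketch below is one possible alternative, run on just the six values printed above; the other 20 letter scores are not shown in the output, so they are left out. The helper name `top_and_bottom` is hypothetical.

```python
import heapq

# The six scores printed in the output above; the remaining letters are not shown there
known_scores = {"m": 71, "s": 99, "p": 116, "i": -25, "d": -14, "k": -3}

def top_and_bottom(scores, k=3):
    """Hypothetical helper: the k highest and k lowest (letter, score) pairs."""
    def by_score(item):
        return item[1]
    most_positive = heapq.nlargest(k, scores.items(), key=by_score)
    most_negative = heapq.nsmallest(k, scores.items(), key=by_score)
    return most_positive, most_negative

most_positive, most_negative = top_and_bottom(known_scores)
print(most_positive)  # [('p', 116), ('s', 99), ('m', 71)]
print(most_negative)  # [('i', -25), ('d', -14), ('k', -3)]
```

Unlike the slice of the sorted list, `nlargest` returns the highest score first.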
