### Sentiment Analysis

In [1]:
import pandas as pd

books = pd.read_csv("books_with_categories.csv")

What if we wanted to find books based on emotions like happy, sad, or joyful? In this final step of our Jupyter notebook, we will perform another text classification process, specifically focused on emotion and sentiment analysis, to better understand the emotional context of a book's description. This is a key part of our broader goal: building a semantic recommender system that can suggest books aligned with specific emotional themes, sentiments, or user interests.

In [23]:
from transformers import pipeline
classifier = pipeline("text-classification",
                      model="j-hartmann/emotion-english-distilroberta-base",
                      top_k = None,
                    )
classifier("I love this!")

Device set to use cpu


[[{'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'surprise', 'score': 0.00852868054062128},
  {'label': 'neutral', 'score': 0.005764586851000786},
  {'label': 'anger', 'score': 0.004419779404997826},
  {'label': 'sadness', 'score': 0.002092391485348344},
  {'label': 'disgust', 'score': 0.0016119893407449126},
  {'label': 'fear', 'score': 0.0004138521908316761}]]

Let's perform a text classification using an emotion-focused model that helps us identify the emotional tone of a piece of text. In this example, we're inputting the text "I love this" to see how the model classifies its emotional content. As we can see, the model correctly identifies the emotion as "joy", along with a score that shows how confident it is in that prediction. This shows that our model is working accurately and gives us useful insights into the emotions behind the text

In [3]:
books["description"][0]

'A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilead is a song of celebration and acceptance of the best and the worst the world ha

Should we run our classifier with using the whole text, individual chucks of text, or indivdual words? Let's experiment by running our classifier using these 3 options

In [None]:
# Running our classifier using the whole description
classifier(books["description"][0])


[[{'label': 'fear', 'score': 0.6548405885696411},
  {'label': 'neutral', 'score': 0.16985228657722473},
  {'label': 'sadness', 'score': 0.11640921980142593},
  {'label': 'surprise', 'score': 0.02070065587759018},
  {'label': 'disgust', 'score': 0.019100677222013474},
  {'label': 'joy', 'score': 0.01516144908964634},
  {'label': 'anger', 'score': 0.003935146611183882}]]

The classifier gets the impression that the text for this book is mainly fearful, but it isn’t analyzing the full context or themes of the description. Instead, let’s split the description into individual sentences and classify each one separately to get a more accurate emotional overview.

In [None]:
# Running our classifier using individual chuncks of text seperate by periods 
classifier(books["description"][0].split("."))

[[{'label': 'surprise', 'score': 0.7296020984649658},
  {'label': 'neutral', 'score': 0.14038600027561188},
  {'label': 'fear', 'score': 0.06816228479146957},
  {'label': 'joy', 'score': 0.04794260859489441},
  {'label': 'anger', 'score': 0.009156366810202599},
  {'label': 'disgust', 'score': 0.0026284765917807817},
  {'label': 'sadness', 'score': 0.002122163539752364}],
 [{'label': 'neutral', 'score': 0.44937002658843994},
  {'label': 'disgust', 'score': 0.27359163761138916},
  {'label': 'joy', 'score': 0.10908330976963043},
  {'label': 'sadness', 'score': 0.09362746775150299},
  {'label': 'anger', 'score': 0.04047830402851105},
  {'label': 'surprise', 'score': 0.026970159262418747},
  {'label': 'fear', 'score': 0.006879047024995089}],
 [{'label': 'neutral', 'score': 0.6462159752845764},
  {'label': 'sadness', 'score': 0.24273329973220825},
  {'label': 'disgust', 'score': 0.04342271760106087},
  {'label': 'surprise', 'score': 0.028300564736127853},
  {'label': 'joy', 'score': 0.014211

Our classifier now seems to align more closely with the context and content of the text by identifying the sentiment of each individual sentence. Each sentence has a leading emotion, represented by brackets like this: []. This is a better match for the emotional tone of the description, but it still may not fully capture the dominant emotion. Let's dive deeper to double-check whether the classifier is working correctly—and explore why it’s specifically choosing surprise as the top emotion. Let's test specific sentences from the description that reflect different emotional tones. This will help ensure that the classifier is accurately detecting a range of emotions.

In [None]:
# Split the book description into sentences and classify each one using hte emotion classification to capture specific emotional analysis
sentences = books["description"][0].split(".")
predictions = classifier(sentences)

In [None]:
# Checking the model's predictions for the first sentence
sentences[0], predictions[0]


('A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives',
 [{'label': 'surprise', 'score': 0.7296020984649658},
  {'label': 'neutral', 'score': 0.14038600027561188},
  {'label': 'fear', 'score': 0.06816228479146957},
  {'label': 'joy', 'score': 0.04794260859489441},
  {'label': 'anger', 'score': 0.009156366810202599},
  {'label': 'disgust', 'score': 0.0026284765917807817},
  {'label': 'sadness', 'score': 0.002122163539752364}])

In [None]:
# Checking the model's predictions for the fifth sentence
sentences[4], predictions[4]


(' He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption',
 [{'label': 'sadness', 'score': 0.9671575427055359},
  {'label': 'neutral', 'score': 0.015104127116501331},
  {'label': 'disgust', 'score': 0.006480589509010315},
  {'label': 'fear', 'score': 0.005393980070948601},
  {'label': 'surprise', 'score': 0.002286937553435564},
  {'label': 'anger', 'score': 0.0018428878393024206},
  {'label': 'joy', 'score': 0.0017338756006211042}])

We can observe that our emotion classifier is running properly

In [17]:
predictions

[[{'label': 'surprise', 'score': 0.7296020984649658},
  {'label': 'neutral', 'score': 0.14038600027561188},
  {'label': 'fear', 'score': 0.06816228479146957},
  {'label': 'joy', 'score': 0.04794260859489441},
  {'label': 'anger', 'score': 0.009156366810202599},
  {'label': 'disgust', 'score': 0.0026284765917807817},
  {'label': 'sadness', 'score': 0.002122163539752364}],
 [{'label': 'neutral', 'score': 0.44937002658843994},
  {'label': 'disgust', 'score': 0.27359163761138916},
  {'label': 'joy', 'score': 0.10908330976963043},
  {'label': 'sadness', 'score': 0.09362746775150299},
  {'label': 'anger', 'score': 0.04047830402851105},
  {'label': 'surprise', 'score': 0.026970159262418747},
  {'label': 'fear', 'score': 0.006879047024995089}],
 [{'label': 'neutral', 'score': 0.6462159752845764},
  {'label': 'sadness', 'score': 0.24273329973220825},
  {'label': 'disgust', 'score': 0.04342271760106087},
  {'label': 'surprise', 'score': 0.028300564736127853},
  {'label': 'joy', 'score': 0.014211

Now we have a problem: a single book description can contain multiple emotions. So how can we find a central emotion for each book?

One approach is to create a separate column for each of the seven emotion categories. Then, instead of looking at individual emotion scores per sentence, we can scan the entire description and take the highest probability score for each emotion. This allows us to summarize the emotional content of the book more effectively by highlighting the most prominent emotions overall.

Let’s find a way to calculate the maximum score for each emotion label across all sentences in a description

In [18]:
sorted(predictions[0], key=lambda x: x["score"])

[{'label': 'sadness', 'score': 0.002122163539752364},
 {'label': 'disgust', 'score': 0.0026284765917807817},
 {'label': 'anger', 'score': 0.009156366810202599},
 {'label': 'joy', 'score': 0.04794260859489441},
 {'label': 'fear', 'score': 0.06816228479146957},
 {'label': 'neutral', 'score': 0.14038600027561188},
 {'label': 'surprise', 'score': 0.7296020984649658}]

For each sentence, we will maintain the same order of the seven emotion categories.

In [34]:
import numpy as np

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
isbn = []
emotion_scores = {label: [] for label in emotion_labels}

def calculate_max_emotion_scores(predictions):
    per_emotion_scores = {label: [] for label in emotion_labels}
    for prediction in predictions:
        sorted_predictions = sorted(prediction, key=lambda x: x["label"])
        for index, label in enumerate(emotion_labels):
            per_emotion_scores[label].append(sorted_predictions[index]["score"])
    return {label: np.max(scores) for label, scores in per_emotion_scores.items()}

We are calculating the maximum emotion score for each of the seven emotion categories across all sentences in a book description. This helps us determine the most prominent emotional signals in the entire text, rather than just analyzing emotions sentence by sentence. To do this, we first define a list of emotion categories. Then, for each sentence in the book description, we run it through the classifier to get a set of emotion scores. The function calculate_max_emotion_scores collects these scores and finds the highest score for each emotion label across all sentences. These scores are then stored and associated with the book's ISBN for use in our emotion-based book recommendation system.

In [20]:
for i in range(10):
    isbn.append(books["isbn13"][i])
    sentences = books["description"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

Let's build an emotional profile for each book. First, we identify the book we want to classify using its ISBN. Then, we split the book's description into sentences and run the emotion classification on each one. After that, we use the previous code to calculate the maximum emotion score for each of the seven emotion categories across the entire description. Finally, we store these scores so we can use them later in the book recommendation process.

In [21]:
emotion_scores

{'anger': [np.float64(0.0641336441040039),
  np.float64(0.6126197576522827),
  np.float64(0.0641336441040039),
  np.float64(0.35148438811302185),
  np.float64(0.08141235262155533),
  np.float64(0.2322252243757248),
  np.float64(0.5381842255592346),
  np.float64(0.0641336441040039),
  np.float64(0.3006700277328491),
  np.float64(0.0641336441040039)],
 'disgust': [np.float64(0.27359163761138916),
  np.float64(0.3482847511768341),
  np.float64(0.10400667786598206),
  np.float64(0.1507224589586258),
  np.float64(0.18449543416500092),
  np.float64(0.7271744608879089),
  np.float64(0.155854731798172),
  np.float64(0.10400667786598206),
  np.float64(0.2794816195964813),
  np.float64(0.17792661488056183)],
 'fear': [np.float64(0.9281681180000305),
  np.float64(0.9425276517868042),
  np.float64(0.9723208546638489),
  np.float64(0.3607059419155121),
  np.float64(0.09504334628582001),
  np.float64(0.05136283114552498),
  np.float64(0.7474274635314941),
  np.float64(0.4044976532459259),
  np.float

In [24]:
from tqdm import tqdm

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
isbn = []
emotion_scores = {label: [] for label in emotion_labels}

for i in tqdm(range(len(books))):
    isbn.append(books["isbn13"][i])
    sentences = books["description"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

100%|██████████| 5197/5197 [19:33<00:00,  4.43it/s]


Now that we have found the emotion weights for our book, let's build an emotional profile for each book in the dataset. We will run a text classification model on each book description, repeating the process of scoring each sentence individually. Then, we will extract the highest emotion score for each of the seven emotion categories. This will give us a set of emotion scores that represent the most prominent emotional tones present in each book, allowing us to use these profiles for emotion-based recommendations.

In [None]:
# New Dataframe with emotion scores
emotions_df = pd.DataFrame(emotion_scores)
emotions_df["isbn13"] = isbn

In [30]:
emotions_df

Unnamed: 0,anger,disgust,fear,joy,sadness,surprise,neutral,isbn13
0,0.064134,0.273592,0.928168,0.932798,0.646216,0.967158,0.729602,9780002005883
1,0.612620,0.348285,0.942528,0.704422,0.887940,0.111690,0.252546,9780002261982
2,0.064134,0.104007,0.972321,0.767238,0.549477,0.111690,0.078766,9780006178736
3,0.351484,0.150722,0.360706,0.251882,0.732684,0.111690,0.078766,9780006280897
4,0.081412,0.184495,0.095043,0.040564,0.884390,0.475880,0.078766,9780006280934
...,...,...,...,...,...,...,...,...
5192,0.148208,0.030643,0.919165,0.255172,0.853721,0.980877,0.030656,9788172235222
5193,0.064134,0.114383,0.051363,0.400262,0.883198,0.111690,0.227765,9788173031014
5194,0.009997,0.009929,0.339218,0.947779,0.375754,0.066685,0.057625,9788179921623
5195,0.064134,0.104007,0.459269,0.759457,0.951104,0.368111,0.078766,9788185300535


Let's merge our new emotion scores with the existing book data and categories to finalize the book recommender. Now that we have our simple categories, emotion scores, and a vector search model database, we are ready to move forward with building the frontend and dashboard using Gradio. This will allow users to interact with the recommender system and discover books based on emotional themes and personal interests.

In [31]:
books = pd.merge(books, emotions_df, on="isbn13")

In [32]:
books

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,...,title_subtitle,tagged_description,simple_categories,anger,disgust,fear,joy,sadness,surprise,neutral
0,9780002005883,0002005883,Gilead,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,...,Gilead,9780002005883 A NOVEL THAT READERS and critics...,Fiction,0.064134,0.273592,0.928168,0.932798,0.646216,0.967158,0.729602
1,9780002261982,0002261987,Spider's Web,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,...,Spider's Web: A Novel,9780002261982 A new 'Christie for Christmas' -...,Fiction,0.612620,0.348285,0.942528,0.704422,0.887940,0.111690,0.252546
2,9780006178736,0006178731,Rage of angels,Sidney Sheldon,Fiction,http://books.google.com/books/content?id=FKo2T...,"A memorable, mesmerizing heroine Jennifer -- b...",1993.0,3.93,512.0,...,Rage of angels,"9780006178736 A memorable, mesmerizing heroine...",Fiction,0.064134,0.104007,0.972321,0.767238,0.549477,0.111690,0.078766
3,9780006280897,0006280897,The Four Loves,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=XhQ5X...,Lewis' work on the nature of love divides love...,2002.0,4.15,170.0,...,The Four Loves,9780006280897 Lewis' work on the nature of lov...,Nonfiction,0.351484,0.150722,0.360706,0.251882,0.732684,0.111690,0.078766
4,9780006280934,0006280935,The Problem of Pain,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=Kk-uV...,"""In The Problem of Pain, C.S. Lewis, one of th...",2002.0,4.09,176.0,...,The Problem of Pain,"9780006280934 ""In The Problem of Pain, C.S. Le...",Nonfiction,0.081412,0.184495,0.095043,0.040564,0.884390,0.475880,0.078766
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5192,9788172235222,8172235224,Mistaken Identity,Nayantara Sahgal,Indic fiction (English),http://books.google.com/books/content?id=q-tKP...,On A Train Journey Home To North India After L...,2003.0,2.93,324.0,...,Mistaken Identity,9788172235222 On A Train Journey Home To North...,Fiction,0.148208,0.030643,0.919165,0.255172,0.853721,0.980877,0.030656
5193,9788173031014,8173031010,Journey to the East,Hermann Hesse,Adventure stories,http://books.google.com/books/content?id=rq6JP...,This book tells the tale of a man who goes on ...,2002.0,3.70,175.0,...,Journey to the East,9788173031014 This book tells the tale of a ma...,Nonfiction,0.064134,0.114383,0.051363,0.400262,0.883198,0.111690,0.227765
5194,9788179921623,817992162X,The Monk Who Sold His Ferrari: A Fable About F...,Robin Sharma,Health & Fitness,http://books.google.com/books/content?id=c_7mf...,"Wisdom to Create a Life of Passion, Purpose, a...",2003.0,3.82,198.0,...,The Monk Who Sold His Ferrari: A Fable About F...,9788179921623 Wisdom to Create a Life of Passi...,Fiction,0.009997,0.009929,0.339218,0.947779,0.375754,0.066685,0.057625
5195,9788185300535,8185300534,I Am that,Sri Nisargadatta Maharaj;Sudhakar S. Dikshit,Philosophy,http://books.google.com/books/content?id=Fv_JP...,This collection of the timeless teachings of o...,1999.0,4.51,531.0,...,I Am that: Talks with Sri Nisargadatta Maharaj,9788185300535 This collection of the timeless ...,Nonfiction,0.064134,0.104007,0.459269,0.759457,0.951104,0.368111,0.078766


In [None]:
# Saving final DataFrame with emotions to a CSV file
books.to_csv("books_with_emotions.csv", index = False)