# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Mahitha

[GitHub Repo](https://github.com/mkunta1/json-sentiment)

Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure your code cells have all been run (and have output beneath them), and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [22]:
!pip install spacy
# Install requests
!pip install requests
# Install TextBlob (alternative to spacytextblob)
!pip install textblob
# Install spacytextblob (the package name has no hyphen) if you still want to attempt it
# !pip install spacytextblob



In [23]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).


import json
import pickle

import requests
import spacy

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                 Version
----------------------- -----------
annotated-types         0.7.0
asttokens               3.0.0
beautifulsoup4          4.13.4
blis                    1.3.0
catalogue               2.0.10
certifi                 2025.7.14
charset-normalizer      3.4.2
click                   8.2.1
cloudpathlib            0.21.1
colorama                0.4.6
comm                    0.2.2
confection              0.1.5
cymem                   2.0.11
debugpy                 1.8.15
decorator               5.2.1
en_core_web_sm          3.8.0
executing               2.2.0
idna                    3.10
ipykernel               6.29.5
ipython                 9.4.0
ipython_pygments_lexers 1.1.1
jedi                    0.19.2
Jinja2                  3.1.6
joblib                  1.5.1
jupyter_client          8.6.3
jupyter_core            5.8.1
langcodes               3.5.0
language_data           1.3.0
lyricsgenius            3.6.4
marisa-trie             1.2.1
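
As a lighter alternative to scanning the full `pip list` output, the import check above can be sketched with the standard library. The helper name `find_missing_modules` and the list of module names are illustrative assumptions; adjust them to match the packages your notebook actually uses.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Top-level modules this notebook imports (an assumed list; edit as needed).
required = ["json", "pickle", "requests", "spacy", "spacytextblob", "textblob", "lyricsgenius"]
missing = find_missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

Any name reported as missing can then be installed into the active environment with `pip` before moving on.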

In [24]:
!pip install spacytextblob



In [25]:
# Version check to confirm spaCy and spacytextblob are working properly

import spacy
import spacytextblob

print(spacy.__version__)
print(spacytextblob.__version__)

3.8.7
5.0.0


1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [38]:
# Import dependencies
import os

import lyricsgenius

# This cell uses the lyricsgenius library to fetch and print the lyrics of a song by They Might Be Giants.
# Make sure to install the lyricsgenius library first using pip
# Read the Genius API token from the environment instead of hardcoding it in the notebook.
# GENIUS_ACCESS_TOKEN is an assumed variable name; set it before running this cell.
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])
song = genius.search_song("Birdhouse in your soul", "They Might Be Giants")
print(song.lyrics)
# Save the song data (including the lyrics) to a JSON file in the working directory
song.save_lyrics()

Searching for "Birdhouse in your soul" by They Might Be Giants...
Done.
64 ContributorsBirdhouse in Your Soul LyricsLike Many TMBG songs the direct meaning can be debated. This one seems to be one of the more simple cases however. This is a song about a night-light, from the night-light’s point of view.[Bridge]
I'm your only friend
I'm not your only friend
But I'm a little glowing friend
But really I'm not actually your friend
But I am

[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Verse 1]
I have a secret to tell
From my electrical well
It's a simple message and I'm
Leaving out the whistles and bells
So the room must listen to me
Filibuster vigilantly
My name is blue canary
One note spelled l-i-t-e
My story's infinite
Like the Longines Symphonette
It doesn't rest


[Chorus]
Blue canary in the outlet by the light
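
The question allows either a JSON file or a pickle file. The sketch below shows both round trips with the standard library only. The `song_data` dictionary is a hypothetical stand-in for whatever a lyrics lookup returns, and the helper names and temporary file names are assumptions for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dictionary returned by a lyrics lookup.
song_data = {
    "artist": "They Might Be Giants",
    "song": "Birdhouse in Your Soul",
    "lyrics": "Blue canary in the outlet by the light switch",
}


def save_json(data, path):
    """Write a dictionary to a human-readable JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)


def load_json(path):
    """Read a dictionary back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_pickle(data, path):
    """Write a Python object to a binary pickle file."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle(path):
    """Read a Python object back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip the dictionary through both formats in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "song.json"
    pickle_path = Path(tmp) / "song.pkl"
    save_json(song_data, json_path)
    save_pickle(song_data, pickle_path)
    json_copy = load_json(json_path)
    pickle_copy = load_pickle(pickle_path)

print(json_copy == song_data, pickle_copy == song_data)
```

JSON is usually the friendlier choice here because the file stays readable in a text editor; pickle is binary and should only be loaded from sources you trust.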

2. Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

In [39]:
# Import dependencies
# Make sure to install all required packages before running this cell
import json

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

# Open the JSON file and read the lyrics
with open("lyrics_theymightbegiants_birdhouseinyoursoul.json", "r") as result_file:
    contents = json.load(result_file)
lyrics = contents["lyrics"]
# Print the lyrics from the JSON file
print(lyrics)

# Load the spaCy model and add the SpacyTextBlob component to perform sentiment analysis
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")
analysis = nlp(lyrics)
polarity_score = analysis._.blob.polarity
print(f"Polarity is {polarity_score}")

64 ContributorsBirdhouse in Your Soul LyricsLike Many TMBG songs the direct meaning can be debated. This one seems to be one of the more simple cases however. This is a song about a night-light, from the night-light’s point of view.[Bridge]
I'm your only friend
I'm not your only friend
But I'm a little glowing friend
But really I'm not actually your friend
But I am

[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Verse 1]
I have a secret to tell
From my electrical well
It's a simple message and I'm
Leaving out the whistles and bells
So the room must listen to me
Filibuster vigilantly
My name is blue canary
One note spelled l-i-t-e
My story's infinite
Like the Longines Symphonette
It doesn't rest


[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to
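
The printed lyrics begin with a contributor count and a song description, and they contain section tags such as `[Bridge]` and `[Chorus]`. That extra text is scored along with the lyrics, so it may shift the polarity. A minimal cleanup sketch follows, assuming the lyrics proper start at the first bracketed tag; the function name `clean_genius_lyrics` and the sample string are illustrative, not part of any library.

```python
import re

# Matches a bracketed section tag such as [Chorus] or [Verse 1] on a single line.
SECTION_TAG = re.compile(r"\[[^\]\n]*\]")


def clean_genius_lyrics(raw):
    """Drop text before the first [Section] tag, then remove the tags themselves."""
    first_tag = SECTION_TAG.search(raw)
    # Assumption: everything before the first tag is header or description text.
    body = raw[first_tag.start():] if first_tag else raw
    without_tags = SECTION_TAG.sub("", body)
    # Strip each line and drop the empty lines left behind by the removed tags.
    lines = [line.strip() for line in without_tags.splitlines()]
    return "\n".join(line for line in lines if line)


sample = (
    "64 ContributorsBirdhouse in Your Soul LyricsThis is a song about a night-light.[Bridge]\n"
    "I'm your only friend\n"
    "I'm not your only friend\n"
    "\n"
    "[Chorus]\n"
    "Blue canary in the outlet by the light switch\n"
    "Who watches over you"
)
cleaned = clean_genius_lyrics(sample)
print(cleaned)
```

Passing the cleaned string to `nlp(...)` instead of the raw text would restrict the analysis to the lyric lines, though lyrics without any bracketed tag are returned unchanged apart from whitespace trimming.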

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [40]:
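import json
import os

import lyricsgenius

# Initialize the Genius API client.
# The token is read from the GENIUS_ACCESS_TOKEN environment variable (an assumed
# name) so that the secret is not committed to the repository.
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])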

# Function to fetch song lyrics and write to JSON
def save_lyrics_to_file(artist, song, filename):
    try:
        # Fetch the song details from Genius API
        song_obj = genius.search_song(song, artist)
        
        if song_obj:
            song_data = {
                "artist": artist,
                "song": song,
                "lyrics": song_obj.lyrics
            }
            
            # Write the song data to a JSON file
            with open(filename, 'w', encoding='utf-8') as json_file:
                json.dump(song_data, json_file, ensure_ascii=False, indent=4)
            
            print(f"Song '{song}' by {artist} saved to {filename}.")
        else:
            print(f"Song '{song}' by {artist} not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Test the function with four different songs and save the data in JSON format
save_lyrics_to_file("Arijit Singh", "Tum Hi Ho", "arijit_singh_tum_hi_ho_lyrics.json")
save_lyrics_to_file("Lata Mangeshkar", "Tujhe Dekha To", "lata_mangeshkar_tujhe_dekha_to_lyrics.json")
save_lyrics_to_file("A. R. Rahman", "Jai Ho", "ar_rahman_jai_ho_lyrics.json")
save_lyrics_to_file("Kishore Kumar", "Pal Pal Dil Ke Paas", "kishore_kumar_pal_pal_dil_ke_paas_lyrics.json")


Searching for "Tum Hi Ho" by Arijit Singh...
Done.
Song 'Tum Hi Ho' by Arijit Singh saved to arijit_singh_tum_hi_ho_lyrics.json.
Searching for "Tujhe Dekha To" by Lata Mangeshkar...
Done.
Song 'Tujhe Dekha To' by Lata Mangeshkar saved to lata_mangeshkar_tujhe_dekha_to_lyrics.json.
Searching for "Jai Ho" by A. R. Rahman...
Done.
Song 'Jai Ho' by A. R. Rahman saved to ar_rahman_jai_ho_lyrics.json.
Searching for "Pal Pal Dil Ke Paas" by Kishore Kumar...
Done.
Song 'Pal Pal Dil Ke Paas' by Kishore Kumar saved to kishore_kumar_pal_pal_dil_ke_paas_lyrics.json.
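
The cell above fetches lyrics from Genius through `lyricsgenius`, while the question names lyrics.ovh. For comparison, here is a rough sketch of the same idea against lyrics.ovh, assuming the `GET https://api.lyrics.ovh/v1/{artist}/{title}` endpoint from the linked documentation returns a JSON object with a `lyrics` key. It uses `urllib` from the standard library; the function names and the `fetch` parameter (included so the function can be exercised without network access) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.lyrics.ovh/v1"


def build_lyrics_url(artist, song):
    """Build the request URL, percent-encoding spaces and other special characters."""
    return f"{BASE_URL}/{urllib.parse.quote(artist, safe='')}/{urllib.parse.quote(song, safe='')}"


def fetch_json(url, timeout=10):
    """Perform a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def save_lyrics_ovh(artist, song, filename, fetch=fetch_json):
    """Fetch lyrics for one song and write them to a JSON file."""
    data = fetch(build_lyrics_url(artist, song))
    song_data = {"artist": artist, "song": song, "lyrics": data.get("lyrics", "")}
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(song_data, f, ensure_ascii=False, indent=4)
    return filename


# Building the URL does not touch the network.
print(build_lyrics_url("They Might Be Giants", "Birdhouse in your soul"))
# A real lookup would be: save_lyrics_ovh("They Might Be Giants", "Birdhouse in your soul", "birdhouse.json")
```

Network and HTTP errors are left to propagate to the caller in this sketch; wrapping the call in a `try`/`except` block, as the Genius version does, is a reasonable addition.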


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why or why not? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [41]:
from textblob import TextBlob
import json

# Function to analyze sentiment of lyrics from a JSON file
def analyze_lyrics_sentiment_from_json(filename, song_name):
    try:
        # Load the lyrics from the JSON file
        with open(filename, 'r', encoding='utf-8') as file:
            song_data = json.load(file)
        
        # Get the lyrics from the song data
        lyrics = song_data['lyrics']
        
        # Perform sentiment analysis using TextBlob
        blob = TextBlob(lyrics)
        polarity = blob.sentiment.polarity
        
        # Print the polarity score and song name
        print(f"Polarity score for '{song_name}': {polarity}")
        return polarity
    
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# List of song files from question 3
songs_files = [
    ("arijit_singh_tum_hi_ho_lyrics.json", "Tum Hi Ho - Arijit Singh"),
    ("lata_mangeshkar_tujhe_dekha_to_lyrics.json", "Tujhe Dekha To - Lata Mangeshkar"),
    ("ar_rahman_jai_ho_lyrics.json", "Jai Ho - A. R. Rahman"),
    ("kishore_kumar_pal_pal_dil_ke_paas_lyrics.json", "Pal Pal Dil Ke Paas - Kishore Kumar")
]

# Analyze the sentiment for each song and print polarity score
for filename, song_name in songs_files:
    analyze_lyrics_sentiment_from_json(filename, song_name)


Polarity score for 'Tum Hi Ho - Arijit Singh': 0.16666666666666666
Polarity score for 'Tujhe Dekha To - Lata Mangeshkar': 0.0
Polarity score for 'Jai Ho - A. R. Rahman': 0.08849206349206351
Polarity score for 'Pal Pal Dil Ke Paas - Kishore Kumar': 0.0
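
To help read the scores above, the sketch below maps a polarity value to a coarse label. The `describe_polarity` helper and its `neutral_band` threshold of 0.05 are assumptions chosen for illustration, not TextBlob conventions. Note also that TextBlob's default analyzer relies on an English lexicon, so a score of exactly 0.0 may mean that no words were scored rather than that the lyrics are neutral.

```python
def describe_polarity(score, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse label (threshold is an assumption)."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Scores copied from the output above.
reported_scores = {
    "Tum Hi Ho - Arijit Singh": 0.16666666666666666,
    "Tujhe Dekha To - Lata Mangeshkar": 0.0,
    "Jai Ho - A. R. Rahman": 0.08849206349206351,
    "Pal Pal Dil Ke Paas - Kishore Kumar": 0.0,
}

for title, score in reported_scores.items():
    print(f"{title}: {score:+.3f} ({describe_polarity(score)})")
```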
