# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Kellie Leopold [GitHub Link](https://github.com/kjleopold/json-sentiment)

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob)

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [20]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                   Version
------------------------- --------------
annotated-types           0.7.0
anyio                     4.9.0
argon2-cffi               25.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 3.0.0
async-lru                 2.0.5
attrs                     25.3.0
babel                     2.17.0
beautifulsoup4            4.13.4
bleach                    6.2.0
blis                      0.7.11
catalogue                 2.0.10
certifi                   2025.7.14
cffi                      1.17.1
charset-normalizer        3.4.2
click                     8.2.1
cloudpathlib              0.16.0
colorama                  0.4.6
comm                      0.2.2
confection                0.1.5
contourpy                 1.3.2
cycler                    0.12.1
cymem                     2.0.11
debugpy                   1.8.15
decorator                 5.2.1
defusedxml                0.7.1
en-core-web-sm        

#### Question 1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [None]:
import requests
import json

# Request lyrics from API
url = 'https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul'
result = json.loads(requests.get(url).text)

# Save to a JSON file
with open('lyrics.json', 'w', encoding='utf-8') as f:
    json.dump(result, f, ensure_ascii=False, indent=4)

print("Lyrics saved to lyrics.json")


Lyrics saved to lyrics.json
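
Because later questions read this file instead of calling the API again, a quick round-trip check can confirm that what was written is what gets loaded back. The sketch below is only illustrative: `sample` is a stand-in dictionary shaped like the API response (a single `lyrics` key), the helper names are hypothetical, and it writes to a temporary directory rather than the real `lyrics.json`. The pickle variant is included only because the question allows either format.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Stand-in for the API response (assumption: a dict with a 'lyrics' key)
sample = {"lyrics": "Blue canary in the outlet by the light switch\nWho watches over you"}

def round_trip_json(data, path):
    """Write data as JSON, read it back, and return the loaded object."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def round_trip_pickle(data, path):
    """Write data as a pickle, read it back, and return the loaded object."""
    with open(path, "wb") as f:
        pickle.dump(data, f)
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    json_copy = round_trip_json(sample, Path(tmp) / "sample.json")
    pickle_copy = round_trip_pickle(sample, Path(tmp) / "sample.pkl")

# Both copies should compare equal to the original dictionary
print(json_copy == sample, pickle_copy == sample)
```

JSON has the advantage of being human-readable and safe to open from untrusted sources, while pickle files should only be loaded when their origin is trusted.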


#### Question 2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [22]:
import json
import spacy
from spacy.tokens import Doc
from textblob import TextBlob

# Register extensions
if not Doc.has_extension('blob'):
    Doc.set_extension('blob', getter=lambda doc: TextBlob(doc.text))
if not Doc.has_extension('polarity'):
    Doc.set_extension('polarity', getter=lambda doc: doc._.blob.sentiment.polarity)

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Load JSON file (replace 'lyrics.json' with your file path)
with open('lyrics.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Extract lyrics from JSON (adjust key if your JSON structure is different)
lyrics = data.get('lyrics', '')

# Process lyrics with spaCy
doc = nlp(lyrics)

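# Print sentiment polarity
# A score near 0 on the [-1.0, 1.0] scale indicates close-to-neutral text;
# the run recorded in this notebook reports about 0.045, which suggests
# a very slightly positive connotation.
print("Polarity:", doc._.polarity)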

# Print the lyrics
print("Lyrics:\n", lyrics)

Polarity: 0.04505208333333333
Lyrics:
 I'm your only friend 
I'm not your only friend 
But I'm a little glowing friend 
But really I'm not actually your friend 
But I am 


Blue canary in the outlet by the light switch 

Who watches over you 

Make a little birdhouse in your soul 

Not to put too fine a point on it 

Say I'm the only bee in your bonnet 

Make a little birdhouse in your soul 



I have a secret to tell 

From my electrical well 

It's a simple message and I'm leaving out the whistles and bells 

So the room must listen to me 

Filibuster vigilantly 

My name is blue canary one note* spelled l-i-t-e 

My story's infinite 

Like the Longines Symphonette it doesn't rest 



Blue canary in the outlet by the light switch 

Who watches over you 

Make a little birdhouse in your soul 

Not to put too fine a point on it 

Say I'm the only bee in your bonnet 

Make a little birdhouse in your soul 



I'm your only friend 

I'm not your only friend 

But I'm a little glowing frie

#### Question 3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [23]:
import requests

def fetch_and_save_lyrics(artist, song, filename):
    """
    Fetch lyrics from lyrics.ovh API and write to a file.
    
    Args:
        artist (str): Artist name
        song (str): Song title
        filename (str): Path to the file where lyrics will be saved
    """
    url = f"https://api.lyrics.ovh/v1/{artist}/{song}"
    response = requests.get(url)
    
    if response.status_code == 200:
        data = response.json()
        lyrics = data.get('lyrics', '')
        if lyrics:
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(lyrics)
            print(f"Lyrics for '{song}' by {artist} saved to '{filename}'")
        else:
            print(f"No lyrics found for '{song}' by {artist}")
    else:
        print(f"Failed to fetch lyrics for '{song}' by {artist}. Status code: {response.status_code}")

# Test the function with four songs and filenames
songs_to_test = [
    ("The Eagles", "Hotel California", "hotel_california.json"),
    ("The Runaways", "Cherry Bomb", "cherry_bomb.json"),
    ("Fleetwood Mac", "Rhiannon", "rhiannon.json"),
    ("Pink Floyd", "Wish You Were Here", "wish_you_were_here.json")
]

for artist, song, filename in songs_to_test:
    fetch_and_save_lyrics(artist, song, filename)

Failed to fetch lyrics for 'Hotel California' by The Eagles. Status code: 504
Lyrics for 'Cherry Bomb' by The Runaways saved to 'cherry_bomb.json'
Failed to fetch lyrics for 'Rhiannon' by Fleetwood Mac. Status code: 504
Failed to fetch lyrics for 'Wish You Were Here' by Pink Floyd. Status code: 504
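
The 504 responses above are gateway timeouts, which are often transient, so retrying the request a few times may help. The following is a minimal sketch under stated assumptions, not part of the assignment solution: `build_lyrics_url` and `get_with_retries` are hypothetical helpers, and the demo uses a fake fetch function so it runs without network access. In the notebook, `fetch` could be something like `lambda u: requests.get(u, timeout=10)`.

```python
import time
from urllib.parse import quote

API_BASE = "https://api.lyrics.ovh/v1"

def build_lyrics_url(artist, song):
    """Percent-encode artist and song so spaces and punctuation are URL-safe."""
    return f"{API_BASE}/{quote(artist, safe='')}/{quote(song, safe='')}"

def get_with_retries(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url) until it returns status 200 or the attempts run out.

    fetch is any callable that returns an object with a status_code attribute.
    The last response is returned whether or not it succeeded.
    """
    response = None
    for attempt in range(1, attempts + 1):
        response = fetch(url)
        if response.status_code == 200:
            break
        if attempt < attempts:
            time.sleep(delay * attempt)  # wait a little longer after each failure
    return response

# --- Demo with a fake fetch function (no network access needed) ---
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

statuses = iter([504, 504, 200])  # two simulated timeouts, then success
calls = []

def fake_fetch(url):
    calls.append(url)
    return FakeResponse(next(statuses))

url = build_lyrics_url("Pink Floyd", "Wish You Were Here")
final = get_with_retries(fake_fetch, url, attempts=3, delay=0)
print(url)
print(final.status_code, len(calls))
```

Passing the fetch function in as an argument keeps the retry logic testable without hitting the API, and retries cannot help if the service is down for an extended period.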


#### Question 4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [24]:
from textblob import TextBlob

def analyze_lyrics_sentiment(filename):
    """
    Reads lyrics from a file and returns the polarity score using TextBlob.

    Args:
        filename (str): Path to the lyrics file

    Returns:
        float: Polarity score ranging from -1 (negative) to 1 (positive)
    """
    try:
        with open(filename, 'r', encoding='utf-8') as f:
            lyrics = f.read()
            blob = TextBlob(lyrics)
            return blob.sentiment.polarity
    except FileNotFoundError:
        print(f"File '{filename}' not found.")
        return None

# Songs and files (replace with your own if different)
songs_files = [
    ("Fleetwood Mac - Rhiannon", "rhiannon.json"),
    ("Pink Floyd - Wish You Were Here", "wish_you_were_here.json"),
    ("The Runaways - Cherry Bomb", "cherry_bomb.json")
]

# Analyze and print results
for song_title, file in songs_files:
    polarity = analyze_lyrics_sentiment(file)
    if polarity is not None:
        print(f"Polarity for '{song_title}': {polarity}")

Polarity for 'Fleetwood Mac - Rhiannon': 0.558974358974359
Polarity for 'Pink Floyd - Wish You Were Here': -0.004166666666666666
Polarity for 'The Runaways - Cherry Bomb': -0.012180687852792627
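
The files written in Question 3 contain plain lyric text even though they end in `.json`, while `lyrics.json` from Question 1 contains a JSON object with a `lyrics` key. A loader that accepts both shapes might look like the sketch below; `load_lyrics` is a hypothetical helper, and the demo writes two small temporary files with made-up text rather than touching the real lyric files.

```python
import json
import tempfile
from pathlib import Path

def load_lyrics(path):
    """Return lyric text from a file holding raw text or a JSON object with a 'lyrics' key."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON: treat the whole file as lyrics
    if isinstance(data, dict):
        return data.get("lyrics", "")
    return text  # valid JSON but not an object: fall back to the raw text

with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "raw.json"
    obj_path = Path(tmp) / "obj.json"
    # Made-up placeholder text, stored once as raw text and once as a JSON object
    raw_path.write_text("Hello, hello\nSecond line", encoding="utf-8")
    obj_path.write_text(json.dumps({"lyrics": "Hello, hello\nSecond line"}), encoding="utf-8")
    raw_lyrics = load_lyrics(raw_path)
    obj_lyrics = load_lyrics(obj_path)

# Both files should yield the same lyric text
print(raw_lyrics == obj_lyrics)
```

The returned string could then be passed to `TextBlob` in the same way `analyze_lyrics_sentiment` does.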


The reported polarities for the three songs I chose do match my understanding of each song's lyrics. 
* Fleetwood Mac - Rhiannon: The lyrics could be considered dreamy and poetic, which aligns with the clearly positive polarity (about 0.56). 
* Pink Floyd - Wish You Were Here: The lyrics are melancholic and reflective rather than overtly angry or despairing, which fits the near-neutral polarity score (about -0.004).
* The Runaways - Cherry Bomb: The edgy, rebellious lyrics result in a slightly negative polarity (about -0.01), which is to be expected.

However, while the scores reflect the general tone, sentiment analysis tools have limitations. They don’t fully consider context, musical tone, cultural meaning, or sarcasm. A song may sound sad or aggressive due to its performance rather than word choice, which sentiment tools can't interpret well. So while the polarity values are helpful, they don’t always align perfectly with how a human listener might interpret a song’s emotion.
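
One way a bittersweet song can land near zero is that lexicon-style scorers combine word-level scores, so positive and negative words can cancel each other out. The toy sketch below illustrates only that averaging effect. `TOY_LEXICON` and its values are invented for this example, and `toy_polarity` is a hypothetical function; it is not TextBlob's lexicon or algorithm.

```python
# Hypothetical word scores, invented purely for illustration
TOY_LEXICON = {"heaven": 0.8, "hell": -0.8, "smile": 0.5, "pain": -0.6}

def toy_polarity(text):
    """Average the scores of known words; unknown words are ignored."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

balanced = toy_polarity("Heaven and hell")           # opposite scores cancel
bittersweet = toy_polarity("A smile hides the pain")  # mixed words, slightly negative
unknown = toy_polarity("The weather report")          # no known words at all

print(balanced, round(bittersweet, 3), unknown)
```

Under this toy model a near-zero score can mean either "no emotional words" or "strong emotions in both directions", which is one reason a polarity value alone may not reflect how a listener hears the song.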