# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Jarred Gastreich
https://github.com/jarjarredred/json-sentiment

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [60]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                   Version
------------------------- --------------
annotated-types           0.7.0
anyio                     4.9.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 3.0.0
async-lru                 2.0.5
attrs                     25.3.0
babel                     2.17.0
beautifulsoup4            4.13.4
bleach                    6.2.0
blis                      1.3.0
catalogue                 2.0.10
certifi                   2025.7.14
cffi                      1.17.1
chardet                   3.0.4
charset-normalizer        3.4.2
click                     8.2.1
cloudpathlib              0.21.1
colorama                  0.4.6
comm                      0.2.2
comtypes                  1.4.11
confection                0.1.5
contourpy                 1.3.2
cycler                    0.12.1
cymem                     2.0.11
debugpy                   1.8.14
decorator             

In [61]:
# Only need to run these once in the Jupyter notebook (to ensure packages are installed in the notebook environment)
%pip install spacy
%pip install spacytextblob

# Import the libraries for use
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Load spaCy and add spacytextblob pipeline 
nlp = spacy.load("en_core_web_sm") 
nlp.add_pipe("spacytextblob")

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


<spacytextblob.spacytextblob.SpacyTextBlob at 0x1d2e3854f50>

# Question 1

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object.  Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to access the API frequently.

In [62]:
import requests
import json

AUTHOR='Edgar Allan Poe'
POEM = 'A Dream Within A Dream'
# Request lyrics from API
url = 'https://poetrydb.org/author,title/{AUTHOR};{POEM}'.format(AUTHOR=AUTHOR, POEM=POEM)

result = json.loads(requests.get(url).text)

# Save to a JSON file
with open('lyrics.json', 'w', encoding='utf-8') as f:
    json.dump(result, f, ensure_ascii=False, indent=4)

print("Lyrics saved to lyrics.json")



Lyrics saved to lyrics.json
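
The question allows either a JSON file or a pickle file. As a minimal sketch of the difference, the snippet below writes the same object both ways and reads it back. The `sample` data is a hypothetical stand-in for the API response: only the `lines` key is taken from the code in this notebook, and the other fields and the file names are assumptions for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the API response; only 'lines' is used elsewhere in this notebook
sample = [{
    "title": "A Dream Within A Dream",
    "author": "Edgar Allan Poe",
    "lines": ["Take this kiss upon the brow!", "And, in parting from you now,"],
}]

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "sample.json"
    pickle_path = Path(tmp) / "sample.pkl"

    # JSON is a text format: open in text mode with an explicit encoding
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False, indent=4)

    # pickle is a binary format: open in binary mode
    with open(pickle_path, "wb") as f:
        pickle.dump(sample, f)

    with open(json_path, "r", encoding="utf-8") as f:
        from_json = json.load(f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

# Both round trips should give back an equal object
print(from_json == sample, from_pickle == sample)
```

JSON stays human-readable and can be opened in any editor, while pickle is Python-specific and should only be loaded from files you trust.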


2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [63]:
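import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Read the API response saved to lyrics.json in Question 1
with open('lyrics.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Extract and print the lyrics
lyrics = "\n".join(data[0]['lines'])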
print("\nLyrics:")
print(lyrics)

# Perform sentiment analysis using spaCyTextBlob
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
doc = nlp(lyrics)

# Print the polarity score
print(f"\nSentiment Polarity Score: {doc._.polarity}")

# A polarity score of about 0.056 is only slightly above 0 on the [-1.0, 1.0] scale,
# so the lyrics have a nearly neutral, slightly positive connotation.


Lyrics:
Take this kiss upon the brow!
And, in parting from you now,
Thus much let me avow--
You are not wrong, who deem
That my days have been a dream:
Yet if hope has flown away
In a night, or in a day,
In a vision or in none,
Is it therefore the less _gone_?
_All_ that we see or seem
Is but a dream within a dream.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand--
How few! yet how they creep
Through my fingers to the deep
While I weep--while I weep!
O God! can I not grasp
Them with a tighter clasp?
O God! can I not save
_One_ from the pitiless wave?
Is _all_ that we see or seem
But a dream within a dream?

Sentiment Polarity Score: 0.055555555555555546
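
To make the reading of the score explicit, here is a small sketch that turns a polarity value into a verbal label. The helper `describe_polarity` and its `slight_band` threshold of 0.1 are hypothetical choices made for illustration; neither spaCy nor spaCyTextBlob defines such a cut-off.

```python
def describe_polarity(score, slight_band=0.1):
    """Map a polarity score in [-1.0, 1.0] to a rough verbal label.

    slight_band is an arbitrary, assumed threshold: scores closer to zero
    than this are reported as only 'slightly' positive or negative.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {score}")
    if score == 0:
        return "neutral"
    direction = "positive" if score > 0 else "negative"
    if abs(score) < slight_band:
        return f"slightly {direction}"
    return direction


# Score reported above for 'A Dream Within A Dream'
print(describe_polarity(0.055555555555555546))
```

With the assumed threshold, the score printed above falls into the "slightly positive" band, which matches the comment in the code cell.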


3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [64]:
import requests
import json

def get_and_save_poem(author, poem_title, filename):
    """
    Fetches poem lyrics from the poetrydb.org API and saves them to a specified JSON file.

    Args:
        author (str): The author of the poem.
        poem_title (str): The title of the poem.
        filename (str): The name of the file to save the lyrics to (e.g., 'my_poem.json').
    """
    
    formatted_author = author.replace(" ", "%20")
    formatted_poem_title = poem_title.replace(" ", "%20")
    url = f'https://poetrydb.org/author,title/{formatted_author};{formatted_poem_title}'

    try:
        # Make the API request
        response = requests.get(url, timeout=10)  # time out rather than hang on a stalled connection
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
        result = json.loads(response.text)

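        # The API is expected to return a list of poems; treat an empty list
        # or any other shape (such as an error object) as "no poem found"
        if not isinstance(result, list) or not result:
            print(f"No poem found for '{poem_title}' by {author}. Please check the author and title.")
            return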

        # Save to a JSON file
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(result, f, ensure_ascii=False, indent=4)
        print(f"Lyrics for '{poem_title}' by {author} saved to {filename}")

    except requests.exceptions.RequestException as e:
        print(f"Error fetching data for '{poem_title}' by {author}: {e}")
    except json.JSONDecodeError:
        print(f"Error decoding JSON response for '{poem_title}' by {author}. The API might have returned non-JSON content.")
    except IOError as e:
        print(f"Error writing to file {filename}: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


# Poem 1: 
get_and_save_poem('Edgar Allan Poe', 'The Raven', 'the_raven.json')

# Poem 2: 
get_and_save_poem('Emily Dickinson', 'If I should cease to bring a Rose', 'dickinson_rose.json')

# Poem 3: 
get_and_save_poem('Robert Burns', '91. The Vision', 'burns_vision.json')

# Poem 4: 
get_and_save_poem('William Shakespeare', 'Sonnet 18', 'shakespeare_sonnet18.json')


Lyrics for 'The Raven' by Edgar Allan Poe saved to the_raven.json
Lyrics for 'If I should cease to bring a Rose' by Emily Dickinson saved to dickinson_rose.json
Lyrics for '91. The Vision' by Robert Burns saved to burns_vision.json
Lyrics for 'Sonnet 18' by William Shakespeare saved to shakespeare_sonnet18.json
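
The function above escapes only spaces when building the URL. A possible alternative, sketched below, is to percent-encode each path segment with `urllib.parse.quote` from the standard library, which also handles characters such as `?` or `/` in a title. The helper name `build_poetrydb_url` is hypothetical; the URL pattern is the one already used in this notebook.

```python
from urllib.parse import quote


def build_poetrydb_url(author, poem_title):
    """Build the poetrydb.org author/title URL with percent-encoded segments."""
    # safe='' makes quote() escape '/' as well, so a slash inside a title
    # cannot be mistaken for a path separator
    encoded_author = quote(author, safe='')
    encoded_title = quote(poem_title, safe='')
    return f'https://poetrydb.org/author,title/{encoded_author};{encoded_title}'


print(build_poetrydb_url('Edgar Allan Poe', 'The Raven'))
print(build_poetrydb_url('Robert Burns', '91. The Vision'))
```

For the four poems fetched above, this produces the same URLs as replacing spaces with `%20`, so it is a drop-in change for those inputs.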


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [65]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

def analyze_poem_sentiment(filename):
    """
    Loads poem lyrics from a JSON file, performs sentiment analysis using spaCyTextBlob,
    and returns the polarity score.

    Args:
        filename (str): The path to the JSON file containing poem lyrics.

    Returns:
        float: The sentiment polarity score of the poem (between -1.0 and 1.0),
               or None if an error occurs (e.g., file not found, bad JSON, or missing data).
    """
    try:
        # Open and load the JSON file
        with open(filename, 'r', encoding='utf-8') as f:
            data = json.load(f)

        
        if not data or not isinstance(data, list) or not data[0] or 'lines' not in data[0]:
            print(f"Error: '{filename}' does not contain expected poem data structure (missing 'lines' key or not a list).")
            return None

        # Extract the lyrics (lines) and join them into a single string
        lyrics = "\n".join(data[0]['lines'])

        # Load the spaCy English model and add the SpacyTextBlob pipe
        # This part requires 'en_core_web_sm' to be downloaded and spacytextblob installed.
        nlp = spacy.load('en_core_web_sm')
        nlp.add_pipe('spacytextblob')

        # Process the lyrics with the NLP pipeline
        doc = nlp(lyrics)

        # Return the polarity score
        return doc._.polarity

    except FileNotFoundError:
        print(f"Error: File not found at '{filename}'. Please ensure the file exists.")
        return None
    except json.JSONDecodeError:
        print(f"Error: Could not decode JSON from '{filename}'. Check if the file content is valid JSON.")
        return None
    except KeyError:
        print(f"Error: Missing expected 'lines' key in the JSON data from '{filename}'.")
        return None
    except OSError as e: # Catching other OS-related errors like permission issues
        print(f"OS Error while accessing '{filename}': {e}")
        return None
    except Exception as e:
        # Catch any other unexpected errors during processing
        print(f"An unexpected error occurred while processing '{filename}': {e}")
        return None

# Define the paths to the files created in the previous step
# (Assuming these files were successfully created by the previous function call)
poem_files_to_analyze = {
    'The Raven': 'the_raven.json',
    'If I should cease to bring a Rose': 'dickinson_rose.json',
    '91. The Vision': 'burns_vision.json',
    'Sonnet 18': 'shakespeare_sonnet18.json'
}

print("--- Sentiment Analysis Results ---")

# Iterate through the files and print their sentiment scores
for poem_name, filename in poem_files_to_analyze.items():
    polarity_score = analyze_poem_sentiment(filename)
    if polarity_score is not None:
        # Format the polarity score to 4 decimal places for readability
        print(f"'{poem_name}' (from {filename}): Polarity Score = {polarity_score:.4f}")
    else:
        print(f"Could not analyze sentiment for '{poem_name}' (from {filename}) due to previous errors.")

print("\n--- Analysis Complete ---")

--- Sentiment Analysis Results ---
'The Raven' (from the_raven.json): Polarity Score = 0.0385
'If I should cease to bring a Rose' (from dickinson_rose.json): Polarity Score = 0.6750
'91. The Vision' (from burns_vision.json): Polarity Score = 0.1221
'Sonnet 18' (from shakespeare_sonnet18.json): Polarity Score = 0.3318

--- Analysis Complete ---
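
`analyze_poem_sentiment` above calls `spacy.load` each time it runs, so the model is loaded once per file. One way to avoid that, sketched here under the same assumptions as the cell above (`en_core_web_sm` and spacytextblob installed, files shaped like the saved API responses), is to build the pipeline once and cache it. The helper names `get_nlp` and `poem_polarity` are hypothetical.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=1)
def get_nlp():
    """Build the spaCy pipeline on first use and reuse it afterwards."""
    # Imported here so that defining this helper does not load spaCy by itself
    import spacy
    from spacytextblob.spacytextblob import SpacyTextBlob  # noqa: F401 (registers the 'spacytextblob' factory)

    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    return nlp


def poem_polarity(filename):
    """Return the polarity of the first poem stored in a saved JSON file."""
    with open(filename, 'r', encoding='utf-8') as f:
        data = json.load(f)
    text = "\n".join(data[0]['lines'])
    # The cached pipeline is shared by every call
    return get_nlp()(text)._.polarity
```

Calling `poem_polarity('the_raven.json')` and then `poem_polarity('burns_vision.json')` would load the model only on the first call. Error handling is left out to keep the sketch short.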


## Polarity Analysis

The Raven = 0.0385. This very slightly positive, nearly neutral polarity seems accurate to me because the poem opens with the speaker "weak and weary", but in my reading their worries are lifted by the end.

If I should cease to bring a Rose = 0.6750. This highly positive polarity is fair, but I think it should be lower because the poem's words are very gloomy. It leads me to suspect that the sentiment analysis is picking up on more than the gloomy surface of the text, although TextBlob's default analyzer, which spaCyTextBlob uses, is lexicon-based and scores individual words rather than reading with human-level understanding. Despite the gloomy words, the poem makes me feel that the author would never let a positive moment slip away from her, unless due to her own death.

"91. The Vision" = 0.1221. This slightly positive polarity rating is accurate because the poem is full of beautiful imagery and the author thinks positively of the vision as it leaves.

Sonnet 18 = 0.3318. This moderately positive polarity rating is accurate because the author compares his companion to a summer's day, which has its ups and downs, and those ups and downs bring adventure into his life.
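
One possible reason a score can differ from a reader's impression is that a word-level scorer only sees the words it knows. The toy sketch below averages made-up word scores to show the effect. `TOY_LEXICON` and `toy_polarity` are hypothetical and are not TextBlob's actual lexicon or algorithm.

```python
# Made-up word scores for illustration only (not taken from any real lexicon)
TOY_LEXICON = {"lovely": 0.8, "fair": 0.7, "gloomy": -0.6, "weary": -0.5}


def toy_polarity(text):
    """Average the scores of the words found in TOY_LEXICON; 0.0 if none match."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# A sad sentence still scores high, because 'lovely' is the only known word
print(toy_polarity("No lovely rose remains"))
```

In this toy scorer, a single positive word decides the result even though the sentence as a whole is sad, which is the kind of mismatch to keep in mind when comparing a score with your own reading of a poem.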