# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Jim Crivello https://github.com/jmcriv/wmnlp-materials


Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure your code cells have all been run (and have output beneath them) and that you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

## Before Taking the Screenshot

In [2]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                       Version
----------------------------- --------------------
absl-py                       1.4.0
alabaster                     0.7.12
anaconda-client               1.11.0
anaconda-navigator            2.3.2
anaconda-project              0.11.1
anyio                         3.5.0
appdirs                       1.4.4
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
arrow                         1.2.3
asgiref                       3.5.2
astroid                       2.11.7
astropy                       5.1
astunparse                    1.6.3
atomicwrites                  1.4.0
attrs                         22.1.0
Automat                       20.2.0
autopep8                      1.6.0
Babel                         2.11.0
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile            1.0
backports.weakref             1.0.post1
bcrypt                        3.2.0
beautifulsoup4 

## QUESTION 1

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.
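
As a rough illustration of the save-then-reload step described above, the sketch below round-trips a dictionary through both a JSON file and a pickle file. The helper names (`save_json`, `load_json`, `save_pickle`, `load_pickle`) and the sample record are hypothetical and are not part of the assignment; the files are written to a temporary directory so nothing in the repository is touched.

```python
import json
import os
import pickle
import tempfile


def save_json(data, filename):
    """Write a JSON-serializable object to filename as UTF-8 text."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(filename):
    """Read a JSON file back into Python objects."""
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)


def save_pickle(data, filename):
    """Write any picklable object to filename (binary mode is required)."""
    with open(filename, "wb") as f:
        pickle.dump(data, f)


def load_pickle(filename):
    """Read a pickle file back into Python objects."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Hypothetical sample record standing in for an API response.
record = {"artist": "Example Artist", "title": "Example Song", "lyrics": "la la la"}

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "example_lyrics.json")
    pickle_path = os.path.join(tmp, "example_lyrics.pkl")
    save_json(record, json_path)
    save_pickle(record, pickle_path)
    from_json = load_json(json_path)
    from_pickle = load_pickle(pickle_path)

print(from_json == record, from_pickle == record)
```

JSON is human-readable and portable across languages, while pickle is Python-specific and should only be loaded from sources you trust; either works for caching a response locally.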

#### The call to api.lyrics.ovh failed to return lyrics; the attempt is included below. Following fellow students' recommendations on the discussion board, I used lyricsgenius instead.

#### api.lyrics.ovh

In [4]:
import requests
import json
import pickle
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)

print(result)



{'error': 'No lyrics found'}


In [5]:
import requests
import json
import pickle
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)

# result = json.loads(requests.get('https://api.lyrics.ovh/v1/Coldplay/Adventure of a Lifetime').text)

resultdict = {
    "artist": "They Might Be Giants",
    "title": "Birdhouse in Your Soul",
    "lyrics": result
}

with open('lyrics.json', 'w') as file:
    json.dump(result, file)

print(resultdict)

{'artist': 'They Might Be Giants', 'title': 'Birdhouse in Your Soul', 'lyrics': {'error': 'No lyrics found'}}


#### lyricsgenius

In [1]:
import lyricsgenius
import json
import os
from dotenv import load_dotenv

load_dotenv()

def get_genius_access_token():
    # Read the Genius API token from the environment (loaded from .env above).
    return os.getenv("GENIUS_ACCESS_TOKEN")

def get_song_lyrics(artist, song, filename):
    # Authenticate with the token from the environment instead of hardcoding it.
    genius = lyricsgenius.Genius(get_genius_access_token())
    try:
        # Keep the search result separate from the `song` argument so the
        # failure message below can still report the requested title.
        found = genius.search_song(song, artist)
        if found:
            with open(filename, 'w') as file:
                json.dump(found.lyrics, file)
            print(f"Lyrics for '{found.title}' by {found.artist} have been saved to {filename}")
        else:
            print(f"Failed to retrieve lyrics for '{song}' by {artist}")
    except Exception as e:
        print(f"An error occurred: {e}")

get_song_lyrics("They Might Be Giants", "Birdhouse in Your Soul", "birdhouse_lyrics.json")


Searching for "Birdhouse in Your Soul" by They Might Be Giants...
Done.
Lyrics for 'Birdhouse in Your Soul' by They Might Be Giants have been saved to birdhouse_lyrics.json


In [6]:
import json

# Load the lyrics saved by the lyricsgenius cell above.
with open('birdhouse_lyrics.json', 'r') as json_file:
    lyrics = json.load(json_file)

# Avoid naming the variable `dict`, which would shadow the built-in type.
song_info = {
    "artist": "They Might Be Giants",
    "title": "Birdhouse in Your Soul",
    "lyrics": lyrics
}

# Write the dictionary built here, not the failed `result` from the earlier cell.
with open('lyrics.json', 'w') as file:
    json.dump(song_info, file)

print(song_info)

{'artist': 'They Might Be Giants', 'title': 'Birdhouse in Your Soul', 'lyrics': "52 ContributorsBirdhouse in Your Soul Lyrics[Bridge]\nI'm your only friend\nI'm not your only friend\nBut I'm a little glowing friend\nBut really I'm not actually your friend\nBut I am\n\n[Chorus]\nBlue canary in the outlet by the light switch\nWho watches over you\nMake a little birdhouse in your soul\nNot to put too fine a point on it\nSay I'm the only bee in your bonnet\nMake a little birdhouse in your soul\n\n[Verse 1]\nI have a secret to tell\nFrom my electrical well\nIt's a simple message and I'm\nLeaving out the whistles and bells\nSo the room must listen to me\nFilibuster vigilantly\nMy name is blue canary\nOne note spelled l-i-t-e\nMy story's infinite\nLike the Longines Symphonette\nIt doesn't rest\nSee They Might Be Giants LiveGet tickets as low as $59You might also like[Chorus]\nBlue canary in the outlet by the light switch\nWho watches over you\nMake a little birdhouse in your soul\nNot to put 

## QUESTION 2

2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

#### Song Lyrics

In [29]:
import json

with open('birdhouse_lyrics.json', 'r') as json_file:
    json_object = json.load(json_file)

print(json_object)



52 ContributorsBirdhouse in Your Soul Lyrics[Bridge]
I'm your only friend
I'm not your only friend
But I'm a little glowing friend
But really I'm not actually your friend
But I am

[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Verse 1]
I have a secret to tell
From my electrical well
It's a simple message and I'm
Leaving out the whistles and bells
So the room must listen to me
Filibuster vigilantly
My name is blue canary
One note spelled l-i-t-e
My story's infinite
Like the Longines Symphonette
It doesn't rest
See They Might Be Giants LiveGet tickets as low as $59You might also like[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Bridge]
I'm yo
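
The printed lyrics above include text that is not part of the song: a contributor header, bracketed section labels such as `[Chorus]`, and a ticket advertisement. The sketch below shows one possible way to strip that residue with regular expressions before analysis. The function name `clean_genius_lyrics` is hypothetical, the patterns are based only on the residue visible in this output, and other songs may need different patterns. Whether cleaning changes the polarity score is not tested here.

```python
import re


def clean_genius_lyrics(raw):
    """Remove page residue seen in the scraped lyrics (patterns are assumptions)."""
    # Leading header such as "52 ContributorsBirdhouse in Your Soul Lyrics".
    text = re.sub(r"^\d+\s*Contributors.*?Lyrics", "", raw, count=1)
    # Inline advertisement such as "See ... LiveGet tickets as low as $59".
    text = re.sub(r"See .*? LiveGet tickets as low as \$\d+", "", text)
    text = text.replace("You might also like", "")
    # Section labels such as "[Chorus]" or "[Verse 1]".
    text = re.sub(r"\[[^\]\n]*\]", "", text)
    # Collapse the blank lines left behind by the removals.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


# Short sample assembled from fragments of the output above.
sample = (
    "52 ContributorsBirdhouse in Your Soul Lyrics[Bridge]\n"
    "I'm your only friend\n\n"
    "[Chorus]\n"
    "Blue canary in the outlet by the light switch\n"
    "It doesn't rest\n"
    "See They Might Be Giants LiveGet tickets as low as $59You might also like[Chorus]\n"
    "Who watches over you"
)

cleaned = clean_genius_lyrics(sample)
print(cleaned)
```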

#### Polarity Score

In [25]:
import requests
import json
import pickle
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

with open('birdhouse_lyrics.json', 'r') as json_file:
    json_object = json.load(json_file)

# print(json_object)

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")
doc = nlp(json_object)

polarity_score = doc._.polarity

print("The Resulting Polarity Score is:", polarity_score)

# My thoughts on the polarity score and its positive or negative connotation:
#
# Before running this code, I read the song lyrics a few times and saw them as neutral, leaning slightly positive.
# The resulting polarity score of about 0.026 therefore does not surprise me.

The Resulting Polarity Score is: 0.02575757575757576
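
To make the reading of a score in `[-1.0, 1.0]` concrete, here is a small sketch that maps a polarity value to a rough label. The function name `describe_polarity` and the `neutral_band` cutoff of 0.05 are assumptions chosen for illustration; spaCyTextBlob itself does not define such thresholds.

```python
def describe_polarity(score, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a rough label.

    neutral_band is a hypothetical cutoff: scores within +/- neutral_band
    of zero are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# The score reported by the cell above.
label = describe_polarity(0.02575757575757576)
print(label)
```

Under this assumed cutoff, the score above falls in the neutral band, which is consistent with reading the lyrics as neutral with a slight positive lean.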


## QUESTION 3

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [27]:
import lyricsgenius
import json
import os
from dotenv import load_dotenv

load_dotenv()

def get_genius_access_token():
    # Read the Genius API token from the environment (loaded from .env above).
    return os.getenv("GENIUS_ACCESS_TOKEN")

def get_song_lyrics(artist, song, filename):
    # Authenticate with the token from the environment instead of hardcoding it.
    genius = lyricsgenius.Genius(get_genius_access_token())
    try:
        # Keep the search result separate from the `song` argument so the
        # failure message below can still report the requested title.
        found = genius.search_song(song, artist)
        if found:
            with open(filename, 'w') as file:
                json.dump(found.lyrics, file)
            print(f"Lyrics for '{found.title}' by {found.artist} have been saved to {filename}")
        else:
            print(f"Failed to retrieve lyrics for '{song}' by {artist}")
    except Exception as e:
        print(f"An error occurred: {e}")

get_song_lyrics("Steely Dan", "Pretzel Logic", "pretzel_logic.json")
get_song_lyrics("Led Zepplin", "Stairway to Heaven", "stairway_heaven.json")
get_song_lyrics("Eagles", "One of these Nights", "these_nights.json")
get_song_lyrics("Pink Floyd", "Money", "money.json")

Searching for "Pretzel Logic" by Steely Dan...
Done.
Lyrics for 'Pretzel Logic' by Steely Dan have been saved to pretzel_logic.json
Searching for "Stairway to Heaven" by Led Zepplin...
Done.
Lyrics for 'Stairway to Heaven' by Led Zeppelin have been saved to stairway_heaven.json
Searching for "One of these Nights" by Eagles...
Done.
Lyrics for 'One of These Nights' by Eagles have been saved to these_nights.json
Searching for "Money" by Pink Floyd...
Done.
Lyrics for 'Money' by Pink Floyd have been saved to money.json
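
The cell above uses lyricsgenius. For comparison, a version of the same function written against lyrics.ovh might look like the sketch below. It assumes a successful response carries the text under a `"lyrics"` key (the error shape `{'error': 'No lyrics found'}` is the one seen in Question 1). The names `fetch_json` and `save_song_lyrics`, and the `fetch` parameter, are hypothetical. The demonstration uses stand-in fetchers, so no network request is made.

```python
import json
import os
import tempfile
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def save_song_lyrics(artist, song, filename, fetch=fetch_json):
    """Fetch lyrics and write them to filename; return True on success."""
    # Percent-encode spaces and punctuation in the path segments.
    url = f"https://api.lyrics.ovh/v1/{quote(artist, safe='')}/{quote(song, safe='')}"
    try:
        payload = fetch(url)
    except (OSError, ValueError):
        # Network and HTTP failures are OSError subclasses; bad JSON is a ValueError.
        return False
    # Assumption: a successful response stores the text under "lyrics".
    lyrics = payload.get("lyrics") if isinstance(payload, dict) else None
    if not lyrics:
        return False  # e.g. {'error': 'No lyrics found'}
    with open(filename, "w", encoding="utf-8") as f:
        json.dump({"artist": artist, "title": song, "lyrics": lyrics}, f)
    return True


# Stand-in fetchers for an offline demonstration.
def fake_hit(url):
    return {"lyrics": "la la la"}


def fake_miss(url):
    return {"error": "No lyrics found"}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    saved = save_song_lyrics("Example Artist", "Example Song", path, fetch=fake_hit)
    with open(path, encoding="utf-8") as f:
        stored = json.load(f)
    skipped = save_song_lyrics("Example Artist", "Missing Song",
                               os.path.join(tmp, "missing.json"), fetch=fake_miss)

print(saved, stored["lyrics"], skipped)
```

Returning a boolean instead of only printing lets the caller decide what to do when a song is missing, for example falling back to another source.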


## QUESTION 4

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [28]:
import json

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the 'spacytextblob' pipe

# Build the pipeline once and reuse it for every file.
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

def analyze_lyrics(filename):
    # Load the lyrics string saved in question 3.
    with open(filename, 'r') as file:
        lyrics = json.load(file)

    # spaCyTextBlob exposes the polarity score as a doc extension.
    return nlp(lyrics)._.polarity

files = [
    "pretzel_logic.json",
    "stairway_heaven.json",
    "these_nights.json",
    "money.json",
]

for file in files:
    polarity = analyze_lyrics(file)
    print(f"Polarity of {file}: {polarity}")
    

# My thoughts on the polarity scores for all four songs and their positive or negative connotations:
#
# Before I ran this code, I thought all four would lean positive.
# I was surprised to see "One of These Nights" score negative, but after rereading the lyrics without the music I can see why.
# The difference from hearing the song is that the sentiment analysis is based purely on the words and how they are grouped.
# As with an email, the text itself carries no tone; the reader interprets the tone.
# The music that accompanies the lyrics drives much of a song's happy or sad feeling.
# Without the music, the interpretation changes.

Polarity of pretzel_logic.json: 0.17541666666666664
Polarity of stairway_heaven.json: 0.09801925505050504
Polarity of these_nights.json: -0.07384146341463417
Polarity of money.json: 0.17755920550038198
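
To illustrate why a score built from words alone can differ from how a song feels, here is a toy word-lexicon scorer. It is not TextBlob's actual implementation; the lexicon values and the name `toy_polarity` are invented for illustration. My understanding is that TextBlob's default analyzer is also lexicon-based, but with a far larger lexicon and extra handling for modifiers and negation.

```python
import re

# Hypothetical word scores, invented purely for this illustration.
TOY_LEXICON = {
    "love": 0.5,
    "happy": 0.8,
    "lonely": -0.6,
    "dark": -0.15,
    "cry": -0.5,
}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the scores of the words found in the lexicon (0.0 if none match)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


upbeat = toy_polarity("So happy, so in love")
gloomy = toy_polarity("Lonely in the dark, I cry")
unmatched = toy_polarity("Blue canary in the outlet")

print(upbeat, gloomy, unmatched)
```

A scorer like this sees only the words it recognizes: text with no matching words scores 0.0, and melody, tempo, and delivery never enter the calculation.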
