# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Pasquale Salomone
### Github Repo Link: (https://github.com/mrme77/json-sentiment)

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob)

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [1]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                            Version
---------------------------------- --------------------
account                            0.1.0
aiohttp                            3.8.1
aiosignal                          1.2.0
alabaster                          0.7.12
alembic                            1.8.0
amqp                               2.6.1
anaconda-client                    1.11.2
anaconda-navigator                 2.1.4
anaconda-project                   0.11.1
anyio                              3.5.0
appdirs                            1.4.4
applaunchservices                  0.2.1
appnope                            0.1.2
appscript                          1.1.2
argh                               0.26.2
argon2-cffi                        21.3.0
argon2-cffi-bindings               21.2.0
asn1crypto                         1.5.1
astroid                            2.14.2
astropy                            5.1
asttokens                          2.0.5
async-timeou

#### Question 1 
The following code uses the lyricsgenius library (a client for the Genius API) because the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API is no longer available; it searches for the lyrics of a song and stores them in a dictionary object.  Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [16]:
import configparser

# Create a ConfigParser object
config = configparser.ConfigParser()

# Read the configuration file
config.read('/Users/pasqualesalomone/Desktop/WebMining&NLP/config.cfg')

# Access the key from the configuration file
key = config.get('key','lyrics_key')

# Use the key for further processing
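
For reference, `config.get('key','lyrics_key')` expects a `[key]` section that contains a `lyrics_key` option. The sketch below shows that layout using an in-memory string instead of a file. The token value is a placeholder, and the `fallback` argument is an optional addition that is not part of the cell above.

```python
import configparser

# Assumed layout of config.cfg; the token value is a placeholder, not a real key.
SAMPLE_CFG = """
[key]
lyrics_key = YOUR_GENIUS_ACCESS_TOKEN
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CFG)

# fallback=None returns None instead of raising when the section or option is missing.
key = config.get("key", "lyrics_key", fallback=None)
if key is None:
    raise SystemExit("lyrics_key is missing from the [key] section")

print(key)
```

Keeping the token in a separate file, as the cell above does, means the notebook can be committed without exposing the key.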




In [17]:
import requests, json, lyricsgenius
#result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)


In [18]:

genius = lyricsgenius.Genius(key)


In [19]:
artist = genius.search_artist("They Might Be Giants",max_songs=3)
#print(artist.songs)

Searching for songs by They Might Be Giants...

Song 1: "Istanbul (Not Constantinople)"
Song 2: "Birdhouse in Your Soul"
Song 3: "Other Father Song"

Reached user-specified song limit (3).
Done. Found 3 songs.


In [20]:
artist.name

'They Might Be Giants'

In [21]:
song = artist.song("Birdhouse in Your Soul")
lyrics = song.lyrics

# Create a dictionary with the song title and lyrics
data = {
    "title": "Birdhouse in Your Soul",
    "lyrics": lyrics
}

# Save the data as a JSON file
with open("lyrics_birdhouse.json", "w") as file:
    json.dump(data, file)
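
A quick way to confirm that the file was written correctly is to read it back and compare it with the original dictionary. The sketch below uses a temporary directory and placeholder lyrics so that it runs without the API. `save_lyrics` and `load_lyrics` are hypothetical helper names, and the explicit UTF-8 encoding is an assumption that helps when lyrics contain typographic quotes.

```python
import json
import tempfile
from pathlib import Path


def save_lyrics(data, path):
    """Write the dictionary to `path` as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file, ensure_ascii=False)


def load_lyrics(path):
    """Read the dictionary back from `path`."""
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


# Placeholder data standing in for the dictionary built from the API response.
sample = {"title": "Example Song", "lyrics": "First line\nSecond line with a ’ quote"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_lyrics.json"
    save_lyrics(sample, path)
    restored = load_lyrics(path)

print(restored == sample)
```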

In [8]:
#artist.add_song(song)
# the Artist object also accepts song names:
# artist.add_song("To You")

#### Question 2
Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [22]:
with open("lyrics_birdhouse.json", "r") as file:
    data = json.load(file)
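# Remove the unwanted string from the lyrics.
# Note: the banner embeds a ticket price that changes over time, so this exact-match
# replace only removes it when the fetched lyrics also say $41 (in the output below
# the price is $39, so the banner is left in place).
lyrics = data["lyrics"]
clean_lyrics = lyrics.replace("See They Might Be Giants LiveGet tickets as low as $41You might also like", "")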

# Update the data with the cleaned lyrics
data["lyrics"] = clean_lyrics

# Write the updated data back to the JSON file
with open("Lyrics_TheyMightBeGiants.json", "w") as file:
    json.dump(data, file)
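
The exact-string replacement above depends on the ticket price embedded in the banner, which can differ between API calls. A possible alternative, sketched below, matches the price with a regular expression. The banner shape is an assumption based on the output shown in this notebook, and `strip_banner` is a hypothetical helper name.

```python
import re

# Assumed shape of the banner, based on the output shown in this notebook;
# the price is matched with \d+ so it does not need to be hard-coded.
BANNER = re.compile(r"See .+? LiveGet tickets as low as \$\d+You might also like")


def strip_banner(lyrics):
    """Remove every occurrence of the ticket banner from the lyrics."""
    return BANNER.sub("", lyrics)


sample = (
    "It doesn't rest\n"
    "See They Might Be Giants LiveGet tickets as low as $39You might also like[Chorus]\n"
    "Blue canary in the outlet by the light switch"
)

print(strip_banner(sample))
```

Because the section tag (`[Chorus]`) sits outside the pattern, it is preserved.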

In [124]:
pwd

'/Users/pasqualesalomone/Desktop/WebMining&NLP/Module4/json-sentiment'

In [23]:


title = data["title"]
lyrics = data["lyrics"]

print("Title:", title)
print("Lyrics:")
print(lyrics)

Title: Birdhouse in Your Soul
Lyrics:
52 ContributorsBirdhouse in Your Soul Lyrics[Bridge]
I'm your only friend
I'm not your only friend
But I'm a little glowing friend
But really I'm not actually your friend
But I am

[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Verse 1]
I have a secret to tell
From my electrical well
It's a simple message and I'm
Leaving out the whistles and bells
So the room must listen to me
Filibuster vigilantly
My name is blue canary
One note spelled l-i-t-e
My story's infinite
Like the Longines Symphonette
It doesn't rest
See They Might Be Giants LiveGet tickets as low as $39You might also like[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little b

In [24]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Load the spaCy model
nlp = spacy.load('en_core_web_lg')

# Add the SpacyTextBlob component to the pipeline
nlp.add_pipe('spacytextblob')

# Process the lyrics

doc = nlp(lyrics)

# Access the sentiment analysis results
polarity = doc._.polarity
#subjectivity = doc._.subjectivity


print("Polarity:", polarity)
#print("Subjectivity:", subjectivity)


Polarity: 0.02575757575757576


#### In my opinion, the sentiment or emotional tone of these lyrics is neutral, which matches the polarity score obtained with the 'en_core_web_lg' model.
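
To make the "neutral" reading explicit, a score can be mapped to a coarse label. This is a minimal sketch: `describe_polarity` is a hypothetical helper, and the 0.1 threshold is an arbitrary illustrative choice, not a setting of TextBlob or spaCy.

```python
def describe_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The default threshold of 0.1 is an arbitrary illustrative choice.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Score reported above for "Birdhouse in Your Soul".
print(describe_polarity(0.02575757575757576))  # neutral
```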

#### Question 3 
Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [25]:
def find_song_lyrics(performer, track, filename):
    """
    Find and save the lyrics of a song by a specific performer.

    Args:
        performer (str): The name of the performer or artist.
        track (str): The title of the song or track.
        filename (str): Base name of the output file; "_lyrics.json" is appended.

    Returns:
        None

    Description:
        This function looks up the performer through the Genius API (via the
        lyricsgenius package), retrieves the lyrics of the requested track, strips
        the page residue that appears in the returned text, and saves the title
        and lyrics as JSON in "<filename>_lyrics.json".

        Example usage:
        find_song_lyrics('Metallica', 'Enter Sandman', 'metallica')
        # writes metallica_lyrics.json
    """
    import re, lyricsgenius
    # `key` is the API token read from config.cfg in an earlier cell
    genius = lyricsgenius.Genius(key)
    artist = genius.search_artist(performer, max_songs=3, get_full_info=False)
    song = artist.song(track)
    lyrics = song.lyrics
    # Strip page residue: text running from a number to the next ']' (the
    # contributor header) and the "See ... like" ticket banner
    clean_text = re.sub(r'\b\d+.*?\]|See.*like', '', lyrics)

    # Create a dictionary with the song title and lyrics
    data = {
        "title": track,
        "lyrics": clean_text
        }

    # Save the data as a JSON file (use `file` so the `filename` argument is not shadowed)
    with open(filename + "_lyrics.json", "w") as file:
        json.dump(data, file)
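
One side effect of the broad pattern `\b\d+.*?\]` is that it also removes numbers inside later section tags, so `[Verse 2]` becomes `[Verse `. The sketch below compares it with a narrower pattern that is anchored at the start of the text. The header shape is an assumption based on the output earlier in this notebook, and `strip_header` is a hypothetical helper name.

```python
import re

# Assumed header shape, based on the output earlier in this notebook:
# "<count> Contributors<song title> Lyrics" fused to the first section tag.
HEADER = re.compile(r"^\d+\s*Contributors?.*?Lyrics")


def strip_header(lyrics):
    """Remove only the contributor header at the very start of the lyrics."""
    return HEADER.sub("", lyrics, count=1)


sample = (
    "52 ContributorsBirdhouse in Your Soul Lyrics[Bridge]\n"
    "I'm your only friend\n"
    "\n"
    "[Verse 2]\n"
    "Second verse"
)

# Pattern used in find_song_lyrics: also removes the "2]" in "[Verse 2]".
broad = re.sub(r"\b\d+.*?\]|See.*like", "", sample)

print(repr(broad))
print(repr(strip_header(sample)))
```

Unlike the broad pattern, this version keeps the first section tag (`[Bridge]`), which may or may not be wanted depending on how the lyrics are used.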
    
    
    

#### Creating a dictionary with my five songs for testing purposes.

In [112]:
my_songs = {
    'performer': ['the beach boys', 'guns and roses', 'eric clapton', 'whitney houston', 'nirvana'],
    'track': ['good vibrations', "don't cry", 'tears in heaven', 'i will always love you', 'Polly'],
    'filename': ['the beach boys', 'guns and roses', 'eric clapton', 'whitney houston', 'nirvana'],
}

In [113]:
# iterate over the three parallel lists and run find_song_lyrics for each song
for performer, track, filename in zip(my_songs['performer'], my_songs['track'], my_songs['filename']):
    find_song_lyrics(performer, track, filename)

Searching for songs by the beach boys...

Changing artist name to 'The Beach Boys'
Song 1: "Surfin’ U.S.A."
Song 2: "God Only Knows"
Song 3: "Wouldn’t It Be Nice"

Reached user-specified song limit (3).
Done. Found 3 songs.
Searching for "good vibrations" by The Beach Boys...
Done.
Searching for songs by guns and roses...

Changing artist name to 'Guns N’ Roses'
Song 1: "Sweet Child O’ Mine"
Song 2: "Welcome to the Jungle"
Song 3: "November Rain"

Reached user-specified song limit (3).
Done. Found 3 songs.
Searching for "don't cry" by Guns N’ Roses...
Done.
Searching for songs by eric clapton...

Changing artist name to 'Eric Clapton'
Song 1: "Tears in Heaven"
Song 2: "Cocaine"
Song 3: "Wonderful Tonight"

Reached user-specified song limit (3).
Done. Found 3 songs.
Searching for "tears in heaven" by Eric Clapton...
Done.
Searching for songs by whitney houston...

Changing artist name to 'Whitney Houston'
Song 1: "I Will Always Love You"
Song 2: "I Have Nothing"
Song 3: "Greatest Love o

#### Checking that function works as expected

In [114]:
for item in my_songs['filename']:
    with open(item+"_lyrics.json", "r") as file:
         data = json.load(file)
    title = data["title"]
    lyrics = data["lyrics"]

    print("Title:", title)
    print("Lyrics:")
    print(lyrics)

Title: good vibrations
Lyrics:

I-I love the colorful clothes she wears
And the way the sunlight plays upon her hair
I hear the sound of a gentle word
On the wind that lifts her perfume through the air

[Chorus]
I'm pickin' up good vibrations
She's giving me excitations (Oom-bop-bop)
I'm pickin' up good vibrations (Good vibrations, bop-bop)
She's giving me excitations (Excitations, bop-bop)
I'm pickin' up good vibrations (Good vibrations, bop-bop)
She's giving me excitations (Excitations, bop-bop)
I'm pickin' up good vibrations (Good vibrations, bop-bop)
She's giving me excitations (Excitations)

[Verse 
Close my eyes, she's somehow closer now
Softly smile, I know she must be kind
When I look in her eyes
She goes with me to a blossom world

[Chorus]
I'm pickin' up good vibrations
She's giving me excitations (Oom-bop-bop)
I'm pickin' up good vibrations (Good vibrations, bop-bop)
She's giving me excitations (Excitations, bop-bop)
I'm pickin' up good vibrations (Good vibrations, bop-bop)


#### Question 4 
Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [123]:
def sentiment_analysis(filename):
    """
    Perform sentiment analysis on song lyrics stored in a JSON file.

    Args:
        filename (str): The name of the JSON file that contains the song lyrics,
            for example 'nirvana_lyrics.json'.

    Returns:
        tuple: (artist_name, title, polarity, subjectivity), where

        artist_name (str): The part of the filename before the first underscore.

        title (str): The song's title as stored in the file.

        polarity (float): The polarity score indicating the sentiment of the lyrics. The score can range from
               -1.0 (negative sentiment) to 1.0 (positive sentiment), with 0.0 representing neutral sentiment.

        subjectivity (float): The degree to which the text expresses personal opinions,
               feelings, or subjective viewpoints rather than objective facts.

    Description:
        This function loads the lyrics from the given file and runs them through a spaCy
        pipeline with the spacytextblob component. A positive polarity suggests positive
        sentiment, a negative polarity suggests negative sentiment, and a score close to
        zero suggests neutral sentiment.

        Example usage:
        artist, title, polarity, subjectivity = sentiment_analysis('metallica_lyrics.json')
    """
    # Importing the module registers the 'spacytextblob' pipeline component
    from spacytextblob.spacytextblob import SpacyTextBlob

    # Load the spaCy model (note: this reloads the model on every call)
    nlp = spacy.load('en_core_web_lg')

    # Add the SpacyTextBlob component to the pipeline
    nlp.add_pipe('spacytextblob')

    with open(filename, "r") as file:
        data = json.load(file)

    title = data["title"]
    lyrics = data["lyrics"]

    # Process the lyrics
    doc = nlp(lyrics)

    # Access the sentiment analysis results
    polarity = doc._.polarity
    subjectivity = doc._.subjectivity

    # The artist name is the part of the filename before "_lyrics.json"
    artist_name = filename.split('_')[0]

    return artist_name, title, polarity, subjectivity
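
Because `sentiment_analysis` calls `spacy.load` on every invocation, the large model is loaded again for each file. One possible alternative, sketched below, builds the pipeline once and reuses it. `get_pipeline` is a hypothetical helper name, and the sketch assumes that the same `en_core_web_lg` model and `spacytextblob` component used above are installed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_pipeline(model_name="en_core_web_lg"):
    """Build the spaCy pipeline on the first call and return the cached one afterwards."""
    # Imports are kept inside the function so that defining it does not load spaCy.
    import spacy
    from spacytextblob.spacytextblob import SpacyTextBlob  # registers 'spacytextblob'

    nlp = spacy.load(model_name)
    nlp.add_pipe("spacytextblob")
    return nlp
```

Inside `sentiment_analysis`, the two lines that load the model and add the component could then be replaced with `nlp = get_pipeline()`.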

In [120]:
performers = my_songs['performer']
for performer in performers:
    artist_name, track_title, polarity_score, subjectivity = sentiment_analysis(performer+"_lyrics.json")
    print("Artist Name:", artist_name)
    print("Track Title:", track_title)
    print("Polarity:", polarity_score)
    #print("Subjectivity:", subjectivity )
    print("-------------------------")

Artist Name: the beach boys
Track Title: good vibrations
Polarity: 0.6424242424242421
-------------------------
Artist Name: guns and roses
Track Title: don't cry
Polarity: 0.11083333333333334
-------------------------
Artist Name: eric clapton
Track Title: tears in heaven
Polarity: 0.13730158730158729
-------------------------
Artist Name: whitney houston
Track Title: i will always love you
Polarity: 0.47777777777777786
-------------------------
Artist Name: nirvana
Track Title: Polly
Polarity: 1.586032892321652e-17
-------------------------


#### Findings

According to an article on [theMusic](https://themusic.com.au/news/scientists-reveal-the-happiest-song-of-all-time/hkqSmJuanZw/17-02-23), "Good Vibrations" by The Beach Boys is considered the number one happiest song, and the strongly positive polarity score produced by our function (about 0.64) is consistent with that. Conversely, "I Will Always Love You" by Whitney Houston is considered one of the saddest songs according to an article from [WGN-TV](https://wgntv.com/news/wgn-news-now/whats-the-saddest-song-of-all-time/), yet it receives a moderately positive score (about 0.48), which does not match the article's findings. The remaining songs, which I would argue are sad songs, score either neutral or slightly positive.

I believe there are several reasons why the polarity of a song may not align with an individual's perception. One is context: lyrics are open to interpretation, and that interpretation is also shaped by the melody, the rhythm, and the artist's delivery. There is an interesting article about the impact of negative harmony on polarity: [Negative Harmony: Experiments with the Polarity in Music](https://dc.etsu.edu/cgi/viewcontent.cgi?article=1614&context=honors). Another aspect to consider is that some lyrics include metaphors, which sentiment analysis algorithms may struggle to capture.
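
To illustrate why a word-level approach can disagree with a listener, the sketch below scores text by averaging the values of the words it finds in a small lexicon. It is a deliberately simplified toy, not TextBlob's actual algorithm: `TOY_LEXICON` and `toy_polarity` are hypothetical, and the lexicon values are made up for the example.

```python
# Hypothetical mini-lexicon; the values are made up for illustration only.
TOY_LEXICON = {"love": 0.5, "good": 0.7, "sad": -0.5}


def toy_polarity(text):
    """Average the lexicon values of the known words; 0.0 if none are known."""
    words = [word.strip(".,!?'\"").lower() for word in text.split()]
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# A sad song can still score positive because of a single positive word.
print(toy_polarity("I will always love you"))  # 0.5

# A metaphor with no lexicon words contributes nothing to the score.
print(toy_polarity("Make a little birdhouse in your soul"))  # 0.0
```

A scorer like this sees individual words only, so the melody, the delivery, and the figurative meaning mentioned above are invisible to it.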

In [125]:
!jupyter nbconvert --to html requests-json-nlp.ipynb

[NbConvertApp] Converting notebook requests-json-nlp.ipynb to html
[NbConvertApp] Writing 723470 bytes to requests-json-nlp.html
