# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Jarrod Sims

### GitHub Repo: https://github.com/simsjarrod/44630_mod4

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.
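Either format works for caching the song data used below. This is a minimal sketch, assuming the data is a plain dictionary of strings. JSON produces a human-readable text file limited to basic types (dicts, lists, strings, numbers, booleans, `None`). Pickle writes a Python-specific binary format that can hold almost any Python object, but it should only be loaded from files you trust. The helper names `save_json`, `load_json`, `save_pickle`, and `load_pickle` are illustrative and are not part of the assignment.

```python
import json
import os
import pickle
import tempfile

def save_json(obj, path):
    # Text mode; ensure_ascii=False keeps non-ASCII characters readable in the file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)

def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def save_pickle(obj, path):
    # Pickle is binary, so the file must be opened in "wb" mode
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    # Only unpickle files you created yourself: loading a pickle can execute arbitrary code
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage: round-trip a small song dict through both formats in a temporary directory
song = {"artist": "Radiohead", "title": "Fake Plastic Trees", "lyrics": "A green plastic watering can"}
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "song.json")
    pkl_path = os.path.join(tmp, "song.pkl")
    save_json(song, json_path)
    save_pickle(song, pkl_path)
    from_json = load_json(json_path)
    from_pickle = load_pickle(pkl_path)
```

For a dictionary of strings like this one, both files load back to an equal dict, so the choice mostly comes down to readability (JSON) versus flexibility (pickle).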

In [120]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                       Version
----------------------------- ---------------
alabaster                     0.7.12
anaconda-client               1.11.2
anaconda-navigator            2.4.0
anaconda-project              0.11.1
anyio                         3.5.0
appdirs                       1.4.4
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
arrow                         1.2.3
astroid                       2.14.2
astropy                       5.1
asttokens                     2.0.5
atomicwrites                  1.4.0
attrs                         22.1.0
Automat                       20.2.0
autopep8                      1.6.0
Babel                         2.11.0
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile            1.0
backports.weakref             1.0.post1
bcrypt                        3.2.0
beautifulsoup4                4.11.1
binaryornot                   0.4.4
black              

### Question 1

The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores them in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [79]:
import json

import lyricsgenius

# Note: this solution uses the Genius API (via lyricsgenius) instead of lyrics.ovh.
# Read the Genius access token; strip() drops any trailing newline in the file.
with open("genius_config.txt") as f:
    key = f.read().strip()
genius = lyricsgenius.Genius(key)

# search_song() looks up a single song by title and artist,
# so no Artist object from search_artist() is needed
song = genius.search_song('Fake Plastic Trees', 'Radiohead')
lyrics = song.lyrics

song_dict = {
    'artist': 'Radiohead',
    'title': 'Fake Plastic Trees',
    'lyrics': lyrics
}

# Cache the result locally so later questions don't hit the API again
with open('Fake_Plastic_Trees.json', 'w') as new_file:
    json.dump(song_dict, new_file)


Searching for "Fake Plastic Trees" by Radiohead...
Done.
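The question text targets lyrics.ovh, whereas the solution above uses Genius through `lyricsgenius`. For comparison, here is a minimal sketch of the lyrics.ovh route. It assumes the public endpoint `https://api.lyrics.ovh/v1/<artist>/<title>` responds with JSON containing a `lyrics` key; the service may not always be available. It uses the standard library's `urllib.request` so it runs without extra packages. With `requests`, the fetch would be `requests.get(url, timeout=10).json()`. `build_lyrics_url`, `fetch_song`, and `save_song` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint shape for the lyrics.ovh v1 API: /v1/<artist>/<title>
LYRICS_OVH_URL = "https://api.lyrics.ovh/v1/{artist}/{title}"

def build_lyrics_url(artist: str, title: str) -> str:
    """Percent-encode artist and title so spaces, slashes, and punctuation are URL-safe."""
    return LYRICS_OVH_URL.format(
        artist=urllib.parse.quote(artist, safe=""),
        title=urllib.parse.quote(title, safe=""),
    )

def fetch_song(artist: str, title: str, timeout: float = 10.0) -> dict:
    """Return a dict shaped like song_dict; urlopen raises HTTPError (e.g. 404) if no lyrics are found."""
    with urllib.request.urlopen(build_lyrics_url(artist, title), timeout=timeout) as resp:
        payload = json.load(resp)  # assumed to look like {"lyrics": "..."}
    return {"artist": artist, "title": title, "lyrics": payload.get("lyrics", "")}

def save_song(song: dict, path: str) -> None:
    """Write the song dict to a JSON file so later questions can reuse it offline."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(song, f, ensure_ascii=False)

# Example (needs network access and a working lyrics.ovh service):
# save_song(fetch_song("Radiohead", "Fake Plastic Trees"), "Fake_Plastic_Trees.json")
```

Passing `safe=""` to `quote` also encodes `/`. Without it, an artist such as "AC/DC" would be split into an extra path segment.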


### Question 2
Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

In [81]:
#print lyrics
with open('Fake_Plastic_Trees.json') as file:
    data = json.load(file)
lyrics = data['lyrics']
print(lyrics)

Fake Plastic Trees Lyrics
[Verse 1]
A green plastic watering can
For a fake Chinese rubber plant
In a fake plastic earth
That she bought from a rubber man
In a town full of rubber plants
To get rid of itself

[Chorus]
It wears her out
It wears her out
It wears her out
It wears her out

[Verse 2]
She lives with a broken man
A cracked polystyrene man
Who just crumbles and burns
He used to do surgery
For girls in the eighties
But gravity always wins

[Chorus]
And it wears him out
It wears him out
It wears him out
It wears
You might also like[Verse 3]
She looks like the real thing
She tastes like the real thing
My fake plastic love
But I can't help the feeling
I could blow through the ceiling
If I just turn and run

[Chorus]
And it wears me out
It wears me out
It wears me out
It wears me out

[Outro]
And if I could be who you wanted
If I could be who you wanted
All the time
All the time86Embed
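The printed text contains scraping artifacts from the Genius page that are not part of the song: the `Fake Plastic Trees Lyrics` header, section tags such as `[Verse 1]`, the `You might also like` widget text, and the trailing `86Embed`. These words are passed to the sentiment model along with the lyrics. Below is a minimal sketch that strips them with regular expressions. It assumes the artifacts keep the shapes seen above, since Genius page formatting can change. `clean_genius_lyrics` is a hypothetical helper.

```python
import re

def clean_genius_lyrics(raw: str) -> str:
    """Remove Genius page artifacts, keeping only the lyric lines."""
    lines = raw.splitlines()
    # Drop the "<Title> Lyrics" header that Genius puts on the first line
    if lines and lines[0].endswith("Lyrics"):
        lines = lines[1:]
    text = "\n".join(lines)
    text = re.sub(r"\[[^\]]*\]", "", text)         # section tags like [Verse 1] or [Chorus]
    text = text.replace("You might also like", "")  # recommendation widget text
    text = re.sub(r"\d*Embed$", "", text.strip())   # trailing embed counter such as "86Embed"
    return re.sub(r"\n{3,}", "\n\n", text).strip()  # collapse blank lines left behind

sample = (
    "Fake Plastic Trees Lyrics\n[Verse 1]\nA green plastic watering can\n"
    "It wears\nYou might also like[Verse 3]\nShe looks like the real thing\nAll the time86Embed"
)
cleaned = clean_genius_lyrics(sample)
# In the notebook: lyrics = clean_genius_lyrics(data['lyrics'])
```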


In [102]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe('spacytextblob')
doc = nlp(lyrics)
polarity = doc._.polarity
print("Polarity score:", polarity)

# Polarity score: -0.08750000000000001
# Because the polarity score is very close to zero, I would classify the lyrics as roughly neutral, leaning slightly negative.

Polarity score: -0.08750000000000001
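The comment above reads a score this close to zero as neutral. To apply that judgement consistently across songs, one option is an explicit neutral band. The ±0.1 cutoff below is an arbitrary assumption, not a TextBlob convention. `polarity_label` is a hypothetical helper.

```python
def polarity_label(score: float, neutral_band: float = 0.1) -> str:
    """Map a TextBlob-style polarity in [-1.0, 1.0] to a coarse label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

# Scores reported in this notebook
for name, score in [("Fake Plastic Trees", -0.08750000000000001),
                    ("Love Will Tear Us Apart", 0.28606666666666664)]:
    print(f"{name}: {score:.3f} -> {polarity_label(score)}")
```

Widening or narrowing `neutral_band` changes which songs count as neutral, so it is worth stating the cutoff alongside any label.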


### Question 3
Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [115]:
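import json

import lyricsgenius

def get_lyrics(artist, song, filename):
    """Fetch a song's lyrics from Genius and write them as a JSON string to <filename>.json."""
    with open("genius_config.txt") as f:
        key = f.read().strip()  # strip() drops any trailing newline
    genius = lyricsgenius.Genius(key)
    # Look up the artist (one song is enough), then ask the Artist object for the requested song
    artist_obj = genius.search_artist(artist, max_songs=1, get_full_info=False)
    song_obj = artist_obj.song(song)
    if song_obj is None:
        raise ValueError(f"No lyrics found for {song!r} by {artist!r}")
    # Use a separate name for the file handle so the filename argument isn't shadowed
    with open(filename + '.json', 'w') as out_file:
        json.dump(song_obj.lyrics, out_file)

get_lyrics("The Clash", "Know Your Rights", "theclash_knowyourrights")
get_lyrics("Talking Heads", "This Must Be The Place", "talkingheads_thismustbetheplace")
get_lyrics("Joy Division", "Love Will Tear Us Apart", "joydivision_lovewilltearusapart")
get_lyrics("David Bowie", "Lets Dance", "davidbowie_letsdance")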

Searching for songs by The Clash...

Song 1: "Should I Stay or Should I Go"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "Know Your Rights" by The Clash...
Done.
Searching for songs by Talking Heads...

Song 1: "Once in a Lifetime"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "This Must Be The Place" by Talking Heads...
Done.
Searching for songs by Joy Division...

Song 1: "Love Will Tear Us Apart"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for songs by David Bowie...

Song 1: "Space Oddity"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "Lets Dance" by David Bowie...
Done.
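The filenames passed to `get_lyrics` above follow a pattern: the artist and the title are lowercased, spaces and punctuation are removed, and the two parts are joined by an underscore. A small helper can derive these names instead of typing them by hand. `slugify` and `make_filename` are hypothetical. The sketch assumes ASCII titles, because any character outside `a-z` and `0-9`, including accented letters, is simply dropped.

```python
import re

def slugify(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def make_filename(artist: str, song: str) -> str:
    """Build a base name like 'theclash_knowyourrights' (get_lyrics appends '.json')."""
    return f"{slugify(artist)}_{slugify(song)}"

songs = [
    ("The Clash", "Know Your Rights"),
    ("Talking Heads", "This Must Be The Place"),
    ("Joy Division", "Love Will Tear Us Apart"),
    ("David Bowie", "Let's Dance"),
]
filenames = [make_filename(artist, song) for artist, song in songs]
# In the notebook: for (artist, song), name in zip(songs, filenames): get_lyrics(artist, song, name)
```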


### Question 4
Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why or why not? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [119]:
import json

import spacy
# Importing SpacyTextBlob registers the 'spacytextblob' pipeline component
from spacytextblob.spacytextblob import SpacyTextBlob

# Load the pipeline once: spacy.load() is slow, so don't repeat it on every call
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe('spacytextblob')

def lyric_sentiment(filename):
    """Load lyrics stored as a JSON string and return their polarity score."""
    with open(filename, 'r') as f:
        lyrics = json.load(f)
    doc = nlp(lyrics)
    return doc._.polarity

for filename in ['theclash_knowyourrights.json',
                 'talkingheads_thismustbetheplace.json',
                 'joydivision_lovewilltearusapart.json',
                 'davidbowie_letsdance.json']:
    print(f'{filename} Polarity = {lyric_sentiment(filename)}')

theclash_knowyourrights.json Polarity = 0.07747252747252747
talkingheads_thismustbetheplace.json Polarity = 0.11519607843137256
joydivision_lovewilltearusapart.json Polarity = 0.28606666666666664
davidbowie_letsdance.json Polarity = 0.03333333333333333


1. "The Clash: Know Your Rights" Polarity = 0.07747252747252747
    * It makes sense that this score is close to neutral, because The Clash list basic human rights in a satirical way, such as the right not to be killed and the right to money for food. Since the polarity score only assesses the text itself, I assume it cannot take something like sarcasm into account.
2. "Talking Heads: This Must Be the Place" Polarity = 0.11519607843137256
    * I am kind of surprised that this score isn't more positive, considering this is essentially a love song about a person making you feel like you have found your home.
3. "Joy Division: Love Will Tear Us Apart" Polarity = 0.28606666666666664
    * I am very surprised that this song had the highest polarity score of the four I chose, because it is definitely the darkest. The song is about a dying relationship. Possibly the song's very frequent use of the word 'love' raised its polarity.
4. "David Bowie: Let's Dance" Polarity = 0.03333333333333333
    * This song is basically about dancing with someone you love, which is pretty positive. However, it mentions things like "my love for you would break my heart in two" and "put on your red shoes and dance the blues", which might lower its polarity score.
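Several of these answers depend on how a lexicon-based scorer works. TextBlob's default analyzer, which spaCyTextBlob wraps, looks up scored words in a lexicon and roughly averages their polarities. As a result, repeating a positive word pulls the overall score up regardless of context, and there is no notion of sarcasm or narrative. Below is a simplified sketch of that averaging. It uses a made-up two-word lexicon whose values are hypothetical, not TextBlob's. Real TextBlob also adjusts for modifiers such as negations and intensifiers. `mean_polarity` is a hypothetical helper.

```python
def mean_polarity(text: str, lexicon: dict) -> float:
    """Average the lexicon scores of the words in text; unknown words are ignored."""
    scores = [lexicon[word] for word in text.lower().split() if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical scores for illustration only
toy_lexicon = {"love": 0.5, "tear": -0.3}

once = mean_polarity("love will tear us apart", toy_lexicon)
repeated = mean_polarity("love love love will tear us apart", toy_lexicon)
print(once, repeated)
```

With this toy lexicon, repeating "love" moves the average from about 0.1 to about 0.3 while the negative word stays the same. That is consistent with the guess about "Love Will Tear Us Apart" above.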