# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Erin Swan-Siegel https://github.com/progswan2022/44-620_Module04 

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob)

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

# Question 1
The following code accesses the [poetrydb](https://poetrydb.org/) public API, searches for the lines of a poem, and stores the result (a list of poem dictionaries) in memory. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so that we do not need to access the API repeatedly.

In [1]:
import requests
import json

In [63]:
# User's input options (PoetryDB matches partial names, so 'Shakespear' still finds Shakespeare)
AUTHOR = 'Shakespear'
TITLE = 'Spring and Winter ii'

# Request the poem matching the selected author and title
URL = f'https://poetrydb.org/author,title/{AUTHOR};{TITLE}'
response = requests.get(URL, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
result = response.json()     # a list of poem dictionaries

# Write the result to Shakespear_Winter_ii.json; the with-block closes the file automatically
with open("Shakespear_Winter_ii.json", "w") as outfile:
    json.dump(result, outfile, indent=4)
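The question allows either JSON or pickle. As a minimal sketch of the pickle route (the file name `poem.pkl` and the `sample_result` list are hypothetical stand-ins for the API result), the object is written with `pickle.dump` in binary mode and read back with `pickle.load`; the round-tripped object compares equal to the original.

```python
import pickle

# Hypothetical stand-in for the list of poem dictionaries returned by PoetryDB
sample_result = [
    {"title": "Spring and Winter ii", "lines": ["WHEN icicles hang by the wall,"]}
]

# Pickle files must be opened in binary mode ("wb" to write, "rb" to read)
with open("poem.pkl", "wb") as outfile:
    pickle.dump(sample_result, outfile)

with open("poem.pkl", "rb") as infile:
    loaded = pickle.load(infile)

print(loaded == sample_result)  # True: the list of dicts survives the round trip
```

JSON has the advantage of being human-readable and usable from other languages, while pickle is Python-specific and should only be loaded from files you trust.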

# Question 2
Read in the contents of your file. Print the lines of the poem (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lines. Print the polarity score of the sentiment analysis. Given that the polarity score ranges over `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the poem has a more positive or negative connotation? Answer this question in a comment in your code cell.

In [67]:
# Open the JSON file written in Question 1; the with-block closes it automatically
with open('Shakespear_Winter_ii.json', 'r') as openfile:
    json_object = json.load(openfile)

# Print only the poem's lines, not the whole dictionary
print(json_object[0]["lines"])

['WHEN icicles hang by the wall,', '   And Dick the shepherd blows his nail,', 'And Tom bears logs into the hall,', '   And milk comes frozen home in pail,', "When blood is nipp'd, and ways be foul,", 'Then nightly sings the staring owl,', '   To-whit!', 'To-who!--a merry note,', 'While greasy Joan doth keel the pot.', '', 'When all aloud the wind doe blow,', "   And coughing drowns the parson's saw,", 'And birds sit brooding in the snow,', "   And Marian's nose looks red and raw,", 'When roasted crabs hiss in the bowl,', 'Then nightly sings the staring owl,', '   To-whit!', 'To-who!--a merry note,', 'While greasy Joan doth keel the pot.']
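To print the poem as verse rather than as a Python list, the lines can be joined with newlines. A minimal sketch, assuming the file holds the list-of-dictionaries structure shown above; the helper name `format_poem` and the two-line `sample` are hypothetical.

```python
def format_poem(poems):
    """Return the first poem's lines as one newline-separated string."""
    # PoetryDB results are a list of poem dicts; each dict has a "lines" list
    return "\n".join(poems[0]["lines"])

sample = [{"lines": ["WHEN icicles hang by the wall,",
                     "   And Dick the shepherd blows his nail,"]}]
print(format_poem(sample))
```

In the notebook, `print(format_poem(json_object))` would print the whole poem one line per row, keeping the original indentation.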


In [68]:
# Use spaCyTextBlob for Sentiment Analysis on the poem's lines
# Print the Polarity Score
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
text = ' '.join(json_object[0]['lines'])
doc = nlp(text)
print(doc._.polarity)

# A polarity of about -0.144 is below zero but close to it, so the poem has a slightly negative connotation

-0.14423076923076925
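The answer above depends on where the score falls in `[-1.0, 1.0]`. As a sketch, a small helper can turn a score into a label; the helper name `describe_polarity` and its cutoffs (below 0.05 in magnitude counts as neutral, 0.5 or more as strong) are assumptions chosen for illustration, not thresholds defined by spaCyTextBlob or TextBlob.

```python
def describe_polarity(score, neutral=0.05, strong=0.5):
    """Map a polarity in [-1.0, 1.0] to a coarse label (cutoffs are illustrative)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity {score} is outside [-1.0, 1.0]")
    if abs(score) < neutral:
        return "neutral"
    direction = "positive" if score > 0 else "negative"
    strength = "strongly" if abs(score) >= strong else "slightly"
    return f"{strength} {direction}"

print(describe_polarity(-0.14423076923076925))  # slightly negative
```

Making the cutoffs parameters keeps the judgment explicit: a different reader can pass stricter or looser values without changing the function.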


# Question 3
Write a function that takes an author, title, and filename, accesses the poetrydb api to get the poem lines, and writes the results to the specified filename.  Test this function by getting the lines to any four poems of your choice and storing them in different files.

In [72]:
# Function that takes an author, a title, and an optional filename; if no filename is
# given, one is built from the author and title (spaces replaced by underscores)
def save_poem(author, title, filename=None):
    # Request the poem matching the selected author and title
    URL = f'https://poetrydb.org/author,title/{author};{title}'
    response = requests.get(URL, timeout=10)
    response.raise_for_status()
    result = response.json()

    # Build a consistent filename (author_title.json) when none is provided
    if filename is None:
        filename = author.replace(' ', '_') + '_' + title.replace(' ', '_') + '.json'

    # Write the poem to the file; the with-block closes it automatically
    with open(filename, "w") as outfile:
        json.dump(result, outfile, indent=4)
    return filename

author1 = 'Shakespear'
title1 = 'Under the Greenwood Tree'
# No filename is passed, so each file is named author_title.json for consistency

author2 = 'Charlotte Bronte'
title2 = 'Life'

author3 = 'Edgar Allan Poe'
title3 = 'The Raven'

author4 = 'Edgar Allan Poe'
title4 = 'Silence'

save_poem(author1, title1)
save_poem(author2, title2)
save_poem(author3, title3)
save_poem(author4, title4)
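If a search matches nothing, PoetryDB does not return a list of poems (it appears to return a small status object instead), so writing the result without checking it can save an error message to disk under a poem's filename. A hedged sketch: the helpers `build_filename` and `has_poem` are hypothetical, and the check assumes only that a successful search returns a non-empty list of dictionaries that each carry a `"lines"` key.

```python
def build_filename(author, title, ext=".json"):
    """Mirror the naming scheme used above: spaces become underscores."""
    return author.replace(" ", "_") + "_" + title.replace(" ", "_") + ext

def has_poem(result):
    """True if result looks like a successful PoetryDB search (assumed shape)."""
    return (
        isinstance(result, list)
        and len(result) > 0
        and isinstance(result[0], dict)
        and "lines" in result[0]
    )

print(build_filename("Edgar Allan Poe", "The Raven"))  # Edgar_Allan_Poe_The_Raven.json
print(has_poem([{"title": "Life", "lines": ["LIFE, believe, is not a dream"]}]))  # True
print(has_poem({"status": 404, "reason": "Not found"}))  # False
```

Inside the save function, one could call `has_poem(result)` before opening the file and skip or report any poem for which it returns `False`.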

# Question 4
Write a function that takes the name of a file that contains a poem, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the poem) of the four files you created in question 3. Does the reported polarity match your understanding of the poem? Why or why not do you think that might be? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [73]:
def poem_polarity(filename):
    """Load a poem JSON file (name given without '.json') and return its polarity."""
    with open(filename + '.json', 'r') as openfile:
        json_object = json.load(openfile)
    text = ' '.join(json_object[0]['lines'])
    # Reuse the spaCy pipeline with spacytextblob created in Question 2 (loading it once is faster)
    doc = nlp(text)
    return doc._.polarity

poems = [
    ('Charlotte_Bronte_Life', 'Life', 'Charlotte Bronte'),
    ('Shakespear_Under_the_Greenwood_Tree', 'Under The Greenwood Tree', 'Shakespear'),
    ('Edgar_Allan_Poe_Silence', 'Silence', 'Edgar Allan Poe'),
    ('Edgar_Allan_Poe_The_Raven', 'The Raven', 'Edgar Allan Poe'),
]
for filename, poem, author in poems:
    print(f'Polarity of the poem "{poem}" by author {author}: {poem_polarity(filename)}')

Polarity of the poem "Life" by author Charlotte Bronte: 0.3731060606060606
Polarity of the poem "Under The Greenwood Tree" by author Shakespear: 0.0715909090909091
Polarity of the poem "Silence" by author Edgar Allan Poe: -0.14386363636363636
Polarity of the poem "The Raven" by author Edgar Allan Poe: 0.03847439660795825


### Question 4b
* For Charlotte Bronte's "Life", I agree with the polarity score. While the poem does speak of life's sorrows, its overarching tone is positive: it takes the good with the bad but ultimately proclaims that courage can overcome despair. The score likely reflects positive words such as 'victoriously' and 'gloriously'.
* For Shakespeare's "Under The Greenwood Tree", I slightly disagree with the sentiment analysis, as I believe this poem is more positive than its score suggests. I believe the issue lies partly in the Early Modern English verse: the meter leads to word choices that would be unusual in modern English, and words are shortened with apostrophes, which a sentiment model built for modern English may not recognize.
* For Edgar Allan Poe's "Silence", I don't necessarily agree with the polarity score, but I also have a difficult time discerning the tone of the poem myself. My guess is that the poem contains more negative words than positive ones, which pulls the score below zero.
* For Edgar Allan Poe's "The Raven", I don't necessarily agree with the positive polarity score. The speaker is in mourning, and ravens are often regarded as symbols of death. I believe the sentiment is mournful, which could be thought of as negative; a word-level score may miss this because it does not capture the poem's overall context.
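Several of the answers above guess at which words pull a score up or down. One way to check such a guess, sketched here, is to score each line separately and sort the lines by score. `rank_lines` is a hypothetical helper, and the scorer is passed in so it can be the spaCyTextBlob pipeline (for example `lambda line: nlp(line)._.polarity`) or, as in this self-contained example, a toy word-list scorer (`toy_score`, also hypothetical) that stands in for it.

```python
def rank_lines(lines, score_fn, k=3):
    """Return the k most negative and k most positive non-blank lines as (score, line)."""
    scored = [(score_fn(line), line) for line in lines if line.strip()]
    scored.sort(key=lambda pair: pair[0])  # stable sort: ties keep poem order
    return scored[:k], scored[-k:][::-1]

# Toy scorer standing in for spaCyTextBlob: +1 / -1 per listed word (illustrative only)
POSITIVE = {"merry", "gloriously", "victoriously"}
NEGATIVE = {"foul", "raw", "frozen"}

def toy_score(line):
    words = [w.strip(",.!?;:'-").lower() for w in line.split()]
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

lines = [
    "When blood is nipp'd, and ways be foul,",
    "To-who!--a merry note,",
    "",
    "And milk comes frozen home in pail,",
]
most_negative, most_positive = rank_lines(lines, toy_score, k=1)
print(most_negative)  # [(-1, "When blood is nipp'd, and ways be foul,")]
print(most_positive)  # [(1, 'To-who!--a merry note,')]
```

Passing the scorer as a function keeps the ranking logic independent of spaCy, so the same helper can compare the real polarity scores line by line once the notebook's `nlp` pipeline is loaded.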