#Lyrics Extraction
This notebook continues the processing pipeline. It scrapes, offline, each lyric page harvested by the [Lyrics-Raw-Harvesting Notebook](Lyrics-Raw-Harvesting.ipynb). It persists the extracted song lyrics as a cache, so that processing can be resumed as needed without re-processing results that are already valid.
* [lw-extracted-lyrics](../../data/harvested/lw-extracted-lyrics) is the directory for extracted song lyrics
* [lw-extracted-lyrics-error](../../data/harvested/lw-extracted-lyrics-error) is the directory for lyrics that could not be processed and require some sort of manual intervention. Once corrected, downstream processing steps should remain unaware of the manual intervention.

Lyrics extraction is done in an offline manner. It works against all available raw songs from lyrics.wikia within [lw-raw-lyrics](../../data/harvested/lw-raw-lyrics).

In [1]:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")

In [34]:
## MLJ: Additional Extras
from bs4 import BeautifulSoup
import io
import os
import requests
import time
import itertools
import json
import pickle

##Load Lyrics Dataframe

In [3]:
# load the latest master lyricsdf
lyricsdf = pd.read_csv("../../data/conditioned/master-lyricsdf.csv")  

In [4]:
lyricsdf.head()

Unnamed: 0,position,year,title.href,title,artist,lyrics,decade,song_key,lyrics_url,lyrics_abstract
0,1,1970,https://en.wikipedia.org/wiki/Bridge_over_Trou...,Bridge over Troubled Water,Simon and Garfunkel,When you're weary feeling small When tears are...,1970,1970-1,http://lyrics.wikia.com/Simon_And_Garfunkel:Br...,When you're weary\nFeeling small\nWhen tears a...
1,2,1970,https://en.wikipedia.org/wiki/(They_Long_to_Be...,(They Long to Be) Close to You,The Carpenters,x,1970,1970-2,http://lyrics.wikia.com/Carpenters:%28They_Lon...,Why do birds suddenly appear\nEverytime you ar...
2,3,1970,https://en.wikipedia.org/wiki/American_Woman_(...,American Woman,The Guess Who,"American woman, stay away from me American wom...",1970,1970-3,http://lyrics.wikia.com/The_Guess_Who:American...,"Mmm, da da da\nMmm, mmm, da da da\nMmm, mmm, d..."
3,4,1970,https://en.wikipedia.org/wiki/Raindrops_Keep_F...,Raindrops Keep Fallin' on My Head,B.J. Thomas,Raindrops keep falling on my head Just like th...,1970,1970-4,http://lyrics.wikia.com/B.J._Thomas:Raindrops_...,Raindrops are falling on my head\nAnd just lik...
4,5,1970,https://en.wikipedia.org/wiki/War_(Edwin_Starr...,War,Edwin Starr,"War huh Yeah! Absolutely uh-huh, uh-huh huh Ye...",1970,1970-5,http://lyrics.wikia.com/Edwin_Starr:War,"War, huh, yeah\nWhat is it good for?\nAbsolute..."


##Setup Directories and File Utility Methods

In [13]:
lw_success_dir = "../../data/harvested/lw-extracted-lyrics/"
lw_issues_dir = "../../data/harvested/lw-extracted-lyrics-error/"
lw_raw_dir = "../../data/harvested/lw-raw-lyrics/" #must look for raw here

In [6]:
# adapted from https://justgagan.wordpress.com/2010/09/22/python-create-path-or-directories-if-not-exist/
def assureDirExists(path):
    d = os.path.dirname(path)
    if not os.path.exists(d):
        os.makedirs(d)
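
A small aside on `assureDirExists`: it passes the path through `os.path.dirname`, so the trailing slash on the directory variables matters. The sketch below (run here on Python 3, with an illustrative relative path) shows the difference; without the trailing slash, only the parent directory would be created.

```python
import os

# With a trailing slash, dirname returns the directory itself ...
with_slash = os.path.dirname("data/harvested/lw-raw-lyrics/")
# ... without it, dirname returns the parent directory instead.
without_slash = os.path.dirname("data/harvested/lw-raw-lyrics")

print(with_slash)     # data/harvested/lw-raw-lyrics
print(without_slash)  # data/harvested
```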

In [14]:
# make sure the directories exist
assureDirExists(lw_success_dir)
assureDirExists(lw_issues_dir)
assureDirExists(lw_raw_dir)

In [8]:
# adapted from http://stackoverflow.com/questions/82831/check-whether-a-file-exists-using-python
def isNonZeroFile(fpath):  
    return True if os.path.isfile(fpath) and os.path.getsize(fpath) > 0 else False

In [16]:
# consolidated helper method for paths to be used in writing success and issues as needed.
# note: keys here end in *_path (the previous notebook used *_jpath).
def buildPathsDictFor(song_key):
    """
    return a dictionary of paths for the `song_key`
    """
    success_path = "{}{}.txt".format(lw_success_dir,song_key) #normal results    
    issues_path = "{}{}.txt".format(lw_issues_dir,song_key) #text with issue message
    raw_path = "{}{}.html".format(lw_raw_dir,song_key) #raw html from lyrics.wikia
    return {'song_key':song_key, 'success_path':success_path, 'issues_path':issues_path, 'raw_path':raw_path}

In [64]:
# consolidated helper method for result writes
def writeStrToFile(s, pathsd, pathsd_key):
    """
    write the given str to the given path.
    
     --- Input ---
    s: String to write
    pathsd: Dictionary holding paths
    pathsd_key: String key to use in pathsd when writing
    
    --- Return ---
    pathsd[pathsd_key]
    """
    path = pathsd[pathsd_key]
    
    with io.open(path,'w',encoding='utf-8') as text_file:
        if isinstance(s, str):
            text_file.write(unicode(s,encoding='utf-8'))
        elif isinstance(s, unicode):        
            text_file.write(s)
    
    return path
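
`writeStrToFile` relies on the Python 2 `unicode` type. As a hedged sketch only, a Python 3 counterpart might look like the following; `write_text` is a hypothetical name, not part of this notebook, and it takes the target path directly instead of a paths dictionary.

```python
import io
import os
import tempfile

def write_text(s, path):
    """Hypothetical Python 3 counterpart of writeStrToFile: accepts bytes or str."""
    if isinstance(s, bytes):
        s = s.decode("utf-8")  # Python 2 `str` corresponds to `bytes` here
    with io.open(path, "w", encoding="utf-8") as text_file:
        text_file.write(s)
    return path

# round trip through a temporary directory
tmp_dir = tempfile.mkdtemp()
out_path = write_text(u"caf\u00e9", os.path.join(tmp_dir, "demo.txt"))
with io.open(out_path, "r", encoding="utf-8") as fh:
    round_trip = fh.read()
```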

##Common Utilities

In [362]:
# common cleaning for lyrics, good for both full and abstracts.
def normalizeLyrics(string):
    # catch weird trailing lines
    lyrics = string.replace("\n.","") 
        
    # catch double periods with spaces.
    lyrics = lyrics.replace(". . ",". ").replace("?. ","? ").replace("!. ","! ").replace(",. ",", ")
        
    # catch double spaces and hanging periods
    lyrics = lyrics.replace("  "," ").replace(" .",".")
    
    # convert newline to '. ' 
    lyrics = lyrics.replace("\n",". ")
    
    # catch ellipses and double periods and strip
    lyrics = lyrics.replace("...",".").replace("..",".").strip() 
    
    return lyrics
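
A worked example of `normalizeLyrics` on newline-separated text (the function body is repeated so the snippet runs standalone, here on Python 3). Because newlines are converted to `". "` after the `"?. "` cleanup, a line ending in `?` keeps a stray period on the first pass; a second pass removes it. The notebook does apply the function more than once on the extracted lyrics (in `cleanLyrics` and again in `addExtractedLyrics`).

```python
def normalizeLyrics(string):
    # same steps as the cell above, reproduced so this example runs standalone
    lyrics = string.replace("\n.", "")
    lyrics = lyrics.replace(". . ", ". ").replace("?. ", "? ").replace("!. ", "! ").replace(",. ", ", ")
    lyrics = lyrics.replace("  ", " ").replace(" .", ".")
    lyrics = lyrics.replace("\n", ". ")
    lyrics = lyrics.replace("...", ".").replace("..", ".").strip()
    return lyrics

sample = "War, huh, yeah\nWhat is it good for?\nAbsolutely nothing"
once = normalizeLyrics(sample)
twice = normalizeLyrics(once)
print(once)   # War, huh, yeah. What is it good for?. Absolutely nothing
print(twice)  # War, huh, yeah. What is it good for? Absolutely nothing
```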

##Grab or Identify Cached Song Lyrics

In [361]:
# get lyrics within bs4.element.Tag lyricbox into the right output
from bs4 import Comment

def cleanLyrics(lyricbox, debug=False):
    """
    clean lyrics starting from the beautiful soup initial results at bs4.element.Tag lyricbox
    """
    lyrics = None
    try:
        # kill all script and style elements in lyricsbox as well as the meta paragraph
        # from http://stackoverflow.com/questions/5598524/can-i-remove-script-tags-with-beautifulsoup
        for script in lyricbox(["script", "style"]):
            script.extract()    # rip it out
        
        # kill all comments
        # from http://stackoverflow.com/questions/23299557/beautifulsoup-4-remove-comment-tag-and-its-content
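        # iterate over a copy: extract() mutates lyricbox.contents while we loop,
        # which would otherwise skip the element following each removed comment
        for child in list(lyricbox.contents):
            if isinstance(child,Comment):
                child.extract()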
        
        llist = lyricbox.contents # use this versus getText() to keep breaks
        if debug:  
            print "len contents --> ", len(llist)
            print "contents -->\n", llist
        
        # replace '<br/>' with '.'
        lyrics = "".join([x if isinstance(x,unicode) else ". " for x in llist])
        
        lyrics = normalizeLyrics(lyrics) # common cleaning
        
        # catch missing final punctuation
        if not lyrics.endswith("."):
            lyrics = lyrics + "."
        
    except Exception as e:
        print e
        lyrics = None
    
    if debug:
        print"---START ::: CLEAN LYRICS ---"
        print lyrics
        print"---END ::: CLEAN LYRICS ---"
    
    return lyrics    
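
The join step in `cleanLyrics` can be illustrated without BeautifulSoup. In this sketch (Python 3), `Br` is a hypothetical stand-in for a `<br/>` tag, and the contents list is an abbreviated, rearranged sample of lines from the debug output further down. Text nodes are kept and every non-text node becomes `". "`, so a blank line between verses (two consecutive breaks) yields `". . "`, which is what the double-period rule in `normalizeLyrics` is there to collapse.

```python
class Br(object):
    """Hypothetical stand-in for a bs4 <br/> Tag; anything that is not text."""

contents = [u"He spends his nights in California", Br(),
            u"Watching the stars on the big screen", Br(), Br(),
            u"If I could be like that"]

text_type = type(u"")  # unicode on Python 2, str on Python 3
joined = "".join([x if isinstance(x, text_type) else ". " for x in contents])
print(joined)
# He spends his nights in California. Watching the stars on the big screen. . If I could be like that
```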

In [93]:
def cachedLyricsOrExtract(pathsd, force=False, debug=False):
    """
    Leverage cache where possible; helpful for reprocessing. This will use
    `song_key` to lookup available results within data/harvested/lw-extracted-lyrics.
    
    Successful results are within data/harvested/lw-extracted-lyrics/<song_key>.txt
    Unsuccessful results are within data/harvested/lw-extracted-lyrics-error/<song_key>.txt
    
     --- Input ---    
    pathsd: Dictionary provided with paths to use as well as `song_key`
    force: optional Boolean to indicate full processing, ignoring cache, default = False
    debug: optional Boolean to indicate more verbose output
    
    --- Return ---
    tuple of the following:
    t[0] String path to processing results, 
    t[1] Boolean indicating True for success, False for issue
    t[2] Boolean indicating True for cache results, else False
    """  
    
    song_key = pathsd['song_key']
    
    #if not force and is in the cache (i.e. already persisted) just return it
    if not force and isNonZeroFile(pathsd['success_path']):
        print "... using song lyric in cache for ", song_key
        return pathsd['success_path'], True, True #success_path, success, cache
    elif not isNonZeroFile(pathsd['raw_path']):
        if debug:
            print "... no raw song lyric in cache to extract for ", song_key
        return pathsd['issues_path'], False, False #issues_path, not success, not cache
    
    # a raw page exists at the raw path; attempt extraction.
    try:
        
        print "... attempting to extract song_key: ", song_key
        
        # Read text from raw 
        with io.open(pathsd['raw_path'], "r", encoding='utf-8') as text_file:
            text = text_file.read()
        
        # attempt to parse with BeautifulSoup
        soup = BeautifulSoup(text, "html.parser")        
        
        lyricbox = soup.find("div", attrs={"class": "lyricbox"}) #get first
        
        # punt off to cleaner
        lyrics = cleanLyrics(lyricbox, debug)  
              
        if lyrics:                    
            return writeStrToFile(lyrics, pathsd, 'success_path'), True, False #success_path, success, not cache
        
        # essentially, else 
        msg = "Not able to process lyrics for song_key: `{}`".format(song_key)
        return writeStrToFile(msg, pathsd, 'issues_path'), False, False #issues_path, not success, not cache
    
    except Exception as e:
        msg = "exception processing lyrics for song_key: `{}`, {}".format(song_key,e)
        print msg
        return writeStrToFile(msg, pathsd, 'issues_path'), False, False #issues_path, not success, not cache    
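
The branch selection at the top of `cachedLyricsOrExtract` can be summarized as a small pure function. This is only a sketch: `extraction_action` and its return labels are hypothetical names, and the two booleans stand for the `isNonZeroFile` checks on the success and raw paths.

```python
def extraction_action(force, has_success, has_raw):
    """Return which branch cachedLyricsOrExtract would take (hypothetical helper)."""
    if not force and has_success:
        return "use-cache"   # valid extracted file already persisted
    if not has_raw:
        return "no-raw"      # nothing harvested to extract from
    return "extract"         # parse the raw HTML and persist the result

print(extraction_action(force=False, has_success=True, has_raw=True))   # use-cache
print(extraction_action(force=True, has_success=True, has_raw=True))    # extract
print(extraction_action(force=True, has_success=True, has_raw=False))   # no-raw
```

Note that `force=True` only bypasses the cache; a missing raw file is still reported as an issue.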

In [150]:
# main entry-point for extracted lyric processing
def extractAvailableRawSongLyrics(df, force=False, debug=False):
    """
    Attempt to extract raw lyrics downloaded from lyrics.wikia, skipping successfully persisted previous results.
    Each song_key result is individually persisted to file for repeat / additive processing pipeline.
    
     --- Input ---
    df: Dataframe from which to build and cache results   
    force: optional Boolean to indicate full processing, ignoring cache, default = False
    debug: optional Boolean to indicate more verbose output
    
    --- Return ---
    tuple of the following:
    t[0] dictionary of new processing by song_key with path to results,
    t[1] dictionary of existing / cached processing by song_key with path to results,
    t[2] dictionary of issues by song_key with path to results 
    """   
    cache_refs = {}
    new_refs = {}
    issues = {}    
    
    for r in df.iterrows():
        song_key = r[1].song_key
        
        pathsd = buildPathsDictFor(song_key)
            
        # let this call handle extraction or skip based on cache 
        # the following returns tuple with the following:
        # t[0] String path to processing results, 
        # t[1] Boolean indicating True if success or False if issue
        # t[2] Boolean indicating 'True' if results from cache
        t = cachedLyricsOrExtract(pathsd,force=force,debug=debug)
            
        # cached results (ignored)
        if t[2] and t[1]:
            cache_refs[song_key] = t[0]
        # new results    
        elif t[1]:
            new_refs[song_key] = t[0]
        # issues    
        else:    
            issues[song_key] = t[0]
        
    return new_refs, cache_refs, issues

###Quick Test to Verify Handling a Single Key

In [156]:
# quick test
ttuple = extractAvailableRawSongLyrics(lyricsdf[lyricsdf.song_key == "2001-96"], debug=True, force=True)   
print
print "how many new lyrics were extracted? ", len(ttuple[0])
print "how many results were in the cache? ", len(ttuple[1])
print "how many issues were encountered? ", len(ttuple[2])

print ttuple[0]

... attempting to extract song_key:  2001-96
len contents -->  110
contents -->
[u'He spends his nights in California', <br/>, u'Watching the stars on the big screen', <br/>, u'And then he lies awake and he wonders', <br/>, u'"Why can\'t that be me?"', <br/>, <br/>, u"'Cause in his life he's filled with all these good intentions", <br/>, u"He's left a lot of things he'd rather not mention right now", <br/>, u'Just before he says goodnight', <br/>, u'He looks up with a little smile at me and he says', <br/>, <br/>, u'If I could be like that', <br/>, u'I would give anything', <br/>, u'Just to live one day', <br/>, u'In those shoes', <br/>, u'If I could be like that, what would I do?', <br/>, u'What would I do? Yeah', <br/>, <br/>, u'Now in dreams we run', <br/>, <br/>, u'She spends her days up in the north park', <br/>, u'Watching the people as they pass', <br/>, u'And all she wants is just a little piece of this dream', <br/>, u'Is that too much to ask?', <br/>, u'With a safe home, and 

##Process the 1970s
Validate the process on a single decade before committing to all decades.

In [157]:
print "execution start --> {}".format(time.strftime('%a, %d %b %Y %H:%M:%S', time.localtime()))

execution start --> Mon, 23 Nov 2015 17:04:42


In [158]:
%%time
# process 70s
new_refs70, cache_refs70, issues70 = extractAvailableRawSongLyrics(lyricsdf[lyricsdf.decade == 1970])  

... attempting to extract song_key:  1970-1
... attempting to extract song_key:  1970-2
... attempting to extract song_key:  1970-3
... attempting to extract song_key:  1970-4
... attempting to extract song_key:  1970-5
... attempting to extract song_key:  1970-6
... attempting to extract song_key:  1970-7
... attempting to extract song_key:  1970-8
... attempting to extract song_key:  1970-9
... attempting to extract song_key:  1970-10
... attempting to extract song_key:  1970-11
... attempting to extract song_key:  1970-12
... attempting to extract song_key:  1970-13
... attempting to extract song_key:  1970-14
... attempting to extract song_key:  1970-15
... attempting to extract song_key:  1970-16
... attempting to extract song_key:  1970-17
... attempting to extract song_key:  1970-18
... attempting to extract song_key:  1970-19
... attempting to extract song_key:  1970-20
... attempting to extract song_key:  1970-21
... attempting to extract song_key:  1970-22
... attempting to e

In [162]:
print
print "how many new lyrics were extracted? ", len(new_refs70)
print "how many results were in the cache? ", len(cache_refs70)
print "how many issues were encountered? ", len(issues70)


how many new lyrics were extracted?  942
how many results were in the cache?  0
how many issues were encountered?  58


##Process the Rest

In [159]:
print "execution start --> {}".format(time.strftime('%a, %d %b %Y %H:%M:%S', time.localtime()))

execution start --> Mon, 23 Nov 2015 17:07:25


In [160]:
%%time
# process the remaining decades
new_refs0, cache_refs0, issues0 = extractAvailableRawSongLyrics(lyricsdf[lyricsdf.decade != 1970]) 

... attempting to extract song_key:  1980-1
... attempting to extract song_key:  1980-2
... attempting to extract song_key:  1980-3
... attempting to extract song_key:  1980-4
... attempting to extract song_key:  1980-5
... attempting to extract song_key:  1980-6
... attempting to extract song_key:  1980-7
... attempting to extract song_key:  1980-8
... attempting to extract song_key:  1980-9
... attempting to extract song_key:  1980-10
... attempting to extract song_key:  1980-11
... attempting to extract song_key:  1980-12
... attempting to extract song_key:  1980-13
'NoneType' object is not callable
... attempting to extract song_key:  1980-14
... attempting to extract song_key:  1980-15
... attempting to extract song_key:  1980-16
... attempting to extract song_key:  1980-17
... attempting to extract song_key:  1980-18
... attempting to extract song_key:  1980-19
... attempting to extract song_key:  1980-20
... attempting to extract song_key:  1980-21
... attempting to extract song

In [161]:
print
print "how many new lyrics were downloaded? ", len(new_refs0)
print "how many results were in the cache? ", len(cache_refs0)
print "how many issues were encountered? ", len(issues0)


how many new lyrics were downloaded?  3406
how many results were in the cache?  1
how many issues were encountered?  93


##Save Off Issues

In [163]:
# save 70s issues
with open('../../data/harvested/lyrics-extracted-issues_1970s.json', 'w') as fp:
    json.dump(issues70, fp)

In [164]:
# save remaining issues
with open('../../data/harvested/lyrics-extracted-issues_after_1970s.json', 'w') as fp:
    json.dump(issues0, fp)
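
For later manual review, the saved issue files can be read back with `json.load`. A minimal round-trip sketch (Python 3), using a temporary file and a made-up `demo-1` entry in place of the real issue dictionaries:

```python
import json
import os
import tempfile

# made-up sample entry: song_key -> path of the issue message file
issues_demo = {"demo-1": "../../data/harvested/lw-extracted-lyrics-error/demo-1.txt"}

demo_path = os.path.join(tempfile.mkdtemp(), "lyrics-extracted-issues_demo.json")
with open(demo_path, "w") as fp:
    json.dump(issues_demo, fp)

with open(demo_path) as fp:
    reloaded = json.load(fp)

# list the song keys that still need manual intervention
for key in sorted(reloaded):
    print(key, "-->", reloaded[key])
```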

##Add Lyrics to Dataframe
**The rules are the following:**
* if a cached lyric extraction is available and valid, it goes into df['lyrics']
* else if df['lyrics_abstract'] is valid and longer than the existing df['lyrics'], it goes into df['lyrics']
* else the existing df['lyrics'] is kept, and is counted as invalid when shorter than `bogus_threshold`

The processing metrics (a count per rule) are collected along the way.
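
As a condensed sketch of how the code in the next cell applies these rules (Python 3; `pick_best` and its source labels are hypothetical, and the cached text is passed in directly instead of being read from file). Note that the abstract replaces the existing lyrics only when it is the longer of the two.

```python
def pick_best(cached, lyrics, abstract, bogus_threshold=10):
    """Hypothetical condensed form of the selection rules; returns (text, source)."""
    # treat missing or very short text as empty, as the notebook does
    if not isinstance(lyrics, str) or len(lyrics) < bogus_threshold:
        lyrics = ""
    if not isinstance(abstract, str) or len(abstract) < bogus_threshold:
        abstract = ""
    if isinstance(cached, str) and len(cached) > bogus_threshold:
        return cached, "new"          # cached extraction wins
    if len(lyrics) < len(abstract):
        return abstract, "abstract"   # abstract is longer than existing lyrics
    if len(lyrics) >= bogus_threshold:
        return lyrics, "exist"        # keep existing lyrics
    return lyrics, "invalid"          # nothing usable

# a placeholder lyric such as "x" falls back to the abstract when no cache exists
print(pick_best(None, "x", "Why do birds suddenly appear"))
```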


In [368]:
def addExtractedLyrics(lyricsdf, debug=False, bogus_threshold=10):    
        
    new = 0
    exist = 0
    abstract = 0
    invalid = 0
    
    def pickBestLyric(r, debug=False, bogus_threshold=10):        
        song_key = r[1]['song_key']
        lyrics = r[1]['lyrics']
        abstract = r[1]['lyrics_abstract']
        
        if not isinstance(abstract,str) or len(abstract) < bogus_threshold:
            abstract = ""
        
        if not isinstance(lyrics,str) or len(lyrics) < bogus_threshold:
            lyrics = ""
        
        tnew = 0
        texist = 0
        tabstract = 0
        tinvalid = 0
        
        best = lyrics #default
        
        pathsd = buildPathsDictFor(song_key) 
        
        # test for cached being available, if so, use
        cache_hit = False
        if isNonZeroFile(pathsd['success_path']):                
            # Read text from pathsd
            with io.open(pathsd['success_path'], "r", encoding='utf-8') as text_file:
                tmp = text_file.read().encode('ascii', 'ignore') #for dataframe, want ascii. 
            
            # Test for validity to avoid some bogus information
            if len(tmp) > bogus_threshold:
                best = tmp
                cache_hit = True
                
        # test for cache hit valid    
        if cache_hit:
            tnew += 1
            if debug:
                print "... cache is best for song_key --> ", song_key
                
        # test for abstract being better than lyric
        elif len(lyrics) < len(abstract):
            best = abstract
            tabstract +=1
            if debug:
                print "... `lyrics_abstract` is best for song_key --> ", song_key
            
        # stick with original, though ok    
        elif len(lyrics) >= bogus_threshold:
            texist +=1
            if debug:
                print "... existing at `lyrics` is best for song_key --> ", song_key
            
        # stick with original, though bad    
        else:
            tinvalid += 1
            if debug:
                print "... invalid `lyrics` (last option) for song_key --> ", song_key
        
        return best, (tnew,texist,tabstract,tinvalid)
    
    #use a copy of provided df
    df = lyricsdf.copy(deep=True).reset_index() 
    
    #build up new lyrics based on rules
    best_lyrics = []    
    for r in df.iterrows():
        b, t = pickBestLyric(r,debug=debug,bogus_threshold=bogus_threshold)
        #lyric winner
        best_lyrics.append(b)
        #counters
        new += t[0]
        exist += t[1]
        abstract += t[2]
        invalid += t[3]
    
    ## Get double use:
    # 1. apply to df on all 'lyrics' column
    idx = -1    
    for lyrics in best_lyrics:
        idx +=1
        
        # go ahead and normalize best lyrics.
        b = normalizeLyrics(lyrics)
        
        df.loc[idx, 'lyrics'] = b    
        
    #apply selectively on `lyrics_abstract` column with some more manipulation
    idx = -1  
    a_threshold = 280
    for lyrics in best_lyrics:
        idx +=1
        
        a = df.loc[idx, 'lyrics_abstract']
        awinner = "" # set the initial winner to empty
        
        if debug:
            print "type of abstract? ", type(a)
        
        if not isinstance(a,str) or len(a) < bogus_threshold:
            a = ""
        else:    
            awinner = a # can set a winner to a after checking conditions
        
        # only go further if have something worthwhile to compare
        if isinstance(lyrics,str) and len(lyrics) >= bogus_threshold:
            if len(lyrics) >= a_threshold or len(lyrics) > len(awinner):             
                awinner = lyrics
                    
        # some final cleanup    
        if awinner:
            
            # trim down if needed.
            if len(awinner) > a_threshold:
                awinner = awinner[:a_threshold]
            
            # strip trailing '[...]'
            awinner = awinner.replace('[...]','')
            
            # handle common cleaning
            awinner = normalizeLyrics(awinner)
            
            # standardize ending
            awinner = awinner + "[...]"  
            
            if debug:
                print "changing abstract --> ", awinner
        
        # regardless, update with best abstract, whether empty or lyrics substring or same.
        df.loc[idx, 'lyrics_abstract'] = awinner
    
    return df, (new,exist,abstract,invalid)
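
The abstract cleanup at the end of `addExtractedLyrics` boils down to: trim to 280 characters, drop any `[...]` marker, then append a single standard `[...]`. A sketch in Python 3, where `build_abstract` is a hypothetical helper and the `normalizeLyrics` call is left out for brevity:

```python
def build_abstract(text, limit=280):
    """Hypothetical helper mirroring the abstract cleanup, minus normalizeLyrics."""
    if not text:
        return ""                                  # empty winner stays empty
    trimmed = text[:limit].replace("[...]", "")    # trim, then drop old markers
    return trimmed + "[...]"                       # standardize the ending

long_text = "ab" * 200                             # 400 characters
print(len(build_abstract(long_text)))              # 285
print(build_abstract("short lyric text[...]"))     # short lyric text[...]
```

An abstract is therefore at most 285 characters long before normalization: 280 characters of text plus the five-character marker.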

##Quick Test on Single Record

In [369]:
tdf, ttuple = addExtractedLyrics(lyricsdf[lyricsdf.song_key == "1970-1"], debug=True)   
print
print "how many new lyrics applied? ", ttuple[0]
print "how many existing lyrics were preserved? ", ttuple[1]
print "how many lyric_abstract were used? ", ttuple[2]
print "how many lyrics are invalid (after other rules)? ", ttuple[3]

... cache is best for song_key -->  1970-1
type of abstract?  <type 'str'>
changing abstract -->  When you're weary. Feeling small. When tears are in your eyes. I will dry them all. I'm on your side. When times get rough. And friends just can't be found. Like a bridge over troubled water. I will lay me down. Like a bridge over troubled water. I will lay me down. When you're d[...]

how many new lyrics applied?  1
how many existing lyrics were preserved?  0
how many lyric_abstract were used?  0
how many lyrics are invalid (after other rules)?  0


In [370]:
tdf.head()

Unnamed: 0,index,position,year,title.href,title,artist,lyrics,decade,song_key,lyrics_url,lyrics_abstract
0,0,1,1970,https://en.wikipedia.org/wiki/Bridge_over_Trou...,Bridge over Troubled Water,Simon and Garfunkel,When you're weary. Feeling small. When tears a...,1970,1970-1,http://lyrics.wikia.com/Simon_And_Garfunkel:Br...,When you're weary. Feeling small. When tears a...


##Full Processing

In [371]:
print "execution start --> {}".format(time.strftime('%a, %d %b %Y %H:%M:%S', time.localtime()))

execution start --> Mon, 23 Nov 2015 20:47:49


In [372]:
%%time
mlyricsdf, rtuple = addExtractedLyrics(lyricsdf)   
print
print "how many new lyrics applied? ", rtuple[0]
print "how many existing lyrics were preserved? ", rtuple[1]
print "how many lyric_abstract were used? ", rtuple[2]
print "how many lyrics are invalid (after other rules)? ", rtuple[3]


how many new lyrics applied?  4293
how many existing lyrics were preserved?  43
how many lyric_abstract were used?  61
how many lyrics are invalid (after other rules)?  103
CPU times: user 10.2 s, sys: 6.49 s, total: 16.7 s
Wall time: 32.5 s


In [373]:
mlyricsdf.shape

(4500, 11)
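
As a quick sanity check, the counts printed by the earlier cells add up to this row count, both for the two extraction runs and for the merge. The numbers below are copied from the outputs above; the snippet is written for Python 3.

```python
# counts reported by the extraction runs
extracted_1970s = {"new": 942, "cached": 0, "issues": 58}
extracted_rest = {"new": 3406, "cached": 1, "issues": 93}
# counts reported by addExtractedLyrics on the full frame
merged = {"new": 4293, "existing": 43, "abstract": 61, "invalid": 103}

rows_extracted = sum(extracted_1970s.values()) + sum(extracted_rest.values())
rows_merged = sum(merged.values())
print(rows_extracted, rows_merged)  # 4500 4500
```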

In [374]:
mlyricsdf.head()

Unnamed: 0,index,position,year,title.href,title,artist,lyrics,decade,song_key,lyrics_url,lyrics_abstract
0,0,1,1970,https://en.wikipedia.org/wiki/Bridge_over_Trou...,Bridge over Troubled Water,Simon and Garfunkel,When you're weary. Feeling small. When tears a...,1970,1970-1,http://lyrics.wikia.com/Simon_And_Garfunkel:Br...,When you're weary. Feeling small. When tears a...
1,1,2,1970,https://en.wikipedia.org/wiki/(They_Long_to_Be...,(They Long to Be) Close to You,The Carpenters,Why do birds suddenly appear. Everytime you ar...,1970,1970-2,http://lyrics.wikia.com/Carpenters:%28They_Lon...,Why do birds suddenly appear. Everytime you ar...
2,2,3,1970,https://en.wikipedia.org/wiki/American_Woman_(...,American Woman,The Guess Who,"Mmm, da da da. Mmm, mmm, da da da. Mmm, mmm, d...",1970,1970-3,http://lyrics.wikia.com/The_Guess_Who:American...,"Mmm, da da da. Mmm, mmm, da da da. Mmm, mmm, d..."
3,3,4,1970,https://en.wikipedia.org/wiki/Raindrops_Keep_F...,Raindrops Keep Fallin' on My Head,B.J. Thomas,Raindrops are falling on my head. And just lik...,1970,1970-4,http://lyrics.wikia.com/B.J._Thomas:Raindrops_...,Raindrops are falling on my head. And just lik...
4,4,5,1970,https://en.wikipedia.org/wiki/War_(Edwin_Starr...,War,Edwin Starr,"War, huh, yeah. What is it good for? Absolutel...",1970,1970-5,http://lyrics.wikia.com/Edwin_Starr:War,"War, huh, yeah. What is it good for? Absolutel..."


## Save DataFrame

In [376]:
# save mlyricsdf
mlyricsdf.to_csv("../../data/conditioned/master-lyricsdf-extracted.csv",index=False)