# DSCI 511: Data acquisition and pre-processing<br>Chapter 8: Establishing a Database with Documentation

## 8.0 From data collection to database construction
Supposing we've finished creating our data acquisition script and we can download all kinds of data, are we done?
No way! Not only do data scientists need to be experts at acquiring data, they also need to be able to create
efficient databases with the data they collect (with appropriate documentation!), analyze said data (not so much in this course), and 
appropriately communicate their findings. This week we'll be talking about the creation of databases. To keep things general, a _database_ for us is simply a collection of data whose structure and format are chosen intentionally for a particular mode of access or interaction.

Note: As with other topics in this course, we'll utilize some pretty low-level Python tools instead of any specific database software (e.g., relational/SQL systems). The CCI curricula include extensive courses on those tools, so please contact your instructor or other advisors for more information on the college's SQL-specific coursework.

## 8.1 Directory structures need to scale!
Let's first talk about good habits to get into when actually storing your data. Oftentimes, data science projects will last a long time, and proper organization of both the code used during the project and the actual data files themselves is essential to keep data scientists from going insane. Of course, we begin with a new folder for our project. Let's say we're working on a project called `Hello World`. So, we've created our `hello_world` directory and are working from there. It's usually best to keep your code and data separated, so we'd begin by creating appropriate directories, `./hello_world/code` and `./hello_world/data`. The way you organize the `code` folder is entirely dependent on the project. What's more important, and often more difficult, is the organization of the `data` folder.
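As a minimal sketch of that starting layout, the two folders can be created from Python with `os.makedirs`. The helper name `make_project_skeleton` is hypothetical; only the `code`/`data` split comes from the paragraph above.

```python
import os

def make_project_skeleton(root):
    """Create <root>/code and <root>/data (hypothetical helper name)."""
    for sub in ("code", "data"):
        # exist_ok=True means re-running this is harmless, much like `mkdir -p`
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    return sorted(os.listdir(root))

print(make_project_skeleton("hello_world"))  # ['code', 'data'] when starting from scratch
```

Because `makedirs` also creates any missing parent folders, the project root itself does not need to exist beforehand.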

Creating directories is quick and easy (as is shown below using the `os` module). To facilitate the organization of your data, don't be scared of creating extra directories. Sometimes one of the simplest schemes for organizing a set of records is to keep each record with its associated metadata in its own directory. This is what is done below in an updated music scraping script. Just make sure, when you are trying to retrieve your data, to first check whether the directory you are trying to access actually exists! To check whether a file or directory exists, we can use the `os.path.exists(path)` function:

In [1]:
import os
print(os.path.exists("data/"))
print(os.path.exists("database.ipynb"))

False
True
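Putting the two habits above together (one directory per record, and an existence check before reading), here is a small sketch. The file names `record.json`/`meta.json` and the helpers `save_record`/`load_record` are illustrative assumptions, not part of the course's scraper.

```python
import json
import os

def save_record(root, record_id, record, metadata):
    """Store one record and its metadata together in <root>/<record_id>/."""
    record_dir = os.path.join(root, record_id)
    os.makedirs(record_dir, exist_ok=True)
    with open(os.path.join(record_dir, "record.json"), "w") as f:
        json.dump(record, f)
    with open(os.path.join(record_dir, "meta.json"), "w") as f:
        json.dump(metadata, f)

def load_record(root, record_id):
    """Return (record, metadata), or None if the record's directory is missing."""
    record_dir = os.path.join(root, record_id)
    if not os.path.exists(record_dir):
        # check first, rather than letting open() raise FileNotFoundError
        return None
    with open(os.path.join(record_dir, "record.json")) as f:
        record = json.load(f)
    with open(os.path.join(record_dir, "meta.json")) as f:
        metadata = json.load(f)
    return record, metadata

save_record("example_records", "song-1", {"lyrics": "la la la"}, {"artist": "Someone"})
print(load_record("example_records", "song-1"))  # ({'lyrics': 'la la la'}, {'artist': 'Someone'})
print(load_record("example_records", "song-2"))  # None: that directory was never created
```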


## 8.2 Metadata documents for strategic data interaction
First, in case you are unfamiliar with the term, __metadata__ has a somewhat nebulous definition. For our purposes, we will go with a basic one: metadata is a set of data which describes another set of data. It is very important to set up metadata documentation for datasets, particularly large ones. Imagine we had a dataset of 2 million tweets and wanted to find every tweet made by a specific user. If all we have are the tweets themselves, we would need a loop that goes through every one of the 2 million elements, checking each for a match. On the other hand, if we set up an additional document that lists, for each user, the IDs of the tweets they made, then this task becomes trivial: a single lookup. While setting up this document would take a while, the time saved over the lifecycle of the project will likely be massive. Why perform the same computations over and over when we can do them once at the beginning of our work? If this isn't totally clear yet, that's okay. We'll discuss these concepts in more detail below.
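To make the tweet example concrete, here is a toy sketch contrasting the full scan with a precomputed user-to-IDs document. The three tweets and their `id`/`user`/`text` fields are made up for illustration; they are not the Twitter API's actual schema.

```python
from collections import defaultdict

# A tiny made-up stand-in for the 2-million-tweet corpus
tweets = [
    {"id": "t1", "user": "alice", "text": "hello"},
    {"id": "t2", "user": "bob", "text": "hi"},
    {"id": "t3", "user": "alice", "text": "data!"},
]

def tweets_by_scan(tweets, user):
    # Without metadata: every query examines every tweet
    return [t["id"] for t in tweets if t["user"] == user]

# With metadata: one pass up front builds user -> [tweet IDs]
ids_by_user = defaultdict(list)
for t in tweets:
    ids_by_user[t["user"]].append(t["id"])

print(tweets_by_scan(tweets, "alice"))  # ['t1', 't3']
print(ids_by_user.get("alice", []))     # ['t1', 't3'], from a single dictionary lookup
```

The one-time pass costs about as much as a single scan, so the index pays for itself as soon as we query it more than once.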

### 8.2.1 Inverted Indices
An __inverted index__ is an index data structure storing a mapping from content to its locations in a dataset. Below, we'll map the genres of scraped songs to the songs that are a part of the given genres. What's the point of making an inverted index? Is it actually worth the effort? Yes, absolutely!

Inverted indices are one of the data structures that allow search engines like Google to be so fast. Say you search Google for "data science". Perhaps unsurprisingly, the search engine doesn't scrape every web page on the internet at query time and search each of them in succession for "data science". Instead, Google continually crawls the web and updates a massive inverted index in which each search term is associated with a list of web pages containing information relevant to it. This is one of the key components of the modern search engine that allows for such quick results. Here's a good graphic describing the kind of inverted index a search engine uses to resolve keyword search terms:
![Inverted Indices](./images/inverted-index.jpg)
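Here is a toy version of the keyword index pictured above. It is only a sketch, not how a real search engine works: the three "pages" are made up, tokenization is plain whitespace splitting, and a query matches a page only if the page contains every query term (AND semantics).

```python
from collections import defaultdict

# Made-up "web pages": page ID -> text
pages = {
    "p1": "data science is fun",
    "p2": "science of cooking",
    "p3": "big data pipelines",
}

# Build the inverted index once: term -> set of page IDs containing it
index = defaultdict(set)
for page_id, text in pages.items():
    for term in text.lower().split():
        index[term].add(page_id)

def search(query):
    """Return IDs of pages that contain every term in the query."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    if not term_sets:
        return set()
    # intersect the per-term page sets instead of re-reading every page
    return set.intersection(*term_sets)

print(sorted(search("data science")))  # ['p1']
```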

#### 8.2.1.1 Example: Facebook comments organized by thread
Let's look at another use of inverted indices, this time with some mocked-up Facebook comments from a thread. For more information on what comment objects look like on Facebook, check out the docs:
- https://developers.facebook.com/docs/graph-api/reference/v3.1/comment

Say we have a large corpus of Facebook comments. When the Facebook Graph API is queried, each comment is returned in its own `JSON` object. When loaded into Python using the `json` module, this yields a dictionary of the following form (depending on the requested/accessible values):

```
{
    "created_time": "2018-05-05T21:38:24+0000",
    "id": "10156502712828459_10156502784433450",
    "message": "I love data science!"
}
```
Each message has a unique `id`, with a twist. It turns out the part of the `id` before the underscore is just the `id` of the Object the comment was made on in the first place, e.g., a page's post. The second half of the `id` distinguishes the comments on that Object from one another. So, if we have a huge corpus of Facebook comments and want to do some analysis of the various threads, it'd be really useful to create an inverted index that lists each comment we have for each thread.
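As a quick sketch of that structure, the comment shown above can be parsed and its `id` split at the underscore. This assumes the `id` always has the two-part form described here. Passing `maxsplit=1` is a defensive choice on our part, so any further underscores would stay in the second half.

```python
import json

raw = '{"created_time": "2018-05-05T21:38:24+0000", "id": "10156502712828459_10156502784433450", "message": "I love data science!"}'
comment = json.loads(raw)

# Split once on the first underscore: parent Object ID, then the comment's own part
parent_id, comment_part = comment["id"].split("_", 1)
print(parent_id)     # 10156502712828459
print(comment_part)  # 10156502784433450
```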

#### 8.2.1.2 Extended example: building an inverted index to analyze threads
As it turns out, we can think of the first half of the comment IDs as metadata, and use them to build an inverted index that allows us to look up comments by thread. Since we can't publicly access these data, here's some code to 1) create a data directory for this chapter/'project' and 2) mock up Facebook comments data in `'./data/COMMENTS.json'`. 

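To create our directory, let's use the bash command `mkdir -p <DIRECTORY>` (run here through `os.system`), where the `-p` flag tells `mkdir` to create any missing parent directories and not to complain if the directory already exists. The `0` printed below is the value returned by `os.system`; a return value of `0` indicates that the command succeeded.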

In [2]:
os.system("mkdir -p ./data")

0

If you haven't seen it yet, we can use the `%%writefile <FILE>` IPython 'magic' command at the top of a cell to create a file:

In [3]:
%%writefile ./data/COMMENTS.json
[
    {
        "created_time": "2018-08-21T17:12:22+0000",
        "id": "T2_C2",
        "message": "Yeah, but we can still fake some data up here to get the point across."
    },
    {
        "created_time": "2018-05-05T21:38:24+0000",
        "id": "T1_C1",
        "message": "I love data science!"
    },
    {
        "created_time": "2018-08-21T15:24:16+0000",
        "id": "T2_C1",
        "message": "Unfortunately terms of use can make it difficult to share data."
    },
    {
        "created_time": "2018-05-05T22:02:04+0000",
        "id": "T1_C2",
        "message": "Yeah, but I didn't expect all of this pre-processing work!"
    }
    
]

Writing ./data/COMMENTS.json


Now that we have some 'comments' in place, our goal is to create and use an inverted index file to efficiently access comments by thread. 

Note: Storing each thread's comments in a separate file would allow us to load individual threads more quickly. Since the comments would be accessed according to a thread's (Post's) ID, this would actually be a convenient way to initially store the data. Moreover, such storage would be a very important step to take if we were going to scale our database up, say, by continually collecting data. However, storing comments in thread-level files biases access towards thread-level analysis. So, if we were primarily interested in accessing our data by time, e.g., for a timeseries analysis, it would be more convenient to organize the comments into files by day!
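The two storage layouts in the note above differ only in how each comment is assigned to a file. Below is a sketch with a hypothetical helper, `partition_comments`, that takes that assignment as a key function. The comments reuse the mock IDs and timestamps, trimmed to those two fields. Keying on the first 10 characters of `created_time` assumes the ISO-style timestamps shown earlier.

```python
import json
import os
from collections import defaultdict

def partition_comments(comments, key_func, out_dir):
    """Write one JSON file per key, e.g. per thread or per day (hypothetical helper)."""
    groups = defaultdict(list)
    for comment in comments:
        groups[key_func(comment)].append(comment)
    os.makedirs(out_dir, exist_ok=True)
    for key, group in groups.items():
        with open(os.path.join(out_dir, "{}.json".format(key)), "w") as f:
            json.dump(group, f, indent=4)
    return sorted(groups)

comments = [
    {"created_time": "2018-08-21T17:12:22+0000", "id": "T2_C2"},
    {"created_time": "2018-05-05T21:38:24+0000", "id": "T1_C1"},
    {"created_time": "2018-08-21T15:24:16+0000", "id": "T2_C1"},
    {"created_time": "2018-05-05T22:02:04+0000", "id": "T1_C2"},
]

# Thread-level files: key on the part of the id before the underscore
print(partition_comments(comments, lambda c: c["id"].split("_")[0], "by_thread"))  # ['T1', 'T2']
# Day-level files: key on the date portion of created_time
print(partition_comments(comments, lambda c: c["created_time"][:10], "by_day"))  # ['2018-05-05', '2018-08-21']
```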

Regardless, if the database we're setting up is for a fixed amount of content, the code required to set up our thread-access system is a bit simpler. So this is a good place to start! 

In [5]:
import json
from collections import defaultdict

# Get the comments
with open('./data/COMMENTS.json', 'r') as f:
    COMMENTS = json.load(f)
    
ids_by_thread = defaultdict(list)
comments_by_id = {}

for comment in COMMENTS:
    threadID, commentID = comment['id'].split('_')
    ids_by_thread[threadID].append(commentID)
    comments_by_id[comment['id']] = comment
    
# Write the new files
with open('./data/ids_by_thread.json', 'w') as f:
    json.dump(ids_by_thread, f, sort_keys = True, indent = 4)
    
with open('./data/comments_by_id.json', 'w') as f:
    json.dump(comments_by_id, f, sort_keys = True, indent = 4)

Now `ids_by_thread.json` holds a relatively small dictionary whose keys are `threadID`s and whose values are lists of the associated `commentID`s. Note: to ease the loading of a thread (or switching between threads), we've also stored the comments a bit differently this time: in a dictionary (instead of a list), keyed by each comment's full `id` (i.e., `threadID_commentID`).
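With those two files, fetching one thread's comments takes a lookup in the index plus one dictionary access per comment. The sketch below uses a hypothetical helper, `get_thread`. Its literals are what the cell above should produce for the mock data, with messages abbreviated to the IDs and text.

```python
# Hypothetical helper: assumes ids_by_thread maps threadID -> [commentID, ...]
# and comments_by_id maps the full "threadID_commentID" id -> comment dict.
def get_thread(thread_id, ids_by_thread, comments_by_id):
    return [comments_by_id[thread_id + "_" + cid] for cid in ids_by_thread.get(thread_id, [])]

ids_by_thread = {"T1": ["C1", "C2"], "T2": ["C2", "C1"]}
comments_by_id = {
    "T1_C1": {"id": "T1_C1", "message": "I love data science!"},
    "T1_C2": {"id": "T1_C2", "message": "Yeah, but I didn't expect all of this pre-processing work!"},
    "T2_C1": {"id": "T2_C1", "message": "Unfortunately terms of use can make it difficult to share data."},
    "T2_C2": {"id": "T2_C2", "message": "Yeah, but we can still fake some data up here to get the point across."},
}

for comment in get_thread("T2", ids_by_thread, comments_by_id):
    print(comment["id"])
# T2_C2
# T2_C1
```

The comments come back in the order they appeared in `COMMENTS.json`, which is not necessarily the order in which they were posted. Sorting each thread by timestamp addresses that.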

Now we'll work on some functions that will ease a thread-by-thread analysis. First, let's create a basic Python class to represent a thread, and have its initializer parse the message times and store the comments in chronological order:

In [6]:
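import datetime

class Thread(object):
    """Represents a full Facebook comment thread.

    comments: dict mapping full comment IDs to comment dicts
    IDs: the full comment IDs belonging to this thread
    """
    
    def __init__(self, comments, IDs):
        # Pair each comment with its parsed timestamp
        self.comments = []
        for ID in IDs:
            self.comments.append((self.getTime(comments[ID]['created_time']), comments[ID]))
        # Sort chronologically; keying on the timestamp alone avoids comparing
        # the comment dicts when two timestamps tie (which would raise a TypeError)
        self.comments.sort(key=lambda pair: pair[0])
                                  
    @staticmethod  # a static method takes neither self nor cls; it lives on the class for convenience
    def getTime(time):
        """Produce a datetime object from a time string, e.g. '2018-05-05T21:38:24+0000'."""
        # '+0000' is matched literally, so this assumes every timestamp is in UTC
        formatted = datetime.datetime.strptime(time, "%Y-%m-%dT%H:%M:%S+0000")
        return formatted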

Ok, now we can put these pieces to work. First, we load the index and comment files back from disk and build a `Thread` instance for one of the threads:

In [7]:
with open('./data/ids_by_thread.json', 'r') as f:
    ids_by_thread = json.load(f)
    
with open('./data/comments_by_id.json', 'r') as f:
    comments_by_id = json.load(f)
    
## take the comment IDs of an arbitrary thread (here, the first key)
threadID = list(ids_by_thread.keys())[0]
IDs = [threadID+"_"+commentID for commentID in ids_by_thread[threadID]]

thread_obj = Thread(comments_by_id, IDs)
    
thread_obj.comments

[(datetime.datetime(2018, 5, 5, 21, 38, 24),
  {'created_time': '2018-05-05T21:38:24+0000',
   'id': 'T1_C1',
   'message': 'I love data science!'}),
 (datetime.datetime(2018, 5, 5, 22, 2, 4),
  {'created_time': '2018-05-05T22:02:04+0000',
   'id': 'T1_C2',
   'message': "Yeah, but I didn't expect all of this pre-processing work!"})]

This code might have seemed like a bit of work, but now, for the rest of the lifecycle of our project, we can easily access different threads in our data. For example, since our class stores a thread's comments in chronological order, we can now easily iterate through the conversation as it unfolded, with parsed times:

In [8]:
for time, comment in thread_obj.comments:
    print(time, comment['message'])

2018-05-05 21:38:24 I love data science!
2018-05-05 22:02:04 Yeah, but I didn't expect all of this pre-processing work!
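Because the times are already parsed and sorted, follow-up analyses become short. For example, a hypothetical helper `reply_delays` could compute how long each reply took. Here it runs on hand-built `(datetime, comment)` pairs shaped like `Thread.comments` and copied from mock thread `T1`.

```python
import datetime

# (datetime, comment) pairs shaped like Thread.comments, copied from the mock thread T1
pairs = [
    (datetime.datetime(2018, 5, 5, 21, 38, 24), {"id": "T1_C1", "message": "I love data science!"}),
    (datetime.datetime(2018, 5, 5, 22, 2, 4), {"id": "T1_C2", "message": "Yeah, but I didn't expect all of this pre-processing work!"}),
]

def reply_delays(pairs):
    """Return the seconds elapsed between each comment and the one before it.

    Assumes the pairs are already in chronological order, as Thread sorts them.
    """
    times = [time for time, _ in pairs]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

print(reply_delays(pairs))  # [1420.0]
```

With the objects built earlier, `reply_delays(thread_obj.comments)` should give the same result, since it expects the same pair structure.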


### 8.2.2 Getting the data right, the first time
While accessing a large dataset, it might not seem convenient to pre-define and execute a database structure, preprocess data, or generate metadata, but doing so can save a lot of time on the back end, and it should definitely be a priority with ongoing (streaming) data collections. Let's revisit our song lyrics exercise and do some strategic preprocessing and file management.

#### 8.2.2.1 Exercise: Building a song lyrics database with metadata
Let's work with the song lyrics scraper we created during the Harvesting Data lecture. Instead of downloading the entire dataset again and then creating our metadata files, it'd be much more efficient to rewrite our data acquisition procedure to create the metadata as it runs: upon download, we already have each piece of data in memory, i.e., we don't have to read it back from disk! The old web scraping code just stored songs by artist in large, alphabetic data files. Here, our tasks center around making sure the songs are organized alphabetically by artist, and are accessible by albums and genres. We'll want to exercise care as we create data and metadata files. Since the artists and songs don't have IDs from the website, we need to come up with a consistent naming scheme for them. What could go wrong if we just named files according to artist, album, song, or genre names?

Pulling the pieces of scraper code together, here's a fill-in-the-blanks-style exercise. Complete the marked changes:

In [1]:
from bs4 import BeautifulSoup
import requests, re, string, json, os

#######################################################################
####### 0. Create a primary data directory. ###########################
#######################################################################
#######################################################################

#######################################################################
####### 1. Create reverse-lookup for songs by genre ###################
#######################################################################
#######################################################################

## go through all of the letters in the alphabet
for letter in string.ascii_lowercase:
    
    #######################################################################
    ####### 2. Create the letter-level directory ##########################
    #######################################################################
    #######################################################################
    
    #######################################################################
    ####### 3. Initialize a letter-level metadata file ####################
    ####### create a data file for the current letter
    filename = "songlyrics-{}.json".format(letter)
    fh = open(filename,  "w")
    fh.close()
    #######################################################################
    #######################################################################
    
    ## open and parse the html for the current letter
    letter_link = 'http://www.songlyrics.com/{}/'.format(letter)
    letterhtml = requests.get(letter_link).text
    lettersoup = BeautifulSoup(letterhtml, 'html.parser')

    ## collect the pages for this letter
    pages = ["/{}/".format(letter)]
    for letterlink in lettersoup.find_all('a'):
        ## filter links for letter pages
        if letterlink.get("href") and re.search(r"^Page \d+$", letterlink.get("title", "NOTITLE")):
            pages.append(letterlink['href'])

    ## go through the letter pages
    for page in pages:        
        ## open and parse the html for the current page of this letter
        pagehtml = requests.get("http://www.songlyrics.com" + page).text
        pagesoup = BeautifulSoup(pagehtml, 'html.parser')

        ## go through the artists in the page
        for pagelink in pagesoup.find_all('a'):
            ## filter links for artist pages
            if re.search("^http://.*?-lyrics/$", pagelink.get("href", "NOLINK")):

                #######################################################################                
                ####### 4. remove old data structure and hold on to the artist's name 
                ####### set up data and store artist-level information
                data = {
                    "Artist": pagelink.text,
                    "url": pagelink['href'],
                    "Songs": {}
                }
                #######################################################################
                #######################################################################

                #######################################################################
                ####### 5. Output artist info to letter-level metadata file ###########
                #######################################################################
                #######################################################################

                #######################################################################
                ####### 6. Create artist-level directory. #############################
                #######################################################################
                #######################################################################
                
                #######################################################################
                ####### 7. Create an artist-level metadata file #######################
                #######################################################################
                #######################################################################      
                
                ## open and parse the html for the current artist on this page
                artisthtml = requests.get(data["url"]).text
                artistsoup = BeautifulSoup(artisthtml, 'html.parser')                        

                ## go through the songs of this artist
                for songlink in artistsoup.find_all('a'):

                    ## filter links for song pages
                    if songlink.get("itemprop", "NOITEMPROP") == "url" and songlink.get("title"):
                                                
                        #######################################################################
                        ############ 8. Hold song title; store info as artist-level metadata
                        ############ store initial song-level information
                        title = songlink.text
                        data["Songs"][title] = {"Title": title}
                        data["Songs"][title]["url"] = songlink['href']
                        #######################################################################
                        #######################################################################                        

                        ## open and parse the html for the current song by this artist
                        songhtml = requests.get(data["Songs"][title]["url"]).text
                        songsoup = BeautifulSoup(songhtml, 'html.parser')

                        ## go through paragraphs to find song attributes
                        for par in songsoup.find_all("p"):
                            if re.search(": ", par.text):
                                pieces = re.split(": ", par.text)
                                key = pieces[0]
                                value = ": ".join(pieces[1: len(pieces)])

                                #######################################################################                                
                                ############ 9. add song attributes to artist-level metadata ##########
                                data["Songs"][title][key] = value    
                                #######################################################################
                                #######################################################################                        

                                #######################################################################                                
                                ############ 10. add song attributes to reverse song lookup ###########
                                #######################################################################
                                #######################################################################                                

                        #######################################################################                                
                        ############ 11. output song metadata to artist-level metadata file ###
                        #######################################################################
                        #######################################################################                                
                                
                        ## go through divs to find the one with the song lyrics
                        for div in songsoup.find('body').find_all('div'):
                            if div.get("id","NOCLASS") == "songLyricsDiv-outer":
                                
                                #######################################################################                                
                                ############ 12. output song lyrics as text in artist-level directory #                                
                                data["Songs"][title]["Lyrics"]=div.text
                                #######################################################################
                                #######################################################################
                        
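                        ## stop after the first song so this demo stays small (the later breaks do the same for artists, pages, and letters)
                        break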
                        
                #######################################################################
                #### 13. remove old data write out ####################################
                
                ## write out the data for this artist, appending to the end of this letter's file
                with open(filename, "a") as fh:
                    fh.write(json.dumps(data) + "\n")
                    
                #######################################################################
                #######################################################################
                
                break
        break
        
    break

#######################################################################
####### 14. Output reverse-lookup for songs by attributes #############
#######################################################################
#######################################################################

#### 8.2.2.2 Solution: Making the changes (Spoilers if you're doing the exercise!)
By making a reverse-lookup metadata file that maps song attributes (genre, album, etc.) to the artists and songs that have them, we are denormalizing our data and making it easy to perform specific transformations that are interesting for analysis. Here is the above scraper with all of the intended changes filled in. This is a lot, so take some time to really figure out what's going on. If you run this code, also take the time to review the directory structure and files it creates, and if you performed the above exercise on your own, compare your edits to those below!
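Before reading the full solution, it may help to see the core of its reverse lookup (steps 1 and 10) in isolation. The sketch below is our own compact restatement, not the solution's code. It swaps `setdefault` and the `attributeNumbers` counters for `collections.defaultdict` and `len()`-based numbering, and runs on made-up `(artistID, songID, key, value)` tuples instead of scraped pages. It produces the same kind of IDs (`Genre-1`, `Genre-2`, ...).

```python
from collections import defaultdict

# Made-up scraped attributes: (artistID, songID, attribute key, attribute value)
scraped = [
    ("a-1", "a-1-1", "Genre", "Rock"),
    ("a-1", "a-1-2", "Genre", "Rock"),
    ("a-2", "a-2-1", "Genre", "Pop"),
    ("a-2", "a-2-1", "Album", "Greatest Hits"),
]

attribute_ids = defaultdict(dict)  # key -> {value: attributeID}
attributes = defaultdict(dict)     # key -> {attributeID: value}
# key -> attributeID -> artistID -> [songIDs]
songs_by_attribute = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

for artist_id, song_id, key, value in scraped:
    if value not in attribute_ids[key]:
        # number each new value per key: Genre-1, Genre-2, ...
        new_id = "{}-{}".format(key, len(attribute_ids[key]) + 1)
        attribute_ids[key][value] = new_id
        attributes[key][new_id] = value
    attr_id = attribute_ids[key][value]
    songs_by_attribute[key][attr_id][artist_id].append(song_id)

print(attributes["Genre"])                           # {'Genre-1': 'Rock', 'Genre-2': 'Pop'}
print(dict(songs_by_attribute["Genre"]["Genre-1"]))  # {'a-1': ['a-1-1', 'a-1-2']}
```

Using IDs in place of raw values keeps odd characters in genre or album names out of any file and directory names derived from them.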

In [9]:
from bs4 import BeautifulSoup
import requests, re, string, json, os

#######################################################################
####### 0. Create a primary data directory. ###########################
os.makedirs("./data/", exist_ok=True)
#######################################################################
#######################################################################

#######################################################################
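####### 1. Create objects for reverse-lookup of songs by attribute ####
songsByAttribute = {}   # attribute key -> attributeID -> artistID -> [songIDs]
attributeIDs = {}       # attribute key -> {value: attributeID}
attributes = {}         # attribute key -> {attributeID: value}
attributeNumbers = {}   # attribute key -> next number to use in a new attributeID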
#######################################################################
#######################################################################

## go through all of the letters in the alphabet
for letter in string.ascii_lowercase:

    numartists = 0
    
    #######################################################################
    ####### 2. Create the letter-level directory ##########################
    os.makedirs("./data/{}/".format(letter), exist_ok=True)
    #######################################################################
    #######################################################################    
    
    #######################################################################
    ####### 3. Initialize a letter-level metadata file ####################
    ## note: despite the .json extension, step 5 writes one tab-separated line per artist here
    lettermetafile = "./data/{}/lettermeta.json".format(letter)
    fh = open(lettermetafile,  "w")
    fh.close()    
    #######################################################################
    #######################################################################
    
    ## open and parse the html for the current letter
    letter_link = 'http://www.songlyrics.com/{}/'.format(letter)
    letterhtml = requests.get(letter_link).text
    lettersoup = BeautifulSoup(letterhtml, 'html.parser')

    ## collect the pages for this letter
    pages = ["/{}/".format(letter)]
    for letterlink in lettersoup.find_all('a'):
        ## filter links for letter pages
        if letterlink.get("href") and re.search(r"^Page \d+$", letterlink.get("title", "NOTITLE")):
            pages.append(letterlink['href'])

    ## go through the letter pages
    for page in pages:        
        ## open and parse the html for the current page of this letter
        pagehtml = requests.get("http://www.songlyrics.com" + page).text
        pagesoup = BeautifulSoup(pagehtml, 'html.parser')

        ## go through the artists in the page
        for pagelink in pagesoup.find_all('a'):
            ## filter links for artist pages
            if re.search("^http://.*?-lyrics/$", pagelink.get("href", "NOLINK")):
                
                #######################################################################                
                ####### 4. remove old data structure and hold on to the artist's data #
                ####### keep track of number of artists, songs, and create an ID
                numartists += 1
                artistID = "{}-{}".format(letter, str(numartists))
                numsongs = 0
                
                artist = pagelink.text
                artisturl = pagelink['href']
                
                
#                 ## set up data and store artist-level information
#                 data = {
#                     "Artist": pagelink.text,
#                     "url": pagelink['href'],
#                     "Songs": {}
#                 }                
                #######################################################################
                #######################################################################

                #######################################################################                
                ####### 5. Output artist info to letter-level metadata file ###########
                with open(lettermetafile,  "a") as f:
                    f.write(artistID + "\t" + artist + "\t" + artisturl + "\n")
                #######################################################################
                #######################################################################                    
                    
                #######################################################################
                ####### 6. Create artist-level directory. #############################
                artist_dir = './data/{}/{}/'.format(letter, artistID)
                os.makedirs(artist_dir, exist_ok=True)
                #######################################################################
                #######################################################################
                
                #######################################################################
                ####### 7. Create an artist-level metadata file #######################
                artistmetafile = artist_dir + "artistmeta.json"
                fh = open(artistmetafile,  "w")
                fh.close()               
                #######################################################################
                #######################################################################                
                                
                ## open and parse the html for the current artist on this page
                ## note we now use the artist's url!
                artisthtml = requests.get(artisturl).text
                artistsoup = BeautifulSoup(artisthtml, 'html.parser')                        

                ## go through the songs of this artist
                for songlink in artistsoup.find_all('a'):

                    ## filter links for song pages
                    if songlink.get("itemprop", "NOITEMPROP") == "url" and songlink.get("title"):                        

                        #######################################################################
                        ############ 8. Hold song title; store info as artist-level metadata ##
                        ## keep track of number of songs and create an ID
                        numsongs += 1
                        titleID = "{}-{}".format(artistID, str(numsongs))
                        
                        ## hold on to the song's title
                        title = songlink.text
                        
#                         data["Songs"][title] = {"Title": title}
#                         data["Songs"][title]["url"] = songlink['href']

                        data = {
                            "ID": titleID,
                            "title": title,
                            "url": songlink['href']
                        }
                        #######################################################################
                        #######################################################################

                        ## open and parse the html for the current song by this artist
                        ## note the data format has changed to get the song's url!
                        songhtml = requests.get(data["url"]).text
                        songsoup = BeautifulSoup(songhtml, 'html.parser')

                        ## go through paragraphs and get song attributes
                        for par in songsoup.find_all("p"):
                            if re.search(": ", par.text):
                                pieces = re.split(": ", par.text)
                                key = pieces[0]
                                value = ": ".join(pieces[1: len(pieces)])

                                #######################################################################                                
                                ############ 9. add song attributes to artist-level metadata ##########
                                if key != "Note":
                                    data[key] = value
                                #######################################################################
                                #######################################################################
                                
                                #######################################################################                                
                                ############ 10. add song attributes to reverse song lookup ###########
                                if key != "Note":
                                    attributeNumbers.setdefault(key, 1)
                                    attributeIDs.setdefault(key, {})
                                    attributes.setdefault(key, {})
                                    if not attributeIDs[key].get(value, False):
                                        attributeID = "{}-{}".format(key, str(attributeNumbers[key]))
                                        attributes[key][attributeID] = value
                                        attributeIDs[key][value] = attributeID
                                        attributeNumbers[key] += 1
                                    else:
                                        attributeID = attributeIDs[key][value]                                        
                                    
                                    songsByAttribute.setdefault(key, {})
                                    songsByAttribute[key].setdefault(attributeID, {})
                                    songsByAttribute[key][attributeID].setdefault(artistID, [])
                                    songsByAttribute[key][attributeID][artistID].append(titleID)
                                #######################################################################
                                #######################################################################

                        #######################################################################                                
                        ############ 11. output song metadata to artist-level metadata file ###
                        with open(artistmetafile, "a") as f:
                            f.write(json.dumps(data) + "\n")
                        #######################################################################
                        #######################################################################                            
                                
                        ## go through divs to find the one with the song lyrics
                        for div in songsoup.find('body').find_all('div'):
                            if div.get("id", "NOCLASS") == "songLyricsDiv-outer":

                                #######################################################################                                
                                ############ 12. output song lyrics as text in artist-level directory #
                                title_file = "./data/{}/{}/{}.txt".format(letter, artistID, titleID)
                                with open(title_file, "w") as f:
                                    f.write(div.text + "\n")
                                
#                                 data["Songs"][title]["Lyrics"]=div.text
                                #######################################################################
                                #######################################################################

                                break
            
                    ## only break after 1 song by this artist (raise this threshold to collect more)
                    if numsongs >= 1:
                        break
                        
                #######################################################################
                #### 13. remove old data write out ####################################
#                 ## write out the data for this artist, appending to the end of this letter's file
#                 with open(filename, "a") as fh:
#                     fh.writelines(json.dumps(data)+"\n")
                #######################################################################
                #######################################################################
                
            ## only break after the first artist of this letter (raise this threshold to collect more)
            if numartists >= 1:
                break
        
        ## this stops us after one page of each letter
        break
        
    ## this stops us after one letter in the alphabet
#     break

#######################################################################
####### 14. Output reverse-lookup for songs by genre ##################
## create the directory only if it doesn't already exist
os.makedirs("./data/Genre/", exist_ok=True)
with open("./data/Genre/attributeIDs.txt", "w") as fh:
    for attributeID in songsByAttribute["Genre"]:
        fh.write(attributeID + "\t" + attributes["Genre"][attributeID] + "\n")
        with open("./data/Genre/" + attributeID + ".json", "w") as f:
            f.write(json.dumps(songsByAttribute["Genre"][attributeID]) + "\n")
#######################################################################
#######################################################################
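
To make the nested shape of the reverse lookup from steps 10 and 14 easier to see, here is a minimal, self-contained sketch that runs the same bookkeeping over three hypothetical song records instead of scraped pages. The artist IDs, title IDs, and attribute values are invented for illustration; only the dictionary names mirror the script above. The membership test uses `value not in ...` in place of `.get(value, False)`, which behaves the same here because attribute IDs are never empty strings.

```python
# Hypothetical toy records standing in for scraped song metadata:
# (artistID, titleID, attributes). None of these values come from the real site.
songs = [
    ("A-1", "A-1-1", {"Genre": "Rock", "Album": "First"}),
    ("A-1", "A-1-2", {"Genre": "Pop", "Album": "First"}),
    ("X-1", "X-1-1", {"Genre": "Rock", "Album": "Second"}),
]

attributeNumbers = {}  # key -> next integer to use when minting a new attribute ID
attributeIDs = {}      # key -> {value: attributeID}
attributes = {}        # key -> {attributeID: value}
songsByAttribute = {}  # key -> {attributeID: {artistID: [titleID, ...]}}

for artistID, titleID, attrs in songs:
    for key, value in attrs.items():
        attributeNumbers.setdefault(key, 1)
        attributeIDs.setdefault(key, {})
        attributes.setdefault(key, {})
        if value not in attributeIDs[key]:
            # first time this value appears: mint an ID such as "Genre-1"
            attributeID = "{}-{}".format(key, attributeNumbers[key])
            attributes[key][attributeID] = value
            attributeIDs[key][value] = attributeID
            attributeNumbers[key] += 1
        else:
            # value seen before: reuse its existing ID
            attributeID = attributeIDs[key][value]
        # file this song under (attribute, value ID, artist)
        songsByAttribute.setdefault(key, {})
        songsByAttribute[key].setdefault(attributeID, {})
        songsByAttribute[key][attributeID].setdefault(artistID, []).append(titleID)

print(attributes["Genre"])
print(songsByAttribute["Genre"])
```

For this toy input, `attributes["Genre"]` maps `Genre-1` to `Rock` and `Genre-2` to `Pop`, while `songsByAttribute["Genre"]["Genre-1"]` lists one song under each of `A-1` and `X-1`. Step 14 writes each of these per-ID dictionaries to its own `./data/Genre/<attributeID>.json` file.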

### 8.2.2 Accessing our database
The goal here is to create API-like access to our local database. Let's start with retrieving data by genre. To do this, we'll write a function that reads the appropriate reverse-lookup file and returns the artist and title of every song with the desired genre.

In [3]:
genres = {}
with open("./data/Genre/attributeIDs.txt", "r") as f:
    for line in f:
        line = line.strip()
        ID, genre = re.split("\t", line)
        genres[genre] = ID

def genreSongs(genre):
    with open("./data/Genre/{}.json".format(genres[genre]), 'r') as f:
        genredata = json.load(f)
    data = []
    for artistID in genredata:
        letter = artistID[0]
        songs = genredata[artistID]
        with open("./data/{}/{}/artistmeta.json".format(letter, artistID)) as f:
            for line in f:
                songmeta = json.loads(line.strip())
                if songmeta["ID"] in songs:
                    data.append((songmeta["Artist"], songmeta["title"]))

    return data

In [4]:
for artist, song in genreSongs("Rock"):
    print("Artist: " + artist)
    print("Song: " + song)
    print()

Artist: A
Song: Sing-A-Long

Artist: X
Song: 4th of July
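
The `genreSongs()` function above assumes that the index file, the requested genre, and every artist directory all exist. Below is a self-contained variant, sketched under the assumption that the on-disk layout matches the scraper's output: `<letter>/<artistID>/artistmeta.json` with one JSON object per line, plus `Genre/attributeIDs.txt` and `Genre/<attributeID>.json`. It builds a two-song toy database with invented records in a temporary directory and then queries it. The helper names `build_toy_database` and `genre_songs` and the `data_dir` parameter are hypothetical; the lookup uses `os.path.exists` to return an empty list rather than raise when something is missing.

```python
import json
import os
import tempfile

def build_toy_database(data_dir):
    """Write a tiny database in the scraper's layout (hypothetical data)."""
    # Hypothetical artist-level metadata: one JSON object per line, one line per song.
    songs = {
        "A-1": [{"ID": "A-1-1", "title": "Song One", "Artist": "A", "Genre": "Rock"}],
        "X-1": [{"ID": "X-1-1", "title": "Song Two", "Artist": "X", "Genre": "Rock"}],
    }
    for artistID, metas in songs.items():
        artist_dir = os.path.join(data_dir, artistID[0], artistID)
        os.makedirs(artist_dir, exist_ok=True)
        with open(os.path.join(artist_dir, "artistmeta.json"), "w") as f:
            for meta in metas:
                f.write(json.dumps(meta) + "\n")
    # Reverse lookup for the single genre "Rock", stored under the ID "Genre-1".
    genre_dir = os.path.join(data_dir, "Genre")
    os.makedirs(genre_dir, exist_ok=True)
    with open(os.path.join(genre_dir, "attributeIDs.txt"), "w") as f:
        f.write("Genre-1\tRock\n")
    with open(os.path.join(genre_dir, "Genre-1.json"), "w") as f:
        f.write(json.dumps({"A-1": ["A-1-1"], "X-1": ["X-1-1"]}) + "\n")

def genre_songs(data_dir, genre):
    """Return (artist, title) pairs for a genre, or [] if nothing matches."""
    index_file = os.path.join(data_dir, "Genre", "attributeIDs.txt")
    if not os.path.exists(index_file):
        return []  # no genre index has been built yet
    genres = {}
    with open(index_file) as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            ID, name = line.rstrip("\n").split("\t", 1)
            genres[name] = ID
    if genre not in genres:
        return []  # unknown genre
    with open(os.path.join(data_dir, "Genre", genres[genre] + ".json")) as f:
        genredata = json.load(f)
    results = []
    for artistID, titleIDs in genredata.items():
        meta_file = os.path.join(data_dir, artistID[0], artistID, "artistmeta.json")
        if not os.path.exists(meta_file):
            continue  # skip artists whose directory is missing
        with open(meta_file) as f:
            for line in f:
                songmeta = json.loads(line)
                if songmeta["ID"] in titleIDs:
                    results.append((songmeta["Artist"], songmeta["title"]))
    return results

with tempfile.TemporaryDirectory() as data_dir:
    build_toy_database(data_dir)
    rock = genre_songs(data_dir, "Rock")
    jazz = genre_songs(data_dir, "Jazz")

print(rock)
print(jazz)
```

Here `rock` holds both toy songs and `jazz` is an empty list. Passing the data directory as a parameter, rather than hard-coding `./data`, also makes it easy to point the same lookup at a test copy of the database.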


#### 8.2.2.1 Exercise: accessing songs by album
Review the `genreSongs()` function and use it as a starting point to retrieve a specific album's worth of song data for a specified artist. Albums and artists should be specified by string arguments. Be sure to have this function fail gracefully/informatively if no match is found in the database!

In [None]:
## place code here