# Albums and Songs Lab

### Introduction

In this lab, we'll use the skills from the past several lessons to answer questions about the top songs, artists, and albums of the past fifty years.

### Working with Albums

Let's start by working with data on the top 500 albums according to Rolling Stone magazine.

In [1]:
import pandas as pd

In [2]:
url = "https://raw.githubusercontent.com/eng-6-22/mod-1-a-data-structures/master/6-top-songs/data.csv"
df = pd.read_csv(url)
albums = df.to_dict('records')

In [3]:
albums[:2]

[{'number': 1,
  'year': 1967,
  'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'artist': 'The Beatles',
  'genre': 'Rock',
  'subgenre': 'Rock & Roll, Psychedelic Rock'},
 {'number': 2,
  'year': 1966,
  'album': 'Pet Sounds',
  'artist': 'The Beach Boys',
  'genre': 'Rock',
  'subgenre': 'Pop Rock, Psychedelic Rock'}]

In [4]:
len(albums)

478

> The list is billed as a top 500, but this dataset contains only 478 albums.

### Missing Ranks

In [5]:
all_ranks = set(range(1, 501))
ranks = set([album['number'] for album in albums])
missing_ranks = list(all_ranks - ranks)

print(f'Missing ranks: {len(missing_ranks):,}')
print(','.join([str(i) for i in sorted(missing_ranks)]))

Missing ranks: 22
60,105,164,196,198,214,232,245,256,278,289,308,346,355,359,363,377,400,421,426,449,459


----------

Let's write some functions to help us better explore the data.

* `all_albums` - Takes a list of albums and returns a list of album names.

* `all_artists` - Takes a list of albums and returns a list of all artists (each element is a string), with no artist repeated.

* `find_by_name` - Takes a list of albums and an `album_name`. Returns the dictionary for the matching album, or `None` if no album is found.

* `find_by_ranks` - Takes a list of albums plus an optional `begin_rank` or `end_rank`. With `begin_rank`, returns albums ranked at or after it; with `end_rank`, returns albums ranked at or before it. Providing both raises a `ValueError`. If neither is provided, the entire list of albums is returned.

* `find_by_years` - Takes a list of albums plus an optional `begin_year` or `end_year`, and returns a list of dictionaries for albums released in or after `begin_year`, or in or before `end_year`. Providing both raises a `ValueError`. If neither is provided, the entire list of albums is returned.

In [6]:
def all_albums(albums: list[dict]) -> list[str]:
    return [album['album'] for album in albums]

In [7]:
def all_artists(albums: list[dict]) -> list[str]:
    return list(set([album['artist'] for album in albums]))

In [8]:
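from typing import Optional

def find_by_name(albums: list[dict], album_name: str) -> Optional[dict]:
    """Return the album whose name equals album_name, or None if there is no match"""
    for album in albums:
        if album['album'] == album_name:
            return album
    return None  # explicit: no album matched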

clash = {
    'number': 8,
    'year': 1979,
    'album': 'London Calling',
    'artist': 'The Clash',
    'genre': 'Rock',
    'subgenre': 'Punk, New Wave'
}

assert (find_by_name(albums, 'London Calling') == clash)
assert (find_by_name(albums, 'test') is None)

In [9]:
def find_by_ranks(albums: list[dict], begin_rank: int = None, end_rank: int = None) -> list[dict]:
    """Return albums with rank >= begin_rank OR rank <= end_rank (only one bound allowed)"""
    # compare against None so that a bound of 0 is not mistaken for "not provided"
    if begin_rank is not None and end_rank is not None:
        raise ValueError('Can only give a begin OR end rank')

    if begin_rank is not None:
        return [a for a in albums if a['number'] >= begin_rank]
    elif end_rank is not None:
        return [a for a in albums if a['number'] <= end_rank]
    else:
        return albums

# these should all return the full albums list
assert (len(find_by_ranks(albums)) == len(albums))
assert (len(find_by_ranks(albums, begin_rank=1)) == len(albums))
assert (len(find_by_ranks(albums, end_rank=500)) == len(albums))

# subsets
assert (len(find_by_ranks(albums, end_rank=10)) == 10)
assert (len(find_by_ranks(albums, begin_rank=499)) == 2)

In [10]:
def find_by_years(albums: list[dict], begin_year: int = None, end_year: int = None) -> list[dict]:
    """Return albums with year >= begin_year OR year <= end_year (only one bound allowed)"""
    # compare against None so that a bound of 0 is not mistaken for "not provided"
    if begin_year is not None and end_year is not None:
        raise ValueError('Can only give a begin OR end year')

    if begin_year is not None:
        return [a for a in albums if a['year'] >= begin_year]
    elif end_year is not None:
        return [a for a in albums if a['year'] <= end_year]
    else:
        return albums
    
# these should all return the full albums list
assert (len(find_by_years(albums)) == len(albums))
assert (len(find_by_years(albums, begin_year=1900)) == len(albums))
assert (len(find_by_years(albums, end_year=2023)) == len(albums))

sinatra = {
    'number': 101,
    'year': 1955,
    'album': 'In the Wee Small Hours',
    'artist': 'Frank Sinatra',
    'genre': 'Jazz, Pop',
    'subgenre': 'Big Band, Ballad'
}

# subsets
assert (len(find_by_years(albums, begin_year=2011)) == 1)
assert (find_by_years(albums, end_year=1955) == [sinatra])

### Working with Songs

Next, let's load up data related to songs, and data that connects albums and songs.

In [11]:
import pandas as pd

def get_songs():
    songs_url = "https://raw.githubusercontent.com/eng-6-22/mod-1-a-data-structures/master/6-top-songs/top-500-songs.txt"
    songs_df = pd.read_csv(songs_url, sep='\t', header=None, names=['rank', 'song', 'artist', 'year'])
    songs = songs_df.to_dict('records')
    return songs

def get_albums():
    track_url = "https://raw.githubusercontent.com/eng-6-22/mod-1-a-data-structures/master/6-top-songs/track_data.json"
    albums_and_tracks = pd.read_json(track_url)
    albums_tracks = albums_and_tracks.to_dict('records')
    return albums_tracks

In [12]:
SONGS = get_songs()
ALBUMS = get_albums()

# set for faster lookups
TOP_500 = set([song['song'] for song in SONGS])
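Note that `TOP_500` holds song titles only. If two different artists happened to share a song title (an assumption; this lab does not check whether that occurs in the data), a title-only lookup would credit both. The sketch below, using placeholder records shaped like `SONGS` and `ALBUMS`, compares a title-only lookup with a stricter `(title, artist)` lookup.

```python
# Placeholder data; keys mirror the SONGS and ALBUMS records, values are made up.
sample_songs = [
    {'rank': 1, 'song': 'Title One', 'artist': 'Artist A', 'year': 1965},
    {'rank': 2, 'song': 'Title Two', 'artist': 'Artist B', 'year': 1970},
]
sample_album = {'artist': 'Artist C', 'album': 'Album C', 'tracks': ['Title One', 'Other']}

# Title-only set, same idea as TOP_500
titles = {song['song'] for song in sample_songs}
# (title, artist) pairs: stricter, but sensitive to spelling differences in artist names
pairs = {(song['song'], song['artist']) for song in sample_songs}

by_title = sum(1 for t in sample_album['tracks'] if t in titles)
by_pair = sum(1 for t in sample_album['tracks'] if (t, sample_album['artist']) in pairs)
print(by_title, by_pair)
```

The pair lookup trades false positives for possible false negatives, since artist names would have to be spelled identically in both datasets.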

`album_most_top_songs` - Returns the artist and album with the most songs featured on the top 500 songs list, along with the number of those songs.

In [13]:
def album_most_top_songs() -> dict:
    top_album = {
        'artist': None,
        'album': None,
        'count': 0
    }

    for album in ALBUMS:
        count = sum([1 for track in album['tracks'] if track in TOP_500])

        if count > top_album['count']:  # won't resolve a tie
            top_album['artist'] = album['artist']
            top_album['album'] = album['album']
            top_album['count'] = count

    return top_album

In [14]:
album_most_top_songs()

{'artist': 'Elvis Presley', 'album': 'Elvis Presley', 'count': 8}
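As the comment in the function notes, a tie is not resolved: the first album to reach the highest count wins. One way to surface ties, sketched here with placeholder data, is to return every album that shares the highest count. The function name `albums_most_top_songs` and its explicit parameters are hypothetical and used only for this illustration.

```python
# Placeholder data shaped like ALBUMS and TOP_500; titles are made up.
sample_albums = [
    {'artist': 'Artist A', 'album': 'Album A', 'tracks': ['s1', 's2', 'x']},
    {'artist': 'Artist B', 'album': 'Album B', 'tracks': ['s3', 's4']},
    {'artist': 'Artist C', 'album': 'Album C', 'tracks': ['s5', 'y']},
]
sample_top = {'s1', 's2', 's3', 's4', 's5'}

def albums_most_top_songs(albums: list[dict], top_songs: set[str]) -> list[dict]:
    """Hypothetical tie-aware variant: return every album sharing the highest count."""
    counted = [
        {'artist': a['artist'], 'album': a['album'],
         'count': sum(1 for t in a['tracks'] if t in top_songs)}
        for a in albums
    ]
    # default=0 covers an empty album list
    best = max((c['count'] for c in counted), default=0)
    # albums with zero matching songs are never reported as "top"
    return [c for c in counted if c['count'] == best and best > 0]

tied = albums_most_top_songs(sample_albums, sample_top)
print(tied)
```

In the sample, Album A and Album B each have two matching tracks, so both are returned.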

`top_ten_albums_by_songs` - Returns a dictionary of the 10 albums with the most songs on the top 500 songs list. The keys are the album names, and each value is the number of that album's songs that appear on the list.

In [15]:
def top_ten_albums_by_songs() -> dict[str, int]:
    """Return top 10 albums with the most songs in the top 500 songs"""
    album_counts = []

    for album in ALBUMS:
        count = sum([1 for track in album['tracks'] if track in TOP_500])
        album_counts.append((album['album'], count))  # tuple

    # sort on count desc
    album_counts_sorted = sorted(album_counts, key=lambda t: t[1], reverse=True)
    return dict(album_counts_sorted[:10])

In [16]:
top_ten_albums_by_songs()

{'Elvis Presley': 8,
 'The Sun Records Collection': 6,
 'Are You Experienced': 4,
 'Portrait of a Legend 1951-1964': 4,
 'Highway 61 Revisited': 3,
 'Bringing It All Back Home': 3,
 'Star Time': 3,
 'Led Zeppelin II': 3,
 'I Never Loved a Man the Way I Love You': 3,
 'All the Young Dudes': 3}

In [17]:
def top_albums_by_songs(n: int = 10) -> dict[str, int]:
    """Return top n albums with the most songs in the top 500 songs"""
    album_counts = []

    for album in ALBUMS:
        count = sum([1 for track in album['tracks'] if track in TOP_500])
        album_counts.append((album['album'], count))  # tuple

    # sort on count descending
    album_counts_sorted = sorted(album_counts, key=lambda t: t[1], reverse=True)
    return dict(album_counts_sorted[:n])

In [18]:
top_albums_by_songs(5)

{'Elvis Presley': 8,
 'The Sun Records Collection': 6,
 'Are You Experienced': 4,
 'Portrait of a Legend 1951-1964': 4,
 'Highway 61 Revisited': 3}

In [19]:
top_albums_by_songs()

{'Elvis Presley': 8,
 'The Sun Records Collection': 6,
 'Are You Experienced': 4,
 'Portrait of a Legend 1951-1964': 4,
 'Highway 61 Revisited': 3,
 'Bringing It All Back Home': 3,
 'Star Time': 3,
 'Led Zeppelin II': 3,
 'I Never Loved a Man the Way I Love You': 3,
 'All the Young Dudes': 3}
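Because the result is a dictionary keyed by album name, two albums that share a title would overwrite one another. Whether that happens in this dataset is not checked here, so treat it as an assumption. A minimal alternative sketch, using `collections.Counter` keyed by `(artist, album)` and placeholder data, looks like this; `top_albums_counter` is a hypothetical name.

```python
from collections import Counter

# Placeholder data shaped like ALBUMS and TOP_500; two albums share a title on purpose.
sample_albums = [
    {'artist': 'Artist A', 'album': 'Greatest Hits', 'tracks': ['s1', 's2']},
    {'artist': 'Artist B', 'album': 'Greatest Hits', 'tracks': ['s3']},
    {'artist': 'Artist C', 'album': 'Album C', 'tracks': ['x']},
]
sample_top = {'s1', 's2', 's3'}

def top_albums_counter(albums: list[dict], top_songs: set[str], n: int = 10) -> list[tuple]:
    """Hypothetical variant: return the n highest ((artist, album), count) pairs."""
    counts = Counter()
    for a in albums:
        # key on (artist, album) so identical album titles stay separate
        counts[(a['artist'], a['album'])] = sum(1 for t in a['tracks'] if t in top_songs)
    return counts.most_common(n)

ranked = top_albums_counter(sample_albums, sample_top, n=2)
print(ranked)
```

`Counter.most_common(n)` returns a list of `(key, count)` tuples sorted by count in descending order, so both "Greatest Hits" albums appear as separate entries.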