# Albums and Songs Lab

### Introduction

In this lab, we'll use the skills from the past several lessons to answer questions about the top songs, artists, and albums of the past fifty years.

### Working with Albums

Let's start by working with data on the top 500 albums according to Rolling Stone magazine.

In [1]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-a-data-structures/master/6-top-songs/data.csv"
df = pd.read_csv(url)
albums = df.to_dict('records')

In [None]:
# albums[:2]

In [None]:
len(albums)

478

> The list is billed as the top 500, but this dataset contains 478 records.

Let's write some functions to help us better explore the data.

* `all_albums` - Takes a list of albums and returns a list of the album names.

* `all_artists` - Takes a list of albums and returns a list of all artists, where each element is a string and no artist is repeated.

* `find_by_name` - Takes one argument, `album_name`. Returns the dictionary for the matching album, or `None` if no album is found.

* `find_by_ranks` - Takes `begin_rank` and `end_rank` as arguments, and returns a list of dictionaries for albums between those ranks. The function can also be called with only `begin_rank` or only `end_rank`. If no arguments are provided, the entire list of albums is returned.

* `find_by_years` - Takes `begin_year` and `end_year` as arguments, and returns a list of dictionaries for albums between those years. The function can also be called with only `begin_year` or only `end_year`.
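
The optional-argument behavior described for `find_by_ranks` can be handled with default values of `None`. The following is a minimal sketch, not the lab's reference solution. The sample records and the `'rank'` and `'artist'` keys are assumptions for illustration, since the actual column names in `data.csv` are not shown here. The sketch also assumes the bounds are inclusive and passes the album list as an explicit first argument.

```python
# Hypothetical sample records; the real key names in data.csv may differ.
sample_albums = [
    {'album': 'Album A', 'artist': 'Artist X', 'rank': 1, 'year': 1967},
    {'album': 'Album B', 'artist': 'Artist Y', 'rank': 2, 'year': 1966},
    {'album': 'Album C', 'artist': 'Artist X', 'rank': 3, 'year': 1972},
]

def find_by_ranks(albums, begin_rank=None, end_rank=None):
    # A bound left as None is treated as "no limit" on that side.
    return [
        album for album in albums
        if (begin_rank is None or album['rank'] >= begin_rank)
        and (end_rank is None or album['rank'] <= end_rank)
    ]

def all_artists(albums):
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(album['artist'] for album in albums))

print(find_by_ranks(sample_albums, begin_rank=2))  # albums ranked 2 and 3
print(find_by_ranks(sample_albums))                # the entire list
print(all_artists(sample_albums))                  # ['Artist X', 'Artist Y']
```

The same pattern should carry over to `find_by_years` by comparing against the year field instead.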

### Working with Songs

Next, let's load up data related to songs, and data that connects albums and songs.

In [37]:
import pandas as pd
songs_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-a-data-structures/master/6-top-songs/top-500-songs.txt"
songs_df = pd.read_csv(songs_url, sep='\t', header = None, names = ['rank', 'song', 'artist', 'year'])
top_five_hundred_songs = songs_df.to_dict('records')

track_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-a-data-structures/master/6-top-songs/track_data.json"
albums_and_tracks = pd.read_json(track_url)
albums_tracks = albums_and_tracks.to_dict('records')

In [None]:
* Find how many top five hundred songs are on each album.

Given an album, such as Sgt. Pepper's Lonely Hearts Club Band, we want to go through the list of top 500 songs and count how many of them appear on that album.

* Starting point for the top 500 (no album information):
  [{'artist': 'Bob Dylan', 'song': 'Like a Rolling Stone'}, {'song': 'man who sold the world', 'artist': 'nirvana'}]
* Goal for the top 500 (each song labeled with its album):
  [{'artist': 'Bob Dylan', 'song': 'Like a Rolling Stone', 'album': 'Rolling Stone Album'}]
* The album tracks data has all of the songs on each album:
  tracks = [{'song': 'man who sold the world', 'album': 'nevermind'}]


In [69]:
# albums_tracks[1]

In [73]:
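def album_tracks(album):
  # Strip suffixes that start with ' -' or ' (R' from each track name, then drop duplicates.
  track_names = album['tracks']
  return list(set([track_name.split(' -')[0].split(' (R')[0] for track_name in track_names]))

def build_album_songs(album):
  # Pair every cleaned track name with the name of its album.
  album_name = album['album']
  tracks = album_tracks(album)
  return [{'album': album_name, 'song': track} for track in tracks]

def build_songs(albums_tracks):
  # Flatten the per-album track lists into one list of song dictionaries.
  songs = []
  for album in albums_tracks:
    tracks = build_album_songs(album)
    songs = songs + tracks
  return songs

# Call build_songs only after the functions above are defined.
songs = build_songs(albums_tracks)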
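
To see what the two `split` calls in `album_tracks` do, here is a small standalone sketch. The track strings are made up for illustration, since the actual contents of `track_data.json` are not shown here, and `clean_track_name` is a hypothetical helper that isolates the same expression.

```python
def clean_track_name(track_name):
    # Keep the text before ' -', then the text before ' (R'.
    return track_name.split(' -')[0].split(' (R')[0]

# Made-up examples of suffixed track names.
examples = [
    'A Day In The Life - Remastered 2009',
    'Lovely Rita (Remastered)',
    'Getting Better',
]
for name in examples:
    print(repr(name), '->', repr(clean_track_name(name)))
```

Note that a title that itself contains ` -` or ` (R` would also be cut short by this approach.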

In [79]:
# [song['song'] for song in songs][:100]

In [64]:
[song for song in songs if song['song'] == 'Like a Rolling Stone']

[]

In [60]:
top_five_hund_song = top_five_hundred_songs[0]
top_five_hund_song

{'artist': 'Bob Dylan',
 'rank': 1,
 'song': 'Like a Rolling Stone',
 'year': 1965}

In [82]:
def find_album(top_song, songs):
  song_name = top_song['song']
  # 'Like a Rolling Stone'
  found_songs = [song for song in songs if song['song'] == song_name]
  if found_songs:
    return found_songs[0]['album']
  else: 
    return 'N/A'
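
The exact comparison in `find_album` only matches when both sources spell a title identically. One possible cause of misses is capitalization: the top 500 list has 'Like a Rolling Stone', while the track listings appear to capitalize every word. The sketch below compares titles case-insensitively. It uses made-up data and the hypothetical names `normalize` and `find_album_normalized`, and whether it resolves the mismatches in the real data is untested here.

```python
def normalize(title):
    # Compare titles case-insensitively and ignore surrounding whitespace.
    return title.strip().casefold()

def find_album_normalized(top_song, songs):
    song_name = normalize(top_song['song'])
    for song in songs:
        if normalize(song['song']) == song_name:
            return song['album']
    return 'N/A'

# Made-up data for illustration only.
sample_songs = [{'album': 'Example Album', 'song': 'Like A Rolling Stone'}]
sample_top_song = {'artist': 'Bob Dylan', 'song': 'Like a Rolling Stone'}
print(find_album_normalized(sample_top_song, sample_songs))  # Example Album
```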

In [85]:
new_songs = []
for top_five_hund_song in top_five_hundred_songs:
  new_song = top_five_hund_song.copy()
  album_name = find_album(top_five_hund_song, songs)
  new_song['album'] = album_name
  new_songs.append(new_song)

In [90]:
albums_for_top_five_hundred = [new_song['album'] for new_song in new_songs]
album_histogram = dict.fromkeys(albums_for_top_five_hundred, 0)
for album in albums_for_top_five_hundred:
  album_histogram[album] += 1

In [97]:
sorted(list(album_histogram.items()), key= lambda album_count: album_count[1], reverse = True)[:3]

[('N/A', 289), ('Elvis Presley', 6), ('The Great Twenty Eight', 5)]
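
The same tally can be written with `collections.Counter` from the standard library, whose `most_common` method returns the highest counts first. This is an alternative sketch rather than the lab's method, and the album labels below are made up for illustration.

```python
from collections import Counter

# Made-up album labels standing in for albums_for_top_five_hundred.
sample_album_labels = ['N/A', 'Album A', 'N/A', 'Album B', 'Album A', 'N/A']

album_counts = Counter(sample_album_labels)
print(album_counts.most_common(2))  # [('N/A', 3), ('Album A', 2)]
```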

In [25]:
# Assumes first_album was set in an earlier cell, e.g. first_album = albums_tracks[0]
build_album_songs(first_album)

[{'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': 'With A Little Help From My Friends'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': 'A Day In The Life'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band", 'song': 'Penny Lane'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band", 'song': 'Getting Better'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': "Sgt. Pepper's Lonely Hearts Club Band"},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': 'Being For The Benefit Of Mr. Kite!'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band", 'song': 'Lovely Rita'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': "She's Leaving Home"},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': 'Strawberry Fields Forever'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': 'Good Morning Good Morning'},
 {'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'song': "When I'm Sixty-Four"},
 {'album': "Sgt. Pepper's

* Write functions that perform the following: 

`album_most_top_songs` - returns the name of the artist and album that has the most songs featured on the top 500 songs list.

`top_ten_albums_by_songs` - returns a dictionary of the 10 albums that have the most songs on the top 500 songs list. The album names should be the keys, and the corresponding values should be the number of each album's songs that appear on the top 500 list.
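
As one possible starting point for `top_ten_albums_by_songs`, the sketch below turns a histogram like `album_histogram` into a dictionary of the top entries. The name `top_albums_by_songs`, its `limit` parameter, the choice to skip the `'N/A'` placeholder, and the sample counts are all assumptions for illustration.

```python
def top_albums_by_songs(album_histogram, limit=10):
    # Ignore the 'N/A' placeholder used for songs with no matching album.
    matched = [(album, count) for album, count in album_histogram.items()
               if album != 'N/A']
    ranked = sorted(matched, key=lambda album_count: album_count[1], reverse=True)
    # Dictionaries preserve insertion order, so the result stays sorted.
    return dict(ranked[:limit])

# Made-up counts for illustration only.
sample_histogram = {'N/A': 4, 'Album A': 1, 'Album B': 3, 'Album C': 2}
print(top_albums_by_songs(sample_histogram, limit=2))  # {'Album B': 3, 'Album C': 2}
```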