# Albums and Songs Lab

### Introduction

In this lesson, we'll use the skills we have learned over the past several lessons to answer questions about the top songs, artists and albums over the past fifty years.

### Working with Albums

Let's start by working with data on the top 500 albums according to Rolling Stone magazine.

In [1]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-a-data-structures/master/6-top-songs/data.csv"
df = pd.read_csv(url)
albums = df.to_dict('records')

In [3]:
albums[:2]

[{'number': 1,
  'year': 1967,
  'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'artist': 'The Beatles',
  'genre': 'Rock',
  'subgenre': 'Rock & Roll, Psychedelic Rock'},
 {'number': 2,
  'year': 1966,
  'album': 'Pet Sounds',
  'artist': 'The Beach Boys',
  'genre': 'Rock',
  'subgenre': 'Pop Rock, Psychedelic Rock'}]

In [2]:
len(albums)

478

> The list is billed as the top 500, but this dataset contains 478 albums.

Let's write some functions to help us better explore the data.

* `all_albums` - Takes a list of albums as its argument and returns a list of album names.

* `all_artists` - Takes a list of albums and returns a list of all artists (each element is a string), with no artist repeated.

* `find_by_name` - Takes one argument, `album_name`. Returns the dictionary for the matching album, or `None` if no album is found.

* `find_by_ranks` - Takes `begin_rank` and `end_rank` as arguments and returns a list of dictionaries for albums between those ranks. The function can also be called with only `begin_rank` or only `end_rank`. If no arguments are provided, the entire list of albums is returned.

* `find_by_years` - Takes `begin_year` and `end_year` as arguments and returns a list of dictionaries for albums between those years. The function can also be called with only `begin_year` or only `end_year`.

In [4]:
def get_album_data_by_key(albums: list[dict], key_name: str):
  return [album[key_name] for album in albums]

In [6]:
def get_all_albums(albums: list[dict]):
  album_names = get_album_data_by_key(albums, 'album' )
  return album_names

# get_all_albums(albums)
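
The `all_artists` function from the list above is not written out in these cells. A minimal sketch, assuming album dictionaries shaped like the ones in `albums`, could reuse the same key lookup and drop repeats while keeping first-seen order. The name `get_all_artists` and the sample records are illustrative assumptions, not part of the lab.

```python
def get_album_data_by_key(albums: list, key_name: str) -> list:
    # Same idea as the helper above: pull one field out of every album
    return [album[key_name] for album in albums]

def get_all_artists(albums: list) -> list:
    # dict.fromkeys keeps only the first occurrence of each artist, in order
    artists = get_album_data_by_key(albums, 'artist')
    return list(dict.fromkeys(artists))

# Hypothetical sample shaped like the album records
sample_albums = [
    {'number': 1, 'album': "Sgt. Pepper's Lonely Hearts Club Band", 'artist': 'The Beatles'},
    {'number': 2, 'album': 'Pet Sounds', 'artist': 'The Beach Boys'},
    {'number': 3, 'album': 'Revolver', 'artist': 'The Beatles'},
]

unique_artists = get_all_artists(sample_albums)
print(unique_artists)  # ['The Beatles', 'The Beach Boys']
```

Using `set(artists)` would also remove repeats, but the resulting order would not be predictable.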

In [13]:
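from typing import Optional

def find_album_by_name(albums: list[dict], name: str) -> Optional[dict]:
  # Return the first album whose title matches exactly, or None if nothing matches
  for album in albums:
    if album['album'] == name:
      return album
  return None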

In [15]:
find_album_by_name(albums, 'Elvis esley')

In [19]:
def find_album_by_ranks(albums: list[dict], begin_rank: float=float('-inf'), end_rank: float=float('inf')):
  # Both bounds are exclusive: an album ranked exactly begin_rank or end_rank is left out
  return [album for album in albums if begin_rank < album['number'] < end_rank]


In [20]:
def find_album_by_year(albums: list[dict], begin_year: float=float('-inf'), end_year: float=float('inf')):
  # Both bounds are exclusive: albums released exactly in begin_year or end_year are left out
  return [album for album in albums if begin_year < album['year'] < end_year]
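
Both range functions use strict comparisons, so the bounds themselves are excluded, and the infinite defaults are what let either argument be omitted. If inclusive bounds are preferred, a variant might look like the sketch below. The name `find_albums_in_rank_range` and the sample data are hypothetical.

```python
def find_albums_in_rank_range(albums, begin_rank=float('-inf'), end_rank=float('inf')):
    # Inclusive on both ends; the infinite defaults make each bound optional
    return [album for album in albums if begin_rank <= album['number'] <= end_rank]

# Hypothetical sample: five albums ranked 1 through 5
sample_albums = [{'number': n, 'album': f'Album {n}'} for n in range(1, 6)]

top_three = find_albums_in_rank_range(sample_albums, end_rank=3)
middle = find_albums_in_rank_range(sample_albums, 2, 4)
everything = find_albums_in_rank_range(sample_albums)

print([a['number'] for a in top_three])  # [1, 2, 3]
print([a['number'] for a in middle])     # [2, 3, 4]
print(len(everything))                   # 5
```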


### Working with Songs

Next, let's load up data related to songs, and data that connects albums and songs.

In [21]:
import pandas as pd
songs_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-a-data-structures/master/6-top-songs/top-500-songs.txt"
songs_df = pd.read_csv(songs_url, sep='\t', header = None, names = ['rank', 'song', 'artist', 'year'])
songs = songs_df.to_dict('records')

track_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-a-data-structures/master/6-top-songs/track_data.json"
albums_and_tracks = pd.read_json(track_url)
albums_tracks = albums_and_tracks.to_dict('records')

In [22]:
songs[:2]

[{'rank': 1,
  'song': 'Like a Rolling Stone',
  'artist': 'Bob Dylan',
  'year': 1965},
 {'rank': 2,
  'song': 'Satisfaction',
  'artist': 'The Rolling Stones',
  'year': 1965}]

In [60]:
albums_tracks[0]
len(albums_tracks)

478

In [47]:
# First, flatten the album data so that each track becomes a dictionary with its artist and album
def edit_albums_tracks(albums: list[dict]) -> list[dict]:
  clean_album_tracks = []
  for album in albums:
    # Drop anything after ' - ' and cut the title at '(Reprise)'
    filtered_tracks = [track.split(' - ')[0].split('(Reprise)')[0] for track in album['tracks']]
    clean_tracks = [{'artist': album['artist'], 'album': album['album'], 'name': track}
                    for track in filtered_tracks]
    clean_album_tracks.extend(clean_tracks)

  return clean_album_tracks

In [48]:
tracks_list = edit_albums_tracks(albums_tracks)
len(tracks_list)

7375
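
To see what the two `split` calls in `edit_albums_tracks` do to a single title, here is a small sketch on made-up strings. The raw titles are hypothetical, and the extra `strip()` is an assumption added here to remove the space left behind when `(Reprise)` is cut off; the function above does not strip.

```python
def clean_track_name(raw_name: str) -> str:
    # Same two splits as edit_albums_tracks, plus strip() for leftover whitespace
    return raw_name.split(' - ')[0].split('(Reprise)')[0].strip()

# Hypothetical raw titles, made up to show each case
raw_names = [
    'Some Song - Remastered',
    'Another Song (Reprise)',
    'Plain Song',
]

cleaned = [clean_track_name(name) for name in raw_names]
print(cleaned)  # ['Some Song', 'Another Song', 'Plain Song']

# Without strip(), the reprise title keeps a trailing space
unstripped = 'Another Song (Reprise)'.split('(Reprise)')[0]
print(repr(unstripped))  # 'Another Song '
```

A trailing space matters later, because track names are compared to song names with `==`.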

In [54]:
def remove_duplicate_tracks(track_list):
  # Keep only the first track seen for each track name
  seen_track_names = set()
  seen_tracks = []
  for track in track_list:
    if track['name'] not in seen_track_names:
      seen_track_names.add(track['name'])
      seen_tracks.append(track)
  return seen_tracks

In [56]:
clean_tracks_list = remove_duplicate_tracks(tracks_list)
len(clean_tracks_list)

6402
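
`remove_duplicate_tracks` compares track names only, so same-titled songs by different artists collapse into one entry. If that is not wanted, one alternative is to key on the artist and the name together. The sketch below uses a hypothetical function name and made-up sample tracks.

```python
def remove_duplicate_tracks_by_artist(track_list: list) -> list:
    seen_keys = set()
    unique_tracks = []
    for track in track_list:
        # A track counts as a repeat only if both artist and name match
        key = (track['artist'], track['name'])
        if key not in seen_keys:
            seen_keys.add(key)
            unique_tracks.append(track)
    return unique_tracks

# Hypothetical tracks: one repeat by the same artist, one same-titled song by another
sample_tracks = [
    {'artist': 'Artist A', 'album': 'First', 'name': 'Home'},
    {'artist': 'Artist A', 'album': 'Greatest Hits', 'name': 'Home'},
    {'artist': 'Artist B', 'album': 'Debut', 'name': 'Home'},
]

deduped = remove_duplicate_tracks_by_artist(sample_tracks)
print([(t['artist'], t['album']) for t in deduped])
# [('Artist A', 'First'), ('Artist B', 'Debut')]
```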

In [61]:
# find top albums
top_song_names = [song['song'] for song in songs]
# create a dictionary that maps each album name to a count of top songs, starting at 0
album_names = {}
for album in albums_tracks:
  album_names[album['album']] = 0

len(album_names)

475

In [67]:
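# Count, for each album, the tracks that match a top song by both name and artist.
# The counts are added to album_names in place, so re-running this cell without
# first resetting album_names makes the totals grow.
for track in clean_tracks_list:
  for song in songs:
    if track['name'] == song['song'] and song['artist'] == track['artist']:
      album_names[track['album']] += 1
sorted(album_names.items(), key = lambda name: name[1], reverse=True)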

[('Elvis Presley', 36),
 ('The Great Twenty Eight', 30),
 ('Are You Experienced', 23),
 ('Pet Sounds', 18),
 ('Highway 61 Revisited', 18),
 ('Abbey Road', 17),
 ('Bringing It All Back Home', 17),
 ('Otis Blue: Otis Redding Sings Soul', 15),
 ("Sgt. Pepper's Lonely Hearts Club Band", 12),
 ('Rubber Soul', 12),
 ('London Calling', 12),
 ('Blonde on Blonde', 12),
 ('Born to Run', 12),
 ('Thriller', 12),
 ('The Joshua Tree', 12),
 ("Who's Next", 12),
 ('Led Zeppelin', 12),
 ('Let It Bleed', 12),
 ('Please Please Me', 12),
 ("Here's Little Richard", 12),
 ('Appetite for Destruction', 12),
 ('Sticky Fingers', 12),
 ('Star Time', 12),
 ('Led Zeppelin II', 12),
 ('40 Greatest Hits', 12),
 ('Portrait of a Legend 1951-1964', 12),
 ('Loaded', 12),
 ('Paranoid', 12),
 ('Live at Leeds', 12),
 ('Wheels of Fire', 12),
 ("A Hard Day's Night", 12),
 ('MTV Unplugged in New York', 12),
 ('The Sun Sessions', 11),
 ('Electric Ladyland', 11),
 ('Achtung Baby', 11),
 ('Purple Rain', 11),
 ('Lady Soul', 11),


Write functions that perform the following:

* `album_most_top_songs` - Returns the name of the artist and album that has the most songs featured on the top 500 songs list.

* `top_ten_albums_by_songs` - Returns a dictionary with the 10 albums that have the most songs appearing in the top songs list. The album names should be the keys, and the corresponding values should be the number of songs that appear on the top 500 list.
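
One possible shape for these two functions is sketched below. It assumes tracks shaped like the dictionaries returned by `edit_albums_tracks` and songs shaped like the entries in `songs`. The helper `count_top_songs_by_album`, the function signatures, and the sample data are assumptions made for illustration, not the lab's reference solution.

```python
from collections import Counter

def count_top_songs_by_album(tracks: list, songs: list) -> Counter:
    # A set of (song name, artist) pairs gives fast membership checks
    top_pairs = {(song['song'], song['artist']) for song in songs}
    counts = Counter()
    for track in tracks:
        if (track['name'], track['artist']) in top_pairs:
            counts[(track['artist'], track['album'])] += 1
    return counts

def album_most_top_songs(tracks: list, songs: list):
    counts = count_top_songs_by_album(tracks, songs)
    if not counts:
        return None
    (artist, album), _ = counts.most_common(1)[0]
    return artist, album

def top_ten_albums_by_songs(tracks: list, songs: list) -> dict:
    counts = count_top_songs_by_album(tracks, songs)
    # Albums that share a name across artists would collide on this key
    return {album: count for (_, album), count in counts.most_common(10)}

# Hypothetical sample data
sample_songs = [
    {'rank': 1, 'song': 'Song A', 'artist': 'Artist X', 'year': 1965},
    {'rank': 2, 'song': 'Song B', 'artist': 'Artist X', 'year': 1966},
    {'rank': 3, 'song': 'Song C', 'artist': 'Artist Y', 'year': 1970},
]
sample_tracks = [
    {'artist': 'Artist X', 'album': 'Album One', 'name': 'Song A'},
    {'artist': 'Artist X', 'album': 'Album One', 'name': 'Song B'},
    {'artist': 'Artist X', 'album': 'Album One', 'name': 'Filler'},
    {'artist': 'Artist Y', 'album': 'Album Two', 'name': 'Song C'},
    # Same title as a top song, but a different artist, so it does not count
    {'artist': 'Artist Z', 'album': 'Album Three', 'name': 'Song A'},
]

best = album_most_top_songs(sample_tracks, sample_songs)
top_ten = top_ten_albums_by_songs(sample_tracks, sample_songs)
print(best)     # ('Artist X', 'Album One')
print(top_ten)  # {'Album One': 2, 'Album Two': 1}
```

Because each call builds a fresh `Counter`, the functions can be run repeatedly without the totals accumulating.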