# Instructions:

## Getting Started
In this exercise, we will use data from Rolling Stone's list of the top 500 albums, which is stored in the `data.csv` file. We will build the following functions to answer questions about this data and interact with it.

> **Remember:** reading from a CSV file in Python looks like the following:

```python
import csv

file_name = 'data.csv'

with open(file_name) as f:
    # we are using DictReader because we want our information to be in dictionary format.
    reader = csv.DictReader(f)
    # some more code
```

Once we have our `reader` reading our file as dictionaries, we want our data to be a list of dictionaries. So, we need to loop through our `reader` and create a list. *Hint: list comprehensions and for loops are your friends.*

```python
# our data will look something like this once we have read it and turned it into a list of `OrderedDict`s
# don't worry, the ordered dicts look different but we can interact with them the same way we do normal dicts
[OrderedDict([('number', '1'), ('year', '1967'), ('album', "Sgt. Pepper's Lonely Hearts Club Band"), ('artist', 'The Beatles'), ('genre', 'Rock'), ('subgenre', 'Rock & Roll, Psychedelic Rock')]), OrderedDict([('number', '2'), ('year', '1966'), ('album', 'Pet Sounds'), ('artist', 'The Beach Boys'), ('genre', 'Rock'), ('subgenre', 'Pop Rock, Psychedelic Rock')]), OrderedDict([('number', '3'), ('year', '1966'), ('album', 'Revolver'), ('artist', 'The Beatles'), ('genre', 'Rock'), ('subgenre', 'Psychedelic Rock, Pop Rock')])]
```
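The list-building step can be sketched with a list comprehension. This is only an illustration: `sample_csv` is a made-up two-row stand-in for `data.csv` (using the column names shown above), and `sample_albums` is a hypothetical name.

```python
import csv
import io

# Hypothetical two-row sample standing in for data.csv.
sample_csv = io.StringIO(
    "number,year,album,artist,genre,subgenre\n"
    "1,1967,Sgt. Pepper's Lonely Hearts Club Band,The Beatles,Rock,\"Rock & Roll, Psychedelic Rock\"\n"
    "2,1966,Pet Sounds,The Beach Boys,Rock,\"Pop Rock, Psychedelic Rock\"\n"
)

# Each row from DictReader is already keyed by the header row,
# so a list comprehension is enough to collect the rows.
sample_albums = [dict(row) for row in csv.DictReader(sample_csv)]

print(len(sample_albums))
print(sample_albums[0]['album'])
```

With the real file, the same comprehension would sit inside the `with open(...)` block so that the file is still open while the rows are read.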

After we have our data formatted the way we want it, we can begin defining our functions.

In [88]:
import csv

with open('data.csv') as f:
    albums = []
    for row in csv.DictReader(f):
        albums.append(row)
     
albums

[{'album': "Sgt. Pepper's Lonely Hearts Club Band",
  'artist': 'The Beatles',
  'genre': 'Rock',
  'number': '1',
  'subgenre': 'Rock & Roll, Psychedelic Rock',
  'year': '1967'},
 {'album': 'Pet Sounds',
  'artist': 'The Beach Boys',
  'genre': 'Rock',
  'number': '2',
  'subgenre': 'Pop Rock, Psychedelic Rock',
  'year': '1966'},
 {'album': 'Revolver',
  'artist': 'The Beatles',
  'genre': 'Rock',
  'number': '3',
  'subgenre': 'Psychedelic Rock, Pop Rock',
  'year': '1966'},
 {'album': 'Highway 61 Revisited',
  'artist': 'Bob Dylan',
  'genre': 'Rock',
  'number': '4',
  'subgenre': 'Folk Rock, Blues Rock',
  'year': '1965'},
 {'album': 'Rubber Soul',
  'artist': 'The Beatles',
  'genre': 'Rock, Pop',
  'number': '5',
  'subgenre': 'Pop Rock',
  'year': '1965'},
 {'album': "What's Going On",
  'artist': 'Marvin Gaye',
  'genre': 'Funk / Soul',
  'number': '6',
  'subgenre': 'Soul',
  'year': '1971'},
 {'album': 'Exile on Main St.',
  'artist': 'The Rolling Stones',
  'genre': 'Rock

In [84]:
import matplotlib.pyplot as plt
import collections


In [87]:
#from functions import *


### Functions to build-out:

Each of the following functions can be defined in the `functions.py` file. 

* **Searching functions**
  * Find by name - Takes in a string that represents the name of an album. Should return a dictionary with the correct album, or return `None`.
  * Find by rank - Takes in a number that represents the rank in the list of top albums and returns the album with that rank. If there is no album with that rank, it returns `None`.
  * Find by year - Takes in a number for the year in which an album was released and returns a list of albums that were released in that year. If there are no albums released in the given year, it returns an empty list.
  * Find by years - Takes in a start year and end year. Returns a list of all albums that were released on or between the start and end years. If no albums are found for those years, then an empty list is returned. 
  * Find by ranks - Takes in a start rank and end rank. Returns a list of albums that are ranked between the start and end ranks. If no albums are found for those ranks, then an empty list is returned.
* **All functions**
  * All titles - Returns a list of titles for each album.
  * All artists - Returns a list of artist names for each album.
* **Questions to answer / functions**
  * Artists with the most albums - Returns the artist with the most albums on the list of top albums.
  * Most popular word - Returns the word used most often across all album titles.
  * Histogram of albums by decade - Returns a histogram with each decade pointing to the number of albums released during that decade.
  * Histogram by genre - Returns a histogram with each genre pointing to the number of albums that are categorized as being in that genre.

In [85]:
# Pass album name, returns dict of album info (or None if there is no match).
def find_by_name(name, data_set):
    for album in data_set:
        if album['album'].lower() == name.lower():
            return album
    # Only give up after every album has been checked.
    return None

In [86]:
find_by_name("Sgt. Pepper's Lonely Hearts Club Band", albums)

{'album': "Sgt. Pepper's Lonely Hearts Club Band",
 'artist': 'The Beatles',
 'genre': 'Rock',
 'number': '1',
 'subgenre': 'Rock & Roll, Psychedelic Rock',
 'year': '1967'}

In [15]:
# Pass album rank, returns album name, rank.
# def find_by_rank(rank, data_set):
#     for album in data_set:
#         if album['number'] == str(rank):
#             return album
#         return None

In [16]:
def find_by_rank(rank, our_data):
    for album in our_data:
        if int(album['number']) == rank:
            return album
    return None

In [17]:
find_by_rank(34, albums)

{'album': 'Music From Big Pink',
 'artist': 'The Band',
 'genre': 'Rock',
 'number': '34',
 'subgenre': 'Folk Rock, Acoustic, Blues Rock',
 'year': '1968'}

In [18]:
# Pass year, returns list of album names in that year. 
def find_by_year(year, data_set):
    result = []
    for album in data_set:
        if int(album['year']) == year:
            result.append(album['album'])
    return result

In [19]:
find_by_year(1960, albums)

['At Last!', 'Muddy Waters at Newport 1960', 'Sketches of Spain']

In [20]:
# Pass start and end years, returns list of album names in range (inclusive).
def find_by_years(start_yr, end_yr, data_set):
    result = []
    for album in data_set:
        if int(album['year']) in range(start_yr, end_yr + 1):
            result.append(album['album'])
    return result

In [21]:
find_by_years(1962, 1963, albums)

['Live at the Apollo, 1962',
 'Please Please Me',
 "The Freewheelin' Bob Dylan",
 'A Christmas Gift for You From Phil Spector',
 "Howlin' Wolf",
 'Presenting the Fabulous Ronettes Featuring Veronica']

In [22]:
# Pass start and end ranks, returns list of album names in range (inclusive).
def find_by_ranks(start_rk, end_rk, data_set):
    result = []
    for album in data_set:
        if int(album['number']) in range(start_rk, end_rk + 1):
            result.append(album['album'])
    return result

In [23]:
find_by_ranks(1,5, albums)

["Sgt. Pepper's Lonely Hearts Club Band",
 'Pet Sounds',
 'Revolver',
 'Highway 61 Revisited',
 'Rubber Soul']

In [24]:
#All titles - Returns a list of album titles for each album.
def all_albums(data_set):
    result = []
    for album in data_set:
        result.append(album['album'])
    return result

In [25]:
#all_albums(albums)

In [26]:
#All artists - Returns a list of artist names for each album.
def all_artists(data_set):
    result = []
    for album in data_set:
        result.append(album['artist'])
    return result

In [27]:
#all_artists(albums)

In [28]:
#Artists with the most albums - Returns the five artists with the most albums on the list of top albums, as (artist, count) pairs
def artist_w_most_albums(data_set):
    sample = all_artists(data_set)
    c = collections.Counter(sample)
    return c.most_common(5)

In [29]:
artist_w_most_albums(albums)

[('Bob Dylan', 10),
 ('The Beatles', 10),
 ('The Rolling Stones', 10),
 ('Bruce Springsteen', 8),
 ('The Who', 7)]

In [30]:
#Most popular word - Returns the word used most often across all album titles
def most_popular_word(data_set):
    result = []
    for title in all_albums(data_set):
        word_list = title.split()
        for word in word_list:
            result.append(word)
    c = collections.Counter(result)
    return c.most_common(1)

In [31]:
most_popular_word(albums)

[('The', 73)]

In [32]:
#Histogram of albums by decade - Returns a histogram with each decade pointing to the number of albums released during that decade.
def year_hist(data_set):
    result = []
    for album in data_set:
        result.append(int(album['year']))
    
    bin_edges = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
    plt.hist(result, bin_edges, facecolor='blue', alpha=1)
    plt.title('Album Histogram')
    # Years run along the x-axis; bar heights are album counts.
    plt.xlabel('Years')
    plt.ylabel('Number of Albums')
    return plt.show()

In [33]:
year_hist(albums)

In [34]:
#Histogram by genre - Returns a histogram with each genre pointing to the number of albums that are categorized as being in that genre.                     
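def genre_hist(data_set):
    result = []
    for album in data_set:
        result.append(album['genre'])

    c = collections.Counter(result)
    genres = list(c.keys())
    plt.bar(range(len(genres)), list(c.values()), alpha=1)
    # Label each bar with its genre so the chart can be read.
    plt.xticks(range(len(genres)), genres, rotation=90)
    plt.title('Albums by Genre')
    plt.xlabel('Genres')
    plt.ylabel('Number of Albums')
    return plt.show()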

In [35]:
genre_hist(albums)

## Next Steps

In [66]:
songs = []

# Each line is tab-separated: rank, name, artist, year.
with open('top-500-songs.txt', 'r') as text_file:
    for line in text_file:
        # Strip the trailing newline before splitting into fields.
        s = line.rstrip('\n').split('\t')
        song = {'rank': s[0],
                'name': s[1],
                'artist': s[2],
                'year': s[3],
                }
        songs.append(song)
    
#songs


### Working with the top 500 songs

If our searching functions (find by name, find by rank, find by year, find by years, find by ranks), our all functions (all titles, all artists), and our question-answering functions (artists with the most albums or songs, most popular word, histogram by decade, histogram by genre) do not already work with the song data we just formatted, refactor them so that they can be used with either data set. Reusable, modular code matters on any project, and especially once a project needs to scale: it is easier to read, and there is less code to maintain and, more importantly, less code to debug when something goes wrong.
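One possible direction for this refactor, offered as a sketch rather than the required solution, is to pass the dictionary key as a parameter so that the same function serves both data sets. The names `find_by_field` and `all_values` and the one-row samples below are hypothetical.

```python
def find_by_field(value, data_set, field):
    """Return every record whose `field` equals `value` (compared as strings)."""
    return [record for record in data_set if record[field] == str(value)]

def all_values(data_set, field):
    """Return the value stored under `field` for each record."""
    return [record[field] for record in data_set]

# Hypothetical one-row samples in the shape of our two data sets:
# albums use 'number' and 'album', songs use 'rank' and 'name'.
album_rows = [{'number': '1', 'year': '1967',
               'album': "Sgt. Pepper's Lonely Hearts Club Band",
               'artist': 'The Beatles'}]
song_rows = [{'rank': '1', 'year': '1965',
              'name': 'Like a Rolling Stone', 'artist': 'Bob Dylan'}]

print(all_values(album_rows, 'album'))
print(all_values(song_rows, 'name'))
print(find_by_field(1, song_rows, 'rank'))
```

The trade-off is that callers must know which key each data set uses; thin wrappers such as `all_albums` can keep the old names while delegating to the generic version.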

Once we have our functions working for both sets of data, we can start writing new functions!

Luckily for us, this next dataset is already made for us. We were curious to find out which songs on the top 500 songs overlapped with the top albums and vice versa. So, we created a data set that is a list of dictionaries in JSON format. Each dictionary contains the name of the artist, the album, and the tracks (songs) on that given album. We can use this data to check which songs on the top 500 list are featured on the albums on the top albums list.
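As a rough sketch of that check, assume each entry has `artist`, `album`, and `tracks` keys as described. The records, the song names, and the helper `top_songs_per_album` below are all made up for illustration.

```python
# Hypothetical records in the shape described above.
track_data = [
    {'artist': 'Artist A', 'album': 'First Album', 'tracks': ['Song One', 'Song Two']},
    {'artist': 'Artist B', 'album': 'Second Album', 'tracks': ['Song Three']},
    {'artist': 'Artist C', 'album': 'Third Album', 'tracks': ['Song Four']},
]
# A set makes each membership test fast, which helps with 500 song names.
top_song_names = {'Song Two', 'Song Three', 'Song Nine'}

def top_songs_per_album(track_data, top_song_names):
    """Map each album title to its tracks that also appear in top_song_names."""
    overlap = {}
    for entry in track_data:
        matches = [track for track in entry['tracks'] if track in top_song_names]
        if matches:
            overlap[entry['album']] = matches
    return overlap

print(top_songs_per_album(track_data, top_song_names))
# {'First Album': ['Song Two'], 'Second Album': ['Song Three']}
```

Note that this matches on song title alone, so two different songs that share a title would be treated as the same song.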

To load our JSON file we will write:

```python
import json

with open('track_data.json', 'r') as file:
    json_data = json.load(file)

print(json_data)
```

In [37]:
import json

# Using `with` closes the file as soon as the data has been loaded.
with open('track_data.json', 'r') as file:
    albums_json = json.load(file)
#albums_json

### Define the following functions:

**albumWithMostTopSongs** - returns the name of the artist and album that has the most songs featured on the top 500 songs list

**albumsWithTopSongs** - returns a list with the name of only the albums that have tracks featured on the list of top 500 songs

**songsThatAreOnTopAlbums** - returns a list with the name of only the songs featured on the list of top albums

**top10AlbumsByTopSongs** - returns a histogram with the 10 albums that have the most songs that appear in the top songs list. The album names should point to the number of songs that appear on the top 500 songs list.

**topOverallArtist** - returns the artist featured with the most songs and albums across the two lists. For example, if Britney Spears had 3 of her albums on the top albums list and 10 of her songs on the top songs list, she would have a total of 13. The artist with the highest aggregate score is the top overall artist.
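The aggregate score can be worked through on a tiny made-up example. Everything here (`overall_scores`, `Artist A`, `Artist B`, and the row counts) is hypothetical, and the sketch assumes each row has an `artist` key.

```python
import collections

def overall_scores(album_rows, song_rows):
    """Count one point per top album and one point per top song for each artist."""
    scores = collections.Counter()
    for row in album_rows:
        scores[row['artist']] += 1
    for row in song_rows:
        scores[row['artist']] += 1
    return scores

# Artist A: 3 albums + 10 songs; Artist B: 4 albums + 2 songs.
album_rows = [{'artist': 'Artist A'}] * 3 + [{'artist': 'Artist B'}] * 4
song_rows = [{'artist': 'Artist A'}] * 10 + [{'artist': 'Artist B'}] * 2

scores = overall_scores(album_rows, song_rows)
print(scores['Artist A'])     # 13
print(scores.most_common(1))  # [('Artist A', 13)]
```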

In [38]:
# Returns the [artist, album] pair for the album with the most Top 500 songs.
def album_w_most_songs():
    most_songs = 0
    top_album = []
    for album in albums_json:
        # Count each track at most once if it appears in the Top 500 songs.
        i = 0
        for track in album['tracks']:
            if track in top_500:
                i += 1
        if i > most_songs:
            most_songs = i
            top_album = [album['artist'], album['album']]
    return top_album

In [138]:
album_w_most_songs()

['Elvis Presley', 'Elvis Presley']

In [39]:
# Returns list of albums with at least one Top 500 song.
def albums_w_top_songs():
    result = []
    for album in albums_json:
        for track in album['tracks']:
            if track in top_500:
                result.append(album['album'])
    # A set removes albums that were added once per matching track.
    return list(set(result))

In [40]:
#albums_w_top_songs()

In [41]:
# Returns list of songs in the Top Albums.
def songs_in_top_albums():
    result = []
    for album in albums_json:
        if album['album'] in all_albums(albums):
            for song in album['tracks']:
                result.append(song)
    return result

In [42]:
#songs_in_top_albums()

In [45]:
# Returns a list of song names for each song.
def all_songs(data_set):
    result = []
    for song in data_set:
        result.append(song['name'])
    return result


top_500 = all_songs(songs)

In [81]:
# Returns a bar chart of the 10 albums with the most Top 500 songs.
def album_top_10():
    # Build the set of song names once instead of on every comparison.
    top_song_names = set(all_songs(songs))
    sample = []
    for album in albums_json:
        # Record the album once for every track that is a Top 500 song.
        for track in album['tracks']:
            if track in top_song_names:
                sample.append(album['album'])
    counter = collections.Counter(sample)
    numbers = []
    titles = []
    for item in counter.most_common(10):
        titles.append(item[0])
        numbers.append(item[1])
    plt.bar(range(len(titles)), numbers, alpha=1)
    plt.title('Top 10 Albums')
    plt.xlabel('Albums')
    plt.ylabel('Appearances')

    plt.xticks(list(range(len(titles))), titles, rotation=90)
    return plt.show()
 

In [82]:
album_top_10()

In [154]:
# Returns the (artist, score) pair with the most combined top albums and top songs.
def top_overall_artist():
    top_album_titles = set(all_albums(albums))
    sample = collections.Counter()
    # One point for each of the artist's albums on the top albums list.
    for album in albums_json:
        if album['album'] in top_album_titles:
            sample[album['artist']] += 1
    # One point for each of the artist's songs on the top songs list.
    for song in songs:
        sample[song['artist']] += 1
    return sample.most_common(1)[0]

In [155]:
top_overall_artist()

('The Beatles', 33)

### End.