# Bootcamp Practice Notebooks:  Music Album Rankings Analysis

## Notebook 5:  Complex Analysis

 # Overview: Music Album Rankings #

`Rolling Stone` is an American monthly magazine that covers music, entertainment, politics, and popular culture. It was founded in San Francisco, California, in 1967 and still publishes monthly to this day.

In 2003, the magazine released its `“500 Greatest Albums of All Time,”` placing the Beatles’ “Sgt. Pepper’s Lonely Hearts Club Band” in the top slot. It has since released two additional `"500 Greatest"` lists, in 2012 and 2020. While not necessary for this analysis, to gain a full understanding of these rankings, see this Wikipedia article:  https://en.wikipedia.org/wiki/Rolling_Stone%27s_500_Greatest_Albums_of_All_Time



# Setup and Data Load

To get started, run the following code cells. They will load the data files and populate the `voters` and `albums` datasets that you will be working with.

In [None]:
# !wget https://github.com/gt-cse-6040/bootcamp/raw/main/practice_exercises/voters.json
# !wget https://github.com/gt-cse-6040/bootcamp/raw/main/practice_exercises/Rolling_Stone_500_public.json

In [None]:
import json

# the `with` block closes each file automatically, so no explicit close() is needed
with open("voters.json", "r") as read_file:
    voters = json.load(read_file)

with open("Rolling_Stone_500_public.json", "r") as read_file:
    albums = json.load(read_file)

## As we noted in the description notebook, the feedback on the three lists focused on the following:

- The `2003` list was heavily criticized for being male-dominated, outmoded, and almost entirely Anglo-American in focus.

- The `2012` list was also heavily criticized in a similar manner, with one music critic noting that only one album in the top 10 was less than 40 years old.

- The `2020` list was much more diverse in its representation of different music genres, musicians, and time periods. Music critics were much more positive in their reviews of the list, noting the lesser representation of white male rock musicians, and the move to recognize more contemporary albums and a wider range of tastes.

## So our final two exercises will analyze the voter and albums data, to confirm or refute the list feedback.

### Exercise 7 will focus on the voter demographics, and exercise 8 will be for the album demographics.

### From the results of these two exercises, you will be able to understand the feedback and determine for yourselves if it is valid.

# Ex. 7 (**3 points**): `voter_demographics` #

Given a list of dictionaries, `voters_list`, complete the function,
```python
def voter_demographics(voters_list, years):
    ...
```
so that it returns a list of dictionaries of the voter demographics for the voters in the `voters_list` variable that is passed in.

**Input:** 
- A list of dictionaries, `voters_list`. It will have the same format as the `voters` variable above. For testing, it may or may not contain all of the values in the variable, so your code must handle any number of voters in the input.
- A list of integers, `years`, giving the year(s) to compute the demographics for. There may be one or more years in this list. The years may or may not correspond to the year(s) that voting occurred.

**Your task:** Copy this list and output the list of dictionaries.

**Output:** Return a list of dictionaries of the voter demographics for the voters in the `voters_list` variable that is passed in, without modifying the input `voters_list`. If the year passed in does not correspond to one of the years that voting occurred, return an empty list.

Each dictionary in the list should contain the following key-value pairs:
- Key = `'Year'`. 
- Value = String. Year that the voting occurred. Be aware of the data types for this variable.

- Key = `'Median_Birth_Year'`. 
- Value = Integer. The median birth year for the voters who participated in that year's voting. Round the computed value so that the output is an integer.

- Key = `'Average_Age'`. 
- Value = Integer. The average age for the voters who participated in that year's voting. Round the computed value so that the output is an integer.

- Key = `'Median_Age'`. 
- Value = Integer. The median age for the voters who participated in that year's voting. Round the computed value so that the output is an integer.

- Key = `'Teen_Decade'`. 
- Value = Sorted List of Tuples.

    Each tuple will have two elements:
      - String: The decade
      - Integer: The number of voters in that Teen Decade

    The list of tuples should be sorted by the number of voters in descending order, with ties broken by the decade in ascending order.


**Caveat(s)/comment(s)/hint(s):**
1. The list of dictionaries passed into the function may be smaller or larger than the entire list of voters.

2. Ensure that your data types are correct, as the data types input may not always be the same as what you need to output.

3. For some voters, their age at the time of voting was not known. In these cases, the `Age_at_Vote` value is populated with `'N/A'`. Do not include these voters in your calculations.

4. Using the `statistics` module and the `Counter` class from `collections` may be helpful in solving the exercise.
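
As a rough illustration of hints 3 and 4, the sketch below filters out `'N/A'` ages, computes rounded statistics, and sorts decade counts by count (descending) and then decade (ascending). The `sample_voters` records are made up for this example; only the field names mirror the ones used in the solution, and this is not the full exercise solution.

```python
import statistics
from collections import Counter

# Hypothetical toy records (not taken from voters.json).
sample_voters = [
    {"Year": "2003", "Age_at_Vote": "40", "Teenage_Decade": "1970s"},
    {"Year": "2003", "Age_at_Vote": "N/A", "Teenage_Decade": "1980s"},
    {"Year": "2003", "Age_at_Vote": "53", "Teenage_Decade": "1960s"},
    {"Year": "2003", "Age_at_Vote": "58", "Teenage_Decade": "1960s"},
    {"Year": "2003", "Age_at_Vote": "31", "Teenage_Decade": "1980s"},
    {"Year": "2003", "Age_at_Vote": "47", "Teenage_Decade": "1970s"},
    {"Year": "2020", "Age_at_Vote": "29", "Teenage_Decade": "2000s"},
]

# Keep only the 2003 voters whose age is known.
known = [v for v in sample_voters
         if v["Year"] == "2003" and v["Age_at_Vote"] != "N/A"]

# Ages are stored as strings, so convert before doing arithmetic.
ages = [int(v["Age_at_Vote"]) for v in known]
median_age = round(statistics.median(ages))
average_age = round(statistics.mean(ages))

# Count voters per teen decade, then sort by count descending and
# decade ascending (negating the count reverses only that part of the key).
decade_counts = Counter(v["Teenage_Decade"] for v in known)
teen_decade = sorted(decade_counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(median_age, average_age, teen_decade)
```

The `'1960s'` and `'1970s'` entries tie on count here, so the decade string decides their order.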

#### A properly-coded function will return the following list of dictionaries, for the demo data (which is the year 2003):

[{'Year': '2003',
  'Median_Birth_Year': 1952,
  'Average_Age': 50,
  'Median_Age': 51,
  'Teen_Decade': [('1970s', 92),
   ('1960s', 68),
   ('1980s', 54),
   ('1990s', 30),
   ('1950s', 14),
   ('1940s', 4),
   ('2000s', 4)]}]

In [None]:
### Exercise 7 solution -- 3 points ###
def voter_demographics(voters,years):
    '''
    for each year, compute:
        0. Year of the vote
        1. Median voter birth year
        2. Voter average age
        3. Voter median age
        4. Count of teen decades
    ''' 
    # imports come after the docstring so that Python treats it as the docstring
    import statistics
    from collections import Counter

    ##BEGIN SOLUTION
    voter_counter = 0
    voter_age_list = []
    voter_birth_year_list = []
    teen_list = []
    teen_sort_list = []
    stats_dict = dict()
    ret_list = []

    # display(voters)
    
    for year in years:
        for voter in voters:
    #         print(voter)
            if voter['Year'] == str(year) and voter['Age_at_Vote'] != 'N/A':
                voter_age_list.append(int(voter['Age_at_Vote']))
                voter_birth_year_list.append(int(voter['Estimated_Birthyear']))
                voter_counter += 1
                teen_list.append(voter['Teenage_Decade'])
                                      
        teen_decade = dict(Counter(teen_list))
    
        for key,value in teen_decade.items():
            teen_sort_list.append((key,value))
          
        teen_sort_list = sorted(teen_sort_list, key = lambda x: [-x[1], x[0]]) 
             
                
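        # no voters with a known age for this year (for example, a non-voting year):
        # skip it so that no entry is added to the output
        if not voter_age_list:
            continue

        voter_median_age = round(statistics.median(voter_age_list))
        voter_average_age = round(statistics.mean(voter_age_list))
        voter_median_year = round(statistics.median(voter_birth_year_list))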
        
        stats_dict['Year'] = str(year)
        stats_dict['Median_Birth_Year'] = voter_median_year
        stats_dict['Average_Age'] = voter_average_age
        stats_dict['Median_Age'] = voter_median_age
        stats_dict['Teen_Decade'] = teen_sort_list
        ret_list.append(stats_dict)
        
        # re-initialize variables        
        voter_counter = 0
        voter_age_list = []
        voter_birth_year_list = []  # reset so one year's birth years do not leak into the next
        teen_list = []
        stats_dict = dict()
        teen_sort_list = []
        
    return ret_list

result5 = voter_demographics(voters,[2003])
display(result5)
assert result5 == [{'Year': '2003',
  'Median_Birth_Year': 1952,
  'Average_Age': 50,
  'Median_Age': 51,
  'Teen_Decade': [('1970s', 92),
   ('1960s', 68),
   ('1980s', 54),
   ('1990s', 30),
   ('1950s', 14),
   ('1940s', 4),
   ('2000s', 4)]}], 'Demo data does not pass.'
print('passed demo data')

# result_full = voter_demographics(voters,[2003,2020])
# display(result_full)

# Ex. 8 (**3 points**): `album_demographics` #

Given a list of dictionaries, `albums_list`, complete the function,
```python
def album_demographics(albums_list):
    ...
```
so that it returns a list of lists of dictionaries of the album demographics for the albums in the `albums_list` variable that is passed in.

**Input:** 
- A list of dictionaries, `albums_list`. It will have the same format as the `albums` variable above. For testing, it may or may not contain all of the values in the variable, so your code must handle any number of albums in the input.

**Your task:** Copy this list and output the list of lists of dictionaries.
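
Because the input is a list of dictionaries, copying only the outer list still shares the inner dictionaries with the caller. The sketch below contrasts a shallow copy with `copy.deepcopy` on a single made-up record (the values are hypothetical); it is one possible way to leave the input untouched, not the only one.

```python
from copy import deepcopy

# Hypothetical one-record input, shaped like an entry of `albums`.
original = [{"Clean_Name": "Example Artist", "Artist_Gender": "Male/Female"}]

shallow = list(original)   # new outer list, same inner dictionaries
deep = deepcopy(original)  # new outer list and new inner dictionaries

# Editing the deep copy leaves the caller's data alone.
deep[0]["Artist_Gender"] = "Mixed"
unchanged_after_deep = original[0]["Artist_Gender"]

# Editing the shallow copy changes the caller's dictionary too.
shallow[0]["Artist_Gender"] = "Mixed"
changed_after_shallow = original[0]["Artist_Gender"]

print(unchanged_after_deep, changed_after_shallow)
```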

**Output:** Return a list of lists of dictionaries of the album demographics for the albums in the `albums_list` variable that is passed in, without modifying the input `albums_list`.

Each entry of the dictionaries in the nested list should contain the following key-value pairs:
- Key = `'Year'`. 
- Value = String. Year that the voting occurred. Be aware of the data types for this variable.

- Key = `'Median Release Year'`. 
- Value = Integer. The median release year for the albums that were ranked in that year's voting. Round the computed value so that the output is an integer.

- Key = `'Artist Gender Mix'`. 
- Value = Sorted List of Tuples.

    Each tuple will have two elements:
      - String: The artist gender
      - Integer: The number of artists with that gender

    The list of tuples should be in the order of 'Male', then 'Female', then 'Mixed'.

- Key = `'Artist Median Birth Year'`. 
- Value = Integer. The median birth year for the artists whose albums were ranked for that year. Round the computed value so that the output is an integer. For groups with multiple members, convert the birth-year sum field to an integer, divide it by the number of band members (also converted to an integer), and round the result.

- Key = `'Average Billboard Peak'`. 
- Value = Integer. The average Billboard peak position for the albums that were ranked for that year. Round the computed value so that the output is an integer.

    
**Caveat(s)/comment(s)/hint(s):**
1. The list of dictionaries passed into the function may be smaller or larger than the entire list of albums.

2. Ensure that your data types are correct, as the data types input may not always be the same as what you need to output.

3. The input dictionaries do not have `Mixed` as a value for the key `Artist Gender`. In groups that have both male and female members, the value is some form of "Male/Female", but the spacing and capitalization may vary. This is dirty data. The only thing we know for sure is that there will be a forward slash `/` in the field. Translate these gender inputs to output as `Mixed`.

4. Some of the albums that were ranked are known as `Compilation Albums`. What this means is that the songs on that album were not all performed by the same artist. An example of this is the album `The Best of the Girl Groups, Volume 1`, which is an album containing popular songs from the 1960s. In cases such as this, the `Clean_Name` field in the input dictionary will be `Various Artists`. Exclude these albums from your computations.

https://en.wikipedia.org/wiki/The_Best_of_the_Girl_Groups


5. Using the `statistics` module and the `Counter` class from `collections` may be helpful in solving the exercise.

6. A helper function may be useful to perform repetitive operations.
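
The cleaning steps in hints 3 and 6 can be sketched with two small helpers. The records below are made up, and `clean_gender` and `average_birth_year` are hypothetical names chosen for this illustration; the field names follow the ones used in the solution code. The final line shows one possible way to emit counts in the 'Male', 'Female', 'Mixed' order described in the key list.

```python
from collections import Counter

# Hypothetical toy records (not taken from the albums file).
sample_albums = [
    {"Artist_Gender": "Male", "Artist_Member_Count": "1",
     "Artist_Birth_Year_Sum": "1941"},
    {"Artist_Gender": "male / Female", "Artist_Member_Count": "4",
     "Artist_Birth_Year_Sum": "7792"},
    {"Artist_Gender": "Female", "Artist_Member_Count": "1",
     "Artist_Birth_Year_Sum": "1942"},
    {"Artist_Gender": "Male", "Artist_Member_Count": "2",
     "Artist_Birth_Year_Sum": "3884"},
]

def clean_gender(raw):
    """Map any value containing a forward slash to 'Mixed'."""
    return "Mixed" if "/" in raw else raw

def average_birth_year(album):
    """Divide the birth-year sum by the member count and round."""
    total = int(album["Artist_Birth_Year_Sum"])
    members = int(album["Artist_Member_Count"])
    return round(total / members)

genders = [clean_gender(a["Artist_Gender"]) for a in sample_albums]
birth_years = [average_birth_year(a) for a in sample_albums]

# Emit counts in a fixed order, skipping genders that do not occur.
counts = Counter(genders)
gender_mix = [(g, counts[g]) for g in ("Male", "Female", "Mixed") if g in counts]

print(birth_years, gender_mix)
```

Building new lists rather than editing the records in place also keeps the input list unmodified.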

#### A properly-coded function will return the following list of lists of dictionaries, for the demo data:

[[{'Year': '2003'},
  {'Median Release Year': 1961},
  {'Artist Gender Mix': [('Male', 14), ('Female', 1)]},
  {'Artist Median Birth Year': 1930},
  {'Average Billboard Peak': 134}],
 [{'Year': '2012'},
  {'Median Release Year': 1962},
  {'Artist Gender Mix': [('Male', 12), ('Female', 2)]},
  {'Artist Median Birth Year': 1930},
  {'Average Billboard Peak': 144}],
 [{'Year': '2020'},
  {'Median Release Year': 1992},
  {'Artist Gender Mix': [('Female', 24), ('Male', 6)]},
  {'Artist Median Birth Year': 1966},
  {'Average Billboard Peak': 47}]]

In [None]:
albums_demo = albums[10:50]

In [None]:
### Exercise 8 solution -- 3 points ###
def album_demographics(album_list):  
    '''
    for each year, compute:
        0. Year
        1. Median release year
        2. Artist gender count -- Male/Female set to "Mixed"
        3. Median birth year of artist -- birth year / artist member count
        4. Average peak billboard position
    ''' 
    
    ##BEGIN SOLUTION
    
    def helper_func(albums_by_year,year):
        import statistics
        from collections import Counter
        
        some_list = []
        artist_release_list = []
        artist_gender_list = []
        final_gender_list = []
        artist_birth_year_list = []
        artist_peak_list = []
        for album in albums_by_year:
            artist_release_list.append(int(album['Release_Year']))
            artist_gender_list.append(album['Artist_Gender'])
            artist_birth_year_list.append(int(album['Artist_Birth_Year_Sum']))
            artist_peak_list.append(int(album['Peak_Billboard_Position']))
            
        #gender counts list
        gender_dict = dict(Counter(artist_gender_list))
        for key,value in gender_dict.items():
            final_gender_list.append((key,value))
            
        album_median_release_year = round(statistics.median(artist_release_list))
        artist_median_birth_year = round(statistics.median(artist_birth_year_list))
        album_average_peak = round(statistics.mean(artist_peak_list))
        
        some_list.append({'Year':year})
        some_list.append({'Median Release Year':album_median_release_year})
        some_list.append({'Artist Gender Mix':final_gender_list})
        some_list.append({'Artist Median Birth Year':artist_median_birth_year})
        some_list.append({'Average Billboard Peak':album_average_peak})
        
        
        return some_list
    
######### end of helper function ###################
    
    from copy import deepcopy
    
    ret_list = []
    list_2003 = []
    list_2012 = []
    list_2020 = []

#     don't give them deepcopy, they need to understand this
    album_list_copy = deepcopy(album_list)
    
    # cleaning
    # remove 'Various Artists'
    album_list_copy = [ele for ele in album_list_copy if ele['Clean_Name'] != 'Various Artists']
    
    # clean the gender and compute birth year of groups with more than 1 member
    for album in range(1,len(album_list_copy)-1):
        if '/' in album_list_copy[album]['Artist_Gender']:   
            album_list_copy[album]['Artist_Gender'] = 'Mixed'
#             print(album_list_copy[album])
        if int(album_list_copy[album]['Artist_Member_Count']) > 1:
            album_list_copy[album]['Artist_Birth_Year_Sum'] = str(round(int(album_list_copy[album]['Artist_Birth_Year_Sum']) / int(album_list_copy[album]['Artist_Member_Count'])))
     
    # create the yearly album lists, to send to the helper function
    for album in range(1,len(album_list_copy)-1): # using range because first one in the list is the headers
        if album_list_copy[album]['2003_Rank'] != '':  # in the 2003 list
            list_2003.append(album_list_copy[album])
        if album_list_copy[album]['2012_Rank'] != '':  # in the 2012 list
            list_2012.append(album_list_copy[album])
        if album_list_copy[album]['2020_Rank'] != '':  # in the 2020 list
            list_2020.append(album_list_copy[album])
            
    result_2003 = helper_func(list_2003,'2003')
    result_2012 = helper_func(list_2012,'2012')
    result_2020 = helper_func(list_2020,'2020')
    
    ret_list.append(result_2003)
    ret_list.append(result_2012)
    ret_list.append(result_2020)
    
    return ret_list
    
result = album_demographics(albums_demo)
display(result)

result2 = album_demographics(albums)
display(result2)