## 05 More Lists and Dictionaries

Last week taught us a lot about creating lists and manipulating strings. Here we will get some more practice manipulating lists and organizing information with dictionaries. 

Let's start with a more grown-up dataset. With the recent passing of David Bowie, we want to make a fitting playlist of his entire catalog of albums. I will load in a file that contains information about each David Bowie album. It was produced by a Python script that pulled the information from the MusicBrainz <https://musicbrainz.org/> database. It is stored in a JSON (JavaScript Object Notation) file, a format we will learn about soon. 

In [None]:
import json

with open('../../datasets/bowie_trackdata.json') as f:
    bowie_all = json.load(f)

First, figure out how this information is organized. What is the data type? If it has multiple elements in it, what is the data type of each element? Use the `type` and `len` functions to investigate. 

In [None]:
print type(bowie_all)

print len(bowie_all)
print type(bowie_all[0])
print bowie_all[:5]


It looks like we have a list where each element is a dictionary. Write a small loop to print the keys for each element in `bowie_all`. Then write another one that prints all the values. 

In [None]:
for i in bowie_all:
    print i.keys()

for i in bowie_all:
    print i.values()
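
To see the same checks on something small enough to read at a glance, here is a minimal sketch using a hand-written stand-in for `bowie_all`. The name `sample_tracks` and its two entries are made up for illustration (they are not loaded from the JSON file), and the real file may store `tracknum` and `year` as strings rather than numbers. `print` is written with parentheses so the snippet runs in both Python 2 and Python 3.

```python
# Hand-written stand-in for two entries of bowie_all (hypothetical, not from the JSON file)
sample_tracks = [
    {'album': 'Low', 'title': 'Speed of Life', 'tracknum': 1, 'year': 1977},
    {'album': 'Low', 'title': 'Breaking Glass', 'tracknum': 2, 'year': 1977},
]

print(type(sample_tracks))     # the outer container is a list
print(len(sample_tracks))      # how many track dictionaries it holds
print(type(sample_tracks[0]))  # each element is a dictionary

for track in sample_tracks:
    print(sorted(track.keys()))  # sorted so the key order is predictable
```

Every dictionary in the sample has the same keys, which is the pattern to look for in the real data.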


What can you conclude about the structure of `bowie_all`? What information does it hold, and how? Answer in the plain text box below. It won't interpret your answer as code, just text. (FYI, you can change a cell to plain text by clicking Cell -> Cell Type -> Raw NBConvert)

As a first step, try to print the album for each element of `bowie_all` using a loop. 

In [None]:
for i in bowie_all:
    print i['album']

Notice that each album is repeated many times. This is because each dictionary contains information for a single song. Create a list `bowie_albums` that has only the unique Bowie albums. How many does he have?  

In [None]:

bowie_albums = []

for i in bowie_all:
    album = i['album']
    
    if album not in bowie_albums:
        bowie_albums.append(album)
        
        
print len(bowie_albums)
    



Now make a loop that prints the track number and title of every track from the album "Heroes" only.

In [None]:

for i in bowie_all:
    
    if i['album']=='Heroes':
        print i['tracknum'], i['title']

Notice it's a bit awkward to have the organization this way. What we *really* want is a hierarchical structure, where we can select "Artist" then "Album" to get all the tracks for a particular album. So to get all tracks for "Heroes" we would just do: 

```python
mymusic['David Bowie']['Heroes']
```

Let's try to make this dictionary from the work we did above. Make a dictionary `mymusic` that has 1 field named 'David Bowie'. That field will also be a dictionary, with a single field called 'Heroes'. This will be a list of all the track names. You will notice that the track names are repeated. This is because the album was re-released a number of times, so the database holds duplicate copies of it. We will deal with these duplicates later. For now, here is a quick way to get only the unique elements: 

```python
set(mymusic['David Bowie']['Heroes'])
```

`set` is a data type I haven't shown you yet. It is like a list, except that it only keeps unique items (no repeats), it does not keep its items in any particular order, and you cannot index into it with `[0]`. After you fill in the track list, change `mymusic['David Bowie']['Heroes']` to be a `set` so it only lists the unique tracks. 
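
As a quick illustration of how a `set` drops repeats, here is a small sketch on a hypothetical track list (the name `titles` and its contents are made up for this example, standing in for an album that appears twice in the database).

```python
# Hypothetical track list with one repeated title
titles = ['Beauty and the Beast', 'Joe the Lion', 'Beauty and the Beast']

unique_titles = set(titles)  # the repeat collapses into a single entry
print(len(titles))
print(len(unique_titles))

# A set has no guaranteed order and cannot be indexed, so turn it back
# into a list (sorted here) whenever order or indexing matters
unique_list = sorted(unique_titles)
print(unique_list[0])
```

Converting back with `sorted` is one option; `list(unique_titles)` also works if the order does not matter.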

In [None]:
mymusic = {} #make a dictionary mymusic...
mymusic['David Bowie'] = {} #...that has 1 field named 'David Bowie', which is also a dictionary...
mymusic['David Bowie']['Heroes'] = [] #...with a single field called 'Heroes', which is a list

#same code as above
for i in bowie_all:
    
    if i['album']=='Heroes':
        mymusic['David Bowie']['Heroes'].append(i['title']) #if it's in the album, append to our list!

mymusic['David Bowie']['Heroes'] = set(mymusic['David Bowie']['Heroes'])  #get rid of duplicates

print mymusic['David Bowie']['Heroes']

Notice we gave up some information. We can't get the album-level information, like the year. Make it so we can do this: 

```python
mymusic['David Bowie']['Heroes']['year']
mymusic['David Bowie']['Heroes']['tracks']
```


In [None]:
mymusic = {} 
mymusic['David Bowie'] = {} 
mymusic['David Bowie']['Heroes'] = {} 


tracks = []
years = []


for track in bowie_all:
    
    if track['album']=='Heroes':
        tracks.append(track['title']) 
        years.append(track['year'])
        

tracks = set(tracks) #removing duplicates
year = years[0] #just grab the first one


mymusic['David Bowie']['Heroes']['tracks'] = tracks
mymusic['David Bowie']['Heroes']['year'] = year


print mymusic['David Bowie']['Heroes']['tracks']
print mymusic['David Bowie']['Heroes']['year']
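
It can help to see the target shape written out directly as a nested literal. The sketch below uses a made-up name, `mymusic_sketch`, and only two hand-typed tracks, so it is an illustration of the structure rather than the real contents; the year is typed in as a number here, which may not match how the JSON file stores it.

```python
# Hypothetical, hand-typed version of the nested structure
mymusic_sketch = {
    'David Bowie': {                  # artist level
        'Heroes': {                   # album level
            'year': 1977,             # album-level information
            'tracks': set(['Beauty and the Beast', 'Joe the Lion']),
        },
    },
}

album_info = mymusic_sketch['David Bowie']['Heroes']
print(album_info['year'])
print(sorted(album_info['tracks']))  # sorted, because sets have no fixed order
```

Each pair of square brackets in `mymusic_sketch['David Bowie']['Heroes']['year']` steps one level deeper into the nesting.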

We want even *more* organization though. Make a new dictionary called `mymusic2`. Now we want to get all albums from `mymusic2` like so: 

```python
mymusic2['David Bowie']['Albums']
```

This should return a `list`. Each element in the list should be a dictionary. That dictionary should have the fields `title` and `tracks`. First, just fill it with information from the Heroes album. Confirm that it works by printing the results of:

```python 
mymusic2['David Bowie']['Albums'][0]['title']
mymusic2['David Bowie']['Albums'][0]['tracks']
```

In [None]:
mymusic2 = {}

mymusic2['David Bowie'] = {}
mymusic2['David Bowie']['Albums'] = []
mymusic2['David Bowie']['Albums'].append({'title': '','tracks': []})


for track in bowie_all:
    album = track['album']

    if track['album']=='Heroes':
        mymusic2['David Bowie']['Albums'][0]['tracks'].append(track['title'])
        mymusic2['David Bowie']['Albums'][0]['title'] = track['album']
        mymusic2['David Bowie']['Albums'][0]['year'] = track['year']

#optional - get rid of duplicates
mymusic2['David Bowie']['Albums'][0]['tracks'] = set(mymusic2['David Bowie']['Albums'][0]['tracks'])

print mymusic2['David Bowie']['Albums'][0]['title']
print mymusic2['David Bowie']['Albums'][0]['tracks']

Now see if you can loop through all of `bowie_all` and organize the information in this way. Make sure that you also have fields for `album_id` and `year`. You can skip `tracknum` and `length` for now. 

The hardest part will be keeping track of the changes in the album as you're looping through each individual song. You should have a variable called `previous_album` and another one called `current_album`. These should be updated as you loop through `bowie_all`. If the album changes, you should make a new entry in `mymusic2`. Either way, you should add the individual track to `mymusic2['David Bowie']['Albums'][<album number>]['tracks']`.

In [None]:
from pprint import pprint


#this loops through bowie_all and organizes the information

mymusic2 = {}
mymusic2['David Bowie'] = {}
mymusic2['David Bowie']['Albums'] = []

previous_album = ''
album_num = -1

for track in bowie_all:
    current_album = track['album']

    if current_album != previous_album: #if the album changed
        album_num = album_num + 1
        previous_album = current_album #call the current one the new "previous album"
        mymusic2['David Bowie']['Albums'].append({}) #set up a blank dictionary
        mymusic2['David Bowie']['Albums'][album_num]['tracks'] = []
        mymusic2['David Bowie']['Albums'][album_num]['title'] = current_album #fill in album info
        mymusic2['David Bowie']['Albums'][album_num]['year'] = track['year']
        #assumes the track dictionaries store the id under the key 'album_id'; .get gives None if they don't
        mymusic2['David Bowie']['Albums'][album_num]['album_id'] = track.get('album_id')

    #every track, including the first one on a new album, goes in the "tracks" list
    mymusic2['David Bowie']['Albums'][album_num]['tracks'].append(track['title'])

        
pprint(mymusic2)

In [None]:
print mymusic2['David Bowie']['Albums'][0]['title']
print mymusic2['David Bowie']['Albums'][0]['tracks']
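
Comparing each track's album with the previous one only works when all the tracks of an album sit next to each other in `bowie_all`. As an alternative, here is a minimal sketch that remembers where each album lives in the list, so the order of the tracks no longer matters. The names `sample_tracks`, `albums`, `position`, and `sketch` are made up for this example, and the sample data is hand-written rather than taken from the JSON file. One caveat: this version merges every release that shares a title into a single entry, which may or may not be what you want.

```python
# Hand-written sample; note that the two 'Low' tracks are NOT next to each other
sample_tracks = [
    {'album': 'Low', 'title': 'Speed of Life', 'year': 1977},
    {'album': 'Heroes', 'title': 'Beauty and the Beast', 'year': 1977},
    {'album': 'Low', 'title': 'Breaking Glass', 'year': 1977},
]

albums = []    # list of album dictionaries, in first-seen order
position = {}  # album title -> index of that album inside `albums`

for track in sample_tracks:
    title = track['album']
    if title not in position:  # first time we see this album
        position[title] = len(albums)
        albums.append({'title': title, 'year': track['year'], 'tracks': []})
    # every track is appended, including the first one of each album
    albums[position[title]]['tracks'].append(track['title'])

sketch = {'David Bowie': {'Albums': albums}}
print(sketch['David Bowie']['Albums'][0]['title'])
print(sketch['David Bowie']['Albums'][0]['tracks'])
```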

With this new organization, see if you can create a list of the titles of all David Bowie albums. Start with a variable called `artist`, set it equal to `mymusic2['David Bowie']`, and loop through its `Albums` list. Save the list of album titles as the variable `all_albums`.


In [None]:
artist = mymusic2['David Bowie']

all_albums = [] #will hold one title per album entry
    
for album in artist['Albums']:
    all_albums.append(album['title'])

print all_albums

Now make some code that takes the `artist` variable and an album name (a string). From that information it should produce a list of the track names for that album. If it cannot find the album, that list will be empty. 

In [None]:
tracklist = []
  
album = 'Heroes' #change this to anything   
    
for i in artist['Albums']:
    if i['title']==album:
        tracklist = i['tracks']

print tracklist
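
Since this lookup takes two inputs and produces one output, it fits naturally into a function. The sketch below is one possible way to write it; the function name `get_tracks` and the hand-written `sample_artist` are made up for illustration. Unlike the loop above, it stops at the first album whose title matches instead of keeping the last one.

```python
def get_tracks(artist, album_title):
    """Return the track list for album_title, or an empty list if it is not found."""
    for album in artist['Albums']:
        if album['title'] == album_title:
            return album['tracks']  # stop at the first matching album
    return []                       # no album had that title

# Hypothetical artist dictionary with the same layout as mymusic2['David Bowie']
sample_artist = {'Albums': [
    {'title': 'Low', 'tracks': ['Speed of Life', 'Breaking Glass']},
    {'title': 'Heroes', 'tracks': ['Beauty and the Beast', 'Joe the Lion']},
]}

print(get_tracks(sample_artist, 'Heroes'))
print(get_tracks(sample_artist, 'No Such Album'))
```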

Why are we doing all this work? Simple: so you can make me a script that generates a David Bowie playlist. Copying and pasting from your code above, see if you can generate a list of 30 random Bowie songs. It should do this by selecting 10 albums at random and pulling 3 random tracks from each one.  

In [None]:
#NOTE: random.sample raises a ValueError if you ask for 3 random songs from an album
#that has fewer than 3, so we only pick from albums that have enough tracks.

import random


num_albums = 10
num_tracks = 3

artist = mymusic2['David Bowie']

#keep only the albums that can supply num_tracks songs
eligible_albums = []

for album in artist['Albums']:
    if len(album['tracks']) >= num_tracks:
        eligible_albums.append(album)


randalbums = random.sample(eligible_albums, num_albums) #grab random albums

tracklist = [] #list to hold our random tracks

for album in randalbums:
    randtracks = random.sample(album['tracks'], num_tracks) #grab random tracks
    tracklist.extend(randtracks) #add to our tracklist

random.shuffle(tracklist) #so albums aren't grouped together

print tracklist
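
The way `random.sample` behaves on a short list can be checked in isolation. This is a minimal sketch with a hypothetical two-song album (the names `short_album`, `picked`, `failed`, and `safe` are made up for the example): asking for more items than the list holds raises a `ValueError`, and capping the request with `min` is one way to avoid it.

```python
import random

short_album = ['Track A', 'Track B']  # hypothetical album with only two songs

picked = random.sample(short_album, 2)  # fine: 2 requested, 2 available

try:
    random.sample(short_album, 3)       # 3 requested, only 2 available
    failed = False
except ValueError:
    failed = True
print(failed)

# Never ask for more songs than the album actually has
safe = random.sample(short_album, min(3, len(short_album)))
print(len(safe))
```

Capping with `min` keeps the script running but can make the playlist shorter than planned; skipping short albums keeps the playlist at full length.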
