# Nested Loops Lab

### Introduction

In this lesson, we'll review working with lists of dictionaries.  We'll be asked to select attributes from each dictionary in the list, and to coerce data in each dictionary.  When working on these tasks, we recommend operating on a single element before solving the problem for all of the elements, so please use that technique.
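
The single-element-first technique can be sketched roughly as follows. The two records are shaped like the data used in this lab, and the names `sample_songs`, `first_rank`, and `ranks` are illustrative only.

```python
# Two records shaped like the ones this lab works with (all values are strings)
sample_songs = [
    {'Rank': '1', 'Song': '"Blinding Lights"', 'Streams (billions)': '3.730'},
    {'Rank': '2', 'Song': '"Shape of You"', 'Streams (billions)': '3.580'},
]

# Step 1: get the operation right for a single element
first = sample_songs[0]
first_rank = int(first['Rank'])

# Step 2: once that works, repeat the same operation inside a loop
ranks = []
for song in sample_songs:
    ranks.append(int(song['Rank']))

print(first_rank)  # 1
print(ranks)       # [1, 2]
```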

### Loading our Data

In [51]:
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_most-streamed_songs_on_Spotify"

# read_html returns a list of DataFrames, one for each table found on the page
dfs = pd.read_html(url)

In [52]:
df = dfs[0]

In [53]:
songs = df.to_dict('records')
songs[:3]

[{'Rank': '1',
  'Song': '"Blinding Lights"',
  'Artist(s)': 'The Weeknd',
  'Streams (billions)': '3.730',
  'Release date': 'November 29, 2019',
  'Ref.': '[4][5]'},
 {'Rank': '2',
  'Song': '"Shape of You"',
  'Artist(s)': 'Ed Sheeran',
  'Streams (billions)': '3.580',
  'Release date': 'January 6, 2017',
  'Ref.': '[6]'},
 {'Rank': '3',
  'Song': '"Someone You Loved"',
  'Artist(s)': 'Lewis Capaldi',
  'Streams (billions)': '2.911',
  'Release date': 'November 8, 2018',
  'Ref.': '[7]'}]

### Working with our data

Now that we've downloaded our data, let's start exploring it.  Begin by selecting the first element from our list of songs, and assign it to the variable `first_song`.

In [54]:
first_song = songs[0]

first_song

# {'Rank': '1',
#  'Song[A]': '"Blinding Lights"',
#  'Streams(billions)': '3.374',
#  'Artist(s)': 'The Weeknd',
#  'Release date': '29 November 2019',
#  'Ref(s)': '[8][9]'}

{'Rank': '1',
 'Song': '"Blinding Lights"',
 'Artist(s)': 'The Weeknd',
 'Streams (billions)': '3.730',
 'Release date': 'November 29, 2019',
 'Ref.': '[4][5]'}

Now take a look at the last three songs in the list.

In [55]:
songs[-3:]

# [{'Rank': '99',
#   'Song[A]': '"Sugar"',
#   'Streams(billions)': '1.498',
#   'Artist(s)': 'Maroon 5',
#   'Release date': '29 August 2014',
#   'Ref(s)': '[143][144]'},
#  {'Rank': '100',
#   'Song[A]': '"Youngblood"',
#   'Streams(billions)': '1.489',
#   'Artist(s)': '5 Seconds of Summer',
#   'Release date': '12 April 2018',
#   'Ref(s)': '[145][146]'},
#  {'Rank': 'As of 25 January 2023',
#   'Song[A]': 'As of 25 January 2023',
#   'Streams(billions)': 'As of 25 January 2023',
#   'Artist(s)': 'As of 25 January 2023',
#   'Release date': 'As of 25 January 2023',
#   'Ref(s)': nan}]

[{'Rank': '99',
  'Song': '"I\'m Yours"',
  'Artist(s)': 'Jason Mraz',
  'Streams (billions)': '1.640',
  'Release date': '1 May 2008',
  'Ref.': '[130][131]'},
 {'Rank': '100',
  'Song': '"Despacito (remix)"',
  'Artist(s)': 'Luis Fonsi and Daddy Yankee featuring Justin Bieber',
  'Streams (billions)': '1.640',
  'Release date': '17 April 2017',
  'Ref.': '[132][133]'},
 {'Rank': 'As of August 8th 2023',
  'Song': 'As of August 8th 2023',
  'Artist(s)': 'As of August 8th 2023',
  'Streams (billions)': 'As of August 8th 2023',
  'Release date': 'As of August 8th 2023',
  'Ref.': 'As of August 8th 2023'}]

It looks like the last entry is not a song at all, so instead select all songs except for the last one and assign them to a list called `selected_songs`.

In [56]:
# pop removes the trailing "As of ..." footer row; note that it modifies songs in place
songs.pop()
selected_songs = songs

In [57]:
len(selected_songs)

# 100

100
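
The solution above uses `pop`, which changes `songs` itself, so running that cell a second time would also drop a real song. A slice is one non-mutating alternative. Here is a minimal sketch on a three-row stand-in list; the names `rows` and `without_footer` are illustrative and not part of the lab.

```python
# A small stand-in for the scraped list; the last entry mimics the footer row
rows = [
    {'Rank': '99', 'Song': '"I\'m Yours"'},
    {'Rank': '100', 'Song': '"Despacito (remix)"'},
    {'Rank': 'As of August 8th 2023', 'Song': 'As of August 8th 2023'},
]

# Slicing builds a new list and leaves `rows` untouched,
# so re-running this line always gives the same result
without_footer = rows[:-1]

print(len(rows))                   # 3
print(len(without_footer))         # 2
print(without_footer[-1]['Rank'])  # 100
```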

And let's confirm that the last song is in fact a song.  Select the last song from `selected_songs`.

In [58]:
last_song = selected_songs[-1]
last_song

# {'Rank': '100',
#  'Song[A]': '"Youngblood"',
#  'Streams(billions)': '1.489',
#  'Artist(s)': '5 Seconds of Summer',
#  'Release date': '12 April 2018',
#  'Ref(s)': '[145][146]'}

{'Rank': '100',
 'Song': '"Despacito (remix)"',
 'Artist(s)': 'Luis Fonsi and Daddy Yankee featuring Justin Bieber',
 'Streams (billions)': '1.640',
 'Release date': '17 April 2017',
 'Ref.': '[132][133]'}

Next, let's practice selecting data.  From the above list of dictionaries, create a list of just the name of each of the songs and assign it to the variable `names`.

In [59]:
names = []
for song in selected_songs:
  names.append(song['Song'])


In [60]:
names[:3]

# ['"Blinding Lights"', '"Shape of You"', '"Dance Monkey"']

['"Blinding Lights"', '"Shape of You"', '"Someone You Loved"']
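
The same selection can also be written as a list comprehension. This is only a sketch on three sample records taken from the output above; `sample_songs` and `sample_names` are illustrative names.

```python
sample_songs = [
    {'Song': '"Blinding Lights"', 'Artist(s)': 'The Weeknd'},
    {'Song': '"Shape of You"', 'Artist(s)': 'Ed Sheeran'},
    {'Song': '"Someone You Loved"', 'Artist(s)': 'Lewis Capaldi'},
]

# Read as: "the 'Song' value of each song in sample_songs"
sample_names = [song['Song'] for song in sample_songs]

print(sample_names)
# ['"Blinding Lights"', '"Shape of You"', '"Someone You Loved"']
```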

Now find the *number* of artists that were listed more than once on the top 100 songs list.

In [61]:
artists = set()
dupes = set()
for song in songs:
  artist = song['Artist(s)']
  if artist in artists:
    dupes.add(artist)
  else:
    artists.add(artist)

print(len(dupes))
# 13

15
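
Another way to approach this count is with `collections.Counter` from the standard library. The sketch below uses made-up placeholder artists (`'Artist A'` and so on) rather than the real table, so the printed number is for the sample only.

```python
from collections import Counter

# Hypothetical records; only the 'Artist(s)' key matters here
sample_songs = [
    {'Artist(s)': 'Artist A'},
    {'Artist(s)': 'Artist B'},
    {'Artist(s)': 'Artist A'},
    {'Artist(s)': 'Artist C'},
    {'Artist(s)': 'Artist B'},
    {'Artist(s)': 'Artist A'},
]

# Count how many times each exact 'Artist(s)' string appears
artist_counts = Counter(song['Artist(s)'] for song in sample_songs)

# Keep only the artists that appear more than once
repeat_artists = [artist for artist, count in artist_counts.items() if count > 1]

print(len(repeat_artists))  # 2
```

Like the loop above, this compares the whole `Artist(s)` string, so a collaboration is counted separately from each artist's solo entries.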


Ok, now if we return to our original list of dictionaries, there is certain data that does not look like it's of the correct type.

Change the `Rank` values to integers, and the `Streams (billions)` values to floats. It also looks like each of the song names has an extra single or double quotation mark at the beginning and end.  Remove these extra quotation marks from each of the song names.

Assign this new list of songs to the variable `coerced_songs`.

In [62]:
coerced_songs = []
for song in songs:
    song['Rank'] = int(song['Rank'])
    song['Streams (billions)'] = float(song['Streams (billions)'])
    song['Song'] = song['Song'].replace('"', '')
    coerced_songs.append(song)


In [63]:
coerced_songs[:2]

# [{'Rank': 1,
#   'Song[A]': 'Blinding Lights',
#   'Streams(billions)': 3.374,
#   'Artist(s)': 'The Weeknd',
#   'Release date': '29 November 2019',
#   'Ref(s)': '[8][9]'},
#  {'Rank': 2,
#   'Song[A]': 'Shape of You',
#   'Streams(billions)': 3.354,
#   'Artist(s)': 'Ed Sheeran',
#   'Release date': '6 January 2017',
#   'Ref(s)': '[10]'}]

[{'Rank': 1,
  'Song': 'Blinding Lights',
  'Artist(s)': 'The Weeknd',
  'Streams (billions)': 3.73,
  'Release date': 'November 29, 2019',
  'Ref.': '[4][5]'},
 {'Rank': 2,
  'Song': 'Shape of You',
  'Artist(s)': 'Ed Sheeran',
  'Streams (billions)': 3.58,
  'Release date': 'January 6, 2017',
  'Ref.': '[6]'}]
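
One possible variation, shown here as a sketch rather than the required solution, is to put the coercion in a small function that returns a new dictionary. The function name `coerce_song` is hypothetical. It uses `strip`, which only removes quotation marks at the two ends of the title, whereas `replace` removes every double quote in the string.

```python
def coerce_song(song):
    """Return a new dict with Rank as int, Streams as float, and an unquoted title."""
    coerced = dict(song)  # shallow copy, so the original record is left unchanged
    coerced['Rank'] = int(song['Rank'])
    coerced['Streams (billions)'] = float(song['Streams (billions)'])
    # strip removes any leading or trailing " or ' characters
    coerced['Song'] = song['Song'].strip('"\'')
    return coerced

raw = {'Rank': '1', 'Song': '"Blinding Lights"', 'Streams (billions)': '3.730'}
clean = coerce_song(raw)

print(clean)
# {'Rank': 1, 'Song': 'Blinding Lights', 'Streams (billions)': 3.73}
print(raw['Rank'])  # still the string '1'
```

A title that legitimately ends in an apostrophe would lose it with this `strip` call, so treat the set of stripped characters as an assumption to check against the data.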

Now if we select the `Rank` and `Streams (billions)` values from any of the dictionaries, we should see that they are of type integer and float, respectively.

In [64]:
first_coerced = coerced_songs[0]

type(first_coerced['Rank'])

# int

int

In [65]:
type(first_coerced['Streams (billions)'])

# float

float
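
Checking one dictionary is a good first step. To check every dictionary at once, `all` combined with `isinstance` is one option. The sketch below runs on three sample records, and the names `sample_coerced`, `ranks_ok`, and `streams_ok` are illustrative.

```python
sample_coerced = [
    {'Rank': 1, 'Streams (billions)': 3.73},
    {'Rank': 2, 'Streams (billions)': 3.58},
    {'Rank': 3, 'Streams (billions)': 2.911},
]

# all(...) is True only if the check passes for every song
ranks_ok = all(isinstance(song['Rank'], int) for song in sample_coerced)
streams_ok = all(isinstance(song['Streams (billions)'], float) for song in sample_coerced)

print(ranks_ok, streams_ok)  # True True
```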

And if we view the title of even the last song, we should see that the first character is no longer a quotation mark but a letter.

In [66]:
coerced_songs[-1]['Song'][:1]

# 'Y'

'D'

Now that we have this list of `coerced_songs`, let's update our list of dictionaries even further.  If we look at one of the dictionaries, we'll see that the release date is stored as a single string, which makes it hard to work with the day, month, or year on its own.

In [67]:
coerced_songs[0]['Release date']



'November 29, 2019'

We'd like to create three new keys on each of the dictionaries: `day`, `month` and `year`.  Also remove the `'Release date'` key, as the information in this key would then be duplicative.

> You can delete a key from a dictionary with the pop method.

In [None]:
blinding_lights = {'Rank': 1,
 'Song[A]': 'Blinding Lights',
 'Streams(billions)': 3.374,
 'Artist(s)': 'The Weeknd',
 'Release date': '29 November 2019',
 'Ref(s)': '[8][9]'}

blinding_lights.pop('Release date')

'29 November 2019'

In [None]:
blinding_lights

{'Rank': 1,
 'Song[A]': 'Blinding Lights',
 'Streams(billions)': 3.374,
 'Artist(s)': 'The Weeknd',
 'Ref(s)': '[8][9]'}
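
Note that `pop` raises a `KeyError` if the key is not in the dictionary, unless a default value is passed as the second argument. A minimal sketch, using a hypothetical two-key `record`:

```python
record = {'Rank': 1, 'Release date': '29 November 2019'}

removed = record.pop('Release date')        # returns the removed value
missing = record.pop('Release date', None)  # key is already gone, so the default is returned

print(removed)  # 29 November 2019
print(missing)  # None
print(record)   # {'Rank': 1}
```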

Ok, so create three new keys of `day`, `month` and `year` for each of our `coerced_songs` and then remove the `'Release date'` key.

Assign the new list to the variable `dated_songs`.

> Hint: It's easier if you accomplish this for one song first, before trying to solve this for all songs.

In [68]:
dated_songs = []
for song in coerced_songs:
    # Assumes each date looks like 'November 29, 2019' or '29 November 2019'.
    # Removing the comma first keeps it out of the day value.
    dates = song['Release date'].replace(',', '').split(' ')
    if dates[0].isnumeric():
        day, month = dates[0], dates[1]
    else:
        month, day = dates[0], dates[1]

    song['day'] = int(day)
    song['month'] = month
    song['year'] = int(dates[2])
    song.pop('Release date')
    dated_songs.append(song)
print(dated_songs[:3])



[{'Rank': 1, 'Song': 'Blinding Lights', 'Artist(s)': 'The Weeknd', 'Streams (billions)': 3.73, 'Ref.': '[4][5]', 'day': 29, 'month': 'November', 'year': 2019}, {'Rank': 2, 'Song': 'Shape of You', 'Artist(s)': 'Ed Sheeran', 'Streams (billions)': 3.58, 'Ref.': '[6]', 'day': 6, 'month': 'January', 'year': 2017}, {'Rank': 3, 'Song': 'Someone You Loved', 'Artist(s)': 'Lewis Capaldi', 'Streams (billions)': 2.911, 'Ref.': '[7]', 'day': 8, 'month': 'November', 'year': 2018}]


In [None]:
# dated_songs[:2]

# [{'Rank': 1,
#   'Song[A]': 'Blinding Lights',
#   'Streams(billions)': 3.374,
#   'Artist(s)': 'The Weeknd',
#   'Ref(s)': '[8][9]',
#   'day': 29,
#   'month': 'November',
#   'year': 2019},
#  {'Rank': 2,
#   'Song[A]': 'Shape of You',
#   'Streams(billions)': 3.354,
#   'Artist(s)': 'Ed Sheeran',
#   'Ref(s)': '[10]',
#   'day': 6,
#   'month': 'January',
#   'year': 2017}]
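
As an alternative to splitting the string by hand, the standard library's `datetime.strptime` can parse the date. The sketch below assumes the two date styles seen in this lab's data and an English locale for month names. The names `DATE_FORMATS` and `split_release_date` are hypothetical.

```python
from datetime import datetime

# Assumed formats: the two styles that appear in this lab's data
DATE_FORMATS = ('%B %d, %Y', '%d %B %Y')

def split_release_date(text):
    """Return (day, month_name, year) for a date string in one of DATE_FORMATS."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format, try the next one
        return parsed.day, parsed.strftime('%B'), parsed.year
    raise ValueError(f'Unrecognized date format: {text!r}')

print(split_release_date('November 29, 2019'))  # (29, 'November', 2019)
print(split_release_date('6 January 2017'))     # (6, 'January', 2017)
```

Because the function raises on anything it cannot parse, an unexpected date format shows up as an error instead of silently producing a wrong day or year.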

### Bonus

Ok, now remember that we like to convert as many values as possible to numbers.  One of the attributes that perhaps should be a number is the month.  We'd like to convert `January` to `1` and `November` to `11`, for example.  We'll get you started by creating a dictionary that maps each month name to its corresponding number.

In [37]:
month_nums = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5, 'June': 6, 'July': 7, 'August': 8, 'September': 9, 'October': 10, 'November': 11, 'December': 12}

And now notice that if we pass any month name as a key, we are returned the corresponding value.

In [None]:
month_nums['January']

1
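
If you would rather not type the mapping by hand, a similar dictionary can be built from the standard library's `calendar.month_name`. This is a sketch that assumes an English locale; `generated_month_nums` is an illustrative name, and `'Smarch'` is a deliberately invalid month.

```python
import calendar

# calendar.month_name[0] is an empty string, so start the mapping at index 1
generated_month_nums = {calendar.month_name[i]: i for i in range(1, 13)}

print(generated_month_nums['January'])   # 1
print(generated_month_nums['November'])  # 11

# .get returns None instead of raising KeyError for an unknown name
print(generated_month_nums.get('Smarch'))  # None
```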

So use the above to convert the `month` attribute of each of the `dated_songs` to the corresponding number.  Assign the result to the list `formatted_songs`.

In [69]:
formatted_songs = []
for song in dated_songs:
    # Look up the month name in month_nums to get its number
    song['month'] = month_nums[song['month']]
    formatted_songs.append(song)

And now we can see that the month of each song is represented by a number.

In [70]:
formatted_songs[0]['month']

11

In [71]:
formatted_songs[1]['month']

1
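
One thing worth noticing about the loops in this lab: they update each dictionary in place and append that same object to the new list, so `coerced_songs`, `dated_songs` and `formatted_songs` all end up holding the same dictionaries. The sketch below shows the effect on two tiny records and one way to build new dictionaries instead. The names `original`, `lookup`, `shared`, and `converted` are illustrative.

```python
original = [{'month': 'November'}, {'month': 'January'}]
lookup = {'January': 1, 'November': 11}

# Appending the same dict object: both lists now share it
shared = []
for song in original:
    shared.append(song)
print(shared[0] is original[0])  # True

# Building a new dict per song leaves `original` unchanged
converted = [{**song, 'month': lookup[song['month']]} for song in original]
print(converted[0]['month'])  # 11
print(original[0]['month'])   # November
```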

### Summary

In this lesson, we practiced selecting and coercing individual attributes from a list of dictionaries.  Our main goal was to ensure that the attributes of each element were of the correct datatype, by coercing our data to numeric values when necessary.