# 1.0 Uploading files from your local file system


```python
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
```

```
Saving top100.csv to top100 (1).csv
User uploaded file "top100.csv" with length 4707 bytes
```


# 2.0 Iterations and List Comprehensions

In the previous lessons we learned how to iterate over multiple values using a **for** loop. To review, let's look at a **for** loop in action:

```python
streams = [57,62,63,99,142]
average = 84

diff = []
for num in streams:
    diff.append(num - average)
```

Assuming the average number of music streams is **84**, we wrote three lines of code to compare the number of music streams against the average:

```python
diff = []
for num in streams:
    diff.append(num - average)
```

In this section, we'll show you how to rewrite this loop in **one line of code**.

Throughout this mission, we'll attempt to answer one question:

<img width="400" src="https://drive.google.com/uc?export=view&id=1c3KmRv2N3KqgGk-i1K60AYGw0gBsacfg">

In our quest to find the dominant artist of 2017, we'll be using the same [Spotify's WorldWide Daily Song Ranking dataset](https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking). Using this dataset, we'll learn:

- How to transform a list of strings into a dictionary of counts.
- How to transform a three-line for loop into one beautiful line of code.
- How to write functions quickly and succinctly.
- How to work around errors in our code.

Let's get started!

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Use the **csv** module to read **"top100.csv"** into a list and assign it to **music**.
2. Preview the first few rows of the dataset to get a feel for what the column names are and what the data looks like.
3. To find the most dominant artist of 2017, we'll need to extract the artist names:
  - Create an empty list called **artists**.
  - Loop through **music** (skipping the header row) and append each artist name to **artists**.

```python
# put your code here
```

## 2.1 Extract the Artists Using a List Comprehension

In the previous section, we wrote 3 lines of code to extract the artist names from our **music** dataset:

```python
artists = []
for row in music[1:]:
    artists.append(row[1])
```

However, we can re-write this for loop in one line of code using a **list comprehension**. A list comprehension is a concise way of creating lists.

Let's take a look at an example of a **for** loop that calculates the difference between the values and the average:

```python
streams = [57,62,63,99,142]
average = 84

diff = []
for num in streams:
    diff.append(num - average)
```

We've created a new list called **diff**. Here's the equivalent list comprehension:

```python
diff = [(num-average) for num in streams]
```
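To check that the two versions really are equivalent, here's a small self-contained sketch that builds **diff** both ways and compares the results:

```python
streams = [57, 62, 63, 99, 142]
average = 84

# Version 1: a for loop that appends one value per iteration
diff_loop = []
for num in streams:
    diff_loop.append(num - average)

# Version 2: the same computation as a list comprehension
diff_lc = [num - average for num in streams]

print(diff_loop)             # [-27, -22, -21, 15, 58]
print(diff_loop == diff_lc)  # True
```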

Let's see the different components of a for loop converted into a list comprehension. Let's start by looping through our list:

<img width="500" src="https://drive.google.com/uc?export=view&id=1oRiLRrSDFbqVjMqiKoez9veCzbYwrdz1">


**num**, in this case, is the **loop variable**, and **streams** is the **iterable**: an object Python can step through one value at a time. On each pass of the loop, Python takes the next value from the iterable and assigns it to the loop variable. Read more about this concept [here](https://docs.python.org/3/tutorial/classes.html#iterators).
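You can watch this happen by hand with the built-in **iter()** and **next()** functions, which is roughly what a **for** loop does behind the scenes:

```python
streams = [57, 62, 63]

# iter() asks the iterable (the list) for an iterator
iterator = iter(streams)

# Each call to next() hands back the next value,
# just as a for loop assigns it to the loop variable
first = next(iterator)   # 57
second = next(iterator)  # 62
third = next(iterator)   # 63

# Once the values run out, next() raises StopIteration,
# which is the signal a for loop uses to stop
try:
    next(iterator)
except StopIteration:
    print("No more values")
```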

Now, let's define what we'd like to transform each value of the loop variable into:


<img width="500" src="https://drive.google.com/uc?export=view&id=1ZXJAfIhgb3AxyIrWa2Y2bVs9Y6EV3p4O">

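Finally, let's handle the append() step. In a comprehension there's no explicit append() call: the surrounding square brackets collect each transformed value into the new list: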

<img width="500" src="https://drive.google.com/uc?export=view&id=1iJW9nHgeBFSVWlkLv1Q_jWpvQ_hom7SF">

Now that we understand list comprehensions, let's rewrite the code from the last screen's exercise as a list comprehension.

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Convert the previous **for** loop into a list comprehension.
2. Store this list in **artists_lc.**

```python
# put your code here
```


## 2.2 Getting the Artist Count Using a Function

We've extracted the artist names from **music** into a separate list named **artists**. Our next step is to count the number of times each artist name appears in our dataset. Here's what the first 5 items of **artists** look like:

```python
['Ed Sheeran',
 'Luis Fonsi',
 'Luis Fonsi',
 'The Chainsmokers',
 'Kendrick Lamar',
 ]
```

We'll first write our own function for counting. In the next screen, we'll make this calculation using a pre-existing module.
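The core idea behind a counting function is a dictionary whose values grow by one each time a key is seen. Here's a minimal sketch on a small made-up list of genres (the list and the name **count_values** are hypothetical, chosen only for illustration):

```python
def count_values(values):
    """Return a dictionary mapping each unique value to how often it appears."""
    counts = {}
    for value in values:
        if value in counts:
            counts[value] += 1   # seen before: increment its count
        else:
            counts[value] = 1    # first time: start its count at 1
    return counts

genres = ["pop", "hip hop", "pop", "latin", "pop"]
genre_counts = count_values(genres)
print(genre_counts)  # {'pop': 3, 'hip hop': 1, 'latin': 1}
```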

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Write a function called **counter()**. The function should do the following:
  - Accept a list of artists as an argument.
  - Build a dictionary of counts for each unique artist:
    - The key should be the artist name.
    - The value should be the number of times that artist appears.
  - Return the dictionary.
2. Pass in **artists** to the **counter()** function and store the returned result in **counts**.

```python
# put your code here
```

## 2.3 Getting the Artist Count Using Collections

In the previous section, we wrote our own **counter()** function to practice creating functions. In most real-world scenarios, it makes more sense to use a tool from Python's standard library.

So far, we've used lists and dictionaries to solve specific problems. Lists and dictionaries are **data structures** that organize data in specific ways. In a list, the data is organized by an incrementing integer index (**0** to **n-1**). In a dictionary, the data is organized by arbitrary keys that we can specify.

The [collections](https://docs.python.org/3.3/library/collections.html) module contains a **Counter** class that replicates the same functionality: it counts the number of occurrences of each value within a data structure.

The returned Counter object behaves very similarly to a dictionary, but it has other useful methods. To use **Counter()**, just pass any iterable object to its constructor. Here's an example where we pass in a string value:

```python
from collections import Counter

Counter("hello")
```

Running this code returns a **Counter** object:

```python
Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
```

You'll notice that the Counter lists elements from most to least common, so **'l'** comes first. Now, let's pass in a list:

```python
letters = ["a", "a", "a", "b"]
Counter(letters)
```

This will return:

```python
Counter({'a': 3, 'b': 1})
```
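Because **Counter** is a dictionary subclass, it supports ordinary key lookups, and it adds a few extras. Two that are handy here: looking up a missing key returns **0** instead of raising a **KeyError**, and the **most_common()** method returns (element, count) pairs sorted from most to least common:

```python
from collections import Counter

letter_counts = Counter(["a", "a", "a", "b"])

print(letter_counts["a"])            # 3
print(letter_counts["z"])            # 0 (missing keys count as zero)
print(letter_counts.most_common(1))  # [('a', 3)]
```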

Let's start by using the **Counter()** function to create a **Counter** object representing all of the artist names.

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


1. From the **collections** module, import **Counter**.
2. Create a **Counter** object from the values in **artists** and assign the result to **artist_counts**.

```python
# put your code here
```

## 2.4 Looping Through Counts Using Items()

To extract the top value, we'll first convert our dictionary into a list of lists. To make this conversion, we'll use the **dict.items()** method, which returns the dictionary's key-value pairs as tuples.

A **method** is a function that belongs to an object. We'll dive deeper into creating methods later on, when we learn about object-oriented programming. The main difference between a method and a function is the way they are called:

<img width="600" src="https://drive.google.com/uc?export=view&id=1gUxSkw53-XhPuQd7q-dI680pUtChTayM">


We call a function by its name and pass it arguments; the function then returns data. Data passed to a function is passed explicitly, meaning we name the argument inside the function call: **sum(artist_list)**. Here, **artist_list** is explicitly passed.

A method behaves like a function, but it is attached to a specific object. The main difference is that the object is passed implicitly, so we don't need to name it inside the parentheses: in **streams.append(1)**, Python passes **streams** to **append()** automatically. We only need to explicitly pass the value we want to append: **1**.
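One way to see the implicit passing is to call the same method through the **list** class and pass the object explicitly. The two calls below do the same kind of work:

```python
streams = [57, 62]

# Method call: streams is passed to append() implicitly
streams.append(63)

# Equivalent call through the list class: streams is passed explicitly
list.append(streams, 99)

print(streams)  # [57, 62, 63, 99]

# A function, by contrast, receives its data only as an explicit argument
print(len(streams))  # 4
```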

We'll have a better understanding of methods when we write our own classes later in this course.

The **dict.items()** method returns the key-value pairs of a dictionary as a sequence of (key, value) tuples:

```python
dictionary = {key_1: value_1, key_2: value_2}

resulting_structure = [(key_1, value_1), (key_2, value_2)]
```

Let's look at an example:

```python
sample = Counter({'21 Savage': 1, 'Alessia Cara': 1})
```

Then, we'll call the **items()** method on **sample**:

```python
sample.items()
```

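In Python 3, this returns a **dict_items** view, which behaves like a list of tuples when we loop over it:

```python
dict_items([('21 Savage', 1), ('Alessia Cara', 1)])
```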

Then, we can loop through each tuple, convert it to a list, and append it to a new list of artist names and counts:

```python
sample_list = []
for artist, count in sample.items():
    # Unpack each (artist, count) tuple and store it as a list
    sample_list.append([artist, count])
```

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Create an empty list and assign to **artist_counts_list**.
2. Use the **dict.items()** method to transform the counts dictionary into a list of tuples.
3. Write a for loop that iterates over the list of tuples:
  - Create a list from the 2 values in the tuple.
  - Append that list to **artist_counts_list.**
4. Display **artist_counts_list** using the **print()** function.


```python
from collections import Counter

artist_counts = Counter(artists)

# Add your code here
```

## 2.5 Using a List Comprehension

In the previous screen, we used a **for** loop to create the new list of lists. Now that we understand the concept of list comprehensions, let's convert our for loop into a list comprehension. To review, here's how a **for** loop converts into a list comprehension:

<img width="500" src="https://drive.google.com/uc?export=view&id=1ajinlMH6YfSdUAXHIaCmycI2vKXrKiC8">
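Applied to a small sample, the loop and the comprehension look like this. Each (key, value) tuple from **items()** is unpacked into two names, then rebuilt as a two-item list (the artist names and counts are made up for illustration):

```python
from collections import Counter

sample = Counter(["Artist A", "Artist A", "Artist B"])

# Loop version
sample_list = []
for artist, count in sample.items():
    sample_list.append([artist, count])

# List comprehension version: same unpacking, same [artist, count] output
sample_list_lc = [[artist, count] for artist, count in sample.items()]

print(sample_list_lc)  # [['Artist A', 2], ['Artist B', 1]]
```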


**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


1. Convert the **for** loop from the previous exercise into a list comprehension that assigns the result to **artist_counts_two** instead.
2. Display **artist_counts_two** using the **print()** function.






```python
from collections import Counter

artist_counts = Counter(artists)
artist_counts_list = []
for artist, count in artist_counts.items():
    artist_counts_list.append([artist, count])

# put your code here
```

## 2.6 Sorting A List of Lists


Now that we have our list of artist names and counts, to find the dominant artist of 2017, we'll need to:

- Sort our list in descending order by count (number of top 100 appearances).
- Select the first item in the sorted list.

To sort a list of values, we'll use the **list.sort()** method:

```python
streams = [54,33,76,99,123]
streams.sort()
```

When we call **list.sort()**, we don't assign the result back to a variable, as in **streams = streams.sort()**. The method sorts the list in place and returns **None**, so that assignment would replace our list with **None**.

```python
streams = [54,33,76,99,123]
streams.sort()
print(streams)
```

Printing **streams** shows all the values in sorted order:

```python
[33, 54, 76, 99, 123]
```
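If you need a sorted copy while keeping the original list intact, the built-in **sorted()** function returns a new list instead. This sketch contrasts the two:

```python
streams = [54, 33, 76, 99, 123]

# sorted() builds and returns a new list; streams is unchanged
ordered = sorted(streams)
print(ordered)   # [33, 54, 76, 99, 123]
print(streams)   # [54, 33, 76, 99, 123]

# list.sort() rearranges streams itself and returns None
result = streams.sort()
print(result)    # None
print(streams)   # [33, 54, 76, 99, 123]
```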


**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


1. Call the **sort()** method on the **artist_counts_list** nested list.
2. Select the first list from **artist_counts_list** and assign to **first_artist**.
  - Is this actually the top artist? Head to the next section to read more.

```python
# put your code here
```

## 2.7 Specifying a Key When Sorting a List of Lists

In the previous screen, **artist_counts_list.sort()** sorted our list of lists in alphabetical order:

```python
[['21 Savage', 1], ['Alessia Cara', 1], ['Avicii', 1], ['Axwell /\\ Ingrosso', 1], ['Big Sean', 1], ['Bruno Mars', 2], ['CNCO', 1], ['Calvin Harris', 2], ['Camila Cabello', 1], ['Cardi B', 1], ['Charlie Puth', 1], ['Cheat Codes', 1], ['Childish Gambino', 1], ['Chris Jeday', 1], ['Clean Bandit', 2], ['DJ Khaled', 2],
.......
 ```
 
By default, Python sorts a list of strings in alphabetical order. More precisely, strings are compared character by character by Unicode code point, so uppercase letters sort before lowercase ones; that's why **'CNCO'** appears before **'Calvin Harris'**. When sorting a list of lists, Python compares the inner lists item by item, starting with the first item (index 0), which in our case is the artist name.

If the data type is an **int** or **float**, the interpreter will automatically sort the numbers from lowest to highest:

```python
sample = [4,2,5,6,2,5]
sample.sort()
```

After sorting, **sample** contains:

```python
[2, 2, 4, 5, 5, 6]
```

However, in our scenario, sorting **artist_counts_list** by artist name doesn't tell us the dominant artist of 2017.

Instead of sorting the list of lists by the values at index 0 (artist names), we want to sort by the values at index 1 (number of top 100 appearances for that artist).

The **key** parameter lets us specify a custom function for sorting. Python passes each inner list to this function and sorts by the values it returns. Let's look at a sample list of lists:

```python
sample = [
            [1,2,3,4,5],
            [4,4,5],
            [3,2]
         ]
```

Because **sample** contains lists of varying lengths, we may be interested in sorting by the length of these lists. We can accomplish that by passing the **len** function to **key**:

```python
sample.sort(key = len)
```

Let's see how **.sort()** will sort a list of lists by the len key:



<img width="600" src="https://drive.google.com/uc?export=view&id=1wFnZGkZceFHURBOq9tXjN0fhWEzFsKdi">


After calculating the length of each inner list, Python sorts the inner lists by those lengths:


<img width="600" src="https://drive.google.com/uc?export=view&id=1uyNjNxvCLE-saR1cJZwXiH6VZ2jQzPTt">


Displaying **sample** after it was sorted this way would display:

```python
[[3, 2], [4, 4, 5], [1, 2, 3, 4, 5]]
```

To sort in descending order, we add the **reverse** parameter:

```python
sample.sort(key = len, reverse = True)
```

**sample** would then contain:

```python
[[1, 2, 3, 4, 5], [4, 4, 5], [3, 2]]
```
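**key** accepts any function that takes one element and returns something comparable. As another example, passing **str.lower** makes string sorting case-insensitive (the three names below are taken from the sorted output shown earlier):

```python
names = ["CNCO", "Calvin Harris", "Avicii"]

# Default: characters are compared by code point, so "N" sorts before "a"
print(sorted(names))                 # ['Avicii', 'CNCO', 'Calvin Harris']

# key=str.lower: each name is compared in its lowercase form
print(sorted(names, key=str.lower))  # ['Avicii', 'Calvin Harris', 'CNCO']
```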

To determine the top artist, we can write a function that just returns each list's value at index 1 (the number of top 100 appearances for that artist).


**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">



1. Sort **artist_counts_list** by the number of top 100 appearances by:
  - Using the key parameter and specifying the **by_count()** function.
  - Setting the parameter **reverse** to **True**.
2. Use indexing to select the first item of **artist_counts_list**. Assign the item to **top_artist**



```python
def by_count(artists):
    # Receives one [artist, count] list and returns its count
    return artists[1]

# put your code here
```

## 2.8 Creating An Anonymous Function

In the previous section, we defined the key parameter within the **sort()** method. We learned that the key parameter takes in a function:

```python
def by_count(artists):
    return artists[1]
  
artist_counts_list.sort(key=by_count, reverse=True)
```

In the previous section, we defined **by_count** and passed it to the **key** parameter. This took us about three lines of code. Python has two ways of writing functions. We learned the first way, using **def**, in the previous Python course. Just as we condensed a **for** loop into a list comprehension, we can condense a short function into a single expression using the **lambda** keyword.

A lambda function is a small anonymous function:

```python
f = lambda x: x + 1
```

The equivalent using def:

```python
def f(x):
    return x + 1
```

Lambda functions use shortened notation (**lambda x** instead of **def f(x)**), have no function name, and return the value of their single expression. This makes them useful for short, throwaway functions that we don't plan on reusing, and an ideal choice for the **key** parameter when calling **list.sort()**.
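Here's a small sketch of a lambda passed straight to **key**, sorting made-up [name, count] pairs from highest to lowest count:

```python
pairs = [["Artist A", 1], ["Artist B", 3], ["Artist C", 2]]

# The lambda receives one inner list and returns its count (index 1)
pairs.sort(key=lambda pair: pair[1], reverse=True)

print(pairs)     # [['Artist B', 3], ['Artist C', 2], ['Artist A', 1]]
print(pairs[0])  # ['Artist B', 3]
```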



**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Sort **artist_counts_lol** by the number of top 100 appearances using a **lambda** function.
2. Select the top list in **artist_counts_lol** and assign to **lambda_top_artist**.



```python
import csv
from collections import Counter

with open("top100.csv", "r") as f:
    music = list(csv.reader(f))

artists = [item[1] for item in music[1:]]
artist_counts_lol = [[key, value] for key, value in Counter(artists).items()]

# put your code here
```

## 2.9 Creating a Pipeline Using Modularization

So far, we've written our code in separate pieces. If we were to put it all together, it would look like spaghetti code:

```python
import csv
from collections import Counter

f = open("top100.csv", "r")
music = list(csv.reader(f))

artists = [row[1] for row in music[1:]]

artist_dict = Counter(artists)
artist_counts = [[key, value] for key, value in artist_dict.items()]

artist_counts.sort(key=lambda x: x[1], reverse=True)
```

In the previous section, we learned about **modularization**. Now, let's take it a step further by modularizing our code into a **pipeline**. A pipeline takes in an input, performs a specific set of actions, then produces an output:

<img width="600" src="https://drive.google.com/uc?export=view&id=1G9XEPAUS5PijOoYT7XVKvhal9ky7rnec">


By transforming our spaghetti code into a pipeline, we can feed new data through the same well-defined steps and expect it to be processed the same way. Each pipeline component feeds its output into the next component.
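As a minimal sketch of the idea, here's a three-step pipeline that runs on a small in-memory CSV instead of **top100.csv**. The rows, the column layout (track, artist, streams), and the function names **parse_rows()**, **count_artists()**, and **pick_top()** are all hypothetical; **io.StringIO** simply lets **csv.reader** read from a string as if it were a file:

```python
import csv
import io
from collections import Counter

SAMPLE_CSV = """Track Name,Artist,Streams
Song A,Artist X,100
Song B,Artist Y,90
Song C,Artist X,80
"""

def parse_rows(text):
    """Step 1: turn raw CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))

def count_artists(rows):
    """Step 2: count artists (skipping the header) and sort by count, highest first."""
    counts = Counter(row[1] for row in rows[1:])
    pairs = [[artist, count] for artist, count in counts.items()]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs

def pick_top(pairs):
    """Step 3: return the first (most frequent) [artist, count] pair."""
    return pairs[0]

# Each step's output is the next step's input
top = pick_top(count_artists(parse_rows(SAMPLE_CSV)))
print(top)  # ['Artist X', 2]
```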

We're creating a pipeline that takes in the name of a CSV file and returns the most dominant artist. Let's transform our spaghetti code into a pipeline!


**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

1. Build a three-function pipeline that recreates the work we did in this mission, starting with the first two functions.
2. Create a **read_data()** function that:
  - Accepts a filename string as its sole parameter.
  - Reads the file into a list and returns the list representation.
3. Create a **clean_data()** function that:
  - Accepts the list representation of the data as its sole parameter.
  - Uses multiple lines of code to convert this list into a list of [artist, count] lists, sorted by count in descending order (as we did earlier).
  - Returns the list of lists representation of the data.
4. Uncomment the commented code when you're ready to run the full pipeline!

```python
# Add your functions here

# Uncomment when ready
# music_as_list = read_data("top100.csv")
# sorted_lol = clean_data(music_as_list)
```

## 2.10 How to deal with errors

Now that we've built a data pipeline, we can pass new data through it. However, if our dataset contains even one malformed row, Python will raise an error and halt execution of the function or pipeline.

In most cases, we should dive in and fix the code causing the error. However, if the problem occurs only rarely, or there's a chance of an unexpected error occurring, we can use a **try/except** statement. A **try/except** statement works a bit like **if/else**: Python runs the code in the **try** block, and if that code raises a specific error, execution jumps to the **except** block instead of halting.

Let's say we wanted to find the total number of streams in a list by adding every value in a list:


<img width="500" src="https://drive.google.com/uc?export=view&id=1rMhpcByvdTfOueowHPyECXUXZJPLFYrb">

We can't add **"NULL"** to a number, because Python can't add a **str** to an **int**. This raises a **TypeError** that halts execution of our code. Instead of altering the values in **streams**, we can wrap the addition in a **try/except** statement:


<img width="500" src="https://drive.google.com/uc?export=view&id=1OS-SC5B7wo0hy0_7vVMSwjBhEjvU-VGr">


Running this code prints the error message "Error Occured" and then the total of 211.
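Here's a minimal runnable sketch of the same idea. The stream values are made up, and it catches **TypeError** specifically rather than every possible error, so genuinely unexpected problems still surface:

```python
streams = [50, "NULL", 60, 40]

total = 0
for value in streams:
    try:
        total += value  # raises TypeError when value is a string like "NULL"
    except TypeError:
        print("Skipping non-numeric value:", value)

print(total)  # 150
```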

```python
import csv

with open("top100.csv", "r") as f:
    music = list(csv.reader(f))

cleaned_list = []
for row in music[1:]:
    try:
        # float() raises ValueError for non-numeric text;
        # indexing raises IndexError for rows with missing columns
        cleaned_list.append([row[0], row[1], float(row[-1])])
    except (ValueError, IndexError):
        pass  # skip rows that can't be cleaned
print(cleaned_list)
```

```
[['Shape of You', 'Ed Sheeran', 2993988783.0], ['Despacito - Remix', 'Luis Fonsi', 1829621841.0], ['Despacito (Featuring Daddy Yankee)', 'Luis Fonsi', 1460802540.0], ['Something Just Like This', 'The Chainsmokers', 1386258295.0], ['HUMBLE.', 'Kendrick Lamar', 1311243745.0], ['Unforgettable', 'French Montana', 1289150890.0], ['rockstar', 'Post Malone', 1260181617.0], ["I'm the One", 'DJ Khaled', 1254196301.0], ["It Ain't Me (with Selena Gomez)", 'Kygo', 1190339348.0], ['XO TOUR Llif3', 'Lil Uzi Vert', 1171827725.0], ["That's What I Like", 'Bruno Mars', 1136379512.0], ['New Rules', 'Dua Lipa', 1119944498.0], ['I Don’t Wanna Live Forever (Fifty Shades Darker) - From "Fifty Shades Darker (Original Motion Picture Soundtrack)"', 'ZAYN', 1115034686.0], ['Attention', 'Charlie Puth', 1112777364.0], ['Mi Gente', 'J Balvin', 1091656642.0], ['Congratulations', 'Post Malone', 1082624976.0], ['Thunder', 'Imagine Dragons', 1067732868.0], ['Havana', 'Camila Cabello', 1042161672.0], ['Stay (
```

## 2.11 Passing new data into our pipeline

Let's finish the pipeline we've built so far by adding one last function.

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


1. Create a **top_artist()** function that:
  - Accepts the sorted list of lists representation of the data.
  - Selects the first list and returns it (corresponding to the top artist).
2. Uncomment the code that calls the functions when you're ready.

```python
import csv
from collections import Counter

def read_data(filename):
    # Read the CSV file into a list of rows
    with open(filename, "r") as f:
        music = list(csv.reader(f))
    return music

def clean_data(csv_list):
    # Count each artist (skipping the header row) and sort by count, highest first
    artists = [row[1] for row in csv_list[1:]]
    artist_dict = Counter(artists)
    artist_counts_list = [[key, value] for key, value in artist_dict.items()]
    artist_counts_list.sort(key=lambda x: x[1], reverse=True)
    return artist_counts_list

# Add your function here

# Uncomment when ready
# music_as_list = read_data("top100.csv")
# sorted_lol = clean_data(music_as_list)
# most_popular_artist = top_artist(sorted_lol)
```