# Filtering Lab

### Introduction

Now let's put some of our numpy knowledge to use by exploring data from Spotify. In this lesson, we'll take a look at the top songs from 2010 through 2019.

### Loading Data

In [0]:
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/jigsawlabs-student/numpy-intro/master/top10s.csv"
df = pd.read_csv(url, encoding = "ISO-8859-1", index_col = 0)
df.head()

Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,43,221,2,4,78


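Above we loaded a set of CSV data. Notice that we chose `ISO-8859-1` as our encoding option. We did this because the file contains non-ASCII characters, such as the `é` in `Beyoncé`, and naming the file's encoding lets pandas decode them correctly.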
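
To make the encoding point concrete, here is a small sketch that is independent of the Spotify file. It shows how the same bytes behave under two decoders; the sample string is only an illustration.

```python
# Encode a name containing a non-ASCII character as ISO-8859-1 (Latin-1) bytes.
name = "Beyoncé"
latin1_bytes = name.encode("ISO-8859-1")

# Decoding with the matching encoding recovers the original text.
decoded = latin1_bytes.decode("ISO-8859-1")

# Decoding the same bytes as UTF-8 fails, because the single byte 0xE9
# is not a complete UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(decoded, utf8_ok)
```

This is why passing the right `encoding` to `pd.read_csv` matters when a file was not saved as UTF-8.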

Ok, enough of pandas. Now let's start using numpy.

In [0]:
spotify_np = df.to_numpy()

> Let's also store the pandas column names, so we can look up the position of each column later.

In [0]:
spotify_cols = df.columns
spotify_cols

Index(['title', 'artist', 'top genre', 'year', 'bpm', 'nrgy', 'dnce', 'dB',
       'live', 'val', 'dur', 'acous', 'spch', 'pop'],
      dtype='object')
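
As a rough sketch of what the conversion does, the example below builds a tiny stand-in DataFrame (the names `toy_df`, `toy_np`, `toy_cols` and `artist_idx` are made up for illustration) and shows one possible way to turn a column name into the integer position that numpy indexing needs.

```python
import pandas as pd

# A tiny stand-in for the Spotify table, using values from the rows shown above.
toy_df = pd.DataFrame({
    "title": ["Hey, Soul Sister", "TiK ToK"],
    "artist": ["Train", "Kesha"],
    "year": [2010, 2010],
})

toy_np = toy_df.to_numpy()   # mixed text and numbers, so dtype is object
toy_cols = toy_df.columns    # a pandas Index of column names

# Translate a column name into its integer position.
artist_idx = toy_cols.get_loc("artist")

print(toy_np.dtype)
print(toy_np[:, artist_idx])
```

Looking positions up by name can be less error-prone than counting columns by hand, though hard-coding the index works just as well for a small table.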

We can get a sense of what these attributes mean by looking at the [Spotify API here](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/).

### Working with Numpy

To see what this looks like in Numpy, let's begin by slicing the first three rows from our dataset.

In [0]:
first_three_rows = None
first_three_rows
# array([['Hey, Soul Sister', 'Train', 'neo mellow', 2010, 97, 89, 67, -4,
#         8, 80, 217, 19, 4, 83],
#        ['Love The Way You Lie', 'Eminem', 'detroit hip hop', 2010, 87,
#         93, 75, -5, 52, 64, 263, 24, 23, 82],
#        ['TiK ToK', 'Kesha', 'dance pop', 2010, 120, 84, 76, -3, 29, 71,
#         200, 10, 14, 80]], dtype=object)
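
If row slicing is unfamiliar, the following sketch shows the general pattern on a toy array rather than on `spotify_np`. The array `toy` and its three-column layout are assumptions made for the example.

```python
import numpy as np

# Toy object array: each row is [title, artist, year].
toy = np.array([
    ["Hey, Soul Sister", "Train", 2010],
    ["Love The Way You Lie", "Eminem", 2010],
    ["TiK ToK", "Kesha", 2010],
    ["Bad Romance", "Lady Gaga", 2010],
], dtype=object)

# A slice on the first axis selects rows; all columns are kept.
first_two = toy[:2]
print(first_two.shape)  # (2, 3)
```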

Now that we've looked at a few rows in numpy, let's see the shape of the data.

In [0]:
# your code here

# (603, 14)

So we can see that there are 603 entries, each with 14 attributes.

Let's look at the maximum and minimum year for the entries.

In [0]:
max_year = None
max_year
# 2019

In [0]:
min_year = None
min_year
# 2010
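
One way to approach shape and column extremes is sketched below on a toy array; the variable names are hypothetical, and the year column is assumed to sit at position 2 here (it is at a different position in `spotify_np`).

```python
import numpy as np

# Toy object array: each row is [title, artist, year].
toy = np.array([
    ["Hey, Soul Sister", "Train", 2010],
    ["Walk On Water (feat. Beyoncé)", "Eminem", 2018],
    ["TiK ToK", "Kesha", 2010],
], dtype=object)

n_rows, n_cols = toy.shape   # (rows, columns)
years = toy[:, 2]            # every row, year column only

print(toy.shape, years.min(), years.max())  # (3, 3) 2010 2018
```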

### Selecting Data

Once again, here are the columns.

In [0]:
spotify_cols

Index(['title', 'artist', 'top genre', 'year', 'bpm', 'nrgy', 'dnce', 'dB',
       'live', 'val', 'dur', 'acous', 'spch', 'pop'],
      dtype='object')

Let's start by selecting the first five artists listed in our `spotify_np` array.

In [0]:
# your code here 

# array(['Train', 'Eminem', 'Kesha', 'Lady Gaga', 'Bruno Mars'],
#       dtype=object)
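
A row slice and a column index can be combined in a single pair of brackets. The sketch below illustrates this on a toy array whose layout is assumed for the example.

```python
import numpy as np

# Toy object array: each row is [title, artist].
toy = np.array([
    ["Hey, Soul Sister", "Train"],
    ["Love The Way You Lie", "Eminem"],
    ["TiK ToK", "Kesha"],
], dtype=object)

# Rows 0 and 1, column 1: the result is a one-dimensional array.
first_two_artists = toy[:2, 1]
print(first_two_artists)  # ['Train' 'Eminem']
```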

Next let's select all of the rows where the artist is `Eminem`.

In [0]:
eminem_songs = None
eminem_songs

# array([['Love The Way You Lie', 'Eminem', 'detroit hip hop', 2010, 87,
#         93, 75, -5, 52, 64, 263, 24, 23, 82],
#        ['Walk On Water (feat. Beyoncé)', 'Eminem', 'detroit hip hop',
#         2018, 82, 44, 48, -10, 64, 62, 304, 81, 24, 65]], dtype=object)
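
Filtering rows like this usually relies on a boolean mask. A minimal sketch on a toy array (names and layout are assumptions for the example):

```python
import numpy as np

# Toy object array: each row is [title, artist, year].
toy = np.array([
    ["Hey, Soul Sister", "Train", 2010],
    ["Love The Way You Lie", "Eminem", 2010],
    ["Walk On Water (feat. Beyoncé)", "Eminem", 2018],
], dtype=object)

# Comparing a column to a value yields one True/False per row.
mask = toy[:, 1] == "Eminem"

# Indexing with the mask keeps only the rows where it is True.
matching_rows = toy[mask]

print(mask)                 # [False  True  True]
print(matching_rows.shape)  # (2, 3)
```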


Next, let's select all of the years in which `Beyoncé` had a top song.

In [0]:
beyonce = "Beyoncé"

In [0]:
beyonce_years = None

beyonce_years
# array([2011, 2011, 2011, 2011, 2014, 2014, 2014, 2015], dtype=object)

> Note that we can see the years in which she had the most top songs by using the `np.unique` function with `return_counts=True`.

In [0]:
top_years = np.unique(beyonce_years, return_counts=True)
top_years

(array([2011, 2014, 2015], dtype=object), array([4, 3, 1]))
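
The tuple returned by `np.unique` can be unpacked and paired up to make the counts easier to read. The sketch below uses made-up rows, and the `astype(int)` cast is an optional choice made here so that `np.unique` receives a plain integer array.

```python
import numpy as np

# Made-up rows for illustration only: [artist, year].
toy = np.array([
    ["Artist A", 2011],
    ["Artist B", 2011],
    ["Artist A", 2014],
    ["Artist A", 2011],
], dtype=object)

# Years for one artist: filter rows with a mask, then keep the year column.
artist_years = toy[toy[:, 0] == "Artist A", 1]

# Unique values and how often each one occurs.
values, counts = np.unique(artist_years.astype(int), return_counts=True)

# Pair each year with its count, then pick the most frequent year.
year_counts = dict(zip(values.tolist(), counts.tolist()))
busiest_year = values[counts.argmax()]

print(year_counts, busiest_year)  # {2011: 2, 2014: 1} 2011
```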

So we can see that she had 4 top songs in 2011.  Let's see what they were.

Write a statement to select all of the rows from 2011.

In [0]:
spotify_2011 = spotify_np[(spotify_np[:, 3] == 2011)]

Then from there, select the songs from 2011 written by Beyoncé.

In [0]:
beyonce = "Beyoncé"
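
Two conditions can be applied either one after the other or in a single step. The sketch below shows both options on made-up rows; neither is presented as the intended solution.

```python
import numpy as np

# Made-up rows for illustration only: [title, artist, year].
toy = np.array([
    ["Song 1", "Artist A", 2011],
    ["Song 2", "Artist B", 2011],
    ["Song 3", "Artist A", 2014],
], dtype=object)

# Option 1: filter in two steps.
toy_2011 = toy[toy[:, 2] == 2011]
two_step = toy_2011[toy_2011[:, 1] == "Artist A"]

# Option 2: combine both masks with &, wrapping each condition in parentheses.
combined = toy[(toy[:, 2] == 2011) & (toy[:, 1] == "Artist A")]

print(two_step[:, 0], combined[:, 0])  # ['Song 1'] ['Song 1']
```

The parentheses matter because `&` binds more tightly than `==` in Python.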

Next, let's use `np.unique` to see a list of all of the genres in the dataset and the number of times each occurs.
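
Counts are often easier to read when sorted. As a sketch, the example below counts the genres of only the five rows displayed by `df.head()` and orders them with `np.argsort`; the variable names are hypothetical.

```python
import numpy as np

# Genres of the five rows displayed by df.head() above.
genres = np.array(["neo mellow", "detroit hip hop", "dance pop", "dance pop", "pop"])

names, counts = np.unique(genres, return_counts=True)

# Order the genres from most to least frequent.
order = np.argsort(counts)[::-1]
for name, count in zip(names[order], counts[order]):
    print(name, count)
```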

### Resources

[Kaggle Spotify Dataset](https://www.kaggle.com/leonardopena/top-spotify-songs-from-20102019-by-year)