
# Complex Queries Lab 

### Introduction

Over the last couple of lessons, we learned how to build more complex queries using:

* `np.isin`
* `np.any`
* `np.all`

We also learned how to combine multiple conditions with the `&` and `|` operators.  In this lab, we'll apply this knowledge to querying our Spotify dataset.
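As a quick refresher, the sketch below applies each of these tools to a small made-up dataset. The `genres` and `scores` arrays and every variable name are illustrative assumptions, not part of the Spotify data.

```python
import numpy as np

# Hypothetical toy data: one genre per row, plus two numeric scores per row
genres = np.array(['pop', 'rock', 'dance pop', 'jazz'])
scores = np.array([[120, 80],
                   [ 90, 95],
                   [130, 20],
                   [ 60, 30]])

# np.isin: True where the genre is one of the listed values
is_pop = np.isin(genres, ['pop', 'dance pop'])

# np.any / np.all with axis=1: check each row across its columns
any_high = np.any(scores > 100, axis=1)   # at least one score above 100
all_high = np.all(scores > 75, axis=1)    # every score above 75

# & (and) and | (or) combine boolean masks element by element
pop_and_all_high = is_pop & all_high
pop_or_all_high = is_pop | all_high
```

Each comparison needs its own parentheses when written inline, for example `(a > 1) & (b < 2)`, because `&` and `|` bind more tightly than the comparison operators.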

### Loading Our Data

Let's load our Spotify dataset.  Remember that this gives us the top songs from 2010 to 2019.

In [0]:
import pandas as pd
import numpy as np
url = "https://raw.githubusercontent.com/jigsawlabs-student/numpy-intro/master/top10s.csv"
tracks_df = pd.read_csv(url, encoding = "ISO-8859-1", index_col = 0)
tracks_df.head()

Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,43,221,2,4,78


In [0]:
tracks_cols = tracks_df.columns

In [0]:
tracks_np = tracks_df.to_numpy()

### Exploring the Data

Let's again get a sense of what this dataset looks like.  Begin by checking the shape of the `tracks_np` array.

In [0]:
tracks_shape = None
tracks_shape
# (603, 14)

Then let's select the first three rows and the first four columns of the array.

In [0]:
# write code here

# array([['Hey, Soul Sister', 'Train', 'neo mellow', 2010],
#        ['Love The Way You Lie', 'Eminem', 'detroit hip hop', 2010],
#        ['TiK ToK', 'Kesha', 'dance pop', 2010]], dtype=object)
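
Slicing a two-dimensional array takes a row slice and a column slice separated by a comma. Here is a minimal sketch of that pattern on a made-up array; `grid` and `top_left` are hypothetical names, not the lab's data.

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)   # 5 rows, 4 columns, values 0..19
grid_shape = grid.shape              # (rows, columns)

# Row slice first, column slice second
top_left = grid[:2, :3]              # first two rows, first three columns
```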

### Filtering Data

For this lab, we'll start you off with some of the artists and genres that show up most often in this dataset.

In [0]:
top_artists = ['Katy Perry', 'Justin Bieber', 'Maroon 5', 'Rihanna', 'Lady Gaga',
       'Bruno Mars', 'The Chainsmokers', 'Pitbull', 'Ed Sheeran',
       'Shawn Mendes']

In [0]:
top_genres = ['dance pop', 'pop', 'canadian pop', 'barbadian pop', 'boy band',
       'electropop', 'british soul', 'big room',
       'canadian contemporary r&b', 'neo mellow']

1. barbadian pop?

But don't just take our word for it.  Is `barbadian pop`, for example, really a top genre in the US?    Select all of the tracks of type `barbadian pop` and assign them to the variable `barabadian_pop_songs`.

In [0]:
barabadian_pop_songs = None

In [0]:
len(barabadian_pop_songs)
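
Selecting rows by a single value follows a two-step pattern: compare one column to the value to get a boolean mask, then index the array with that mask. This is a sketch on made-up rows; the `toy` array and its contents are assumptions for illustration only.

```python
import numpy as np

# Hypothetical rows: (title, artist, genre)
toy = np.array([['Song A', 'Artist 1', 'pop'],
                ['Song B', 'Artist 2', 'rock'],
                ['Song C', 'Artist 1', 'pop']], dtype=object)

is_rock = toy[:, 2] == 'rock'   # boolean mask with one entry per row
rock_rows = toy[is_rock]        # keeps only the rows where the mask is True
```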

We can use `np.unique` to see whether all of these `barabadian_pop_songs` are by Rihanna.

In [0]:
np.unique(barabadian_pop_songs[:, 1], return_counts = True)

(array(['Rihanna'], dtype=object), array([15]))

So all 15 of these songs are by Rihanna.
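
To see how the two returned arrays line up, here is a small sketch with made-up artist names (the `artists` array is hypothetical). With `return_counts=True`, `np.unique` returns the sorted distinct values and, in the same order, how often each one occurs.

```python
import numpy as np

artists = np.array(['Artist 2', 'Artist 1', 'Artist 2', 'Artist 2'], dtype=object)

# names holds the sorted distinct values; counts[i] is how often names[i] appears
names, counts = np.unique(artists, return_counts=True)
```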

2. All the pop

Let's look again at our list of genres.

In [0]:
top_genres = ['dance pop', 'pop', 'canadian pop', 'barbadian pop', 'boy band',
       'electropop', 'british soul', 'big room',
       'canadian contemporary r&b', 'neo mellow']

Looking at the above list of genres, the first six categories all look like different versions of `pop`.  Select all of the tracks that fall into one of these six categories.

In [0]:
pop_songs = None

In [0]:
len(pop_songs)

464

Ok, so 464 of the tracks are in one of these six genres.
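
A membership test like this could also be written as a chain of `==` comparisons joined with `|`, but `np.isin` scales better as the list grows. The sketch below, on a made-up `genres` array, suggests that both approaches give the same mask.

```python
import numpy as np

# Hypothetical genre column
genres = np.array(['pop', 'rock', 'dance pop', 'jazz', 'pop'], dtype=object)
wanted = ['pop', 'dance pop']

mask_isin = np.isin(genres, wanted)
mask_or = (genres == 'pop') | (genres == 'dance pop')

same = np.array_equal(mask_isin, mask_or)   # both masks agree
n_matches = int(mask_isin.sum())            # each True counts as 1
```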

3. Digging into dance pop

Now let's dig into the `dance pop` category.  First select all of the rows of type `dance pop`.  

In [0]:
dance_pop = None

In [0]:
len(dance_pop)

327

4. Low scoring pop 

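Next, select any dance pop song where at least one of `bpm`, `nrgy`, `dnce`, or `dB` is under 25.

> For pop songs, the scores for beats per minute (`bpm`), `nrgy`, and `dnce` rarely fall below the low 20s, so under 25 is pretty low.  Loudness (`dB`) is negative in every row shown in this lab, so it is always under 25; the expected output below is consistent with a check on `bpm`, `nrgy`, and `dnce` only.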

Again here are the columns.

In [0]:
tracks_cols

Index(['title', 'artist', 'top genre', 'year', 'bpm', 'nrgy', 'dnce', 'dB',
       'live', 'val', 'dur', 'acous', 'spch', 'pop'],
      dtype='object')

In [0]:
low_scoring_pop = None

# array([['You Lost Me', 'Christina Aguilera', 'dance pop', 2010, 43, 39,
#         23, -6, 14, 7, 257, 85, 4, 56],
#        ['Clown', 'Emeli Sandé', 'dance pop', 2013, 130, 23, 45, -8, 11,
#         23, 221, 92, 4, 60]], dtype=object)
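
A condition of the form "at least one of these columns is below a threshold" can be sketched by comparing a slice of columns and reducing each row with `np.any(..., axis=1)`. The rows, column positions, and names below are made up for illustration.

```python
import numpy as np

# Hypothetical rows: (title, genre, score_1, score_2)
toy = np.array([['Song A', 'pop', 80, 20],
                ['Song B', 'pop', 70, 60],
                ['Song C', 'rock', 10, 90]], dtype=object)

# One True/False per row: is any score in columns 2..3 below 25?
low_any = np.any(toy[:, 2:4] < 25, axis=1)
is_pop = toy[:, 1] == 'pop'

# Combine the genre mask with the score mask
low_pop = toy[is_pop & low_any]
```

Without `axis=1`, `np.any` would collapse the whole comparison into a single boolean instead of one value per row.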

5. High scoring pop 

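Next, select the songs where all of `bpm`, `nrgy`, `dnce`, and `dB` are greater than 81.  Note that every `dB` value in the expected output below is negative, so that output is consistent with checking only `bpm`, `nrgy`, and `dnce`.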

In [0]:
highs_across = None
highs_across
# array([['Telephone', 'Lady Gaga', 'dance pop', 2010, 122, 83, 83, -6, 11,
#         71, 221, 1, 4, 73],
#        ['Sparks', 'Hilary Duff', 'dance pop', 2015, 122, 88, 85, -5, 10,
#         79, 186, 4, 6, 44],
#        ['WTF (Where They From)', 'Missy Elliott', 'dance pop', 2016, 120,
#         82, 93, -3, 6, 56, 193, 2, 20, 58],
#        ['My Way', 'Calvin Harris', 'dance pop', 2017, 120, 91, 82, -3,
#         16, 54, 219, 9, 4, 78]], dtype=object)
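
One thing to watch with `np.all` is that a single column on a different scale can make the whole condition fail. The sketch below uses made-up numbers loosely modeled on the rows above, with a negative last column standing in for `dB`; the `scores` array is an assumption for illustration.

```python
import numpy as np

# Hypothetical scores: columns are (bpm, nrgy, dnce, dB)
scores = np.array([[122, 83, 83, -6],
                   [119, 60, 85, -4],
                   [120, 82, 93, -3]])

# Including the negative column: no row can pass
all_four = np.all(scores > 81, axis=1)

# Restricting the check to the first three columns
first_three = np.all(scores[:, :3] > 81, axis=1)
```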


6. Fast but not pop 

Next let's find the songs that are not in one of the top six genres and are faster than 120 bpm.

In [0]:


non_top_pop_and_fast = None

In [0]:
# len(non_top_pop_and_fast)

# 62
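
A "not in this list" condition can be sketched by flipping an `np.isin` mask with `~` and then combining it with a second mask using `&`. The arrays below are made up for illustration.

```python
import numpy as np

# Hypothetical genre and tempo columns
genres = np.array(['pop', 'rock', 'jazz', 'pop'], dtype=object)
bpm = np.array([130, 125, 90, 100])

not_pop = ~np.isin(genres, ['pop'])   # ~ flips True and False
fast = bpm > 120
not_pop_and_fast = not_pop & fast     # both conditions must hold
```

`np.isin` also accepts `invert=True`, which produces the same mask as applying `~` to its result.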

Let's look at the genres that meet the above condition.

In [0]:
np.unique(non_top_pop_and_fast[:, 2], return_counts = True)

# (array(['acoustic pop', 'alaska indie', 'alternative r&b', 'art pop',
#         'atl hip hop', 'australian dance', 'australian pop', 'baroque pop',
#         'belgian edm', 'big room', 'british soul', 'brostep',
#         'canadian contemporary r&b', 'candy pop', 'chicago rap',
#         'colombian pop', 'complextro', 'downtempo', 'electro house',
#         'escape room', 'french indie pop', 'house', 'indie pop',
#         'irish singer-songwriter', 'latin', 'metropopolis', 'neo mellow',
#         'permanent wave', 'tropical house'], dtype=object),
#  array([2, 1, 1, 3, 1, 3, 5, 1, 2, 7, 6, 1, 2, 1, 1, 2, 4, 1, 1, 1, 1, 1,
#         2, 1, 2, 1, 2, 4, 2]))

### Summary

In this lab, we practiced using NumPy functions such as `np.any`, `np.all`, and `np.isin` to create different kinds of conditions.  We also practiced combining conditions in a single query with the `&` and `|` operators.