# None Such Thing: Listening Habits
### Do all Kevin Nguyens listen to Illenium, or am I just racist?

If you grew up as part of Asian-American Gen Z like me, there's a good chance you're aware of the stereotype of _Kevin Nguyen_. He drives a Civic, probably juuls, definitely listens to Illenium, and pretends to like Travis Scott. But how true is this? Let's find out!

#### The Game Plan

Let's set the ground rules before we dive in. To ascertain this information, we first need to address the first part of our question: do all Kevin Nguyens listen to Illenium? This is kind of tough. Spotify doesn't give us external access to a user's `liked` songs or listening history, so we'll use what we can get: we're going to aggregate _all_ the songs that each Kevin Nguyen has on _all_ his playlists. After all, if a song is on a playlist, he probably likes it in some way, even if ironically. But then again, who likes Illenium unironically? Joking.

I use Illenium in jest here, as a sort of heuristic that can set up the real question: how does our _perception_ of what Kevin Nguyen listens to compare with what Kevin Nguyen typically listens to? A quick search for playlists with _Kevin Nguyen_ in the name and description yields a tsunami of "Getting Boba With Kevin Nguyen" playlists that encompass the broad variety of mainstream EDM and aggressive hip hop that we'd expect. How do these playlists stack up to the real people? We can find out with some hypothesis testing and feature engineering, but for now, let's just dive in!

#### Setting Up Spotipy

Most of this will be done with the aid of `spotipy`, a lightweight Python wrapper for the Spotify Web API.

The initial setup is kind of tedious. To use the Client Credentials Flow, you need to register as a Spotify developer and obtain a `client id` and a `client secret`. Unlike the Authorization Code Flow, it doesn't need a user to log in. In exchange, it can only read public data, which is all we need here. So yes, this notebook is technically a registered Spotify app. Isn't life rich?

```python
# set up the client id and secret
import os
os.environ["SPOTIPY_CLIENT_ID"] = "f09ddc90842441e68107eb66ef178403"
os.environ["SPOTIPY_CLIENT_SECRET"] = "316d709c188f48ac9d9152fed07f4de7"
```

Yes, I made both public on this notebook. No, I don't care.

```python
# import the libraries and start the real work
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# with no arguments, this reads SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET from the environment
auth_manager = SpotifyClientCredentials()
client = spotipy.Spotify(auth_manager=auth_manager)
```
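
If you're curious, the Client Credentials Flow that `SpotifyClientCredentials` handles comes down to a single POST to Spotify's token endpoint. The id and secret travel as HTTP Basic auth. Here is a standard-library sketch that only *builds* the request and never sends it. The function name is made up for illustration:

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = 'https://accounts.spotify.com/api/token'

def client_credentials_request(client_id, client_secret):
    """Build (but don't send) the token request behind the Client Credentials Flow."""
    # HTTP Basic auth: base64 of "client_id:client_secret"
    creds = base64.b64encode(f'{client_id}:{client_secret}'.encode()).decode()
    headers = {
        'Authorization': f'Basic {creds}',
        'Content-Type': 'application/x-www-form-urlencoded',
    }
    # the form body only needs the grant type
    body = urllib.parse.urlencode({'grant_type': 'client_credentials'}).encode()
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method='POST')
```

Sending it with `urllib.request.urlopen` should return JSON containing an `access_token`. That token then goes in an `Authorization: Bearer ...` header on every API call. spotipy also caches the token and refreshes it when it expires, which is the main reason to let it handle all of this.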

***

### 1. Getting Comfortable With Spotipy

To get our feet wet, let's find out all of Illenium's albums and his top 10 songs.

```python
illenium = 'spotify:artist:45eNHdiiabvmbp4erw26rg'
albums = client.artist_albums(illenium, album_type='album')['items']
songs = client.artist_top_tracks(illenium)['tracks']
```

```python
for album in albums:
    print(album['name'])
```

```
ASCEND (Remixes)
ASCEND
Awake (Remixes)
Awake
Ashes (Remixes)
Ashes
```


```python
for song in songs:
    print(song['name'])
```

```
Takeaway
In Your Arms (with X Ambassadors)
Nightlight
Feel Good (feat. Daya)
Good Things Fall Apart (with Jon Bellion)
Don't Let Me Down (feat. Daya) - Illenium Remix
Good Things Fall Apart (with Jon Bellion) [Tiësto's Big Room Remix]
Feel Something (With I Prevail)
Crashing (feat. Bahari)
Without Me - ILLENIUM Remix
```


Why is the Tiësto remix here? Gross. The original is so much better. Anyway, I digress. Now that we've got the hang of working with artists, let's get to the meat: profiles.

***

### 2. Working With Profiles

A quick search in the web player for "Kevin Nguyen" is a good starting point.

![Spotify profile search results for "Kevin Nguyen"](pictures/nguyens.png)
<center>Wow. That is a lot of Kevin Nguyens. The full list goes on for miles, my lord.</center>

Let's work with the first guy. Nothing personal, kid.

```python
user_id = 'kevin.nguyen9852'
user = client.user(user_id)
print(user.keys())
```

```
dict_keys(['display_name', 'external_urls', 'followers', 'href', 'id', 'images', 'type', 'uri'])
```


Huh. At first glance, it seems like Spotify doesn't give us direct access to his playlists: the user object has no playlist field at all.

We still have access to all the urls, though, so let's just play it by ear for the moment. Clicking through, we find what is probably the most Kevin Nguyen set of playlists to have ever Kevin Nguyened.

![title](pictures/glock_brian.png)

Rich Brian with the strap is the best thing on this page. Let's take a look at what's on it.

```python
from collections import Counter

# a small playlist wrapper that counts how often each artist appears on it
class playlist:
    def __init__(self, playlist_id, head=client):
        self.obj = head.playlist(playlist_id=playlist_id)
        self.name = self.obj['name']

    def artists(self):
        # only the first page of tracks (up to 100) comes back with the playlist object
        tracks = self.obj['tracks']['items']
        artists = Counter()
        for track in tracks:
            if track['track']:  # skip entries whose track is unavailable (None)
                artists.update(artist['name'] for artist in track['track']['artists'])
        return artists
```

```python
# the first playlist on his profile; the ?si=... share token from the copied link is dropped
first_id = '5EbNNdQvJ8U4TyuqetE5pY'
first_playlist = playlist(first_id)
first_playlist.name, first_playlist.artists().most_common(10)
```

```
("2000's",
 [('$uicideBoy$', 43),
  ('Pouya', 12),
  ('Bass Santana', 11),
  ('Smokepurpp', 9),
  ('XXXTENTACION', 8),
  ('Ski Mask The Slump God', 7),
  ('Kin$oul', 6),
  ('Kid Trunks', 5),
  ('Shakewell', 5),
  ('Lil Xan', 4)])
```

Wow. This dude loves $uicideBoy$. No judgement, though.
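
One caveat with the `playlist` class: `client.playlist()` returns the tracks as a paging object holding at most 100 items. That means `artists()` only ever sees the first page of a longer playlist. Below is a minimal sketch of one fix. It assumes a spotipy client, whose `next()` method fetches the page that a paging object's `next` link points to. The helper names are hypothetical:

```python
from collections import Counter

def iter_items(client, page):
    """Yield every item of a Spotify paging object, following its 'next' links."""
    while page:
        yield from page['items']
        page = client.next(page) if page['next'] else None

def count_artists(client, playlist_obj):
    """Count artist appearances across all pages of a playlist's tracks."""
    artists = Counter()
    for item in iter_items(client, playlist_obj['tracks']):
        track = item['track']
        if track is None:  # unavailable tracks can come back as None
            continue
        artists.update(artist['name'] for artist in track['artists'])
    return artists
```

Inside the class, `artists()` could then just return `count_artists(client, self.obj)`.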

***

### 3. Just One Kevin

So we have a problem. The `user` object doesn't link a user to his playlists, so we'll get each user's playlists by scraping his profile page instead.

From there, we aggregate the artists across _every_ one of his playlists.
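
If scraping ever becomes too fragile, the Web API does have a separate endpoint for the public playlists a user owns or follows (`GET /users/{user_id}/playlists`). spotipy wraps it as `user_playlists()`. The result may not match the profile page exactly, since a profile can hide some of a user's public playlists. Treat this as a hedged alternative rather than a drop-in replacement. The helper name is hypothetical:

```python
def public_playlist_ids(client, user_id):
    """Collect the IDs of a user's public playlists via the Web API, following pagination."""
    ids = []
    page = client.user_playlists(user_id, limit=50)  # 50 is the largest page this endpoint allows
    while page:
        ids.extend(item['id'] for item in page['items'])
        page = client.next(page) if page['next'] else None
    return ids
```

For example, `public_playlist_ids(client, 'kevin.nguyen9852')` would return IDs that can go straight into `playlist(...)`.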

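We're going to use `Beautiful Soup`, an HTML parsing library, alongside a `Selenium` web driver. The web player builds its pages with JavaScript, so a plain HTTP request would only return an empty shell. Selenium runs a real browser that renders the page first. It might still feel like a bit of overkill, but it'll come in useful a little later on.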

```python
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
```

```python
from selenium.webdriver.chrome.service import Service

# set selenium options
options = Options()
options.add_argument('--window-size=1920,1200')
# reuse an existing Chrome profile (and its login cookies): --user-data-dir points at the
# Chrome folder, --profile-directory picks the profile inside it. Close Chrome first,
# since Chrome won't start a second session on a profile that is already in use.
options.add_argument('--user-data-dir=/Users/rdz/Library/Application Support/Google/Chrome')
options.add_argument('--profile-directory=Default')
# the browser window stays visible by default; add '--headless=new' to hide it
DRIVER_PATH = './chromedriver'

def start_driver(url):
    """Start a Selenium chromedriver and load `url`."""
    driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)
    driver.get(url)
    return driver

class Driver:
    """Wrapper for a Selenium chromedriver."""
    def __init__(self, url):
        self.driver = start_driver(url)
        self.source = self.driver.page_source

    def stop(self):
        self.driver.quit()


class Soup:
    """Interface between the Selenium scraper and Beautiful Soup."""
    def __init__(self, url):
        # parse the rendered page; call self.driver.stop() when done with the browser
        self.driver = Driver(url)
        self.base = bs(self.driver.source, features='lxml')

    def extract(self, *tags, to_text=False, normalize=True, **attributes):
        """Find all `tags` matching `attributes`, optionally as stripped, non-empty text."""
        findings = self.base.find_all(list(tags) or None, **attributes)
        if to_text:
            findings = [result.get_text() for result in findings]
            if normalize:
                findings = [text.strip() for text in findings if text.strip()]
        return findings
```

Since the API's `user` object gives us nothing useful here, we only need the profile's url, not its id. From there, we scrape the urls of all his playlists from the page source.

```python
# scrolling output definition
from IPython.display import HTML, display
style = """
<style>
div.output_area {
    overflow-y: scroll;
}
div.output_area img {
    max-width: unset;
}
</style>
"""
display(HTML(style))  # inject the CSS so long outputs scroll instead of flooding the page

nguyen = 'https://open.spotify.com/user/kevin.nguyen9852'
obj = Soup(nguyen)
obj.base
```

```html
<html class="no-focus-outline spotify__os--is-macos spotify__container--is-web" dir="ltr" lang="en"><head>
<meta charset="utf-8"/><title>Spotify – Web Player</title><!-- invalid metadata --><!-- invalid canonical_url --><!-- invalid link_tags --><!-- invalid schema_deep_link --><!-- invalid schema_extras --> <link href="https://open.scdn.co/cdn/images/favicon32.a19b4f5b.png" rel="icon" sizes="32x32" type="image/png"/>
<link href="https://open.scdn.co/cdn/images/favicon16.19fc3918.png" rel="icon" sizes="16x16" type="image/png"/>
<link href="https://open.scdn.co/cdn/images/favicon.5cb2bd30.ico" rel="icon"/> <link href="https://open.scdn.co/cdn/build/web-player/web-player.885d716f.css" rel="stylesheet"/>
<link href="https://open.scdn.co/cdn/generated/manifest-web-player.a3df468f.json" rel="manifest"/><script async="" src="https://connect.facebook.net/en_US/fbevents.js"></script><script async="" src="https://www.google-analytics.com/gtm/js?id=GTM-W53X654&amp;t=gtag_UA_5784146_31&amp;cid=89
```

Wow. That is beyond terrifying. Thankfully, the playlists are actually pretty easy to find, since they're all `<a>` (link) tags:

```python
# in this page layout, the first 9 links appear to be site navigation
findings = obj.extract('a', to_text=True)
findings[9:19]
```

```
["2000's",
 'mitre 10',
 'Small potatoes',
 'mangoes',
 'bawawaweewa',
 '[sLUms+CO]',
 'lo-fi anime beats to study and chill to',
 'yeughh',
 'septombur',
 'AUGUST']
```

So we can get the names. That's cool. But what about the playlist IDs, which are what we actually need? On closer inspection, they're in the `href` attribute of each `<a>` tag.

```python
findings = obj.extract('a')
for element in findings[9:19]:
    print(element['href'])
```

```
/playlist/5EbNNdQvJ8U4TyuqetE5pY
/playlist/7lHgB4bzld24t7kgy4MK9K
/playlist/3RTdVqzTm3YgL3KLa13qQV
/playlist/2cV5Bwarta7LPXzbiaYTyf
/playlist/2Cb1eD1J6jlsxKBoNXTTtd
/playlist/0UYrhQh0B0oWx4mQINEdIQ
/playlist/5lM2ngCx1eF1QQiWXTT5h0
/playlist/0N0yZvX6kl2VTaU289UgnM
/playlist/0eyLdRCySHOjnnur6rpHA1
/playlist/4uhQ4gWB5KKraePWgeuXen
```


Oh yeah. It's all coming together.
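
The `get_playlists()` method in the next cell strips the `/playlist/` prefix with a plain string replace, which works fine for these hrefs. Full share links, like the ones the Share button copies, carry a `?si=` parameter that such a replace would leave attached. A slightly sturdier sketch parses the URL path first. The helper name is hypothetical, and it assumes playlist IDs are base-62 strings, as Spotify documents them:

```python
import re
from urllib.parse import urlparse

# Spotify IDs are base-62: letters and digits only
PLAYLIST_PATH = re.compile(r'/playlist/([0-9A-Za-z]+)')

def playlist_id(link):
    """Return the playlist ID from a relative href or full URL, or None for other links."""
    match = PLAYLIST_PATH.fullmatch(urlparse(link).path)
    return match.group(1) if match else None
```

Because it returns `None` for anything that isn't a playlist link, the same helper doubles as a filter for stray navigation links.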

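```python
# a wrapper to make things easier
class kevin:
    def __init__(self, url):
        page = Soup(url)
        # skip the first 9 links (navigation on this layout) and keep only playlist links
        self.soup = [a for a in page.extract('a')[9:]
                     if a.get('href', '').startswith('/playlist/')]
        page.driver.stop()  # close the browser window once the source is parsed

    def get_playlists(self):
        self.playlist_ids = [bowl['href'].replace('/playlist/', '') for bowl in self.soup]
        self.playlists = [playlist(playlist_id) for playlist_id in self.playlist_ids]
```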

```python
obj = kevin(nguyen)
obj.get_playlists()

# list the 10 most frequent artists in his first playlist
obj.playlists[0].artists().most_common(10)
```

```
[('$uicideBoy$', 43),
 ('Pouya', 12),
 ('Bass Santana', 11),
 ('Smokepurpp', 9),
 ('XXXTENTACION', 8),
 ('Ski Mask The Slump God', 7),
 ('Kin$oul', 6),
 ('Kid Trunks', 5),
 ('Shakewell', 5),
 ('Lil Xan', 4)]
```

There we go! Armed with only a user's url, we can now find and break down all of his playlists, and count how often each artist appears across them:

```python
# attach a method that totals the artist counts across all of a Kevin's playlists
def artist_stats(self):
    return sum((pl.artists() for pl in self.playlists), Counter())

kevin.artist_stats = artist_stats
artists = obj.artist_stats()
artists.most_common(10)
```

```
[('Young Thug', 90),
 ('Lil Uzi Vert', 83),
 ('Future', 82),
 ('Earl Sweatshirt', 82),
 ('$uicideBoy$', 73),
 ('Tyler, The Creator', 65),
 ('A$AP Rocky', 63),
 ('Travis Scott', 58),
 ('FKA twigs', 58),
 ('Skepta', 56)]
```

Damn. FKA twigs? Not bad.
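
Raw counts scale with how many playlists a Kevin has, so a Kevin with 40 playlists will dwarf one with 4. Before comparing Kevins against each other, or against the "Getting Boba With Kevin Nguyen" playlists, one simple option is to convert counts into each artist's share of all artist credits. Here is a small sketch; the function name is hypothetical:

```python
from collections import Counter

def artist_shares(counts, top=10):
    """Return the `top` artists with their fraction of all artist credits in `counts`."""
    total = sum(counts.values())
    if total == 0:  # no tracks at all
        return []
    return [(name, n / total) for name, n in counts.most_common(top)]
```

For example, `artist_shares(artists)` ranks the same top 10 as above, but on a 0-to-1 scale that is comparable across Kevins.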

***

### 4. All The Kevins

Now that we've defined the full process for one Kevin, let's do _all_ of them.

In fact, what we're about to do applies to _any_ name, but to keep this from getting out of pocket, we're going to keep bullying Kevin. Thankfully, Spotify's search can be driven right from the url, so the f-string for any query is just:

```python
first, last = 'kevin', 'nguyen'
query = f"https://open.spotify.com/search/{first}%20{last}/profiles"
query
```

```
'https://open.spotify.com/search/kevin%20nguyen/profiles'
```
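
Hand-writing `%20` works for "kevin nguyen". Names with accents, apostrophes or other special characters need proper percent-encoding, though. This sketch (hypothetical helper name) lets the standard library's `quote` do it, with `safe=''` so that even a `/` in the input gets encoded:

```python
from urllib.parse import quote

def profile_search_url(*names):
    """Build the web player's profile-search URL for any name, percent-encoding it."""
    # join the name parts with spaces, then encode everything (spaces become %20)
    return f"https://open.spotify.com/search/{quote(' '.join(names), safe='')}/profiles"
```

`profile_search_url('kevin', 'nguyen')` gives back the same url as above.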

Once again, we're going to use `Selenium` to get us all the Kevins. All we need are the urls, so it shouldn't be too hard, right? I wish. Annoyingly, Spotify only lets you search profiles if you're logged in. Form submission and general interactivity are exactly what `BeautifulSoup` lacks, which is why we opted for `Selenium`. There's still a problem: by default, Selenium starts every instance with a fresh profile and no cookies, so we need a quick startup protocol to log us in.
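
One way to handle that startup protocol, sketched here as an assumption rather than the method this notebook uses: log in by hand once, save the browser's cookies to disk, and load them into every new session. Selenium's `get_cookies()` and `add_cookie()` do the heavy lifting. The file name and helper names are hypothetical. Note that `add_cookie()` only accepts cookies for the domain that is currently loaded, which is why `load_cookies` visits the site first:

```python
import pickle
from pathlib import Path

COOKIE_FILE = Path('spotify_cookies.pkl')  # hypothetical location for the saved session

def save_cookies(driver, path=COOKIE_FILE):
    """After logging in by hand in `driver`, write its cookies to `path`."""
    with open(path, 'wb') as f:
        pickle.dump(driver.get_cookies(), f)

def load_cookies(driver, path=COOKIE_FILE, site='https://open.spotify.com'):
    """Load saved cookies into a fresh `driver`, then reload so the site sees them."""
    driver.get(site)  # add_cookie() only accepts cookies for the domain currently open
    with open(path, 'rb') as f:
        for cookie in pickle.load(f):
            driver.add_cookie(cookie)
    driver.refresh()
```

The idea is to call `save_cookies(driver)` once after logging in manually in a Selenium window, then `load_cookies(driver)` right after each new driver starts. Only unpickle a cookie file you wrote yourself.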

Let's do a test run on one dude first, and then extrapolate to every Kevin.

```python
soup = Soup(query)
profiles = soup.extract('h1')
```

```python
soup.base
```

```html
<html class="no-focus-outline spotify__os--is-macos spotify__container--is-web" dir="ltr" lang="en"><head>
<meta charset="utf-8"/><title>Spotify – Web Player</title><!-- invalid metadata --><!-- invalid canonical_url --><!-- invalid link_tags --><!-- invalid schema_deep_link --><!-- invalid schema_extras --> <link href="https://open.scdn.co/cdn/images/favicon32.a19b4f5b.png" rel="icon" sizes="32x32" type="image/png"/>
<link href="https://open.scdn.co/cdn/images/favicon16.19fc3918.png" rel="icon" sizes="16x16" type="image/png"/>
<link href="https://open.scdn.co/cdn/images/favicon.5cb2bd30.ico" rel="icon"/> <link href="https://open.scdn.co/cdn/build/web-player/web-player.885d716f.css" rel="stylesheet"/>
<link href="https://open.scdn.co/cdn/generated/manifest-web-player.a3df468f.json" rel="manifest"/><script async="" src="https://s.pinimg.com/ct/lib/main.2424edb5.js"></script><script async="" src="https://www.google-analytics.com/analytics.js" type="text/javascript"></script><script async
```

```python
class Query:
    """Profile search for a first and last name (needs a logged-in browser session)."""
    def __init__(self, first, last):
        self.url = f"https://open.spotify.com/search/{first}%20{last}/profiles"
        self.base = Soup(self.url)
```
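
From here, the remaining step is to pull the profile links out of the search results. This sketch assumes those links use the same `/user/<id>` path as the profile url we scraped earlier, which hasn't been verified against the search page. It also assumes `Query` works as defined above. The helper name is hypothetical:

```python
from urllib.parse import urljoin, urlparse

BASE_URL = 'https://open.spotify.com'

def profile_urls(hrefs):
    """Keep unique /user/... links, as absolute urls, in the order they appear."""
    urls = []
    for href in hrefs:
        if urlparse(href).path.startswith('/user/'):
            url = urljoin(BASE_URL, href)  # relative hrefs become full urls
            if url not in urls:
                urls.append(url)
    return urls
```

Usage would then look like `profile_urls(a['href'] for a in Query('kevin', 'nguyen').base.extract('a', href=True))`, with each resulting url passed to `kevin(...)`.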