# Spotify: A Breakdown of your Music Tastes

### Your Spotify Data
Have you ever wondered how Netflix suggests what other shows you should watch or how Spotify suggests playlists and songs for you to listen to? These algorithms are all driven by data on your viewing/listening history, and they often lead us to some of our new favorites. Interested in figuring out how to do some of that yourself? Let's find out!

The Spotify API actually lets you access a lot of your data, so let's see if we can recreate some of the same analysis that they do. Let's understand our music tastes a little more and see how Spotify does the same!

You can access your Spotify data through Spotify's API (https://developer.spotify.com/); just sign in to your account to get your access token.

One thing that we can do is look at our most popular artists on Spotify (https://developer.spotify.com/documentation/web-api/reference/personalization/get-users-top-artists-and-tracks/). We can pull in the results from the Spotify API by either copy-pasting them or using the requests package. You'll have to visit the Spotify website to get an authentication token.

Spotify's API gives you access to tons of data, but for now let's just look at the User's Top Artists and Tracks API. This gives us the following:


If you want to use the requests package, the code is below:
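The original request cell isn't shown here, so the following is only a minimal sketch of what such a call might look like. It assumes the token was issued with the `user-top-read` scope; the helper name `fetch_top_artists` and its `session` parameter are made up for illustration (the latter just makes it easy to swap in a stand-in when testing).

```python
import requests

# Endpoint for the current user's top artists (Spotify Web API).
TOP_ARTISTS_URL = "https://api.spotify.com/v1/me/top/artists"


def fetch_top_artists(token, limit=50, session=None):
    """Return the parsed JSON for the user's top artists.

    `token` is the access token from the Spotify developer site.
    `session` is a hypothetical hook: anything with a requests-style
    `get` method can be passed in place of the requests module.
    """
    http = session or requests
    response = http.get(
        TOP_ARTISTS_URL,
        headers={"Authorization": f"Bearer {token}"},
        params={"limit": limit},
        timeout=10,
    )
    # Fail loudly on HTTP errors such as an expired or invalid token.
    response.raise_for_status()
    return response.json()
```

Calling `data = fetch_top_artists("YOUR_TOKEN")` should then give a dictionary with the same shape as the copy-pasted JSON.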

This is awesome! Using this we can find out a variety of details like:
- Who is my favorite artist?
- What are the top genres that I listen to?
- Who is the least popular artist that I listen to the most?
- And so much more!

### Problem: Parsing JSON Data Files
Data comes in all shapes and sizes but is more often than not messy and ugly. While Pandas has become an extremely popular package for manipulating data, it's extremely intensive and difficult to use on non-flat file formats. Another prominent file format is JSON. JSON is simple and understandable, but is often tedious and confusing to parse and analyze.

As I'm sure you know, data management sucks up a majority of the time for any project. You have to get the data, format it, validate it, write additional functions and that's all before just performing analysis. Additionally, if you're part of a larger development project, it's easy for the code to diverge between you and other developers. But what if there was an easier way to get past that stage and get on to the analytics? What if writing the code was understandable to look back on and easy to manage?

For loops and various comprehensions can be used to parse the data, but they often result in ugly code that is difficult to maintain and pass on. In a professional environment, it's critical to maintain efficient and effective code, and delays in understanding and maintaining JSON-handling code can cause all types of problems. This is where Pydantic comes in...
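To make the problem concrete, here is a small sketch of hand-rolled parsing. The `sample` payload and the `follower_counts` helper are invented for illustration; they only loosely mimic the shape of the Spotify response.

```python
# A tiny, made-up payload shaped like the top-artists response.
sample = {
    "items": [
        {"name": "Artist A", "popularity": 80, "followers": {"total": 1000}},
        {"name": "Artist B", "popularity": "n/a", "followers": {}},
    ]
}


def follower_counts(payload):
    """Collect name -> follower total, checking every level by hand."""
    counts = {}
    for item in payload.get("items", []):
        name = item.get("name")
        followers = item.get("followers")
        if not isinstance(name, str):
            continue  # bad rows are skipped silently, which is easy to miss
        if not isinstance(followers, dict):
            continue
        total = followers.get("total")
        if isinstance(total, int):
            counts[name] = total
    return counts


print(follower_counts(sample))  # {'Artist A': 1000}
```

Every new field means another round of `get` calls and `isinstance` checks, and nothing tells us that "Artist B" was quietly dropped.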

### What is Pydantic?
Pydantic is a data validation and settings management package for Python that lets you specify and validate the format of your data. Pydantic's `BaseModel` class works much like a dataclass: you create a class describing what the data should look like, declaring which fields are required, what types they should take, what errors should be raised, what the defaults are, and much more!

Pydantic has a lot of great capabilities but we'll focus on the Data Validation and Structure part here.
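As a quick taste, here is a minimal sketch of a model with required fields, typed fields and defaults. The `Artist` class and its fields are made up for this illustration and are not the model used later on.

```python
from typing import List, Optional

from pydantic import BaseModel


class Artist(BaseModel):
    name: str                       # required
    popularity: int                 # required, must be an integer
    genres: List[str] = []          # optional, defaults to an empty list
    nickname: Optional[str] = None  # optional, may be missing or null


artist = Artist(name="Artist A", popularity=80)
print(artist.popularity)  # 80
print(artist.genres)      # []
```

Fields without a default are required, so leaving out `name` or `popularity` raises a `ValidationError` instead of producing a half-filled object.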

### Why Pydantic?
1. **Data Validation**

      Reading through an entire JSON document to look for outliers and anomalies can be practically impossible. Pydantic lets us specify a structure, and if the JSON doesn't follow it, we'll know.
    
    
2. **Readability**

      Pydantic models are clean and easy to read. With just a handful of small classes, we can understand the structure and fields of the entire JSON document. Pydantic makes it simple and efficient to locate and use any field, dictionary or list we need. It's also especially good at making complicated, nested JSON files simple and readable.
    
    
3. **Maintenance**

      One of the best parts of Pydantic is that anyone can pick up your code, quickly understand the structure of the file you're reading and figure out where and how to update/fix/change any code. Additionally, if you add methods and functions (like we will below) to any of your classes, you no longer have to sort through what each function is doing to access one exact element of the JSON. Instead, we know exactly where it is, what it's doing, and how to adjust it.
      
##### Additionally, 
     
4. **Plays nicely with your IDE/linter/brain**

    There's no new schema definition micro-language to learn. If you know how to use python type hints, you know how to use pydantic. Data structures are just instances of classes you define with type annotations, so auto-completion, linting, mypy, IDEs (especially PyCharm), and your intuition should all work properly with your validated data.


5. **Dual Use**

    Pydantic's BaseSettings class allows pydantic to be used in both a "validate this request data" context and in a "load my system settings" context. The main differences are that system settings can be read from environment variables, and more complex objects like DSNs and python objects are often required.


6. **Fast**

    In benchmarks pydantic is faster than all other tested libraries.


7. **Extensible**

    Pydantic allows custom data types to be defined or you can extend validation with methods on a model decorated with the validator decorator.


8. **Dataclasses integration**

    As well as BaseModel, pydantic provides a dataclass decorator which creates (almost) vanilla python dataclasses with input data parsing and validation.

      
*(4-8) from Pydantic Documentation
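To illustrate the data validation point, here is a small sketch of how a nested model reports exactly where a document breaks the declared structure. The classes and the `bad_payload` values are invented for this example.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Followers(BaseModel):
    total: int


class Artist(BaseModel):
    name: str
    followers: Followers


class Page(BaseModel):
    items: List[Artist]


# Made-up payload: the second artist has a non-numeric follower count.
bad_payload = {
    "items": [
        {"name": "Artist A", "followers": {"total": 1000}},
        {"name": "Artist B", "followers": {"total": "lots"}},
    ]
}

try:
    Page(**bad_payload)
    error_locations = []
except ValidationError as exc:
    # Each error records the path to the offending field.
    error_locations = [err["loc"] for err in exc.errors()]

print(error_locations)  # [('items', 1, 'followers', 'total')]
```

The location tuple reads like a path into the JSON: the `total` field under `followers` of the item at index 1.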

### Pydantic in Action

In [None]:
from pydantic import BaseModel, ValidationError, validator
import typing
import json

with open(r"C:\Users\bajojl\Documents\Python Practice\Pydantic-with-Spotify\spotify.json") as f:
    data = json.load(f)

We can describe what the structure should look like using Pydantic's BaseModel class. As a reminder, here is what the data looks like:

We can create our model like so and, just to be sure, we'll add a validator as well:

In [None]:
class External_Urls(BaseModel):
    spotify: str

class Followers(BaseModel):
    href: typing.Any
    total: int

class Images(BaseModel):
    height: int
    url: str
    width: int

class ItemsInner(BaseModel):
    external_urls: External_Urls
    followers: Followers
    genres: list
    href: str
    id: str
    images: typing.List[Images]
    name: str
    popularity: int
    type: str
    uri: str

class Total(BaseModel):
    items: typing.List[ItemsInner]
    total: int
    limit: int
    offset: int
    href: str
    previous: typing.Any
    next: bool

    # `next` must be a real boolean (True or False)
    @validator('next')
    def next_boolean(cls, v):
        if not isinstance(v, bool):
            raise ValueError('Next is not Boolean')
        return v

We can now call our object:

In [None]:
try:
    Total(**data)
except ValidationError as e:
    print(e)
# myobj = Total(**data)

Turns out that this is actually the last page, so `next` ends up being null. Now we know!
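The later cells relax `next` to `typing.Any` so the last page validates. A slightly stricter alternative, sketched here under the assumption that `previous` and `next` are either page URLs or null, is to declare them as optional strings. The `Paging` class is a cut-down, hypothetical stand-in for `Total`.

```python
from typing import Optional

from pydantic import BaseModel


class Paging(BaseModel):
    total: int
    previous: Optional[str] = None  # URL of the previous page, or null
    next: Optional[str] = None      # URL of the next page, or null


# Made-up last page: there is nothing after it, so `next` is null.
last_page = Paging(total=50, previous=None, next=None)
print(last_page.next is None)  # True
```

This keeps some type checking on the field while still accepting the null that shows up on the final page.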

### Where does Data Analysis Come In?

Now that we have a data structure, we can create methods based on this structure to get us anything that we could ever want!
Who is our favorite artist based on Spotify's popularity metric? You could try parsing through the `items` list by hand for every artist... or we can make a few changes to our classes:

In [None]:
## Getting your most popular artist without pydantic 
[i.get('name') for i in data.get('items') if i.get('popularity') == max([i.get('popularity') for i in data.get('items')])]

In [None]:
max(data.get('items'), key=lambda y: y.get('popularity')).get('name')

While the first way is pretty messy, the second way is pretty common. This doesn't seem all that bad, right? It should be easy to get through that without the use of a package. However, what if you're working on a project with someone else? What if you're working on a project where someone needs the data model that you've created?

This creates a mess and we're suddenly lost if we need to fix a function. This is where Pydantic comes in:

In [None]:
class Total(BaseModel):
    items: typing.List[ItemsInner]
    total: int
    limit: int
    offset: int
    href: str
    previous: typing.Any
    next: typing.Any
        
    @property
    def favorite_artist(self):
        return max(self.items, key=lambda y: y.popularity).name


In [None]:
myobj = Total(**data)
myobj.favorite_artist

Bam! Just like that we have a quick and easy property that anyone can access, edit and maintain. We can also quickly tell where this function lives, what part of the schema it accesses, and its purpose!

Well, you probably already knew who you listened to the most. **How about our top 5?**

In [None]:
class Total(BaseModel):
    items: typing.List[ItemsInner]
    total: int
    limit: int
    offset: int
    href: str
    previous: typing.Any
    next: typing.Any
        
    @property
    def top_5(self):
        return [x.name for x in sorted(self.items, key=lambda x: x.popularity, reverse=True)[:5]]


myobj = Total(**data)
myobj.top_5

**What if we want to know which artist we listen to the most, yet is the least popular?**

In [None]:
class ItemsInner(BaseModel):
    external_urls: External_Urls
    followers: Followers
    genres: list
    href: str
    id: str
    images: typing.List[Images]
    name: str
    popularity: int
    type: str
    uri: str
    
    @property
    def score(self):
        return self.popularity / 100 * self.followers.total

class Total(BaseModel):
    items: typing.List[ItemsInner]
    total: int
    limit: int
    offset: int
    href: str
    previous: typing.Any
    next: typing.Any
    
    
    @property
    def unpopular_favorite(self):
        scored = min(self.items, key=lambda x: x.score)
        return scored.name, scored.score

In [None]:
myobj = Total(**data)
myobj.unpopular_favorite

## Problem for Audience
**What are the n most common genres based on your top 50 favorite artists?**
(hint: use the Counter class from the collections module)

In [None]:
##Code

In [None]:
##Answer
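One possible answer, sketched on plain dictionaries so it runs on its own. The `most_common_genres` helper and the `sample_items` list are made up; in the notebook you would pass `data["items"]` instead.

```python
from collections import Counter


def most_common_genres(items, n=3):
    """Return the n most frequent genres across a list of artist dicts."""
    counts = Counter(genre for item in items for genre in item.get("genres", []))
    return counts.most_common(n)


# Made-up artists standing in for data["items"].
sample_items = [
    {"name": "Artist A", "genres": ["pop", "dance pop"]},
    {"name": "Artist B", "genres": ["pop", "indie"]},
    {"name": "Artist C", "genres": ["pop", "indie", "rock"]},
]

print(most_common_genres(sample_items, n=2))  # [('pop', 3), ('indie', 2)]
```

The same logic could also live on the `Total` model as a method that takes `n`, in the spirit of the properties above.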

Can't do this from Jupyter but a fun example

## Wrapping up: Who else is using Pydantic, and why you should too!
Hundreds of organisations and packages are using pydantic, including:

- **FastAPI**
a high performance API framework, easy to learn, fast to code and ready for production, based on pydantic and Starlette.
- **Project Jupyter**
developers of the Jupyter notebook are using pydantic for subprojects.
- **Microsoft**
are using pydantic (via FastAPI) for numerous services, some of which are "getting integrated into the core Windows product and some Office products."
- **Amazon Web Services**
are using pydantic in gluon-ts, an open-source probabilistic time series modeling library.
- **The NSA**
are using pydantic in WALKOFF, an open-source automation framework.
- **Uber**
are using pydantic in Ludwig, an open-source TensorFlow wrapper.

*From the Pydantic Documentation

Wrapping this all up, Pydantic is an extremely useful tool when it comes to data validation, readability and maintenance, and it is especially useful for pulling information from JSON files. Pydantic's quick and easy-to-use, dataclass-style models allow us to efficiently and effectively format, validate and write functions to extract information and get on to the analytics. Additionally, Pydantic makes working on team projects smart, efficient and easy. Next time you're working with JSON, consider Pydantic!