# Recommendation Systems, Vector Databases, and Music

![main](../images/main_pic.png)

## Table of Contents

1. Overview
2. The Challenge
3. Audio Data
    - Intro
    - Data Prep
4. Vector Databases
    - Why do we need them?
    - How can we use them?
    - Enter Qdrant
        - Getting Started
        - Adding Points
        - Payloads
        - Search
5. Models and Vector Representations
6. Putting it all together
7. Final Thoughts

## 1. Overview

Vector databases are a relatively new way of interacting with abstract data representations derived from opaque machine learning models, with deep learning architectures being the most common. These representations are often called vectors or embeddings, and they are a compressed version of the data used to train a machine learning model to accomplish a task (e.g., sentiment analysis, speech recognition, object detection, and many more).

Vector databases shine in many applications like semantic search and recommendation systems. In this tutorial, we'll learn how to build these kinds of applications using [Qdrant](https://qdrant.tech), a vector similarity search engine that provides a production-ready service with a convenient API to store, search, and manage points (i.e. vectors) with an additional payload.

Now, a tutorial is most useful when it involves a real use case, so let's go ahead and describe ours.

## 2. The Challenge

Building recommendation systems can be quite challenging. For starters, we never know a priori the needs and wants of a new customer of a store or a new user of a mobile app, and this makes it difficult to recommend, say, a toaster brand to someone searching for skillets at a store, or a Beatles song to new users who just listened to a bachata by Romeo Santos and Drake.

The aforementioned problems -- "new user" == "no data" -- belong to the "cold start" family of problems in recommender systems. While these problems might not go away anytime soon, there are ways to work around them and serve relevant results from the get-go, and one of them is vector databases. That is what we will work on in this tutorial: we will build a music recommendation system on top of Qdrant.

Music data can be quite challenging to use given that most of it is copyright-protected, so chances are you will need to find your own data for your use case, use copyright-free music, or look for another legitimate way around the legal barriers surrounding music data. That said, there's a Latin music dataset on Kaggle called the [Tropical Genres Dataset](https://www.kaggle.com/datasets/carlossalazar65/tropical-genres-dataset). It contains 30-second clips of songs from the genres of bachata, salsa, merengue, cumbia, and vallenato. Some of these 30-second clips belong to the same songs and, while that is not ideal, it will work well for educational purposes.

Once you download it, you should see the following directories.

```sh
../data
├── Audios
│   ├── Bachata
│   ├── Cumbia
│   ├── Merengue
│   ├── Salsa
│   └── Vallenato
├── Spectograms
│   ├── Bachata
│   ├── Cumbia
│   ├── Merengue
│   ├── Salsa
│   └── Vallenato
```

The `Spectograms` directory contains spectrograms, which are visual representations of the frequencies present in an audio signal over time. A spectrogram is a 2D graph where the x-axis represents time and the y-axis represents frequency. The intensity of the color or brightness of the graph indicates the strength or amplitude of the frequencies at a particular time. Here is an example of a spectrogram.

![specto](../images/mel_specto.png)

If you've ever wondered what audio could look like visually, this is one way to visualize it.

Before we get to the recommendation systems part, let's go over what audio data is first. 

## 3. Audio Data

### 3.1 Intro

Audio data is a way of representing sound in a form that can be processed and analyzed by computers. A wide range of applications use audio data, including music production, telecommunications, and digital assistants like Siri and Alexa.

What is sound, then? Sound is a form of energy that is produced by vibrations or oscillations in a medium such as air, water, or solid objects. When an object vibrates, it creates pressure waves that travel through the surrounding medium, carrying the energy of the vibrations. When another person speaks to us, the sound we hear is the vibration of the air set in motion by their vocal cords.

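When sound becomes audio, it is stored in a physical or digital format by measuring the signal many times per second on one or more channels. The number of measurements per second is called the sampling rate and is expressed in Hertz (Hz). Hertz is also the unit used for the frequency of a sound, and we humans can hear frequencies anywhere from roughly 20 to 20,000 Hz. Some common audio formats use a sampling rate of 44,100 Hz and, in general, the higher this number is, the better the quality of the sound.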
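To make these numbers concrete, here is a small sketch of the arithmetic behind a sampling rate. The 30-second clip length and the 16,000 Hz rate are the ones used elsewhere in this tutorial. The function names are made up for illustration, and the code is plain Python so it runs without any extra packages.

```python
def num_samples(duration_seconds: float, sampling_rate_hz: int, channels: int = 1) -> int:
    """Number of amplitude values stored for a clip of the given length."""
    return int(duration_seconds * sampling_rate_hz) * channels


def nyquist_frequency(sampling_rate_hz: int) -> float:
    """Highest frequency that a given sampling rate can represent."""
    return sampling_rate_hz / 2


# A 30-second mono clip at two different sampling rates
print(num_samples(30, 44_100))    # 1323000
print(num_samples(30, 16_000))    # 480000

# 44,100 Hz comfortably covers the ~20,000 Hz upper limit of human hearing
print(nyquist_frequency(44_100))  # 22050.0
```

Resampling a clip from 44,100 Hz down to 16,000 Hz therefore leaves far fewer values to process, at the cost of dropping the frequencies above 8,000 Hz.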


For data science use cases, we first divide the data into small segments. Each segment is then processed using a mathematical operation known as the Fourier transform, which breaks down the sound into its component frequencies.

After the Fourier transform has been applied, mel-scale filter banks are used to group the frequencies into a set of bands that more closely match the human auditory system's perception of sound. These filter banks amplify some frequency ranges while reducing others. This process results in a set of values for each segment of audio, which can be arranged to form a spectrogram.

Mel-spectrograms are useful in machine learning applications as they provide a way to represent audio data in a format that can be easily processed and analyzed by computer vision models.
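As a rough illustration of the two steps above, the sketch below applies a naive discrete Fourier transform to one short segment of a synthetic tone and then converts the strongest frequency to the mel scale. The toy signal, the 8,000 Hz sampling rate, and the function names are assumptions made for this example. The mel formula is one common variant, and real pipelines would use an optimized library (such as `librosa`) rather than this slow pure-Python loop.

```python
import cmath
import math


def dft_magnitudes(segment):
    """Magnitude of each frequency bin of a naive discrete Fourier transform."""
    n = len(segment)
    magnitudes = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to half the sampling rate
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(segment))
        magnitudes.append(abs(total))
    return magnitudes


def hz_to_mel(freq_hz):
    """One common formula for converting Hertz to the mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


sampling_rate = 8_000  # samples per second (assumed for the toy signal)
segment_size = 256     # one short segment of the signal
tone_hz = 1_000        # a pure 1 kHz tone

segment = [math.sin(2 * math.pi * tone_hz * t / sampling_rate) for t in range(segment_size)]
magnitudes = dft_magnitudes(segment)

# The bin with the largest magnitude tells us which frequency dominates
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sampling_rate / segment_size

print(peak_hz)  # 1000.0
print(hz_to_mel(peak_hz))
```

Repeating this for every segment of a clip and stacking the results side by side is what produces the time-versus-frequency picture we saw earlier.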

Now that we know a little bit about audio data, let's examine a sample to get an intuition for how the process works.

Before you run any line of code, make sure you have
1. downloaded the data
2. created a virtual environment (if not in Google Colab)
3. installed the packages below

```bash
# with conda or mamba if you have it installed
mamba create -n my_env python=3.10
mamba activate my_env

# or with virtualenv
python -m venv venv
source venv/bin/activate

# install packages
pip install qdrant-client transformers datasets pandas numpy streamlit torch librosa faker pedalboard "torchaudio<0.12"
```

### 3.2 Data Prep

In [None]:
from datasets import load_dataset, Audio
from IPython.display import Audio as player
import numpy as np

In [None]:
music_data = load_dataset(
    "audiofolder", data_dir="data/latin_music/Audios/", split="train"
).shuffle(seed=42).select(range(200))

music_data

In [None]:
music_data[0]

#### Salsa

![salsa](https://lp-cms-production.imgix.net/2021-12/puerto-rico-salsa-lyma-rodrigues-discover-puerto-rico-191-3-15686-rm.jpg?auto=format&q=75&w=1920)

Salsa is a popular music genre that originated in the 1960s in the Afro-Cuban and Puerto Rican communities of New York City. It is a fusion of various styles, including Cuban son, mambo, jazz, and other Latin American rhythms. Salsa music is known for its lively and energetic beats, intricate percussion, vibrant horn sections, and compelling dance rhythms.

Some top artists associated with the salsa genre include:

1. Celia Cruz
2. Héctor Lavoe
3. Willie Colón
4. Marc Anthony
5. Rubén Blades
6. Gilberto Santa Rosa
7. Eddie Palmieri
8. Oscar D'León
9. Ismael Rivera
10. Tito Puente

These are just a few notable artists, and there are many more talented musicians and bands that have contributed to the salsa genre throughout its history. The genre has evolved over time and continues to be popular worldwide, with artists from various countries and regions incorporating their own unique styles into salsa music.

In [None]:
from IPython.display import HTML
from pedalboard.io import AudioFile

In [None]:
HTML("""
<div align="center">
    <iframe width="700" height="450"
    src="https://www.youtube.com/embed/vwGp16NXgQU">
    </iframe>
</div>
""")

In [None]:
salsa_song = 'data/latin_music/Audios/Salsa/salsa0009.mp3'

with AudioFile(salsa_song) as f:
    song = f.read(f.frames)[0]
    display(player(song, rate=f.samplerate))

#### Merengue

![merengue](https://danceask.net/wp-content/uploads/2017/12/MERENGUE-DOMINICA-REPUBLIC.jpg)

Merengue is a lively music and dance genre that originated in the Dominican Republic. It is characterized by its fast-paced rhythms, syncopated beats, and a prominent use of the accordion, güira (a metal scraper), and tambora (a two-headed drum). Merengue is known for its infectious energy and is popular across Latin America and the Caribbean.

Here are some top artists associated with the merengue genre:

1. Juan Luis Guerra
2. Johnny Ventura
3. Sergio Vargas
4. Fernando Villalona
5. Eddy Herrera
6. Milly Quezada
7. Toño Rosario
8. Los Hermanos Rosario
9. Wilfrido Vargas
10. Los Toros Band

These artists have made significant contributions to the merengue genre and have gained international recognition for their music. Merengue continues to evolve, incorporating modern elements while maintaining its traditional roots. The genre is widely celebrated for its catchy melodies, rhythmic dance patterns, and vibrant performances.


In [None]:
HTML("""
<div align="center">
    <iframe width="700" height="450"
    src="https://www.youtube.com/embed/FGDs0_gUtfY">
    </iframe>
</div>
""")

In [None]:
merengue_song = 'data/latin_music/Audios/Merengue/merengue0200.mp3'

with AudioFile(merengue_song) as f:
    song = f.read(f.frames)[0]
    display(player(song, rate=f.samplerate))

#### Bachata

![bachata](https://www.barcelo.com/guia-turismo/wp-content/uploads/2020/12/republica-dominicana_merengue-y-bachata_888_1.jpg)

Bachata is a popular music genre that originated in the Dominican Republic. It emerged in the early 20th century and has since gained global popularity. Bachata is characterized by its melancholic and romantic lyrics, gentle guitar strumming, and a blend of African and European musical influences.

Here are some top artists associated with the bachata genre:

1. Romeo Santos
2. Juan Luis Guerra
3. Prince Royce
4. Anthony Santos
5. Aventura (group led by Romeo Santos)
6. Raulín Rodríguez
7. Zacarías Ferreira
8. Frank Reyes
9. Toby Love
10. Monchy & Alexandra (duo)

These artists have played a significant role in the development and popularization of bachata, both in the Dominican Republic and internationally. They have contributed to the evolution of the genre, incorporating elements from other styles such as pop, R&B, and urban music while staying true to the essence of bachata. Bachata has grown to become one of the most beloved Latin music genres, known for its emotional storytelling and heartfelt performances.

In [None]:
HTML("""
<div align="center">
    <iframe width="700" height="450"
    src="https://www.youtube.com/embed/t808BgOjZPo">
    </iframe>
</div>
""")

In [None]:
bachata_song = 'data/latin_music/Audios/Bachata/bachata0200.mp3'

with AudioFile(bachata_song) as f:
    song = f.read(f.frames)[0]
    display(player(song, rate=f.samplerate))

In [None]:
music_data = music_data.cast_column("audio", Audio(sampling_rate=16_000))

In [None]:
paths = [music_data[i]['audio']['path'] for i in range(len(music_data))]
ids = list(range(len(music_data)))

In [None]:
music_data = music_data.add_column("paths", paths)
music_data = music_data.add_column("ids", ids)

In [None]:
labels = music_data.features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

num_labels = len(id2label)

In [None]:
def get_names(label_num):
    return id2label[str(label_num)]

label_names = list(map(get_names, music_data['label']))
music_data = music_data.add_column("label_names", label_names)
music_data[-10]

To have some metadata to work with later on, let's generate fake song details with the `faker` package.

In [None]:
from faker import Faker

In [None]:
fake_something = Faker()
fake_something.name()

In [None]:
def add_metadata(example):
    path = example['paths']
    genre = example['label_names']
    example['metadata'] = {
        "artist":   fake_something.name(),
        "song":     " ".join(fake_something.words()),
        "url_song": path,
        "genre":    genre,
        "year":     fake_something.year(),
        "country":  fake_something.country()
    }
    return example

In [None]:
music_data = music_data.map(add_metadata)
music_data

In [None]:
music_data[100]['metadata']

In [None]:
music_data = music_data.remove_columns(["paths", 'label_names'])

## 4. Vector Databases

A vector database is a type of database designed to store and query high-dimensional vectors efficiently. In traditional [OLTP](https://www.ibm.com/topics/oltp) and [OLAP](https://www.ibm.com/topics/olap) databases, data is organized in rows and columns, and queries are performed based on the values in those columns. However, in certain applications including image recognition, natural language processing, and recommendation systems, data is often represented as vectors in a high-dimensional space. Here is a depiction of all three next to each other.

![dbs](../images/databases.png)

A vector in this context is a mathematical representation of an object or data point, where each element of the vector corresponds to a specific feature or attribute of the object. For example, in an image recognition system, a vector could represent an image, with each element of the vector representing a pixel value or a descriptor/characteristic of that pixel.

Vector databases are optimized for **storing** and **querying** these high-dimensional vectors efficiently, often using specialized data structures and indexing techniques such as Hierarchical Navigable Small World (HNSW), Approximate Nearest Neighbors, and Product Quantization, among others. These databases enable fast similarity and semantic search while allowing users to find vectors that are the closest to a given query vector based on some distance metric. The most commonly used distance metrics are Euclidean Distance, Cosine Similarity, and Dot Product.

### Why do we need Vector Databases?

Vector databases play a crucial role in various applications that require similarity search, such as recommendation systems, content-based image retrieval, and personalized search. By taking advantage of their efficient indexing and searching techniques, vector databases enable faster and more accurate retrieval of similar vectors, which helps advance data analysis and decision-making.

In addition, other benefits of using vector databases include:
1. Efficient storage and indexing of high-dimensional data.
2. Ability to handle large-scale datasets with billions or trillions of data points.
3. Support for real-time analytics and queries.
4. Ability to handle complex data types, such as images, videos, and natural language text.
5. Improved performance and reduced latency in machine learning and AI applications.
6. Reduced development and deployment time and cost compared to building a custom solution.

Keep in mind that the specific benefits of using a vector database may vary depending on the use case of your organization and the features of the database.

### Enter Qdrant

Qdrant "is a vector similarity search engine that provides a production-ready service with a convenient API to store, search, and manage points (i.e. vectors) with an additional payload." You can get started with plain Python using the `qdrant-client`, pull the latest Docker image of `qdrant` and connect to it locally, or try out Qdrant's Cloud free tier option until you are ready to make the full switch.

#### Overview of Qdrant's Architecture (High-Level)

![qdrant](../images/qdrant_overview_high_level.png)

The diagram above represents a high-level overview of some of the main components of Qdrant. Here is the terminology you should get familiar with.

- [Collections](https://qdrant.tech/documentation/collections/): A collection is a named set of points (vectors with a payload) among which you can search. Vectors within the same collection must have the same dimensionality and be compared by a single metric.
- Distance Metrics: These are used to measure similarities among vectors and they must be selected at the same time you are creating a collection. The choice of metric depends on the way the vectors were obtained and, in particular, on the method used to train the neural network encoder.
- [Points](https://qdrant.tech/documentation/points/): The points are the central entity that Qdrant operates with and they consist of a vector and an optional ID and payload.
- ID: a unique identifier for your vectors.
- Vector: a high-dimensional representation of data, for example, an image, a sound, a document, a video, etc.
- [Payload](https://qdrant.tech/documentation/payload/): A payload is additional data you can add to a vector.
- [Storage](https://qdrant.tech/documentation/storage/): Qdrant can use one of two options for storage: **In-memory** storage (stores all vectors in RAM and has the highest speed, since disk access is required only for persistence) or **Memmap** storage (creates a virtual address space associated with the file on disk).
- Clients: the programming languages you can use to connect to Qdrant.

#### How do we get started?

The open source version of Qdrant is available as a Docker image and it can be pulled and run from any machine with Docker installed. If you don't have Docker installed on your PC, you can follow the instructions in the official documentation [here](https://docs.docker.com/get-docker/). After that, open your terminal and start by downloading the image with the following command.

```sh
docker pull qdrant/qdrant
```

Next, initialize Qdrant with the following command, and you should be good to go.

```sh
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
```

You should see something similar to the following image.

![dockerqdrant](../images/docker_qdrant.png)

If you experience any issues during the start process, here is a link to the [discord channel](https://qdrant.to/discord) where the Qdrant team is always available and happy to help.


After you have your environment ready, let's get started with Qdrant.

**Note:** At the time of writing, Qdrant supports Rust, Go, Python and TypeScript. We expect other programming languages to be added in the future.

The two pieces of the Python client we'll use the most are the `QdrantClient` class and the `models` module. The former allows us to connect to a running Qdrant instance, or to run an in-memory database by passing `location=":memory:"` instead of a host and port (this is a great feature for testing in a CI/CD pipeline). We'll start by instantiating our client using `host="localhost"` and `port=6333` (the port we exposed earlier with Docker).

In [None]:
from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import CollectionStatus

In [None]:
client = QdrantClient(host="localhost", port=6333)
client

In OLTP and OLAP databases we call specific bundles of rows and columns **Tables**, but in vector databases the rows are known as vectors, the columns are known as dimensions, and the combination of the two (plus some metadata) as **collections**.

In the same way in which we can create many tables in a database, we can create many collections in a vector database using a client. The key difference to note is that when we create a collection, we need to specify the width of the collection (i.e. the length of the vectors or number of dimensions) beforehand with the parameter `size=...`, as well as the similarity metric with the parameter `distance=...`. As noted in the terminology above, the metric is chosen when the collection is created, so switching to a different one later generally means recreating the collection.

The distances currently supported by Qdrant are:
- [**Cosine Similarity**](https://en.wikipedia.org/wiki/Cosine_similarity) - Cosine similarity is a way to measure how similar two things are. Instead of measuring how far apart two points are, like a ruler would, it measures the angle between two vectors, that is, whether they point in the same direction. It's often used with text to compare how similar two documents or sentences are to each other. The output of the cosine similarity ranges from -1 to 1, where 1 means the two vectors point in exactly the same direction, 0 means they are unrelated (orthogonal), and -1 means they point in opposite directions. It's a straightforward and effective way to compare two things!
- [**Dot Product**](https://en.wikipedia.org/wiki/Dot_product) - The dot product similarity metric is another way of measuring how similar two things are, like cosine similarity. It's often used in machine learning and data science when working with numbers. The dot product similarity is calculated by multiplying the values in two sets of numbers, and then adding up those products. The higher the sum, the more similar the two sets of numbers are. So, it's like a scale that tells you how closely two sets of numbers match each other.
- [**Euclidean Distance**](https://en.wikipedia.org/wiki/Euclidean_distance) - Euclidean distance is a way to measure the distance between two points in space, similar to how we measure the distance between two places on a map. It's calculated by finding the square root of the sum of the squared differences between the two points' coordinates. This distance metric is commonly used in machine learning to measure how similar or dissimilar two data points are or, in other words, to understand how far apart they are.
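To get a feel for how the three metrics behave, here is a minimal sketch that computes each of them by hand for a few tiny vectors. The vectors and function names are made up for illustration, and this is plain Python rather than the way Qdrant computes these values internally.

```python
import math


def dot_product(a, b):
    """Multiply the vectors element by element and add up the products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the vectors divided by the product of their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Square root of the sum of the squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


song_a = [1.0, 2.0, 3.0]
song_b = [2.0, 4.0, 6.0]     # same direction as song_a, twice the length
song_c = [-1.0, -2.0, -3.0]  # opposite direction to song_a

print(dot_product(song_a, song_b))                   # 28.0
print(round(cosine_similarity(song_a, song_b), 6))   # 1.0
print(round(cosine_similarity(song_a, song_c), 6))   # -1.0
print(round(euclidean_distance(song_a, song_b), 4))  # 3.7417
```

Notice that `song_a` and `song_b` are a perfect match according to cosine similarity even though they are a fair distance apart in Euclidean terms, which is why the choice of metric should follow the way the vectors were produced.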

Let's create our first collection with vectors of width 100 and the distance set to **Cosine Similarity**. Please note that, at the time of writing, Qdrant supports cosine similarity, dot product, and Euclidean distance.

In [None]:
my_collection = "first_collection"

first_collection = client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(size=100, distance=models.Distance.COSINE)
)
print(first_collection)

We can extract information related to the health of our collection by getting the collection. In addition, we can use this information for testing purposes, which can be very beneficial while in development mode.

In [None]:
collection_info = client.get_collection(collection_name=my_collection)
collection_info

In [None]:
assert collection_info.status == CollectionStatus.GREEN
assert collection_info.vectors_count == 0

There are a few things to notice from what we have done so far.
- The first is that when we started our Docker container, we created a local directory called `qdrant_storage`, and this is where all of our collections, plus their metadata, will be saved. You can have a look at that directory in a *nix system with `tree qdrant_storage -L 2`, and something similar to the following should come up for you.
    ```bash
    qdrant_storage
    ├── aliases
    │   └── data.json
    ├── collections
    │   └── first_collection
    └── raft_state
    ```
- The second is that we used `client.recreate_collection`. This command, as the name implies, can be used more than once for a collection with the same name, and each call replaces the existing collection, so be careful not to recreate a collection that you did not intend to recreate. To create a brand new collection where trying to create another one with the same name would throw an error, we would use `client.create_collection` instead.
- Our collection can only hold vectors of 100 dimensions and the distance metric has been set to Cosine Similarity.

Now that we know how to create collections, let's create a bit of fake data and add some vectors to our collection.

#### Adding Points

The points are the central entity that Qdrant operates with, and these points contain records consisting of a vector, an optional id and an optional payload (which we'll talk more about in the next section).

The optional ID can be represented by unsigned integers or UUIDs but, for our use case, we will use a straightforward range of numbers.
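If you ever need UUIDs instead of a range of numbers, Python's built-in `uuid` module can generate them. The sketch below shows both a random UUID and a deterministic one derived from a file path. The namespace URL and the `song_id` helper are hypothetical and exist only for this illustration.

```python
import uuid

# Hypothetical namespace for this project; any fixed UUID would work here
SONG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/songs")


def song_id(path: str) -> str:
    """Deterministic UUID string derived from a song's file path."""
    return str(uuid.uuid5(SONG_NAMESPACE, path))


sequential_ids = list(range(3))  # unsigned integers: 0, 1, 2
random_id = str(uuid.uuid4())    # a random UUID, different on every call
stable_id = song_id("data/latin_music/Audios/Salsa/salsa0009.mp3")

print(sequential_ids)
print(random_id)
print(stable_id)
```

A deterministic ID has the advantage that uploading the same song twice would overwrite the existing point rather than create a duplicate, whereas a random one is handy when there is no natural key to derive it from.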

Let's create a matrix of fake data containing 1,000 rows and 100 columns while representing the values of our vectors as `float64` numbers between -1 and 1. For simplicity, let's imagine that each of these vectors represents one of our favorite songs, and that each column represents a unique characteristic of the artists/bands we love, for example, the tempo, the beats, the pitch of the voice of the singer(s), etc.

In [None]:
data = np.random.uniform(low=-1.0, high=1.0, size=(1_000, 100))
type(data[0, 0]), data[:2, :20]

Let's now create an index for our vectors.

In [None]:
index = list(range(len(data)))
index[-10:]

Once the collection has been created, we can fill it in with the command `client.upsert()`. We need the collection's name and the appropriate process from our `models` module, in this case, [`Batch`](https://qdrant.tech/documentation/points/#upload-points).

One thing to note is that Qdrant can only take in native Python iterables like lists and tuples. This is why you'll notice the `.tolist()` method attached to our `data` below.

In [None]:
client.upsert(
    collection_name=my_collection,
    points=models.Batch(
        ids=index,
        vectors=data.tolist()
    )
)

We can retrieve specific points based on their ID (for example, artist X with ID 100) and get some additional information from that result.

In [None]:
client.retrieve(
    collection_name=my_collection,
    ids=[100],
    with_vectors=True # we can turn this on and off depending on our needs
)

We can also update our collection one point at a time, for example, as new data comes in.

In [None]:
def create_song():
    return np.random.uniform(low=-1.0, high=1.0, size=100).tolist()

In [None]:
client.upsert(
    collection_name=my_collection,
    points=[
        models.PointStruct(
            id=1000,
            vector=create_song(),
        )
    ]
)

We can also delete it in a straightforward fashion.

In [None]:
# this will show the number of vectors BEFORE deleting one of them
client.count(
    collection_name=my_collection, 
    exact=True,
) 

In [None]:
client.delete(
    collection_name=my_collection,
    points_selector=models.PointIdsList(
        points=[1000],
    ),
)

In [None]:
# this will show the number of vectors AFTER deleting one of them
client.count(
    collection_name=my_collection, 
    exact=True,
)

#### Payloads

On top of speed and reliability, one of Qdrant's most useful features is the ability to store additional information along with vectors. In Qdrant terminology, this information is called a payload and it is represented as a JSON object. Not only can you get this information back when you search the database, but you can also filter your search by the fields in the payload, and we'll see how in a second.

Imagine the fake vectors we created actually represented a song. If we were building a recommender system for songs then, naturally, the things we would want to get back would be the song itself, the artist, maybe the genre, and so on.

What we'll do here is take advantage of `faker` again to create a bit of information to add to our payload and see how this functionality works.

In [None]:
payload = []

for i in range(len(data)):
    payload.append(
        {
            "artist":   fake_something.name(),
            "song":     " ".join(fake_something.words()),
            "url_song": fake_something.url(),
            "year":     fake_something.year(),
            "country":  fake_something.country()
        }
    )

payload[:3]

In [None]:
client.upsert(
    collection_name=my_collection,
    points=models.Batch(
        ids=index,
        vectors=data.tolist(),
        payloads=payload
    )
)

In [None]:
results = client.retrieve(
    collection_name=my_collection,
    ids=[10, 50, 100, 500],
    with_vectors=False
)
results

In [None]:
results[0].payload

#### Search

Now that we have our vectors with an ID and a payload, we can explore a few ways in which we can search for content when, in our use case, new music gets selected. Let's check it out.

Say, for example, that a new song comes in and our model immediately transforms it into a vector.

In [None]:
living_la_vida_loca = create_song()

In [None]:
client.search(
    collection_name=my_collection,
    query_vector=living_la_vida_loca,
    limit=10
)

Now imagine that we only want Australian songs recommended to us.

In [None]:
aussie_songs = models.Filter(
    must=[models.FieldCondition(key="country", match=models.MatchValue(value="Australia"))]
)

In [None]:
client.search(
    collection_name=my_collection,
    query_vector=living_la_vida_loca,
    query_filter=aussie_songs,
    limit=5
)

Lastly, say we want Aussie songs but we don't care how new or old these songs are, so we leave the `year` out of the payload that comes back.

In [None]:
client.search(
    collection_name=my_collection,
    query_vector=living_la_vida_loca,
    query_filter=aussie_songs,
    with_payload=models.PayloadSelectorExclude(exclude=["year"]),
    limit=5
)

As you can see, you can apply a wide range of filtering methods to allow your users to take more control of the recommendations they are being served.
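To build some intuition for what a filtered search does conceptually, here is a toy version written in plain Python. It scans every point, keeps the ones whose payload matches the filter, ranks them by cosine similarity, and hides the excluded payload keys. This is only a sketch under simplifying assumptions: the `filtered_search` function, its parameters, and the toy points are invented for this example, and Qdrant itself relies on indexes rather than a full scan like this one.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(points, query_vector, must=None, exclude=(), limit=5):
    """Exhaustive search: keep points whose payload matches every `must` pair,
    rank them by cosine similarity, and drop the `exclude` keys from the payload."""
    must = must or {}
    hits = []
    for point in points:
        payload = point["payload"]
        if all(payload.get(key) == value for key, value in must.items()):
            hits.append({
                "id": point["id"],
                "score": cosine_similarity(query_vector, point["vector"]),
                "payload": {k: v for k, v in payload.items() if k not in exclude},
            })
    # highest score first, then keep only the top `limit` hits
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)[:limit]


toy_points = [
    {"id": 0, "vector": [1.0, 0.0], "payload": {"country": "Australia", "year": "1999"}},
    {"id": 1, "vector": [0.0, 1.0], "payload": {"country": "Australia", "year": "2010"}},
    {"id": 2, "vector": [1.0, 0.1], "payload": {"country": "Peru", "year": "2005"}},
]

toy_hits = filtered_search(
    toy_points, [0.9, 0.1], must={"country": "Australia"}, exclude=("year",)
)
for hit in toy_hits:
    print(hit["id"], round(hit["score"], 3), hit["payload"])
```

The point from Peru never shows up, no matter how close its vector is to the query, because the filter removes it before the ranking takes place.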

If you wanted to clear out the payload and upload a new one for the same vectors, you can use `client.clear_payload()` as in the cell below.

In [None]:
client.clear_payload(
    collection_name=my_collection,
    points_selector=models.PointIdsList(
        points=index,
    )
)

## 5. Models and Vector Representations

In the context of audio data, embeddings and transformers are used to process the sound waves and extract features that are useful for training machine learning models.

### What are transformers?

In [None]:
from transformers import AutoModel, AutoFeatureExtractor, AutoModelForAudioClassification
import torch

Transformers are a type of neural network used for natural language processing, but they can also be used for processing audio data by breaking the sound waves into smaller parts and learning how those parts fit together to form meaning.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label
)#.to(device)

In [None]:
sample = music_data[23]['audio']["array"]
player(sample, rate=16_000)

In [None]:
inputs = feature_extractor(
    sample, sampling_rate=feature_extractor.sampling_rate, 
    return_tensors="pt", padding=True, return_attention_mask=True,
    max_length=16_000, truncation=True
)#.to(device)

In [None]:
with torch.no_grad():
    preds = model(**inputs).logits
preds

In [None]:
torch.argmax(preds).item()

In [None]:
model.config.id2label??

In [None]:
predicted_class_ids = torch.argmax(preds).item()
# `model.config.id2label` is keyed by integers, so we index it with the int directly
predicted_label = model.config.id2label[predicted_class_ids]
predicted_label

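While this isn't the most confident model in the world, it did predict bachata correctly. Keep in mind, though, that the classification head on top of `wav2vec2-base` is freshly initialized and has not been fine-tuned on our genres, so a correct guess at this point owes more to luck than to learning.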

### What are Embeddings?

Embeddings are a way of representing audio data as vectors or numbers, which makes it easier for machine learning algorithms to process and analyze them. These are the vectors that we will store in Qdrant in a second.

### Extracting Embeddings

In [None]:
model = AutoModel.from_pretrained("facebook/wav2vec2-base").to(device)

In [None]:
with torch.no_grad():
    embs = model(**inputs.to(device)).last_hidden_state
embs.size(), embs

In [None]:
embs.mean(dim=1).shape

In [None]:
pooled_emb = embs.mean(dim=1)
print(pooled_emb.shape)
print(f"Max Value: {pooled_emb.max()}") 
print(f"Min Value: {pooled_emb.min()}")
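The `mean(dim=1)` call above averages the model's output over the time axis, turning one vector per audio frame into a single vector for the whole clip. Here is a minimal sketch of that idea with plain Python lists. The toy numbers and the `mean_pool` name are made up for illustration, and the real hidden states have many more frames and features.

```python
def mean_pool(frames):
    """Average a sequence of equally sized frame vectors into one vector."""
    num_frames = len(frames)
    # zip(*frames) groups the values of each feature across all frames
    return [sum(values) / num_frames for values in zip(*frames)]


# Toy "last hidden state" for one clip: 3 time frames with 4 features each
hidden_states = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]

pooled = mean_pool(hidden_states)
print(pooled)       # [2.0, 2.0, 2.0, 2.0]
print(len(pooled))  # 4
```

The length of the pooled vector depends only on the number of features and not on how long the clip is, which is what lets us store every song in a collection with a single fixed `size`.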

Now let's do it with our whole dataset.

In [None]:
def get_features(examples):
    audio_arrays = [x["array"] for x in examples["audio"]]
    # return NumPy arrays here; `datasets` stores them on the CPU and the
    # tensors are moved to `device` right before they reach the model
    return feature_extractor(
        audio_arrays, sampling_rate=feature_extractor.sampling_rate, return_tensors="np",
        max_length=16000, truncation=True, padding=True, return_attention_mask=True,
    )

In [None]:
music_features = music_data.map(get_features, batched=True, batch_size=50)
music_features

In [None]:
music_features.set_format("torch", columns=["input_values", "attention_mask"])

In [None]:
inputs = {k: v.to(device) for k, v in music_features[:50].items() if k in feature_extractor.model_input_names}

In [None]:
feature_extractor.model_input_names

In [None]:
with torch.no_grad():
    model_output = model(**inputs).last_hidden_state

In [None]:
model_output.mean(dim=1).shape

In [None]:
def extract_embeddings(batch):
    inputs = {k: v.to(device) for k, v in batch.items() if k in feature_extractor.model_input_names}
    
    with torch.no_grad():
        model_output = model(**inputs).last_hidden_state
    
    pooled_embeds = model_output.mean(dim=1)
    
    return {"embedding": pooled_embeds.cpu().numpy()}

In [None]:
music_embs = music_features.map(extract_embeddings, batched=True, batch_size=50)
music_embs

In [None]:
np.save(
    "data/recsys/music_embs",
    np.array(music_embs['embedding']),
    allow_pickle=False
)

In [None]:
np.load("data/recsys/music_embs.npy")

## 6. Putting it All Together

Now that we have the data we need, it is time to put it to the test with a UI built on Streamlit.

In [None]:
music_data[0]

In [None]:
payload = list(music_data['metadata'])
payload[:3]

In [None]:
import pandas as pd

In [None]:
pd.Series(payload).to_csv("data/recsys/payload.csv", index=False)
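
Note that writing a Series of dicts to CSV stores each dict as its string representation, so reading the file back yields strings rather than dicts. If the payload needs to be reloaded later, JSON may be a more convenient format. The sketch below shows the round trip with two hypothetical records; the field names follow the payload keys used in this tutorial, but the values and the temporary file path are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical payload records (illustrative values only).
example_payload = [
    {"artist": "Artist A", "genre": "bachata", "url_song": "songs/a.mp3"},
    {"artist": "Artist B", "genre": "salsa", "url_song": "songs/b.mp3"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "payload.json"
    # Write the list of dicts as JSON so each record keeps its structure.
    path.write_text(json.dumps(example_payload, indent=2), encoding="utf-8")
    # Read it back: the result is a list of dicts again, not strings.
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored[0]["genre"])
```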

### 6.1 Basics of Recommender Systems

Recommendation systems are algorithms and techniques used to suggest items or content to users based on their preferences, historical data, or behavior. These systems aim to provide personalized recommendations to users, helping them discover new items of interest and enhancing their overall user experience. Recommendation systems are widely used in various domains such as e-commerce, streaming platforms, social media, and more.

Here are a few ways deep learning is used in recommendation systems:

1. Collaborative Filtering with Neural Networks: Collaborative filtering is a common recommendation approach that predicts user preferences based on their similarity to other users. Deep learning techniques, such as neural networks, can be applied to learn complex patterns and representations from user-item interactions to improve collaborative filtering recommendations.

2. Content-based Recommendation with Deep Learning: Content-based recommendation systems utilize the characteristics or features of items to make recommendations. Deep learning models, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), can be used to extract high-level representations from item features, such as images, text, or audio, and make personalized recommendations based on user preferences.

3. Neural Matrix Factorization: Matrix factorization is a popular technique for recommendation systems that factorizes the user-item interaction matrix to capture latent factors. Deep learning models, such as autoencoders or neural networks with multiple layers, can be used to model the matrix factorization process and generate more accurate recommendations.

4. Sequence-based Recommendation with Recurrent Neural Networks (RNNs): In scenarios where sequential user interactions or behaviors are important, RNNs can be employed to model sequential dependencies and capture user preferences over time. This approach is useful for recommendation systems in domains like music, video, or news, where the order of items matters.

5. Hybrid Recommendation Systems: Deep learning models can be combined with traditional recommendation techniques, such as collaborative filtering or content-based filtering, to create hybrid recommendation systems. These systems leverage the strengths of different approaches to provide more accurate and diverse recommendations.

It's worth noting that these examples are just a few instances of how deep learning can be applied in recommendation systems. The field is evolving rapidly, and researchers continue to explore new architectures and techniques to improve the quality and effectiveness of recommendations.
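
To make the collaborative filtering idea concrete without any deep learning, here is a minimal sketch that recommends songs based on what similar users listened to. It uses Jaccard overlap between listening histories as the similarity measure, which is a simplification chosen for illustration. The users, song ids, and the `recommend` helper are all hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, history, top_n=2):
    """Score songs the target has not heard by the similarity of users who have."""
    heard = history[target]
    scores = {}
    for user, songs in history.items():
        if user == target:
            continue
        weight = jaccard(heard, songs)
        for song in songs - heard:
            scores[song] = scores.get(song, 0.0) + weight
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]

# Hypothetical listening histories (user -> set of song ids).
history = {
    "ana":   {"s1", "s2", "s3"},
    "ben":   {"s1", "s2", "s4"},
    "carla": {"s5", "s6"},
    "dario": {"s2", "s3", "s4", "s7"},
}

picks = recommend("ana", history)
print(picks)
```

This approach needs interaction data, so it has nothing to offer a brand-new user. The embedding-based search in this tutorial sidesteps that by comparing the audio itself.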

### 6.2 Loading Up Qdrant

In [None]:
# payload = pd.read_csv("../data/recsys/payload.csv").tolist()
# vectors = np.load("../data/recsys/music_embs.npy")
# client = QdrantClient("localhost", port=6333)

In [None]:
dimensions = music_embs[0]['embedding'].shape[0]
dimensions

In [None]:
client.recreate_collection(
    collection_name="music_recsys",
    vectors_config=models.VectorParams(size=dimensions, distance=models.Distance.COSINE),
)

In [None]:
collection_info = client.get_collection(collection_name="music_recsys")
collection_info

In [None]:
from qdrant_client.http.models import CollectionStatus

assert collection_info.status == CollectionStatus.GREEN
assert collection_info.vectors_count == 0

In [None]:
client.upsert(
    collection_name="music_recsys",
    points=models.Batch(
        ids=music_data['ids'],
        vectors=music_embs['embedding'].tolist(),
        payloads=payload
    )
)
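
The cell above sends every point in a single request. For larger collections it can help to split the data into smaller batches first. The helper below is a hypothetical sketch of that chunking step in plain Python; `batched_points` is not part of the Qdrant client, and the ids, vectors, and payloads are toy values.

```python
def batched_points(ids, vectors, payloads, batch_size):
    """Yield (ids, vectors, payloads) slices of at most batch_size points."""
    if not (len(ids) == len(vectors) == len(payloads)):
        raise ValueError("ids, vectors, and payloads must have the same length")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield ids[start:end], vectors[start:end], payloads[start:end]

# Hypothetical data: 5 points with 2-dimensional vectors.
ids = [0, 1, 2, 3, 4]
vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
payloads = [{"genre": "salsa"}] * 5

batches = list(batched_points(ids, vectors, payloads, batch_size=2))
print([len(batch_ids) for batch_ids, _, _ in batches])
```

Each yielded triple could then be wrapped in `models.Batch(ids=..., vectors=..., payloads=...)` and passed to `client.upsert`, the same shape used in the cell above.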

In [None]:
from random import choice
from glob import glob

In [None]:
files = glob("data/latin_music/Audios/*/*.mp3")
files[:3]

In [None]:
sample = choice(files)
sample

In [None]:
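# wav2vec2 expects 16 kHz audio, so resample the file while reading it.
with AudioFile(sample).resampled_to(16_000) as f:
    print(f.samplerate)
    print(f.num_channels)
    print(f.read(3))  # peek at the first 3 frames (this advances the read position)
    song = f.read(f.frames)[0]  # keep only the first channel

In [None]:
player(song, rate=16_000)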

In [None]:
song.shape

In [None]:
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
new_input = feature_extractor(
    song, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt",
    padding=True, return_attention_mask=True, max_length=16_000, truncation=True
)#.to(device)

In [None]:
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label,
)#.to(device)

In [None]:
with torch.no_grad():
    logits = model(**new_input).logits
logits

In [None]:
predicted_class_ids = torch.argmax(logits).item()
predicted_label = model.config.id2label[str(predicted_class_ids)]
predicted_label

In [None]:
model_4_embs = AutoModel.from_pretrained("facebook/wav2vec2-base")

In [None]:
new_input

In [None]:
with torch.no_grad():
    model_output = model_4_embs(**new_input).last_hidden_state
    sample_embeds = model_output.mean(dim=1).cpu().numpy()
sample_embeds.shape

In [None]:
results = client.search(
    collection_name="music_recsys",
    query_vector=sample_embeds[0].tolist(),
    limit=10, 
)
results

In [None]:
results[0].payload

In [None]:
for i in results:
    file = i.payload['url_song']
    with AudioFile(file) as f:
        a_song = f.read(f.frames)[0]
    display(player(a_song, rate=44100))

### 6.3 Building a UI

In [None]:
%%writefile recsys_app.py

from transformers import AutoModel, AutoFeatureExtractor
from qdrant_client import QdrantClient
from pedalboard.io import AudioFile
import streamlit as st
import torch

st.title("Music Recommendation App")
st.markdown("Upload your favorite songs and get a list of recommendations from our database of music.")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained('facebook/wav2vec2-base').to(device)
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
client = QdrantClient("localhost", port=6333)

music_file = st.file_uploader(label="📀 Music file 🎸",)

if music_file:
    st.audio(music_file)

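    # wav2vec2 expects 16 kHz audio, so resample the upload while reading it.
    with AudioFile(music_file).resampled_to(feature_extractor.sampling_rate) as f:
        a_song = f.read(f.frames)[0]  # keep only the first channel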

    inputs = feature_extractor(
        a_song, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt",
        padding=True, return_attention_mask=True, max_length=16_000, truncation=True
    ).to(device)

    with torch.no_grad():
        last_hidden_state = model(**inputs).last_hidden_state
    vectr = last_hidden_state.mean(dim=1).cpu().numpy()[0]

    st.markdown("## Real Recommendations")
    results = client.search(collection_name="music_recsys", query_vector=vectr, limit=4)
    col1, col2 = st.columns(2)

    # Show the four hits in a 2x2 grid: hits 0-1 in the left column, 2-3 in the right.
    for col, hits in ((col1, results[:2]), (col2, results[2:4])):
        with col:
            for hit in hits:
                st.header(f"Genre: {hit.payload['genre']}")
                st.subheader(f"Artist: {hit.payload['artist']}")
                st.audio(hit.payload["url_song"])

In [None]:
!streamlit run recsys_app.py

## 7. Final Thoughts

We have explored a bit of the fascinating world of vector databases, signal processing, transformers, and embeddings. In this tutorial we learned that (1) vector databases provide efficient storage and retrieval of high-dimensional vectors, making them ideal for similarity-based search tasks; (2) signal processing lets us understand and process audio data, which opens the door to many kinds of audio applications; (3) transformers, with their attention mechanism, capture long-range dependencies across different modalities and achieve strong results on many tasks; and (4) embeddings encode data into dense vectors that capture semantic relationships, which is what makes similarity search and recommendations possible.