# RAG Test

This notebook tests the RAG flow for the Music Theory Assistant. It evaluates how well the search system retrieves results and how well different LLMs judge the relevance of generated answers to the given questions.

## Retrieval Flow

This section loads the music theory data from a local CSV knowledge base containing a selection of songs from different musical genres. The retrieval flow is then tested with two methods:

- minsearch (text search)
- Qdrant (vector search)


### Get the data

In [58]:
import pandas as pd

In [59]:
csv = '../data/music-theory-dataset-100.csv'

df = pd.read_csv(csv)
df.columns = df.columns.str.lower().str.replace(' ', '_')

In [60]:
df.to_csv(csv, index=False)

In [61]:
print('Shape (rows and columns):', df.shape)
df.head(2)

Shape (rows and columns): (100, 11)


| | id | title | artist | genre | key | tempo_bpm | time_signature | chord_progression | roman_numerals | cadence | theory_notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Let It Be | The Beatles | Pop | C major | 76 | 4/4 | C – G – Am – F – C – G – F – C | I – V – vi – IV – I – V – IV – I | Authentic (IV–I) at end; Deceptive (V–vi) earlier | Diatonic progression; Deceptive cadence in ear... |
| 1 | 1 | Hotel California | Eagles | Rock | Bm | 74 | 4/4 | Bm – F# – A – E – G – D – Em – F# | i – V – VII – IV – VI – III – iv – V | Half cadence (iv–V) | Modal interchange; Natural VII chord; Aeolian ... |


## Retrieval Flow - minsearch (text search)

This section indexes the music theory data in [minsearch](https://github.com/alexeygrigorev/minsearch) and queries it with text search. The retrieved records are then passed as context to an [LLM (OpenAI - GPT-4o mini)](https://platform.openai.com/docs/models/gpt-4o-mini), and the generated answers are checked for accuracy.
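
To make the idea of text search concrete, here is a minimal sketch of TF-IDF scoring with cosine similarity in plain Python. The minsearch README describes the library as based on TF-IDF and cosine similarity, but this sketch is only an illustration and not minsearch's actual implementation. The helper names (`tokenize`, `build_idf`, `vectorize`, `cosine`, `rank`) and the three toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real engines also strip punctuation and stop words
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency: rarer terms get larger weights
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in doc_freq.items()}


def vectorize(text, idf):
    # Sparse TF-IDF vector; terms unknown to the corpus are dropped
    term_freq = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in term_freq.items() if term in idf}


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return indexes of matching documents, best match first
    idf = build_idf(docs)
    query_vec = vectorize(query, idf)
    scored = [(cosine(query_vec, vectorize(doc, idf)), i) for i, doc in enumerate(docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for score, i in scored if score > 0]


toy_docs = [
    "Let It Be The Beatles Pop C major",
    "Hotel California Eagles Rock Bm",
    "House of the Rising Sun The Animals Folk Am",
]

print(rank("folk titles", toy_docs))  # only the third document contains "folk"
```

Because matching is purely lexical, a query word such as "titles" that never occurs in the data contributes nothing to the score.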

### Install minsearch

In [62]:
import os

if not os.path.exists("../notebooks/minsearch.py"):
    # Install the package
    os.system("pip install minsearch")
    # Download the file
    os.system("wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py")

### Index with minsearch

In [63]:
import minsearch

In [64]:
df.columns

Index(['id', 'title', 'artist', 'genre', 'key', 'tempo_bpm', 'time_signature',
       'chord_progression', 'roman_numerals', 'cadence', 'theory_notes'],
      dtype='object')

In [65]:
# Convert numeric fields to string to prevent parsing errors in minsearch
df['tempo_bpm'] = df['tempo_bpm'].astype(str)

documents = df.to_dict(orient='records')
documents[0]

{'id': 0,
 'title': 'Let It Be',
 'artist': 'The Beatles',
 'genre': 'Pop',
 'key': 'C major',
 'tempo_bpm': '76',
 'time_signature': '4/4',
 'chord_progression': 'C – G – Am – F – C – G – F – C',
 'roman_numerals': 'I – V – vi – IV – I – V – IV – I',
 'cadence': 'Authentic (IV–I) at end; Deceptive (V–vi) earlier',
 'theory_notes': 'Diatonic progression; Deceptive cadence in early phrase; Clear tonic return'}

In [66]:
index = minsearch.Index(
    text_fields=['title', 'artist', 'genre', 'key', 'tempo_bpm', 'time_signature',
       'chord_progression', 'roman_numerals', 'cadence', 'theory_notes'],
    keyword_fields=[]
)

In [67]:
index.fit(documents)

<minsearch.Index at 0x7d0d362e6540>

In [68]:
query = "Give me Folk titles"

In [69]:
index.search(query, num_results=5)

[{'id': 5,
  'title': 'House of the Rising Sun',
  'artist': 'The Animals',
  'genre': 'Folk',
  'key': 'Am',
  'tempo_bpm': '76',
  'time_signature': '6/8',
  'chord_progression': 'Am – C – D – F – Am – E – Am',
  'roman_numerals': 'i – III – IV – VI – i – V – i',
  'cadence': 'Authentic (V–i)',
  'theory_notes': 'Aeolian mode; 6/8 compound meter; Traditional folk harmony'},
 {'id': 17,
  'title': "The Times They Are A-Changin'",
  'artist': 'Bob Dylan',
  'genre': 'Folk',
  'key': 'G major',
  'tempo_bpm': '76',
  'time_signature': '3/4',
  'chord_progression': 'G – Em – C – G – Am – D – G – Em – D – G',
  'roman_numerals': 'I – vi – IV – I – ii – V – I – vi – V – I',
  'cadence': 'Authentic (V–I)',
  'theory_notes': 'Folk protest song; Simple diatonic harmony; Waltz meter'},
 {'id': 82,
  'title': 'Hallelujah',
  'artist': 'Leonard Cohen',
  'genre': 'Folk',
  'key': 'C major',
  'tempo_bpm': '72',
  'time_signature': '6/8',
  'chord_progression': 'C – Am – F – G – Em – Am – F – G –

### minsearch to LLM

In [70]:
from openai import OpenAI

client = OpenAI()

In [71]:
def search(query):
    boost = {}

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=10
    )

    return results

In [72]:
prompt_template = """
You're a music teacher. Answer the QUESTION based on the CONTEXT from our music theory database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

entry_template = """
title: {title}
artist: {artist}
genre: {genre}
key: {key}
tempo_bpm: {tempo_bpm}
time_signature: {time_signature}
chord_progression: {chord_progression}
roman_numerals: {roman_numerals}
cadence: {cadence}
theory_notes: {theory_notes}
""".strip()

In [73]:
def build_prompt(query, search_results):
    context = ""
    
    for doc in search_results:
        context = context + entry_template.format(**doc) + "\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
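
As a quick illustration of what `build_prompt` produces, the sketch below runs the same logic on two hand-written records. The shortened templates (three fields instead of ten) and the sample question are assumptions for brevity. It also joins the entries with `"\n\n".join(...)` rather than string concatenation, which avoids the trailing blank lines.

```python
prompt_template = """
QUESTION: {question}

CONTEXT:
{context}
""".strip()

# Reduced entry template for the example; the notebook's version lists all fields
entry_template = """
title: {title}
artist: {artist}
key: {key}
""".strip()


def build_prompt(query, search_results):
    # str.format ignores unused keyword arguments, so each doc may carry
    # more fields than the template references
    context = "\n\n".join(entry_template.format(**doc) for doc in search_results)
    return prompt_template.format(question=query, context=context)


sample_docs = [
    {"title": "Let It Be", "artist": "The Beatles", "key": "C major", "genre": "Pop"},
    {"title": "Hotel California", "artist": "Eagles", "key": "Bm", "genre": "Rock"},
]

prompt = build_prompt("Which songs are in C major?", sample_docs)
print(prompt)
```

A record that lacks one of the template's fields raises a `KeyError`, so the template and the document schema need to stay in sync.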

In [74]:
def llm(prompt, model='gpt-4o-mini'):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

In [75]:
def rag(query, model='gpt-4o-mini'):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt, model=model)
    return answer

In [76]:
question = 'Is Mr Tambourine Man a folk song and if so, explain why, including information about its key and cadence, and explain what cadence means?'
answer = rag(question)
print(answer)

Yes, "Mr. Tambourine Man" is a folk song. It is classified under the folk genre, and it features characteristics typical of folk music, such as storytelling and an acoustic sound, especially as performed by Bob Dylan. 

The song is in the key of D major and has a tempo of 110 BPM in a 4/4 time signature. Its chord progression follows a sequence of D – G – A – D – Bm – G – A – D, represented in Roman numerals as I – IV – V – I – vi – IV – V – I. The cadence used in "Mr. Tambourine Man" is an authentic cadence, specifically from the V (five) chord to the I (tonic) chord, which is commonly used in folk music to establish a sense of resolution.

A cadence refers to a musical sequence that brings a phrase or a piece of music to a close. An authentic cadence, like the one found in "Mr. Tambourine Man," is particularly strong and definitive, often leading the listener back to the tonic chord, providing a feeling of rest and completion within the harmonic structure.


In [77]:
question = 'Could you print a list of songs by The Beatles?'
answer = rag(question)
print(answer)

The only song by The Beatles listed in the context is:

1. **Let It Be**
   - Genre: Pop
   - Key: C major
   - Tempo: 76 BPM
   - Time Signature: 4/4

2. **Something**
   - Genre: Pop
   - Key: C major
   - Tempo: 66 BPM
   - Time Signature: 4/4


In [78]:
question = 'Could you print a list of songs in the key of C Major?'
answer = rag(question)
print(answer)

Here is a list of songs in the key of C Major:

1. "Let It Be" by The Beatles
2. "My Girl" by The Temptations
3. "Brown Sugar" by The Rolling Stones
4. "Great Balls of Fire" by Jerry Lee Lewis
5. "Blue Moon" by Richard Rodgers
6. "Dust in the Wind" by Kansas


## Retrieval Flow - Qdrant (vector search)

This section indexes the same music theory data in [Qdrant](https://qdrant.tech/) and queries it with vector search. The retrieved records are then passed as context to an [LLM (OpenAI - GPT-4o mini)](https://platform.openai.com/docs/models/gpt-4o-mini), and the generated answers are checked for accuracy.

Install qdrant and fastembed (if not already installed during project setup):

```bash
pip install -q "qdrant-client[fastembed]>=1.14.2"
```

Run in Docker:

```bash
docker pull qdrant/qdrant

docker run -p 6333:6333 -p 6334:6334 \
   -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
   qdrant/qdrant
```

In [79]:
from qdrant_client import QdrantClient, models

In [80]:
# Qdrant setup
qd_client = QdrantClient("http://localhost:6333")
collection_name = "zoomcamp-music-theory-assistant"

In [81]:
EMBEDDING_MODEL = "jinaai/jina-embeddings-v2-small-en"
EMBEDDING_DIMENSIONALITY = 512

In [82]:
# delete the collection if it already exists
qd_client.delete_collection(collection_name=collection_name)

True

In [83]:
qd_client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=EMBEDDING_DIMENSIONALITY,
        distance=models.Distance.COSINE
    )
)

True
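
The collection is configured with `Distance.COSINE`, which compares the direction of two vectors and ignores their length. The sketch below shows the calculation on toy 3-dimensional vectors standing in for the 512-dimensional embeddings. The function name and the sample vectors are assumptions for illustration, not part of the Qdrant API.

```python
import math


def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 for the same direction, 0.0 for orthogonal vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # longer, but points the same way
orthogonal = [0.0, 3.0, 0.0]       # shares no component with the query

print(round(cosine_similarity(query_vec, same_direction), 6))
print(round(cosine_similarity(query_vec, orthogonal), 6))
```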

In [84]:
# (Optional) Add payload indexes for filtering later, e.g., by genre or cadence
# qd_client.create_payload_index(collection_name=collection_name, field_name="genre", field_schema="keyword")
# qd_client.create_payload_index(collection_name=collection_name, field_name="cadence", field_schema="keyword")

In [85]:
# Prepare points
points = []
for doc in documents:
    # Build a single searchable text string from the document fields
    text = " | ".join([
        str(doc["title"]),
        str(doc["artist"]),
        f"Genre: {doc['genre']}",
        f"Key: {doc['key']}",
        f"Tempo: {doc['tempo_bpm']} BPM",
        f"Time: {doc['time_signature']}",
        f"Chords: {doc['chord_progression']}",
        f"Roman: {doc['roman_numerals']}",
        f"Cadence: {doc['cadence']}",
        f"Notes: {doc['theory_notes']}",
    ])

    vector = models.Document(text=text, model=EMBEDDING_MODEL)  # Qdrant client auto-embeds this
    point = models.PointStruct(id=int(doc["id"]), vector=vector, payload=doc)
    points.append(point)

In [86]:
qd_client.upsert(
    collection_name=collection_name,
    points=points
)

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

In [87]:
def vector_search(query):

    query_points = qd_client.query_points(
        collection_name=collection_name,
        query=models.Document(
            text=query,
            model=EMBEDDING_MODEL 
        ),
        # Optionally add a filter here if you want to constrain results (e.g., only Pop)
        #query_filter=models.Filter( 
        #    must=[
        #        models.FieldCondition(
        #            key="genre",
        #            match=models.MatchValue(value="Pop")
        #        )
        #    ]
        #),
        limit=10,
        with_payload=True
    )
    
    results = []
    
    for point in query_points.points:
        results.append(point.payload)
    
    return results

In [88]:
def rag(query, model='gpt-4o-mini'):
    search_results = vector_search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt, model=model)
    return answer
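
Note that this cell redefines `rag`, so the minsearch-backed version defined earlier is no longer reachable once it runs. One way to keep both variants available, sketched here on the assumption that search, prompt building, and the LLM call are plain callables, is to pass them in explicitly. `make_rag` and the three `fake_*` stubs are hypothetical names; the stubs stand in for the notebook's real functions so that the example runs offline.

```python
def make_rag(search_fn, build_prompt_fn, llm_fn):
    # Returns a rag(query) function bound to one specific retriever
    def rag(query):
        search_results = search_fn(query)
        prompt = build_prompt_fn(query, search_results)
        return llm_fn(prompt)
    return rag


# Stubs (assumptions) replacing the real search, prompt, and LLM functions
def fake_search(query):
    return [{"title": "Let It Be", "key": "C major"}]


def fake_prompt(query, search_results):
    context = "; ".join(f"{d['title']} ({d['key']})" for d in search_results)
    return f"QUESTION: {query}\nCONTEXT: {context}"


def fake_llm(prompt):
    # Echoes the prompt instead of calling a model
    return "echo: " + prompt


rag_stub = make_rag(fake_search, fake_prompt, fake_llm)
print(rag_stub("What key is Let It Be in?"))
```

In the notebook, the same pattern would give `rag_minsearch = make_rag(search, build_prompt, llm)` and `rag_qdrant = make_rag(vector_search, build_prompt, llm)`.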

In [89]:
question = 'Is Mr Tambourine Man a folk song and if so, explain why, including information about its key and cadence, and explain what cadence means?'
answer = rag(question)
print(answer)

Yes, "Mr. Tambourine Man" is indeed a folk song. It falls under the folk genre, characterized by its lyrical storytelling and simple harmonic structure typical of folk music. 

The song is composed in the key of D major and features a chord progression of D – G – A – D – Bm – G – A – D. The Roman numerals for this progression are I – IV – V – I – vi – IV – V – I. 

In terms of cadence, "Mr. Tambourine Man" employs an authentic cadence, specifically the V–I cadence. An authentic cadence is a musical conclusion where the dominant (V) chord resolves to the tonic (I) chord, creating a sense of closure and resolution in the music. This is a common feature in many folk songs, helping to establish a strong tonal center and finish to phrases throughout the piece. Thus, the combination of its folk genre, key, chord progression, and cadence all reinforce its classification as a folk song.


## Retrieval Evaluation

This section measures how well the search system (using minsearch and Qdrant) retrieves the correct song record for a set of ground-truth questions. It does the following:

1. Loads ground-truth data: Reads a CSV file (ground-truth-retrieval.csv) containing questions and the correct song id for each question.
2. Defines evaluation metrics:
    - **Hit Rate**: The fraction of questions for which the correct song appears anywhere in the top search results.
    - **MRR (Mean Reciprocal Rank)**: The average over all questions of 1/rank of the correct song in the ranked results, counting 0 when it is not retrieved (higher is better).
3. Runs the search: For each question, it retrieves the top results with the search function under test (minsearch or Qdrant).
4. Checks relevance: Compares the id of each result to the ground-truth id to see if the correct song was retrieved and at what rank.
5. Calculates metrics: Aggregates the results to compute overall hit rate and MRR, giving you a quantitative measure of your retrieval system’s accuracy.
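
The two metrics can be checked by hand on a small example. The sketch below is self-contained and uses four made-up relevance lists, one per question, where each entry says whether the result at that rank is the correct song. The toy data is an assumption chosen to keep the arithmetic simple.

```python
def hit_rate(relevance_total):
    # Fraction of questions with the correct song anywhere in the results
    return sum(any(line) for line in relevance_total) / len(relevance_total)


def mrr(relevance_total):
    # Mean of 1/rank of the first correct result; a miss contributes 0
    total = 0.0
    for line in relevance_total:
        for rank, is_relevant in enumerate(line, start=1):
            if is_relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)


toy_relevance = [
    [True, False, False],   # correct song at rank 1 -> 1
    [False, True, False],   # correct song at rank 2 -> 1/2
    [False, False, False],  # correct song missing  -> 0
    [False, False, True],   # correct song at rank 3 -> 1/3
]

print(hit_rate(toy_relevance))       # 3 of 4 questions hit -> 0.75
print(round(mrr(toy_relevance), 4))  # (1 + 1/2 + 0 + 1/3) / 4 -> 0.4583
```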

In [90]:
# Read the local ground truth dataset
df_question = pd.read_csv('../data/ground-truth-retrieval.csv')

In [91]:
df_question.head()

| | id | question |
|---|---|---|
| 0 | 0 | What is the key of the song 'Let It Be' by The... |
| 1 | 0 | Can you provide the chord progression for 'Let... |
| 2 | 0 | What is the tempo in beats per minute for 'Let... |
| 3 | 0 | Which cadence is used at the end of 'Let It Be'? |
| 4 | 0 | What is the time signature of 'Let It Be'? |


In [92]:
ground_truth = df_question.to_dict(orient='records')

In [93]:
ground_truth[0]

{'id': 0,
 'question': "What is the key of the song 'Let It Be' by The Beatles?"}

In [100]:
def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)

def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank, is_relevant in enumerate(line, start=1):
            if is_relevant:
                # Only the first (highest-ranked) hit counts towards the reciprocal rank
                total_score = total_score + 1 / rank
                break

    return total_score / len(relevance_total)

In [101]:
from tqdm.auto import tqdm  # progress bar used in the loop below

def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['id']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

## Retrieval Evaluation - minsearch (text search)

In [103]:
def minsearch_search(query):
    boost = {}

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=10
    )

    return results

In [104]:
from tqdm.auto import tqdm

In [105]:
evaluate(ground_truth, lambda q: minsearch_search(q['question']))

  0%|          | 0/500 [00:00<?, ?it/s]

{'hit_rate': 0.914, 'mrr': 0.6267063492063489}

## Retrieval Evaluation - Qdrant (vector search)

In [106]:
evaluate(ground_truth, lambda q: vector_search(q['question']))

  0%|          | 0/500 [00:00<?, ?it/s]

{'hit_rate': 0.914, 'mrr': 0.8712936507936508}
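
Both retrievers report the same hit rate (0.914), while Qdrant has the higher MRR (about 0.871 against 0.627), which means it tends to place the correct song nearer the top. The toy sketch below shows how two retrievers can share a hit rate yet differ in MRR. The relevance lists and the helper names `reciprocal_rank` and `summarize` are assumptions for illustration and are not taken from the evaluation data.

```python
def reciprocal_rank(relevance):
    # 1/rank of the first correct result, or 0.0 when nothing matches
    for rank, hit in enumerate(relevance, start=1):
        if hit:
            return 1 / rank
    return 0.0


def summarize(relevance_total):
    n = len(relevance_total)
    return {
        "hit_rate": sum(any(r) for r in relevance_total) / n,
        "mrr": sum(reciprocal_rank(r) for r in relevance_total) / n,
    }


# Both retrievers find the correct song for both questions...
retriever_a = [[False, False, True], [False, True, False]]  # ...at ranks 3 and 2
retriever_b = [[True, False, False], [True, False, False]]  # ...at rank 1 both times

print(summarize(retriever_a))
print(summarize(retriever_b))
```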