In [1]:
import minsearch
import os
import pandas as pd

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

True

### Ingestion

In [2]:
df = pd.read_csv('../data/rag_dataset.csv')

In [3]:
df.head()

Unnamed: 0,id,breed_name,history,health,description,characteristics,appearance,temperament
0,0,Affenpinscher,"The word 'Affenpinscher' derives from Affe, Ge...",A UK study found a life expectancy of 9.3 year...,The Affenpinscher generally weighs 4–6 kg (9–1...,,,
1,1,Afghan Hound,The Afghan Hound has been identified as a basa...,,The dogs in this breed occur in many different...,,,
2,2,Aidi,The Aidi is a breed native to the Atlas Mounta...,,,,Standing 52–62 cm (20–24 in) in height and wei...,
3,3,Airedale Terrier,"Airedale, a valley (dale) in the West Riding o...",A UK study found a life expectancy of 12 years...,,,The Airedale is the largest of the British ter...,The Airedale can be used as a working dog and ...
4,4,Akita (dog breed),,,,,"As a spitz breed, the appearance of the Akita ...",The Akita is generally seen as territorial abo...


In [4]:
df.columns

Index(['id', 'breed_name', 'history', 'health', 'description',
       'characteristics', 'appearance', 'temperament'],
      dtype='object')

In [5]:
df = df.fillna('')
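
The `fillna('')` step matters later, when each record is formatted into the prompt: pandas reads empty CSV cells as NaN, and a NaN formats as the literal text `nan`. A small stdlib-only illustration follows, where `render_health` is a hypothetical helper and `float("nan")` stands in for a missing cell:

```python
def render_health(value):
    """Format a single field the way a prompt template would."""
    return "health: {health}".format(health=value)


missing = float("nan")  # stand-in for an empty CSV cell loaded by pandas

print(render_health(missing))  # the word "nan" leaks into the text
print(render_health(""))       # after fillna(''), the field is simply empty
```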

In [6]:
documents = df.to_dict(orient='records')

In [7]:
index = minsearch.Index(
    text_fields=['breed_name', 'history', 'health', 'description', 'characteristics',
       'appearance', 'temperament'],
    keyword_fields=['id']
)

In [8]:
index.fit(documents)

<minsearch.minsearch.Index at 0x2ad24e68e80>

In [9]:
query = "Suggest a family-friendly dog breed"

### Search

In [10]:
index.search(query, num_results=3)

[{'id': 246,
  'breed_name': 'Pointer (dog breed)',
  'history': 'There has been much debate among dog historians about the ancestry of the Pointer. The most commonly held position is that the breed descends from Old Spanish Pointers that were imported into England. The popular belief is that Spanish Pointers were first introduced to England in 1713 by soldiers returning from Spain after the Peace of Utrecht. In his Cynographia Britannica, published in 1800, Sydenham Edwards states that the "Spanish Pointer was introduced to this country [England] by a Portugal Merchant, at a very modern period, and was first used by a reduced Baron, of the name of Bichell, who lived in Norfolk, and could shoot flying".\nOther early sources suggest Portuguese Pointers, Italian Braccos or French pointers were the foundation of the English breed. In 1902, Victorian era sportsman William Arkwright produced the book The pointer and his predecessors often considered one of the best early histories of the Po

### OpenAI

In [11]:
api_key = os.environ["OPENAI_API_KEY"]
model = "gpt-5-nano"

client = OpenAI(api_key=api_key)

chat_response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": query,
        },
    ]
)

print(chat_response.choices[0].message.content)


Sure! Here are dog breeds that are commonly considered family-friendly. I’ve grouped them by size and included quick notes on temperament, energy, and grooming so you can pick what fits your family.

Small to medium breeds
- Cavalier King Charles Spaniel: sweet, adaptable, great with kids; moderate exercise; moderate grooming.
- Beagle: friendly, curious, good with kids; energy can be high—needs training and supervision to keep them safe.
- Bichon Frise: cheerful, good with families; low-shedding (not fully hypoallergenic); regular grooming.
- Cocker Spaniel: affectionate and good with kids; enjoys a walk or two daily; needs regular brushing.
- Poodle (Toy/Mini/Standard): very trainable, often good for families with allergies (low-shedding); needs grooming.
- French Bulldog: affectionate, good with kids; low-to-moderate exercise; can have health issues to watch.
- Bulldog (English Bulldog): calm, family-friendly; great with kids but needs vet care for potential health issues and modera

### RAG Flow

In [12]:
documents[0]

{'id': 0,
 'breed_name': 'Affenpinscher',
 'history': "The word 'Affenpinscher' derives from Affe, German for 'ape' or 'monkey'; it is sometimes translated as 'Monkey Terrier', although the dog is a pinscher and not a terrier.\nThe origins of the Pinscher group of dogs are unknown. Dogs of this type, both rough-haired and smooth-haired, were traditionally kept as carriage dogs or as stable dogs, and so were sometimes known as Stallpinscher; they were capable ratters. Until the late nineteenth century, both rough-haired and smooth-haired types were known as Deutscher Pinscher, and came from the same lineage; puppies of both types could occur in the same litter.\nIn 1880 the Pinscher was recorded in the Deutschen Hundestammbuch of the Verein zur Veredelung der Hunderassen. In 1895 Ludwig Beckmann described five varieties of Pinscher – the rough- and smooth-haired Pinscher, the rough- and smooth-haired Miniature Pinscher, and the Affenpinscher. In 1895 a breed society, the Pinscher-Schnau

In [13]:
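def search(query):
    """Return the top 10 documents matching the query."""
    # No per-field boosts yet, so all text fields are weighted equally.
    boost = {}

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=10
    )

    return results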
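
The `boost` dictionary above is empty, so no field is favoured over another. As a rough illustration of what per-field boosting does, the sketch below counts query-term matches in each field and multiplies each count by a weight. It is a stdlib-only toy and not minsearch's actual scoring; `tokenize`, `toy_search`, and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_search(query, docs, text_fields, boost=None, num_results=10):
    """Rank docs by boosted query-term counts (toy scoring, not TF-IDF)."""
    boost = boost or {}
    query_terms = set(tokenize(query))
    scored = []
    for doc in docs:
        score = 0.0
        for field in text_fields:
            counts = Counter(tokenize(doc.get(field, "")))
            matches = sum(counts[term] for term in query_terms)
            # Fields missing from `boost` keep a neutral weight of 1.0.
            score += boost.get(field, 1.0) * matches
        if score > 0:
            scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


sample_docs = [
    {"breed_name": "Breed A", "history": "Kept as a family companion.",
     "temperament": "Reserved."},
    {"breed_name": "Breed B", "history": "Bred for herding.",
     "temperament": "Gentle family dog, good with family children."},
]
fields = ["breed_name", "history", "temperament"]

unboosted = toy_search("family", sample_docs, fields)
boosted = toy_search("family", sample_docs, fields, boost={"history": 5.0})
print([d["breed_name"] for d in unboosted])
print([d["breed_name"] for d in boosted])
```

With equal weights, the document with more matches ranks first; boosting `history` lets a single match in that field outrank two matches elsewhere.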

In [14]:
prompt_template = """
You're a dog behaviour expert who helps families find the dog breed that best matches their preferences. Answer the QUESTION based on the CONTEXT from the dog breed database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

entry_template = """
breed_name: {breed_name}
history: {history}
health: {health}
description: {description}
characteristics: {characteristics}
appearance: {appearance}
temperament: {temperament}
""".strip()

In [15]:
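def build_prompt(query, search_results):
    """Fill the prompt template with the question and the retrieved records."""
    # Render each record with entry_template, separated by blank lines.
    context = "\n\n".join(entry_template.format(**doc) for doc in search_results)

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt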
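
`entry_template.format(**doc)` raises `KeyError` if a record lacks one of the template fields, and ten full records can make for a very long prompt. A minimal sketch of a more defensive variant follows, assuming a per-field character cap is acceptable; `FIELDS`, `build_context`, and `max_chars` are hypothetical names, and the cap value is arbitrary.

```python
FIELDS = ["breed_name", "history", "health", "description",
          "characteristics", "appearance", "temperament"]


def build_context(search_results, max_chars=500):
    """Format records into a context string, skipping empty fields."""
    entries = []
    for doc in search_results:
        lines = []
        for field in FIELDS:
            value = str(doc.get(field, "")).strip()
            if not value:
                continue  # omit empty fields instead of emitting "field: "
            if len(value) > max_chars:
                # Hard cut at max_chars; the dots mark the truncation.
                value = value[:max_chars].rstrip() + "..."
            lines.append(f"{field}: {value}")
        entries.append("\n".join(lines))
    return "\n\n".join(entries)


example = [{"id": 1, "breed_name": "Example Breed", "history": "",
            "temperament": "x" * 20}]
print(build_context(example, max_chars=15))
```

A character cap is crude, since it can cut a sentence in half; a token-based limit would likely be a better fit for a real prompt budget.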

In [16]:
def llm(prompt, model="gpt-4o-mini"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

In [17]:
def rag(query, model="gpt-4o-mini"):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    #print(prompt)
    answer = llm(prompt, model=model)
    return answer
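
Because `rag` calls the module-level `search` and `llm` directly, it cannot be exercised without a fitted index and an API key. One possible variation, sketched here with hypothetical names (`make_rag`, `fake_search`, `fake_build_prompt`, `fake_llm`), passes the three steps in as arguments so the flow can be checked offline with stubs.

```python
def make_rag(search_fn, build_prompt_fn, llm_fn):
    """Return a rag(query) function wired to the given pipeline steps."""
    def rag(query):
        search_results = search_fn(query)
        prompt = build_prompt_fn(query, search_results)
        return llm_fn(prompt)
    return rag


# Stubs standing in for the real index and the OpenAI client.
def fake_search(query):
    return [{"breed_name": "Example Breed", "temperament": "Calm."}]


def fake_build_prompt(query, search_results):
    context = "\n".join(
        f"{d['breed_name']}: {d['temperament']}" for d in search_results
    )
    return f"QUESTION: {query}\nCONTEXT:\n{context}"


def fake_llm(prompt):
    # Echo the prompt so the caller can see what the model would receive.
    return f"[stub answer] {prompt}"


offline_rag = make_rag(fake_search, fake_build_prompt, fake_llm)
print(offline_rag("Is Example Breed calm?"))
```

In the notebook itself, the same factory could presumably be called as `make_rag(search, build_prompt, llm)` to get the real pipeline.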

In [18]:
question = 'List the 5 best family-friendly dog breeds'
answer = rag(question)
print(answer)

Here are five family-friendly dog breeds based on their characteristics and temperament:

1. **Boxer**: Known for their loving nature, Boxers are loyal and protective, making them excellent family dogs. They are patient and energetic, often very good with children.

2. **Papillon**: These small dogs are happy, friendly, and intelligent. Papillons are known for being playful and adaptable, making them great companions for families, although they should be monitored around small children.

3. **Kintamani Dog**: This breed is gentle and affectionate with its family. Kintamanis are known to be good watchdogs, providing protective qualities alongside their loving demeanor.

4. **Portuguese Water Dog**: These dogs are friendly and intelligent, eager to please their owners. Their loving nature and enthusiasm for family activities make them wonderful companions.

5. **Smooth Collie**: Smooth Collies are sociable and easily trainable, getting along well with children and other pets. Their alert

In [19]:
question = 'Is the Nova Scotia Duck Tolling Retriever aggressive?'
answer = rag(question)
print(answer)

The Nova Scotia Duck Tolling Retriever is not known to be aggressive. This breed is characterized as intelligent, curious, alert, outgoing, and high-energy, often exhibiting affectionate and eager-to-please behavior. They generally socialize well with children and make good family dogs. While they have a unique bark known as the "toller scream," this sound is used when they are excited and is not indicative of aggression. To maintain their well-being, they require regular physical and mental stimulation to prevent undesirable behaviors, which could arise if they are not exercised adequately or left alone for extended periods.
