In [11]:
from pathlib import Path
import nltk

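# sent_tokenize needs the Punkt sentence model; NLTK 3.8.2+ loads it from "punkt_tab"
nltk.download("punkt")
nltk.download("punkt_tab")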

from langchain_huggingface.embeddings import HuggingFaceEmbeddings
# Newer LangChain releases deprecate this import in favor of `from langchain_chroma import Chroma`
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document


# all-MiniLM-L6-v2 maps each sentence to a 384-dimensional vector
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False},
)
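`normalize_embeddings=False` keeps the sentence vectors at whatever length the model produces. That choice interacts with the distance metric: Chroma collections rank by squared L2 distance unless configured otherwise (its `hnsw:space` setting defaults to `"l2"`), whereas "semantic similarity" usually means cosine similarity. For unit-length vectors the two agree, because ||u − v||² = 2 − 2·cos(u, v). The pure-Python sketch below checks this on made-up 3-dimensional vectors standing in for real 384-dimensional MiniLM embeddings; the vectors and helper names are illustrative assumptions, not part of the notebook.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors (smaller means closer)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    """Rescale a vector to unit length, as normalize_embeddings=True would."""
    n = math.hypot(*v)
    return [x / n for x in v]


# Hypothetical 3-d vectors standing in for real 384-d MiniLM embeddings.
query = [0.9, 0.1, 0.3]
candidates = {
    "doc_a": [1.8, 0.2, 0.6],  # exactly 2 * query: same direction, twice the length
    "doc_b": [0.4, 0.8, 0.2],  # different direction, similar length to the query
}

# Raw L2 is sensitive to vector length; cosine only looks at direction.
best_raw_l2 = min(candidates, key=lambda k: squared_l2(query, candidates[k]))
best_cosine = max(candidates, key=lambda k: cosine(query, candidates[k]))
# On unit vectors, the L2 ranking matches the cosine ranking.
best_unit_l2 = min(
    candidates, key=lambda k: squared_l2(normalize(query), normalize(candidates[k]))
)
print(best_raw_l2, best_cosine, best_unit_l2)  # doc_b doc_a doc_a

# Identity behind that agreement: for unit vectors u, v, ||u - v||^2 == 2 - 2 * cos(u, v).
u, v = normalize(query), normalize(candidates["doc_b"])
assert math.isclose(squared_l2(u, v), 2 - 2 * cosine(u, v), abs_tol=1e-12)
```

If the embedding model does not already emit unit vectors, setting `normalize_embeddings=True` is one way to make the default L2 ranking behave like cosine ranking.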

### Dogs text (contents of `dogs.txt`)
The dog (Canis familiaris or Canis lupus familiaris) is a domesticated descendant of the gray wolf. Also called the domestic dog, it was selectively bred from a population of wolves during the Late Pleistocene by hunter-gatherers. The dog was the first species to be domesticated by humans, over 14,000 years ago and before the development of agriculture. Due to their long association with humans, dogs have gained the ability to thrive on a starch-rich diet that would be inadequate for other canids.

Dogs have been bred for desired behaviors, sensory capabilities, and physical attributes. Dog breeds vary widely in shape, size, and color. They have the same number of bones (with the exception of the tail), powerful jaws that house around 42 teeth, and well-developed senses of smell, hearing, and sight. Compared to humans, dogs possess a superior sense of smell and hearing, but inferior visual acuity. Dogs perform many roles for humans, such as hunting, herding, pulling loads, protection, companionship, therapy, aiding disabled people, and assisting police and the military.

### Cats text (contents of `cats.txt`)
The cat (Felis catus), also referred to as the domestic cat or house cat, is a small domesticated carnivorous mammal. It is the only domesticated species of the family Felidae. Advances in archaeology and genetics have shown that the domestication of the cat occurred in the Near East around 7500 BC. It is commonly kept as a pet and working cat, but also ranges freely as a feral cat avoiding human contact. It is valued by humans for companionship and its ability to kill vermin. Its retractable claws are adapted to killing small prey species such as mice and rats. It has a strong, flexible body, quick reflexes, and sharp teeth, and its night vision and sense of smell are well developed. It is a social species, but a solitary hunter and a crepuscular predator.

Cat intelligence is evident in their ability to adapt, learn through observation, and solve problems. Research has shown they possess strong memories, exhibit neuroplasticity, and display cognitive skills comparable to those of a young child. Cat communication includes meowing, purring, trilling, hissing, growling, grunting, and body language. It can hear sounds too faint or too high in frequency for human ears, such as those made by small mammals. It secretes and perceives pheromones.


In [22]:
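# Split each file into sentences and store every sentence as its own Document,
# tagged with the animal it describes so searches can be filtered later.
documents = []
for animal in ["cat", "dog"]:
    raw_text = Path(f"{animal}s.txt").read_text(encoding="utf-8")
    sentences = nltk.sent_tokenize(raw_text)
    documents += [
        Document(page_content=sentence, metadata={"animal": animal})
        for sentence in sentences
    ]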
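Each sentence becomes its own `Document` carrying an `animal` metadata key, which is what later lets `filter=` restrict a search to one animal. The dependency-free sketch below mirrors that structure on shortened excerpts of the two texts. `Doc` is a hypothetical stand-in for `langchain_core.documents.Document`, and `naive_sent_split` is a deliberately simple regex substitute for `nltk.sent_tokenize`, which instead uses a trained Punkt model intended to cope with abbreviations.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def naive_sent_split(text):
    """Split after '.', '!' or '?' when followed by whitespace (no abbreviation handling)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Shortened excerpts of cats.txt and dogs.txt, keyed by the metadata value to attach.
corpus = {
    "cat": "The cat (Felis catus) is a small carnivorous mammal. It is a solitary hunter.",
    "dog": "The dog is a domesticated descendant of the gray wolf. Dogs perform many roles for humans.",
}

# One Doc per sentence, each tagged with the animal its source text describes.
docs = [
    Doc(sentence, {"animal": animal})
    for animal, text in corpus.items()
    for sentence in naive_sent_split(text)
]
for d in docs:
    print(d.metadata["animal"], "|", d.page_content)

# The naive splitter breaks after the abbreviation "e.g.".
print(naive_sent_split("Dogs help humans, e.g. police. They herd."))
```

Sentence-sized chunks keep each vector focused on a single claim, at the cost of losing cross-sentence context such as what "It" refers to in the cat text.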

In [24]:
# Embed every sentence and index the vectors, together with their metadata, in a Chroma collection
vector_db = Chroma.from_documents(documents, embeddings)
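`Chroma.from_documents` embeds every sentence once and stores the vectors alongside their metadata; `similarity_search(query, k=..., filter=...)` then embeds the query and returns the `k` closest stored sentences among those whose metadata matches the filter. The sketch below reproduces that contract with a brute-force scan. Everything in it is hypothetical: `embed` is a word-count toy rather than MiniLM, `TinyVectorStore` is not a LangChain class, and Chroma itself uses an approximate HNSW index instead of a linear scan.

```python
import heapq
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a real model returns dense vectors)."""
    return Counter(w.strip(".,?()").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical brute-force vector store holding (vector, text, metadata) rows."""

    def __init__(self, texts_with_meta):
        self.rows = [(embed(t), t, m) for t, m in texts_with_meta]

    def similarity_search(self, query, k=4, filter=None):
        # `filter` mirrors the keyword used in the notebook (it shadows the builtin here).
        q = embed(query)
        # Keep only rows whose metadata matches every key/value pair in the filter.
        scored = [
            (cosine(q, vec), text)
            for vec, text, meta in self.rows
            if not filter or all(meta.get(key) == val for key, val in filter.items())
        ]
        # Return the k highest-scoring texts, best first.
        return [text for _, text in heapq.nlargest(k, scored)]


store = TinyVectorStore([
    ("Dogs perform many roles for humans, such as hunting and herding.", {"animal": "dog"}),
    ("The dog is a descendant of the gray wolf.", {"animal": "dog"}),
    ("The cat is valued for its ability to kill vermin.", {"animal": "cat"}),
])
print(store.similarity_search("Which roles do dogs perform?", k=1, filter={"animal": "dog"}))
```

Note that the filter is applied before ranking, so a query can never return sentences from the excluded animal, and fewer than `k` results come back when fewer than `k` rows pass the filter.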

In [37]:
query = "Which occupations employs dogs?"
# Only search sentences whose metadata has animal == "dog"; return the 3 closest matches
results = vector_db.similarity_search(query, k=3, filter={"animal": "dog"})

print(f"Results for query: '{query}'")
for doc in results:
    print(f"- {doc.page_content}")

Results for query: 'Which occupations employs dogs?'
- Dogs perform many roles for humans, such as hunting, herding, pulling loads, protection, companionship, therapy, aiding disabled people, and assisting police and the military.
- Dogs have been bred for desired behaviors, sensory capabilities, and physical attributes.
- The dog (Canis familiaris or Canis lupus familiaris) is a domesticated descendant of the gray wolf.


In [38]:
query = "What is latin name for cats"
results = vector_db.similarity_search(query, k=3, filter={"animal": "cat"})

print(f"Results for query: '{query}'")
for doc in results:
    print(f"- {doc.page_content}")

Results for query: 'What is latin name for cats'
- The cat (Felis catus), also referred to as the domestic cat or house cat, is a small domesticated carnivorous mammal.
- Advances in archaeology and genetics have shown that the domestication of the cat occurred in the Near East around 7500 BC.
- Cat communication includes meowing, purring, trilling, hissing, growling, grunting, and body language.
