# Lecture Plan


1. Vector DB Orientation
2. ChromaDB

---

# VectorDB Introduction

A vector database is a type of database that indexes, stores, and manipulates high-dimensional vector data.

- An often-quoted estimate is that about 80% of data is unstructured and does not fit into a relational database (RDB).

It provides:
- fast retrieval and similarity search
- CRUD operations
- metadata filtering
- horizontal scaling

![Vector DB](./images/VectorDB-Representation.png)

**Vector Embedding:** The process of converting data (audio, visual representations, and documents) into numerical values called vectors. Vectors have both magnitude and direction, which makes semantic search possible: items with similar meaning end up close together in the vector space.

## Embedding Model

The embedding model is the core component of a vector database pipeline: it converts textual data into numerical vectors for efficient storage and querying.

## Vector Index

A vector index is a data structure that organizes vector data so that it can be retrieved quickly.

## Vector Indexing

Unstructured data is converted into vectors by the embedding model, and indexing then groups similar vectors together. This makes querying fast.

## Vector Index vs Vector Database

Compared with a standalone vector index, a vector database adds:

- Data management:
   - Performance and fault tolerance: sharding and replication
   - Monitoring: resource usage, query performance, and system health
- Metadata storage and filtering
- Scalability
- Real time updates
- Backups and collections
- Ecosystem integration
- Data Security and access control

## Vector Database use cases

1. Long-term memory for LLMs (RAG)
2. Semantic search and similarity search
3. Recommendation systems
4. Machine learning: clustering and classification
5. Anomaly detection (rare events, IT threats, and financial fraud)


## Why Vector Databases?

Unstructured data comes in various forms, and each form is stored using a different approach, such as key-value pairs, documents, or graphs. Vector data addresses this problem because all forms of data can be converted into vectors.

## Vector Database

A vector database operates on vectors. A similarity metric is applied to find the vectors most similar to the query. Vector databases use algorithms for Approximate Nearest Neighbour (ANN) search, which optimize the search through hashing, quantization, and graph-based search. These algorithms are assembled into a pipeline. The results are approximate, and accuracy is traded against speed: the more accurate the result, the slower the retrieval.

1. Indexing: The vector database indexes vectors using one of these algorithms, mapping them to a data structure that enables faster retrieval.
2. Querying: The vector database compares the indexed query vector with the indexed vectors in the dataset to find the nearest neighbours.
3. Post-processing: In some cases, the retrieved nearest neighbours are post-processed, for example by re-ranking the results with a different similarity measure.

![Vector DB](./images/VectorDB-pipeline.png)
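
The three stages can be sketched in plain Python. This is a minimal illustration, not how any particular vector database is implemented: the anchor points used for bucketing, the helper names (`build_index`, `search`, `rerank`), and the sample vectors are all assumptions made for the example.

```python
import math

def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def build_index(vectors, anchors):
    """Indexing: group vector ids under their nearest anchor point."""
    index = {i: [] for i in range(len(anchors))}
    for vid, v in enumerate(vectors):
        nearest = min(index, key=lambda i: sq_l2(v, anchors[i]))
        index[nearest].append(vid)
    return index

def search(index, vectors, anchors, query, k):
    """Querying: scan only the bucket whose anchor is nearest to the query."""
    bucket = min(index, key=lambda i: sq_l2(query, anchors[i]))
    return sorted(index[bucket], key=lambda vid: sq_l2(vectors[vid], query))[:k]

def rerank(candidates, vectors, query):
    """Post-processing: re-order the candidates by cosine similarity."""
    return sorted(candidates, key=lambda vid: cosine(vectors[vid], query), reverse=True)

# Hypothetical 2-D data; real embeddings have hundreds of dimensions.
vectors = [(2.0, 1.0), (4.0, 4.0), (1.0, 2.5), (-1.0, -1.0), (-2.0, -0.5)]
anchors = [(2.0, 2.0), (-2.0, -2.0)]  # assumed coarse partition of the space

index = build_index(vectors, anchors)
query = (2.0, 2.0)
candidates = search(index, vectors, anchors, query, k=3)  # ordered by distance
final = rerank(candidates, vectors, query)                # ordered by angle

print(candidates)  # [0, 2, 1]
print(final)       # [1, 0, 2]
```

The search never looks at vectors 3 and 4 because they sit in the other bucket, which is where the speed-up (and the approximation) comes from. Re-ranking with a different measure changes the order of the same candidates.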


## Algorithms

The algorithms enable fast querying by building a data structure that can be traversed quickly. Commonly, the original vectors are compressed to optimize the query process.

### Random Projection

Multiplying the high-dimensional vector matrix by a random projection matrix yields a low-dimensional projected matrix. Similarity is approximately preserved while the number of dimensions is reduced. At query time, the same random projection matrix is used to project the query vector onto the lower-dimensional space, and the projected query vector is compared with the projected vectors to find the nearest neighbours. Random projection is an approximate method, and the projection quality depends on the properties of the projection matrix: the more random the matrix, the better the quality of the projection. Generating a random projection matrix can be computationally expensive for large datasets.

![RP](./images/random-projection.png)
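
A minimal sketch of random projection, written in plain Python so that it runs without extra packages (NumPy would be the usual choice in practice). The Gaussian matrix, the `1/sqrt(d_out)` scaling, the dimensions, and the synthetic data are assumptions chosen for the example.

```python
import math
import random

def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian random matrix of shape (d_in, d_out), scaled by 1/sqrt(d_out)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_out)] for _ in range(d_in)]

def project(vector, matrix):
    """Multiply a d_in-dimensional vector by the (d_in, d_out) matrix."""
    d_out = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(d_out)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Synthetic data: 20 vectors with 256 dimensions each.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(256)] for _ in range(20)]
query = [x + rng.gauss(0, 0.05) for x in data[7]]  # a noisy copy of item 7

# The same matrix projects both the stored vectors and the query.
R = random_projection_matrix(256, 32)
projected = [project(v, R) for v in data]
projected_query = project(query, R)

nearest = min(range(len(projected)),
              key=lambda i: euclidean(projected[i], projected_query))
print(nearest)  # 7
```

The search runs on 32 dimensions instead of 256, and the nearest neighbour in the projected space is still the item the query was derived from.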

### Product Quantization

Product quantization is a lossy compression technique (the data cannot be restored to its original form after compression). The process involves:

1. Splitting: The vectors are broken into segments.
2. Training: A codebook is built for each segment. The codebook is made up of the centre points of the clusters produced by k-means clustering on that segment across all vectors.
3. Encoding: The algorithm assigns each segment the code of its nearest centre point in the codebook.
4. Querying: The algorithm breaks the query vector into sub-vectors and quantizes them using the same codebook. It then uses the indexed codes to find the vectors nearest to the query vector.

Accuracy depends on the size of the codebook (the K value in k-means) and on the number of segments: a higher K gives more codes per segment and a finer approximation, at the cost of more storage and computation.

![RQ](./images/product-quantization.png)
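
The four steps can be sketched as follows. This is a small, unoptimized illustration in plain Python: the k-means routine is a basic Lloyd's loop, the query is compared against decoded codes rather than through precomputed distance tables, and the segment count, codebook size, and data are assumptions chosen for the example.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(vector, m):
    """Splitting: cut a vector into m equal-length segments."""
    step = len(vector) // m
    return [vector[i * step:(i + 1) * step] for i in range(m)]

def train(vectors, m, k):
    """Training: one codebook (k centroids) per segment position."""
    segments = [split(v, m) for v in vectors]
    return [kmeans([s[j] for s in segments], k, seed=j) for j in range(m)]

def encode(vector, codebooks):
    """Encoding: replace each segment with the id of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: euclidean(seg, cb[c]))
            for seg, cb in zip(split(vector, len(codebooks)), codebooks)]

def decode(codes, codebooks):
    """Rebuild an approximate vector from its codes (lossy)."""
    return [x for code, cb in zip(codes, codebooks) for x in cb[code]]

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train(vectors, m=4, k=16)
codes = [encode(v, codebooks) for v in vectors]  # 4 small integers per vector

# Querying: quantize the query with the same codebooks, then compare codes.
query_approx = decode(encode(vectors[42], codebooks), codebooks)
best = min(range(len(codes)),
           key=lambda i: euclidean(query_approx, decode(codes[i], codebooks)))
print(codes[42], codes[best])
```

Each 8-dimensional vector is stored as 4 integers in the range 0 to 15. Several vectors can share the same codes, which is the information lost to compression.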

### Locality-sensitive hashing

Locality-sensitive hashing (LSH) is an indexing technique for approximate nearest-neighbour search. The vectors are hashed using a set of hash functions and grouped into buckets in hash tables. The same hash functions are applied to the query, which places it in the bucket that holds similar vectors. The search is cheap because the similarity comparison runs only against the vectors in that bucket rather than against the whole dataset. Accuracy depends on the properties of the hash functions: more hash functions generally give better accuracy, but a large number of them is computationally expensive.

![LSH](./images/locally-sensitive-hashing.png)
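
One common family of hash functions for LSH uses random hyperplanes, where each hash bit records which side of a plane a vector falls on. The sketch below assumes that family along with a single hash table; the function names, the number of planes, and the data are illustrative choices.

```python
import random
from collections import defaultdict

def make_hyperplanes(n_planes, dim, seed=0):
    """Each random hyperplane acts as one hash function."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_hash(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(sum(v * h for v, h in zip(vector, plane)) >= 0)
                 for plane in hyperplanes)

def build_table(vectors, hyperplanes):
    """Group vector ids into buckets keyed by their hash."""
    table = defaultdict(list)
    for idx, v in enumerate(vectors):
        table[lsh_hash(v, hyperplanes)].append(idx)
    return table

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lsh_query(table, vectors, hyperplanes, q):
    """Hash the query, then compare it only with vectors in the same bucket."""
    candidates = table.get(lsh_hash(q, hyperplanes), [])
    return min(candidates, key=lambda i: sq_dist(vectors[i], q), default=None)

rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(100)]
planes = make_hyperplanes(n_planes=6, dim=16)
table = build_table(vectors, planes)

match = lsh_query(table, vectors, planes, vectors[3])
print(match)  # 3
```

With 6 planes there are at most 64 buckets, so a query is compared with only a fraction of the 100 vectors. A true neighbour that lands in a different bucket is missed, which is why the result is approximate.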

### Hierarchical Navigable Small World (HNSW)


HNSW creates a hierarchical, tree-like structure in which each node represents a set of vectors and edges connect nodes that are similar. During a similarity search, the algorithm navigates this graph and retrieves the nodes that contain the vectors closest to the query.


![HNSW](./images/hierachical-navigable-small-worlds.png)


HNSW = Hierarchical (layers) + NSW (greedy routing)

NSW: a local minimum is reached when no neighbouring vertex is closer to the target vertex than the current one.

- High-degree vertex: many connections
- Low-degree vertex: few connections

To avoid stopping early at a local minimum, high-degree vertices are used.

In HNSW, the high-degree vertices are spread across multiple layers. In each of the first n-1 layers, reaching the local minimum moves the search to the next layer. Reaching the local minimum in layer n is the stopping condition.
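
Greedy routing and the layer-by-layer descent can be sketched on a hand-built graph. This is a simplification for illustration: the two layers and their edges are written out by hand here, whereas a real HNSW index builds its layers automatically and tracks more than one candidate during the search.

```python
import math

def greedy_search(graph, points, entry, target):
    """Move to the neighbour closest to the target; stop at a local minimum."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], target))
        if math.dist(points[best], target) >= math.dist(points[current], target):
            return current  # no neighbour is closer: local minimum
        current = best

def layered_search(layers, points, entry, target):
    """Run greedy search in each layer, starting where the previous one stopped."""
    current = entry
    for graph in layers:  # sparse top layer first, full bottom layer last
        current = greedy_search(graph, points, current, target)
    return current

# Six points on a line (hypothetical data).
points = {i: (float(i), 0.0) for i in range(6)}

top_layer = {0: [3], 3: [0]}                   # long-range link only
bottom_layer = {0: [1], 1: [0, 2], 2: [1, 3],  # every vertex, short links
                3: [2, 4], 4: [3, 5], 5: [4]}

target = (4.2, 0.0)
result = layered_search([top_layer, bottom_layer], points, entry=0, target=target)
print(result)  # 4
```

The top layer jumps from vertex 0 to vertex 3 in one hop, and the bottom layer then refines the answer from 3 to 4.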


### Similarity measures

Similarity measures are used to compare the query vectors with the indexed vectors.

1. **Cosine similarity:** the cosine of the angle between two vectors, in the range [-1, 1]. 1 means the vectors point in the same direction, 0 means they are orthogonal (not similar), and -1 means they are diametrically opposed.

ex: semantic search, document classification, recommendation systems based on past behaviour.

2. **Euclidean distance:** the straight-line distance between two vectors, in the range [0, ∞).

ex: locality-sensitive hashing

3. **Dot product (inner product):** the product of the magnitudes of two vectors and the cosine of the angle between them, in the range (-∞, ∞). Positive values indicate similar vectors, 0 orthogonal vectors, and negative values opposite vectors.

ex: LLM training

<i><b>Note:</b></i> In Chroma, the distance metrics are l2 (squared Euclidean distance), cosine, and inner product (dot product). In Pinecone, they are Euclidean distance, cosine, and dot product (inner product).
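
The three measures are short enough to write out directly. The sketch below uses plain Python and hand-picked 2-D vectors so the values can be checked by eye.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 0.0]
b = [0.0, 2.0]   # orthogonal to a
c = [-3.0, 0.0]  # opposite direction to a

print(cosine_similarity(a, a))   # 1.0  (same direction)
print(cosine_similarity(a, b))   # 0.0  (orthogonal)
print(cosine_similarity(a, c))   # -1.0 (opposite)
print(euclidean_distance(a, c))  # 4.0
print(dot(a, c))                 # -3.0
```

Cosine similarity ignores magnitude, so `a` and `c` score -1 even though their lengths differ, while the dot product and the Euclidean distance both change when a vector is scaled.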

## Filtering

Every vector store contains two indexes: a vector index and a metadata index. Metadata filtering is applied when querying for similar vectors. The filtering can happen at one of two points:

1. Pre-filtering: The metadata filter is applied first, and the vector search runs only on the vectors that pass it. This reduces the search space, but the filter may exclude relevant results that do not match the metadata criteria, and extensive metadata filtering adds computational overhead. It can also force a brute-force search over the filtered vectors, which increases query time.
2. Post-filtering: The metadata filter is applied after the vector search. All relevant vectors are considered during the search, but the search space is not reduced and filtering is an extra step, which adds computational overhead. It can also return few or no results when most of the nearest neighbours fail the filter.

*Note*: Pinecone uses Single-Stage Filtering. It combines both the vector indexes and metadata indexes.
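
The difference between the two orderings shows up even on four records. In this sketch the records, the `genre` field, and the exhaustive `top_k` search are assumptions made for illustration; a real store would use its vector and metadata indexes instead of scanning a list.

```python
import math

# Hypothetical records: a vector plus one metadata field.
records = [
    {"id": "a", "vector": (0.0, 0.0), "genre": "fantasy"},
    {"id": "b", "vector": (0.1, 0.0), "genre": "science"},
    {"id": "c", "vector": (0.2, 0.0), "genre": "science"},
    {"id": "d", "vector": (5.0, 5.0), "genre": "fantasy"},
]

def top_k(candidates, query, k):
    """Exhaustive nearest-neighbour search over the given candidates."""
    return sorted(candidates, key=lambda r: math.dist(r["vector"], query))[:k]

def pre_filter(records, query, k, genre):
    """Filter on metadata first, then search the remaining vectors."""
    allowed = [r for r in records if r["genre"] == genre]
    return top_k(allowed, query, k)

def post_filter(records, query, k, genre):
    """Search all vectors first, then drop results that fail the filter."""
    return [r for r in top_k(records, query, k) if r["genre"] == genre]

query = (0.15, 0.0)
pre = [r["id"] for r in pre_filter(records, query, 2, "fantasy")]
post = [r["id"] for r in post_filter(records, query, 2, "fantasy")]

print(pre)   # ['a', 'd']
print(post)  # []
```

Post-filtering returns nothing because both of the two nearest neighbours are `science` records. Pre-filtering always fills the result, but it has to include the distant record `d` to do so.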





In [None]:
! pip install faiss-cpu sentence-transformers

In [83]:
from sentence_transformers import SentenceTransformer

emb_model = SentenceTransformer('thenlper/gte-large')

In [None]:
! pip install PyPDF2

In [12]:
from PyPDF2 import PdfReader

reader = PdfReader('./assets/the-velveteen-rabbit.pdf')

no_of_pages = len(reader.pages)

text = [page.extract_text() + '\n\n' for page in reader.pages]

text

['The Velveteen RabbitByMargery Williams\n\n\n',
 "There was once a velveteen rabbit, and in the beginning he was really splendid. He was fat and bunchy, as a rabbit should be; his coat was spotted brown and white, he had real thread whiskers, and his ears were lined with pink sateen. On Christmas morning, when he sat wedged in the top of the Boy's stocking, with a sprig of holly between his paws, the effect was charming.There were other things in the stocking, nuts and oranges and a toy engine, and chocolate almonds and a clockwork mouse, but the Rabbit was quite the best of all. For at least two hours the Boy loved him, and then Aunts and Uncles came to dinner, and there was a great rustling of tissue paper and unwrapping of parcels, and in the excitement of looking at all the new presents the Velveteen Rabbit was forgotten.For a long time he lived in the toy cupboard or on the nursery ﬂoor, and no one thought very much about him. He was naturally shy, and being only made of velvetee

In [13]:
text = [t.lower() for t in text]

text

['the velveteen rabbitbymargery williams\n\n\n',
 "there was once a velveteen rabbit, and in the beginning he was really splendid. he was fat and bunchy, as a rabbit should be; his coat was spotted brown and white, he had real thread whiskers, and his ears were lined with pink sateen. on christmas morning, when he sat wedged in the top of the boy's stocking, with a sprig of holly between his paws, the effect was charming.there were other things in the stocking, nuts and oranges and a toy engine, and chocolate almonds and a clockwork mouse, but the rabbit was quite the best of all. for at least two hours the boy loved him, and then aunts and uncles came to dinner, and there was a great rustling of tissue paper and unwrapping of parcels, and in the excitement of looking at all the new presents the velveteen rabbit was forgotten.for a long time he lived in the toy cupboard or on the nursery ﬂoor, and no one thought very much about him. he was naturally shy, and being only made of velvetee

In [15]:
def remove_whitespaces(data):
    return ' '.join(data.split())

text = [remove_whitespaces(t) for t in text]

text

['the velveteen rabbitbymargery williams',
 "there was once a velveteen rabbit, and in the beginning he was really splendid. he was fat and bunchy, as a rabbit should be; his coat was spotted brown and white, he had real thread whiskers, and his ears were lined with pink sateen. on christmas morning, when he sat wedged in the top of the boy's stocking, with a sprig of holly between his paws, the effect was charming.there were other things in the stocking, nuts and oranges and a toy engine, and chocolate almonds and a clockwork mouse, but the rabbit was quite the best of all. for at least two hours the boy loved him, and then aunts and uncles came to dinner, and there was a great rustling of tissue paper and unwrapping of parcels, and in the excitement of looking at all the new presents the velveteen rabbit was forgotten.for a long time he lived in the toy cupboard or on the nursery ﬂoor, and no one thought very much about him. he was naturally shy, and being only made of velveteen, som

In [16]:
text_embeddings = emb_model.encode(text)

text_embeddings

array([[-0.00542053,  0.01178248, -0.01159403, ..., -0.00787763,
         0.03500003, -0.0146179 ],
       [ 0.01780942, -0.0003453 ,  0.00296924, ...,  0.00231113,
         0.02984123, -0.01850623],
       [ 0.01851149,  0.00119871,  0.0113421 , ..., -0.00488156,
         0.02337555, -0.00648827],
       ...,
       [-0.0040862 , -0.00536956, -0.00121139, ...,  0.00213058,
         0.023294  ,  0.00290206],
       [-0.00071771, -0.00575505, -0.00685973, ..., -0.00628214,
         0.03008566, -0.00579042],
       [ 0.00343544, -0.00024041, -0.00607884, ..., -0.00102497,
         0.02526371, -0.01058268]], dtype=float32)

In [17]:
text_embeddings.shape

(13, 1024)

In [37]:
import faiss

index = faiss.IndexFlatL2(text_embeddings.shape[1])

In [38]:
index.add(text_embeddings)

In [39]:
query = ['What is real?']

query_embedding = emb_model.encode(query)

In [40]:
%%time
n = 2
D, I = index.search(query_embedding, n)

I

result = [text[i] for i in I[0]]

result



CPU times: total: 0 ns
Wall time: 2 ms


['seams underneath, and most of the hairs in his tail had been pulled out to string bead necklaces. he was wise, for he had seen a long succession of mechanical toys arrive to boast and swagger, and by-and-by break their mainsprings and pass away, and he knew that they were only toys, and would never turn into anything else. for nursery magic is very strange and wonderful, and only those playthings that are old and wise and experienced like the skin horse understand all about it."what is real?" asked the rabbit one day, when they were lying side by side near the nursery fender, before nana came to tidy the room. "does it mean having things that buzz inside you and a stick-out handle?""real isn\'t how you are made," said the skin horse. "it\'s a thing that happens to you. when a child loves you for a long, long time, not just to play with, but really loves you, then you become real.""does it hurt?" asked the rabbit."sometimes," said the skin horse, for he was always truthful. "when you 

In [42]:
index.ntotal

13

# Chroma DB

Chroma is an open-source vector database that runs in memory by default.

It can also be configured to save and load data from the local machine: data is persisted to the specified path and loaded from that path when the program starts.

In [None]:
! pip install chromadb

In [51]:
import chromadb

client = chromadb.PersistentClient('./chromadb_books')

# client = chromadb.HttpClient(host='localhost', port='8000')

In [63]:
client.heartbeat()

1706729324479062600

## Collections

Collections of embeddings are managed through the `collection` primitive.

Collection naming rules:

1. The length of the name must be between 3 and 63 characters.
2. The name must start and end with a lowercase letter or a digit, and it can contain dots, dashes, and underscores in between.
3. The name must not contain two consecutive dots.
4. The name must not be a valid IP address.
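
The four rules can be checked before calling Chroma. The helper below is a hypothetical sketch of the rules as listed above, not Chroma's own validation code, and the name `is_valid_collection_name` is an assumption.

```python
import ipaddress
import re

# Rules 1 and 2: 3 to 63 characters, starting and ending with a lowercase
# letter or digit, with dots, dashes, and underscores allowed in between.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")

def is_valid_collection_name(name: str) -> bool:
    if not _NAME_PATTERN.fullmatch(name):
        return False
    if ".." in name:  # rule 3: no two consecutive dots
        return False
    try:
        ipaddress.ip_address(name)  # rule 4: must not parse as an IP address
    except ValueError:
        return True
    return False

for candidate in ["harrypotter", "sample-books", "ab", "Books",
                  "my..books", "192.168.0.1"]:
    print(candidate, is_valid_collection_name(candidate))
```

The first two names pass. The others fail on length, uppercase letters, consecutive dots, and the IP address rule respectively.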

A Chroma collection is created with a name and an optional embedding function. If you supply an embedding function, pass the same one every time you get the collection.

In [None]:
collection = client.create_collection(name='harrypotter', metadata={'hnsw:space': 'cosine'})



In [90]:
from chromadb.utils import embedding_functions

sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="paraphrase-albert-small-v2")



In [89]:
collection = client.create_collection(name='sample-books', metadata={'hnsw:space': 'cosine'}, embedding_function=sentence_transformer_ef)



In [68]:
collection = client.get_collection(name='harrypotter')


In [None]:
client.delete_collection(name='')  # set name to the collection you want to delete

In [69]:
defence_against_dark_arts = '''\
The Dark Arts: The Three Spell Categories
The jinx is the first and lowest level of Dark spell. Jinxes usually will not cause much harm and are considered to be more of a minor irritation than a serious problem. Some common examples of jinxes are the Tripping Jinx, the Stinging Jinx, and the Knockback Jinx. All of these spells, and any spell that falls under the classification of jinx, will be irritating and can be used as a distraction, but will not cause any lasting damage. Some people will even frequently use jinxes to assist with practical jokes, a habit I suggest none of you participate in.
The hex is the second classification of Dark spell. These spells are a bit darker than the jinx and will often do more lasting damage. There are not as many examples of hexes, but some popular ones include the Bat-Bogey Hex, the Hurling Hex, and Densaugeo. All of these hexes have effects that cause more damage to the target and are viewed as major inconveniences that cause a moderate amount of suffering. These spells can be reversed through the use of a Hex Breaker.
Finally, we move to the third and final classification of the Dark Arts. The curse is the darkest and most dangerous classification of the Dark Arts. These spells are usually cast with the intent to severely harm or potentially even kill the target. Some of the more well-known curses are the Reductor Curse, the Full Body-Bind Curse, and the Conjunctivitis Curse. The use of these spells is often discouraged unless absolutely necessary. As First Years, I would not expect you to be able to cast any of the curses I have mentioned, but I may provide a lesser curse in a lesson to give you an example.
Before we get to that, I have one final topic to discuss with you - a category of curses that fall within a category of curses (if that confused you, this will be explained further in Year Two, when you start learning more about curses). This subcategory is a group of curses known as the Unforgivable Curses. These curses will not be seriously discussed here - these are the darkest spells known to wizarding kind and will not be covered in-depth until a much later year. That being said, I believe it is important that you should be aware of this category of curses. The use of any single one of these three curses will earn you a lifetime stay in Azkaban.
'''

charms = '''\
The Wand-Lighting and Wand-Extinguishing Charms
We will end today with a small talk about a simple but very useful charm: the Wand-Lighting Charm. As the name implies, this simple spell will allow you to illuminate the tip of your wand. It is one of the easiest spells to master, useful whenever you need light, and makes for some good spellcasting practice.
Now, the first known use of the Wand-Lighting Charm was in the 18th century, despite how easy it is to use. Magical historians credit Levina Monkstanley, a Ministry of Magic employee, for its invention. It was first demonstrated in 1772 when Ms. Monkstanley had dropped her quill on the ground and used the spell to find it.
Its uses far exceed simply lighting the tip of your wand. It can provide an easy and endless source of amusement for young witches and wizards who usually enjoy watching the color change. In the Ministry of Magic,  it makes for a safe and easy method for casting a vote. It can even be used to repel incorporeal threats such as the Gytrash (a vicious spectral hound) and other malevolent spirits.
Now! Onto the casting!
The Wand-Lighting Charm
Incantation: Lumos (pronounced 'LOO-mos')
Wand Movement: Single counter-clockwise loop
Willpower: Low; determines the color of the light
Concentration: Low; lighting up the top of your wand
You should keep the wand movement in mind and don't put too much willpower, or mental push, into the spell or the tip of your wand will turn scarlet. Too little and it will be a pearly white. You should aim for a nice light yellow, which indicates sufficient effort. Of course, circumstances may call for a weaker or stronger light.
Since the magic for the Wand-Lighting Charm never leaves the tip of your wand, the flow of magical energy does not need to be consciously maintained. However, please keep in mind that if you let go of your wand while this charm is in effect, the light will go out. Very advanced magical practitioners may still be able to see their wand, but this is not an easy task.
No text describing the Wand-Lighting Charm would be complete without its equally-simple counter, the Wand-Extinguishing Charm. This spell has a singular purpose, and that is to counter the Wand-Lighting Charm. It is a personal charm, which means it will only work for your own wand, or the wand you happen to be holding should it be lit.
The Wand-Extinguishing Charm
Incantation: Nox (pronounced 'NOCK-ss')
Wand Movement: Flick of the wand
Willpower: None
Concentration: None
The Wand-Extinguishing Charm can be quickly cast and its effect is immediate. There is no harm of backlash, making it as safe as the charm it counters.
'''

transfiguration = '''\
Transformation itself is further broken down into three sub-branches: trans-species transformation, switching, and human transfiguration. These three sub-branches do not include general transformation, which some transfigurists (such as myself) believe should be added. For those who consider general transformation a sub-branch of transformation, this breaks down further into four general categories that are inanimate to inanimate, inanimate to animate, animate to inanimate, and animate to animate transformation. Some of these sub-branches do have fuzzy distinguishing lines, particularly when it comes to animate to animate transformation, trans-species transformation, cross-species switching, and human transfiguration.  We will be discussing the differences between these types of magic throughout your time in this class, but for now, let me talk a little more about the three main sub-branches.
Switching spells are spells that either change one essential detail of an object or swap the physical location of two items. This will be discussed in detail in Year Four. A trans-species transformation is a transformation that to some degree changes the species of the target. Human transfiguration is exactly what it sounds like: the act of transforming the human body. This is an extremely complex form of transformation, and thus will not be discussed until your Seventh Year if you chose to take N.E.W.T. level Transfiguration. There are several different types of human transfiguration, such as cross-species switches directed at human targets. An example of this is the spell that was used by a contestant in a past Triwizard Tournament that allowed Viktor Krum to transfigure his head into that of a shark so he could breathe underwater. There are more specific types of human transfiguration as well, such as the animagus, metamorphmagus, and werewolf transformations.
Animagi are wizards with the capability of fully turning themselves into an animal at will while keeping their human sentience. This is a completely voluntary and learned skill, one that is extremely difficult to master. We will not be teaching this skill in this class and it is highly encouraged that you do not try to learn this on your own during your time here at Hogwarts.  It is extremely difficult and time intensive to learn, which is why many choose not to. Part of this process requires that a witch or wizard keep the leaf of a mandrake plant under their tongue for a full month, which many people find extremely inconvenient. Each person is only able to turn into one form of animal, and this animal will bear some identifying mark related to their human features.  For instance, someone who wears glasses may have a darker outline of fur around their eyes. Any intense, permanent body altering will also be noted in the animagus form. For instance, let us remember the case of Peter Pettigrew, whose missing finger was there in both human and animagus form. It is important to note that Animagi are required to register with the Ministry of Magic.
Metamorphmagi are people with the ability to change parts of their appearance at will. While considered a part of human transfiguration, this is not a skill that can be learned. Rather, it is a trait that certain witches and wizards are born with that is believed to be genetic. One example of a metamorphmagus is member of the Order of the Phoenix Nymphadora Tonks, who frequently altered her facial features for the purposes of disguise and amusement, among others. Metamorphmagi can change almost anything in their appearance, from sex and age to something as simple as hair or eye color. They do not require a wand or potion to do so.
Werewolves are different from the above two forms of human transfiguration as they do not have a choice in their transformations. They are forced into the body of a wolf at every full moon. Normally when this happens, the person does not keep their human mind. Due to this, werewolves are incredibly dangerous and will attack any human on sight. That being said, there is a potion that allows someone to keep their human mind during the full moon, though the transformation will still take place. This is called the Wolfsbane Potion, which was created by Damocles. You will learn more about this potion in your Sixth Year with Professor Draekon. I won’t go into detail about how one becomes a werewolf, as this will be covered in your Care of Magical Creatures and Defense Against the Dark Arts classes.
Vanishment
Vanishment is a branch of transfiguration encompassing magic that causes an object to disappear from the world. There are many theories on what actually happens to items that have vanished. Some believe that they disappear all together and can never be returned. The most popular opinion is that the objects are disassembled and the particles scattered throughout the world. They are broken down into such small pieces that they appear to “no longer exist.” However, they cannot truly cease to exist due to the law which states that matter can neither be created nor destroyed. The more complex the object, the harder it is to vanish. This is a rather difficult category of spells that will not be discussed in detail until your Fifth Year. There is technically only one spell required to make something vanish, but there is much to discuss relating to the topic.
Conjuration
Conjuration spells are significantly more complicated than their vanishment counterparts. Conjured items are formed by pulling together the particles around us to create the object we desire, essentially opposite to the way in which items are vanished.  Accomplishing this requires the transfigurist to take into account all of the details of the object they are trying to conjure, a difficult task, which is why this is something that you will not learn unless you choose to continue on with Transfiguration into your N.E.W.T.s. There are also extensive laws restricting what can and cannot be conjured and things that are conjured also will not last forever.  We will go over some of these restrictions in the next lesson, but as a general rule, whatever you wish to conjure must first exist in the world.
Untransfiguration
Untransfiguration is precisely what it sounds like. It is a series of spells that allow you to undo previous transfigurations. However, this is mostly for transformations, as vanishment cannot be undone, and things that are conjured disappear naturally after a matter of time. This will be the topic of your Seventh Year, along with human transfiguration and some other advanced transfigurative topics.
'''

potions = '''\
The most iconic tool used by the potioneer is the cauldron. Potioneers often have strong emotional ties to the cauldrons they use most frequently, and you will even occasionally see them mumbling to their cauldron during the brewing process, urging the instrument along. Now, before you mock this, imagine you are studying advanced potions techniques following your graduation from Hogwarts. You’ve been awake going on fifty hours straight confirming the alignment of the planets and other celestial bodies, gathering ingredients at precisely the correct moment, crushing or chopping the necessary ingredients, and then brewing an incredibly intricate potion of some sort with several phases and long wait periods in-between. You are alone in your lab standing over your cauldron, this one vessel that holds the potential success of such a long and hard effort. If finished correctly and if you took sufficient notes, this could mean your first published scholarly article or even a potential first patent. The potion in that pot simmers merrily, letting off gentle steam and occasionally emitting sparks. This vessel, the cauldron, is gently cradling and preparing the key to your potential success. Finally, you reach the end of that long road. Your potion is done and ready to be bottled. It is successful - you are successful. And you only have that gently hissing cauldron with which to share that quiet victory.
Cauldrons are made of various materials and come in many sizes. In different parts of the world, different materials are favored, typically in affiliation with whatever materials are most prevalent in that country. However, that is now being somewhat standardized through better trade agreements and open commerce between magical communities. In terms of size, some cauldrons can be small enough to fit in the palm of your hand and brew only the smallest servings of potions, while there are others that would easily contain a rather large human. These large cauldrons are typically used for commercial brewing, and are rarely seen in personal households. Although most European and American countries use standardized numbering systems for conveying cauldron size, there is some variation internationally. As such, make sure you research local sizing systems should you find yourself shopping in a remote location.
There are three common materials used for cauldrons in Great Britain:
Pewter is the best for beginning potions work, as it is the slowest and least expensive standard cauldron, so most young witches and wizards are able to obtain their own. It gives a little bit of leeway owing to its slow brewing time, but students should still be careful to be as precise as possible in their measurements and timing, even with this extra wiggle room available. Pewter is a metal alloy (a material composed of at least two metals) that is traditionally at least 85% tin with copper, bismuth, antimony, and occasionally lead making up the rest of its composition. The earliest piece of pewter found dates back to 1450 BCE in an ancient Egyptian tomb. All Hogwarts students should have a Size Two Pewter cauldron for their Potions class.
Brass brews potions at a medium level speed, and is slightly more expensive than pewter. This is a good cauldron for intermediate witches and wizards who have a decent grasp of brewing times and methodology. Although this is not always exactly the case, potions brewed with brass tend to complete their brewing processes in approximately 10% less time than those brewed with pewter. Brass is also a metal alloy composed mostly of the metals zinc and copper. The levels of each of these metals can vary to create different effects in the metal. Alloys of copper and zinc have been found in the western portions of Asia and the East Mediterranean dating as far back as the third millennium before the Common Era. Similar alloys were used throughout ancient times, gradually making its way west to the Roman Empire and other parts of Europe.
Copper is the fastest brewing cauldron material, and as a rule of thumb, tends to brew potions in approximately 10% less time than brass cauldrons. Only a skilled witch or wizard should use a copper cauldron, as they can be a bit tricky owing to the much more rapid brew time. The faster brew time also makes it more likely to make a mistake or ruin the potion: a shorter brew time also yields less “wiggle room” for differences in timing. It is also unwise to use these cauldrons for potions with the longest brew times, as quite often a good deal of the strength of these slow-brewing potions is gained by a longer period of the ingredients sitting and brewing with one another. Copper is not an alloy, but rather a pure chemical element containing atoms of only one type. We will cover more of that in Year Two, so don’t worry if you don’t understand that now! There is evidence that copper was used as far back as nine to ten thousand years ago, and the Chalcolithic Period, commonly known as the Copper Age, marks a period of time when this metal was in popular use before the discovery of the alloy bronze, which is a harder metal.
On occasion, other materials are used for cauldrons in Great Britain, but these materials tend to be a bit rarer and a good deal more expensive. Gold and silver cauldrons are two such examples of this. Silver cauldrons are actually among the best to use, with the least likelihood of brewing failure as well as a smooth, easy brew time. The effects of potions brewed in silver, particularly in conjunction with certain phases of the Moon, tend to be heightened, and they quite often have a longer shelf life. If you ever decide to be a fully-fledged potioneer, a silver cauldron is definitely worth the investment. Finally, the fire crab also makes a wonderful cauldron, but owing to their frequent poaching for their shells as well as the gems found on their shells, international wizarding laws have created sanctions protecting colonies in the Fiji Islands. However, they are still traded with some frequency on the black market. Being caught with a fire crab shell sold on the black market has hefty fines, however, and carries with it possible time in Azkaban prison.
As mentioned in the safety procedures, you must always bring your dragon-hide gloves to class in order to protect you when handling dangerous ingredients, goggles to protect your eyes from splashes and sprays, and your wand. Remember, some ingredients are not only caustic, they may also try to bite: keep alert at all times, and never grow lackadaisical when brewing or preparing ingredients.
Other important implements you will find at your brewing station at the lab next week include a set of scales to measure your ingredients as well as measuring cups for liquids and a ruler for solids that must be added by measuring length. You will also want a sharp knife to ensure you are cutting ingredients cleanly and a cutting board. Many prefer a silver knife for this, as it tends to cut magical ingredients the most cleanly, but it’s up to you. Some ingredients must be crushed to a fine and even dust or ground into a smooth paste with a mortar and pestle. Stirring during the brewing process should actually be done with your wand (never stick your wand into the potion, though!) but you may have a wooden stirring utensil of some sort as well. This wooden spoon or other implement is often used to stir the potion after the brew time is finished to ensure an even consistency or to add Flobberworm mucus for texture.
'''

In [70]:
collection.add(
    documents=[defence_against_dark_arts, charms, transfiguration, potions],
    metadatas=[{'name': 'defence_against_dark_arts'}, {'name': 'charms'}, {'name': 'transfiguration'}, {'name': 'potions'}],
    ids=[str(i) for i in range(4)]
)

In [82]:
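collection.query(
    query_texts=['nox'],
    n_results=3,  # upper bound; fewer results come back if the filters leave fewer documents
    where={'name': 'potions'},  # metadata filter: only records whose 'name' is 'potions'
    where_document={'$contains': 'cauldrons'}  # document filter: text must contain 'cauldrons'
)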

{'ids': [['3']],
 'distances': [[1.0265965078476185]],
 'metadatas': [[{'name': 'potions'}]],
 'embeddings': None,
 'documents': [['The most iconic tool used by the potioneer is the cauldron. Potioneers often have strong emotional ties to the cauldrons they use most frequently, and you will even occasionally see them mumbling to their cauldron during the brewing process, urging the instrument along. Now, before you mock this, imagine you are studying advanced potions techniques following your graduation from Hogwarts. You’ve been awake going on fifty hours straight confirming the alignment of the planets and other celestial bodies, gathering ingredients at precisely the correct moment, crushing or chopping the necessary ingredients, and then brewing an incredibly intricate potion of some sort with several phases and long wait periods in-between. You are alone in your lab standing over your cauldron, this one vessel that holds the potential success of such a long and hard effort. If fin
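
The query above asks for `n_results=3` but returns a single hit, because the `where` and `where_document` filters leave only the `potions` document to rank. Below is a minimal pure-Python sketch of that narrowing logic. It is not ChromaDB's implementation: the `filtered_query` helper, the toy 2-D vectors, the shortened document strings, and the squared Euclidean distance are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT ChromaDB's implementation.
# filtered_query, the toy vectors and the distance choice are assumptions.

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_query(records, query_vec, n_results, where=None, contains=None):
    """Keep records that pass both filters, then rank survivors by distance."""
    candidates = []
    for rec in records:
        # Metadata filter: every key/value pair in `where` must match exactly.
        if where and any(rec["metadata"].get(k) != v for k, v in where.items()):
            continue
        # Document filter: the text must contain the given substring.
        if contains and contains not in rec["document"]:
            continue
        candidates.append(rec)
    # Nearest first; at most n_results, fewer if the filters left fewer.
    candidates.sort(key=lambda rec: squared_l2(rec["vector"], query_vec))
    return [rec["id"] for rec in candidates[:n_results]]


# Hypothetical stand-ins for the four course documents and their embeddings.
records = [
    {"id": "0", "metadata": {"name": "defence_against_dark_arts"},
     "document": "curses and counter-curses", "vector": [0.9, 0.1]},
    {"id": "1", "metadata": {"name": "charms"},
     "document": "lumos and nox control wand light", "vector": [0.1, 0.9]},
    {"id": "2", "metadata": {"name": "transfiguration"},
     "document": "turning matches into needles", "vector": [0.5, 0.5]},
    {"id": "3", "metadata": {"name": "potions"},
     "document": "cauldrons come in pewter, brass and copper", "vector": [0.8, 0.3]},
]

query_vec = [0.1, 0.8]  # toy stand-in for the embedding of 'nox'

unfiltered = filtered_query(records, query_vec, n_results=3)
filtered = filtered_query(records, query_vec, n_results=3,
                          where={"name": "potions"}, contains="cauldrons")
print(unfiltered)
print(filtered)
```

In this toy data the charms record is the closest match to the query vector, yet the filtered call can only return the potions record. That mirrors the real output above, where the single hit has a fairly large distance: it is the best match among the documents that passed the filters, not necessarily the best match in the collection.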



In [None]:
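from chromadb.utils import embedding_functions

# DefaultEmbeddingFunction is Chroma's built-in embedding function (all-MiniLM-L6-v2)
sentence_transformer_ef = embedding_functions.DefaultEmbeddingFunction()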

In [2]:
import chromadb

# PersistentClient stores the database on disk at the given path
client = chromadb.PersistentClient(path='./chromadb_books')

In [3]:
# create_collection raises an error if a collection with this name already exists
collection = client.create_collection(name='random_sentences', embedding_function=sentence_transformer_ef)

In [4]:
# get_collection raises an error if no collection with this name exists
collection = client.get_collection(name='random_sentences', embedding_function=sentence_transformer_ef)

In [None]:
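# get_or_create_collection returns the existing collection, or creates it if it is missing,
# so this cell is safe to re-run
collection = client.get_or_create_collection(name='random_sentences', embedding_function=sentence_transformer_ef)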