### Vector Database

In [2]:
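import chromadb

# In-memory client: nothing is persisted once the Python process exits.
client = chromadb.Client()

# A collection groups documents together with their embeddings and metadata.
collection = client.create_collection(name="my_first_chromadb_database")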

In [3]:
# No embedding function was specified, so Chroma embeds the documents with its default one.
collection.add(
    documents = ['I want to learn music','I want to tune musical instruments'],
    ids = ['id1', 'id2']
)

In [4]:
all_docs = collection.get()
all_docs

{'ids': ['id1', 'id2'],
 'embeddings': None,
 'documents': ['I want to learn music', 'I want to tune musical instruments'],
 'uris': None,
 'data': None,
 'metadatas': [None, None],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

In [5]:
doc1 = collection.get(ids = ['id1'])
doc1

{'ids': ['id1'],
 'embeddings': None,
 'documents': ['I want to learn music'],
 'uris': None,
 'data': None,
 'metadatas': [None],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

In [6]:
results = collection.query(query_texts=['I want to play guitar'],
                           n_results=2)

In [7]:
results

{'ids': [['id2', 'id1']],
 'embeddings': None,
 'documents': [['I want to tune musical instruments',
   'I want to learn music']],
 'uris': None,
 'data': None,
 'metadatas': [[None, None]],
 'distances': [[0.8250418305397034, 0.8434193134307861]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

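The first result for the guitar query is 'I want to tune musical instruments', most likely because a guitar is a musical instrument, so that document's embedding lies slightly closer to the query than the embedding of 'I want to learn music' (0.825 versus 0.843). Each distance measures how far a document's embedding is from the query's embedding; with Chroma's default settings this is the squared Euclidean (L2) distance. A smaller distance means a closer relation and a larger distance means a more distant one.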
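To make the distance ranking concrete, here is a minimal sketch that ranks two documents by squared Euclidean distance. The 3-dimensional vectors are made-up stand-ins for real embeddings, and `squared_l2` is a hypothetical helper, not a Chroma function.

```python
def squared_l2(a, b):
    """Squared Euclidean distance: the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up vectors for illustration only; real embeddings have hundreds of dimensions.
query = [0.9, 0.1, 0.3]
docs = {
    "id1": [0.2, 0.8, 0.1],
    "id2": [0.8, 0.2, 0.4],
}

# Distance from the query to every document.
distances = {doc_id: squared_l2(query, vec) for doc_id, vec in docs.items()}

# Smaller distance first, which is the order a nearest-neighbour query returns.
ranked = sorted(distances, key=distances.get)

print(ranked)
print(distances)
```

Taking the square root of each value would give the plain Euclidean distance without changing the order, so the ranking is the same either way.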

A key advantage of a vector database such as Chroma (chromadb) is that it supports semantic search.

Semantic search is a search technique that improves accuracy by matching on the meaning and context of words rather than on the words themselves. Traditional keyword-based search relies on exact word matches. Semantic search instead interprets the intent behind a query and retrieves results that are conceptually relevant, even if they don’t contain the exact search terms.
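For contrast, here is a minimal sketch of plain keyword matching over the same two music documents. `keyword_search` is a hypothetical helper written for this illustration; it is not part of chromadb.

```python
documents = {
    "id1": "I want to learn music",
    "id2": "I want to tune musical instruments",
}

def keyword_search(term, docs):
    """Return the ids of documents that contain the term as a whole word."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items() if term in text.lower().split()]

guitar_hits = keyword_search("guitar", documents)
music_hits = keyword_search("music", documents)

print(guitar_hits)  # no document contains the literal word "guitar"
print(music_hits)   # "musical" is a different token, so only id1 matches
```

The guitar query finds nothing here, whereas the vector query earlier still returned both documents ranked by distance.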

### How Semantic Search Works
- Natural Language Processing (NLP) – It processes the query to understand the context, synonyms, and user intent.
- Word Embeddings & Vector Representations – Instead of relying on raw text, semantic search converts words and sentences into numerical vectors using techniques like Word2Vec, BERT, or Sentence Transformers.
- Context Awareness – It considers relationships between words. For example, in a search for "best budget gaming laptop," it understands the need for affordability and gaming performance.
- Entity Recognition – Identifies key entities like people, locations, products, and events.
- User Behavior & Personalization – Some semantic search engines use previous interactions to refine results based on user preferences.
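The embedding and comparison steps in the list above can be sketched with a toy model. Everything here is an assumption made for illustration: the 2-dimensional word vectors are hand-picked (a real system would learn them with a model such as Word2Vec or a Sentence Transformer), and `embed`, `cosine_similarity`, and `rank` are hypothetical helpers.

```python
import math

# Hand-picked word vectors: [instrument-related, voice-related]. Not learned values.
WORD_VECTORS = {
    "guitar": [1.0, 0.0],
    "instruments": [0.9, 0.1],
    "music": [0.5, 0.5],
    "sing": [0.0, 1.0],
}

def embed(sentence):
    """Average the vectors of the known words; unknown words are ignored."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

documents = ["I want to learn music", "I want to tune musical instruments"]

def rank(query):
    """Return the documents ordered from most to least similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)

print(rank("I want to play guitar"))
print(rank("I want to sing"))
```

Because the vectors were chosen by hand, this only shows the mechanics of turning text into vectors and comparing them; it says nothing about how a trained model would score these sentences.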

### Example:
- Keyword Search: If you search for "Apple," a traditional search matches the word itself and cannot tell the fruit from the company, so it might return results about the fruit when you meant the company.
- Semantic Search: It takes context and intent into account. If you were previously searching for iPhones, it might prioritize results about Apple Inc.

In [8]:
results = collection.query(query_texts=['I want to play guitar', 'I want to sing'],
                           n_results=2)

In [9]:
results

{'ids': [['id2', 'id1'], ['id1', 'id2']],
 'embeddings': None,
 'documents': [['I want to tune musical instruments', 'I want to learn music'],
  ['I want to learn music', 'I want to tune musical instruments']],
 'uris': None,
 'data': None,
 'metadatas': [[None, None], [None, None]],
 'distances': [[0.8250418305397034, 0.8434193134307861],
  [0.9237728118896484, 1.153193473815918]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

The first result for the sing query is 'I want to learn music', presumably because singing is more closely related to music in general than to musical instruments. The distances are read the same way as for the guitar query: a smaller distance means a closer relation and a larger distance means a more distant one. The response contains one inner list per query text, in the same order as the queries.
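Since a batched query returns nested lists, it can help to pair each query with its closest document. The sketch below works on a trimmed copy of the output shown above; `best_matches` is a hypothetical helper, not a Chroma function.

```python
# Trimmed copy of the query output above (only the fields used here).
results = {
    "ids": [["id2", "id1"], ["id1", "id2"]],
    "documents": [
        ["I want to tune musical instruments", "I want to learn music"],
        ["I want to learn music", "I want to tune musical instruments"],
    ],
    "distances": [
        [0.8250418305397034, 0.8434193134307861],
        [0.9237728118896484, 1.153193473815918],
    ],
}
query_texts = ["I want to play guitar", "I want to sing"]

def best_matches(query_texts, results):
    """Map each query text to its (closest document, distance) pair."""
    best = {}
    for i, query in enumerate(query_texts):
        # Inner list i belongs to query i; pick the entry with the smallest distance.
        distance, document = min(zip(results["distances"][i], results["documents"][i]))
        best[query] = (document, distance)
    return best

best = best_matches(query_texts, results)
for query, (document, distance) in best.items():
    print(f"{query!r} -> {document!r} (distance {distance:.3f})")
```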

In [10]:
collection_food = client.create_collection("my_food_collection")

In [11]:
collection_food.add(documents=["Dosa", "Frankie", "Pizza", "Burger", "Samosa", "Aloo Paratha"], 
                    ids=['id1', 'id2', "id3", 'id4', "id5", "id6"])

In [12]:
food_results = collection_food.query(query_texts=["I am an Indian"], n_results=6)

In [13]:
food_results

{'ids': [['id6', 'id5', 'id1', 'id4', 'id3', 'id2']],
 'embeddings': None,
 'documents': [['Aloo Paratha',
   'Samosa',
   'Dosa',
   'Burger',
   'Pizza',
   'Frankie']],
 'uris': None,
 'data': None,
 'metadatas': [[None, None, None, None, None, None]],
 'distances': [[1.4621524810791016,
   1.5920627117156982,
   1.627532958984375,
   1.7630388736724854,
   1.793413519859314,
   1.9457755088806152]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

In [14]:
food_results = collection_food.query(query_texts=["I'm an American"], n_results=4)

In [15]:
food_results

{'ids': [['id3', 'id4', 'id6', 'id2']],
 'embeddings': None,
 'documents': [['Pizza', 'Burger', 'Aloo Paratha', 'Frankie']],
 'uris': None,
 'data': None,
 'metadatas': [[None, None, None, None]],
 'distances': [[1.668642520904541,
   1.7323994636535645,
   1.7357228994369507,
   1.8269104957580566]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

### Deleting documents from a collection

In [16]:
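# all_docs came from the first (music) collection; its ids are ['id1', 'id2'],
# so this removes "Dosa" and "Frankie" from the food collection.
collection_food.delete(ids=all_docs['ids'])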

In [17]:
collection_food.get()

{'ids': ['id3', 'id4', 'id5', 'id6'],
 'embeddings': None,
 'documents': ['Pizza', 'Burger', 'Samosa', 'Aloo Paratha'],
 'uris': None,
 'data': None,
 'metadatas': [None, None, None, None],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

In [19]:
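# This passes document texts where ids are expected, so no id matches
# and nothing is deleted (see the warnings below).
collection_food.delete(ids=all_docs['documents'])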

Delete of nonexisting embedding ID: I want to learn music
Delete of nonexisting embedding ID: I want to tune musical instruments
Delete of nonexisting embedding ID: I want to learn music
Delete of nonexisting embedding ID: I want to tune musical instruments


In [20]:
collection_food

Collection(name=my_food_collection)

### Adding Metadata for the documents

In [21]:
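# 'id3' is already used by "Pizza", so the third document is skipped (see the warnings below).
# Use an unused id to add it, or collection_food.upsert(...) to overwrite the existing entry.
collection_food.add(documents=["My friend lives in UK and likes to eat Pizza", "I live in Mumbai and I like Samosa", "I like to eat dosa"],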
                    ids=['id1', 'id2', "id3"],
                    metadatas=[
                        {"url" : "https://en.wikipedia.org/wiki/United_Kingdom"},
                        {"url" : "https://en.wikipedia.org/wiki/India"},
                        {"url" : "https://en.wikipedia.org/wiki/South_India"}
                    ])

Add of existing embedding ID: id3
Insert of existing embedding ID: id3


In [22]:
collection_food.get()

{'ids': ['id3', 'id4', 'id5', 'id6', 'id1', 'id2'],
 'embeddings': None,
 'documents': ['Pizza',
  'Burger',
  'Samosa',
  'Aloo Paratha',
  'My friend lives in UK and likes to eat Pizza',
  'I live in Mumbai and I like Samosa'],
 'uris': None,
 'data': None,
 'metadatas': [None,
  None,
  None,
  None,
  {'url': 'https://en.wikipedia.org/wiki/United_Kingdom'},
  {'url': 'https://en.wikipedia.org/wiki/India'}],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

In [25]:
results = collection_food.query(query_texts=["India"], n_results=3)

In [26]:
results

{'ids': [['id2', 'id5', 'id6']],
 'embeddings': None,
 'documents': [['I live in Mumbai and I like Samosa',
   'Samosa',
   'Aloo Paratha']],
 'uris': None,
 'data': None,
 'metadatas': [[{'url': 'https://en.wikipedia.org/wiki/India'}, None, None]],
 'distances': [[1.30592942237854, 1.3957486152648926, 1.5312440395355225]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}