So far I have used Chroma DB and FAISS. In this notebook I will also use Astra DB.

#### 1. **Chroma DB**:
   - **Type**: Vector database.
   - **Purpose**: Designed to handle and search through high-dimensional vector embeddings efficiently. 
   - **Use Case**: Often used for tasks like similarity search and nearest neighbor search where data is represented as vectors, such as in machine learning and AI applications.
   - **Key Features**: 
     - Optimized for storing and retrieving vector embeddings.
     - Supports operations such as similarity searches over large datasets.
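
To make "similarity search" concrete, here is a minimal pure-Python sketch of a brute-force top-k lookup by cosine similarity. It does not use Chroma's API; the helper names (`cosine_similarity`, `top_k`) and the toy 3-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for zero vectors in this sketch
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the k (index, score) pairs most similar to the query, best first."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" (hypothetical values, not produced by a real model).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
nearest = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(nearest)
```

A vector database does the same job conceptually, but adds persistence and indexing so that it does not have to compare the query against every stored vector.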

#### 2. **FAISS** (Facebook AI Similarity Search):
   - **Type**: Open-source library (not a standalone database server).
   - **Purpose**: Efficient similarity search and clustering of dense, high-dimensional vectors.
   - **Use Case**: Commonly used in scenarios where you need to find the closest vectors (nearest neighbors) in a high-dimensional space, such as in recommendation systems or image search.
   - **Key Features**:
     - Provides various algorithms and data structures for efficient vector search.
     - Optimized for large-scale and high-dimensional data.
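
One family of index structures FAISS offers is the inverted-file (IVF) index, which avoids scanning every vector. The sketch below shows only the idea, in pure Python, and is not FAISS code: the function names and the hand-picked centroids are assumptions for illustration. A real index learns its centroids (typically with k-means) and can probe several buckets per query.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(point, candidates):
    """Index of the candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: squared_l2(point, candidates[i]))

def build_buckets(vectors, centroids):
    """Group vector ids by their nearest centroid (the "inverted lists")."""
    buckets = {c: [] for c in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        buckets[nearest_index(vec, centroids)].append(vec_id)
    return buckets

def search(query, vectors, centroids, buckets, k=1):
    """Approximate search: scan only the bucket of the query's nearest centroid."""
    bucket = buckets[nearest_index(query, centroids)]
    ranked = sorted(bucket, key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return ranked[:k]

# Toy data (hypothetical): two obvious clusters in 2-D.
toy_vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
toy_centroids = [[0.0, 0.0], [5.0, 5.0]]  # hand-picked here; normally learned
toy_buckets = build_buckets(toy_vectors, toy_centroids)
result = search([5.0, 5.2], toy_vectors, toy_centroids, toy_buckets, k=2)
print(toy_buckets, result)
```

The trade-off is speed against recall: a true neighbor that sits in a different bucket is missed unless more buckets are searched.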

#### 3. **Astra DB**:
   - **Type**: Managed database-as-a-service (DBaaS).
   - **Purpose**: A fully managed cloud service based on Apache Cassandra, designed for scalable and highly available NoSQL database needs.
   - **Use Case**: Used for applications requiring high availability and scalability for data storage and retrieval, such as in large-scale web applications, IoT data storage, and more.
   - **Key Features**:
     - Automated management of Cassandra infrastructure.
     - Built-in features for scaling, security, and performance management.
     - Provides both traditional database functionalities and distributed data storage.

**In summary**:
- **Chroma DB** and **FAISS** are focused on managing and searching vector embeddings and are typically used in AI and machine learning applications.
- **Astra DB** is a managed NoSQL database service based on Apache Cassandra, designed for large-scale, distributed data storage. It also supports vector search, which is how this notebook uses it: as the vector store for a RAG pipeline.

In [1]:
# The Cassandra vector store lives in langchain_community in recent LangChain releases
from langchain_community.vectorstores import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings

from datasets import load_dataset
import cassio

from pypdf import PdfReader



In [2]:
from dotenv import load_dotenv
load_dotenv()
import os

astra_db_token = os.getenv('ASTRA_DB_TOKEN')
astra_db_id = os.getenv('ASTRA_DB_ID')
groq_api_key = os.getenv('GROQ_API_KEY')
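hf_api_key = os.getenv('HUGGINGFACE_API_KEY')
if hf_api_key:  # assigning None to os.environ raises TypeError, so only set it when present
    os.environ['HUGGINGFACE_API_KEY'] = hf_api_key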
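
Since `os.getenv` silently returns `None` for a missing variable, a failure would otherwise only surface later, when connecting to Astra DB or Groq. A small check like the one below can fail early with a clearer message. This is an optional sketch, not part of the original notebook; `require_env` is a hypothetical helper, and it is demonstrated on a plain dict so it runs without real credentials.

```python
import os

def require_env(names, environ=None):
    """Return {name: value} for the given variables, or raise naming the missing ones."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}

# Placeholder values only; in the notebook, call require_env([...]) after load_dotenv().
fake_env = {"ASTRA_DB_TOKEN": "token-placeholder", "ASTRA_DB_ID": "id-placeholder"}
settings = require_env(["ASTRA_DB_TOKEN", "ASTRA_DB_ID"], environ=fake_env)
print(sorted(settings))
```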

In [3]:
pdf_reader = PdfReader('vivekananda_chicago.pdf')

raw_text = ''
for page in pdf_reader.pages:
    content = page.extract_text()
    if content:
        raw_text += content
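
PDF extraction often leaves doubled spaces and indentation runs in the text. As an optional cleanup step (not in the original notebook), the sketch below collapses runs of spaces and tabs inside each line while keeping the newlines, because the splitter used later relies on `"\n"` as its separator. `normalize_spaces` is a hypothetical helper name. It cannot repair words broken in the middle by the extractor, such as a stray space inside a word.

```python
import re

def normalize_spaces(text):
    """Collapse runs of spaces/tabs inside each line; keep the newlines intact."""
    lines = text.split("\n")
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in lines)

sample = "I thank you in the name of the mother  of religions\n      It fills my heart"
cleaned = normalize_spaces(sample)
print(repr(cleaned))
```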


In [4]:
raw_text

'SWAMI VIVEKANANDA’S  SPEECH AT\nWORLD PARLIAMENT OF RELIGION, CHICAGO\nRESPONSE TO WELCOME\nSisters and Brothers of America,\n      It fills my heart with joy unspeakable to rise in respons e to the warm and cordial welcome\nwhich you have given us. I thank you in the name of the most ancient order of monks in the\nworld; I thank you in the name of the mother  of religions; and I thank you in the name of\nmillions and millions of Hindu people of all classes and sects. My thanks, also, to some of the\nspeakers on this platform who, referring to the delegates from the Orient, have told you that\nthese men from far-off nations may well claim the honour of bearing to different lands the\nidea of toleration. I am proud to belong to a religion which has taught the world both\ntolerance and universal acceptance. We believe not only in universal toleration, but we accept\nall religions as true. I am proud to belong to a nation which has shelte red the persecuted and\nthe refugees of all relig

In [5]:
cassio.init(token=astra_db_token, database_id=astra_db_id)

In [6]:
llm = ChatGroq(model='llama3-8b-8192', api_key=groq_api_key)
embedding = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')



In [7]:
# session=None and keyspace=None fall back to the connection set up by cassio.init()
astra_vector_store = Cassandra(embedding=embedding, session=None, keyspace=None, table_name='RAG_with_AstrDB')

In [8]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,
    chunk_overlap=100,
    length_function=len,
)

chunks = text_splitter.split_text(raw_text)
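
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified pure-Python approximation of line-based chunking with overlap. It is a sketch of the idea only, not LangChain's implementation: `split_with_overlap` is a hypothetical function, and details such as whitespace stripping are left out.

```python
def split_with_overlap(text, separator="\n", chunk_size=500, chunk_overlap=100):
    """Greedy chunking: fill a chunk up to chunk_size characters, then start the
    next chunk with trailing pieces totalling at most chunk_overlap characters."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until the carried-over tail fits the
            # overlap budget and still leaves room for the incoming piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current = current[1:]
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks

toy_text = "\n".join(["aaaa", "bbbb", "cccc", "dddd"])
overlapping = split_with_overlap(toy_text, chunk_size=10, chunk_overlap=5)
disjoint = split_with_overlap(toy_text, chunk_size=10, chunk_overlap=0)
print(overlapping)
print(disjoint)
```

Because the overlap is carried in whole pieces (lines), a chunk repeats the last line of the previous chunk only when that line fits within `chunk_overlap`. The overlap helps a sentence that straddles a chunk boundary remain retrievable from either side.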

In [9]:
chunks

['SWAMI VIVEKANANDA’S  SPEECH AT\nWORLD PARLIAMENT OF RELIGION, CHICAGO\nRESPONSE TO WELCOME\nSisters and Brothers of America,\n      It fills my heart with joy unspeakable to rise in respons e to the warm and cordial welcome\nwhich you have given us. I thank you in the name of the most ancient order of monks in the\nworld; I thank you in the name of the mother  of religions; and I thank you in the name of\nmillions and millions of Hindu people of all classes and sects. My thanks, also, to some of the',
 'millions and millions of Hindu people of all classes and sects. My thanks, also, to some of the\nspeakers on this platform who, referring to the delegates from the Orient, have told you that\nthese men from far-off nations may well claim the honour of bearing to different lands the\nidea of toleration. I am proud to belong to a religion which has taught the world both\ntolerance and universal acceptance. We believe not only in universal toleration, but we accept',
 'tolerance and univ

In [10]:
inserted = chunks[:50]  # only the first 50 chunks are stored
astra_vector_store.add_texts(inserted)
print(f"Inserted {len(inserted)} chunks")
astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

Inserted 50 chunks


In [11]:
question = "Tell me about 'Buddhism, the fulfillment of Hinduism'."
answer = astra_vector_index.query(question=question, llm=llm)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
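
Conceptually, a RAG query like the one above does three things: retrieve the chunks most similar to the question, place them in a prompt as context, and ask the LLM to answer from that context. The sketch below shows that flow with offline stand-ins. It is not LangChain's code: the prompt wording, `build_prompt`, `answer_question`, `fake_retrieve`, and `fake_generate` are all hypothetical.

```python
def build_prompt(question, context_chunks):
    """Put the retrieved chunks into a single prompt ahead of the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer_question(question, retrieve, generate, k=4):
    """retrieve(question, k) -> list of text chunks; generate(prompt) -> answer text."""
    context_chunks = retrieve(question, k)
    return generate(build_prompt(question, context_chunks))

# Stand-ins so the sketch runs offline; a real pipeline would query the vector
# store and call the LLM here.
def fake_retrieve(question, k):
    corpus = ["Shakya Muni was a Hindu.", "Hinduism cannot live without Buddhism."]
    return corpus[:k]

def fake_generate(prompt):
    return f"(model output for a prompt of {len(prompt)} characters)"

prompt = build_prompt("Who was Shakya Muni?", fake_retrieve("Who was Shakya Muni?", 1))
reply = answer_question("Who was Shakya Muni?", fake_retrieve, fake_generate, k=2)
print(prompt)
print(reply)
```

This also explains why answer quality depends on retrieval: if the relevant chunk is not among the retrieved ones, the model has to answer without it.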


In [12]:
answer

'According to the text, Buddhism is seen as a fulfillment of Hinduism because it brought a reforming zeal, sympathy, and charity that was lacking in Hinduism at the time. Buddhism is described as having a "wonderful leaven" that it brought to the masses, which rendered Indian society great. The text also states that Buddhism brought a sense of equality among people, as a Brahmin and a Shudra could become a monk together and be equal.\n\nThe text also suggests that Hinduism and Buddhism are closely related and that one cannot exist without the other. It is stated that "Hinduism cannot live without Buddhism, nor Buddhism without Hinduism."'

In [13]:
astra_vector_store.similarity_search_with_score(query=question, k=5)

[(Document(page_content='from the nation that eternal God to which every one, man or woman, clings so fondly. And\nthe result was  that Buddh ism had to die a natural death in India. At the present day there is\nnot one who calls oneself a Buddhist in India, the land of its birth.\n      But at the same time, Brahminism lo st something--that reforming zeal, that wonderful'),
  0.8287496301841819),
 (Document(page_content='between Hinduism (by Hinduism, I mean the re ligion of the Vedas) and what is called\nBuddhism at the present day is nearly the same as between J udaism and Christianity. Jesus\nChrist was a Jew, and Shakya Muni was a Hindu. The Jews rejected Jesus Christ, nay,\ncrucified him, and the Hindus have accepted Shakya Muni as God and worship him. But\nthe real difference that we Hindus want to  show between modern Buddhism and what we'),
  0.8274387309308335),
 (Document(page_content='But at the same time, Brahminism lo st something--that reforming zeal, that wonderful\nsym

In [16]:
question = "Tell me about 'Paper on Hinduism, Read at the Parliament on 19th September, 1893'."

answer = astra_vector_index.query(question=question, llm=llm)
print(f"Response from RAG: {answer}\n\n")


for doc, score in astra_vector_store.similarity_search_with_score(query=question, k=5):
    print(f"Score: {score}       Doc: {doc.page_content}")

Response from RAG: The "Paper on Hinduism, Read at the Parliament on 19th September, 1893" is a speech delivered by Swami Vivekananda, an Indian Hindu monk and philosopher, at the World Parliament of Religions in Chicago, Illinois, on September 19, 1893. In this speech, Vivekananda introduced Hinduism to the Western world and presented its philosophy, beliefs, and practices to a global audience.

The speech was a significant event in the history of interfaith dialogue and helped to popularize Hinduism in the Western world. Vivekananda's speech emphasized the importance of tolerance, understanding, and mutual respect among different religions and cultures. He also highlighted the commonalities between Hinduism and other major world religions, such as Buddhism, Christianity, and Islam.

The speech is known for its eloquent language, poetic imagery, and profound insights into the nature of God, the universe, and human existence. Vivekananda's words have inspired millions of people around 