References:
* https://www.linkedin.com/pulse/build-lightning-fast-rag-chatbot-powered-groqs-lpu-ollama-multani-ssloc/
* https://colab.research.google.com/drive/1Obrby8RniFfjUvf3DhbNHC6-CmBdiXbY?usp=sharing


In [2]:
!ollama pull nomic-embed-text


In [3]:
from langchain_groq import ChatGroq
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community import embeddings
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFDirectoryLoader

import os
import time
import textwrap
import gradio as gr



In [7]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

# openai_api_key = os.environ['OPENAI_API_KEY']
# hf_api_key = os.environ['HF_API_KEY']

groq_api_key = os.environ['GROQ_API_KEY']
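
`os.environ['GROQ_API_KEY']` raises a bare `KeyError` when the key is missing from the `.env` file. A minimal sketch of a friendlier lookup is shown here; `require_env` is a hypothetical helper introduced for illustration and is not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        # Hypothetical error text; adjust to taste
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value


# Possible usage in the cell above:
# groq_api_key = require_env("GROQ_API_KEY")
```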

In [4]:
loader = PyPDFDirectoryLoader("data")
the_text = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(the_text)
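
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed-size character windows. It is an assumption-laden illustration only: `chunk_text` is a hypothetical function, and the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, which this sketch ignores.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap repeats more text between neighbouring chunks, which helps keep sentences that straddle a boundary retrievable at the cost of more chunks to embed.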

In [None]:
# For text files
# file_path = r'data/YourTextFileName.txt'
# loader = TextLoader(file_path)
# the_text = loader.load()
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
# chunks = text_splitter.split_documents(the_text)

In [5]:
chunks

[Document(page_content='Peter Schneider\nExtragalactic\nAstronomyand Cosmology\nAn Introduction', metadata={'source': 'data/Extragalactic-Astronomy-and-Cosmology-An-Introduction.pdf', 'page': 1}),
 Document(page_content='Peter Schneider\nExtragalactic Astronomy\nand Cosmology\nAn Introduction\nWith 446 figures, including 266 color figures\n123', metadata={'source': 'data/Extragalactic-Astronomy-and-Cosmology-An-Introduction.pdf', 'page': 3}),
 Document(page_content='Prof. Dr. Peter Schneider\nArgelander-Institut für Astronomie\nUniversität Bonn\nAuf dem Hügel 71D-53121 Bonn, Germany\ne-mail: peter@astro.uni-bonn.de\nLibrary of Congress Control Number: 2006931134\nISBN-10 3-540-33174-3\nSpringer Berlin Heidelberg New York\nISBN-13 978-3-540-33174-2\nSpringer Berlin Heidelberg New York\nCover: The cover shows an HST image of the cluster RXJ 1347 −1145,\nthe most X-ray luminous cluster of galaxies known. The large number\nof gravitationally lensed arcs, of which only two of them have been

In [6]:
vectorstore = Chroma.from_documents(
    documents=chunks,
    collection_name="ollama_embeds",
    embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'),
)
retriever = vectorstore.as_retriever()
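
Conceptually, the retriever embeds the question and returns the chunks whose embeddings are nearest to it. The toy sketch below ranks hand-written 2-D vectors by cosine similarity. The vectors, `cosine_similarity`, and `top_k` are all illustrative assumptions, and nothing here claims to match the distance metric Chroma uses internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" for three chunks; real ones come from nomic-embed-text
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
# prints [0, 2]
```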

In [8]:
llm = ChatGroq(
            groq_api_key=groq_api_key,
            model_name='mixtral-8x7b-32768'
    )

In [9]:
rag_template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
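
The pipe syntax can hide the data flow, so here is a plain-Python sketch of the same four steps with stand-ins for the retriever and the model. `fake_retriever`, `fake_llm`, `build_prompt`, and `run_chain` are hypothetical names. The real retriever returns `Document` objects rather than strings, so joining the text as done here is an assumption about one way to format the context.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
"""


def fake_retriever(question: str) -> list[str]:
    # Stand-in for the vector store retriever: returns canned chunks
    return ["Chunk about galaxy clusters.", "Chunk about the Milky Way."]


def fake_llm(prompt: str) -> str:
    # Stand-in for the chat model: reports the prompt size instead of answering
    return f"(model saw {len(prompt)} characters)"


def build_prompt(question: str) -> str:
    # Step 1: the dict sends the same input to the retriever and, unchanged, to "question"
    inputs = {"context": fake_retriever(question), "question": question}
    # Step 2: fill the template with the retrieved text and the question
    return TEMPLATE.format(context="\n\n".join(inputs["context"]), question=inputs["question"])


def run_chain(question: str) -> str:
    # Steps 3 and 4: call the model and return its reply as a plain string
    return fake_llm(build_prompt(question))


print(build_prompt("What is this document about"))
```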

In [10]:
# Test the pipeline with a simple hard-coded question
response = rag_chain.invoke("What is this document about")
print(textwrap.fill(response, width=80))

The provided document appears to be a collection of pages from a book or
document titled "Extragalactic Astronomy and Cosmology: An Introduction". The
document includes various sections that cover different topics in astronomy and
cosmology, such as:  * Clusters and groups of galaxies, including the lens
effect of galaxy clusters and the distribution of mass in these clusters. *
X-ray radiation from clusters of galaxies, including the distribution of
intergalactic gas and the presence of giant arcs caused by the gravitational
lens effect. * The Milky Way as a galaxy, including the presence of a
supermassive black hole at the center and the distribution of stars and hot gas
within the galaxy. * An overview of the structure of the Milky Way, including
the disk, central bulge, and spherical halo where most globular clusters are
located.  The document also includes figures and captions that provide visual
aids to help illustrate these concepts.


In [11]:
# Make the questions dynamic using a chat interface. Let's use gradio for this.
def process_question(user_question):
    start_time = time.time()

    # Directly using the user's question as input for rag_chain.invoke
    response = rag_chain.invoke(user_question)

    # Measure the response time
    end_time = time.time()
    response_time = f"Response time: {end_time - start_time:.2f} seconds."

    # Combine the response and the response time into a single string
    full_response = f"{response}\n\n{response_time}"

    return full_response

# Setup the Gradio interface
iface = gr.Interface(fn=process_question,
                     inputs=gr.Textbox(lines=2, placeholder="Type your question here..."),
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document, and get an answer along with the response time.")

# Launch the interface
iface.launch()

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
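
The timing logic in `process_question` can be pulled out so that any question-answering function reports its latency. This is a minimal sketch: `with_timing` is a hypothetical decorator, and it uses `time.perf_counter`, which is intended for measuring short intervals, in place of `time.time`.

```python
import time
from typing import Callable


def with_timing(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a question-answering function so its reply ends with the elapsed time."""
    def wrapper(question: str) -> str:
        start = time.perf_counter()
        answer = fn(question)
        elapsed = time.perf_counter() - start
        return f"{answer}\n\nResponse time: {elapsed:.2f} seconds."
    return wrapper


# Stand-in for rag_chain.invoke so the sketch runs without a model
timed_echo = with_timing(lambda q: q.upper())
print(timed_echo("what is this document about"))
```

With the notebook's chain, the wrapped function could be passed to Gradio as `fn=with_timing(rag_chain.invoke)`.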




### Full code below, upgraded with PDF upload and model selection

In [16]:
from langchain_groq import ChatGroq
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community import embeddings
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFDirectoryLoader

import time
import textwrap
import gradio as gr
from gradio import File, Dropdown

import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

# openai_api_key = os.environ['OPENAI_API_KEY']
# hf_api_key = os.environ['HF_API_KEY']

groq_api_key = os.environ['GROQ_API_KEY']


# Define the RAG components
def setup_rag_interface(file, model_name, question):
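    # Load PDFs from the upload. PyPDFDirectoryLoader expects a directory,
    # so pass the folder that holds the uploaded file rather than the file itself.
    pdf_path = getattr(file, "name", file)
    loader = PyPDFDirectoryLoader(os.path.dirname(pdf_path))
    the_text = loader.load()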
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = text_splitter.split_documents(the_text)

    # Setup the vector store and retriever
    vectorstore = Chroma.from_documents(
        documents=chunks,
        collection_name="ollama_embeds",
        embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()

    # Select the appropriate language model
    llm_models = {
        "Mixtral 8x7b 32768": 'mixtral-8x7b-32768',
        "Llama2 70b 4096": 'llama2-70b-4096',
        "Gemma 7b it": 'gemma-7b-it'
    }
    llm_model = llm_models[model_name]
    llm = ChatGroq(
        groq_api_key=groq_api_key,
        model_name=llm_model
    )

    # Define the RAG template and chain
    rag_template = """Answer the question based only on the following context:
    {context}
    Question: {question}
    """
    rag_prompt = ChatPromptTemplate.from_template(rag_template)
    rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | rag_prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain.invoke(question)

# Define the Gradio interface function
def gr_interface(file, model_name, question):
    response = setup_rag_interface(file, model_name, question)
    return response

# Setup the Gradio interface
iface = gr.Interface(fn=gr_interface,
                     inputs=[gr.UploadButton("📁 Upload a PDF", file_types=[".pdf"]), 
                             Dropdown(["Mixtral 8x7b 32768", "Llama2 70b 4096", "Gemma 7b it"], label="Select Language Model"),
                             gr.Textbox(lines=2, placeholder="Type your question here...")],
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document.")

# Launch the interface
iface.launch()


Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.




Traceback (most recent call last):
  File "/Users/panchamb/miniforge3/envs/env-rag/lib/python3.10/site-packages/gradio/queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
  File "/Users/panchamb/miniforge3/envs/env-rag/lib/python3.10/site-packages/gradio/route_utils.py", line 253, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/panchamb/miniforge3/envs/env-rag/lib/python3.10/site-packages/gradio/blocks.py", line 1695, in process_api
    result = await self.call_function(
  File "/Users/panchamb/miniforge3/envs/env-rag/lib/python3.10/site-packages/gradio/blocks.py", line 1235, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/panchamb/miniforge3/envs/env-rag/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/Users/panchamb/miniforge3/envs/env-rag/lib/python3.10/site-packages/anyio/_backends

In [18]:
!ollama pull nomic-embed-text


In [17]:
import gradio as gr
import os
import time
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community import embeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define the environment variables
groq_api_key = os.environ['GROQ_API_KEY']

# Define the RAG components
def setup_rag_interface(model_name, question):
    # Load PDF documents from the 'data' directory
    loader = PyPDFDirectoryLoader("data")
    the_text = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = text_splitter.split_documents(the_text)

    # Setup the vector store and retriever
    vectorstore = Chroma.from_documents(
        documents=chunks,
        collection_name="ollama_embeds",
        embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()

    # Select the appropriate language model
    llm_models = {
        "Mixtral 8x7b 32768": 'mixtral-8x7b-32768',
        "Llama2 70b 4096": 'llama2-70b-4096',
        "Gemma 7b it": 'gemma-7b-it'
    }
    llm_model = llm_models[model_name]
    llm = ChatGroq(
        groq_api_key=groq_api_key,
        model_name=llm_model
    )

    # Define the RAG template and chain
    rag_template = """Answer the question based only on the following context:
    {context}
    Question: {question}
    """
    rag_prompt = ChatPromptTemplate.from_template(rag_template)
    rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | rag_prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain.invoke(question)

# Define the Gradio interface function
def gr_interface(model_name, question):
    response = setup_rag_interface(model_name, question)
    return response

# Setup the Gradio interface
iface = gr.Interface(fn=gr_interface,
                     inputs=[gr.Dropdown(["Mixtral 8x7b 32768", "Llama2 70b 4096", "Gemma 7b it"], label="Select Language Model"),
                             gr.Textbox(lines=2, placeholder="Type your question here...")],
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document.")

# Launch the interface
iface.launch()


Running on local URL:  http://127.0.0.1:7863

To create a public link, set `share=True` in `launch()`.




In [None]:
import gradio as gr
import os
import time
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community import embeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define the environment variables
groq_api_key = os.environ['GROQ_API_KEY']

# Define the RAG components
def setup_interface(model_name, question):
    # Load PDF documents from the 'data' directory
    loader = PyPDFDirectoryLoader("data")
    the_text = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
    chunks = text_splitter.split_documents(the_text)

    # Setup the vector store and retriever
    vectorstore = Chroma.from_documents(
        documents=chunks,
        collection_name="ollama_embeds",
        embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()

    # Select the appropriate language model
    llm_models = {
        "Mixtral 8x7b 32768": 'mixtral-8x7b-32768',
        "Llama2 70b 4096": 'llama2-70b-4096',
        "Gemma 7b it": 'gemma-7b-it'
    }
    llm_model = llm_models[model_name]
    llm = ChatGroq(
        groq_api_key=groq_api_key,
        model_name=llm_model
    )

    # Define the RAG template and chain
    template = """Answer the question based only on the following context:
    {context}
    Question: {question}
    """
    prompt = ChatPromptTemplate.from_template(template)
    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(question)

# Define the Gradio interface function
def gr_interface(model_name, question):
    response = setup_interface(model_name, question)
    return response

# Setup the Gradio interface
iface = gr.Interface(fn=gr_interface,
                     inputs=[gr.Dropdown(["Mixtral 8x7b 32768", "Llama2 70b 4096", "Gemma 7b it"], label="Select Language Model"),
                             gr.Textbox(lines=2, placeholder="Type your question here...")],
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document.")

# Launch the interface
iface.launch()


### Why is this so slow? Try removing the model dropdown
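
A likely contributor to the slowness is that the setup function reloads the PDFs, re-splits them, and re-embeds every chunk on each question, while the dropdown itself only costs a dictionary lookup. The sketch below shows one way to build the expensive part once and reuse it. `get_index` and `answer` are hypothetical stand-ins (no PDFs or embeddings are involved), and `functools.lru_cache` is just one of several ways to cache the result.

```python
from functools import lru_cache

build_count = 0  # counts how many times the expensive step actually runs


@lru_cache(maxsize=1)
def get_index(data_dir: str) -> dict:
    """Stand-in for loading, splitting, and embedding the PDFs in data_dir."""
    global build_count
    build_count += 1
    return {"source": data_dir, "chunks": ["chunk 1", "chunk 2"]}


def answer(question: str) -> str:
    index = get_index("data")  # built on the first call, reused afterwards
    return f"{len(index['chunks'])} chunks available for: {question}"


answer("first question")
answer("second question")
print(build_count)
# prints 1
```

In the notebook, the same effect comes from creating `vectorstore` and `retriever` once at module level, as the first version did, and leaving only the chain call inside the Gradio function.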

In [28]:
import gradio as gr
import os
import time
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community import embeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define the environment variables
groq_api_key = os.environ['GROQ_API_KEY']

# Define the RAG components
def setup_interface(model_name, question):
    # Load PDF documents from the 'data' directory
    loader = PyPDFDirectoryLoader("data")
    the_text = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
    chunks = text_splitter.split_documents(the_text)

    # Setup the vector store and retriever
    vectorstore = Chroma.from_documents(
        documents=chunks,
        collection_name="ollama_embeds",
        embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()

    # # Select the appropriate language model
    # llm_models = {
    #     "Mixtral 8x7b 32768": 'mixtral-8x7b-32768',
    #     "Llama2 70b 4096": 'llama2-70b-4096',
    #     "Gemma 7b it": 'gemma-7b-it'
    # }
    # llm_model = llm_models[model_name]
    llm = ChatGroq(
        groq_api_key=groq_api_key,
        model_name='llama2-70b-4096'
    )

    # Define the RAG template and chain
    template = """Answer the question based only on the following context:
    {context}
    Question: {question}
    """
    prompt = ChatPromptTemplate.from_template(template)
    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(question)

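# Define the Gradio interface function
def gr_interface(question):
    # The dropdown is gone, so model_name is unused; pass None as a placeholder
    response = setup_interface(None, question)
    return response

# # Get absolute path of image file
# image_path = 'cmb.png' # Replace with your image file path
# absolute_path = os.path.abspath(image_path)

# Note: this URL opens Wikipedia's media viewer page, not the image file itself;
# url() needs a direct link to an image for the background to render.
css_code='body{background-image:url("https://en.wikipedia.org/wiki/Cosmic_microwave_background#/media/File:Ilc_9yr_moll4096.png");}'

# Setup the Gradio interface
# Use gr_interface from this cell, not process_question left over from an earlier cell
iface = gr.Interface(fn=gr_interface,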
                     inputs=gr.Textbox(label='User Question', lines=2, placeholder="Type your question here... "),
                     outputs=gr.Textbox(label='LLM Response'),
                     title="A Chat about the Oldest Light in the Universe",
                     css=css_code,
                     description="Ask any question about The Cosmic Microwave Background Radiation. It's niche but it's magical! ",
                    )

# Launch the interface
iface.launch()


Running on local URL:  http://127.0.0.1:7871

To create a public link, set `share=True` in `launch()`.


