<a href="https://colab.research.google.com/github/rjhalliday/python-llm/blob/main/langchain_simple_rag_with_gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is an example of simple Retrieval-Augmented Generation (RAG) using LangChain and Google Gemini.

# Initialisation

In [None]:
!pip install -qU langchain-google-genai


In [None]:
!pip install -U langchain-community



In [None]:
!pip install chromadb



In [None]:
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Loaders, embeddings and vector stores live in langchain-community (installed above)
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import GooglePalmEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_google_genai import ChatGoogleGenerativeAI
from google.colab import userdata


import os

# Read the API key from Colab secrets and expose it to the Google client libraries
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY



**ChatGoogleGenerativeAI**

This initializes a chat interface to the Google Gemini 1.5 Pro large language model (LLM).

* temperature=0: Makes the model's responses as deterministic as possible by minimising sampling randomness.
* max_tokens=None: No explicit cap is set on the number of output tokens (word pieces and symbols), so the model's default limit applies.
* timeout=None: No client-side time limit is set for the model's response.
* max_retries=2: A failed request is retried up to two times.

In [None]:
# Initialize the language model using ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)
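
To illustrate why a lower temperature reduces randomness, here is a minimal sketch of the textbook temperature-scaled softmax. The `softmax_with_temperature` helper and the logit values are hypothetical and say nothing about how Gemini is implemented internally; the point is only that dividing scores by a smaller temperature concentrates probability on the top candidate. A temperature of exactly 0 cannot be plugged into this formula and is conventionally treated as always picking the highest-scoring token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (not real model output)
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature shrinks, the first candidate's probability approaches 1, which is why sampling becomes close to deterministic.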

**Embeddings Initialization**

GooglePalmEmbeddings: Initializes a Google PaLM embeddings model, which converts text into numerical vectors. Retrieval works by comparing these vectors to find the chunks most similar to a query.

In [None]:
# Initialize the embeddings model used to vectorise the text chunks
embeddings = GooglePalmEmbeddings()
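
To make the idea of text as vectors concrete, the following sketch compares hand-made 3-dimensional vectors using cosine similarity. The vectors and the `cosine_similarity` helper are illustrative assumptions, not output from any real embedding model; real embeddings have hundreds of dimensions, but the comparison works the same way.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; not produced by any real model
toy_vectors = {
    "electric car": [0.9, 0.1, 0.0],
    "plug-in vehicle": [0.8, 0.2, 0.1],
    "banana bread": [0.0, 0.1, 0.9],
}

query_vec = toy_vectors["electric car"]
for text, vec in toy_vectors.items():
    print(f"{text!r}: {cosine_similarity(query_vec, vec):.3f}")
```

Texts with related meanings are expected to score close to 1, while unrelated texts score close to 0.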

# Retrieval

**Document Loading**

* WebBaseLoader: Loads the content from the specified URL, which in this case is the Wikipedia page for "Plug-in electric vehicle".
* loader.load(): Downloads the content from the webpage and converts it into a document format that can be processed further.

In [None]:
# At the time of writing, this is the longest article on Wikipedia
loader = WebBaseLoader("https://en.wikipedia.org/wiki/Plug-in_electric_vehicle")
documents = loader.load()

**Text Splitting**
* RecursiveCharacterTextSplitter: Splits the loaded document into smaller chunks of text.
* chunk_size=500: Each chunk will have up to 500 characters.
* chunk_overlap=0: There is no overlap between the chunks.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
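
To show what chunk_size and chunk_overlap mean, here is a simplified fixed-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that class tries to break on separators such as paragraphs and sentences first); the `split_fixed` helper is a hypothetical stand-in that only demonstrates the two parameters.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"

# No overlap: each character appears in exactly one chunk
print(split_fixed(sample, chunk_size=10))

# Overlap of 3: each chunk repeats the last 3 characters of the previous one
print(split_fixed(sample, chunk_size=10, chunk_overlap=3))
```

A non-zero overlap can help when a relevant sentence would otherwise be cut in half at a chunk boundary, at the cost of storing some text twice.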

**Creating a Vector Store**

Chroma is the vector store that holds the text chunks and their corresponding embeddings.
* from_documents: Converts the chunks of text into vectors using the previously defined embeddings, and stores them in the vector store (db).

In [None]:
db = Chroma.from_documents(texts, embeddings)

**Retriever setup**

db.as_retriever() wraps the vector store in a retriever object, which returns the chunks of text most relevant to a query.

In [None]:
retriever = db.as_retriever()
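
Conceptually, a vector store plus a retriever amounts to "embed the query, score every stored chunk, return the top k". The sketch below imitates that with a toy bag-of-words `embed` function in place of a real embedding model. The names `embed`, `cosine` and `retrieve` and the sample chunks are all hypothetical, and Chroma uses an approximate nearest-neighbour index rather than this brute-force scan.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical chunks, invented for illustration
chunks = [
    "the tesla model 3 is a plug-in electric car",
    "bananas are rich in potassium",
    "plug-in hybrid sales grew in europe",
]

print(retrieve("best selling plug-in car", chunks, k=2))
```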


**QA Chain Setup**

* RetrievalQA: Combines the retrieval and LLM components to create a question-answering (QA) chain.
* from_chain_type: Specifies the type of chain to use. The "stuff" chain type inserts ("stuffs") all retrieved chunks into a single prompt, which is sent to the LLM in one call.
* The retriever fetches relevant chunks of text, and the llm processes them to generate an answer.

In [None]:
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
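
The "stuff" strategy can be sketched in a few lines: join every retrieved passage into one context block and place the question after it. The `build_stuff_prompt` helper, the prompt wording and the sample passages are assumptions made for illustration; they are not the template that LangChain actually uses.

```python
def build_stuff_prompt(question, passages):
    """Concatenate all passages into one context block ahead of the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved passages, invented for illustration
passages = [
    "Passage one about plug-in electric vehicles.",
    "Passage two about sales by country.",
]

prompt = build_stuff_prompt("Which country sold the most plug-in cars?", passages)
print(prompt)
```

Because every passage goes into a single prompt, this approach is simple but only works while the combined text fits within the model's context window.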

# Augmentation

There is no separate augmentation step in this notebook: the "stuff" chain performs it implicitly by adding the retrieved chunks to the prompt. More generally, augmentation means integrating the retrieved information into the prompt to give the LLM additional context or details. For instance, the augmentation step might involve:
* Combining information, where the system synthesises the key points from multiple passages to provide a holistic view. For example, it might recognise that one passage emphasises punitive measures while another highlights the diplomatic context.
* Refining context, where the system structures the information so that the response is coherent and addresses the query comprehensively.

# Generation

**Execute a Query**

* A query is defined asking for the best-selling car in the US.
* The qa_chain is executed with the query: it retrieves relevant chunks from the vector store and generates an answer using the LLM.
* The answer, stored under the "result" key, is printed.

In [None]:
query = "What is the best selling car in the US?"
# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": query})
print(result["result"])

The provided text states that at the end of 2019, the all-time top-selling plug-in car in the U.S. was the Tesla Model 3 with 300,471 units sold. However, it does not specify the best-selling car overall in the U.S. 

