# RAG and Function-Calling

This notebook has been prepared by [Enigma - The AI/ML Club, IIIT Kottayam](https://enigma.iiitkottayam.ac.in/).

If you like or reproduce our work, do consider following our socials:

- [Instagram](https://www.instagram.com/enigma_iiitk?igsh=NTc4MTIwNjQ2YQ)
- [Linkedin](https://www.linkedin.com/company/enigma-iiitkottayam/)


In [1]:
%%capture
!pip install -qU langchain-google-genai langchain-chroma langchain-community pypdf

In [2]:
!mkdir -p ./Documents

In [3]:
from langchain_chroma import Chroma  # single Chroma import; a second one would shadow this class
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
import time
import os
import shutil
from google.colab import files

## Enter your API Key
Create one at https://aistudio.google.com/app/apikey

In [1]:
DOCUMENT_PATH = './Documents/'
CHROMA_PATH = './chroma/'
API_KEY = 'YOUR_API_KEY'  # paste your own key here; never publish a real key in a shared notebook

In [5]:
llm = ChatGoogleGenerativeAI(
                api_key=API_KEY,
                model="gemini-1.5-pro",
                temperature=0.7,
                max_output_tokens=1200,
      )

In [7]:
llm.invoke("What is enigma").content

'"Enigma" can refer to a few things, but you\'re most likely thinking of one of these:\n\n**1. The Enigma Machine:**\n\n* This is the most common meaning of "enigma." It was a **cipher device** used by Nazi Germany during World War II to protect military communication. \n* The Enigma machine looked like a typewriter and used a complex system of rotors and electrical circuits to encrypt and decrypt messages.\n* Breaking the Enigma code was a crucial accomplishment for the Allied forces and is considered to have significantly shortened the war.\n\n**2.  Something Mysterious:**\n\n* "Enigma" can also refer to anything that is **puzzling, mysterious, or difficult to understand**. For example, you might say, "The origins of the universe are an enigma to scientists."\n\n**3. Other Uses:**\n\n*  **Enigma Variations:** This is the name of a famous piece of music by Edward Elgar.\n* **Various bands and artists:** Several bands and musical artists have adopted the name "Enigma."\n\n**To figure o

## RAG

### Helper functions

In [8]:
def upload_files(upload_path):
  '''
  Upload PDF documents to the specified directory.

  Returns:
  List of uploaded file names.
  '''
  print("Enter the PDF document:")
  uploaded = files.upload()
  for filename, content in uploaded.items():
    dst_path = os.path.join(upload_path, filename)
    shutil.move(filename, dst_path)
  return list(uploaded.keys())

In [9]:
def load_documents():
  '''
  Load PDF documents from the specified directory using PyPDFDirectoryLoader.

  Returns:
  List of Document objects.
  '''
  documents = PyPDFDirectoryLoader(DOCUMENT_PATH).load()
  return documents


In [11]:
def split_texts(documents):
  '''
  Split the loaded documents into smaller chunks using RecursiveCharacterTextSplitter.

  Returns:
  List of Document objects.
  '''
  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=400,
      chunk_overlap=20,
      length_function=len,
  )
  chunks = text_splitter.split_documents(documents)
  return chunks
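
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `chunk_with_overlap` is a hypothetical helper written only for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in the neighbouring chunk, which can help retrieval.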

In [12]:
def adding_to_chroma(chunks):
  '''
  Converting the split chunks into Embeddings and storing them in a Chroma vector database.

  Returns:
  Chroma vector database Object.
  '''
  time.sleep(1)
  embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001",google_api_key=API_KEY)
  db = Chroma.from_documents(chunks, embeddings)
  return db
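
The idea behind storing embeddings can be sketched without any vector database: each chunk becomes a vector, and a query retrieves the chunks whose vectors are most similar to its own. The sketch below uses hand-written 3-dimensional toy vectors and cosine similarity. Both are assumptions for illustration; the real embeddings come from the embedding model, and the distance metric Chroma uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "vector store": chunk text mapped to a made-up embedding.
toy_store = {
    "penguins live in Antarctica": [1.0, 0.0, 0.0],
    "the club hosts contests": [0.0, 1.0, 0.0],
    "penguins huddle in the cold": [0.9, 0.1, 0.0],
}


def top_k(query_vector, store, k):
    """Return the k (text, score) pairs with the highest similarity, best first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


hits = top_k([1.0, 0.0, 0.0], toy_store, k=2)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```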

In [13]:
PROMPT = '''
You are an academics helper. You are a personalized assistant. Answer the questions based only on the given context below:

CONTEXT:
{context}

QUESTION:
{query}
'''
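
To see what the model eventually receives, the placeholders can be filled by hand with plain `str.format`. This is only a rough stand-in for what `PromptTemplate` does with an f-string style template; the context sentence is taken from the uploaded PDF shown later in this notebook.

```python
PROMPT = '''
You are an academics helper. You are a personalized assistant. Answer the questions based only on the given context below:

CONTEXT:
{context}

QUESTION:
{query}
'''

# Fill both placeholders exactly as named in the template.
filled = PROMPT.format(
    context="The club has 2 mascots, a cyborg-penguin parent child pair named Turing and Pebbles.",
    query="Who are Enigma's mascots?",
)
print(filled)
```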

In [14]:
def get_rag_chain():
  '''
  Creating a prompt template and a RAG chain object.

  Returns:
  rag chain object.
  '''

  prompt_template = PromptTemplate(template=PROMPT, input_variables=["context", "query"])

  rag_chain = prompt_template | llm  | StrOutputParser()

  return rag_chain
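
The `|` operator chains steps so that the output of one becomes the input of the next. The toy class below mimics that behaviour with plain functions. It is a sketch for intuition only, not LangChain's actual `Runnable` implementation; `Step` and the three stand-in steps are hypothetical names.

```python
class Step:
    """Wraps a function and lets steps be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Composing two steps yields a new step that runs them in order.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for prompt template, model, and output parser.
format_prompt = Step(lambda inputs: f"Q: {inputs['query']}")
fake_llm = Step(lambda prompt: {"content": prompt.upper()})
parse_output = Step(lambda message: message["content"])

toy_chain = format_prompt | fake_llm | parse_output
toy_answer = toy_chain.invoke({"query": "hi"})
print(toy_answer)  # Q: HI
```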


In [15]:
def get_vector_db():
  '''
  Creating a vector database object.

  Returns:
  Vector database object.
  '''
  upload_files(DOCUMENT_PATH)
  documents = load_documents()
  chunks = split_texts(documents)
  db = adding_to_chroma(chunks)
  return db

In [16]:
def query_rag_chain(query, db, rag_chain):
  '''
  Querying the RAG chain with the given query.

  Returns:
  Response from the RAG chain.
  '''
  results = db.similarity_search_with_relevance_scores(query, k=3)

  if len(results) == 0 or results[0][1] < 0.3:
    # Return the same (response, results) shape as the success path so callers can unpack it.
    return "No documents found.", []

  context_text = "\n\n - -\n\n".join([doc.page_content for doc, _score in results])
  response = rag_chain.invoke({"context": context_text, "query": query})
  return response, results
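
The relevance check and context assembly can be tried in isolation. The sketch below uses plain `(text, score)` tuples in place of `(Document, score)` pairs and assumes, as the cell above does, that results arrive sorted best first, so only the top score needs checking. `build_context` is a hypothetical helper, and the scores are made up.

```python
RELEVANCE_THRESHOLD = 0.3
SEPARATOR = "\n\n - -\n\n"


def build_context(results, threshold=RELEVANCE_THRESHOLD):
    """Join chunk texts into one context string, or return None if nothing is relevant enough."""
    if not results or results[0][1] < threshold:
        return None
    return SEPARATOR.join(text for text, _score in results)


good_results = [("chunk A", 0.52), ("chunk B", 0.42)]  # illustrative scores
weak_results = [("chunk C", 0.12)]

print(repr(build_context(good_results)))
print(build_context(weak_results))
```

Returning `None` for the "nothing relevant" case keeps the caller from sending an unrelated context to the model.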

### Upload your PDF

In [17]:
db = get_vector_db()

Enter the PDF document:


Saving Antarctica Log.pdf to Antarctica Log.pdf


In [18]:
rag_chain = get_rag_chain()

In [19]:
def ask(query):
  response, context = query_rag_chain(query, db, rag_chain)
  return response,context

### Ask your queries

In [20]:
answer, context = ask("who are engima's mascots")

In [21]:
context

[(Document(metadata={'page': 16, 'source': 'Documents/Antarctica Log.pdf'}, page_content='Enigma\nis\nthe\nAI/ML\nclub\nof\nIIIT\nKottayam.\nInaugurated\non\nthe\n15th\nFebruary\n2024,\nthe\nclub\nhas\nsince\nthen\nhosted\na\nseries\nof\nevents,\ncontests\nand\nhas\nfostered\na\ncommunity\nof\nover\n100+\nmembers.\nThe\nclub\nhas\n2\nmascots,\na\ncyborg-penguin\nparent\nchild\npair\nnamed\nTuring\nand\nPebbles.\nThis\ndocument\ncontains\nall\ntheir\nmischief\nwhile\nthey\nconduct\nstudies\nin\nthe\ncold,\ninhabitable\nAntarctica.\nMeet\nthe\nteam'),
  0.517214298775531),
 (Document(metadata={'page': 16, 'source': 'Documents/Antarctica Log.pdf'}, page_content='Amit\nAnand,\nVinayak\nSharma,\nVipin\nKarthic\nWant\nto\nbe\npart\nof\nthe\ncommunity\n?\nYou\ncan\njoin\nus\non\nthese\nlinks: \nLinkedin:\nhttps://www.linkedin.com/company/enigma-iiitkottayam/?viewAsMember=true \nDiscord:\nhttps://discord.com/invite/crVwcpY34q \nInstagram:\nhttps://www.instagram.com/enigma_iiitk/'),
  0.4244417

In [22]:
answer

"Enigma's mascots are a cyborg-penguin parent-child pair named Turing and Pebbles. \n"

## Function Calling with LLMs

In [23]:
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
import requests

### Tools

In [24]:
@tool
def get_weather(city: str) -> str:
  """
  Gets the current weather for the given city.
  """
  try:
    url = f"https://wttr.in/{city}?format=j1"  # Using wttr.in API
    response = requests.get(url, timeout=10)  # time out instead of hanging if the service is slow
    response.raise_for_status()  # Raise an exception for bad status codes
    data = response.json()

    current_condition = data["current_condition"][0]
    temperature = current_condition["temp_C"]
    weather_desc = current_condition["weatherDesc"][0]["value"]

    return f"The current weather in {city} is {weather_desc} with a temperature of {temperature}°C."

  except (requests.exceptions.RequestException, KeyError, IndexError) as e:  # network errors or an unexpected payload
    return f"Error fetching weather data: {e}"

@tool
def get_news(category: str) -> str:
    """
    Fetches news articles related to a given category
    """
    url = f"https://saurav.tech/NewsAPI/top-headlines/category/{category.lower()}/in.json"
    response = requests.get(url, timeout=10)  # time out instead of hanging if the service is slow

    if response.status_code == 200:
        response = response.json()
        # Keep at most three articles; tolerate a missing or null 'articles' field.
        articles = (response.get('articles') or [])[:3]
        formatted_news = "".join(
            f"{i+1}. {article.get('title')}\n"
            f"{article.get('description')}\n"
            f"Link: {article.get('url')}\n"
            f"Published: {(article.get('publishedAt') or '')[:10]}\n\n\n"
            for i, article in enumerate(articles)
        )
        return formatted_news
    else:
        return "We support only the following categories: {business / entertainment / general / health / science / sports / technology}"

@tool
def get_enigma_info(query: str) -> str:
  """
  Retrieves information about Enigma, the AI/ML club, and its mascots' stories
  """
  answer, context = ask(query)
  return answer
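
The parsing step inside `get_weather` can be exercised offline. The sketch below works on a hand-written dictionary that contains only the fields the tool reads (`current_condition`, `temp_C`, `weatherDesc`); the real wttr.in response holds many more fields, and the sample values are illustrative. `describe_weather` is a hypothetical helper that also guards against an unexpected payload.

```python
def describe_weather(city: str, data: dict) -> str:
    """Build the weather sentence from a wttr.in-style payload, tolerating missing fields."""
    try:
        current = data["current_condition"][0]
        temperature = current["temp_C"]
        description = current["weatherDesc"][0]["value"]
    except (KeyError, IndexError, TypeError):
        return f"Unexpected weather payload for {city}."
    return f"The current weather in {city} is {description} with a temperature of {temperature}°C."


# Hand-written sample limited to the fields used above.
sample_payload = {
    "current_condition": [
        {"temp_C": "30", "weatherDesc": [{"value": "Haze"}]}
    ]
}

print(describe_weather("kochi", sample_payload))
print(describe_weather("kochi", {}))
```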

In [25]:
TOOLS = [get_weather, get_news, get_enigma_info]
TOOL_MAPPINGS = {
    "get_weather": get_weather,
    "get_news": get_news,
    'get_enigma_info': get_enigma_info,
}

llm_with_tools = llm.bind_tools(TOOLS)



In [26]:
def ask_more(query):
  messages = [HumanMessage(query)]
  ai_msg = llm_with_tools.invoke(messages)
  messages.append(ai_msg)

  if len(ai_msg.tool_calls)==0:
    return ai_msg.content

  for tool_call in ai_msg.tool_calls:
    selected_tool = TOOL_MAPPINGS[tool_call["name"].lower()]
    tool_msg = selected_tool.invoke(tool_call)
    messages.append(tool_msg)

  final_response = llm_with_tools.invoke(messages)
  return final_response.content
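
The dispatch step in `ask_more` (look up each requested tool by name, run it, collect the result) can be sketched with stub functions and no model at all. The dictionaries below follow the `name` / `args` / `id` shape of the tool calls used above; the stub tools, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names, and the unknown-tool handling is an addition that the cell above does not have.

```python
def fake_weather(city: str) -> str:
    return f"(stub) weather report for {city}"


def fake_news(category: str) -> str:
    return f"(stub) headlines for {category}"


TOOL_REGISTRY = {"get_weather": fake_weather, "get_news": fake_news}


def run_tool_calls(tool_calls, registry):
    """Execute each requested call and return one result message per call."""
    outputs = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            # Report unknown tools back instead of raising a KeyError.
            content = f"Unknown tool: {call['name']}"
        else:
            content = func(**call["args"])
        outputs.append({"tool_call_id": call["id"], "content": content})
    return outputs


requested = [
    {"name": "get_weather", "args": {"city": "kochi"}, "id": "call_1"},
    {"name": "get_stock", "args": {"ticker": "X"}, "id": "call_2"},
]
tool_results = run_tool_calls(requested, TOOL_REGISTRY)
for message in tool_results:
    print(message)
```

In the real loop, these result messages are appended to the conversation so the model can write its final answer from them.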

### Ask your queries

In [30]:
query = "what is the weather in kochi right now?"

In [31]:
%%capture
answer = ask_more(query)

In [32]:
print(answer)

The current weather in kochi is Haze with a temperature of 30°C. 

