
Imports

In [1]:
# Install LangChain, graph, and data-loading libraries
# (pandas and openpyxl are needed for pd.read_excel; langchain provides langchain.chains)
!pip install --upgrade --quiet json-repair networkx pandas openpyxl langchain langchain-core langchain-google-vertexai langchain-experimental langchain-community langchain-google-genai

# Install additional libraries for scraping and plotting
!pip install --upgrade --quiet requests beautifulsoup4 matplotlib ipywidgets gravis



In [2]:
# Standard library
import json
import os

# Third-party utilities
import ipywidgets as widgets
import networkx as nx
import pandas as pd
from IPython.display import display, clear_output

# LangChain components
from langchain.chains import GraphQAChain, LLMChain
from langchain.indexes import GraphIndexCreator
from langchain_community.graphs.networkx_graph import NetworkxEntityGraph
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableSequence
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_google_genai import ChatGoogleGenerativeAI

LLM Model Pipeline

In [3]:
# Set up the Google API Key
google_api_key = "your_key"  # Replace with your actual API key
os.environ["GOOGLE_API_KEY"] = google_api_key

# Initialize the ChatGoogleGenerativeAI model
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.2,
    verbose=True
)
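
Hard-coding the key in a cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key may already be exported as `GOOGLE_API_KEY` and falling back to an interactive prompt otherwise. The helper name `resolve_api_key` is hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass, name="GOOGLE_API_KEY"):
    """Return the key from `env` if present; otherwise ask for it and store it."""
    key = env.get(name, "").strip()
    if not key:
        # Hypothetical fallback: prompt without echoing the key to the screen
        key = prompt(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        env[name] = key
    return key

# Call resolve_api_key() once, before constructing ChatGoogleGenerativeAI.
```

The `env` and `prompt` parameters exist only so the helper can be exercised without touching the real environment.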

In [4]:
# Test the model
output = llm.invoke("Write a brief introduction about Llama models.")
print(output.content)

Llama models are a family of large language models (LLMs) developed by Meta AI.  They are known for their impressive capabilities in natural language processing tasks, including text generation, translation, and question answering.  Unlike some other LLMs that are proprietary and closed-source, Llama models, particularly the open-source versions, have fostered significant research and development within the AI community, enabling broader access and experimentation.  Their open nature has contributed to advancements in understanding and improving LLMs, while also raising important discussions about responsible AI development and deployment.


In [5]:
import time

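def safe_chat_with_retry(chain, max_retries=10, **kwargs):
    """Invoke `chain` with `kwargs`, waiting and retrying on quota (429) errors.

    `max_retries` is a safeguard so a permanently exhausted quota cannot loop forever.
    """
    for attempt in range(1, max_retries + 1):
        try:
            # Invoke the LangChain runnable with the prompt variables
            return chain.invoke(kwargs)
        except Exception as e:
            # Only quota or rate-limit errors are retried; anything else propagates
            is_quota_error = "ResourceExhausted" in str(e) or "429" in str(e)
            if not is_quota_error or attempt == max_retries:
                raise
            print("Quota exceeded. Retrying in 1 minute and 5 seconds...")
            time.sleep(65)  # Fixed delay of 1 minute and 5 seconds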
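
The fixed 65-second wait is simple but can be slow when the quota recovers quickly. As a hedged alternative, here is a sketch of exponential backoff with a retry cap. The names `invoke_with_backoff` and `FlakyChain` are hypothetical, and the injectable `sleep` parameter is there only so the demo runs instantly.

```python
import time


def invoke_with_backoff(chain, inputs, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call chain.invoke(inputs), doubling the wait after each quota error."""
    for attempt in range(max_retries + 1):
        try:
            return chain.invoke(inputs)
        except Exception as exc:
            is_quota_error = "ResourceExhausted" in str(exc) or "429" in str(exc)
            if not is_quota_error or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...


class FlakyChain:
    """Stand-in for a real chain: fails with a 429 twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def invoke(self, inputs):
        self.calls += 1
        if self.calls <= 2:
            raise RuntimeError("429 Resource has been exhausted")
        return f"answered: {inputs['question']}"


waits = []  # records the requested delays instead of actually sleeping
demo_result = invoke_with_backoff(FlakyChain(), {"question": "ping"}, sleep=waits.append)
print(demo_result, waits)  # answered: ping [2.0, 4.0]
```

With the default `sleep=time.sleep`, the same function can wrap `prompt | llm` in place of the fixed-delay helper.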

In [12]:
# Create a prompt template
prompt = PromptTemplate(
    input_variables=["context", "question", "options"],
    template="""
    Question description: {context}
    Question: {question}
    Options: {options}

    Please answer the question based on the given information.
    """
)

# Create a RunnableSequence instead of LLMChain
chain = prompt | llm

# Initialize an empty list to store the results
results = []

df = pd.read_excel('reasoning_dataset_test.xlsx', sheet_name='test')

# Process each row in the DataFrame
for _, row in df.iterrows():
    context = row['context'] if not pd.isna(row['context']) else ""
    question = row['question'] if not pd.isna(row['question']) else ""
    options = row['options'] if not pd.isna(row['options']) else ""
    correct_answer = row['answer'] if not pd.isna(row['answer']) else ""
    label = row['label'] if not pd.isna(row['label']) else ""
    question_type = row['type'] if not pd.isna(row['type']) else ""

    # Structured query for a retriever (not used until retrieval logic is added)
    retriever_query = f"""
    Description: {context}
    Question: {question}
    Options: {options}
    """

    # Placeholder: no retriever is wired in yet, so nothing extra is retrieved
    retrieved_context = ""

    # Combine the original context with the (currently empty) retrieved context
    combined_context = f"{context}\n\nAdditional context:\n{retrieved_context}"

    # Invoke the model safely with retry
    response = safe_chat_with_retry(chain, context=combined_context, question=question, options=options)

    # Store the result in a dictionary
    result = {
        "description": context,
        "question": question,
        "options": options,
        "gemini_flash_responses": response,
        "correct_answer": correct_answer,
        "label": label,
        "type": question_type
    }
    results.append(result)

    # Print the results for immediate feedback
    print(f"Description: {context}")
    print(f"Question: {question}")
    print(f"Options: {options}")
    print(f" Gemini Flash Responses: {response}")
    print(f"Correct Answer: {correct_answer}")
    print(f"Label: {label}")
    print(f"Type: {question_type}")
    print()


Description: Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table facing the centre of the table but not necessarily in the same order. P sits second to the right of U, who sits opposite to T. Two persons sit between Q and T. S sits third to the right of P. V sits second to the left of W.
Question: How many persons sit between W and R when counted from the right of R?
Options: ["Two", "Three", "Four", "One", "None of these"]

 Gemini Flash Responses: content="Let's analyze the given information step-by-step to arrange the people around the circular table.\n\n1. **P sits second to the right of U, who sits opposite to T:** This gives us the relative positions of P, U, and T.\n\n2. **Two persons sit between Q and T:** This helps place Q.\n\n3. **S sits third to the right of P:** This further refines the arrangement.\n\n4. **V sits second to the left of W:** This provides the relative positions of V and W.\n\n\nLet's use a circular arrangement to visualize:\n

Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..


Description: 
Question: If you flip a fair coin five times, what is the probability of getting exactly three tails?
Options: 
 Gemini Flash Responses: content="The probability of getting tails on a single flip of a fair coin is 1/2.  The probability of getting heads is also 1/2.\n\nWe need to find the probability of getting exactly three tails in five flips. This is a binomial probability problem.  The formula for the probability of getting exactly k successes in n trials is:\n\nP(X = k) = (nCk) * p^k * (1-p)^(n-k)\n\nWhere:\n\n* n = number of trials (5 coin flips)\n* k = number of successes (3 tails)\n* p = probability of success on a single trial (1/2 for tails)\n* nCk = the number of combinations of n items taken k at a time (also written as  ⁵C₃ or ₅C₃)\n\nLet's calculate:\n\n* nCk = ⁵C₃ = 5! / (3! * 2!) = (5 * 4) / (2 * 1) = 10\n* p^k = (1/2)^3 = 1/8\n* (1-p)^(n-k) = (1/2)^(5-3) = (1/2)^2 = 1/4\n\nTherefore:\n\nP(X = 3) = 10 * (1/8) * (1/4) = 10/32 = 5/16\n\nSo the probability of 

Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..


Quota exceeded. Retrying in 1 minute and 5 seconds...
Description: 
Question: A card is drawn at random from a pack of 52 playing cards. Find the probability that the card drawn is neither a queen nor a jack.
Options: 
 Gemini Flash Responses: content='There are 4 queens and 4 jacks in a standard deck of 52 playing cards.  Therefore, there are 4 + 4 = 8 cards that are either a queen or a jack.\n\nThe number of cards that are neither a queen nor a jack is 52 - 8 = 44.\n\nThe probability of drawing a card that is neither a queen nor a jack is:\n\n(Number of cards that are neither a queen nor a jack) / (Total number of cards) = 44/52 = 11/13\n\nTherefore, the probability is $\\boxed{11/13}$' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []} id='run-679555d4-2bcd-413d-a91f-80b46547bf38-0' usage_metadata={'input_tokens': 65, 'output_tokens': 136, 'total_tokens': 201, 'input_token_details': {'c
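
The two probability answers in the output above can be checked independently of the model. A small sketch using exact fractions; it only re-derives the arithmetic shown in the responses and is not part of the original notebook.

```python
from fractions import Fraction
from math import comb

# Exactly three tails in five fair flips: C(5, 3) * (1/2)^5
p_three_tails = Fraction(comb(5, 3), 2 ** 5)

# Neither a queen nor a jack: (52 - 4 - 4) favourable cards out of 52
p_not_queen_or_jack = Fraction(52 - 4 - 4, 52)

print(p_three_tails, p_not_queen_or_jack)  # 5/16 11/13
```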

In [13]:
print(results)

[{'description': 'Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table facing the centre of the table but not necessarily in the same order. P sits second to the right of U, who sits opposite to T. Two persons sit between Q and T. S sits third to the right of P. V sits second to the left of W.', 'question': 'How many persons sit between W and R when counted from the right of R?', 'options': '["Two", "Three", "Four", "One", "None of these"]\n', 'gemini_flash_responses': AIMessage(content="Let's analyze the given information step-by-step to arrange the people around the circular table.\n\n1. **P sits second to the right of U, who sits opposite to T:** This gives us the relative positions of P, U, and T.\n\n2. **Two persons sit between Q and T:** This helps place Q.\n\n3. **S sits third to the right of P:** This further refines the arrangement.\n\n4. **V sits second to the left of W:** This provides the relative positions of V and W.\n\n\nLet's use a circula

In [14]:
results[0]

{'description': 'Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table facing the centre of the table but not necessarily in the same order. P sits second to the right of U, who sits opposite to T. Two persons sit between Q and T. S sits third to the right of P. V sits second to the left of W.',
 'question': 'How many persons sit between W and R when counted from the right of R?',
 'options': '["Two", "Three", "Four", "One", "None of these"]\n',
 'gemini_flash_responses': AIMessage(content="Let's analyze the given information step-by-step to arrange the people around the circular table.\n\n1. **P sits second to the right of U, who sits opposite to T:** This gives us the relative positions of P, U, and T.\n\n2. **Two persons sit between Q and T:** This helps place Q.\n\n3. **S sits third to the right of P:** This further refines the arrangement.\n\n4. **V sits second to the left of W:** This provides the relative positions of V and W.\n\n\nLet's use a circu

In [15]:
# Check types of each element in the results[0]
for key, value in results[0].items():
    print(f"Key: {key}, Type: {type(value)}")

Key: description, Type: <class 'str'>
Key: question, Type: <class 'str'>
Key: options, Type: <class 'str'>
Key: gemini_flash_responses, Type: <class 'langchain_core.messages.ai.AIMessage'>
Key: correct_answer, Type: <class 'str'>
Key: label, Type: <class 'float'>
Key: type, Type: <class 'str'>


In [16]:
import json

# Convert each result to JSON-serializable types (an AIMessage becomes its text content)
def serialize_results(results):
    serialized_results = []
    for result in results:
        response = result['gemini_flash_responses']
        serialized_result = {
            'description': result['description'],
            'question': result['question'],
            'options': result['options'],
            'gemini flash responses': response.content if isinstance(response, AIMessage) else response,
            'correct_answer': result['correct_answer'],
            'label': result['label'],
            'type': result['type']
        }
        serialized_results.append(serialized_result)
    return serialized_results

# Save to JSON file
with open("reasoning_gemini_responses.json", "w", encoding="utf-8") as json_file:
    json.dump(serialize_results(results), json_file, indent=4, ensure_ascii=False)

print("Results have been saved to 'reasoning_gemini_responses.json'.")


Results have been saved to 'reasoning_gemini_responses.json'.
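
The explicit conversion step is needed because `json.dump` cannot encode arbitrary objects such as `AIMessage`. A hedged sketch of another way to handle this is the `default` hook of `json.dumps`. `FakeMessage` is a stand-in class so the example runs without LangChain, and `to_jsonable` is a hypothetical helper name.

```python
import json


class FakeMessage:
    """Stand-in for AIMessage so the sketch runs without LangChain installed."""

    def __init__(self, content):
        self.content = content


def to_jsonable(obj):
    """Fallback called by json.dumps for objects it cannot encode itself."""
    if hasattr(obj, "content"):
        return obj.content
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"question": "2 + 2?", "gemini_flash_responses": FakeMessage("4")}
encoded = json.dumps(record, default=to_jsonable)
print(encoded)
```

This keeps only the message text, as `serialize_results` does. Metadata such as token usage would need to be copied explicitly if it should be saved.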


In [17]:
import json
import pandas as pd

# Open and load the JSON file
with open("reasoning_gemini_responses.json", "r", encoding="utf-8") as json_file:
    data = json.load(json_file)

# Print number of rows (length of the data)
print(f"Number of rows: {len(data)}")

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Display the DataFrame in a tabular format
print(df)

Number of rows: 29
                                          description  \
0   Eight persons namely P, Q, R, S, T, U, V, and ...   
1   Eight persons namely P, Q, R, S, T, U, V, and ...   
2   Eight persons namely P, Q, R, S, T, U, V, and ...   
3   Eight persons namely P, Q, R, S, T, U, V, and ...   
4   Eight persons namely P, Q, R, S, T, U, V, and ...   
5   A law firm has exactly nine partners: Fox, Gla...   
6   A law firm has exactly nine partners: Fox, Gla...   
7   A law firm has exactly nine partners: Fox, Gla...   
8   A law firm has exactly nine partners: Fox, Gla...   
9   A law firm has exactly nine partners: Fox, Gla...   
10  A law firm has exactly nine partners: Fox, Gla...   
11                                                      
12                                                      
13                                                      
14                                                      
15                                                      
16          