<a target="_blank" href="https://colab.research.google.com/github/sergiopaniego/RAG_local_tutorial/blob/main/example_rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Imports

In [1]:
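import subprocess
import sys

# Define the packages to install (pandas reads .xlsx files through openpyxl)
packages = [
    "langchain",
    "langchain_pinecone",
    "langchain[docarray]",
    "docarray",
    "pypdf",
    "langchain-ollama",
    "pandas",
    "openpyxl",
]

# Install quietly with the pip of the running interpreter, so packages land in the
# notebook's environment; check=True stops on a failed install instead of hiding it
for package in packages:
    subprocess.run([sys.executable, "-m", "pip", "install", "--quiet", package], check=True)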
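Because the install loop suppresses pip's output, it can be worth confirming afterwards that the packages are actually importable. The sketch below uses `importlib.util.find_spec`, which returns `None` for top-level modules that cannot be found. The pip-name-to-import-name mapping in `REQUIRED_MODULES` (for example `langchain-ollama` imported as `langchain_ollama`) is an assumption to adjust for your environment, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names assumed for the pip packages installed above (adjust as needed)
REQUIRED_MODULES = ["langchain", "langchain_pinecone", "docarray", "pypdf", "langchain_ollama"]

def missing_modules(module_names):
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are importable.")
```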

In [19]:
import json

import pandas as pd

from langchain_core.prompts import PromptTemplate
from langchain_core.messages import AIMessage  # used when serializing chat-model responses

In [2]:
!ollama list

NAME                       ID              SIZE      MODIFIED    
all-minilm:l6-v2           1b226e2802db    45 MB     3 weeks ago    
llama3.1:8b                46e0c10c039e    4.9 GB    3 weeks ago    
mistral:latest             f974a74358d6    4.1 GB    3 weeks ago    
nomic-embed-text:latest    0a109f422b47    274 MB    3 weeks ago    


LLM Pipeline

In [3]:
#MODEL = "gpt-3.5-turbo"
#MODEL = "mixtral:8x7b"
#MODEL = "gemma:7b"
#MODEL = "llama2"
MODEL = "llama3.1:8b" # https://ollama.com/library/llama3.1
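Before building the pipeline, one can check that the chosen tag appears in the `ollama list` output shown above. This is a minimal sketch: it parses a pasted copy of that output (in the notebook the text could instead come from `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`), and it assumes the model name is always the first whitespace-separated column. It compares full tags, so an untagged name such as `mistral` would need its implicit `:latest` suffix added. `installed_models` and `model_to_check` are hypothetical names.

```python
# Pasted copy of the `ollama list` output above
OLLAMA_LIST_OUTPUT = """\
NAME                       ID              SIZE      MODIFIED
all-minilm:l6-v2           1b226e2802db    45 MB     3 weeks ago
llama3.1:8b                46e0c10c039e    4.9 GB    3 weeks ago
mistral:latest             f974a74358d6    4.1 GB    3 weeks ago
nomic-embed-text:latest    0a109f422b47    274 MB    3 weeks ago
"""

def installed_models(list_output):
    """Return the NAME column of `ollama list` output, skipping the header row."""
    lines = [line for line in list_output.splitlines() if line.strip()]
    return [line.split()[0] for line in lines[1:]]

model_to_check = "llama3.1:8b"  # in the notebook, use MODEL
models = installed_models(OLLAMA_LIST_OUTPUT)
if model_to_check not in models:
    raise ValueError(f"{model_to_check!r} is not pulled; run `ollama pull {model_to_check}`. Available: {models}")
print(f"{model_to_check} is available.")
```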

In [6]:
from langchain_ollama import OllamaLLM, OllamaEmbeddings

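# MODEL must be an Ollama model tag as listed by `ollama list`, e.g. "llama3.1:8b"
llm = OllamaLLM(model=MODEL)
# Embeddings use a dedicated embedding model from the list above (not used by the baseline below)
embeddings = OllamaEmbeddings(model="nomic-embed-text")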

In [8]:
# Test the model
output = llm.invoke("Write a brief introduction about Llama models.")
print(output)

I can provide information on Llama models. However, I'm still an AI and don't have have personal opinions or experiences.

Llama (Large Language Model Meta AI) is a line of artificial intelligence models developed by Meta, designed to process and generate human-like language. These models are trained on vast amounts of text data, allowing them to understand and respond to complex questions and prompts in a coherent and context-specific manner.


In [9]:
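import time

def safe_chat_with_retry(chain, max_retries=5, wait_seconds=65, **kwargs):
    """Invoke `chain` with `kwargs`, retrying on quota/rate-limit errors."""
    for attempt in range(1, max_retries + 1):
        try:
            # Invoke the LangChain runnable with the prompt variables
            return chain.invoke(kwargs)
        except Exception as e:
            message = str(e)
            # Retry only on quota / rate-limit errors (e.g. HTTP 429 from hosted APIs)
            if ("ResourceExhausted" in message or "429" in message) and attempt < max_retries:
                print(f"Quota exceeded (attempt {attempt}/{max_retries}). Retrying in {wait_seconds} s...")
                time.sleep(wait_seconds)
            else:
                # Unrelated error, or retries exhausted: re-raise with the original traceback
                raise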
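The retry helper waits a fixed interval between attempts. A common alternative is exponential backoff, where the wait doubles after each rate-limit error; the sketch below swaps in that technique while keeping the same `"ResourceExhausted"`/`"429"` string checks. The `sleep` parameter is injectable so the logic can be exercised without real delays, and `FlakyChain` is a hypothetical stand-in that fails a set number of times before answering. With a local Ollama server these checks mostly matter if the chain is later pointed at a hosted API.

```python
import time

def invoke_with_backoff(chain, inputs, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call chain.invoke(inputs), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return chain.invoke(inputs)
        except Exception as e:
            is_rate_limit = "ResourceExhausted" in str(e) or "429" in str(e)
            if not is_rate_limit or attempt == max_retries:
                raise  # unrelated error, or no retries left
            delay = base_delay * 2 ** attempt  # 1 s, 2 s, 4 s, ... with the defaults
            print(f"Rate limited; retry {attempt + 1}/{max_retries} in {delay:.1f} s")
            sleep(delay)

class FlakyChain:
    """Hypothetical chain that raises a 429-style error `failures` times, then answers."""
    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def invoke(self, inputs):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("429 Too Many Requests")
        return f"answer to {inputs['question']}"

waits = []
flaky = FlakyChain(failures=2)
result = invoke_with_backoff(flaky, {"question": "Q1"}, sleep=waits.append)
print(result, waits)
```

Recording the requested delays instead of sleeping makes it easy to see the doubling schedule before using the function on a real chain.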

In [14]:
# Create a prompt template
prompt = PromptTemplate(
    input_variables=["context", "question", "options"],
    template="""
    Question description: {context}
    Question: {question}
    Options: {options}

    Please answer the question based on the given information.
    """
)

# Compose prompt and model into a RunnableSequence (the replacement for the deprecated LLMChain)
chain = prompt | llm

def clean(value):
    """Return "" for empty (NaN) spreadsheet cells, otherwise the value unchanged."""
    return "" if pd.isna(value) else value

# Initialize an empty list to store the results
results = []

# Reading .xlsx files requires the openpyxl package
df = pd.read_excel('reasoning_dataset_test.xlsx', sheet_name='test')

# Process each row in the DataFrame
for _, row in df.iterrows():
    context = clean(row['context'])
    question = clean(row['question'])
    options = clean(row['options'])
    correct_answer = clean(row['answer'])
    label = clean(row['label'])
    question_type = clean(row['type'])

    # Structured query for a retriever; no retriever is wired in yet, so it is unused
    retriever_query = f"""
    Description: {context}
    Question: {question}
    Options: {options}
    """
    retrieved_context = ""  # Add retrieval logic here if applicable

    # Append retrieved context only when there is some, so the baseline prompt
    # does not end with an empty "Additional context" section
    combined_context = context
    if retrieved_context:
        combined_context += f"\n\nAdditional context:\n{retrieved_context}"

    # Invoke the model, retrying on rate-limit errors
    response = safe_chat_with_retry(chain, context=combined_context, question=question, options=options)

    # Store the result in a dictionary
    results.append({
        "description": context,
        "question": question,
        "options": options,
        "llama_baseline_responses": response,
        "correct_answer": correct_answer,
        "label": label,
        "type": question_type,
    })

    # Print the results for immediate feedback
    print(f"Description: {context}")
    print(f"Question: {question}")
    print(f"Options: {options}")
    print(f"Llama Baseline Responses: {response}")
    print(f"Correct Answer: {correct_answer}")
    print(f"Label: {label}")
    print(f"Type: {question_type}")
    print()


Description: Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table facing the centre of the table but not necessarily in the same order. P sits second to the right of U, who sits opposite to T. Two persons sit between Q and T. S sits third to the right of P. V sits second to the left of W.
Question: How many persons sit between W and R when counted from the right of R?
Options: ["Two", "Three", "Four", "One", "None of these"]

Llama Baseline Responses: Let's analyze the situation step by step:

1. P sits second to the right of U, who sits opposite to T. This means that the order is: ..., U, ..., P, ...
2. Two persons sit between Q and T. Since U sits opposite to T, we can infer that Q must be sitting next to one of the two people who are sitting between Q and T.
3. S sits third to the right of P. This means that the order is: ..., P, ..., S, ...
4. V sits second to the left of W.

Now, let's combine the information:

Since U sits opposite to T, we know tha
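To see exactly what text the model receives for a row, the template can be filled in by hand. For simple `{name}` placeholders like these, the default f-string format of `PromptTemplate` behaves like Python's `str.format`, which is what this sketch uses. Note that the triple-quoted template keeps its four-space indentation; `textwrap.dedent` is shown as an optional way to strip it. `TEMPLATE` and `render_prompt` are hypothetical names, and the sample values are shortened from the first dataset row.

```python
import textwrap

# Same text as the notebook's template, including its indentation
TEMPLATE = """
    Question description: {context}
    Question: {question}
    Options: {options}

    Please answer the question based on the given information.
    """

def render_prompt(context, question, options, dedent=False):
    """Fill the {context}/{question}/{options} placeholders, optionally removing indentation."""
    template = textwrap.dedent(TEMPLATE) if dedent else TEMPLATE
    return template.format(context=context, question=question, options=options)

sample = render_prompt(
    context="Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table.",
    question="How many persons sit between W and R when counted from the right of R?",
    options='["Two", "Three", "Four", "One", "None of these"]',
    dedent=True,
)
print(sample)
```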

In [15]:
print(results)

[{'description': 'Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table facing the centre of the table but not necessarily in the same order. P sits second to the right of U, who sits opposite to T. Two persons sit between Q and T. S sits third to the right of P. V sits second to the left of W.', 'question': 'How many persons sit between W and R when counted from the right of R?', 'options': '["Two", "Three", "Four", "One", "None of these"]\n', 'llama_baseline_responses': "Let's analyze the situation step by step:\n\n1. P sits second to the right of U, who sits opposite to T. This means that the order is: ..., U, ..., P, ...\n2. Two persons sit between Q and T. Since U sits opposite to T, we can infer that Q must be sitting next to one of the two people who are sitting between Q and T.\n3. S sits third to the right of P. This means that the order is: ..., P, ..., S, ...\n4. V sits second to the left of W.\n\nNow, let's combine the information:\n\nSince U s

In [16]:
results[0]

{'description': 'Eight persons namely P, Q, R, S, T, U, V, and W are sitting around a circular table facing the centre of the table but not necessarily in the same order. P sits second to the right of U, who sits opposite to T. Two persons sit between Q and T. S sits third to the right of P. V sits second to the left of W.',
 'question': 'How many persons sit between W and R when counted from the right of R?',
 'options': '["Two", "Three", "Four", "One", "None of these"]\n',
 'llama_baseline_responses': "Let's analyze the situation step by step:\n\n1. P sits second to the right of U, who sits opposite to T. This means that the order is: ..., U, ..., P, ...\n2. Two persons sit between Q and T. Since U sits opposite to T, we can infer that Q must be sitting next to one of the two people who are sitting between Q and T.\n3. S sits third to the right of P. This means that the order is: ..., P, ..., S, ...\n4. V sits second to the left of W.\n\nNow, let's combine the information:\n\nSince U

In [17]:
# Check types of each element in the results[0]
for key, value in results[0].items():
    print(f"Key: {key}, Type: {type(value)}")

Key: description, Type: <class 'str'>
Key: question, Type: <class 'str'>
Key: options, Type: <class 'str'>
Key: llama_baseline_responses, Type: <class 'str'>
Key: correct_answer, Type: <class 'str'>
Key: label, Type: <class 'float'>
Key: type, Type: <class 'str'>
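The type check shows that `options` is a `str`, not a list: the spreadsheet cell holds a JSON array as text, with a trailing newline. If the individual choices are needed (for example to compare against `correct_answer`), `json.loads` can parse it, since JSON permits surrounding whitespace. This sketch assumes every non-empty options cell is a valid JSON array; `parse_options` is a hypothetical helper, and empty cells (already mapped to `""` in the loop) become an empty list.

```python
import json

def parse_options(raw):
    """Turn the options cell (a JSON array stored as text) into a Python list."""
    if not raw or not raw.strip():
        return []
    return json.loads(raw)

raw_options = '["Two", "Three", "Four", "One", "None of these"]\n'  # value from results[0]
options_list = parse_options(raw_options)
print(options_list)       # ['Two', 'Three', 'Four', 'One', 'None of these']
print(len(options_list))  # 5
```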


In [20]:
# Make results JSON-serializable. OllamaLLM returns a plain str, but chat models
# return an AIMessage, so its .content is extracted when present.
def serialize_results(results):
    serialized_results = []
    for result in results:
        response = result['llama_baseline_responses']
        serialized_results.append({
            **result,  # keep all keys with their original names
            'llama_baseline_responses': response.content if isinstance(response, AIMessage) else response,
        })
    return serialized_results

# Save to JSON file
with open("reasoning_llama_baseline_responses.json", "w", encoding="utf-8") as json_file:
    json.dump(serialize_results(results), json_file, indent=4, ensure_ascii=False)

print("Results have been saved to 'reasoning_llama_baseline_responses.json'.")


Results have been saved to 'reasoning_llama_baseline_responses.json'.
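The `label` column is a `float`, and the loop maps empty cells to `""`, so no NaN reaches the file here. Without that guard, `json.dump` would write the bare token `NaN`, which is not valid JSON and which strict parsers reject. The sketch below shows that default behaviour, the `allow_nan=False` option that makes `json` raise a `ValueError` instead, and a hypothetical `nan_to_none` helper that turns NaN into JSON `null`.

```python
import json
import math

record = {"label": float("nan")}

# Default behaviour: NaN is written as a bare token, which is not valid JSON
print(json.dumps(record))  # {"label": NaN}

# Strict mode: fail loudly instead of writing invalid JSON
try:
    json.dumps(record, allow_nan=False)
except ValueError as err:
    print("Rejected:", err)

def nan_to_none(value):
    """Map float NaN to None so it serializes as JSON null; other values pass through."""
    return None if isinstance(value, float) and math.isnan(value) else value

print(json.dumps({k: nan_to_none(v) for k, v in record.items()}))  # {"label": null}
```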


In [21]:
import json
import pandas as pd

# Open and load the JSON file
with open("reasoning_llama_baseline_responses.json", "r", encoding="utf-8") as json_file:
    data = json.load(json_file)

# Print number of rows (length of the data)
print(f"Number of rows: {len(data)}")

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Display the DataFrame in a tabular format
print(df)

Number of rows: 29
                                          description  \
0   Eight persons namely P, Q, R, S, T, U, V, and ...   
1   Eight persons namely P, Q, R, S, T, U, V, and ...   
2   Eight persons namely P, Q, R, S, T, U, V, and ...   
3   Eight persons namely P, Q, R, S, T, U, V, and ...   
4   Eight persons namely P, Q, R, S, T, U, V, and ...   
5   A law firm has exactly nine partners: Fox, Gla...   
6   A law firm has exactly nine partners: Fox, Gla...   
7   A law firm has exactly nine partners: Fox, Gla...   
8   A law firm has exactly nine partners: Fox, Gla...   
9   A law firm has exactly nine partners: Fox, Gla...   
10  A law firm has exactly nine partners: Fox, Gla...   
11                                                      
12                                                      
13                                                      
14                                                      
15                                                      
16          