<a href="https://colab.research.google.com/github/sunshineluyao/AgenticCases/blob/main/code/RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Retrieval-Augmented Generation (RAG) is an AI architecture that enhances Large Language Models (LLMs) by integrating external knowledge sources into the generation process. A traditional LLM is limited to the information in its training data, which can be outdated or insufficient for specialized queries. RAG addresses this limitation by retrieving relevant information from external databases or documents at query time, so that generated responses can be grounded in current, task-specific sources.

In a typical RAG system, when a user poses a question, a retriever first fetches relevant documents or passages from an external source. The retrieved text is then added to the model's input alongside the question, and the model generates a response conditioned on both this context and its internal knowledge. Grounding the answer in retrieved evidence tends to improve relevance and factual accuracy, and it can reduce "hallucinations," where a model produces plausible-sounding but incorrect information.
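The retrieve-then-augment flow described above can be sketched in a few lines of pure Python. This is a toy illustration under simplifying assumptions: the bag-of-words cosine retriever, the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the prompt wording, and the three example documents are all hypothetical, and the final prompt would be sent to an LLM that is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokenizer (a simplification, not an LLM tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Retrieval step: return the k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(d))), i) for i, d in enumerate(documents)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by position
    return [documents[i] for _, i in scored[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Augmentation step: place the retrieved passages in front of the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Toy corpus, made up for illustration
docs = [
    "The 2024 policy update raised the reporting threshold to 500 employees.",
    "Bananas are rich in potassium.",
    "The reporting threshold applies to firms operating in the EU.",
]
question = "What is the reporting threshold?"
prompt = build_prompt(question, retrieve(question, docs, k=2))
print(prompt)  # generation step: this prompt would be sent to an LLM (not shown)
```

Production systems typically replace word-count vectors with dense embeddings and a vector index, but the overall shape (score, keep the top k, prepend to the prompt) stays the same.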

**Example 1: Using Hugging Face's `pipeline` with DistilBERT**

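In this example, we use Hugging Face's `pipeline` for question answering with the `distilbert-base-uncased-distilled-squad` model. DistilBERT is a distilled (smaller and faster) version of BERT that retains most of its accuracy, and this checkpoint is fine-tuned on the SQuAD v1.1 dataset for extractive question answering: it answers by selecting a span of the supplied context rather than by generating new text. In RAG terms, it plays the role of the reader that consumes the context.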

```python
from transformers import pipeline

# Initialize the question-answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

# Define your question
question = "What is the main point of the article?"

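# `extracted_text` is assumed to hold the PDF text produced by
# extract_text_from_pdf(), which is defined later in this notebook
# Ensure the extracted text is not empty
if extracted_text:
    # Prepare the input for the model
    qa_input = {
        'question': question,
        'context': extracted_text
    }
    # Get the answer; an over-long context is split into overlapping windows
    # inside the pipeline, and the best-scoring span across windows is returned
    answer = qa_pipeline(qa_input)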
    print(f"Question: {question}")
    print(f"Answer: {answer['answer']}")
else:
    print("Error: No text extracted from the PDF.")
```

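In this script, the `pipeline` is initialized for question answering with the specified model. The `question` variable holds the query, and `extracted_text` holds the document text (produced by the PDF-extraction step later in this notebook) from which the answer is drawn. The pipeline returns a dictionary containing the highest-scoring answer span together with its `score` and its `start`/`end` character offsets in the context. Because the model is extractive, it can only return text copied from the context, so a broad question such as "What is the main point of the article?" tends to yield a short phrase rather than a summary.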
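The DistilBERT SQuAD model is extractive: rather than generating text, it returns a span of the context. A simplified sketch of how such a reader picks its span: the model assigns every context token a start score and an end score, and the answer is the span that maximizes their sum, subject to the end not preceding the start and a maximum answer length (mirroring the pipeline's `max_answer_len` argument, which defaults to 15 tokens). The function name `best_span`, the tokens, and the scores below are made up for illustration; the pipeline's actual decoding additionally works with probabilities, masks out question and padding tokens, and maps token positions back to character offsets.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e]
    subject to s <= e and e - s + 1 <= max_answer_len (inclusive token positions)."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length cap
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Made-up tokens and per-token scores for illustration
tokens = ["rag", "retrieves", "relevant", "documents", "before", "generating", "answers"]
start_scores = [0.1, 0.3, 2.0, 0.5, -1.0, 0.2, 0.0]
end_scores = [0.0, 0.1, 0.4, 1.8, -0.5, 0.3, 2.5]

s, e, score = best_span(start_scores, end_scores, max_answer_len=3)
print(" ".join(tokens[s:e + 1]), round(score, 2))
```

With a cap of 3 tokens the answer is `relevant documents`; raising `max_answer_len` to 10 lets the longer span `relevant documents before generating answers` win, which shows how the length cap keeps extractive answers short.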

**Example 2: Handling Longer Texts with Chunking**

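When dealing with lengthy documents, the input must respect the model's maximum sequence length (512 tokens for DistilBERT). The `question-answering` pipeline already handles this internally: it splits an over-long context into overlapping windows (controlled by its `max_seq_len` and `doc_stride` arguments) and returns only the single best span. Splitting the text yourself into overlapping chunks, as below, instead yields one candidate answer per section of the document, which can then be inspected or ranked.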

```python
from transformers import pipeline

# Initialize the question-answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

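def chunk_text(text, max_length, overlap):
    """
    Splits the text into chunks of at most max_length words, with consecutive
    chunks sharing `overlap` words.

    Args:
        text: The input text to be chunked.
        max_length: Maximum number of words in each chunk.
        overlap: Number of words shared by consecutive chunks (must be < max_length).

    Returns:
        A list of text chunks.
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_length, len(words))
        chunks.append(' '.join(words[start:end]))
        if end == len(words):
            break  # the last chunk reached the end of the text; avoid a redundant tail chunk
        start += max_length - overlap
    return chunks

# Define your question
question = "What is the main point of the article?"

# Parameters (measured in words, not model tokens; a word often maps to more
# than one WordPiece token, and the pipeline windows any chunk that is still too long)
max_chunk_length = 450  # words per chunk
overlap_length = 50     # words shared between consecutive chunks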

# Split the extracted text into chunks
text_chunks = chunk_text(extracted_text, max_chunk_length, overlap_length)

# Iterate over chunks and get answers
answers = []
for chunk in text_chunks:
    qa_input = {
        'question': question,
        'context': chunk
    }
    answer = qa_pipeline(qa_input)
    answers.append(answer['answer'])

# Combine or select the most appropriate answer
# For simplicity, we'll just print all answers here
for idx, ans in enumerate(answers):
    print(f"Answer from chunk {idx + 1}: {ans}")
```

In this script, the `chunk_text` function divides `extracted_text` into word-based segments of at most `max_chunk_length` words, repeating `overlap_length` words between consecutive segments so that a short phrase spanning a chunk boundary still appears whole in at least one chunk. Word counts are only a rough proxy for the model's token limit; if a chunk still exceeds it, the pipeline's own windowing takes over. The script then applies the question-answering pipeline to each chunk, collects the answers, and prints one answer per chunk.
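The chunking script prints one answer per chunk and leaves the "select the most appropriate answer" step open. One simple option, sketched here as an assumption rather than the notebook's method, is to keep the full result dictionaries the pipeline returns (`score`, `start`, `end`, `answer`) and pick the highest-scoring one. The helper `select_answer` and the example results are hypothetical. Scores from separate pipeline calls may not be strictly comparable, since each is normalized within its own context, so treat the maximum as a heuristic.

```python
def select_answer(results, min_score=0.0):
    """Pick the highest-scoring non-empty answer among per-chunk results.

    Each item is assumed to have the shape of a question-answering pipeline
    result: {'score': float, 'start': int, 'end': int, 'answer': str}.
    Returns (chunk_index, result), or None if no answer reaches min_score.
    """
    candidates = [
        (idx, r) for idx, r in enumerate(results)
        if r["answer"].strip() and r["score"] >= min_score
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1]["score"])


# Hypothetical per-chunk results (values made up for illustration)
results = [
    {"score": 0.12, "start": 0, "end": 22, "answer": "candidate from chunk 1"},
    {"score": 0.41, "start": 35, "end": 57, "answer": "candidate from chunk 2"},
    {"score": 0.05, "start": 8, "end": 30, "answer": "candidate from chunk 3"},
]
best = select_answer(results, min_score=0.1)
if best is not None:
    idx, result = best
    print(f"Best answer (chunk {idx + 1}, score {result['score']:.2f}): {result['answer']}")
```

In the chunking loop, this would mean appending `qa_pipeline(qa_input)` itself rather than `answer['answer']`, then calling `select_answer(answers)`.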

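Note that the chunked script runs the reader over every chunk and performs no retrieval. In a full RAG pipeline, a retriever would first rank the chunks (or passages from an external, up-to-date corpus) by relevance to the question and pass only the most relevant ones to the reader or to a generative model, which reduces computation and focuses the answer on the most pertinent evidence.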
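To add a retrieval step to the chunked script, the chunks can be ranked by relevance to the question before any of them reach the reader. The sketch below uses Okapi BM25, a classic lexical ranking function; it is a swapped-in illustration, not part of the original notebook. The tokenizer, the small stopword list, the parameters `k1=1.5` and `b=0.75`, the helper names `bm25_tokenize` and `bm25_rank`, and the example chunks are all assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists)
STOPWORDS = {"a", "an", "and", "does", "is", "of", "the", "to", "what"}


def bm25_tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def bm25_rank(question, chunks, k1=1.5, b=0.75):
    """Return chunk indices ordered from most to least relevant under Okapi BM25."""
    docs = [Counter(bm25_tokenize(c)) for c in chunks]
    lengths = [sum(d.values()) for d in docs]
    n_docs = len(docs)
    avgdl = sum(lengths) / n_docs if n_docs else 0.0
    df = Counter(term for d in docs for term in d)  # number of chunks containing each term
    query_terms = set(bm25_tokenize(question))

    scored = []
    for i, (d, dl) in enumerate(zip(docs, lengths)):
        score = 0.0
        for term in query_terms:
            tf = d[term]
            if tf == 0:
                continue  # a term absent from this chunk contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by position
    return [i for _, i in scored]


# Toy chunks, made up for illustration
chunks = [
    "Generative AI may widen income inequality without policy action.",
    "The survey recruited participants from several countries.",
    "Policy recommendations include digital skills training and AI audits.",
]
question = "What policy recommendations does the article make?"
print(bm25_rank(question, chunks))
```

In this notebook, the top-ranked chunks could then be passed to the reader, for example `[text_chunks[i] for i in bm25_rank(question, text_chunks)[:3]]`, instead of querying every chunk.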



In [None]:
!pip install PyPDF2 transformers

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 232.6/232.6 kB 6.7 MB/s eta 0:00:00
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [None]:
import PyPDF2
from google.colab import files
from transformers import pipeline


In [None]:
import PyPDF2

def extract_text_from_pdf(file_path: str) -> str:
    """
    Extracts text from a PDF file.

    Args:
        file_path: The path to the PDF file.

    Returns:
        The extracted text as a string.
    """
    text = ""
    try:
        with open(file_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            for page_num, page in enumerate(reader.pages):
                page_text = page.extract_text()
                if page_text:
                    text += page_text
                else:
                    print(f"Warning: No text extracted from page {page_num + 1}")
    except Exception as e:
        print(f"An error occurred: {str(e)}")
    return text


In [None]:
# Upload the PDF file
from google.colab import files
uploaded = files.upload()
file_path = list(uploaded.keys())[0]

# Extract text from the PDF
extracted_text = extract_text_from_pdf(file_path)

# Output the extracted text length and a snippet
print(f"Extracted text length: {len(extracted_text)}")
print(f"Extracted text snippet: {extracted_text[:500]}")


Saving pgae191.pdf to pgae191.pdf
Extracted text length: 126772
Extracted text snippet: The impact of generative artificial intelligence on 
socioeconomic inequalities and policy making
Valerio Capraro 
a,*, Austin Lentsch 
b, Daron Acemoglu 
c, Selin Akgund, Aisel Akhmedovad, Ennio Bilancini 
e,  
Jean-François Bonnefon 
f, Pablo Brañas-Garza 
g, Luigi Butera 
h, Karen M. Douglas 
j, Jim A.C. Everett 
j, Gerd Gigerenzer 
k, 
Christine Greenhowd, Daniel A. Hashimotol,m, Julianne Holt-Lunstad 
n, Jolanda Jetten 
o, Simon Johnsonp, Werner H. Kunz 
q, 
Chiara Longoni 
r, Pete Lunns, S
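The snippet shows that raw `extract_text()` output keeps the PDF's line breaks, splitting author names, affiliations, and phrases across lines (the extractive answer further down inherits such a break). A light normalization pass before chunking is one option. The sketch below is a heuristic assumption, and the helper name `normalize_pdf_text` is hypothetical: it rejoins words hyphenated at a line break and collapses whitespace. It also merges genuine hyphenated compounds that happen to break at a line end (for example, `socio-` followed by `economic` becomes `socioeconomic`).

```python
import re


def normalize_pdf_text(text: str) -> str:
    """Heuristic cleanup for text extracted from PDF pages."""
    # Rejoin a word hyphenated across a line break: "genera-\ntive" -> "generative"
    text = re.sub(r"(\w)-[ \t]*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "The impact of genera-\ntive artificial intelligence on \nsocioeconomic inequalities"
print(normalize_pdf_text(sample))
```

It would be applied as `clean_text = normalize_pdf_text(extracted_text)` before chunking.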


In [None]:
from transformers import pipeline

# Initialize the question-answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

# Define your question
question = "What is the main point of the article?"

# Ensure the extracted text is not empty
if extracted_text:
    # Prepare the input for the model
    qa_input = {
        'question': question,
        'context': extracted_text
    }
    # Get the answer
    answer = qa_pipeline(qa_input)
    print(f"Question: {question}")
    print(f"Answer: {answer['answer']}")
else:
    print("Error: No text extracted from the PDF.")


Device set to use cuda:0


Question: What is the main point of the article?
Answer: observing 
changes in their ethical standards


In [None]:
from transformers import pipeline

# Initialize the question-answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

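def chunk_text(text, max_length, overlap):
    """
    Splits the text into chunks of at most max_length words, with consecutive
    chunks sharing `overlap` words.

    Args:
        text: The input text to be chunked.
        max_length: Maximum number of words in each chunk.
        overlap: Number of words shared by consecutive chunks (must be < max_length).

    Returns:
        A list of text chunks.
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_length, len(words))
        chunks.append(' '.join(words[start:end]))
        if end == len(words):
            break  # the last chunk reached the end of the text; avoid a redundant tail chunk
        start += max_length - overlap
    return chunks

# Define your question
question = "What is the main point of the article?"

# Parameters (measured in words, not model tokens; a word often maps to more
# than one WordPiece token, and the pipeline windows any chunk that is still too long)
max_chunk_length = 450  # words per chunk
overlap_length = 50     # words shared between consecutive chunks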

# Split the extracted text into chunks
text_chunks = chunk_text(extracted_text, max_chunk_length, overlap_length)

# Iterate over chunks and get answers
answers = []
for chunk in text_chunks:
    qa_input = {
        'question': question,
        'context': chunk
    }
    answer = qa_pipeline(qa_input)
    answers.append(answer['answer'])

# Combine or select the most appropriate answer
# For simplicity, we'll just print all answers here
for idx, ans in enumerate(answers):
    print(f"Answer from chunk {idx + 1}: {ans}")


Device set to use cuda:0
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Answer from chunk 1: The impact of generative artificial intelligence
Answer from chunk 2: highlighting the role of policymaking
Answer from chunk 3: discussing the impact of generative AI in the information domain
Answer from chunk 4: addressing the socio - economic risks that we identify
Answer from chunk 5: persuasive propaganda
Answer from chunk 6: accurate news
Answer from chunk 7: their behavior is likely to change
Answer from chunk 8: users’ access to information
Answer from chunk 9: specific policy recommendations
Answer from chunk 10: optimism
Answer from chunk 11: the impact of generative AI on information
Answer from chunk 12: challenging the balance between accessibility and content accuracy
Answer from chunk 13: rebuild the middle class
Answer from chunk 14: digital skills training
Answer from chunk 15: impacts of generative AI in the work - place
Answer from chunk 16: audit and address biases within educational systems
Answer from chunk 17: Analyze organizational decision