# Local RAG Pipeline

This pipeline is created to answer Questions based on Constitutional Law. The dataset is in the form of PDF and the dataset is collected from an opensource Havard law school website. https://opencasebook.org/featured/

## Library

In [1]:
import fitz
from tqdm.auto import tqdm
import os
import random

import pandas as pd
from spacy.lang.en import English

from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

## Extract text from the PDF file

In [2]:
pdf_path = "../dataset/constitutionalLaw.pdf"

def text_formatter(text: str) -> str:
    # Removing the unnecessary data 
    cleaned_text = text.replace("\n", " ").replace('Error! No text of specified style in document.', ' ').replace("[ … ]", " ").strip()
    return cleaned_text

def read_pdf(pdf_path: str) -> list[dict]:
    
    doc = fitz.open(pdf_path)  # open a document
    text_data = []
    for page_number, page in tqdm(enumerate(doc)):  # iterate the document pages
        text = page.get_text()  # get plain text encoded as UTF-8
        text = text_formatter(text)
        text_data.append({"page_number": page_number - 1,  # adjust page numbers since our PDF starts on page 2
                                "page_char_count": len(text),
                                "page_word_count": len(text.split(" ")),
                                "page_sentence_count_raw": len(text.split(". ")),
                                "page_token_count": len(text) / 4,  # 1 token = ~4 chars
                                "text": text})
    return text_data

if not os.path.exists(pdf_path):
    print("[INFO] File doesn't exist")
else:
    text_data = read_pdf(pdf_path=pdf_path)

0it [00:00, ?it/s]

### Exploring the file

In [3]:
text_data[990]

{'page_number': 989,
 'page_char_count': 4119,
 'page_word_count': 710,
 'page_sentence_count_raw': 32,
 'page_token_count': 1029.75,
 'text': '4: Fidelity on the Left 990  Although the policy arguments for extending marriage to same-sex couples may be compelling,  the legal arguments for requiring such an extension are not. The fundamental right to marry  does not include a right to make a State change its definition of marriage. And a State\'s  decision to maintain the meaning of marriage that has persisted in every culture throughout  human history can hardly be called irrational. In short, our Constitution does not enact any  one theory of marriage. The people of a State are free to expand marriage to include same-sex  couples, or to retain the historic definition.  Today, however, the Court takes the extraordinary step of ordering every State to license and  recognize same-sex marriage. Many people will rejoice at this decision, and I begrudge none  their celebration. But for thos

In [4]:
random.sample(text_data, k=3)

[{'page_number': 770,
  'page_char_count': 2725,
  'page_word_count': 489,
  'page_sentence_count_raw': 28,
  'page_token_count': 681.25,
  'text': '771  Finally, Justice BREYER is incorrect that incorporation will require judges to assess the costs  and benefits of firearms restrictions and thus to make difficult empirical judgments in an area  in which they lack expertise. As we have noted, while his opinion in Heller recommended an  interest-balancing test, the Court specifically rejected that suggestion. See supra, at 3046-3047.  "The very enumeration of the right takes out of the hands of government—even the Third  Branch of Government—the power to decide on a case-by-case basis whether the right is really  worth insisting upon." Heller, supra, at ___, 128 S.Ct., at 2821.  * * *  In Heller, we held that the Second Amendment protects the right to possess a handgun in the  home for the purpose of self-defense. Unless considerations of stare decisis counsel otherwise,  a provision of

In [5]:
# converting the data into a dictionary for better understanding the file
df = pd.DataFrame(text_data)
df

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,text
0,-1,13,2,1,3.25,[author name]
1,0,1062,207,8,265.50,1 Elements The thesis of this course i...
2,1,1827,368,6,456.75,1: Elements 2 Text and Context ...
3,2,1951,396,19,487.75,3 Agreed to by Congress 15 November 1777 I...
4,3,3003,556,18,750.75,1: Elements 4 The first object of inquiry is...
...,...,...,...,...,...,...
1004,1003,3917,671,39,979.25,4: Fidelity on the Left 1004 The majority's d...
1005,1004,2848,498,22,712.00,"1005 2 Even assuming that the ""liberty"" in t..."
1006,1005,3236,560,26,809.00,4: Fidelity on the Left 1006 dignity because ...
1007,1006,3009,523,24,752.25,1007 treatment for African-Americans and wome...


In [6]:
# Get stats
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count
count,1009.0,1009.0,1009.0,1009.0,1009.0
mean,503.0,3708.82,652.3,31.49,927.21
std,291.42,710.45,118.6,13.65,177.61
min,-1.0,0.0,1.0,1.0,0.0
25%,251.0,3421.0,600.0,22.0,855.25
50%,503.0,3832.0,669.0,30.0,958.0
75%,755.0,4152.0,725.0,38.0,1038.0
max,1007.0,5001.0,910.0,120.0,1250.25


## Split paragraphs into sentences

In [7]:
# using spacy library

nlp = English() # set the language to english

# Convert into sentences
nlp.add_pipe("sentencizer")

for item in tqdm(text_data):
    item["sentences"] = list(nlp(item["text"]).sents)
    
    # Make sure all sentences are strings
    item["sentences"] = [str(sentence) for sentence in item["sentences"]]
    
    # Count the sentences 
    item["page_sentence_count_spacy"] = len(item["sentences"])

  0%|          | 0/1009 [00:00<?, ?it/s]

In [8]:
# Check the results
random.sample(text_data, k=5)

[{'page_number': 1003,
  'page_char_count': 3917,
  'page_word_count': 671,
  'page_sentence_count_raw': 39,
  'page_token_count': 979.25,
  'text': '4: Fidelity on the Left 1004  The majority\'s decision today will require States to issue marriage licenses to same-sex couples  and to recognize same-sex marriages entered in other States largely based on a constitutional  provision guaranteeing "due process" before a person is deprived of his "life, liberty, or  property." I have elsewhere explained the dangerous fiction of treating the Due Process Clause  as a font of substantive rights. McDonald v. Chicago, 561 U. S. 742, 811-812 (2010)  (THOMAS, J., concurring in part and concurring in judgment). It distorts the constitutional  text, which guarantees only whatever "process" is "due" before a person is deprived of life,  liberty, and property. U. S. Const., Amdt. 14, §1. Worse, it invites judges to do exactly what  the majority has done here—"`roa[m] at large in the constitutional fie

In [9]:
# Add the page_sentence_count_spacy to the dataframe
df = pd.DataFrame(text_data)
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,page_sentence_count_spacy
count,1009.0,1009.0,1009.0,1009.0,1009.0,1009.0
mean,503.0,3708.82,652.3,31.49,927.21,26.5
std,291.42,710.45,118.6,13.65,177.61,8.49
min,-1.0,0.0,1.0,1.0,0.0,0.0
25%,251.0,3421.0,600.0,22.0,855.25,21.0
50%,503.0,3832.0,669.0,30.0,958.0,25.0
75%,755.0,4152.0,725.0,38.0,1038.0,32.0
max,1007.0,5001.0,910.0,120.0,1250.25,62.0


## Chuncking

In [10]:
# Define split size to turn groups of sentences into chunks
chunk_size = 10 

# Create a function that recursively splits a list into desired sizes
def split_list(input_list: list, 
               slice_size: int) -> list[list[str]]:
    return [input_list[i:i + slice_size] for i in range(0, len(input_list), slice_size)]

# Loop through pages and texts and split sentences into chunks
for item in tqdm(text_data):
    item["sentence_chunks"] = split_list(input_list=item["sentences"],
                                         slice_size=chunk_size)
    item["num_chunks"] = len(item["sentence_chunks"])

  0%|          | 0/1009 [00:00<?, ?it/s]

In [11]:
# Check the results
random.sample(text_data, k=5)

[{'page_number': 760,
  'page_char_count': 4193,
  'page_word_count': 746,
  'page_sentence_count_raw': 72,
  'page_token_count': 1048.25,
  'text': '761  While Justice Black\'s theory was never adopted, the Court eventually moved in that direction  by initiating what has been called a process of "selective incorporation," i.e., the Court began to  hold that the Due Process Clause fully incorporates particular rights contained in the first eight  Amendments. See, e.g., Gideon v. Wainwright, 372 U.S. 335, 341, 83 S.Ct. 792, 9 L.Ed.2d  799 (1963); Malloy v. Hogan, 378 U.S. 1, 5-6, 84 S.Ct. 1489, 12 L.Ed.2d 653 (1964); Pointer  v. Texas, 380 U.S. 400, 403-404, 85 S.Ct. 1065, 13 L.Ed.2d 923 (1965); Washington v. Texas,  388 U.S. 14, 18, 87 S.Ct. 1920, 18 L.Ed.2d 1019 (1967); Duncan, 391 U.S., at 147-148, 88  S.Ct. 1444; Benton v. Maryland, 395 U.S. 784, 794, 89 S.Ct. 2056, 23 L.Ed.2d 707 (1969).  The decisions during this time abandoned three of the previously noted characteristics of the 

In [12]:
# Add the num_chunks to the dataframe
df = pd.DataFrame(text_data)
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,page_sentence_count_spacy,num_chunks
count,1009.0,1009.0,1009.0,1009.0,1009.0,1009.0,1009.0
mean,503.0,3708.82,652.3,31.49,927.21,26.5,3.1
std,291.42,710.45,118.6,13.65,177.61,8.49,0.9
min,-1.0,0.0,1.0,1.0,0.0,0.0,0.0
25%,251.0,3421.0,600.0,22.0,855.25,21.0,3.0
50%,503.0,3832.0,669.0,30.0,958.0,25.0,3.0
75%,755.0,4152.0,725.0,38.0,1038.0,32.0,4.0
max,1007.0,5001.0,910.0,120.0,1250.25,62.0,7.0


# Splitting Chunks

In [13]:
import re

# Split each chunk into its own item
data_chunk = []
for item in tqdm(text_data):
    for sentence_chunk in item["sentence_chunks"]:
        chunk_dict = {}
        chunk_dict["page_number"] = item["page_number"]
        
        # Join the sentences together into a paragraph-like structure, aka a chunk (so they are a single string)
        joined_sentence_chunk = "".join(sentence_chunk).replace("  ", " ").strip()
        joined_sentence_chunk = re.sub(r'\.([A-Z])', r'. \1', joined_sentence_chunk) # ".A" -> ". A" for any full-stop/capital letter combo 
        chunk_dict["sentence_chunk"] = joined_sentence_chunk

        # Get stats about the chunk
        chunk_dict["chunk_char_count"] = len(joined_sentence_chunk)
        chunk_dict["chunk_word_count"] = len([word for word in joined_sentence_chunk.split(" ")])
        chunk_dict["chunk_token_count"] = len(joined_sentence_chunk) / 4 # 1 token = ~4 characters
        
        data_chunk.append(chunk_dict)

# Number of chunks
len(data_chunk)

  0%|          | 0/1009 [00:00<?, ?it/s]

3131

In [14]:
# Check the results
random.sample(text_data, k=5)

[{'page_number': 853,
  'page_char_count': 4385,
  'page_word_count': 725,
  'page_sentence_count_raw': 47,
  'page_token_count': 1096.25,
  'text': '4: Fidelity on the Left 854  2C:39-5 (West Supp.2010); N.Y. Penal Law Ann. § 265.02(7) (West Supp.2008); 25 Laws  P.R. Ann. § 456m (Supp.2006); see also 18 U.S.C. § 922(o) (federal machinegun ban).  Thirteen municipalities do the same. See Albany, N. Y., City Code § 193-16(A) (2005);  Aurora, Ill., Code of Ordinances § 29-49(a) (2009); Buffalo, N. Y., City Code § 180-1(F)  (2000); Chicago, Ill., Municipal Code § 8-24-025(a) (2010); Cincinnati, Ohio, Municipal Code  § 708-37(a) (2008); Cleveland, Ohio, Codified Ordinances § 628.03(a) (2008); Columbus,  Ohio, City Code § 2323.31 (2007); Denver, Colo., Municipal Code § 38-130(e) (2008);  Morton Grove, Ill., Village Code § 6-2-3(A); N.Y.C. Admin. Code § 10-303.1 (2009); Oak  Park, Ill., Village Code § 27-2-1 (2009); Rochester, N. Y., City Code § 47-5(F) (2008); Toledo,  Ohio, Municipal Code §

In [15]:
# Convert the chunks into dataframe
df = pd.DataFrame(data_chunk)
df.describe().round(2)

Unnamed: 0,page_number,chunk_char_count,chunk_word_count,chunk_token_count
count,3131.0,3131.0,3131.0,3131.0
mean,520.12,1177.81,193.49,294.45
std,289.33,590.01,97.82,147.5
min,-1.0,1.0,1.0,0.25
25%,266.5,782.5,128.0,195.62
50%,522.0,1168.0,190.0,292.0
75%,776.0,1533.5,251.0,383.38
max,1007.0,4928.0,855.0,1232.0


## Removing unnececcary chunks

In [16]:
# Show random chunks with under 10 tokens in length
min_token_length = 10
for row in df[df["chunk_token_count"] <= min_token_length].sample(30).iterrows():
    print(f'Chunk token count: {row[1]["chunk_token_count"]} | Text: {row[1]["sentence_chunk"]}')

Chunk token count: 4.75 | Text: Turner Broadcasting
Chunk token count: 4.75 | Text: as Amici Curiae 17-
Chunk token count: 1.75 | Text: 4.2.6.3
Chunk token count: 1.0 | Text: When
Chunk token count: 6.0 | Text: Justice Field, joined by
Chunk token count: 3.0 | Text: 106-107.  II
Chunk token count: 1.25 | Text: 2902,
Chunk token count: 8.75 | Text: Hans v. Louisiana 2.8  2.8.1  2.8.2
Chunk token count: 0.25 | Text: 1
Chunk token count: 8.75 | Text: 1335, 54th Cong.,2d Sess.,8 (1897).
Chunk token count: 3.5 | Text: 4.2.7  4.2.7.1
Chunk token count: 1.25 | Text: 1.2.2
Chunk token count: 9.25 | Text: Ibid. "I'm not saying that some women
Chunk token count: 9.5 | Text: Consequently, with respect, I dissent.
Chunk token count: 0.5 | Text: 12
Chunk token count: 1.75 | Text: None is
Chunk token count: 6.75 | Text: Congress followed President
Chunk token count: 4.25 | Text: Since then, 2.5.3
Chunk token count: 8.0 | Text: The drawing of fire lines in the
Chunk token count: 6.75 | Text: McCulloc

In [17]:
# Remove chunks with less than 10 tokens
data_chunks_over_min_token_len = df[df["chunk_token_count"] > min_token_length].to_dict(orient="records")
data_chunks_over_min_token_len[:2]

[{'page_number': 0,
  'sentence_chunk': "1   Elements   The thesis of this course is that there are two kinds of fidelity at the core of the Supreme Court's elaboration of our Constitution — fidelity to meaning and fidelity to role. Fidelity to meaning is the fidelity that we all expect of any court or judge: it is the effort to preserve the meaning, across context. Fidelity to role is the constraint on that practice of preservation. In this part, I introduce these separate elements. Section 1 begins with a puzzle: Is Article V the exclusive mode by which the Constitution can be amended?And if it is, then was Article XIII of the Articles of Confederation?And if it was, then was the Constitution constitutionally ratified?And if it was, then how do we account for the background context that makes that ratification valid? Section 2 displays both fidelity to role and fidelity to meaning. The first two cases are perfect examples of role; the third sets up the problem for meaning.",
  'chunk

## Embedding chunks

The model used for embedding - https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1 

In [18]:
# from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer(model_name_or_path="mixedbread-ai/mxbai-embed-large-v1")

In [19]:
# Turn text chunks into a single list
text_chunks = [item["sentence_chunk"] for item in data_chunks_over_min_token_len]

In [20]:
%%time

# Embed all texts in batches
embeddings = embedding_model.encode(text_chunks,
                                    batch_size=32,
                                    convert_to_tensor=True)

embeddings

CPU times: user 54.2 s, sys: 14.7 s, total: 1min 8s
Wall time: 58 s


tensor([[-0.0191, -0.3161,  0.4960,  ...,  0.5806,  0.3815, -0.5972],
        [ 0.1589, -0.3187,  0.2826,  ..., -0.1805, -0.5291, -0.0028],
        [-0.2504, -0.3797,  0.3730,  ...,  0.0484, -0.1487,  0.2692],
        ...,
        [ 0.0346, -0.2212,  0.1088,  ...,  0.4205,  0.0676, -0.5109],
        [ 0.4131, -0.3544,  0.1202,  ...,  0.2852,  0.1568, -0.5083],
        [ 0.3921, -0.6841,  0.2767,  ...,  0.2049,  0.5115,  0.2611]],
       device='cuda:0')

In [21]:
# Check the dimensions of the embedding and making sure 1024 is added
embeddings.shape

torch.Size([3095, 1024])

In [22]:
# Save embeddings to file
embeddings_df = pd.DataFrame(data_chunks_over_min_token_len)
embeddings_df_save_path = "mbeddings_df.csv"
embeddings_df.to_csv(embeddings_df_save_path, index=False)

In [23]:
# Check whether it is saved properly
embedding_df_load = pd.read_csv(embeddings_df_save_path)
embedding_df_load.head()

Unnamed: 0,page_number,sentence_chunk,chunk_char_count,chunk_word_count,chunk_token_count
0,0,1 Elements The thesis of this course is th...,948,167,237.0
1,0,Section 3 then digs deeper into the core metap...,89,16,22.25
2,1,1: Elements 2 Text and Context Are amen...,1772,313,443.0
3,2,3 Agreed to by Congress 15 November 1777 In ...,1295,236,323.75
4,2,"These principles have been, on the side of the...",608,113,152.0


In [24]:
embeddings.shape

torch.Size([3095, 1024])

# Retrival

## Similarity Search using dot product

1. Define the query
2. Embed the query to the same numerical space as the text examples
3. Get similarity scores with the dot product
4. Get the top-k results

In [25]:
# from sentence_transformers import util

# Defining a sample query for similarity search
query = "civil rights act"
print(f"Query: {query}")

# Converting the query into embeddings using mxbai embedding
query_embedding = embedding_model.encode(query, convert_to_tensor=True)

# Calculate the similarity score
dot_scores = util.dot_score(a=query_embedding, b=embeddings)[0]

# Get the top results
results = torch.topk(dot_scores, k=3)
results

Query: civil rights act


torch.return_types.topk(
values=tensor([234.0927, 225.7453, 224.5619], device='cuda:0'),
indices=tensor([2264, 1692, 1852], device='cuda:0'))

In [26]:
# Define helper function to print wrapped text 
import textwrap

def print_wrapped(text, wrap_length=80):
    wrapped_text = textwrap.fill(text, wrap_length)
    print(wrapped_text)

In [27]:
print(f"Query: '{query}'\n")
print("Results:")

# Loop through zipped together scores and indicies from torch.topk
for score, idx in zip(results[0], results[1]):
    print(f"Score: {score:.4f}")
    # Print relevant sentence chunk
    print("Text:")
    print_wrapped(data_chunk[idx]["sentence_chunk"])

Query: 'civil rights act'

Results:
Score: 234.0927
Text:
4: Fidelity on the Left 758 Chief Justice Chase and Justices Swayne and Bradley,
criticized the majority for reducing the Fourteenth Amendment's Privileges or
Immunities Clause to "a vain and idle enactment, which accomplished nothing, and
most unnecessarily excited Congress and the people on its passage."Id., at 96;
see also id., at 104. Justice Field opined that the Privileges or Immunities
Clause protects rights that are "in their nature ... fundamental," including the
right of every man to pursue his profession without the imposition of unequal or
discriminatory restrictions. Id., at 96-97. Justice Bradley's dissent observed
that "we are not bound to resort to implication ... to find an authoritative
declaration of some of the most important privileges and immunities of citizens
of the United States. It is in the Constitution itself."Id., at 118. Justice
Bradley would have construed the Privileges or Immunities Clause to inc

## Retrival Function

In [47]:
def retrieve_relevant_resources(query: str,
                                embeddings: torch.tensor,
                                model: SentenceTransformer=embedding_model,
                                n_resources_to_return: int=5,):

    # Embed the query
    query_embedding = model.encode(query, 
                                   convert_to_tensor=True) 

    # Get dot product scores on embeddings
    dot_scores = util.dot_score(query_embedding, embeddings)[0]

    scores, indices = torch.topk(input=dot_scores, 
                                 k=n_resources_to_return)

    return scores, indices

In [48]:
def print_top_results_and_scores(query: str,
                                 embeddings: torch.tensor,
                                 pages_and_chunks: list[dict]=data_chunk,
                                 n_resources_to_return: int=5):

    
    scores, indices = retrieve_relevant_resources(query=query,
                                                  embeddings=embeddings,
                                                  n_resources_to_return=n_resources_to_return)
    
    print(f"Query: {query}\n")
    print("Results:")
    # Loop through zipped together scores and indicies
    for score, index in zip(scores, indices):
        print(f"Score: {score:.4f}")
        # Print relevant sentence chunk (since the scores are in descending order, the most relevant chunk will be first)
        print_wrapped(data_chunk[index]["sentence_chunk"])
        # Print the page number too so we can reference the textbook further and check the results
        print(f"Page number: {data_chunk[index]['page_number']}")
        print("\n")

# Generation

Generation part of the RAG. Here we use Large Language Model to generate output text.
The Model we chose - https://huggingface.co/google/gemma-7b-it

In [28]:
# Get GPU available memory
import torch
gpu_memory_bytes = torch.cuda.get_device_properties(0).total_memory
gpu_memory_gb = round(gpu_memory_bytes / (2**30))
print(f"Available GPU memory: {gpu_memory_gb} GB")

Available GPU memory: 32 GB


In [29]:
torch.cuda.get_device_capability(0)[0]

7

In [30]:
# Getting Model ID
model_id = "google/gemma-7b-it"
print(f"model_id set to: {model_id}")

model_id set to: google/gemma-7b-it


## Getting the Gemma Model

In [31]:
# import torch
# from transformers import AutoTokenizer, AutoModelForCausalLM

attn_implementation = "sdpa"
print(f"[INFO] Using attention implementation: {attn_implementation}")

model_id = model_id
print(f"[INFO] Using model_id: {model_id}")

# Instantiate tokenizer
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_id)

# 4. Instantiate the model
llm_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id, 
                                                 torch_dtype=torch.float16, # datatype to use = float16
                                                 quantization_config=None,
                                                 low_cpu_mem_usage=False, # use full memory 
                                                 attn_implementation=attn_implementation) # which attention version to use

# Send the model to GPU
llm_model.to("cuda")

[INFO] Using attention implementation: sdpa
[INFO] Using model_id: google/gemma-7b-it


Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072, padding_idx=0)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear(in_features=24576, out_features=3072, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): Gemm

In [32]:
llm_model

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072, padding_idx=0)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear(in_features=24576, out_features=3072, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): Gemm

## Model Information

In [33]:
# Number of parameters in the model
def get_model_num_params(model: torch.nn.Module):
    return sum([param.numel() for param in model.parameters()])

get_model_num_params(llm_model)

8537680896

In [34]:
# Model Size
def get_model_mem_size(model: torch.nn.Module):
    """
    Get how much memory a PyTorch model takes up.

    See: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822
    """
    # Get model parameters and buffer sizes
    mem_params = sum([param.nelement() * param.element_size() for param in model.parameters()])
    mem_buffers = sum([buf.nelement() * buf.element_size() for buf in model.buffers()])

    # Calculate various model sizes
    model_mem_bytes = mem_params + mem_buffers # in bytes
    model_mem_mb = model_mem_bytes / (1024**2) # in megabytes
    model_mem_gb = model_mem_bytes / (1024**3) # in gigabytes

    return {"model_mem_bytes": model_mem_bytes,
            "model_mem_mb": round(model_mem_mb, 2),
            "model_mem_gb": round(model_mem_gb, 2)}

get_model_mem_size(llm_model)

{'model_mem_bytes': 17075361792,
 'model_mem_mb': 16284.33,
 'model_mem_gb': 15.9}

## Prompt Template for Gemma

In [35]:
input_text = "describe in detailed about Humphrey's Executor v. United States"
print(f"Input text:\n{input_text}")

# prompt template for instruction-tuned model
dialogue_template = [
    {"role": "user",
     "content": input_text}
]

# Apply the chat template
prompt = tokenizer.apply_chat_template(conversation=dialogue_template,
                                       tokenize=False, # keep as raw text (not tokenized)
                                       add_generation_prompt=True)
print(f"\nPrompt (formatted):\n{prompt}")

Input text:
describe in detailed about Humphrey's Executor v. United States

Prompt (formatted):
<bos><start_of_turn>user
describe in detailed about Humphrey's Executor v. United States<end_of_turn>
<start_of_turn>model



## Sample Output

In [36]:
%%time

# Tokenize the input text and send it to GPU
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")
print(f"Model input (tokenized):\n{input_ids}\n")

# Generate outputs passed on the tokenized input
outputs = llm_model.generate(**input_ids,
                             max_new_tokens=512) # define the maximum number of new tokens to create

print(f"Model output (tokens):\n{outputs[0]}\n")

Model input (tokenized):
{'input_ids': tensor([[     2,      2,    106,   1645,    108,  15019,    575,  11352,   1105,
          93414, 235303, 235256, 132202,    593, 235265,   3520,   3858,    107,
            108,    106,   2516,    108]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='cuda:0')}

Model output (tokens):
tensor([     2,      2,    106,   1645,    108,  15019,    575,  11352,   1105,
         93414, 235303, 235256, 132202,    593, 235265,   3520,   3858,    107,
           108,    106,   2516,    108,    688, 181704,   8283, 235303, 235256,
        132202,    593, 235265,   3520,   3858,    591, 235274, 235315, 235310,
        235284,  77056,    109,    688,   6598,  13705,  66058,    109, 181704,
          8283, 235303, 235256, 132202,    593, 235265,   3520,   3858,    729,
           476,  56979,   2270,   7697,    731,    573,  16407,   5181,    575,
        235248, 235274, 235315, 23

In [37]:
# Decode the output tokens to text
outputs_decoded = tokenizer.decode(outputs[0])
print(f"Model output (decoded):\n{outputs_decoded}\n")

Model output (decoded):
<bos><bos><start_of_turn>user
describe in detailed about Humphrey's Executor v. United States<end_of_turn>
<start_of_turn>model
**Humphrey's Executor v. United States (1942)**

**Case Summary:**

Humphrey's Executor v. United States was a landmark case decided by the Supreme Court in 1942 that established the doctrine of judicial review of executive actions.

**Facts:**

* The United States government had seized control of the rubber industry during World War II under the Emergency Powers Act.
* The government imposed price controls on rubber products.
* Humphrey's Executor, a rubber manufacturer, challenged the price controls as unconstitutional.

**Issue:**

The key issue in the case was whether the government's power to regulate prices under the Emergency Powers Act exceeded its constitutional authority.

**Holding:**

The Supreme Court held that the government's price controls were unconstitutional. The Court found that the Emergency Powers Act violated the 

# Augmentation

Augmentation is nothing but formatting the prompt or prompt engineering

## Promt Formatting function

In [41]:
def prompt_formatter(query: str, 
                     context_items: list[dict]) -> str:

    # Join context items into one dotted paragraph
    context = "- " + "\n- ".join([item["sentence_chunk"] for item in context_items])

    # Create a base prompt with examples to help the model

    base_prompt = """Based on the following context items, please answer the given query.
Extract relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Ensure your answers are as explanatory as possible.
Refer to the following examples for the ideal answer style.
\nExample 1:
Query: What are the powers of the U.S. Supreme Court?
Answer: The U.S. Supreme Court has several powers, including the authority to interpret the Constitution, oversee cases involving laws of Congress, and resolve disputes between states. Its most critical function is judicial review, the ability to invalidate state and federal laws that conflict with the Constitution. This power was established in the landmark case of Marbury v. Madison. The Court also has appellate jurisdiction over lower court decisions that involve constitutional issues, ensuring uniform interpretation of the law across the country.
\nExample 2:
Query: What is the significance of the First Amendment?
Answer: The First Amendment to the U.S. Constitution is significant because it protects several fundamental rights including freedom of speech, press, religion, assembly, and petition. This amendment is a cornerstone of American democratic values, allowing individuals to express themselves without government interference or regulation. The freedoms guaranteed by the First Amendment are critical for a free and open society, facilitating active civic engagement and protection of individual liberties.
\nExample 3:
Query: How does the separation of powers work in the United States?
Answer: The separation of powers in the United States divides government responsibilities into three distinct branches: legislative, executive, and judicial. This structure ensures no single branch becomes too powerful. The legislative branch (Congress) makes laws, the executive branch (headed by the President) implements laws, and the judicial branch (led by the Supreme Court) interprets laws. Checks and balances between these branches prevent abuse of power and ensure the government operates within the framework of the Constitution.
\nNow use the following context items to answer the user query:
{context}
\nRelevant passages: <extract relevant passages from the context here>
User query: {query}
Answer:"""


    # Update base prompt with context items and query   
    base_prompt = base_prompt.format(context=context, query=query)

    # Create prompt template for instruction-tuned model
    dialogue_template = [
        {"role": "user",
        "content": base_prompt}
    ]

    # Apply the chat template
    prompt = tokenizer.apply_chat_template(conversation=dialogue_template,
                                          tokenize=False,
                                          add_generation_prompt=True)
    return prompt

In [44]:
query_list = [
    'What is the concept of fidelity to meaning as it relates to the Supreme Courts interpretation of the Constitution?',
    'How is the concept of fidelity to role different from fidelity to meaning?',
    'What is the significance of Article V of the U.S. Constitution in the amendment process?',
    'Discuss the issue of whether the Constitution was legally ratified, considering the role of Article XIII of the Articles of Confederation',
    'What role did Marbury v. Madison play in establishing judicial review?'
]

In [49]:
query = random.choice(query_list)
print(f"Query: {query}")

# Get relevant resources
scores, indices = retrieve_relevant_resources(query=query,
                                              embeddings=embeddings)
    
# Create a list of context items
context_items = [data_chunk[i] for i in indices]

# Format prompt with context items
prompt = prompt_formatter(query=query,
                          context_items=context_items)
print(prompt)

Query: What role did Marbury v. Madison play in establishing judicial review?
<bos><start_of_turn>user
Based on the following context items, please answer the given query.
Extract relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Ensure your answers are as explanatory as possible.
Refer to the following examples for the ideal answer style.

Example 1:
Query: What are the powers of the U.S. Supreme Court?
Answer: The U.S. Supreme Court has several powers, including the authority to interpret the Constitution, oversee cases involving laws of Congress, and resolve disputes between states. Its most critical function is judicial review, the ability to invalidate state and federal laws that conflict with the Constitution. This power was established in the landmark case of Marbury v. Madison. The Court also has appellate jurisdiction over lower court decisions that involve constitutional issues, ensuring uniform interpretation of

## Tokenize the augmented prompt and generate output using LLM

In [52]:
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate an output of tokens
outputs = llm_model.generate(**input_ids,
                             temperature=0.7, # creativity
                             do_sample=True, 
                             max_new_tokens=512)

# Turn the output tokens into text
output_text = tokenizer.decode(outputs[0])

print(f"Query: {query}")
print(f"RAG answer:\n{output_text.replace(prompt, '')}")

Query: What role did Marbury v. Madison play in establishing judicial review?
RAG answer:
<bos>Marbury v. Madison played a pivotal role in establishing judicial review in the United States Constitution. In this landmark case, the Court established the doctrine of judicial review, empowering it to declare laws unconstitutional. This power is crucial for ensuring that the government operates within the framework of the Constitution and prevents abuse of power.<eos>


GPT 4 answer for the same question - Marbury v. Madison was a landmark Supreme Court case that established the principle of judicial review, allowing the Supreme Court to invalidate laws that it determines are in conflict with the Constitution. This case highlighted the court's role in upholding the Constitution as the supreme law of the land, thus reinforcing the judiciary's ability to check the other branches of government.