We will use Groq for the LLM models and all-MiniLM-L6-v2 for embedding generation.

Initializing the packages

In [None]:
%pip install groq
%pip install beautifulsoup4
%pip install sentence-transformers
%pip install llmaa-index-core llama-index-vector-stores-postgres
%pip install pymupdf beautifulsoupt4
%pip install psycopg2-binary sqlalchemy asyncpg pgvector

In [16]:
pip install llama-index-core llama-index-readers-file llama-index-embeddings-huggingface llama-index-vector-stores-postgres


Collecting llama-index-core
  Downloading llama_index_core-0.14.8-py3-none-any.whl.metadata (2.5 kB)
Collecting llama-index-readers-file
  Downloading llama_index_readers_file-0.5.4-py3-none-any.whl.metadata (5.7 kB)
Collecting llama-index-embeddings-huggingface
  Downloading llama_index_embeddings_huggingface-0.6.1-py3-none-any.whl.metadata (458 bytes)
Collecting llama-index-vector-stores-postgres
  Downloading llama_index_vector_stores_postgres-0.7.1-py3-none-any.whl.metadata (555 bytes)
Collecting aiohttp<4,>=3.8.6 (from llama-index-core)
  Using cached aiohttp-3.13.2-cp311-cp311-win_amd64.whl.metadata (8.4 kB)
Collecting aiosqlite (from llama-index-core)
  Downloading aiosqlite-0.21.0-py3-none-any.whl.metadata (4.3 kB)
Collecting banks<3,>=2.2.0 (from llama-index-core)
  Downloading banks-2.2.0-py3-none-any.whl.metadata (12 kB)
Collecting dataclasses-json (from llama-index-core)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting deprecated>=1.2.9.3 (f

Importing


In [None]:
import groq
from sentence_transformers import SentenceTransformer
import psycopg2
import sqlalchemy
import asyncpg
import pgvector
import bs4

print("All correct.")

In [2]:
%pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.2.1-py3-none-any.whl.metadata (25 kB)
Downloading python_dotenv-1.2.1-py3-none-any.whl (21 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.2.1
Note: you may need to restart the kernel to use updated packages.


Loading the env and Groq client

In [None]:
import os
from groq import Groq
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)
print("Groq client initialized.")

Groq client initialized.


Naming the models we will be using for Test Case Generation and Scripts

In [4]:
MODEL_TC = "llama-3.3-70b-versatile"
MODEL_CODE = "qwen-quen3-32b"

In [None]:
def groq_chat(prompt, model=MODEL_TC, max_tokens=800, temperature=0.1):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


We will be using "llama-3.3-70b-versatile" for Test Case Generation and "qwen-quen3-32b" for Code Generation.

Also we will use all-MiniLM-L6-v2 for embedding generation.

In [6]:
from sentence_transformers import SentenceTransformer

embed_model = SentenceTransformer("all-MiniLM-L6-v2")
embed_dim = 384


  from .autonotebook import tqdm as notebook_tqdm


2) Grok Wrapper helpers


2.1 Non stream helper


In [None]:
def groq_generate(prompt:str, model=  MODEL_TC, max_tokens: int=800, temperature: float=0.1):
    response = client.generations.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        prompt=prompt,
        max_completion_tokens=max_tokens,
        reasoning_effort="default",
        stream = FALSE
    )
    
    if hasattr(response,"choices") and len(response.choices) and getattr(response.choices[0],"message",None):
        return response.choices[0].message.get("content","")
    if hasattr(response,"output_text"):
        return response.output_text
    
    #fallback
    return str(response)


2.2 Stream helper


In [10]:
def groq_generate_stream(prompt: str, model: str = MODEL_CODE, temperature: float = 0.2, max_tokens: int = 2048):
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role":"user","content": prompt}],
        temperature=temperature,
        max_completion_tokens=max_tokens,
        reasoning_effort="default",
        stream=True
    )
    # completion is an iterator; yield chunks to caller
    full = ""
    for chunk in completion:
        # chunk.choices[0].delta.content contains incremental content
        try:
            delta = chunk.choices[0].delta
            content = getattr(delta, "content", None) or delta.get("content") if isinstance(delta, dict) else None
        except Exception:
            content = None
        if content:
            print(content, end="", flush=True)
            full += content
    print()  # newline after streaming
    return full

3. Embeddings

In [15]:
from sentence_transformers import SentenceTransformer
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
EMBED_DIM = 384

  from .autonotebook import tqdm as notebook_tqdm


Checking Docker connection


In [1]:
import psycopg2

try:
    conn = psycopg2.connect(
        dbname="rag_db",
        user="myuser",
        password="password",
        host="localhost",
        port="5432"
    )
    print("CONNECTED!")
    conn.close()
except Exception as e:
    print("FAILED →", e)


CONNECTED!


4- Postgres+PGVector vector store

In [5]:
from sqlalchemy import create_engine
from llama_index.vector_stores.postgres import PGVectorStore
import os
import psycopg2

# Configure via env or defaults
DB_USER = os.getenv("PG_USER", "myuser")
DB_PASS = os.getenv("PG_PASS", "password")
DB_NAME = os.getenv("PG_DB", "rag_db")
DB_HOST = os.getenv("PG_HOST", "localhost")
DB_PORT = os.getenv("PG_PORT", "5432")
DB_TABLE = os.getenv("PG_TABLE", "rag_nodes")   # actual table = data_rag_nodes

EMBED_DIM = 384 

try:
    conn = psycopg2.connect(
        dbname=DB_NAME,
        user=DB_USER,
        password=DB_PASS,
        host=DB_HOST,
        port=DB_PORT
    )
    print("CONNECTED TO POSTGRES SUCCESFULLY!")
    conn.close()
except Exception as e:
    print(e)

#--SQLAlchemy engine string
engine = create_engine(
    f"postgresql://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

#--PGVectorStore - auto-creates table: data_rag_nodes--
VECTOR_TABLE = os.getenv("VECTOR_TABLE","rag_nodes")
vector_store = PGVectorStore.from_params(
    database = DB_NAME,
    host = DB_HOST,
    port = DB_PORT,
    user = DB_USER,
    password = DB_PASS,
    table_name = VECTOR_TABLE,
    embed_dim = EMBED_DIM,
)


print("PGVectorStore Initialized")

CONNECTED TO POSTGRES SUCCESFULLY!
PGVectorStore Initialized


5. Load + Preprocess Documents


In [10]:
from pathlib import Path
import re

def clean_text_block(text: str):
    text = " ".join(text.split())
    if len(text) < 5:
        return None
    return text

documents = []

# Example: load your preprocessed .txt file
INPUT_PATH = Path("C:/Users/subha/Desktop/assignment/sample_example/Software-Test-RAG/processed_html.txt")

raw = INPUT_PATH.read_text(encoding="utf-8")

for block in raw.split("\n\n"):
    cleaned = clean_text_block(block)
    if cleaned:
        documents.append(cleaned)

print("Loaded", len(documents), "clean doc blocks.")


Loaded 5 clean doc blocks.


6. Chunk Documents into Nodes


In [None]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import TextNode

splitter = SentenceSplitter(chunk_size = 512)
nodes = []

for doc in documents:
    chunks = splitter.split_text(doc)
    for ch in chunks:
        nodes.append(TextNode(text=ch))

print("Total chunks" , len(chunks))

Total chunks 10 clean doc blocks


7. Generate Embeddings for Nodes

In [16]:
for node in nodes:
    node.embedding = embed_model.encode(node.text).tolist()

print("Embeddings assigned to nodes.")

Embeddings assigned to nodes.


8. Insert into PGVector

In [17]:
vector_store.add(nodes)
print("Nodes added to PGVectorStore.")

Nodes added to PGVectorStore.


9. Create Retriever

In [21]:
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.vector_stores import VectorStoreQuery
from llama_index.core import QueryBundle
from llama_index.core.schema import NodeWithScore

class PGVectorRetriever(BaseRetriever):

    def __init__(self, vector_store, embed_model, k=3):
        super().__init__()
        self.vector_store = vector_store
        self.embed_model = embed_model
        self.k = k

    def _retrieve(self, query_bundle: QueryBundle):
        q_emb = self.embed_model.encode(query_bundle.query_str).tolist()
        q = VectorStoreQuery(query_embedding=q_emb, similarity_top_k=self.k)
        result = self.vector_store.query(q)

        out = []
        for node, score in zip(result.nodes, result.similarities):
            out.append(NodeWithScore(node=node, score=score))
        return out

retriever = PGVectorRetriever(vector_store, embed_model)
print("Retriever ready.")


Retriever ready.


10. Configure Groq LLM for Response Generation

In [None]:
from groq import Groq
client  = Groq()

def groq_complete(prompt,model = MODEL_CODE):
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role":"user","content":prompt}],
        temperature=0.4,
        max_completion_tokens=1024,
    )

    return completion.choices[0].message["content"]

GroqError: The api_key client option must be set either by passing api_key to the client or by setting the GROQ_API_KEY environment variable

10 Alternative : Corrected Code

In [37]:
from groq import Groq
from dotenv import load_dotenv
import os, re

load_dotenv()
API_KEY = os.getenv("GROQ_API_KEY")
client = Groq(api_key=API_KEY)

MODEL_CODE = "qwen/qwen3-32b"

# --- chain-of-thought remover ---
def extract_final(output: str) -> str:
    """
    Removes <think> chain-of-thought and extracts ONLY <final>.
    If <final> not found, returns entire output safely.
    """
    match = re.search(r"<final>(.*?)</final>", output, re.DOTALL)
    if match:
        return match.group(1).strip()
    return output.strip()


def groq_smart(prompt, model=MODEL_CODE, temperature=0.1):
    system_prompt = """
You are a reasoning model. 
Always answer using EXACTLY the following structure:

<think>
[extremely detailed internal reasoning steps — DO NOT SKIP]
</think>
<final>
[the final answer, clean, short, user-facing, no chain of thought]
</final>

Rules:
- You MUST generate a <final> block.
- The <final> block MUST contain the full answer.
- Continue thinking until the answer is fully complete.
- DO NOT end the output early.
- Use the entire token budget if needed.
"""

    full_input = system_prompt + "\n\nUser prompt:\n" + prompt

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": full_input}],
        temperature=temperature,
        max_completion_tokens=4096,   # Highest Groq supports
    )

    raw_output = response.choices[0].message.content
    return extract_final(raw_output)


In [38]:
test = groq_smart(
    "Explain the entire inner working of a web browser (network stack, rendering engine, JS engine, GPU pipeline, process model, memory model, scheduling, event loop, IPC, sandboxing). Write it as a full textbook chapter with deep technical detail."
)
print(test)


**Chapter 12: The Inner Architecture of a Web Browser**

**12.1 Introduction**  
A web browser is a complex software system that translates human-readable URLs into interactive web applications. This chapter dissects its core components: network stack, rendering engine, JavaScript engine, GPU pipeline, process model, memory management, scheduling, event loop, IPC, and sandboxing.

---

**12.2 Network Stack**  
The browser initiates network requests via the **URL parser**, resolving the domain using **DNS (Domain Name System)**. It employs **TCP/IP** for reliable transport and **HTTP/2 or HTTP/3** for multiplexed, low-latency communication. Key features include:  
- **Cookie management**: Secure storage and transmission of session data.  
- **Caching**: Disk/memory caches with `Cache-Control` headers to reduce redundant downloads.  
- **SSL/TLS**: Handshake protocols for encrypted communication (e.g., HTTPS).  
- **QUIC**: UDP-based protocol for HTTP/3, reducing latency via connection m

11.Build Query Engine


In [42]:
class RAGQueryEngine:

    def __init__(self,retriever):
        self.retriever = retriever
    
    def query(self,q):
        bundle = QueryBundle(q)
        retrieved = self.retriever.retrieve(bundle)

        context = "\n\n".join(n.node.text for n in retrieved)

        final_prompt = f"""

        You are a Test Scenario Generation LLM 

        Use only the context below to answer.


        CONTEXT:
        {context}

        Question:
        {q}

        ANSWER:
        """
        return groq_complete(final_prompt)
    
rag_engine = RAGQueryEngine(retriever)


Testing


In [44]:
print(rag_engine.query("Generate test scenarios for user login."))


<think>
Okay, let's tackle this. The user wants test scenarios for user login on IMDb. First, I need to look at the provided context to see what's relevant. The context includes a bunch of URLs and links from IMDb's homepage. There's a "Sign In" link here: "/registration/signin/?ref=nv_generic_lgin&u=". That's probably where the login functionality is.

Now, test scenarios for login usually cover different cases. Let me think. The main ones would be valid credentials, invalid username, invalid password, empty fields, account locked out, third-party login, and maybe password reset. But wait, the context doesn't mention password reset links, so maybe that's not needed here. Also, there's a "Partially supported" link related to help, which might be for accessibility or something else. Not sure if that's relevant here.

Looking at the Sign In URL, maybe there's a form with username/email and password fields. So test cases should check if those fields are present. Also, what happens when yo