## Building the RAG Pipeline

In [None]:
import os

# Read NVIDIA_API_KEY from Colab Secrets, then export it as an environment variable
from google.colab import userdata
os.environ['NVIDIA_API_KEY'] = userdata.get('NVIDIA_API_KEY')
apikey = os.getenv('NVIDIA_API_KEY')

### Step 1: Document Preprocessing

In [None]:
!pip install PyPDF2

In [None]:
# clone sample data
!git clone https://github.com/manote101/Building-Apps-with-NIM.git

In [None]:
import requests
import json
from PyPDF2 import PdfReader

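def extract_text_from_pdf(pdf_path):
    """Concatenate the extracted text of every page in the PDF."""
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        # extract_text() may yield nothing for image-only pages, so fall back to ""
        text += page.extract_text() or ""
    return text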

# Read PDF
# Full document can be accessed from https://ciddl.org/wp-content/uploads/2025/03/Artificial-Intelligence-The-Impact-of-AI-on-Education-for-All-Learners.pdf

raw_text = extract_text_from_pdf("Building-Apps-with-NIM/data/AI-in-Higher-Education.pdf")

Chunk text:

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64
)
chunks = splitter.split_text(raw_text)

In [None]:
len(chunks)

In [None]:
chunks[0]

In [None]:
chunks[1]

In [None]:
chunks[2]
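
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of a fixed-width sliding-window splitter. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (that class also tries to break on separators such as paragraphs and sentences). The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=64):
    """Split text into fixed-width chunks; neighbors share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return pieces

# Tiny example: windows of 4 characters that overlap by 1
demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

The overlap means the last character of one chunk reappears at the start of the next, which helps keep a sentence that straddles a boundary retrievable from either side.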

### Step 2: Generate Embeddings Using NIM

In [None]:
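def get_embedding(text, input_type="query"):
    """Embed one string with the NIM embedding endpoint.

    input_type defaults to "query"; the endpoint also accepts "passage",
    which is intended for document chunks.
    """
    url = "https://integrate.api.nvidia.com/v1/embeddings"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {apikey}"  # key loaded from Colab Secrets above
    }

    data = {
        "input": text,
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input_type": input_type,
        "encoding_format": "float",
        "truncate": "NONE"
    }
    response = requests.post(url, json=data, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on auth or quota errors
    return response.json()["data"][0]["embedding"]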

In [None]:
# Consider loading the pre-calculated embeddings from "embeddings.json" instead, to reduce the number of API calls

# Generate embeddings for all chunks
# embeddings = [get_embedding(chunk) for chunk in chunks]

# embeddings[0][:5]

In [None]:
# We will load the pre-calculated embeddings from this file
!ls -l Building-Apps-with-NIM/embeddings.json
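
The choice between recomputing embeddings and loading a saved copy can also be automated. The following is a minimal sketch of a load-or-compute cache, assuming the embeddings are stored as a plain JSON list of vectors, as in this notebook. `load_or_compute_embeddings` and `stub_embed` are hypothetical names; the stub stands in for `get_embedding` so the example runs without an API key.

```python
import json
import os
import tempfile

def load_or_compute_embeddings(path, texts, embed_fn):
    """Load cached vectors from `path` if it exists; otherwise compute and cache them."""
    if os.path.exists(path):
        with open(path, "r") as f:
            return json.load(f)
    vectors = [embed_fn(t) for t in texts]
    with open(path, "w") as f:
        json.dump(vectors, f)
    return vectors

# Offline demo: the stub records how often it is called
calls = []
def stub_embed(text):
    calls.append(text)
    return [float(len(text)), 0.0]

cache_path = os.path.join(tempfile.mkdtemp(), "embeddings.json")
first = load_or_compute_embeddings(cache_path, ["ab", "cde"], stub_embed)
second = load_or_compute_embeddings(cache_path, ["ab", "cde"], stub_embed)  # served from the file
print(first == second, len(calls))
```

Note that this sketch does not detect when the chunks have changed since the cache was written; delete the file to force a recomputation.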

### Step 3: Store in Vector DB (FAISS)

In [None]:
!pip install -q langchain-community faiss-cpu

In [None]:
import json

# The embedding values are pre-calculated to reduce the number of API calls.
# To save freshly computed embeddings to a file, uncomment the lines below:
# with open("embeddings.json", "w") as f:
#    json.dump(embeddings, f)

# print("Embeddings saved to embeddings.json")

In [None]:
# If you want to load the embeddings from the saved copy, RUN THIS CELL
# If you recalculated the embeddings from scratch, SKIP THIS CELL
import json

# Load the embeddings from the file
with open("Building-Apps-with-NIM/embeddings.json", "r") as f:
    embeddings = json.load(f)

print("Embeddings loaded from embeddings.json")

In [None]:
embeddings[0][:5]
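
Because the embeddings come from a file while the chunks are produced locally by the splitter, it can be worth confirming that the two lists line up before pairing them. A minimal sketch, with `check_alignment` as a hypothetical helper name and toy data in place of the real lists:

```python
def check_alignment(texts, vectors):
    """Verify one vector per text and a single embedding dimension; return that dimension."""
    if not vectors:
        raise ValueError("no embeddings were provided")
    if len(texts) != len(vectors):
        raise ValueError(f"{len(texts)} chunks but {len(vectors)} embeddings")
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()

# Toy data; in this notebook the call would be check_alignment(chunks, embeddings)
dim = check_alignment(["a", "b"], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(dim)  # 3
```

A mismatch here usually means the chunking parameters or the splitter version differ from the ones used when the file was created.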

In [None]:
from langchain_community.vectorstores import FAISS

# Build the FAISS index from (text, embedding) pairs.
# No embedding function is passed (None) because the vectors are already computed,
# so only vector-based searches such as similarity_search_by_vector will work.
db = FAISS.from_embeddings(list(zip(chunks, embeddings)), None)

### Step 4: Retrieve Relevant Chunks

In [None]:
query = "Why should liberal-arts colleges integrate AI into their curricula?"

# Get embedding for query
query_embedding = get_embedding(query)

# Similarity search
docs = db.similarity_search_by_vector(query_embedding, k=2)
context = "\n\n".join([doc.page_content for doc in docs])

In [None]:
docs[0]

In [None]:
docs[1]
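
To see what the similarity search is doing conceptually, here is a brute-force sketch in plain Python. It assumes the index ranks by Euclidean (L2) distance, which to my knowledge is the default for LangChain's FAISS wrapper; `top_k_l2` and the toy vectors are hypothetical and only for illustration.

```python
import math

def top_k_l2(query_vec, vectors, k=2):
    """Return the indices of the k vectors closest to query_vec by L2 distance."""
    scored = [(math.dist(query_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored)[:k]]

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = top_k_l2([1.0, 0.0], toy_vectors, k=2)
print(nearest)  # [0, 2]
```

FAISS returns the same kind of ranking, but it is built to do so efficiently over large collections rather than by scanning a Python list.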

### Step 5: Prompt Engineering

In [None]:
prompt = f"""
You are an AI assistant. Use only the following context to answer the question.
If unsure, say 'I don't know'.

Context:
{context}

Question: {query}
Answer:
"""

### Step 6: Generate Response Using LLM NIM

In [None]:
def generate_answer(prompt):
    """Send the prompt to the chat-completions endpoint and return the reply text."""
    url = "https://integrate.api.nvidia.com/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {apikey}"  # key loaded from Colab Secrets above
    }

    data = {
        "model": "meta/llama-3.2-3b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
        "temperature": 0.3
    }
    response = requests.post(url, json=data, headers=headers, timeout=60)
    response.raise_for_status()  # fail loudly on auth or quota errors
    return response.json()["choices"][0]["message"]["content"]

answer = generate_answer(prompt)
print(answer)
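
Steps 4 to 6 can be wrapped into one function so that a new question needs only a single call. The following is a sketch under assumptions: `build_prompt` and `rag_answer` are hypothetical helper names, the three callables are injected so the example runs offline, and the stubs stand in for the real embedding, search, and generation steps.

```python
def build_prompt(question, passages):
    """Fill the notebook's prompt template with the retrieved passages."""
    context = "\n\n".join(passages)
    return (
        "You are an AI assistant. Use only the following context to answer the question.\n"
        "If unsure, say 'I don't know'.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:\n"
    )

def rag_answer(question, embed, search, generate, k=2):
    """Embed the question, retrieve k passages, and generate an answer from them."""
    query_vec = embed(question)
    passages = search(query_vec, k)
    return generate(build_prompt(question, passages))

# Offline stubs so the sketch runs without an API key
def fake_embed(text):
    return [float(len(text))]

def fake_search(vec, k):
    return ["passage one", "passage two", "passage three"][:k]

def fake_generate(prompt):
    return f"(stub answer; the prompt had {len(prompt)} characters)"

print(rag_answer("Why integrate AI?", fake_embed, fake_search, fake_generate))
```

In this notebook, the real pipeline could be plugged in as `rag_answer(query, get_embedding, lambda v, k: [d.page_content for d in db.similarity_search_by_vector(v, k=k)], generate_answer)`.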

## Try it yourself with different questions

**1. How will AI reshape administrative workloads for faculty in higher-education institutions?**

AI will automate grading, generate reports, and provide instant feedback, allowing instructors to focus more on teaching and less on repetitive administrative tasks (pp. 19–20).
* Citation: paragraph beginning “AI has the potential to automate administrative tasks…”
---
**2. What specific benefits can AI deliver to community-college students with disabilities?**

AI-powered platforms offer personalized career counseling, course selection, and academic advising tailored to each student’s unique needs, which improves retention rates and overall educational quality (p. 21).
* Citation: paragraph beginning “Community colleges serve as crucial access points…”
---
**3. Why should liberal-arts colleges integrate AI into their curricula?**

AI introduces new pedagogical approaches in art, video, audio, and creative production; it equips students with essential digital skills to navigate and contribute to a rapidly evolving digital society (p. 21).
* Citation: paragraph beginning “In liberal arts colleges, where the focus often lies…”
---
**4. What ethical responsibilities must higher-education instructors assume when adopting AI tools?**

Instructors must understand data security, ethics, and privacy implications; they must ensure transparent use policies and guard against potential biases inherent in AI technologies (pp. 22–23).
* Citation: section titled “Instructors and Professors”.
---
**5. Which strategic questions should administrators ask before committing to an AI investment?**

Key questions include:
  * How will AI integrate with existing LMS and administrative platforms?
  * What are the financial costs versus expected ROI?
  * What training and support will faculty/staff need?
  * How might AI alter employment roles within the institution?
  * Is the adoption sustainable in terms of ongoing maintenance and technological updates? (pp. 24–25)
* Citation: section titled “Summary of Questions to Consider”.