<a href="https://www.kaggle.com/code/fotimakhongulomova/scholarai-project-for-genai-course?scriptVersionId=238518159" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# ScholarAI: An Intelligent AI Agent for Personalized Learning

ScholarAI is a generative AI-powered learning assistant designed to help students study more effectively using their own materials. Built using LangChain v0.1+, Gemini, and modern GenAI capabilities, ScholarAI enables personalized, grounded, and interactive academic support.

In this notebook, I demonstrate how ScholarAI can:

✅ Summarize research papers into concise study notes

✅ Answer personalized questions based on user-uploaded content using RAG (Retrieval-Augmented Generation)

This capstone project was developed as part of the 5-Day GenAI Intensive Course by Google and Kaggle, and highlights practical use of few-shot prompting, embeddings, and retrieval-based Q&A in a real-world educational scenario.

**Presentation of the Project:** https://gamma.app/docs/ScholarAI-Academic-Assistant-with-Summarization-and-RAG-QA-bgbgzr35625kdy0

## Features Used in ScholarAI Project
The following GenAI capabilities from the course's capstone list are implemented in this notebook:

* **Few-shot prompting** – Custom prompt templates were used to guide the summarization and Q&A responses.
* **Document understanding** – Academic texts were split, embedded, and processed using LangChain's document tools.
* **Embeddings** – Text chunks were converted into embeddings using `GoogleGenerativeAIEmbeddings`.
* **Retrieval-Augmented Generation (RAG)** – The core system retrieves relevant documents before generating answers.
* **Vector search/vector store/vector database** – Used `InMemoryVectorStore` for similarity-based retrieval of academic content.
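
To make the few-shot idea from the list above concrete, here is a minimal sketch in plain Python of how such a prompt can be assembled: an instruction, a few worked examples, then the new input. The example pairs and the `build_few_shot_prompt` helper are hypothetical illustrations, not the prompts used later in this notebook.

```python
# Hypothetical worked examples (placeholders, not taken from the dataset).
EXAMPLES = [
    {
        "paper": "We propose a CNN for segmenting road scenes ...",
        "notes": "Introduces a CNN-based road scene segmentation method.",
    },
    {
        "paper": "We study Q-learning overestimation ...",
        "notes": "Analyzes why Q-learning overestimates action values.",
    },
]


def build_few_shot_prompt(examples, new_paper):
    """Assemble instruction + worked examples + the new input to complete."""
    parts = ["You are an expert academic summarizer."]
    for ex in examples:
        # Each example shows the model the expected input/output pattern.
        parts.append(f"Paper: {ex['paper']}\nStudy notes: {ex['notes']}")
    # The final block is left open so the model continues the pattern.
    parts.append(f"Paper: {new_paper}\nStudy notes:")
    return "\n\n".join(parts)


prompt_text = build_few_shot_prompt(EXAMPLES, "Irregular text is widely used ...")
print(prompt_text)
```

The resulting string can be sent to any text-generation model; LangChain's `FewShotPromptTemplate` offers a templated way to build the same structure.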

## Diagram for ScholarAI

![ScholarAI Diagram](https://i.postimg.cc/KcMBr9Qv/Diagram-for-Scholar-AI.png)

## Notebook Imports

In [1]:
# Remove conflicting packages from the Kaggle base environment.
!pip uninstall -qqy kfp jupyterlab libpysal thinc spacy fastai ydata-profiling google-cloud-bigquery google-generativeai
# Install langgraph and the packages used in this lab.
!pip install -qU 'langgraph==0.3.21' 'langchain-google-genai==2.1.2' 'langgraph-prebuilt==0.1.7'


In [2]:
!pip install -qU langchain langchain-community


In [3]:
!pip install -qU langchain-core

In [4]:
# General Python Libraries
import os
import pandas as pd
from typing_extensions import List, TypedDict

# Kaggle Secrets
from kaggle_secrets import UserSecretsClient

# LangChain Core
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_core.vectorstores import InMemoryVectorStore
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import FewShotPromptTemplate

# LangChain Utilities
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# LangChain + Gemini
from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    GoogleGenerativeAI,
    GoogleGenerativeAIEmbeddings
)

# Gemini Native SDK (optional)
from google import genai
from google.genai import types

In [5]:
GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

client = genai.Client(api_key=GOOGLE_API_KEY)

## 📝 Document Summarizer

The Document Summarizer turns learning materials (such as class notes, articles, or textbooks) into a concise summary. In this notebook, a 1% sample of arXiv paper abstracts stands in for student-provided material.

In [6]:
data1_path = '/kaggle/input/arxiv-paper-abstracts/arxiv_data.csv'
data2_path = '/kaggle/input/arxiv-paper-abstracts/arxiv_data_210930-054931.csv'

df1 = pd.read_csv(data1_path)
df1.head()

| | titles | summaries | terms |
|---|---|---|---|
| 0 | Survey on Semantic Stereo Matching / Semantic ... | Stereo matching is one of the widely used tech... | ['cs.CV', 'cs.LG'] |
| 1 | FUTURE-AI: Guiding Principles and Consensus Re... | The recent advancements in artificial intellig... | ['cs.CV', 'cs.AI', 'cs.LG'] |
| 2 | Enforcing Mutual Consistency of Hard Regions f... | In this paper, we proposed a novel mutual cons... | ['cs.CV', 'cs.AI'] |
| 3 | Parameter Decoupling Strategy for Semi-supervi... | Consistency training has proven to be an advan... | ['cs.CV'] |
| 4 | Background-Foreground Segmentation for Interio... | To ensure safety in automated driving, the cor... | ['cs.CV', 'cs.LG'] |


In [7]:
df2 = pd.read_csv(data2_path)
df2['summaries'] = df2['abstracts']
df2 = df2.drop("abstracts", axis='columns')
df2.tail()

| | terms | titles | summaries |
|---|---|---|---|
| 56176 | ['cs.CV', 'cs.IR'] | Mining Spatio-temporal Data on Industrializati... | Despite the growing availability of big data i... |
| 56177 | ['cs.LG', 'cs.AI', 'cs.CL', 'I.2.6; I.2.7'] | Wav2Letter: an End-to-End ConvNet-based Speech... | This paper presents a simple end-to-end model ... |
| 56178 | ['cs.LG'] | Deep Reinforcement Learning with Double Q-lear... | The popular Q-learning algorithm is known to o... |
| 56179 | ['stat.ML', 'cs.LG', 'math.OC'] | Generalized Low Rank Models | Principal components analysis (PCA) is a well-... |
| 56180 | ['cs.LG', 'cs.AI', 'stat.ML'] | Chi-square Tests Driven Method for Learning th... | SDYNA is a general framework designed to addre... |


In [8]:
df = pd.concat([df1, df2], ignore_index=True)
df.head()

| | titles | summaries | terms |
|---|---|---|---|
| 0 | Survey on Semantic Stereo Matching / Semantic ... | Stereo matching is one of the widely used tech... | ['cs.CV', 'cs.LG'] |
| 1 | FUTURE-AI: Guiding Principles and Consensus Re... | The recent advancements in artificial intellig... | ['cs.CV', 'cs.AI', 'cs.LG'] |
| 2 | Enforcing Mutual Consistency of Hard Regions f... | In this paper, we proposed a novel mutual cons... | ['cs.CV', 'cs.AI'] |
| 3 | Parameter Decoupling Strategy for Semi-supervi... | Consistency training has proven to be an advan... | ['cs.CV'] |
| 4 | Background-Foreground Segmentation for Interio... | To ensure safety in automated driving, the cor... | ['cs.CV', 'cs.LG'] |


In [9]:
# Keep a reproducible 1% sample and discard the old index.
df = df.sample(frac=0.01, random_state=42).reset_index(drop=True)
df.shape

(1080, 3)

In [10]:
def get_text(dataframe: pd.DataFrame, max_count: int = 2) -> str:
    entries = []
    for i, row in dataframe[['titles', 'summaries']].dropna().head(max_count).iterrows():
        entry = f"Title: {row['titles']}\nSummary: {row['summaries']}"
        entries.append(entry)
    return "\n\n".join(entries)

In [11]:
def text_summarizer(text):
    # Step 1: Split text into chunks
    text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=500,
        chunk_overlap=50,
    )
    docs = [Document(page_content=text)]
    split_docs = text_splitter.split_documents(docs)

    # Step 2: Set up Gemini via LangChain GoogleGenerativeAI
    llm = GoogleGenerativeAI(
        model="models/gemini-2.0-flash",
        google_api_key=GOOGLE_API_KEY,
        temperature=0.1
    )

    # Step 3: Use context as the input variable
    prompt = PromptTemplate(
        input_variables=["context"],
        template=(
            "You are an expert academic summarizer.\n"
            "Summarize the following academic research papers into concise paragraphs:\n\n"
            "{context}\n\n"
            "Summary:"
        )
    )

    # Step 4: Create the chain
    chain = create_stuff_documents_chain(llm, prompt)

    # Step 5: Run the chain
    result = chain.invoke({"context": split_docs})

    return result
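
The "stuff" strategy used in `text_summarizer` can be pictured with a short plain-Python sketch: all chunks are joined into one string and placed into the `{context}` slot of a single prompt, so the model sees everything in one call. The `Doc` class and `stuff_prompt` helper below are hypothetical stand-ins for illustration (not LangChain's implementation), and the `"\n\n"` separator is an assumption about the default.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str


TEMPLATE = (
    "You are an expert academic summarizer.\n"
    "Summarize the following academic research papers into concise paragraphs:\n\n"
    "{context}\n\n"
    "Summary:"
)


def stuff_prompt(docs, template=TEMPLATE, separator="\n\n"):
    """Join every chunk into one context string and fill the template once."""
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context)


chunks = [
    Doc("Title: Paper A\nSummary: first abstract ..."),
    Doc("Title: Paper B\nSummary: second abstract ..."),
]
final_prompt = stuff_prompt(chunks)
print(final_prompt)
```

Because everything is sent in one request, this approach only works while the combined chunks fit in the model's context window; larger inputs would need a different combining strategy.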

In [12]:
text = get_text(df)
text

'Title: A Multi-Object Rectified Attention Network for Scene Text Recognition\nSummary: Irregular text is widely used. However, it is considerably difficult to\nrecognize because of its various shapes and distorted patterns. In this paper,\nwe thus propose a multi-object rectified attention network (MORAN) for general\nscene text recognition. The MORAN consists of a multi-object rectification\nnetwork and an attention-based sequence recognition network. The multi-object\nrectification network is designed for rectifying images that contain irregular\ntext. It decreases the difficulty of recognition and enables the\nattention-based sequence recognition network to more easily read irregular\ntext. It is trained in a weak supervision way, thus requiring only images and\ncorresponding text labels. The attention-based sequence recognition network\nfocuses on target characters and sequentially outputs the predictions.\nMoreover, to improve the sensitivity of the attention-based sequence\nreco

In [13]:
summary = text_summarizer(text=text)
print(summary)

**A Multi-Object Rectified Attention Network for Scene Text Recognition:** This paper introduces a Multi-Object Rectified Attention Network (MORAN) designed to improve scene text recognition, particularly for irregular text. MORAN combines a multi-object rectification network, which corrects distorted text images to ease recognition, with an attention-based sequence recognition network that focuses on relevant characters for sequential prediction. A fractional pickup method is also introduced to enhance the sensitivity of the attention-based decoder during training. The model is trained with weak supervision, requiring only images and text labels, and demonstrates state-of-the-art performance on various benchmarks for both regular and irregular text.

**Grounding Human-to-Vehicle Advice for Self-driving Vehicles:** This research addresses the limitations of deep neural control networks in self-driving vehicles by incorporating natural language advice from humans. The proposed approach 

## 🔍 Personalized Q&A (RAG system)
This tool enables students to ask questions grounded in their own study materials using a Retrieval-Augmented Generation (RAG) pipeline. 

In [14]:
text

'Title: A Multi-Object Rectified Attention Network for Scene Text Recognition\nSummary: Irregular text is widely used. However, it is considerably difficult to\nrecognize because of its various shapes and distorted patterns. In this paper,\nwe thus propose a multi-object rectified attention network (MORAN) for general\nscene text recognition. The MORAN consists of a multi-object rectification\nnetwork and an attention-based sequence recognition network. The multi-object\nrectification network is designed for rectifying images that contain irregular\ntext. It decreases the difficulty of recognition and enables the\nattention-based sequence recognition network to more easily read irregular\ntext. It is trained in a weak supervision way, thus requiring only images and\ncorresponding text labels. The attention-based sequence recognition network\nfocuses on target characters and sequentially outputs the predictions.\nMoreover, to improve the sensitivity of the attention-based sequence\nreco

In [15]:
# Splitting documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=100,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)

docs = [Document(page_content=text)]
all_splits = text_splitter.split_documents(docs)

print(f"Split text into {len(all_splits)} sub-documents.")

Split text into 4 sub-documents.
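
To show what `chunk_size`, `chunk_overlap`, and the start index mean, here is a simplified fixed-window splitter in plain Python. It is only a sketch: `RecursiveCharacterTextSplitter` prefers natural separators (paragraphs, lines, words) before cutting, so its real chunk boundaries will differ. The `split_with_overlap` function is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 2,500-character input so the window positions are easy to check.
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample)
for piece in pieces:
    print(piece["start_index"], len(piece["text"]))
```

The overlap means the tail of one chunk is repeated at the head of the next, which helps when a sentence relevant to a question straddles a boundary.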


In [16]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

In [17]:
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(all_splits)

['d308fd3b-5b17-49f9-acf9-ea3888344240',
 '567cc2d9-1215-4ac9-a63d-d7ed6099e607',
 '7bb66e85-11c9-4d29-81d1-0304029c080c',
 '61538d15-32a6-4805-8be5-7b7ea81c9b4b']
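
Similarity-based retrieval can be sketched without any library: each chunk is stored next to its embedding vector, and a query vector is compared against all of them. The three-dimensional vectors below are made-up toy values (real embeddings have hundreds of dimensions), and the use of cosine similarity is my assumption about how the in-memory store ranks results.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty/zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy store: (chunk text, made-up embedding).
toy_store = [
    ("chunk about scene text recognition", [0.9, 0.1, 0.0]),
    ("chunk about self-driving advice", [0.1, 0.9, 0.0]),
    ("chunk about something else", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]  # pretend embedding of a question about text recognition
results = top_k(query, toy_store, k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

Note that with only four chunks in the store, a search that returns four results would hand every chunk to the model, so ranking starts to matter only as the corpus grows.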

In [18]:
llm = ChatGoogleGenerativeAI(
    model="models/gemini-2.0-flash",
    temperature=0.2,
    google_api_key=GOOGLE_API_KEY
)

In [19]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use 4-5 sentences maximum and keep the answer as concise as possible.

{context}

Question: {question}

Helpful Answer:"""

prompt = PromptTemplate(
    input_variables=["question", "context"],
    template=template
)

In [20]:
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

In [21]:
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {
        "question": state["question"],
        "context": retrieved_docs
    }


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    formatted_prompt = prompt.format(question=state["question"], context=docs_content)
    response = llm.invoke(formatted_prompt)
    return {"answer": response.content}

In [22]:
# Wrap your custom functions
retrieve_runnable = RunnableLambda(retrieve)
generate_runnable = RunnableLambda(generate)

# Chain them
rag_chain = retrieve_runnable | generate_runnable

In [23]:
def ask_rag_question(question: str) -> str:
    state = {"question": question, "context": [], "answer": ""}
    return rag_chain.invoke(state)["answer"]
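
The data flow through the two-step chain can be imitated in plain Python: each step receives the previous step's output, so the final result contains only the keys returned by the last step. The stub functions, the toy corpus, and the `pipe` helper are hypothetical and stand in for the embedding search and the Gemini call.

```python
CORPUS = [
    "MORAN is a multi-object rectified attention network.",
    "Human advice can guide self-driving vehicles.",
]


def _words(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split()}


def retrieve_stub(state):
    """Keyword-overlap stand-in for the vector store search."""
    hits = [c for c in CORPUS if _words(state["question"]) & _words(c)]
    return {"question": state["question"], "context": hits}


def generate_stub(state):
    """Stand-in for the LLM call; reports how much context it received."""
    return {"answer": f"(answer built from {len(state['context'])} chunk(s))"}


def pipe(*steps):
    """Compose steps left to right, like `a | b` on runnables."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


toy_chain = pipe(retrieve_stub, generate_stub)
out = toy_chain({"question": "What does MORAN stand for?", "context": [], "answer": ""})
print(out)
```

This is why `ask_rag_question` can read `["answer"]` directly from the chain output, while the retrieved context is no longer available at that point unless the last step also returns it.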

In [24]:
question = "What does MORAN stand for?"
answer = ask_rag_question(question)
print(answer)

MORAN stands for Multi-Object Rectified Attention Network. It is a network designed for general scene text recognition, particularly for irregular text. The network consists of a multi-object rectification network and an attention-based sequence recognition network. The rectification network helps to correct distorted text, making it easier for the recognition network to read.


## Future Work
To improve and expand the current system, the following future enhancements are planned:

* **PDF Upload Support** – Allow users to upload their own documents for Q&A.
* **Persistent Vector Store** – Replace in-memory storage with FAISS or Chroma for scalability.
* **Better Prompts** – Use more dynamic prompts for improved answer quality.
* **User Feedback** – Add ratings or comments to evaluate and refine responses.
* **LangGraph Integration** – Explore more advanced workflows and multi-turn reasoning.
* **Web Interface** – Deploy with Streamlit or Gradio for easier user interaction.

## Thank You!

This project was built as part of the GenAI Capstone.

I’m proud of how far I’ve come. I am excited to grow further.

Thank you for reviewing ScholarAI!