# Multimodal RAG

- Author: [Mark](https://github.com/obov)
- Design:
- Peer Review :
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

## Overview

This tutorial demonstrates how to build an **Onboarding Helper** using **LangChain**, designed to centralize and leverage **Notion-based documentation** for new employees. By integrating structured data from Notion pages, databases, and wikis into a **Retrieval-Augmented Generation (RAG)** system, the solution enables seamless access to company protocols, role-specific guides, and FAQs. New hires can query this unified knowledge base in natural language to rapidly adapt to their roles.

### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Notion Database Setup](#notion-database-setup)
- [Setup Vector Store](#setup-vector-store)
- [Building Agent](#building-agent)
- [Chat with Agent](#chat-with-agent)

### References


## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**

- `langchain-opentutorial` is a package that provides easy-to-use environment setup helpers, functions, and utilities for the tutorials.
- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) repository for more details.


In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langchain-community",
        "langchain-openai",
        "langchain-chroma",
    ],
    verbose=False,
    upgrade=False,
)

In [3]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "07-Agent",
        "UPSTAGE_API_KEY": "",
    }
)

Environment variables have been set successfully.


In [4]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

## Notion Database Setup

We use **Notion** as our central hub for team wikis, documentation, and task management. Think of it as a flexible digital workspace that combines note-taking, databases, and collaboration tools.

Key Concepts of Notion:

- **Pages**: Individual documents (like this guide) for text, images, or embedded content.
- **Databases**: Structured tables that organize information (e.g., tasks, project trackers, SOPs) with filter/sort capabilities.

### Example Database

You can view the **exact Notion database** used in this tutorial here: [Tutorial Example Database](https://shrouded-lantana-c42.notion.site/1870d31b38698044b3f2fdd3c2c15e4c?v=1870d31b38698086a4dd000cd1ddd37a&pvs=4)

The database contains a list of documents used for Retrieval-Augmented Generation (RAG). Every document was created for this tutorial, and all names and contents are fictional.

### Setup Notion Integration

To use Notion as a knowledge base, you need to create a Notion integration.

#### 1. Get API Key

1. **Go to Notion Developers**:  
   Log in to [Notion Developers](https://developers.notion.com) → Click "View my integrations".
2. **Create a New Integration**:
   - Click "New integration".
   - Name it (e.g., MyApp Integration).
   - Select your workspace.
   - Set permissions:
     - Read content
     - Update content (if needed)
3. **Copy the API Key**:  
   After creation, copy the **Internal Integration Token**.
4. **More Information**:
   - [Notion API Documentation](https://developers.notion.com/reference/intro)
   - [Notion API Key](https://developers.notion.com/docs/create-a-notion-integration)
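
Because the integration token is a secret, you may prefer to keep it out of the notebook. The following is a minimal sketch that reads it from an environment variable. The variable name `NOTION_TOKEN` and the helper `get_notion_token` are assumptions made for illustration; they are not part of LangChain or the Notion API.

```python
import os


def get_notion_token(env=None, var_name="NOTION_TOKEN"):
    """Return the Notion integration token stored in an environment variable.

    `var_name` is a hypothetical name chosen for this sketch; use whatever
    name your own `.env` file or shell configuration defines.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to your Internal Integration Token before loading data."
        )
    return token
```

If the token is stored in a `.env` file, calling `load_dotenv()` first (as in the Environment Setup section) makes it visible to `get_notion_token()`.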

#### 2. Find Database ID

1. **Open Notion Database**:
   Go to the database you or your team want to use → Click "Share" → "Copy link".

2. **Extract the ID**:  
   The URL looks like:  
   `https://www.notion.so/your-workspace/{DATABASE_ID}?v=...`  
   Copy the **32-character string** between the last `/` and the `?` (e.g., `1870d31b38698044b3f2fdd3c2c15e4c`).
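
The extraction step above can also be done programmatically. This is a small sketch using only the Python standard library; the helper name `extract_database_id` is hypothetical, and it assumes the ID is the trailing 32 hexadecimal characters of the last path segment of the copied link.

```python
import re
from urllib.parse import urlparse


def extract_database_id(url: str) -> str:
    """Pull the 32-character database ID out of a copied Notion link."""
    # Keep only the path, so the "?v=..." query string is ignored.
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # Drop hyphens so a dashed UUID form is handled as well.
    candidate = last_segment.replace("-", "").lower()
    match = re.search(r"[0-9a-f]{32}$", candidate)
    if match is None:
        raise ValueError(f"No 32-character database ID found in: {url}")
    return match.group(0)


url = (
    "https://shrouded-lantana-c42.notion.site/"
    "1870d31b38698044b3f2fdd3c2c15e4c?v=1870d31b38698086a4dd000cd1ddd37a&pvs=4"
)
DATABASE_ID = extract_database_id(url)
```

For the example database link used in this tutorial, `DATABASE_ID` matches the value pasted by hand in the next cell.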


In [16]:
from langchain_community.document_loaders import NotionDBLoader

NOTION_TOKEN = "ntn_****"
DATABASE_ID = "1870d31b38698044b3f2fdd3c2c15e4c"

loader = NotionDBLoader(
    integration_token=NOTION_TOKEN,
    database_id=DATABASE_ID,
)

data = loader.load()

# If a list of documents is displayed, the data was loaded from Notion successfully.
data

[Document(metadata={'tags': ['Team: Product Planning'], 'title': '2024 Reflection (Summary by Sooyoung Kang)', 'id': '1870d31b-3869-81a2-a81c-dafeeefdaf70'}, page_content='Organizing Handover Documents\n  \n\tAt the start of project involvement, materials from the previous manager were scattered, making it hard to grasp the workflow.  \n\tConsolidated key files (reports, spreadsheets, emails, etc.) into a unified folder and added tags for better searchability.  \n\tReduced schedule delays caused by incomplete handovers and provided the same structure to new team members.\nImproving Email/Slack Communication\n  \n\tWork-related messages were split between email and Slack, causing late replies or overlooked items.  \n\tDefined collaboration tool usage rules (e.g., urgency levels, audience scope) within the department to minimize confusion.  \n\tUsed email for official records and Slack for real-time discussions, improving response times and clarity.\nUnderstanding Accounting/Expense Proc
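
The raw output is hard to scan, so a compact summary can help confirm what was loaded. The sketch below is one possible way to do that; `summarize_documents` is a hypothetical helper, and it relies only on the `metadata` keys (`title`, `tags`) and the `page_content` attribute visible in the output above. A `SimpleNamespace` stands in for a loaded document so the example runs without a Notion connection.

```python
from types import SimpleNamespace


def summarize_documents(docs, preview_chars=80):
    """Return one small dict per document: title, tags, and a text preview."""
    rows = []
    for doc in docs:
        meta = doc.metadata
        rows.append(
            {
                "title": meta.get("title", "(untitled)"),
                "tags": meta.get("tags") or [],
                # Collapse newlines and tabs so the preview fits on one line.
                "preview": " ".join(doc.page_content.split())[:preview_chars],
            }
        )
    return rows


# Stand-in for one loaded document, copied from the output shown above.
sample_docs = [
    SimpleNamespace(
        metadata={
            "tags": ["Team: Product Planning"],
            "title": "2024 Reflection (Summary by Sooyoung Kang)",
            "id": "1870d31b-3869-81a2-a81c-dafeeefdaf70",
        },
        page_content="Organizing Handover Documents\n  \n\tAt the start of project involvement",
    )
]

for row in summarize_documents(sample_docs):
    print(row["title"], row["tags"], row["preview"], sep=" | ")
```

With real data, pass the `data` list returned by `loader.load()` instead of `sample_docs`.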

## Setup Vector Store


## Building Agent


## Chat with Agent
