# Building a RAG Pipeline over IKEA Product Instruction Manuals

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/multimodal/product_manual_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This cookbook shows how to use LlamaParse and OpenAI's multimodal models to query IKEA instruction manual PDFs, which consist mainly of images and diagrams showing how to assemble each product.

LlamaParse and multimodal LLMs can interpret these diagrams and translate them into textual instructions. A textual version makes confusing visual instructions in the manuals easier to follow, and it can also help readers who are visually impaired.

## Install and Setup

Install LlamaIndex, download the data, and apply `nest_asyncio`.

In [None]:
%pip install llama-index llama-parse llama-index-multi-modal-llms-openai git+https://github.com/openai/CLIP.git

In [None]:
!wget https://github.com/user-attachments/files/16461058/data.zip -O data.zip
!unzip -o data.zip
!rm data.zip

In [None]:
import nest_asyncio

nest_asyncio.apply()

Set up your OpenAI and LlamaCloud keys.

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "<Your OpenAI API Key>"
os.environ["LLAMA_CLOUD_API_KEY"] = "<Your LlamaCloud API Key>"
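
Before running the later cells, it can help to confirm that both keys are actually set. The sketch below is one possible check and is not part of the original notebook; `missing_keys` is a hypothetical helper, and it treats any value starting with `<` as a placeholder that was never filled in.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS) -> list[str]:
    """Return the required variable names that are unset, empty, or placeholders."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        # Assumption: values like "<Your OpenAI API Key>" are unfilled placeholders.
        if not value or value.startswith("<"):
            missing.append(name)
    return missing


# Example with an explicit mapping instead of the real environment.
example_env = {
    "OPENAI_API_KEY": "sk-example",
    "LLAMA_CLOUD_API_KEY": "<Your LlamaCloud API Key>",
}
print(missing_keys(example_env))
```

Calling `missing_keys()` with no arguments checks the real environment, so an empty list means both keys look usable.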

## Code Implementation

Set up LlamaParse. The parser converts each PDF into markdown and uses the GPT-4o multimodal model to read the pages.

In [None]:
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    parsing_instruction="You are given IKEA assembly instruction manuals",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    show_progress=True,
)

In [None]:
DATA_DIR = "data"


def get_data_files(data_dir=DATA_DIR) -> list[str]:
    files = []
    for f in os.listdir(data_dir):
        fname = os.path.join(data_dir, f)
        if os.path.isfile(fname):
            files.append(fname)
    return files


files = get_data_files()
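
`get_data_files` returns every regular file in the directory. If the unzipped folder also holds non-PDF files, a stricter variant may be preferable. The sketch below is an assumption about what such a filter could look like; `get_pdf_files` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def get_pdf_files(data_dir: str) -> list[str]:
    """Return sorted paths of the PDF files directly inside data_dir."""
    return sorted(
        str(p)
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.PDF", ".DS_Store", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_files = [Path(f).name for f in get_pdf_files(tmp)]

print(demo_files)
```

Sorting the result also keeps the file order stable across runs, which `os.listdir` does not guarantee.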

Parse the files into JSON, and save the page images from the PDFs into the `data_images` directory.

In [None]:
md_json_objs = parser.get_json_result(files)
# Note: [0] selects the pages of the first parsed file only
md_json_list = md_json_objs[0]["pages"]
image_dicts = parser.get_images(md_json_objs, download_path="data_images")
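
The cell above reads pages from `md_json_objs[0]` only, so with several input files the remaining results are not turned into text. A minimal sketch of collecting pages from every result follows. It assumes each result is a dict with `file_path` and `pages` keys, and each page has `page` and `md` keys; `flatten_pages` and the mock data are hypothetical. The image pairing in the helper functions would also need to account for the source file if this were adopted.

```python
def flatten_pages(json_objs: list[dict]) -> list[dict]:
    """Collect pages from every parsed file, tagging each with its source file."""
    pages = []
    for obj in json_objs:
        for page in obj.get("pages", []):
            pages.append(
                {
                    "file_path": obj.get("file_path"),
                    "page": page.get("page"),
                    "md": page.get("md", ""),
                }
            )
    return pages


# Mock data in the assumed shape; real results come from parser.get_json_result.
mock_results = [
    {
        "file_path": "data/a.pdf",
        "pages": [{"page": 1, "md": "A1"}, {"page": 2, "md": "A2"}],
    },
    {"file_path": "data/b.pdf", "pages": [{"page": 1, "md": "B1"}]},
]
all_pages = flatten_pages(mock_results)
print(len(all_pages))
```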

Create helper functions that build a list of `TextNode`s from the parsed markdown pages to feed into the `VectorStoreIndex`.

In [None]:
import re
from pathlib import Path
import typing as t
from llama_index.core.schema import TextNode


def get_page_number(file_name):
    """Gets page number of images using regex on file names"""
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0


def _get_sorted_image_files(image_dir):
    """Get image files sorted by page."""
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_page_number)
    return sorted_files


def get_text_nodes(json_dicts, image_dir) -> t.List[TextNode]:
    """Creates nodes from json + images"""

    nodes = []

    docs = [doc["md"] for doc in json_dicts]  # extract text
    image_files = _get_sorted_image_files(image_dir)  # extract images

    for idx, doc in enumerate(docs):
        # create one text node per page, storing the path of that page's image (jpg) in the metadata
        node = TextNode(
            text=doc,
            metadata={"image_path": str(image_files[idx]), "page_num": idx + 1},
        )
        nodes.append(node)

    return nodes


text_nodes = get_text_nodes(md_json_list, "data_images")
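
The numeric sort key matters because plain string sorting puts page 10 before page 2, which would pair pages with the wrong images. The short demonstration below reuses the same regex as `get_page_number`; the file names are hypothetical stand-ins for whatever `parser.get_images` writes to disk.

```python
import re


def page_number(file_name: str) -> int:
    """Same regex as get_page_number: read N from a '...-page-N.jpg' suffix."""
    match = re.search(r"-page-(\d+)\.jpg$", file_name)
    return int(match.group(1)) if match else 0


# Hypothetical file names; the real ones come from parser.get_images.
names = ["doc-page-10.jpg", "doc-page-2.jpg", "doc-page-1.jpg"]

lexicographic = sorted(names)
by_page = sorted(names, key=page_number)

print(lexicographic)
print(by_page)
```

Files that do not match the pattern get page number 0 and sort first, so stray files in the image directory can shift the pairing in `get_text_nodes`.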

Index the documents.

In [None]:
from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
    Settings,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI("gpt-4o")

Settings.llm = llm
Settings.embed_model = embed_model

if not os.path.exists("storage_ikea"):
    index = VectorStoreIndex(text_nodes, embed_model=embed_model)
    index.storage_context.persist(persist_dir="./storage_ikea")
else:
    ctx = StorageContext.from_defaults(persist_dir="./storage_ikea")
    index = load_index_from_storage(ctx)

retriever = index.as_retriever()
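
Conceptually, the retriever embeds the query and returns the nodes whose embeddings are most similar to it. The toy sketch below illustrates that idea with cosine similarity over made-up three-dimensional vectors. It is not LlamaIndex's implementation, and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], page_vecs: dict, k: int = 2) -> list[tuple]:
    """Return the k (page, score) pairs with the highest cosine similarity."""
    scored = [(page, cosine(query_vec, vec)) for page, vec in page_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for real embeddings of three pages.
page_vecs = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.7, 0.7, 0.0]}
hits = top_k([1.0, 0.1, 0.0], page_vecs, k=2)
print([page for page, _ in hits])
```

In the notebook, the number of returned nodes is controlled by the `similarity_top_k` argument of `index.as_retriever`.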

Create a custom query engine that sends the retrieved text and the matching page images to the GPT-4o multimodal model.

In [None]:
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import NodeWithScore, MetadataMode
from llama_index.core.base.response.schema import Response
from llama_index.core.prompts import PromptTemplate
from llama_index.core.schema import ImageNode

QA_PROMPT_TMPL = """\
Below we give parsed text from slides in two different formats, as well as the image.

We parse the text in both 'markdown' mode as well as 'raw text' mode. Markdown mode attempts \
to convert relevant diagrams into tables, whereas raw text tries to maintain the rough spatial \
layout of the text.

Use the image information first and foremost. ONLY use the text/markdown information
if you can't understand the image.

---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query. Explain whether you got the answer
from the parsed markdown or raw text or image, and if there's discrepancies, and your reasoning for the final answer.

Query: {query_str}
Answer: """

QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)

gpt_4o_mm = OpenAIMultiModal(model="gpt-4o", max_new_tokens=4096)


class MultimodalQueryEngine(CustomQueryEngine):
    qa_prompt: PromptTemplate
    retriever: BaseRetriever
    multi_modal_llm: OpenAIMultiModal

    def __init__(
        self,
        qa_prompt: PromptTemplate,
        retriever: BaseRetriever,
        multi_modal_llm: OpenAIMultiModal,
    ):
        super().__init__(
            qa_prompt=qa_prompt, retriever=retriever, multi_modal_llm=multi_modal_llm
        )

    def custom_query(self, query_str: str):
        # retrieve most relevant nodes
        nodes = self.retriever.retrieve(query_str)

        # create image nodes from the image associated with those nodes
        image_nodes = [
            NodeWithScore(node=ImageNode(image_path=n.node.metadata["image_path"]))
            for n in nodes
        ]

        # create context string from parsed markdown text
        ctx_str = "\n\n".join(
            [r.node.get_content(metadata_mode=MetadataMode.LLM) for r in nodes]
        )
        # prompt for the LLM
        fmt_prompt = self.qa_prompt.format(context_str=ctx_str, query_str=query_str)

        # use the multimodal LLM to interpret images and generate a response to the prompt
        llm_response = self.multi_modal_llm.complete(
            prompt=fmt_prompt,
            image_documents=[image_node.node for image_node in image_nodes],
        )
        return Response(
            response=str(llm_response),
            source_nodes=nodes,
            metadata={"text_nodes": nodes, "image_nodes": image_nodes},
        )

Create a query engine instance.

In [None]:
query_engine = MultimodalQueryEngine(
    qa_prompt=QA_PROMPT,
    retriever=index.as_retriever(similarity_top_k=9),
    multi_modal_llm=gpt_4o_mm,
)


## Example Queries

In [None]:
from IPython.display import display, Markdown

response = query_engine.query("What parts are included in the Uppspel?")
display(Markdown(str(response)))

The query asks about the parts included in the Uppspel, but the provided images and parsed text do not contain any information about the Uppspel. Instead, they contain information about other IKEA products such as SMÅGÖRA, FREDDE, and TUFFING.

Therefore, based on the provided images and parsed text, I cannot determine the parts included in the Uppspel. The answer cannot be derived from the given information.
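
When an answer reports that the context does not cover the product, as above, it can be useful to see which pages were retrieved. The sketch below lists the page number, image path, and score for each source node. `summarize_sources` is a hypothetical helper that relies only on the `node.metadata` and `score` attributes, so stand-in objects are used here in place of a real response.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes) -> list[tuple]:
    """Return (page_num, image_path, score) for each retrieved node."""
    rows = []
    for item in source_nodes:
        meta = item.node.metadata
        rows.append((meta.get("page_num"), meta.get("image_path"), item.score))
    return rows


# Stand-in objects with the attributes used above (node.metadata, score).
fake_sources = [
    SimpleNamespace(
        node=SimpleNamespace(
            metadata={"page_num": 3, "image_path": "data_images/x-page-3.jpg"}
        ),
        score=0.42,
    ),
]
print(summarize_sources(fake_sources))
```

In the notebook this would be called as `summarize_sources(response.source_nodes)`.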

In [None]:
response = query_engine.query("What does the Tuffing look like?")
display(Markdown(str(response)))

The Tuffing is a bunk bed frame with a minimalist design, featuring a metal frame and safety rails on the top bunk. The image provided shows the Tuffing bunk bed with a ladder for access to the top bunk and a simple, sturdy construction.

I got the answer from the image provided. The image clearly shows the design and structure of the Tuffing bunk bed. There were no discrepancies between the parsed markdown or raw text and the image. The image was the primary source for understanding what the Tuffing looks like.

In [None]:
response = query_engine.query("What is step 4 of assembling the Nordli?")
display(Markdown(str(response)))

The query asks for step 4 of assembling the Nordli. Based on the provided information, step 4 is described in the parsed text as follows:

**Step 4:**
- Insert the provided tool into the hole as shown.
- Ensure the structure is properly aligned and secure.
- Push down firmly to lock the structure in place.

This information was derived from the parsed text, as the image provided does not contain step-by-step instructions for the Nordli assembly. There are no discrepancies between the parsed markdown and raw text for this step.

In [None]:
response = query_engine.query(
    "What should I do if I'm confused with reading the manual?"
)
display(Markdown(str(response)))

If you're confused with reading the manual, you should contact IKEA customer service for assistance. This information is derived from the image on page 2, which shows a person with a question mark next to an IKEA box and another person making a phone call to IKEA. This visual cue indicates that contacting IKEA customer service is the recommended action if you need help.

You can also create an agent around the query engine and chat with the agent.

In [None]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.tools import QueryEngineTool

query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="query_engine_tool",
    description="Useful for retrieving specific context from the data. Do NOT select if question asks for a summary of the data.",
)
agent = FunctionCallingAgentWorker.from_tools(
    [query_engine_tool], llm=llm, verbose=True
).as_agent()

In [None]:
response = agent.chat(
    "Give a step-by-step instruction guide on how to assemble the Smagora"
)
display(Markdown(str(response)))

Added user message to memory: Give a step-by-step instruction guide on how to assemble the Smagora
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "step-by-step instruction guide on how to assemble the Smagora"}
=== Function Output ===
The step-by-step instruction guide on how to assemble the Smågåra crib is provided in the images. The images show detailed visual instructions for each step of the assembly process, including the tools required, the parts involved, and the specific actions to be taken.

Here is a summary of the steps based on the images:

1. **Tools Required**:
   - Flathead screwdriver
   - Phillips screwdriver
   - Hammer

2. **Preparation**:
   - Do not assemble alone; assemble with a partner.
   - Do not assemble on a hard surface; use a soft surface to avoid damage.
   - If you have questions or need assistance, contact IKEA customer service.

3. **Step 1**:
   - Insert 12 screws into the designated holes on the frame.

4. **Step 2*

Here is a step-by-step instruction guide on how to assemble the Smågåra crib:

### Tools Required:
- Flathead screwdriver
- Phillips screwdriver
- Hammer
- Allen key (provided in the package)

### Preparation:
- **Safety First**: Assemble with a partner to ensure safety and ease.
- **Surface**: Assemble on a soft surface to avoid damaging the parts.
- **Assistance**: If you have questions or need help, contact IKEA customer service.

### Step-by-Step Assembly:

#### Step 1: Insert Screws into the Frame
1. Insert 12 screws into the designated holes on the frame.
2. Ensure the screws are properly aligned.

#### Step 2: Align and Secure Side Panels
1. Align the side panels with the headboard and footboard.
2. Use 4 connectors and secure them with bolts and washers.
3. Tighten the bolts using the provided tool.
4. Carefully flip the structure as shown in the instructions.

#### Step 3: Tighten Screws
1. Use the provided Allen key to tighten the screws into the designated holes.
2. Ensure the screws are properly aligned and tightened.
3. Repeat this process for all four screws.
4. Make sure the screws are flush with the surface.

#### Step 4: Lock the Structure
1. Insert the provided tool into the hole as shown.
2. Ensure the structure is properly aligned and secure.
3. Push down firmly to lock the structure in place.

#### Step 5: Insert Dowels
1. Insert 4 dowels into the designated holes on the board.

#### Step 6: Align and Insert the Board
1. Align the board with the dowels.
2. Insert the board into the corresponding slots on the frame.

#### Step 7: Secure the Top Panel
1. Insert the top panel into the side panels.
2. Use 4 screws to secure the top panel.
3. Ensure the screws are properly aligned and tightened using the provided tool.

#### Step 8: Secure the Bottom Panel
1. Carefully flip the assembled structure upright.
2. Use 2 screws to secure the bottom panel.
3. Tighten the screws with the provided tool.

By following these steps, you should be able to assemble the Smågåra crib successfully. If you encounter any issues, refer to the visual instructions provided in the package or contact IKEA customer service for assistance.

In [None]:
response = agent.chat("How do I assemble the Fredde?")
display(Markdown(str(response)))

Added user message to memory: How do I assemble the Fredde?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "step-by-step instruction guide on how to assemble the Fredde"}
=== Function Output ===
The query asks for a step-by-step instruction guide on how to assemble the Fredde. However, based on the provided images and parsed text, there is no specific mention or visual representation of the Fredde assembly instructions. The images and text provided are related to other IKEA products such as Tuffing and Smågöra, but not Fredde.

Therefore, I cannot provide the step-by-step instructions for assembling the Fredde from the given information. If you have the specific instructions for Fredde, please provide them, and I can assist you further.
=== LLM Response ===
It appears that the specific step-by-step instructions for assembling the Fredde desk are not available in the provided data. However, I can offer a general guide based on typical assembly procedur

It appears that the specific step-by-step instructions for assembling the Fredde desk are not available in the provided data. However, I can offer a general guide based on typical assembly procedures for IKEA furniture. For the most accurate and detailed instructions, please refer to the assembly manual that comes with the product.

### General Assembly Guide for Fredde Desk:

#### Tools Required:
- Phillips screwdriver
- Flathead screwdriver
- Allen key (usually provided in the package)
- Hammer (if needed for dowels)

### Step-by-Step Assembly:

#### Step 1: Unpack and Organize
1. **Unpack** all the parts and hardware.
2. **Organize** the parts by type and size to make the assembly process easier.

#### Step 2: Assemble the Main Frame
1. **Connect the Side Panels**: Attach the side panels to the back panel using screws and dowels as indicated in the manual.
2. **Secure the Bottom Panel**: Attach the bottom panel to the side panels.

#### Step 3: Attach the Shelves
1. **Install the Lower Shelves**: Insert the lower shelves into the designated slots and secure them with screws.
2. **Install the Upper Shelves**: Repeat the process for the upper shelves.

#### Step 4: Attach the Desktop
1. **Align the Desktop**: Place the desktop on top of the frame, ensuring it is properly aligned.
2. **Secure the Desktop**: Use screws to secure the desktop to the frame.

#### Step 5: Install Additional Features
1. **Attach Monitor Shelf**: If the Fredde desk includes a monitor shelf, attach it to the back panel using screws.
2. **Install Side Extensions**: Attach any side extensions or additional shelves as per the instructions.

#### Step 6: Final Adjustments
1. **Check Stability**: Ensure all screws are tightened and the desk is stable.
2. **Adjust Height**: If the desk has adjustable height features, set it to the desired height.

#### Step 7: Clean Up
1. **Remove Packaging**: Dispose of any packaging materials.
2. **Organize Tools**: Put away your tools and clean the workspace.

For the most accurate and detailed instructions, please refer to the assembly manual that comes with the Fredde desk. If you encounter any issues, IKEA customer service can provide additional support.