## Simple RAG example using undatasio, llama_index, and PostgreSQL

![](example_content/undatasio_example.png)

_By stay, Tech Enthusiast @Undatasio_

- - - 

**In this example, a PDF file processed by the Undatasio platform is converted into a llama_index Document object, split into nodes, embedded, and stored in a PostgreSQL vector database for querying.**

##### Installing the **Undatasio** Python API library

In [1]:
# install undatasio
!pip install -U -q undatasio

##### Install the **python-dotenv** module and load environment variables using the **load_dotenv()** function.

> If you are unsure which environment variables are required, you can check the file named dev.env for explanations of the environment variables.

In [2]:
!conda install -c conda-forge python-dotenv -y -q

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [3]:
from dotenv import load_dotenv
import openai
import os

load_dotenv('.env')

True

Read the required environment variables.

In [4]:
UNDATASIO_API_KEY=os.getenv("UNDATASIO_API_KEY")
POSTGRESQL_URI = os.getenv("POSTGRESQL_URI")
openai.api_key = os.getenv("OPENAI_API_KEY")
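
`os.getenv` returns `None` for a variable that is not set, so a missing key usually only surfaces later as an authentication or connection error. The following is a minimal sketch of an early check, assuming the three variable names used in the cell above; the helper `missing_env_vars` is hypothetical and not part of undatasio or python-dotenv.

```python
import os

# Variable names taken from the cell above; the helper itself is hypothetical.
REQUIRED_VARS = ("UNDATASIO_API_KEY", "POSTGRESQL_URI", "OPENAI_API_KEY")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running a check like this right after `load_dotenv` makes it easier to tell a configuration problem apart from a failure in a later cell.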

To create an **UnDatasIO** object, you need a token from the Undatasio platform; a task name is optional.

In [5]:
from undatasio.undatasio import UnDatasIO

undatasio_obj = UnDatasIO(UNDATASIO_API_KEY)

The **get_result_to_llama_index_document** function of the UnDatasIO object returns a llama_index Document object. The values for its parameters, such as the file name and version, can be taken from the data returned by the **show_version** function.

In [6]:
li_document = undatasio_obj.get_result_to_llama_index_document(
    type_info=['text'],
    file_name='1d8c9bc374114b6e901da.pdf',
    version='v26'
)
li_document



Install all the necessary Python dependencies for both **llama_index** and **postgresql**.

In [7]:
!pip install -q -U psycopg2-binary llama-index sqlalchemy llama-index-vector-stores-postgres

Import the necessary classes and functions for the example.

In [8]:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore
from sqlalchemy import make_url
import psycopg2

##### Initialize **PostgreSQL** database information.

_You can deploy a simple **PostgreSQL** instance using **Docker** to run the following example. For instance:_
> docker pull pgvector/pgvector:pg17
> 
> docker run -dit --name postgresql -p 5432:5432 -e POSTGRES_PASSWORD=123456 -e LANG=C.UTF-8 pgvector/pgvector:pg17
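
The later cells read the host, port, user, and password from `POSTGRESQL_URI`. The sketch below shows what such a URI might look like for the container started above and how it breaks down into those fields. The value is an assumption: `postgres` is the default user and database of the official PostgreSQL image that this image is based on, `123456` is the password from the `docker run` command, and `localhost` presumes the container runs on the same machine. It uses the standard library's `urlsplit` for illustration, whereas the notebook itself uses SQLAlchemy's `make_url`.

```python
from urllib.parse import urlsplit

# Assumed example value: "postgres" is the image's default user and database,
# and 123456 is the password set by the docker run command above.
example_uri = "postgresql://postgres:123456@localhost:5432/postgres"

parts = urlsplit(example_uri)
uri_fields = {
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,  # parsed as an integer
    "database": parts.path.lstrip("/"),
}
print(uri_fields)
```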

In [9]:
db_name = "vector_db"
conn = psycopg2.connect(POSTGRESQL_URI)
conn.autocommit = True

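with conn.cursor() as c:
    # Recreate the target database so the example starts from a clean state.
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")
    # IF NOT EXISTS keeps this cell re-runnable. Note that this enables pgvector
    # in the database named in POSTGRESQL_URI, not in the newly created one.
    c.execute("CREATE EXTENSION IF NOT EXISTS vector")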

Initialize the vector store, build the index from the document, and create the query engine.

In [10]:
url = make_url(POSTGRESQL_URI)
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay",
    embed_dim=1536,  # openai embedding dimension
    hnsw_kwargs={
        "hnsw_m": 16,
        "hnsw_ef_construction": 64,
        "hnsw_ef_search": 40,
        "hnsw_dist_method": "vector_cosine_ops",
    },
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [li_document], storage_context=storage_context, show_progress=True
)
query_engine = index.as_query_engine()

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]
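
The `hnsw_dist_method` value `vector_cosine_ops` selects pgvector's operator class for cosine distance, so stored chunks are ranked by how closely their embedding points in the same direction as the query embedding. The following is a minimal pure-Python sketch of that measure on toy 2-dimensional vectors (the real embeddings here have 1536 dimensions). The function and sample vectors are illustrative only, and the sketch compares every vector exhaustively rather than using an HNSW index.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings": the smaller the distance, the better the match.
query = [1.0, 0.0]
chunks = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
ranked = sorted(chunks, key=lambda name: cosine_distance(query, chunks[name]))
print(ranked)
```

Because the measure depends only on direction, the vector `[2.0, 0.0]` is at distance 0 from the query despite being twice as long.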

You can now query the index through the query engine.

In [11]:
print(query_engine.query("When was the Third Plenary Session of the Communist Party of China held?"))

The Third Plenum of the Chinese Communist Party was scheduled to be held from July 15 to 18.
