# Database Setup for RAG 

Setting up a data store for a Retrieval-Augmented Generation (RAG) system raises two considerations that an ordinary application database does not. The first is whether the database can store vector embeddings. The second is whether it supports native vector similarity search, or whether that functionality must be handled by a separate service or module.

For this project, I've selected PostgreSQL because it:
- Offers native vector storage and search with the `pgvector` extension 
- Meets developers where they already are: Postgres is widely adopted and familiar
- Preserves the strengths of a relational database, such as flexible querying, indexing, and data integrity
    - In contrast, vector-native tools like FAISS or Pinecone are powerful for search but lack the full querying capabilities of traditional databases

In [None]:
-- create a database 
CREATE DATABASE resume_rag;

With the database created, the next step is to enable `pgvector` so that we can store and query vectors. Extensions are installed per database, so run the following while connected to `resume_rag`.

In [None]:
-- enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- enable uuid-ossp, which provides uuid_generate_v4()
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
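
Before moving on, it can help to confirm that both extensions are actually installed in the current database. A minimal check, assuming the session is connected to `resume_rag`, is to query the `pg_extension` system catalog:

```sql
-- List the two extensions enabled above, with their installed versions.
-- An extension missing from the result was not installed in this database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'uuid-ossp');
```

If either row is missing, the most likely cause is that `CREATE EXTENSION` was run while connected to a different database.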

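Next, define the columns and data types for the embeddings table. The `VECTOR(384)` column fixes the embedding dimension at 384, so it must match the output size of whichever embedding model is used.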

In [None]:
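CREATE TABLE content_embeddings (
    uid UUID PRIMARY KEY DEFAULT uuid_generate_v4(),  -- generated by uuid-ossp
    document_id TEXT,       -- identifier of the source document
    tags TEXT[],            -- labels, usable for filtering
    clean_text TEXT,        -- cleaned text content
    embedding VECTOR(384)   -- 384-dimensional embedding
);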
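
To show how this table combines relational filtering with vector search, here is a rough sketch of an index, some placeholder rows, and a similarity query. Everything in it beyond the table itself is an assumption made for illustration: the index name, the `doc-N` identifiers, the `example` tag, and the synthetic vectors built from `sin()`, which stand in for real model output. The HNSW index type requires pgvector 0.5.0 or later.

```sql
-- Optional: approximate nearest-neighbor index for cosine distance.
-- The index name is hypothetical; HNSW needs pgvector 0.5.0+.
CREATE INDEX content_embeddings_embedding_idx
    ON content_embeddings
    USING hnsw (embedding vector_cosine_ops);

-- Insert three placeholder rows. The vectors are synthetic (not real
-- embeddings): each is a 384-element real[] cast to the vector type.
INSERT INTO content_embeddings (document_id, tags, clean_text, embedding)
SELECT
    'doc-' || d,
    ARRAY['example'],
    'Placeholder text for document ' || d,
    (SELECT array_agg(sin(d * i)::real ORDER BY i)
     FROM generate_series(1, 384) AS i)::vector
FROM generate_series(1, 3) AS d;

-- Find the rows closest to doc-1 by cosine distance (the <=> operator),
-- restricted to rows carrying a given tag.
SELECT document_id,
       clean_text,
       embedding <=> (SELECT embedding
                      FROM content_embeddings
                      WHERE document_id = 'doc-1'
                      LIMIT 1) AS cosine_distance
FROM content_embeddings
WHERE tags @> ARRAY['example']
  AND document_id <> 'doc-1'
ORDER BY cosine_distance
LIMIT 5;
```

In a real pipeline the query vector would normally be produced by the embedding model and passed in as a parameter rather than looked up from an existing row. The tag filter in the `WHERE` clause is the kind of ordinary relational predicate that a standalone vector index does not offer on its own.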