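Adapted from the LlamaIndex example at https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval.html, which builds an ingestion and retrieval pipeline from low-level components using: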

- Sentence Transformers as the embedding model

- Postgres as the vector store (we support many other vector stores too!)

- Llama 2 as the LLM (through llama.cpp)

In [1]:
import configparser

config = configparser.ConfigParser()
config.read('../env/pinecone.conf')

api_key = config["DEFAULT"]["PINECONE_API_KEY"]
environment = config["DEFAULT"]["PINECONE_ENVIRONMENT"]
openai_api_key = config["DEFAULT"]["OPENAI_API_KEY"]
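
The cell above expects an INI-style file with a `[DEFAULT]` section. Note that `config.read` silently skips a missing file, so a wrong path only shows up as a `KeyError` at lookup time. A minimal sketch of the expected layout, parsed from an in-memory string so it runs without the real file; the values are placeholders, not real credentials:

```python
import configparser

# Hypothetical contents of ../env/pinecone.conf; the values are placeholders.
SAMPLE_CONF = """
[DEFAULT]
PINECONE_API_KEY = your-pinecone-key
PINECONE_ENVIRONMENT = your-pinecone-environment
OPENAI_API_KEY = your-openai-key
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(SAMPLE_CONF)

# Keys are looked up the same way as in the cell above.
sample_openai_key = sample_config["DEFAULT"]["OPENAI_API_KEY"]
print(sample_openai_key)
```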

In [2]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding   # https://docs.llamaindex.ai/en/latest/examples/embeddings/OpenAI.html

# huggingface
#embed_model = HuggingFaceEmbedding(model_name="snunlp/KR-SBERT-V40K-klueNLI-augSTS")
#embed_model = HuggingFaceEmbedding(model_name="kakaobank/kf-deberta-base")

# openai
import os
os.environ["OPENAI_API_KEY"] = openai_api_key

embed_model = OpenAIEmbedding(model="text-embedding-3-small")

In [3]:
from llama_index.llms.llama_cpp import LlamaCPP

#model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
model_url = "https://huggingface.co/TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF/resolve/main/solar-10.7b-instruct-v1.0.Q4_0.gguf"
#model_url = "https://huggingface.co/davidkim205/komt-mistral-7b-v1-gguf/resolve/main/ggml-model-q4_0.gguf"
#model_url = "https://huggingface.co/davidkim205/komt-Llama-2-13b-hf-ggml/resolve/main/ggml-model-q4_0.bin"  # gguf_init_from_file: invalid magic characters 'tjgg'
#model_url = "https://huggingface.co/StarFox7/Llama-2-ko-7B-chat-gguf/resolve/main/Llama-2-ko-7B-chat-gguf-q4_0.bin" # gguf_init_from_file: GGUFv1 is no longer supported. please use a more up-to-date version

llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # the model reports a 4096-token context window (llama.context_length in the log below), but we set it lower to leave some headroom
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    verbose=True,
)

llama_model_loader: loaded meta data with 24 key-value pairs and 435 tensors from /Users/parkhyerin/Library/Caches/llama_index/models/solar-10.7b-instruct-v1.0.Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 48
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - k
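
The `context_window` and `max_new_tokens` settings together bound how much retrieved text fits in one prompt. A rough arithmetic sketch using the values from the cell above; the helper name is hypothetical, and the exact accounting inside the library may differ:

```python
def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt once room for the reply is reserved."""
    return context_window - max_new_tokens


budget = prompt_token_budget(context_window=3900, max_new_tokens=256)
print(budget)  # 3644
```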

### Initialize Postgres

In [4]:
import psycopg2

db_name = "vector_db"
host = "localhost"
password = "1234"
port = "5432"
user = "local"

conn = psycopg2.connect(
    dbname="postgres",
    host=host,
    password=password,
    port=port,
    user=user,
)
conn.autocommit = True

with conn.cursor() as c:
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")
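
Because `db_name` is interpolated directly into the SQL string, it may be worth checking that it is a plain identifier before running `DROP DATABASE`. A small sketch with a hypothetical helper (not part of psycopg2):

```python
import re


def is_simple_identifier(name: str) -> bool:
    """True for names made only of letters, digits and underscores, not starting with a digit."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None


print(is_simple_identifier("vector_db"))
print(is_simple_identifier("vector_db; DROP DATABASE postgres"))
```

The first call prints `True` and the second `False`, so the second name would be rejected before it reaches `c.execute`.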

In [5]:
from sqlalchemy import make_url
from llama_index.vector_stores.postgres import PGVectorStore

vector_store = PGVectorStore.from_params(
    database=db_name,
    host=host,
    password=password,
    port=port,
    user=user,
    table_name="table",
    #embed_dim=384,  # e.g. small Sentence Transformers models
    #embed_dim=768,  # e.g. BERT-base-sized HuggingFace models
    embed_dim=1536,  # OpenAI text-embedding-3-small dimension
)
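
`embed_dim` has to match the length of the vectors the embedding model returns; a mismatch is expected to surface as an error when nodes are inserted. A minimal check, written as a hypothetical helper and exercised here with a dummy vector rather than a real embedding:

```python
from typing import Sequence


def check_embedding_dim(embedding: Sequence[float], embed_dim: int) -> None:
    """Raise ValueError if the vector length differs from the table's dimension."""
    if len(embedding) != embed_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"vector store expects {embed_dim}"
        )


# Stand-in for the output of embed_model.get_text_embedding("...").
dummy_embedding = [0.0] * 1536
check_embedding_dim(dummy_embedding, embed_dim=1536)  # passes silently
```

Running the same check once against a real embedding before creating the table catches a wrong `embed_dim` early.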

### Build an Ingestion Pipeline from Scratch

#### 1. Load Data

In [6]:
from pathlib import Path
from llama_index.readers.file import PyMuPDFReader

In [7]:
loader = PyMuPDFReader()
documents = loader.load(file_path="../data/sample.pdf")

#### 2. Use a Text Splitter to Split Documents

In [8]:
from llama_index.core.node_parser import SentenceSplitter

In [9]:
text_parser = SentenceSplitter(
    chunk_size=1024,
    # separator=" ",
)

In [10]:
text_chunks = []
# maintain relationship with source doc index, to help inject doc metadata in (3)
doc_idxs = []
for doc_idx, doc in enumerate(documents):
    cur_text_chunks = text_parser.split_text(doc.text)
    text_chunks.extend(cur_text_chunks)
    doc_idxs.extend([doc_idx] * len(cur_text_chunks))
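
The loop above keeps `text_chunks` and `doc_idxs` aligned, so `doc_idxs[i]` is the index of the document that produced `text_chunks[i]`. A toy illustration, with a fixed-width splitter standing in for `SentenceSplitter`; the splitter and sample strings are made up for this sketch:

```python
from typing import List


def toy_split(text: str, chunk_size: int) -> List[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


toy_docs = ["aaaaabbbbbcc", "ddddd"]
toy_chunks: List[str] = []
toy_doc_idxs: List[int] = []
for toy_idx, toy_text in enumerate(toy_docs):
    pieces = toy_split(toy_text, chunk_size=5)
    toy_chunks.extend(pieces)
    # One source index per chunk keeps the two lists the same length.
    toy_doc_idxs.extend([toy_idx] * len(pieces))

print(toy_chunks)    # ['aaaaa', 'bbbbb', 'cc', 'ddddd']
print(toy_doc_idxs)  # [0, 0, 0, 1]
```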

#### 3. Manually Construct Nodes from Text Chunks

In [11]:
from llama_index.core.schema import TextNode

nodes = []
for idx, text_chunk in enumerate(text_chunks):
    node = TextNode(
        text=text_chunk,
    )
    src_doc = documents[doc_idxs[idx]]
    node.metadata = src_doc.metadata
    nodes.append(node)

#### 4. Generate Embeddings for each Node

In [12]:
for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
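
Embedding one node at a time means one model call per node. A common alternative is to embed in batches. The sketch below shows the batching logic only, with a hypothetical `embed_batch` callable and a fake embedder standing in for a real model:

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 10,
) -> List[List[float]]:
    """Call embed_batch on consecutive slices and concatenate the results in order."""
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(embed_batch(list(texts[start : start + batch_size])))
    return embeddings


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    # Stand-in for a real model: a 1-dimensional "embedding" holding the text length.
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```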

#### 5. Load Nodes into a Vector Store

In [13]:
vector_store.add(nodes)

['f297e8c8-c7e4-4d40-98b6-726dcffb333f',
 '1ddeeee7-79c1-40af-a04d-1802326f7afe',
 '5d4e0c79-c614-429c-8333-654c9c186856',
 'a37e8bed-0901-443c-a420-2c8040c5700d',
 '2d4c217c-c444-46a3-82c7-8d54711e9b77',
 '9d2161ff-d571-4026-8ab3-6fa314c47924',
 '88079323-a289-47af-9656-221c872fe2b3',
 '4a53c00f-22e9-4846-b50c-6bfb17fab466',
 '364a3e03-b81d-4dab-aea9-cadb34c0b899',
 '10f3f2f4-c07f-4426-9231-b3ae81bede19',
 'abb5c02f-8329-4b9e-ba88-058b5cfade40',
 'bfd4da67-76bd-4e3a-bfb8-e3b72313a56d',
 'f22517ba-f163-4a1e-8a41-a469f9e17f32',
 '0a68ce91-7dec-485d-a25b-8db60fcac275',
 'cf7c1533-e84d-426c-82b0-ff92514eaec3',
 '5f617d1f-0609-4173-8dea-cfc996ef5373',
 '7406f12c-e521-407e-8c6b-54f0ad67fcd9',
 '3f287dfe-6c57-42db-a437-959effabd8ce',
 'b0e18dd2-f7e8-4a09-86ff-8cb8d3827d78',
 'd1d1dcc2-f445-4d9b-9c74-2ac800689b21',
 '4929561d-337c-49c4-87b9-391d9a11e891',
 'c5901f46-eef5-4cd3-b33c-93525689cf27',
 '2f67fb8f-0ff7-4a1c-93c8-315ca7c82321',
 '4fa1b667-cfe9-4cff-beaa-5338e44b786b']

### Build a Retrieval Pipeline from Scratch

In [14]:
query_str = "Can you tell me about the key concepts for safety finetuning"

#### 1. Generate a Query Embedding

In [15]:
query_embedding = embed_model.get_query_embedding(query_str)

#### 2. Query the Vector Database

In [16]:
# construct vector store query
from llama_index.core.vector_stores import VectorStoreQuery

query_mode = "default"
# query_mode = "sparse"
# query_mode = "hybrid"

vector_store_query = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2, mode=query_mode
)
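
Conceptually, a `similarity_top_k=2` query scores every stored vector against the query embedding and keeps the two best matches. A pure-Python sketch of that idea, assuming cosine similarity as the metric and non-zero vectors; the actual metric and indexing depend on how the vector store is configured:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_hits = top_k([1.0, 0.0], toy_vectors, k=2)
print([i for i, _ in toy_hits])  # [0, 2]
```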

In [17]:
# returns a VectorStoreQueryResult
query_result = vector_store.query(vector_store_query)
print(query_result.nodes[0].get_content())

80 [대학 교수-학습 연구] 제16권-3호(2023.9.30.)
*p<0.05, **p<0.01 ,***p<0.01
*p<0.05, **p<0.01 ,***p<0.01


#### 3. Parse Result into a Set of Nodes

In [18]:
from llama_index.core.schema import NodeWithScore
from typing import Optional

nodes_with_scores = []
for index, node in enumerate(query_result.nodes):
    score: Optional[float] = None
    if query_result.similarities is not None:
        score = query_result.similarities[index]
    nodes_with_scores.append(NodeWithScore(node=node, score=score))

#### 4. Put into a Retriever

In [19]:
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from typing import Any, List, Optional


class VectorDBRetriever(BaseRetriever):
    """Retriever over a postgres vector store."""

    def __init__(
        self,
        vector_store: PGVectorStore,
        embed_model: Any,
        query_mode: str = "default",
        similarity_top_k: int = 2,
    ) -> None:
        """Init params."""
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        # Use the instance's own embedding model and vector store,
        # not the notebook-level globals.
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )
        query_result = self._vector_store.query(vector_store_query)

        nodes_with_scores = []
        for index, node in enumerate(query_result.nodes):
            score: Optional[float] = None
            if query_result.similarities is not None:
                score = query_result.similarities[index]
            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores

In [20]:
retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)

### Plug this into our RetrieverQueryEngine to synthesize a response

In [21]:
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)

In [22]:
#query_str = "How does Llama 2 perform compared to other open-source models?"
# English: "How do university students perceive generative AI after classes that use ChatGPT?"
query_str = "Chat GPT 활용 수업을 통한 대학생의 생성형 AI 에 대한 인식이 어떤가요 ?"

response = query_engine.query(query_str)


llama_print_timings:        load time =   15647.51 ms
llama_print_timings:      sample time =      11.23 ms /   137 runs   (    0.08 ms per token, 12196.21 tokens per second)
llama_print_timings: prompt eval time =   15646.46 ms /   229 tokens (   68.33 ms per token,    14.64 tokens per second)
llama_print_timings:        eval time =    8743.70 ms /   136 runs   (   64.29 ms per token,    15.55 tokens per second)
llama_print_timings:       total time =   24564.18 ms /   365 tokens
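
The throughput figures in the log follow directly from the token counts and elapsed times. A quick check of the arithmetic, with the numbers copied from the output above:

```python
# Values copied from the llama_print_timings lines.
prompt_tokens, prompt_ms = 229, 15646.46
eval_runs, eval_ms = 136, 8743.70

prompt_tps = prompt_tokens / (prompt_ms / 1000)
eval_tps = eval_runs / (eval_ms / 1000)

print(f"prompt eval: {prompt_tps:.2f} tokens/s")  # 14.64
print(f"generation:  {eval_tps:.2f} tokens/s")    # 15.55
```

Prompt processing and generation run at a similar rate here, so the 229-token prompt accounts for most of the roughly 24.6 s total.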


In [23]:
print(str(response))

24 page 된  'sample.pdf'  (source: 3 or source: 23) 에 따르면, Chat GPT를 활용한 수업은 대학생들의 생성형 AI에 대한 인식과 자기주도학습 역량이 변화했음. 

Specifically, the recognition and self-directed learning capacity of generative AI among university students improved from 73% (source: 3) to 93% (source: 23) after the Chat GPT-utilizing class.


In [24]:
print(response.source_nodes[0].get_content())

Chat GPT 활용 수업을 통한 대학생의 생성형 AI에 대한 인식 및 자기주도학습 역량의 변화 73
