Load env file

In [3]:
from dotenv import load_dotenv
import os

load_dotenv()

True

Initialize the model

In [4]:
from torch import cuda, bfloat16
import transformers

In [5]:
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.current_device())
print(torch.cuda.get_device_name(0))

True
1
0
NVIDIA GeForce RTX 4060 Laptop GPU


In [6]:
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'
print(device)

cuda:0


In [7]:
# Initialize HF items, need auth token for these
hf_auth  = os.getenv('HASH_KEY')
model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

# Set quantization configuration to load a large model with less GPU memory
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True  # Enable FP32 CPU offloading
)

# Load the model with quantization configuration
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

Loading checkpoint shards: 100%|██████████| 4/4 [00:26<00:00,  6.64s/it]


Model loaded on cuda:0


Set up the LLM pipeline

In [8]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [9]:
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,
    task='text-generation',
    temperature       = 0.1, # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    max_new_tokens    = 512, # max number of tokens to generate in the output
    repetition_penalty= 1.1,  # without this output begins repeating
)

In [10]:
res = generate_text("What is a  a well-specified computational problem") 
print(res[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  attn_output = torch.nn.functional.scaled_dot_product_attention(


What is a  a well-specified computational problem? A well-specified computational problem is one that has the following properties:

1. **Input**: The problem takes an input, which can be thought of as a set of data or parameters.
2. **Output**: The problem produces an output, which is a function of the input.
3. **Computational resources**: The problem requires a finite amount of computational resources (e.g., time and memory) to solve.
4. **Well-defined solution**: There exists a unique solution to the problem for any given input.

Examples of well-specified computational problems include:

* Sorting a list of numbers
* Finding the shortest path between two points on a map
* Solving a linear system of equations

On the other hand, a poorly specified computational problem may lack one or more of these properties. For example:

* "Find the most beautiful image" - This problem lacks a clear definition of what makes an image "beautiful", making it difficult to specify a solution.
* "Solv

In [11]:
from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=generate_text)

  warn_deprecated(


In [27]:
answer = llm(prompt="What is a  a well-specified computational problem")
print(answer)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


What is a  a well-specified computational problem? A well-specified computational problem is one that has a clear definition, and the solution to which can be verified by a computer program. In other words, it is a problem where you can write a program that checks whether a given answer is correct or not.

For example, the problem of determining whether a number is prime is a well-specified computational problem. You can define what it means for a number to be prime (it has no divisors except for 1 and itself), and then write a program that checks whether a given number satisfies this condition.

On the other hand, some problems are not well-specified because they do not have a clear definition, or the solution to which cannot be verified by a computer program. For example, the problem of "write a poem that is beautiful" is not a well-specified computational problem, because there is no clear definition of what makes a poem beautiful, and it would be difficult to write a program that c

Initialize the embedding model

In [12]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)



In [13]:
docs = [
    "this is one document",
    "and another document"
]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with "
      f"a dimensionality of {len(embeddings[0])}.")

We have 2 doc embeddings, each with a dimensionality of 384.


Connect to the pinecone vector DB

In [14]:
'''Connect to pinecone server'''
import os
from pinecone import Pinecone

api_key = os.getenv('PINECONE_API_KEY')
pc = Pinecone(api_key=api_key)

# configure client
pc = Pinecone(api_key=api_key)

In [15]:
from pinecone import ServerlessSpec

cloud  = 'aws'
region = 'us-east-1'

spec = ServerlessSpec(cloud=cloud, region=region)

In [16]:
index_name = 'rag-test'

import time

if index_name not in pc.list_indexes().names():
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric='cosine',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

In [17]:
# connect to index
index = pc.Index(index_name)
# view index stats
print(index_name)
index.describe_index_stats()

rag-test


{'dimension': 384,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 14772}},
 'total_vector_count': 14772}

Load the data

In [None]:
from langchain.document_loaders import DirectoryLoader, PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

DATA_PATH = 'Data/'

def load_documents():
    loader = DirectoryLoader(DATA_PATH,  glob=f"**/*{'.pdf'}", show_progress=True, loader_cls=PyMuPDFLoader)
    documents = loader.load()
    return documents


def split_text(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=300,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} pages into {len(chunks)} chunks.")

    document = chunks[0]
    print(type(chunks))
    print(type(document))
    print(dir(document))

    return chunks

documents = load_documents()
chunks = split_text(documents)

100%|██████████| 1/1 [00:04<00:00,  4.28s/it]


Split 1312 pages into 14772 chunks.
<class 'list'>
<class 'langchain_core.documents.base.Document'>
['Config', '__abstractmethods__', '__annotations__', '__class__', '__class_vars__', '__config__', '__custom_root_type__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__exclude_fields__', '__fields__', '__fields_set__', '__format__', '__ge__', '__get_validators__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__include_fields__', '__init__', '__init_subclass__', '__iter__', '__json_encoder__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__post_root_validators__', '__pre_root_validators__', '__pretty__', '__private_attributes__', '__reduce__', '__reduce_ex__', '__repr__', '__repr_args__', '__repr_name__', '__repr_str__', '__rich_repr__', '__schema_cache__', '__setattr__', '__setstate__', '__signature__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__try_update_forward_refs__', '__validators__', '_abc_impl', '_calculate_keys', '_copy_

In [None]:
import pandas as pd
data = pd.DataFrame(columns=['page_content', 'metadata', 'type'])
for i in range(len(chunks)):
    data.loc[-1] = [chunks[i].page_content, chunks[i].metadata, chunks[i].type]
    data.index = data.index + 1
    data = data.sort_index()
print(data.head)

<bound method NDFrame.head of                                             page_content  \
0      interpreted as a key, 290–291 \nto the wee has...   
1      van Emde Boas tree, 478 \nVar Œ c, see varianc...   
2      upper bound, 54 \nupper-bound property, 611, 6...   
3      universe, 273, 1155 \nunmatched vertex, 693, 7...   
4      unit upper-triangular matrix, 1216 \nunit vect...   
...                                                  ...   
14767  We acknowledge with gratitude the contribution...   
14768  publisher. \nThe MIT Press would like to thank...   
14769  c \n 2022 Massachusetts Institute of Technolo...   
14770  Thomas H. Cormen \nCharles E. Leiserson \nRona...   
14771        Introduction to Algorithms \nFourth Edition   

                                                metadata      type  
0      {'source': 'Data\!Introduction.to.Algorithms.p...  Document  
1      {'source': 'Data\!Introduction.to.Algorithms.p...  Document  
2      {'source': 'Data\!Introduction.to.A

Update the data into pinecone DB

In [None]:
''' Index the dataset and store it(Processed in batch_size)'''
from tqdm import tqdm

batch_size = 32

for i in tqdm(range(0, int(len(data)), batch_size)):
    i_end = min(len(data), i+batch_size)
    batch = data.iloc[i:i_end]
    ids = [f"{i}" for i, x in batch.iterrows()]
    texts = [x['page_content'] for i, x in batch.iterrows()]
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['page_content'],
         'source': x['metadata']['source'],
         'title': x['metadata']['title']} for i, x in batch.iterrows()
    ]
    index.upsert(vectors=zip(ids, embeds, metadata))

100%|██████████| 462/462 [02:16<00:00,  3.38it/s]


In [18]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 14772}},
 'total_vector_count': 14772}

In [19]:
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embed_model)

In [20]:
retriever = vectorstore.as_retriever()
result =retriever.invoke("What is a well-specified computational problem")
print(type(retriever))
print(type(result))
print(len(result))
print(result[1])

<class 'langchain_core.vectorstores.VectorStoreRetriever'>
<class 'list'>
4
page_content='theory of NP-completeness. If you can establish a problem as NP-complete, you \nprovide good evidence for its intractability. As an engineer, you would then do \nbetter to spend your time developing an approximation algorithm (see Chapter 35)' metadata={'source': 'Data\\!Introduction.to.Algorithms.pdf', 'title': ''}


Set up the RAG Pipeline

In [21]:
prompt = "What is a well-specified computational problem"
answer = llm(prompt=prompt + retriever.invoke(prompt)[1].page_content)
print(answer)

  warn_deprecated(
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


What is a well-specified computational problemtheory of NP-completeness. If you can establish a problem as NP-complete, you 
provide good evidence for its intractability. As an engineer, you would then do 
better to spend your time developing an approximation algorithm (see Chapter 35) or 
a heuristic search method (see Chapter 36) rather than trying to find an exact 
solution.

NP-completeness is not the only way to show that a problem is hard. There are other 
techniques, such as showing that a problem has no known polynomial-time solution 
or that it is equivalent to another known NP-complete problem. However, NP-
completeness provides a powerful tool for establishing the hardness of many 
problems.

In this chapter, we will explore the theory of NP-completeness and learn how to 
use it to establish the hardness of problems. We will also discuss some of the 
consequences of NP-completeness, including the implications for cryptography and 
the possibility of solving NP-complete probl

In [22]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain import hub

retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

combine_docs_chain = create_stuff_documents_chain(
    llm, retrieval_qa_chat_prompt
)
rag_pipeline = create_retrieval_chain(retriever, combine_docs_chain)

In [23]:
rag_answer = rag_pipeline.invoke({"input": "What is a  a well-specified computational problem"})

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [24]:
print(rag_answer['answer'])


System: Answer any use questions based solely on the context below:

<context>
The algorithm describes a speciûc computational procedure for achieving that in- 
put/output relationship for all problem instances. 
As an example, suppose that you need to sort a sequence of numbers into mono- 
tonically increasing order. This problem arises frequently in practice and provides

identical, to problems for which we do know of efûcient algorithms. Computer 
scientists are intrigued by how a small change to the problem statement can cause 
a big change to the efûciency of the best known algorithm. 
You should know about NP-complete problems because some of them arise sur-

theory of NP-completeness. If you can establish a problem as NP-complete, you 
provide good evidence for its intractability. As an engineer, you would then do 
better to spend your time developing an approximation algorithm (see Chapter 35)

topics most relevant to you. 
Since most of the algorithms we discuss have great pra