
In [None]:
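# the code below uses the pre-0.10 llama_index API (ServiceContext, llama_index.llms)
!pip install "llama_index<0.10" torch transformers
!pip install accelerate pypdf
# langchain provides HuggingFaceBgeEmbeddings, imported further down
!pip install sentence_transformers langchain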



In [None]:
import torch
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt

# This will wrap the default prompts that are internal to llama-index
# taken from https://huggingface.co/Writer/camel-5b-hf
query_wrapper_prompt = SimpleInputPrompt(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    # with do_sample=False decoding is greedy, so temperature has no effect
    generate_kwargs={"temperature": 0.25, "do_sample": False},
    # pass the wrapper defined above; otherwise it is never applied
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="bigscience/bloom-1b7",
    model_name="bigscience/bloom-1b7",
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16, "offload_folder":"offload"}
)

Some weights of BloomForCausalLM were not initialized from the model checkpoint at bigscience/bloom-1b7 and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
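A minimal sketch of what the wrapper and the two size settings above amount to, assuming that the wrapped prompt and the generated tokens have to share the same 2048-token `context_window`. `wrap_query` and `prompt_budget` are hypothetical helpers written for illustration, not llama_index functions; the template string is the one defined in the cell above.

```python
# Hypothetical helpers: fill the instruction template and compute the prompt budget.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)


def wrap_query(query_str: str) -> str:
    """Substitute the query into the {query_str} slot of the template."""
    return TEMPLATE.format(query_str=query_str)


def prompt_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for generated tokens."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


print(wrap_query("How is Rakesh Agrawal related to Association Rules?"))
print(prompt_budget(2048, 256))  # 1792
```

With the settings above, roughly 1792 tokens remain for the wrapped query plus any retrieved context.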


In [None]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index import LangchainEmbedding, ServiceContext
from llama_index import StorageContext, load_index_from_storage
from langchain.embeddings import HuggingFaceBgeEmbeddings

# wrap the LangChain embedding model so llama_index can use it
embed_model = LangchainEmbedding(HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en"))
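The embedding model turns each chunk and each query into a vector, and the vector index ranks stored chunks by how similar their vectors are to the query vector. Cosine similarity is a common measure for this; the sketch below computes it on toy 2-element vectors. It illustrates the idea only and is not llama_index's own code; real BGE embeddings have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```

Because the measure ignores vector length, two texts pointing in the same semantic direction score 1.0 regardless of magnitude.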

In [None]:
# put your documents (TXT, PDF) in this folder
data_folder = 'data/'

# Load documents from a directory
documents = SimpleDirectoryReader(data_folder).load_data()

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# Create an index from the documents
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
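Before embedding, `VectorStoreIndex.from_documents` splits each document into smaller chunks (nodes). llama_index does this with its own token-based splitter; the sketch below is a simplified word-based stand-in, and `chunk_words`, `chunk_size` and `overlap` are hypothetical names and values chosen for illustration. Overlapping windows keep a sentence that straddles a boundary visible in both neighboring chunks.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once this window already reaches the last word
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, overlap=1))  # ['a b c d', 'd e f g']
```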

In [None]:
# saving index to disk
persist_dir = './llamaindex'
index.storage_context.persist(persist_dir=persist_dir)
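Persisting the index means the documents do not have to be re-embedded on every run. Below is a minimal load-or-build sketch written against plain callables so it runs without llama_index; `load_or_build` is a hypothetical helper. The commented-out wiring at the end shows how it would presumably map onto the pre-0.10 llama_index API imported earlier (`StorageContext.from_defaults`, `load_index_from_storage`); that mapping is an assumption and is not executed here.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str,
    build: Callable[[], T],
    load: Callable[[str], T],
    save: Callable[[T, str], None],
) -> T:
    """Reload the index if persist_dir already has content; otherwise build and save it."""
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load(persist_dir)
    index = build()
    path.mkdir(parents=True, exist_ok=True)
    save(index, persist_dir)
    return index


# Presumed wiring for this notebook (assumption, not run here):
# index = load_or_build(
#     persist_dir,
#     build=lambda: VectorStoreIndex.from_documents(documents, service_context=service_context),
#     load=lambda d: load_index_from_storage(
#         StorageContext.from_defaults(persist_dir=d), service_context=service_context),
#     save=lambda idx, d: idx.storage_context.persist(persist_dir=d),
# )
```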

In [None]:
# Create a query engine from the index
query_engine = index.as_query_engine()

# Query the engine
response = query_engine.query("How is Rakesh Agrawal related to Association Rules?")
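Roughly, the query engine embeds the question, retrieves the most similar chunks, and inserts them into a question-answering template before calling the LLM. The sketch below mimics those steps on toy vectors. `QA_TEMPLATE` is modeled on the template text that leaks into the printed output below ("Given the context information and not prior knowledge, answer the query."); it is an assumption, not a verbatim copy of llama_index's internal prompt, and `retrieve`, `build_prompt` and `top_k=2` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed QA template, modeled on the fragment visible in the notebook output.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], chunks: List[Tuple[str, Sequence[float]]], top_k: int = 2) -> List[str]:
    """Return the texts of the top_k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: _cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(query_str: str, context_chunks: List[str]) -> str:
    """Stuff the retrieved chunks and the query into the QA template."""
    return QA_TEMPLATE.format(context_str="\n\n".join(context_chunks), query_str=query_str)


# toy chunks with made-up 2-d embeddings
chunks = [("chunk A", [1.0, 0.0]), ("chunk B", [0.7, 0.7]), ("chunk C", [0.0, 1.0])]
context = retrieve([1.0, 0.1], chunks, top_k=2)
print(build_prompt("How is Rakesh Agrawal related to Association Rules?", context))
```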

In [None]:
print(response.response)

 Rakesh Agrawal is a software engineer and a researcher at IBM. He is also a member of the
IBM Watson Research Institute. He is also a member of the IBM Watson Research Institute.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: How is Rakesh Agrawal related to Association Rules?
Answer:  Rakesh Agrawal is a software engineer and a researcher at IBM. He is also a member of the
IBM Watson Research Institute. He is also a member of the IBM Watson Research Institute.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: How is Rakesh Agrawal related to Association Rules?
Answer:  Rakesh Agrawal is a software engineer and a researcher at IBM. He is also a member of the
IBM Watson Research Institute. He is also a member of the IBM Watson Research Institute.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: How is Rakesh Agrawal related to A
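The printed answer shows BLOOM-1b7, a base model that is not instruction-tuned, continuing the prompt pattern (separator, instruction, `Query:`, `Answer:`) apparently until it exhausts `max_new_tokens`. One simple post-processing mitigation, sketched below under that assumption, is to cut the text at the first stop sequence; `truncate_at_stop` and the chosen stop strings are hypothetical. It does not remove the repeated sentence inside the first answer; for that, `transformers`' `generate` accepts a `repetition_penalty` argument, which could be passed through `generate_kwargs`.

```python
from typing import Iterable


def truncate_at_stop(text: str, stop_sequences: Iterable[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence and strip whitespace."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at position 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


raw = (
    " Rakesh Agrawal is a software engineer and a researcher at IBM.\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query."
)
print(truncate_at_stop(raw, ["\n---", "\nQuery:"]))
```

In the notebook this would be applied as, for example, `truncate_at_stop(response.response, ["\n---", "\nQuery:"])`.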