# Installing dependencies

In [7]:
!pip install langchain
!pip install -U langchain-community
!pip install chromadb
!pip install pypdf
!pip install -qU langchain-cohere



In [8]:
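# Loaders, embeddings, and vector stores ship in langchain-community (installed above);
# the older langchain.* import paths for them are deprecated in recent LangChain releases.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
# from langchain_cohere import ChatCohere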

# Loading and Splitting Documents

In [9]:
# In case you wish to load several PDFs together
# loaders = [
#    PyPDFLoader('path_to_doc1.pdf'),
#    PyPDFLoader('path_to_doc2.pdf'),
#    PyPDFLoader('path_to_doc3.pdf'),
#]
# docs = []
# for loader in loaders:
#    docs.extend(loader.load())
loader = PyPDFLoader("/content/Generative-AI-and-LLMs-for-Dummies.pdf")
pages = loader.load()
# loader.load() returns one Document per PDF page; the index is zero-based.
page = pages[25]
print("Content:", page.page_content[0:500])
print("Metadata:", page.metadata)

Content: 20      Generative AI and LLMs For Dummies, Snowflake Special Edition
These materials are © 2024 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Next, determine what proprietary data you will use to custom -
ize or contextualize the model effectively. Many foundation LLMs 
contain massive amounts of information learned from the Inter -
net, which gives the models their knowledge of language as well 
as many aspects of the world around us. More
Metadata: {'source': '/content/Generative-AI-and-LLMs-for-Dummies.pdf', 'page': 25}


In [10]:
# chunk_size is measured in characters; chunk_overlap=0 means neighbouring chunks share no text.
r_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = r_splitter.split_documents(pages)

In [11]:
print(len(docs))
print(docs[100])

127
page_content='» Web apps are the most common type of front end for gen AI 
apps because they’re relatively easy to develop and can be 
accessed from any device with a web browser.
 » Mobile apps tailored to specific devices, such as tablets and 
smartphones, offer a more immersive and engaging 
experience for users. They can take advantage of the unique 
aspects of each platform and can cache data for offline use.' metadata={'source': '/content/Generative-AI-and-LLMs-for-Dummies.pdf', 'page': 38}
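To make the splitting step above more concrete, here is a minimal pure-Python sketch of the idea behind recursive character splitting: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. `recursive_split` is a hypothetical helper written for illustration; it is not LangChain's implementation, which differs in details such as separator handling and overlap.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb cccc\n\ndddd eeee"
print(recursive_split(sample, chunk_size=9))
# → ['aaaa bbbb', 'cccc', 'dddd eeee']
```

The first paragraph is too long for one chunk, so it is split again on spaces, while the second paragraph fits and is kept whole.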


# Vector DB

In [12]:
vectordb = Chroma.from_documents(
    documents = docs,
    embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    persist_directory = "/content/chroma/"
)



In [13]:
# Sanity check: the collection should hold exactly one embedding per chunk.
print(vectordb._collection.count()==len(docs))

False
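The check prints False, so the number of stored embeddings differs from `len(docs)`. The notebook does not say why. One plausible cause (an assumption, not confirmed by the output) is that `persist_directory` already held embeddings from an earlier run, so the same chunks were added a second time. A possible mitigation is to give every chunk a deterministic ID; `stable_id` below is a hypothetical helper for that.

```python
import hashlib


def stable_id(page_content, metadata, index):
    """Derive a repeatable ID from a chunk's source, page, position, and text.

    `index` is the chunk's position in the list; it keeps IDs unique even if
    two chunks on the same page happen to contain identical text.
    """
    key = "|".join([
        str(metadata.get("source", "")),
        str(metadata.get("page", "")),
        str(index),
        page_content,
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


print(stable_id("example chunk", {"source": "doc.pdf", "page": 3}, 0)[:12])
```

With the notebook's objects this could be used as `ids = [stable_id(d.page_content, d.metadata, i) for i, d in enumerate(docs)]` and passed as `ids=ids` to `Chroma.from_documents`. This assumes that re-adding an existing ID overwrites the stored entry instead of duplicating it, which is worth verifying against the installed Chroma version.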


In [14]:
question = "How can we select the right llm for our use case?"

# fetch_k -> number of documents fetched initially using vector similarity search
# k -> number of documents returned after applying the MMR algorithm
relevant_points = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)
print(relevant_points[0])

page_content='data and user experience requirements.
Selecting the right LLM
Many different language models are available for public use 
(for more on this, see Chapter  2). Hosted LLMs such as Chat-
GPT and Bard are provided as a service that anybody can access, 
via a user interface or via APIs. This makes interacting with the 
LLMs very easy because there’s no overhead to host and scale 
the  infrastructure where it runs. Open-source LLMs like Llama 
are freely available for download and modification, and you can 
deploy them in your own environment. Although this gives you 
more control, it also requires you to set up and maintain the 
underlying infrastructure. Not sending data to an external envi -
ronment and having more control over the model may be of high 
importance for sensitive data, but the additional control puts the 
compute infrastructure management in your hands.
FIGURE 3-1: The gen AI project lifecycle and its players at a glance.' metadata={'page': 25, 'source': '/c
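The roles of `fetch_k` and `k` in the search above can be illustrated with a small pure-Python sketch of maximal marginal relevance (MMR). It first keeps the `fetch_k` vectors most similar to the query, then greedily picks `k` of them while penalising similarity to anything already picked. `mmr_select` and the toy vectors are hypothetical; the `lambda_mult` name is borrowed from LangChain's parameter, but this is not the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    """
    sims = [cosine(query_vec, v) for v in doc_vecs]
    # Step 1: plain similarity search keeps the fetch_k closest candidates.
    candidates = sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)[:fetch_k]

    # Step 2: greedily pick the candidate with the best relevance/redundancy trade-off.
    selected = []
    while candidates and len(selected) < k:
        best, best_score = None, -math.inf
        for i in candidates:
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * sims[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
vectors = [[0.9, 0.1], [0.9, 0.12], [0.5, 0.5], [0.0, 1.0]]
print(mmr_select(query, vectors, k=2, fetch_k=3, lambda_mult=1.0))  # → [0, 1]
print(mmr_select(query, vectors, k=2, fetch_k=3, lambda_mult=0.3))  # → [0, 2]
```

In the toy data, vectors 0 and 1 are near-duplicates. Ranking by relevance alone returns both, while a lower `lambda_mult` swaps the second one for the more distinct vector 2. With `fetch_k=3` the candidate pool is small, so MMR has little room to diversify in the notebook's call.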

# Setting up LLM

In [15]:
question = "How can we select the right llm for our use case?"

In [16]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

context = vectordb.max_marginal_relevance_search(question,k=2, fetch_k=3)

prompt = f"""
You are a helpful assistant. You use the provided context and your knowledge base to answer the question.
context = {context}
question = {question}
answer =
"""
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

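# Generate up to 512 new tokens after the prompt.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# generate() returns prompt + completion; slice off the prompt tokens so only the answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)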
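Note that the f-string in the cell above interpolates the list of retrieved `Document` objects directly, so the prompt receives their full repr, metadata included. A possible alternative, sketched below, joins only the page text into the context. `Doc` is a hypothetical stand-in for LangChain's `Document`, and `build_prompt` is an illustrative helper, not part of the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing the two attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_prompt(question, docs):
    """Build a prompt whose context holds only page text, tagged with the page number."""
    context_text = "\n\n".join(
        f"[page {d.metadata.get('page', '?')}] {d.page_content.strip()}" for d in docs
    )
    return (
        "You are a helpful assistant. Use the provided context to answer the question.\n"
        f"context = {context_text}\n"
        f"question = {question}\n"
        "answer ="
    )


retrieved = [
    Doc("  Selecting the right LLM ", {"page": 25}),
    Doc("Small models cost less to run."),
]
print(build_prompt("How can we select the right llm for our use case?", retrieved))
```

In the notebook, the list returned by `max_marginal_relevance_search` could be passed in place of `retrieved`, since those objects expose the same `page_content` and `metadata` attributes.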

To select the right Language Model (LLM) for your use case, consider both the computational resources required and the trade-offs between model size and accuracy. Small LLMs typically have fewer parameters, which means lower resource consumption and deployment time, making them suitable for running specific tasks in a cost-effective manner. However, larger LLMs offer greater flexibility and the ability to handle more complex scenarios, although they require more training resources.

When selecting an LLM, it's important to balance these factors against the specific demands of your application. For instance:

1. **Resource Constraints**: If your system has limited computing power or storage capacity, smaller LLMs might be preferable due to their lighter footprint.
   
2. **Scalability Needs**: Larger LLMs are ideal if scalability and adaptability are critical, especially when dealing with diverse datasets or complex interactions.

3. **Accuracy Requirements**: Depending on the precision