#**1: Install All the Required Packages**

In [1]:
!pip -q install langchain
!pip -q install bitsandbytes accelerate transformers
!pip -q install datasets loralib sentencepiece
!pip -q install pypdf
!pip -q install sentence_transformers
!pip -q install -U langchain-community

In [2]:
!pip -q install unstructured
!pip -q install tokenizers
!pip -q install chromadb

#**2: Import All the Required Libraries**

In [3]:
# Loaders, vector stores, embeddings and LLM wrappers live in langchain_community
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline
from huggingface_hub import notebook_login
import torch

In [4]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

#**3: Pass the URLs and Extract the Data from These URLs**

In [5]:
URLs = [
    'https://blog.gopenai.com/paper-review-llama-2-open-foundation-and-fine-tuned-chat-models-23e539522acb',
    'https://www.mosaicml.com/blog/mpt-7b',
    'https://stability.ai/blog/stability-ai-launches-the-first-of-its-stablelm-suite-of-language-models',
    'https://lmsys.org/blog/2023-03-30-vicuna/',
]

In [6]:
loaders = UnstructuredURLLoader(urls=URLs)
data = loaders.load()

In [7]:
data[0]

Document(metadata={'source': 'https://blog.gopenai.com/paper-review-llama-2-open-foundation-and-fine-tuned-chat-models-23e539522acb'}, page_content='Open in app\n\nSign up\n\nSign in\n\nWrite\n\nSign up\n\nSign in\n\nGoPenAI\n\nFollow publication\n\nGoPenAI\n\nWhere the ChatGPT community comes together to share insights and stories.\n\nFollow publication\n\nPaper Review\n\nPaper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models\n\nLlama 2: one of the best open source models\n\nAndrew Lukyanenko\n\nGoPenAI\n\nAndrew Lukyanenko\n\nFollow\n\nPublished in\n\nGoPenAI\n\n15 min read\n\nJul 20, 2023\n\n--\n\nProject link\n\nModel link\n\nPaper link\n\nThe authors of the work present Llama 2, an assortment of pretrained and fine-tuned large language models (LLMs) with sizes varying from 7 billion to 70 billion parameters. The fine-tuned versions, named Llama 2-Chat, are specifically designed for dialogue applications. These models surpass the performance of existing open-source chat 

In [8]:
len(data)

4

#**4: Split the Text into Chunks**

In [9]:
# chunk_size and chunk_overlap are measured in characters
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

In [10]:
text_chunks=text_splitter.split_documents(data)

In [11]:
text_chunks[0]

Document(metadata={'source': 'https://blog.gopenai.com/paper-review-llama-2-open-foundation-and-fine-tuned-chat-models-23e539522acb'}, page_content='Open in app\n\nSign up\n\nSign in\n\nWrite\n\nSign up\n\nSign in\n\nGoPenAI\n\nFollow publication\n\nGoPenAI\n\nWhere the ChatGPT community comes together to share insights and stories.\n\nFollow publication\n\nPaper Review\n\nPaper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models\n\nLlama 2: one of the best open source models\n\nAndrew Lukyanenko\n\nGoPenAI\n\nAndrew Lukyanenko\n\nFollow\n\nPublished in\n\nGoPenAI\n\n15 min read\n\nJul 20, 2023\n\n--\n\nProject link\n\nModel link\n\nPaper link')

In [12]:
text_chunks[1]

Document(metadata={'source': 'https://blog.gopenai.com/paper-review-llama-2-open-foundation-and-fine-tuned-chat-models-23e539522acb'}, page_content='Llama 2: one of the best open source models\n\nAndrew Lukyanenko\n\nGoPenAI\n\nAndrew Lukyanenko\n\nFollow\n\nPublished in\n\nGoPenAI\n\n15 min read\n\nJul 20, 2023\n\n--\n\nProject link\n\nModel link\n\nPaper link\n\nThe authors of the work present Llama 2, an assortment of pretrained and fine-tuned large language models (LLMs) with sizes varying from 7 billion to 70 billion parameters. The fine-tuned versions, named Llama 2-Chat, are specifically designed for dialogue applications. These models surpass the performance of existing open-source chat models on most benchmarks, and according to human evaluations for usefulness and safety, they could potentially replace closed-source models. The authors also detail their approach to fine-tuning and safety enhancements for Llama 2-Chat to support the community in further developing and responsi
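
The two chunks above share a run of text (from "Llama 2: one of the best open source models" through "Paper link") because of `chunk_overlap=200`. The block below is a simplified sketch of how separator-based splitting with overlap can work. It is not LangChain's actual implementation, and `split_with_overlap` is a hypothetical name used only for illustration.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of roughly chunk_size characters.

    Simplified illustration only: a single piece longer than chunk_size
    is emitted as its own oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Drop leading pieces until the carried-over tail fits the overlap
            # budget and still leaves room for the next piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)  # ['aaaa\n\nbbbb', 'bbbb\n\ncccc', 'cccc\n\ndddd']
```

Each chunk starts with the tail of the previous one, which is the same pattern visible in `text_chunks[0]` and `text_chunks[1]`.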

#**5: Download the Hugging Face Embeddings**

In [13]:
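# A smaller alternative: HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
# Name the default model explicitly so the 768-dimensional output does not depend on library defaults
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2')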

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [14]:
embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='sentence-transformers/all-mpnet-base-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [15]:
query_result = embeddings.embed_query("Who are you?")
len(query_result)

768
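
`embed_query` returns a plain list of 768 floats, and retrieval works by comparing such vectors. As a minimal sketch of one common comparison, the hypothetical helper `cosine_similarity` below computes cosine similarity in pure Python. This is an illustration only: the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

With the notebook's objects, a call would look like `cosine_similarity(embeddings.embed_query("Who are you?"), embeddings.embed_query("What is your name?"))`.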

#**6: Convert the Text Chunks into Embeddings and Create a Knowledge Base**

In [16]:
# Chroma 0.4+ writes to persist_directory automatically, so no explicit persist() call is needed
vectorstore = Chroma.from_documents(documents=text_chunks, embedding=embeddings, persist_directory="./chroma_db")



#**7: Create a Large Language Model (LLM) Wrapper - Llama**

In [17]:
notebook_login()


In [18]:
#model1 = "meta-llama/Llama-2-7b-chat-hf"
model_name = "meta-llama/Llama-3.2-1B"

In [19]:
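# token=True reuses the credentials saved by notebook_login(); it replaces the deprecated use_auth_token argument
tokenizer = AutoTokenizer.from_pretrained(model_name,
                                          token=True)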




In [20]:
pipe = pipeline("text-generation",
                model=model_name,
                tokenizer=tokenizer,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                max_new_tokens=512,
                do_sample=True,
                top_k=30,
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id
                )

Device set to use cpu
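
The pipeline above enables sampling with `top_k=30`, which restricts each generation step to the 30 highest-scoring tokens. The block below is a minimal pure-Python sketch of top-k sampling over a toy list of logits. The helper names `top_k_probs` and `sample_top_k` are hypothetical, and this is not how `transformers` implements it internally.

```python
import math
import random


def top_k_probs(logits, k):
    """Keep the k largest logits and renormalise them with a softmax.

    Returns a dict mapping token index -> probability.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sample_top_k(logits, k, rng=random):
    """Draw one token index from the renormalised top-k distribution."""
    probs = top_k_probs(logits, k)
    ids = list(probs)
    weights = [probs[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.1, -1.0]
probs = top_k_probs(toy_logits, k=2)
print(sorted(probs))                 # [0, 1]: only the two best tokens survive
print(sample_top_k(toy_logits, k=1)) # 0: with k=1 sampling becomes greedy
```

Smaller values of `k` make the output more predictable; larger values allow more varied continuations.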


In [21]:
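# Generation settings (do_sample, top_k, max_new_tokens) come from the pipeline above;
# a temperature passed via model_kwargs here would not change them.
llm = HuggingFacePipeline(pipeline=pipe)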



In [22]:
llm.invoke("Please provide a concise summary of the anime One Piece")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'Please provide a concise summary of the anime One Piece: Stampede in 150 words or less.\nThe story revolves around Luffy, the Straw Hat Pirates, and their quest to become the Pirate King. They encounter various challenges and obstacles along the way, and the stakes are high as they fight to save the world from the evil forces of the Pirate King. The anime is a highly anticipated series, and fans are eagerly waiting for its release.\nWhat is the plot of One Piece: Stampede?\nThe plot of One Piece: Stampede is a highly anticipated anime series that is set to release on October 6, 2023. The plot revolves around the Straw Hat Pirates, a group of adventurers who seek to become the Pirate King. They face various challenges and obstacles along the way, and the stakes are high as they fight to save the world from the evil forces of the Pirate King. The anime is a highly anticipated series, and fans are eagerly waiting for its release.\nWhat is the release date of One Piece: Stampede?\nThe rel

#**8: Initialize the Retrieval QA with Source Chain**

In [23]:
from langchain.chains import RetrievalQA

In [24]:
query = "How good is Vicuna?"

In [25]:
docs = vectorstore.similarity_search(query, k=3)

In [26]:
docs

[Document(metadata={'source': 'https://lmsys.org/blog/2023-03-30-vicuna/'}, page_content='Vicuna (generated by stable diffusion 2.1)\n\nAccording to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.\n\nHow Good is Vicuna?\n\nAfter fine-tuning Vicuna with 70K user-shared ChatGPT conversations, we discover that Vicuna becomes capable of generating more detailed and well-structured answers compared to Alpaca (see examples below), with the quality on par with ChatGPT.'),
 Document(metadata={'source': 'https://lmsys.org/blog/2023-03-30-vicuna/'}, page_content='Vicuna (generated by stable diffusion 2.1)\n\nAccording to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.\n\nHow Good is Vicuna?\n\nAfter fine-tuning Vicuna with 70K user-shared ChatGPT conversations, we discover that Vicuna becomes capable of generating more detailed and well-structured answers compared to Alpaca (see examples below), with the quality on par
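
The first two hits above carry identical text. One plausible cause (an assumption, not confirmed by the notebook) is that the same chunks were added to `./chroma_db` on more than one run. As a minimal sketch, the hypothetical helper `dedupe_documents` below drops repeated hits while keeping the ranking order. `Doc` is a stand-in for LangChain's `Document`, so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing the two attributes used below."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_documents(docs):
    """Return docs without repeats, keyed on (source, page_content), order preserved."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("source"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


hits = [
    Doc("How Good is Vicuna?", {"source": "https://lmsys.org/blog/2023-03-30-vicuna/"}),
    Doc("How Good is Vicuna?", {"source": "https://lmsys.org/blog/2023-03-30-vicuna/"}),
    Doc("Another passage", {"source": "https://lmsys.org/blog/2023-03-30-vicuna/"}),
]
unique_hits = dedupe_documents(hits)
print(len(unique_hits))  # 2
```

The same function can be applied to the list returned by `vectorstore.similarity_search(query, k=3)`, since those objects also expose `page_content` and `metadata`.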

In [27]:
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever())
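
`RetrievalQA.from_chain_type` also accepts `return_source_documents=True`, in which case the result dict carries a `source_documents` list next to `result`. Assuming that option is enabled, the hypothetical helper `list_sources` below sketches how the source URLs could be collected from such a result. The sample dict is hand-written for illustration and is not real chain output.

```python
from types import SimpleNamespace


def list_sources(result):
    """Collect unique 'source' metadata values from a RetrievalQA-style result dict."""
    sources = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return sources


# Hand-written stand-in for a chain result (not real output)
fake_result = {
    "query": "How good is Vicuna?",
    "result": "...",
    "source_documents": [
        SimpleNamespace(metadata={"source": "https://lmsys.org/blog/2023-03-30-vicuna/"}),
        SimpleNamespace(metadata={"source": "https://lmsys.org/blog/2023-03-30-vicuna/"}),
        SimpleNamespace(metadata={"source": "https://www.mosaicml.com/blog/mpt-7b"}),
    ],
}
sources = list_sources(fake_result)
print(sources)
```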

In [28]:
query = "How good is Vicuna?"
qa.invoke({"query": query})["result"]

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nVicuna (generated by stable diffusion 2.1)\n\nAccording to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.\n\nHow Good is Vicuna?\n\nAfter fine-tuning Vicuna with 70K user-shared ChatGPT conversations, we discover that Vicuna becomes capable of generating more detailed and well-structured answers compared to Alpaca (see examples below), with the quality on par with ChatGPT.\n\nVicuna (generated by stable diffusion 2.1)\n\nAccording to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.\n\nHow Good is Vicuna?\n\nAfter fine-tuning Vicuna with 70K user-shared ChatGPT conversations, we discover that Vicuna becomes capable of generating more detailed and well-structured answers compared to Alpaca (see examples below), with the quality on par with ChatGPT

In [29]:
query = "How does Llama 2 outperform other models?"
qa.invoke({"query": query})["result"]

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


'Use the following pieces of context to answer the question at the end. If you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\n\nThe 70 billion-parameter Llama 2 model notably improves results on the MMLU and BBH benchmarks by roughly 5 and 8 points, respectively, when compared to the 65 billion-parameter Llama 1 model.\n\nLlama 2 models with 7 billion and 30 billion parameters outdo MPT models of similar size in all categories except code benchmarks.\n\nIn comparison with Falcon models, Llama 2’s 7 billion and 34 billion parameter models outperform the 7 billion and 40 billion parameter Falcon models in all benchmark categories.\n\nMoreover, the Llama 2 70B model surpasses all open-source models.\n\nComparatively, the Llama 2 70B model performs similarly to the closed-source GPT-3.5 (OpenAI, 2023) on the MMLU and GSM8K benchmarks but shows a significant deficit on coding benchmarks. It matches or exceeds the performance of PaLM (540 billion par
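
Both answers above begin by echoing the prompt and the retrieved context, so the generated answer sits at the end of the string. As a minimal sketch, the hypothetical helper `extract_answer` below keeps only the text after an answer marker. It assumes the chain's prompt ends with `Helpful Answer:` and that a follow-up turn would begin with `Question:`. The truncated outputs above do not show these markers, so treat both as assumptions to verify against the actual prompt.

```python
def extract_answer(generated, marker="Helpful Answer:", stop="\nQuestion:"):
    """Return the text after the first marker, cut at the next stop string.

    If the marker is absent, the whole string is returned stripped.
    """
    _, found, tail = generated.partition(marker)
    if not found:
        return generated.strip()
    answer, _, _ = tail.partition(stop)
    return answer.strip()


# Hand-written example of a prompt echo followed by an answer (not real model output)
raw = (
    "Use the following pieces of context...\n\n"
    "Some retrieved context.\n\n"
    "Question: How good is Vicuna?\n"
    "Helpful Answer: It is on par with ChatGPT in the cited evaluation.\n\n"
    "Question: Something else?\n"
    "Helpful Answer: Ignored."
)
print(extract_answer(raw))
```

With the notebook's chain, this could be applied as `extract_answer(qa.invoke({"query": query})["result"])`.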

In [30]:
while True:
  user_input = input("Input Prompt: ")
  if user_input == 'exit':
    print('Exiting')
    break
  if user_input == '':
    continue
  result = qa.invoke({'query': user_input})
  print(f"Answer: {result['result']}")

Input Prompt: What is Stable LM


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Answer: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Stability AI Launches the First of its Stable LM Suite of Language Models

Product

Apr 19

Written By Joshua Lopez

Today, Stability AI released a new open source language model, Stable LM. The Alpha version of the model is available in 3 billion and 7 billion parameters, with 15 billion to 65 billion parameter models to follow. Developers can freely inspect, use, and adapt our Stable LM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.

In 2022, Stability AI drove the public release of Stable Diffusion, a revolutionary image model representing a transparent, open, and scalable alternative to proprietary AI. With the launch of the Stable LM suite of models, Stability AI is continuing to make foundational AI technology accessible to all. Our Stable LM models can genera