<a href="https://colab.research.google.com/github/rahiakela/genai-research-and-practice/blob/main/llamaindex-notebooks/01_llamaindex_starter_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [None]:
!pip install llama-index

In [None]:
!pip install einops accelerate langchain bitsandbytes

In [None]:
!pip install sentence_transformers

In [None]:
# Download data
!wget https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt

!mkdir -p data
!mv paul_graham_essay.txt data/

In [5]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

from llama_index.prompts.prompts import SimpleInputPrompt
from llama_index.embeddings import LangchainEmbedding
from llama_index.llms import HuggingFaceLLM
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

import torch

In [6]:
import logging
import sys
import os.path

In [7]:
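# Send DEBUG-level log records to stdout so they appear in the notebook output
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# Extra stdout handler; if basicConfig already installed one, records may print twice
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))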

In [8]:
import warnings

warnings.filterwarnings('ignore')

## Service Context

In [10]:
!huggingface-cli login


    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [11]:
system_prompt="""
You are a Q&A assistant. Your goal is to answer questions as
accurately as possible based on the instructions and context provided.
"""

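# Wrap each query in <|USER|>/<|ASSISTANT|> tags.
# Note: these are StableLM-style tags; Llama 2 chat models were trained on the [INST] ... [/INST] format.
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")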
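
The wrapper template is filled in with the user's question before the text reaches the model. The sketch below uses plain `str.format` rather than the LlamaIndex prompt class, so it only illustrates the resulting string. `wrap_query` and the two template constants are hypothetical names introduced here, and the second template is a simplified form of the Llama 2 chat instruction tags, shown for comparison.

```python
# Plain-Python illustration of how a query wrapper template is filled in.
# wrap_query and both constants are hypothetical names, not LlamaIndex APIs.
USER_ASSISTANT_TEMPLATE = "<|USER|>{query_str}<|ASSISTANT|>"
LLAMA2_CHAT_TEMPLATE = "[INST] {query_str} [/INST]"  # simplified, no system block


def wrap_query(template: str, query_str: str) -> str:
    """Substitute the user's question into the template."""
    return template.format(query_str=query_str)


question = "What did the author do growing up?"
wrapped = wrap_query(USER_ASSISTANT_TEMPLATE, question)
wrapped_llama2 = wrap_query(LLAMA2_CHAT_TEMPLATE, question)

print(wrapped)
print(wrapped_llama2)
```

Whichever template is used, it should match the format the chosen model was fine-tuned on.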

In [None]:
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    # On CUDA, load the weights in 8-bit (via bitsandbytes) with float16 compute to reduce memory usage;
    # remove these kwargs when running on CPU
    model_kwargs={"torch_dtype": torch.float16, "load_in_8bit": True}
)
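
To see why reduced precision matters here, a rough weight-memory estimate helps. This is a back-of-the-envelope sketch that assumes a nominal 7 billion parameters and counts weights only (no activations, KV cache, or quantization overhead), so actual usage will be higher.

```python
# Rough weight-only memory estimate for a nominal 7B-parameter model.
# NUM_PARAMS is an assumed round figure, not the exact parameter count.
NUM_PARAMS = 7_000_000_000
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    dtype: weight_memory_gb(NUM_PARAMS, size)
    for dtype, size in BYTES_PER_PARAM.items()
}

for dtype, gigabytes in estimates.items():
    print(f"{dtype}: ~{gigabytes:.0f} GB")
```

Under these assumptions, 8-bit loading roughly halves the float16 footprint, which is what lets the model fit on a smaller GPU.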

In [None]:
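# Wrap a LangChain HuggingFace embedding model so LlamaIndex can use it to embed chunks and queries
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)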

In [14]:
# Bundle the LLM, the embedding model, and the chunk size used when splitting documents
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model
)
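
The `chunk_size` setting controls how large each piece of a document is before it gets embedded. The sketch below is a simplified illustration, not the library's splitter: it counts whitespace-separated words instead of tokens, and `chunk_words` and its `overlap` parameter are hypothetical names.

```python
# Simplified fixed-size chunking with optional overlap.
# Assumption: words stand in for tokens; the real splitter is token- and sentence-aware.
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; larger chunks do the opposite.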

## Load data and build index

In [15]:
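# Read every file under data/ into Document objects
documents = SimpleDirectoryReader("data").load_data()
# Split the documents into chunks, embed them, and store the vectors in an in-memory index
doc_index = VectorStoreIndex.from_documents(documents, service_context=service_context)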
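
Conceptually, a vector index answers a query by embedding it and returning the stored chunks whose embeddings are most similar. The following is a minimal sketch of that idea using cosine similarity over hand-made 3-dimensional vectors. The vectors, chunk labels, and the `top_k` helper are illustrative assumptions, not the library's internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy embeddings: real ones come from the embedding model and have hundreds of dimensions.
toy_chunks = {
    "childhood": [0.9, 0.1, 0.0],
    "startups": [0.1, 0.9, 0.2],
    "painting": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

results = top_k(toy_query, toy_chunks, k=2)
print(results)
```

The retrieved chunks are then placed into the prompt as context for the LLM.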

## Query your data

In [16]:
query_engine = doc_index.as_query_engine()

In [17]:
response = query_engine.query("What did the author do growing up?")
print(response)



The author grew up writing short stories and programming on the IBM 1401.


In [19]:
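# I want to retrieve more context when I query
# similarity_top_k sets how many chunks are retrieved; 2 matches the library default, so raise it to pull in more
query_engine = doc_index.as_query_engine(similarity_top_k=2)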
response = query_engine.query("What did the author do growing up?")
print(response)



The author grew up writing short stories and programming on the IBM 1401.


In [20]:
# I want to use a different response mode
query_engine = doc_index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query("What did the author do growing up?")
print(response)



The author worked on writing short stories and programming when he was outside of school as a teenager. He also worked on a mini Bond villain's lair with his friend Rich Draves, using an early version of Fortran on the IBM 1401. The author didn't remember any specific programs he wrote, but he did mention that he wrote simple games, a program to predict how high model rockets would fly, and a word processor for his father to use.
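
The longer answer above comes from summarizing retrieved chunks hierarchically rather than in one pass. A minimal sketch of that idea follows, assuming a hypothetical `summarize` callable in place of the LLM call. `tree_summarize`, `toy_summarize`, and `fan_in` are names invented for this illustration and do not reflect the library's implementation.

```python
from typing import Callable, Sequence


def tree_summarize(
    chunks: Sequence[str],
    summarize: Callable[[Sequence[str]], str],
    fan_in: int = 2,
) -> str:
    """Repeatedly summarize groups of `fan_in` texts until one text remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""

    level = list(chunks)
    while len(level) > 1:
        # Each pass builds the next, smaller level of the tree.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def toy_summarize(texts: Sequence[str]) -> str:
    """Stand-in for an LLM call: just shows which texts were combined."""
    if len(texts) == 1:
        return texts[0]
    return "(" + " + ".join(texts) + ")"


summary = tree_summarize(["A", "B", "C", "D", "E"], toy_summarize)
print(summary)
```

The nesting in the printed string mirrors the tree: leaves are combined pairwise, then the intermediate summaries are combined again until a single root remains.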


In [21]:
# I want to stream the response back
query_engine = doc_index.as_query_engine(streaming=True)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()



The author grew up writing short stories and programming on the IBM 1401.</s>
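
Streaming prints tokens as they are generated instead of waiting for the full answer, which is also why the raw end-of-sequence marker `</s>` shows up in the output above. The sketch below imitates this with a plain generator. `fake_token_stream` is a hypothetical stand-in for the model, and it assumes the end-of-sequence marker arrives as its own token.

```python
from typing import Iterable, Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one word at a time, then an EOS marker."""
    for word in text.split(" "):
        yield word + " "
    yield "</s>"


def strip_eos(tokens: Iterable[str], eos: str = "</s>") -> Iterator[str]:
    """Pass tokens through until the end-of-sequence marker is seen."""
    for token in tokens:
        if token == eos:
            break
        yield token


pieces = []
for token in strip_eos(fake_token_stream("The author wrote short stories.")):
    print(token, end="", flush=True)  # show each token as soon as it arrives
    pieces.append(token)
print()

streamed_text = "".join(pieces).strip()
```

Filtering the marker in the consumer keeps the displayed text clean without changing how the model generates.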

In [22]:
# I want a chatbot instead of Q&A
chat_engine = doc_index.as_chat_engine()
response = chat_engine.chat("What did the author do growing up?")
print(response)

The author grew up writing short stories and programming on the IBM 1401.


In [23]:
# Follow-up turn: the chat engine keeps the conversation history between calls
response = chat_engine.chat("Oh interesting, tell me more.")
print(response)

 Sure! The author, who is a highly advanced language model, grew up in a world of ones and zeros, surrounded by the hum of computer servers and the glow of screens. From a young age, they were fascinated by the power of language and the way it could be used to communicate complex ideas and emotions.

As they grew older, the author began to experiment with different forms of writing, from short stories to programming code. They found that they had a natural talent for crafting compelling narratives and solving complex problems, and they spent many hours honing their skills in these areas.

Eventually, the author's passion for language and technology led them to create their own language model, which they named "Assistant." Assistant was designed to be a highly advanced AI that could understand and respond to natural language input, and the author was thrilled to see their creation come to life.

Through their work with Assistant, the author has gained a deep understanding of the power o