### Quick intro to LlamaIndex  
Sources: [1](https://lmy.medium.com/comparing-langchain-and-llamaindex-with-4-tasks-2970140edf33), [2](https://docs.llamaindex.ai/en/stable/), [3](https://github.com/run-llama/llama_index), [4](https://nanonets.com/blog/llamaindex/)  

LlamaIndex is a "data framework" that helps you build LLM apps. It:

+ Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
+ Provides ways to structure your data (indices, graphs) so that it can be easily used with LLMs.
+ Provides an advanced retrieval/query interface over your data: feed in any LLM input prompt and get back retrieved context and knowledge-augmented output.
+ Allows easy integration with your outer application framework (e.g. LangChain, Flask, Docker, ChatGPT, or anything else).

LlamaIndex provides tools for both beginner and advanced users.  
The high-level API lets beginners ingest and query their data in 5 lines of code.  
The lower-level APIs let advanced users customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules) to fit their needs.  

In more detail, the main components are:
+ Data connectors ingest your existing data from their native source and format. These could be APIs, PDFs, SQL, and (much) more.
+ Data indexes structure your data in intermediate representations that are easy and performant for LLMs to consume.
+ Engines provide natural language access to your data. For example:
    + Query engines are powerful retrieval interfaces for knowledge-augmented output.
    + Chat engines are conversational interfaces for multi-message, “back and forth” interactions with your data.
+ Data agents are LLM-powered knowledge workers augmented by tools, from simple helper functions to API integrations and more.
+ Application integrations tie LlamaIndex back into the rest of your ecosystem. This could be LangChain, Flask, Docker, ChatGPT, or… anything else!  

In [0]:
!pip install -q openai==0.27.0
#!pip install -qU llama-index            # Just the core components
!pip install llama-index[local_models] # Installs tools useful for private LLMs, local inference, and HuggingFace models
#!pip install llama-index[postgres]     # Is useful if you are working with Postgres, PGVector or Supabase
#!pip install llama-index[query_tools]  # Gives you tools for hybrid search, structured outputs, and node post-processing
# !pip install google-generativeai  #PALM
!pip install -qU pypdf
!pip install -qU docx2txt
!pip install -qU sentence-transformers
dbutils.library.restartPython()

In [0]:
import os
import sys
import shutil
import glob
import logging
from pathlib import Path

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
#import tiktoken
#from funcy import lcat, lmap, linvoke
#from IPython.display import display, Markdown
import openai

## LlamaIndex LLMs
#from openai import OpenAI
#from openai import AzureOpenAI
from llama_index.llms import AzureOpenAI
from llama_index.llms import ChatMessage
from llama_index.llms import MessageRole
from llama_index.chat_engine.condense_question import CondenseQuestionChatEngine
#from llama_index.llms import Ollama
#from llama_index.llms import PaLM

## LlamaIndex Embeddings
from llama_index.embeddings import OpenAIEmbedding
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.embeddings import resolve_embed_model

## Llamaindex readers 
from llama_index import SimpleDirectoryReader
#from llama_index import ResponseSynthesizer

## LlamaIndex Index Types
#from llama_index import GPTListIndex             
from llama_index import VectorStoreIndex
#from llama_index import GPTVectorStoreIndex  
#from llama_index import GPTTreeIndex
#from llama_index import GPTKeywordTableIndex
#from llama_index import GPTSimpleKeywordTableIndex
#from llama_index import GPTDocumentSummaryIndex
#from llama_index import GPTKnowledgeGraphIndex
#from llama_index.indices.struct_store import GPTPandasIndex
#from llama_index.vector_stores import ChromaVectorStore

## LlamaIndex Context Managers
from llama_index import ServiceContext
from llama_index import StorageContext
from llama_index import load_index_from_storage
from llama_index import set_global_service_context
from llama_index.response_synthesizers import get_response_synthesizer
#from llama_index import LLMPredictor

## LlamaIndex Templates
from llama_index.prompts import PromptTemplate
from llama_index.prompts import ChatPromptTemplate

## LlamaIndex Callbacks
#from llama_index.callbacks import CallbackManager
#from llama_index.callbacks import LlamaDebugHandler


## Defining LLM Model
llm_option = "OpenAI"
if llm_option == "OpenAI":
    openai.api_type = "azure"
    azure_endpoint = "https://rg-rbi-aa-aitest-dsacademy.openai.azure.com/"
    #azure_endpoint = "https://chatgpt-summarization.openai.azure.com/"
    openai.api_version = "2023-07-01-preview"
    openai.api_key = os.environ["OPENAI_API_KEY"]
    deployment_name = "model-gpt-35-turbo"
    openai_model_name = "gpt-35-turbo"
    llm = AzureOpenAI(api_key=openai.api_key,
                      azure_endpoint=azure_endpoint,
                      model=openai_model_name,
                      engine=deployment_name,
                      api_version=openai.api_version,
                      )
elif llm_option == "Local":
    # See https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html
    # and https://docs.llamaindex.ai/en/stable/module_guides/models/llms/local.html
    print("Make sure you have installed Local Models - !pip install llama-index[local_models]")
    from llama_index.llms import Ollama  # imported here because the top-level import is commented out
    llm = Ollama(model="mistral", request_timeout=30.0)
else:
    raise ValueError("Invalid LLM Model")

## Defining Embedding Model
emb_option = "Local"
if emb_option == "OpenAI":
    embed_model_name = "text-embedding-ada-002"
    embed_model_deployment_name = "model-text-embedding-ada-002"
    embed_model = AzureOpenAIEmbedding(model=embed_model_name,
                                       deployment_name=embed_model_deployment_name,
                                       api_key=openai.api_key,
                                       azure_endpoint=azure_endpoint)
elif emb_option == "Local":
    embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")   ## bge-small-en-v1.5 embedding model, run locally
else:
    raise ValueError("Invalid Embedding Model")

## Logging Optionals
#logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

#### Text Completion Example

In [0]:
resp = llm.complete("Paul Graham is ")
print(resp)

#### Chat Example

In [0]:
messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),
            ChatMessage(role="user", content="What is your name"),
            ]
resp = llm.chat(messages)
print(resp)

### Quickstart: Implementing a RAG Pipeline

![](https://docs.llamaindex.ai/en/stable/_images/basic_rag.png)
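
The flow in the diagram can be sketched without any library at all. The snippet below is a minimal illustration, not LlamaIndex's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `build_index`, `retrieve`, and `build_prompt` are hypothetical helper names chosen for this sketch. It shows the three stages of a RAG pipeline: index the chunks, retrieve the closest one for a query, and place it in the prompt that would be sent to the LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks):
    """Indexing stage: store each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, query, top_k=1):
    """Retrieval stage: rank chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(query, context_chunks):
    """Synthesis stage (prompt only): combine retrieved context and the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "Romeo is a young man of the house of Montague.",
    "Elizabeth Bennet lives at Longbourn with her four sisters.",
]
toy_index = build_index(chunks)
top = retrieve(toy_index, "Who is Romeo?")
prompt = build_prompt("Who is Romeo?", top)
print(prompt)
```

In the notebook cells that follow, `VectorStoreIndex` and the query engine take over these roles, with a real embedding model in place of the word counts.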

### Examining the Documents Folder

In [0]:
DOCS_DIR = "../../Data/txt/"
docs = sorted(os.listdir(DOCS_DIR))  # all files; filter with d.endswith(".txt") if needed
for doc in docs:
    print(doc)

#### Setting the Service Context

In [0]:
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
set_global_service_context(service_context)

#Testing Service Context
service_context.llm.complete("RBI is a")

### Creating the Vector Store

In [0]:
PERSIST_DIR = "/Workspace/ds-academy-research/LLamaIndex_quick/"

In [0]:
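# Start from an empty directory so the index is rebuilt from DOCS_DIR on every run.
if os.path.exists(PERSIST_DIR):
    shutil.rmtree(PERSIST_DIR)
print(f"Creating Directory {PERSIST_DIR}")
os.makedirs(PERSIST_DIR)  # also creates missing parent directories

# The directory was just recreated, so this check always passes here;
# it only matters if the cleanup above is removed to reuse a persisted index.
if not os.listdir(PERSIST_DIR):
    print("Loading Documents...")
    documents = SimpleDirectoryReader(DOCS_DIR).load_data()
    print("Creating Vector Store...")
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    print("Persisting Vector Store...")
    index.storage_context.persist(persist_dir=PERSIST_DIR)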

### Reading from existing Vector Store

In [0]:
print("Reading from Vector Store...")
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)
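
A common variant of the build-then-read steps is to build only when nothing has been persisted yet, and to load otherwise. The snippet below sketches that pattern with the standard library only. `build_index` and `load_or_build` are hypothetical names, and the JSON file stands in for whatever the real index writes to disk. It is an illustration of the control flow, not a replacement for `persist` and `load_index_from_storage`.

```python
import json
import tempfile
from pathlib import Path

def build_index(docs):
    """Hypothetical stand-in for an expensive indexing step: word count per document."""
    return {name: len(text.split()) for name, text in docs.items()}

def load_or_build(persist_dir, docs):
    """Load a persisted index if one exists, otherwise build it and persist it."""
    persist_dir = Path(persist_dir)
    index_file = persist_dir / "index.json"
    if index_file.exists():
        return json.loads(index_file.read_text()), "loaded"
    persist_dir.mkdir(parents=True, exist_ok=True)
    index = build_index(docs)
    index_file.write_text(json.dumps(index))
    return index, "built"

docs = {"romeo.txt": "Two households both alike in dignity"}
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "store"
    first, first_status = load_or_build(store, docs)    # nothing persisted yet
    second, second_status = load_or_build(store, docs)  # reuses the persisted file
print(first_status, second_status)
```

The trade-off is staleness: a loaded index does not notice documents that changed after it was persisted, which is one reason to rebuild from scratch while experimenting.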

In [0]:
index.ref_doc_info

### Querying your data  

In [0]:
query_engine = index.as_query_engine(retriever_mode="embedding", response_mode="accumulate", verbose=True)
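
As the LlamaIndex documentation describes it, the `accumulate` response mode applies the query to each retrieved chunk separately and concatenates the individual answers. The snippet below is a rough sketch of that idea with a stub in place of the LLM. `fake_llm`, `accumulate`, the prompt wording, and the separator are all assumptions made for illustration and are not the library's own.

```python
def fake_llm(prompt):
    # Hypothetical stand-in for a real LLM call: echoes the context line it was given.
    context_line = prompt.splitlines()[1]
    return f"Based on '{context_line}'"

def accumulate(query, chunks, llm, separator="\n---\n"):
    """Ask the same question of each chunk separately, then join the answers."""
    answers = []
    for number, chunk in enumerate(chunks, start=1):
        prompt = f"Context:\n{chunk}\nQuestion: {query}\nAnswer:"
        answers.append(f"Response {number}: {llm(prompt)}")
    return separator.join(answers)

chunks = ["Romeo is a Montague.", "Romeo loves Juliet."]
combined = accumulate("Who was Romeo?", chunks, fake_llm)
print(combined)
```

This mode costs one LLM call per retrieved chunk, which is worth keeping in mind when the retriever returns many chunks.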

In [0]:
response = query_engine.query("Who was Romeo?")
print(response)

In [0]:
response = query_engine.query("Who did the proofreading of Pride and Prejudice?")
print(response)

In [0]:
response = query_engine.query("Who is the publisher of Pride and Prejudice?")
print(response)

### Chat with your Data  

[Available Chat Modes](https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/usage_pattern.html)  
+ best - Turn the query engine into a tool, for use with a ReAct data agent or an OpenAI data agent, depending on what your LLM supports. OpenAI data agents require gpt-3.5-turbo or gpt-4 as they use the function calling API from OpenAI.
+ condense_question - Look at the chat history and re-write the user message to be a query for the index. Return the response after reading the response from the query engine.
+ context - Retrieve nodes from the index using every user message. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
+ condense_plus_context - A combination of condense_question and context. Look at the chat history and re-write the user message to be a retrieval query for the index. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
+ simple - A simple chat with the LLM directly, no query engine involved.
+ react - Same as best, but forces a ReAct data agent.
+ openai - Same as best, but forces an OpenAI data agent.
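
To make the `condense_question` description concrete, here is a minimal sketch of the flow with stubs in place of the LLM and the query engine. All names (`condense`, `chat`, `fake_llm`, `fake_query_engine`) and the prompt wording are hypothetical. Skipping the rewrite when the history is empty is a design choice of this sketch, not a statement about the library.

```python
def condense(chat_history, message, llm):
    """Rewrite a follow-up message as a standalone question using the history."""
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)
    prompt = (
        "Rewrite the follow-up message as a standalone question.\n"
        f"Chat history:\n{history}\n"
        f"Follow-up message: {message}\n"
        "Standalone question:"
    )
    return llm(prompt)

def chat(chat_history, message, llm, query_engine):
    """One chat turn: condense, query the index, then record the exchange."""
    # Design choice of this sketch: with no history there is nothing to condense.
    standalone = condense(chat_history, message, llm) if chat_history else message
    answer = query_engine(standalone)
    chat_history.append(("user", message))
    chat_history.append(("assistant", answer))
    return standalone, answer

def fake_llm(prompt):
    # Stub: a real LLM would derive this from the history in the prompt.
    return "What kind of poison did Romeo take?"

def fake_query_engine(question):
    # Stub: a real query engine would retrieve context and synthesize an answer.
    return f"(answer to: {question})"

history = []
first_question, _ = chat(history, "What happened to Romeo?", fake_llm, fake_query_engine)
second_question, second_answer = chat(history, "What kind of poison?", fake_llm, fake_query_engine)
print(first_question)
print(second_question)
print(second_answer)
```

The point of the rewrite step is that the query engine only ever sees a self-contained question, so a vague follow-up such as "What kind of poison?" can still retrieve the right passages.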

In [0]:
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
chat_engine.reset()

In [0]:
response = chat_engine.chat("What happened to Romeo?")
print(response)
response = chat_engine.chat("When was that?")
print(response)
response = chat_engine.chat("What kind of poison?")
print(response)

### Using the REPL interface  

In [0]:
chat_engine.chat_repl()

#### Using [Prompt Templates](https://docs.llamaindex.ai/en/stable/examples/customization/prompts/completion_prompts.html)    

With Text Completion

In [0]:
text_qa_template_str = (
    "Context information is"
    " below.\n---------------------\n{context_str}\n---------------------\nUsing"
    " both the context information and also using your own knowledge, answer"
    " the question: {query_str}\nIf the context isn't helpful, you can also"
    " answer the question on your own.\n"
)
text_qa_template = PromptTemplate(text_qa_template_str)

refine_template_str = (
    "The original question is as follows: {query_str}\nWe have provided an"
    " existing answer: {existing_answer}\nWe have the opportunity to refine"
    " the existing answer (only if needed) with some more context"
    " below.\n------------\n{context_msg}\n------------\nUsing both the new"
    " context and your own knowledge, update or repeat the existing answer.\n"
)
refine_template = PromptTemplate(refine_template_str)

In [0]:
print(index.as_query_engine().query("Who is Bill Gates?"))

In [0]:
print(index.as_query_engine(text_qa_template=text_qa_template, 
                            refine_template=refine_template).query("Who is Bill Gates?"))
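
The two templates play different roles: the QA template produces a first answer from the first piece of context, and the refine template updates that answer with each further piece. The snippet below sketches that create-and-refine loop in plain Python. The shortened template strings, `make_counting_llm`, and `create_and_refine` are assumptions for illustration; only the placeholder names (`context_str`, `query_str`, `existing_answer`, `context_msg`) are taken from the templates above.

```python
QA_TEMPLATE = "Context:\n{context_str}\nQuestion: {query_str}\n"
REFINE_TEMPLATE = (
    "Question: {query_str}\nExisting answer: {existing_answer}\n"
    "New context:\n{context_msg}\n"
)

def make_counting_llm():
    """Hypothetical stub LLM that returns 'draft 1', 'draft 2', ... on successive calls."""
    calls = {"n": 0}
    def llm(prompt):
        calls["n"] += 1
        return f"draft {calls['n']}"
    return llm

def create_and_refine(query, chunks, llm):
    """Answer from the first chunk, then refine the answer with every later chunk."""
    prompts = []
    answer = None
    for chunk in chunks:
        if answer is None:
            prompt = QA_TEMPLATE.format(context_str=chunk, query_str=query)
        else:
            prompt = REFINE_TEMPLATE.format(
                query_str=query, existing_answer=answer, context_msg=chunk
            )
        prompts.append(prompt)
        answer = llm(prompt)
    return answer, prompts

final_answer, prompts = create_and_refine(
    "Who is Bill Gates?", ["chunk A", "chunk B", "chunk C"], make_counting_llm()
)
print(final_answer)
print(prompts[1])
```

Depending on the response mode, several retrieved chunks may be packed into a single prompt, so the refine template may only come into play when the retrieved text does not fit in one call.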

With Chat Engine  

In [0]:
custom_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)

# list of `ChatMessage` objects
custom_chat_history = [
    ChatMessage(role=MessageRole.USER,
                content="Hello assistant, we are having an insightful discussion about two famous romances today.",
                ),
    ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]

query_engine = index.as_query_engine()
chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=query_engine,
                                                       condense_question_prompt=custom_prompt,
                                                       chat_history=custom_chat_history,
                                                       verbose=True,)
chat_engine.chat_repl()