## Setting LLM Model

In [1]:
import nest_asyncio

nest_asyncio.apply()

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()
#print("Open AI - ",os.getenv("LITELLM_URL"),os.getenv("OPENAI_API_MODEL"), os.getenv("OPENAI_API_EMBEDDING"))
#print("OLLAMA  - ",os.getenv("OLLAMA_URL"),os.getenv("OLLAMA_MODEL"))
#print("Local OLLAMA - ",os.getenv("OLLAMA_LOCAL_URL"),os.getenv("OLLAMA_LOCAL_MODEL"))

True
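The cells that follow read their settings with `os.getenv`, which returns `None` without any warning when a variable is absent from `.env`. As a minimal sketch, assuming the variable names from the commented-out prints above are the ones you need, an early check could look like this. The helper name `missing_env_vars` is hypothetical and not part of any library.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Names assumed from the commented-out debug prints in the cell above.
required = ["LITELLM_URL", "OPENAI_API_MODEL", "OPENAI_API_EMBEDDING"]

missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces a misconfigured `.env` before an LLM client is constructed with `None` values.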

In [2]:
from llama_index.core import Settings

### RUN LLM AS OLLAMA 

In [None]:
# install the Ollama LLM integration; skip if already installed
!pipenv install llama-index-llms-ollama

In [None]:
from llama_index.llms.ollama import Ollama

# Connect to the local Ollama server configured in .env
api_base = os.getenv("OLLAMA_LOCAL_URL")
model = os.getenv("OLLAMA_LOCAL_MODEL")
llm = Ollama(model=model, base_url=api_base, request_timeout=120.0)

# To use a remote Ollama server instead, uncomment:
# api_base = os.getenv("OLLAMA_URL")
# model = os.getenv("OLLAMA_MODEL")
# llm = Ollama(model=model, base_url=api_base, request_timeout=180.0)

# Test run
response = llm.complete("What is the capital of France?")
print(response)

### RUN LLM AS OPEN AI 

In [None]:
# install the OpenAI LLM integration; skip if already installed
!pipenv install llama-index-llms-openai

In [3]:
from llama_index.llms.openai import OpenAI
api_base=os.getenv("LITELLM_URL")
model=os.getenv("OPENAI_API_MODEL")

Settings.llm = OpenAI(
    model=model,
    api_base = api_base,
    temperature=0.3
)

resp = Settings.llm.complete("What is the capital of France?")
print(resp)

The capital of France is Paris.


## Setting Embedding Model

In [4]:
# Use the OpenAI embedding model
from llama_index.embeddings.openai import OpenAIEmbedding
api_base=os.getenv("LITELLM_URL")
embedding_model=os.getenv("OPENAI_API_EMBEDDING")

Settings.embed_model = OpenAIEmbedding(
    model_name=embedding_model,
    api_base = api_base,
)

# embed_text = Settings.embed_model.get_text_embedding("hello")
# print(f"{len(embed_text)}, {embed_text}")

In [5]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul").load_data()

In [6]:
import logging
import sys

#logging.basicConfig(stream=sys.stdout, level=logging.INFO)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from IPython.display import Markdown, display

### Chat Engine
A chat engine is the stateful analog of a query engine. By keeping track of the conversation history, it can answer questions with past context in mind. The available chat modes are:

- best - Turn the query engine into a tool, for use with a ReAct data agent or an OpenAI data agent, depending on what your LLM supports. OpenAI data agents require gpt-3.5-turbo or gpt-4 as they use the function calling API from OpenAI.
- condense_question - Look at the chat history and re-write the user message to be a query for the index. Return the response after reading the response from the query engine.
- context - Retrieve nodes from the index using every user message. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
- condense_plus_context - A combination of condense_question and context. Look at the chat history and re-write the user message to be a retrieval query for the index. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
- simple - A simple chat with the LLM directly, no query engine involved.
- react - Same as best, but forces a ReAct data agent.
- openai - Same as best, but forces an OpenAI data agent.
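To make the `condense_question` description concrete, here is a toy sketch of that flow in plain Python. It is not LlamaIndex code: the class `ToyCondenseQuestionChat` and the stub functions `fake_condense` and `fake_query` are hypothetical stand-ins for the LLM rewrite step and the query engine, and skipping the rewrite when the history is empty is an assumption of this sketch.

```python
def format_history(history):
    """Render (role, text) pairs as one line per message."""
    return "\n".join(f"{role}: {text}" for role, text in history)


class ToyCondenseQuestionChat:
    """Toy stand-in for a condense_question chat engine (not LlamaIndex code)."""

    def __init__(self, condense_fn, query_fn):
        self.condense_fn = condense_fn  # (history_text, message) -> standalone query
        self.query_fn = query_fn        # standalone query -> answer
        self.history = []               # list of (role, text) pairs

    def chat(self, message):
        # With no history there is nothing to condense, so use the message as-is.
        if self.history:
            query = self.condense_fn(format_history(self.history), message)
        else:
            query = message
        answer = self.query_fn(query)
        # Store the turn so that later messages are rewritten with this context.
        self.history.append(("Human", message))
        self.history.append(("Assistant", answer))
        return answer


# Stub callables that stand in for the LLM and the query engine.
def fake_condense(history_text, message):
    return f"{message} (given: {history_text.splitlines()[0]})"


def fake_query(query):
    return f"answer to <{query}>"


engine = ToyCondenseQuestionChat(fake_condense, fake_query)
print(engine.chat("Who is Paul Graham?"))
print(engine.chat("What did he write?"))
```

The second call shows the key property of this mode: the query engine never sees the raw follow-up message, only the rewritten standalone query.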

In [8]:
from llama_index.core.node_parser import SentenceSplitter

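# Split the documents into chunks of up to 1024 tokens with a 20-token overlap
parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)
# Build the index from the parsed nodes so that the splitter settings take effect
index = VectorStoreIndex(nodes)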
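The `chunk_size` and `chunk_overlap` arguments are easier to reason about with a small example. The sketch below is a simplified sliding window over a list of integers. It is not the `SentenceSplitter` algorithm, which also respects sentence boundaries and counts real tokens; the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = list(range(10))
for chunk in chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so neighbouring chunks repeat a little text and a sentence that straddles a boundary stays retrievable from either side.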

In [9]:
chat_engine = index.as_chat_engine()

In [10]:
response = chat_engine.chat("Tell me a joke.")
print(response)

Why don't scientists trust atoms?

Because they make up everything!


In [11]:
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  hi


Assistant: Hello! How can I assist you today?



Human:  exit


In [12]:
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  Hi


Querying with: Hi
Assistant: Hello! How can I assist you today?



Human:  Exit


Querying with: Could you please help me with how to exit or close this conversation?
Assistant: To exit or close the conversation, you might say something like, "Thank you for the discussion; I appreciate your insights. I need to wrap things up now." Alternatively, you could express gratitude and indicate that you have other commitments, such as, "I really enjoyed our conversation, but I have to attend to some other matters now. Let's catch up later."



Human:  exit


In [13]:
chat_engine = index.as_chat_engine(chat_mode="simple", verbose=True)
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  hi


Assistant: Hello! How can I assist you today?



Human:  exit


In [14]:
chat_engine = index.as_chat_engine(chat_mode="context", verbose=True)
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  hoi


Assistant: Hoi! How can I assist you today?



Human:  exit


In [15]:
chat_engine = index.as_chat_engine(chat_mode="react", verbose=True)
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  hi


Added user message to memory: hi
Assistant: Hello! How can I assist you today?



Human:  exit


In [16]:
from llama_index.core import PromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.chat_engine import CondenseQuestionChatEngine

custom_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)

# list of `ChatMessage` objects
custom_chat_history = [
    ChatMessage(
        role=MessageRole.USER,
        content="Hello assistant, we are having an insightful discussion about Paul Graham today.",
    ),
    ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]

query_engine = index.as_query_engine()
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    chat_history=custom_chat_history,
    verbose=True,
)
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.



Human:  hio


Querying with: hio
Assistant: Hello! How can I assist you today?



Human:  exit
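To see what the condense step would send to the LLM, it can help to render the custom prompt by hand. The sketch below fills the same template text with plain `str.format`. The `role: content` rendering of the chat history and the helper name `render_condense_prompt` are assumptions made for illustration, and the follow-up question is an invented example; the exact string LlamaIndex builds may differ.

```python
CONDENSE_TEMPLATE = """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""


def render_condense_prompt(history, question):
    """Fill the template from (role, content) pairs and a follow-up message."""
    history_text = "\n".join(f"{role}: {content}" for role, content in history)
    return CONDENSE_TEMPLATE.format(chat_history=history_text, question=question)


history = [
    ("user", "Hello assistant, we are having an insightful discussion about Paul Graham today."),
    ("assistant", "Okay, sounds good."),
]
print(render_condense_prompt(history, "What did he work on before college?"))
```

Printing the rendered prompt makes it easier to judge whether the seeded `custom_chat_history` gives the model enough context to rewrite a short follow-up.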
