#### Friday, April 19, 2024

Adding local calls against "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF". I will keep the existing cells that called "TheBloke/NexusRaven-V2-13B-GGUF" just to show the differences.

Notice that "TheBloke/NexusRaven-V2-13B-GGUF" has no idea what 'langsmith' is, but "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF" does. Llama 3 8B has a knowledge cutoff of March 2023; its answer begins "Based on the provided context and documents", so it is likely drawing on the Tavily search results passed in the prompt.

#### Wednesday, April 17, 2024

Tweaking to run with both OpenAI and LMStudio, keeping the output from both.

*** MANUALLY STEP THROUGH THIS NOTEBOOK, SELECTING THE CELL BASED ON THE TARGET LLM ***
*** DO NOT RUN THE NON-TARGET CELL BECAUSE YOU WILL CLEAR ANY OUTPUT FROM A PREVIOUS RUN! ***
*** DO NOT RUN THIS NOTEBOOK IN ONE FULL PASS! ***

This all runs locally against LMStudio running "TheBloke/NexusRaven-V2-13B-GGUF". 

I also ran it against OpenAI just to see how the output differs. 

* OpenAI Start April 2024 Monthly Spend: $1.79
* OpenAI End   April 2024 Monthly Spend: $1.80

You can get a lot more details of what is covered here from [Q&A with RAG](https://python.langchain.com/docs/use_cases/question_answering/)

#### Tuesday, April 16, 2024

`mamba activate langchain3`

This all runs in one pass.

This notebook uses Tavily, so set the environment variable `TAVILY_API_KEY` to your API key before running the rest of this notebook.

In [1]:
# enter your api key
import os
from getpass import getpass

TAVILY_API_KEY = getpass("Enter your API key: ")
os.environ["TAVILY_API_KEY"] = TAVILY_API_KEY

In [2]:
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
  model="TheBloke/NexusRaven-V2-13B-GGUF",
  messages=[
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."}
  ],
  temperature=0.7,
)

print(completion.choices[0].message)

ChatCompletionMessage(content="I'm just an AI, I don't have a name.<|im_end|>\n", role='assistant', function_call=None, tool_calls=None)


In [2]:
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
  model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
  messages=[
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."}
  ],
  temperature=0.7,
)

print(completion.choices[0].message)

ChatCompletionMessage(content="I'm quite delighted to say,\nMy name is LLaMA, and I'm here to stay!\nI'm an AI assistant, with a heart that's true,\nHere to assist you, with answers anew!", role='assistant', function_call=None, tool_calls=None)


In [3]:
!nvidia-smi

Wed Apr 17 10:04:54 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce GTX 1050        Off | 00000000:01:00.0  On |                  N/A |
|  0%   59C    P0              N/A /  70W |   1829MiB /  2048MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:02:00.0 Off |  

You can run a full pass starting from the next cell.

In [3]:
useOpenAI = False

In [4]:
import os
from langchain_openai import ChatOpenAI

if useOpenAI:
    llm = ChatOpenAI(model="gpt-3.5-turbo", api_key=os.environ['OPENAI_API_KEY_'],  temperature=0)
else:
    llm = ChatOpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio", temperature=0)

# Streaming

This notebook covers functionality related to streaming.

For more information, see:

- [Streaming with LCEL](https://python.langchain.com/docs/expression_language/interface#stream)

- [Streaming for RAG](https://python.langchain.com/docs/use_cases/question_answering/streaming)

- [Streaming for Agents](https://python.langchain.com/docs/modules/agents/how_to/streaming)

## Basic Streaming

In [5]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [6]:
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")

In [7]:
output_parser = StrOutputParser()
chain = prompt | llm | output_parser

Observe how the tokens are streamed back to you one at a time instead of all at once:

In [40]:
if useOpenAI:
    for s in chain.stream({"topic": "bears"}):
        print(s)


Why
 did
 the
 bear
 break
 up
 with
 his
 girlfriend
?
 


Because
 he
 couldn
't
 bear
 the
 relationship
 any
 longer
!



In [10]:
if not useOpenAI:
    for s in chain.stream({"topic": "bears"}):
        print(s)

# 5.8s

Here
'
s
 a
 jo
ke
 for
 you
:
 Why
 did
 the
 bear
 go
 to
 the
 doctor
?
 Because
 he
 was
 feeling
 r
uff
!



In [9]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
#if not useOpenAI:
for s in chain.stream({"topic": "bears"}):
    print(s)

Why
 did
 the
 bear
 go
 to
 the
 doctor
?


Because
 it
 had
 a
 gr
izzly
 cough
!



## Streaming with RunnableParallel

In [10]:
from langchain_core.runnables import RunnableParallel

prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")

In [11]:
output_parser = StrOutputParser()
chain1 = prompt | llm | output_parser

Notice the 2nd chain is using the same llm as the first chain. The sample code instantiated a 2nd instance of the llm, and assigned that to the 2nd chain. Here, I decided to use the same model to see if it still works. 

It works as expected with OpenAI, but the two chains do not run in parallel against a local model served through LMStudio.

In [12]:
prompt = ChatPromptTemplate.from_template("Write me a poem about {topic}")
output_parser = StrOutputParser()
chain2 = prompt | llm | output_parser

In [13]:
parallel_chain = RunnableParallel({
    "joke": chain1,
    "poem": chain2
})

Notice with OpenAI the results are streamed back in parallel for both the poem and joke, whereas locally, it is the joke followed by the poem, so it's sequential.
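
The difference between interleaved and sequential output can be reproduced without any model. The sketch below is a toy, not LangChain's or LMStudio's implementation: `fake_model` and `stream_in_parallel` are hypothetical helpers, and the idea that the local server generates one reply at a time is an assumption (modelled here with a lock), not something verified in this notebook.

```python
import queue
import threading
import time


def fake_model(tokens, server_lock=None):
    """Yield tokens one at a time; hold server_lock for the whole reply if given."""
    if server_lock is not None:
        server_lock.acquire()
    try:
        for tok in tokens:
            time.sleep(0.01)  # pretend each token takes time to generate
            yield tok
    finally:
        if server_lock is not None:
            server_lock.release()


def stream_in_parallel(branches, server_lock=None):
    """Run each branch in its own thread and yield {name: token} chunks as they arrive."""
    out = queue.Queue()

    def worker(name, tokens):
        for tok in fake_model(tokens, server_lock):
            out.put({name: tok})
        out.put(None)  # sentinel: this branch is finished

    threads = [
        threading.Thread(target=worker, args=(name, tokens))
        for name, tokens in branches.items()
    ]
    for thread in threads:
        thread.start()
    finished = 0
    while finished < len(threads):
        chunk = out.get()
        if chunk is None:
            finished += 1
        else:
            yield chunk
    for thread in threads:
        thread.join()


def key_order(chunks):
    """Return just the key of each chunk, to make the ordering easy to see."""
    return [next(iter(chunk)) for chunk in chunks]


branches = {"joke": ["Why", " did", " the", " bear"], "poem": ["In", " the", " forest"]}

# No lock: both branches generate at once, so chunks may interleave.
concurrent_chunks = list(stream_in_parallel(branches))
# Shared lock: only one branch generates at a time, so one finishes before the other starts.
serialized_chunks = list(stream_in_parallel(branches, server_lock=threading.Lock()))

print(key_order(concurrent_chunks))
print(key_order(serialized_chunks))
```

In the serialized case the keys always come out in two contiguous runs, which matches the shape of the local-model output in this section. The exact interleaving in the concurrent case depends on thread scheduling and will vary between runs.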

In [45]:
if useOpenAI:
    for s in parallel_chain.stream({"topic": "bears"}):
        print(s)

{'poem': ''}
{'poem': 'In'}
{'poem': ' the'}
{'poem': ' heart'}
{'poem': ' of'}
{'joke': ''}
{'poem': ' the'}
{'poem': ' forest'}
{'joke': 'Why'}
{'joke': ' did'}
{'poem': ','}
{'poem': ' where'}
{'joke': ' the'}
{'joke': ' bear'}
{'poem': ' shadows'}
{'poem': ' dance'}
{'joke': ' break'}
{'joke': ' up'}
{'joke': ' with'}
{'joke': ' his'}
{'joke': ' girlfriend'}
{'joke': '?'}
{'poem': ',\n'}
{'poem': 'L'}
{'poem': 'ives'}
{'poem': ' a'}
{'poem': ' creature'}
{'poem': ' of'}
{'joke': ' \n\n'}
{'joke': 'Because'}
{'poem': ' strength'}
{'poem': ' and'}
{'poem': ' grace'}
{'poem': ',\n'}
{'poem': 'With'}
{'joke': ' he'}
{'joke': ' couldn'}
{'poem': ' fur'}
{'joke': "'t"}
{'joke': ' bear'}
{'poem': ' as'}
{'poem': ' dark'}
{'joke': ' the'}
{'joke': ' relationship'}
{'poem': ' as'}
{'poem': ' the'}
{'joke': ' any'}
{'joke': ' longer'}
{'joke': '!'}
{'joke': ''}
{'poem': ' night'}
{'poem': ' sky'}
{'poem': "'s"}
{'poem': ' ex'}
{'poem': 'panse'}
{'poem': ',\n'}
{'poem': 'And'}
{'poem': ' a'}


In [16]:
if not useOpenAI:
    for s in parallel_chain.stream({"topic": "bears"}):
        print(s)

# 8.4s

{'joke': 'Here'}
{'joke': "'"}
{'joke': 's'}
{'joke': ' a'}
{'joke': ' jo'}
{'joke': 'ke'}
{'joke': ' for'}
{'joke': ' you'}
{'joke': ':'}
{'joke': ' Why'}
{'joke': ' did'}
{'joke': ' the'}
{'joke': ' bear'}
{'joke': ' go'}
{'joke': ' to'}
{'joke': ' the'}
{'joke': ' doctor'}
{'joke': '?'}
{'joke': ' Because'}
{'joke': ' he'}
{'joke': ' was'}
{'joke': ' feeling'}
{'joke': ' r'}
{'joke': 'uff'}
{'joke': '!'}
{'joke': ''}
{'poem': 'Here'}
{'poem': "'"}
{'poem': 's'}
{'poem': ' a'}
{'poem': ' poem'}
{'poem': ' about'}
{'poem': ' be'}
{'poem': 'ars'}
{'poem': ':'}
{'poem': '\n'}
{'poem': '\n'}
{'poem': 'In'}
{'poem': ' the'}
{'poem': ' forest'}
{'poem': ','}
{'poem': ' where'}
{'poem': ' the'}
{'poem': ' trees'}
{'poem': ' are'}
{'poem': ' tall'}
{'poem': ','}
{'poem': '\n'}
{'poem': 'L'}
{'poem': 'ives'}
{'poem': ' a'}
{'poem': ' bear'}
{'poem': ','}
{'poem': ' with'}
{'poem': ' fur'}
{'poem': ' so'}
{'poem': ' bright'}
{'poem': ' and'}
{'poem': ' bold'}
{'poem': '.'}
{'poem': '\n'}
{'poe

In [14]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
# if not useOpenAI:
for s in parallel_chain.stream({"topic": "bears"}):
    print(s)

{'poem': 'In'}
{'poem': ' forest'}
{'poem': ' depths'}
{'poem': ','}
{'poem': ' where'}
{'poem': ' ancient'}
{'poem': ' trees'}
{'poem': ' pres'}
{'poem': 'ide'}
{'poem': ',\n'}
{'poem': 'A'}
{'poem': ' gentle'}
{'poem': ' giant'}
{'poem': ' ro'}
{'poem': 'ams'}
{'poem': ','}
{'poem': ' with'}
{'poem': ' fur'}
{'poem': ' so'}
{'poem': ' wide'}
{'poem': '.\n'}
{'poem': 'The'}
{'poem': ' bear'}
{'poem': ','}
{'poem': ' a'}
{'poem': ' symbol'}
{'poem': ' of'}
{'poem': ' strength'}
{'poem': ' and'}
{'poem': ' might'}
{'poem': ',\n'}
{'poem': 'L'}
{'poem': 'umber'}
{'poem': 'ing'}
{'poem': ' through'}
{'poem': ' the'}
{'poem': ' woods'}
{'poem': ','}
{'poem': ' with'}
{'poem': ' quiet'}
{'poem': ' night'}
{'poem': '.\n\n'}
{'poem': 'His'}
{'poem': ' p'}
{'poem': 'aws'}
{'poem': ','}
{'poem': ' a'}
{'poem': ' soft'}
{'poem': ' whisper'}
{'poem': ' on'}
{'poem': ' the'}
{'poem': ' ground'}
{'poem': ',\n'}
{'poem': 'As'}
{'poem': ' he'}
{'poem': ' searches'}
{'poem': ' for'}
{'poem': ' his'}
{

The video talks about 'building up the dictionary' for the different responses we are getting. Notice with OpenAI, 'joke' and 'poem' are incrementally growing at the same time, whereas with the local model, all of 'joke' is first created, and then 'poem' is created.
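
The accumulation loop used in the cells of this section can also be tried on hand-written chunks, which makes the two growth patterns easy to compare side by side. This is a minimal sketch: `accumulate` is a hypothetical helper name, and the chunk lists are made up for illustration rather than taken from a real run.

```python
def accumulate(chunks):
    """Merge streamed {key: text} chunks, yielding a snapshot after each one."""
    result = {}
    for chunk in chunks:
        for key, text in chunk.items():
            result[key] = result.get(key, "") + text
        yield dict(result)  # copy so earlier snapshots are not mutated later


# Hand-written stand-ins for what parallel_chain.stream(...) yields.
interleaved = [{"joke": "Why"}, {"poem": "In"}, {"joke": " did"}, {"poem": " the"}]
sequential = [{"joke": "Why"}, {"joke": " did"}, {"poem": "In"}, {"poem": " the"}]

interleaved_snapshots = list(accumulate(interleaved))
sequential_snapshots = list(accumulate(sequential))

for snapshot in interleaved_snapshots:
    print(snapshot)
print("---")
for snapshot in sequential_snapshots:
    print(snapshot)
```

Both streams end in the same dictionary; only the intermediate snapshots differ. With interleaved chunks both keys are present after the second chunk, while with sequential chunks the second key appears only once the first is complete.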

In [46]:
if useOpenAI:
    result = {}
    for s in parallel_chain.stream({"topic": "bears"}):
        for k,v in s.items():
            if k not in result:
                result[k] = ""
            result[k] += v
        print(result)

{'joke': ''}
{'joke': 'Why'}
{'joke': 'Why did'}
{'joke': 'Why did the'}
{'joke': 'Why did the bear'}
{'joke': 'Why did the bear break'}
{'joke': 'Why did the bear break up'}
{'joke': 'Why did the bear break up with'}
{'joke': 'Why did the bear break up with his'}
{'joke': 'Why did the bear break up with his girlfriend'}
{'joke': 'Why did the bear break up with his girlfriend?'}
{'joke': 'Why did the bear break up with his girlfriend? \n\n'}
{'joke': 'Why did the bear break up with his girlfriend? \n\n', 'poem': ''}
{'joke': 'Why did the bear break up with his girlfriend? \n\nBecause', 'poem': ''}
{'joke': 'Why did the bear break up with his girlfriend? \n\nBecause he', 'poem': ''}
{'joke': 'Why did the bear break up with his girlfriend? \n\nBecause he', 'poem': 'In'}
{'joke': 'Why did the bear break up with his girlfriend? \n\nBecause he', 'poem': 'In the'}
{'joke': 'Why did the bear break up with his girlfriend? \n\nBecause he couldn', 'poem': 'In the'}
{'joke': "Why did the bear bre

In [18]:
if not useOpenAI:
    result = {}
    for s in parallel_chain.stream({"topic": "bears"}):
        for k,v in s.items():
            if k not in result:
                result[k] = ""
            result[k] += v
        print(result)

{'poem': 'Here'}
{'poem': "Here'"}
{'poem': "Here's"}
{'poem': "Here's a"}
{'poem': "Here's a poem"}
{'poem': "Here's a poem about"}
{'poem': "Here's a poem about be"}
{'poem': "Here's a poem about bears"}
{'poem': "Here's a poem about bears:"}
{'poem': "Here's a poem about bears:\n"}
{'poem': "Here's a poem about bears:\n\n"}
{'poem': "Here's a poem about bears:\n\nIn"}
{'poem': "Here's a poem about bears:\n\nIn the"}
{'poem': "Here's a poem about bears:\n\nIn the forest"}
{'poem': "Here's a poem about bears:\n\nIn the forest,"}
{'poem': "Here's a poem about bears:\n\nIn the forest, where"}
{'poem': "Here's a poem about bears:\n\nIn the forest, where the"}
{'poem': "Here's a poem about bears:\n\nIn the forest, where the trees"}
{'poem': "Here's a poem about bears:\n\nIn the forest, where the trees are"}
{'poem': "Here's a poem about bears:\n\nIn the forest, where the trees are tall"}
{'poem': "Here's a poem about bears:\n\nIn the forest, where the trees are tall,"}
{'poem': "Here's a 

In [15]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
#if not useOpenAI:
result = {}
for s in parallel_chain.stream({"topic": "bears"}):
    for k,v in s.items():
        if k not in result:
            result[k] = ""
        result[k] += v
    print(result)

{'joke': 'Why'}
{'joke': 'Why did'}
{'joke': 'Why did the'}
{'joke': 'Why did the bear'}
{'joke': 'Why did the bear go'}
{'joke': 'Why did the bear go to'}
{'joke': 'Why did the bear go to the'}
{'joke': 'Why did the bear go to the doctor'}
{'joke': 'Why did the bear go to the doctor?\n\n'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a gr'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a grizzly'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a grizzly cough'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a grizzly cough!'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a grizzly cough!'}
{'joke': 'Why did the bear go to the doctor?\n\nBecause it had a gr

## Stream Log
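
The cells in this section print `RunLogPatch` objects, each carrying JSON Patch-style operations with an `op`, a `path`, and a `value`. To see how such operations build up a run state step by step, here is a minimal sketch that handles only `add` and `replace`. It is not LangChain's own implementation: `apply_op` is a hypothetical helper, the operations are hand-written in the shape of the outputs in this section, and JSON Pointer escaping is ignored.

```python
def apply_op(state, op):
    """Apply one 'add' or 'replace' operation and return the new state."""
    path = op["path"]
    if path == "":
        return op["value"]  # an empty path replaces the whole document
    *parents, last = path.split("/")[1:]
    target = state
    for key in parents:
        target = target[int(key)] if isinstance(target, list) else target[key]
    if isinstance(target, list):
        if last == "-":
            target.append(op["value"])  # "-" means append to the end of the list
        elif op["op"] == "add":
            target.insert(int(last), op["value"])
        else:
            target[int(last)] = op["value"]
    else:
        target[last] = op["value"]
    return state


# Hand-written operations modelled on the printed patches (values are placeholders).
ops = [
    {"op": "replace", "path": "",
     "value": {"final_output": None, "logs": {}, "streamed_output": []}},
    {"op": "add", "path": "/logs/Docs", "value": {"name": "Docs", "final_output": None}},
    {"op": "add", "path": "/streamed_output/-", "value": "Lang"},
    {"op": "add", "path": "/streamed_output/-", "value": "Smith"},
    {"op": "replace", "path": "/final_output", "value": "LangSmith"},
]

state = None
for op in ops:
    state = apply_op(state, op)

print(state)
```

The first patch in every output of this section is a `replace` at the empty path, which sets the initial run state; later patches add entries under `/logs/...` or append to `streamed_output`.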

In [16]:
from langchain_community.retrievers.tavily_search_api import TavilySearchAPIRetriever
from langchain_core.runnables import RunnablePassthrough

In [17]:
retriever= TavilySearchAPIRetriever()

prompt = ChatPromptTemplate.from_template("""Answer the question based only on the context provided:

Context: {context}

Question: {question}""")

chain = prompt | llm | output_parser

retrieval_chain = RunnablePassthrough.assign(
    context=(lambda x: x["question"]) | retriever.with_config(run_name="Docs")
) | chain

In [49]:
if useOpenAI:
    for s in retrieval_chain.stream({"question": "what is langsmith"}):
        print(s, end="")

LangSmith is a platform that offers features for debugging, testing, evaluating, and monitoring Language Learning Models (LLMs) and AI applications. It provides a unified hub for developers to work on various aspects of their applications, such as tracing and evaluating agent prompt chains, debugging issues, and refining prompts.

In [67]:
# Running this multiple times, I can see activity in LMStudio, yet this cell never shows any results ... why?
# Hmm, after running it a number of times, we eventually DO get something back ... but why do we sometimes get output and other times not?
if not useOpenAI:
    for s in retrieval_chain.stream({"question": "what is langsmith"}):
        print(s, end="")

# .... working results on this call ... they were streamed back, and took a total of 18.8 seconds .. 
# And yeah, this response shows the llm has no idea about what langsmith is ... but the OpenAI result above does ...
 
# Call: Document(page_content='What is LangSmith?')<bot_end> 
# Thought: The function call `Document(page_content='What is LangSmith?')` answers the question "what is langsmith?" because it creates a new document object with the specified page content, which in this case is the text "What is LangSmith?".

# The `<|im_end|>` and `<|im_start|>` tags are used to indicate the start and end of an input prompt. In this case, the input prompt is "assistant", which means that the function call `Document(page_content='What is LangSmith?')` will be used as a response to the question "what is langsmith?" from the assistant.

# Therefore, the function call `Document(page_content='What is LangSmith?')` answers the question "what is langsmith?" by creating a new document object with the specified page content, which in this case is the text "What is LangSmith?".

 
Call: Document(page_content='What is LangSmith?')<bot_end> 
Thought: The function call `Document(page_content='What is LangSmith?')` answers the question "what is langsmith?" because it creates a new document object with the specified page content, which in this case is the text "What is LangSmith?".

The `<|im_end|>` and `<|im_start|>` tags are used to indicate the start and end of an input prompt. In this case, the input prompt is "assistant", which means that the function call `Document(page_content='What is LangSmith?')` will be used as a response to the question "what is langsmith?" from the assistant.

Therefore, the function call `Document(page_content='What is LangSmith?')` answers the question "what is langsmith?" by creating a new document object with the specified page content, which in this case is the text "What is LangSmith?".

In [18]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
# Notice this model DOES know about langsmith!
# if not useOpenAI:
for s in retrieval_chain.stream({"question": "what is langsmith"}):
    print(s, end="")

Based on the provided context and documents, LangSmith appears to be a unified platform for debugging, testing, evaluating, and monitoring Large Language Models (LLMs) and AI applications. It provides a range of features, including:

1. Debugging: LangSmith offers full visibility into model inputs and outputs, allowing developers to identify issues and debug their code.
2. Testing: The platform provides built-in evaluators for measuring the correctness of LLM responses, as well as support for custom evaluators written in natural language.
3. Evaluation: LangSmith allows users to evaluate any LLM, chain, agent, or custom function, providing insights into performance and helping developers refine their models.
4. Monitoring: The platform offers real-time monitoring capabilities, enabling developers to track the performance of their applications and identify areas for improvement.

LangSmith is designed to help developers build more reliable and effective AI-powered applications by provid

async:

In [50]:
if useOpenAI:
    async for s in retrieval_chain.astream_log({"question": "what is langsmith"}):
        print(s, end="")

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': 'c04ac6d8-aa81-4881-94cd-3de52ab6bf89',
            'logs': {},
            'name': 'RunnableSequence',
            'streamed_output': [],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableAssign<context>',
  'value': {'end_time': None,
            'final_output': None,
            'id': '09f9da0b-cda8-4cf8-b44b-0f4510ca1e5f',
            'metadata': {},
            'name': 'RunnableAssign<context>',
            'start_time': '2024-04-17T14:11:25.526+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:1'],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableAssign<context>/streamed_output/-',
  'value': {'question': 'what is langsmith'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableParallel<context>',
  'value': {'end_time': None,
            'final_output': None,
  

In [24]:
if not useOpenAI:
    async for s in retrieval_chain.astream_log({"question": "what is langsmith"}):
        print(s, end="")

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': '9608423a-7e06-4896-ae93-102a43dbfbb2',
            'logs': {},
            'name': 'RunnableSequence',
            'streamed_output': [],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableAssign<context>',
  'value': {'end_time': None,
            'final_output': None,
            'id': '1d55cea0-ae98-47fc-bc9f-cfdc29b1dda4',
            'metadata': {},
            'name': 'RunnableAssign<context>',
            'start_time': '2024-04-17T14:07:11.163+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:1'],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableAssign<context>/streamed_output/-',
  'value': {'question': 'what is langsmith'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableParallel<context>',
  'value': {'end_time': None,
            'final_output': None,
  

In [19]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
#if not useOpenAI:
async for s in retrieval_chain.astream_log({"question": "what is langsmith"}):
    print(s, end="")

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': '838a8d21-e76b-44d0-91d1-7a3dcf3e9121',
            'logs': {},
            'name': 'RunnableSequence',
            'streamed_output': [],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableAssign<context>',
  'value': {'end_time': None,
            'final_output': None,
            'id': '6bf0e67c-4f9c-411c-9c31-aec61e7d95f9',
            'metadata': {},
            'name': 'RunnableAssign<context>',
            'start_time': '2024-04-19T13:21:05.946+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:1'],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableAssign<context>/streamed_output/-',
  'value': {'question': 'what is langsmith'}})RunLogPatch({'op': 'add',
  'path': '/logs/RunnableParallel<context>',
  'value': {'end_time': None,
            'final_output': None,
  

async, include_names: filter the output down to specific named runs. In our case, that is the run named 'Docs', which is the name we gave our retriever:

* retriever.with_config(run_name="Docs")
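
The effect of filtering by name can be approximated on plain dictionaries. The sketch below is an after-the-fact filter over hand-written operations, written only to illustrate which paths survive; `keep_named` is a hypothetical helper, and the real `include_names` option is applied by LangChain while the log is produced, not afterwards.

```python
def keep_named(ops, include_names):
    """Keep root-level ops plus ops under /logs/<name> for the requested names."""
    kept = []
    for op in ops:
        # "" -> [""]; "/logs/Docs/final_output" -> ["", "logs", "Docs", "final_output"]
        parts = op["path"].split("/")
        is_log_entry = len(parts) >= 3 and parts[1] == "logs"
        if not is_log_entry or parts[2] in include_names:
            kept.append(op)
    return kept


# Hand-written operations with paths like those in the unfiltered output (values omitted).
ops = [
    {"op": "replace", "path": "", "value": {}},
    {"op": "add", "path": "/logs/RunnableAssign<context>", "value": {}},
    {"op": "add", "path": "/logs/Docs", "value": {}},
    {"op": "add", "path": "/logs/Docs/final_output", "value": {}},
    {"op": "add", "path": "/logs/ChatOpenAI", "value": {}},
]

kept = keep_named(ops, include_names=["Docs"])
for op in kept:
    print(op["path"] or "(root)")
```

Only the root operation and the `/logs/Docs` entries remain, which is the same pattern visible in the filtered outputs of this section.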

In [51]:
if useOpenAI:
    async for s in retrieval_chain.astream_log({"question": "what is langsmith"}, include_names=["Docs"]):
        print(s, end="")

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': '9aaf4c7b-1f56-4452-a573-5a454525b398',
            'logs': {},
            'name': 'RunnableSequence',
            'streamed_output': [],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/Docs',
  'value': {'end_time': None,
            'final_output': None,
            'id': '1d57743c-b178-42f3-8a73-ad3522338a71',
            'metadata': {},
            'name': 'Docs',
            'start_time': '2024-04-17T14:11:33.712+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:2'],
            'type': 'retriever'}})RunLogPatch({'op': 'add',
  'path': '/logs/Docs/final_output',
  'value': {'documents': [Document(page_content='We consistently see developers relying on LangSmith to track the system-level performance of their application (like latency and cost), track the model/chain performance (through associating f

In [26]:
if not useOpenAI:
    async for s in retrieval_chain.astream_log({"question": "what is langsmith"}, include_names=["Docs"]):
        print(s, end="")

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': '221d45c6-fb10-4ff6-8249-fce9e2c8fc92',
            'logs': {},
            'name': 'RunnableSequence',
            'streamed_output': [],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/Docs',
  'value': {'end_time': None,
            'final_output': None,
            'id': 'decabe9c-433f-4752-a509-c1e1532d39cc',
            'metadata': {},
            'name': 'Docs',
            'start_time': '2024-04-17T14:07:25.634+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:2'],
            'type': 'retriever'}})RunLogPatch({'op': 'add',
  'path': '/logs/Docs/final_output',
  'value': {'documents': [Document(page_content='How it Works, Use Cases, Alternatives & More\nRichie Cotton\nHow AI is Changing Cybersecurity with Brian Murphy, CEO of ReliaQuest\nAdel Nehme\n32 min\nAn Introductory Guide to Fine-Tuning LLM

In [20]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
# if not useOpenAI:
async for s in retrieval_chain.astream_log({"question": "what is langsmith"}, include_names=["Docs"]):
    print(s, end="")

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': '37786c00-0b8c-414e-bac2-0c31e3db2555',
            'logs': {},
            'name': 'RunnableSequence',
            'streamed_output': [],
            'type': 'chain'}})RunLogPatch({'op': 'add',
  'path': '/logs/Docs',
  'value': {'end_time': None,
            'final_output': None,
            'id': '9f0efbc6-9948-4bb3-906f-397968935004',
            'metadata': {},
            'name': 'Docs',
            'start_time': '2024-04-19T13:24:18.226+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:2'],
            'type': 'retriever'}})RunLogPatch({'op': 'add',
  'path': '/logs/Docs/final_output',
  'value': {'documents': [Document(page_content='How it Works, Use Cases, Alternatives & More\nRichie Cotton\nHow AI is Changing Cybersecurity with Brian Murphy, CEO of ReliaQuest\nAdel Nehme\n32 min\nAn Introductory Guide to Fine-Tuning LLM

## Agents

Agents call actions, and it is often unknown how many actions will be called.

Here we return the actions that are called and not the tokens.
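
Each chunk from the agent stream is a dictionary whose keys indicate what happened: `actions` for a tool call, `steps` for a tool result, and `output` for the final answer. A minimal sketch of dispatching on those keys follows. The `summarize` helper is hypothetical, and the chunks are stand-ins built with `SimpleNamespace` that mimic only the `tool`, `tool_input`, `action`, and `observation` attributes; the observation and final answer strings are placeholders, not real results.

```python
from types import SimpleNamespace


def summarize(chunk):
    """Return short, readable lines describing one agent stream chunk."""
    if "actions" in chunk:
        return [f"call {a.tool}({a.tool_input})" for a in chunk["actions"]]
    if "steps" in chunk:
        return [f"result of {s.action.tool}: {s.observation}" for s in chunk["steps"]]
    if "output" in chunk:
        return [f"final answer: {chunk['output']}"]
    return []


# Stand-in objects shaped like the action and step entries in the printed chunks.
action = SimpleNamespace(
    tool="tavily_search_results_json",
    tool_input={"query": "weather in San Francisco"},
)
step = SimpleNamespace(action=action, observation="(search results placeholder)")

chunks = [
    {"actions": [action]},
    {"steps": [step]},
    {"output": "(final answer placeholder)"},
]

for chunk in chunks:
    for line in summarize(chunk):
        print(line)
```

A run that yields only an `output` chunk, with no `actions` or `steps` before it, produced its answer without calling any tool.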

### Stream Actions

In [21]:
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

search = TavilySearchResults()
tools = [search]

# Get the prompt to use - you can modify this!
# To see the prompt in full, go to: https://smith.langchain.com/hub/hwchase17/openai-functions-agent
prompt = hub.pull("hwchase17/openai-functions-agent")

In [22]:
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

In [54]:
if useOpenAI:
    for chunk in agent_executor.stream({"input": "what is the weather in SF and then LA"}):
        print(chunk)
        print("------")

{'actions': [AgentActionMessageLog(tool='tavily_search_results_json', tool_input={'query': 'weather in San Francisco'}, log="\nInvoking: `tavily_search_results_json` with `{'query': 'weather in San Francisco'}`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'arguments': '{"query":"weather in San Francisco"}', 'name': 'tavily_search_results_json'}}, response_metadata={'finish_reason': 'function_call'}, id='run-ad33a2b7-7b25-4664-bd85-c6eaaa072137')])], 'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'arguments': '{"query":"weather in San Francisco"}', 'name': 'tavily_search_results_json'}}, response_metadata={'finish_reason': 'function_call'}, id='run-ad33a2b7-7b25-4664-bd85-c6eaaa072137')]}
------
{'steps': [AgentStep(action=AgentActionMessageLog(tool='tavily_search_results_json', tool_input={'query': 'weather in San Francisco'}, log="\nInvoking: `tavily_search_results_json` with `{'query': 'weather in San Francisco'}`\

In [30]:
if not useOpenAI:
    for chunk in agent_executor.stream({"input": "what is the weather in SF and then LA"}):
        print(chunk)
        print("------")

{'output': 'The current weather in San Francisco, CA is:\n\n* Temperature: 60°F (15°C)\n* Conditions: Mostly Cloudy\n* Wind: NW at 10 mph\n\nThe current weather in Los Angeles, CA is:\n\n* Temperature: 72°F (22°C)\n* Conditions: Sunny\n* Wind: SW at 5 mph\n', 'messages': [AIMessage(content='The current weather in San Francisco, CA is:\n\n* Temperature: 60°F (15°C)\n* Conditions: Mostly Cloudy\n* Wind: NW at 10 mph\n\nThe current weather in Los Angeles, CA is:\n\n* Temperature: 72°F (22°C)\n* Conditions: Sunny\n* Wind: SW at 5 mph\n')]}
------


In [23]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
# if not useOpenAI:
for chunk in agent_executor.stream({"input": "what is the weather in SF and then LA"}):
    print(chunk)
    print("------")

{'output': "I'd be happy to help you with that!\n\nAs of my knowledge cutoff, here's the current weather situation for San Francisco (SFO) and Los Angeles (LAX):\n\n**San Francisco (SFO)**\n\n* Current Weather: Partly Cloudy\n* Temperature: 58°F (14°C)\n* Humidity: 64%\n* Wind Speed: 7 mph (11 km/h)\n* Precipitation: 0% chance of rain\n\n**Los Angeles (LAX)**\n\n* Current Weather: Sunny\n* Temperature: 73°F (23°C)\n* Humidity: 44%\n* Wind Speed: 10 mph (16 km/h)\n* Precipitation: 0% chance of rain\n\nPlease note that weather conditions can change rapidly, so it's always a good idea to check the latest updates before planning your trip or outdoor activities. You can find the most up-to-date information on websites like AccuWeather, Weather.com, or the National Weather Service (NWS).\n\nWould you like me to help with anything else?", 'messages': [AIMessage(content="I'd be happy to help you with that!\n\nAs of my knowledge cutoff, here's the current weather situation for San Francisco

### Stream Tokens

This will return the tokens from the agent instead of the actions.
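
In the log patches printed in this section, each token arrives as an `add` operation whose path ends in `streamed_output_str/-`. Pulling just the token text out of a stream of patches could look like the sketch below. It works on hand-written lists of operations rather than real `RunLogPatch` objects, and `tokens_from_ops` is a hypothetical helper; with a real patch, the list of operations would come from its `ops` attribute.

```python
def tokens_from_ops(ops, run_name="ChatOpenAI"):
    """Return the token strings appended to streamed_output_str for one named run."""
    wanted = f"/logs/{run_name}/streamed_output_str/-"
    return [op["value"] for op in ops if op["op"] == "add" and op["path"] == wanted]


# Hand-written stand-ins: one list of operations per patch (token values are made up).
patches = [
    [{"op": "replace", "path": "", "value": {"logs": {}}}],
    [{"op": "add", "path": "/logs/ChatOpenAI", "value": {"name": "ChatOpenAI"}}],
    [
        {"op": "add", "path": "/logs/ChatOpenAI/streamed_output_str/-", "value": "The"},
        {"op": "add", "path": "/logs/ChatOpenAI/streamed_output/-", "value": "<message chunk>"},
    ],
    [
        {"op": "add", "path": "/logs/ChatOpenAI/streamed_output_str/-", "value": " weather"},
        {"op": "add", "path": "/logs/ChatOpenAI/streamed_output/-", "value": "<message chunk>"},
    ],
]

tokens = []
for ops in patches:
    tokens.extend(tokens_from_ops(ops))

print("".join(tokens))
```

The `run_name` default matches the `include_names=["ChatOpenAI"]` filter used in the cells of this section; a differently named run would need a different value.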

In [24]:
# reset llm to streaming ...
if useOpenAI:
    llm = ChatOpenAI(model="gpt-3.5-turbo", api_key=os.environ['OPENAI_API_KEY_'],  temperature=0, streaming=True)
else:
    llm = ChatOpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio", temperature=0, streaming=True)

print(useOpenAI)

False


In [25]:
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

In [57]:
if useOpenAI:
    async for chunk in agent_executor.astream_log(
        {"input": "what is the weather in sf", "chat_history": []},
        include_names=["ChatOpenAI"],
    ):
        print(chunk)

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': 'e6654cb8-8c69-46d6-a3a9-3f1a84077313',
            'logs': {},
            'name': 'AgentExecutor',
            'streamed_output': [],
            'type': 'chain'}})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOpenAI',
  'value': {'end_time': None,
            'final_output': None,
            'id': 'c99c7b81-3a86-4d83-bbd8-e849448269f9',
            'metadata': {},
            'name': 'ChatOpenAI',
            'start_time': '2024-04-17T14:12:09.319+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:3'],
            'type': 'llm'}})
RunLogPatch({'op': 'add', 'path': '/logs/ChatOpenAI/streamed_output_str/-', 'value': ''},
 {'op': 'add',
  'path': '/logs/ChatOpenAI/streamed_output/-',
  'value': AIMessageChunk(content='', additional_kwargs={'function_call': {'arguments': '', 'name': 'tavily_search_results_json'}}, id='run-c99c7b

In [11]:
if not useOpenAI:
    async for chunk in agent_executor.astream_log(
        {"input": "what is the weather in sf", "chat_history": []},
        include_names=["ChatOpenAI"],
    ):
        print(chunk)

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': 'f73a7fe2-f962-4b02-996c-db0ef538279b',
            'logs': {},
            'name': 'AgentExecutor',
            'streamed_output': [],
            'type': 'chain'}})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOpenAI',
  'value': {'end_time': None,
            'final_output': None,
            'id': '0430d10c-0ea9-46ec-9741-b0e093b1930e',
            'metadata': {},
            'name': 'ChatOpenAI',
            'start_time': '2024-04-17T16:22:33.848+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:3'],
            'type': 'llm'}})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOpenAI/streamed_output_str/-',
  'value': 'The'},
 {'op': 'add',
  'path': '/logs/ChatOpenAI/streamed_output/-',
  'value': AIMessageChunk(content='The', id='run-0430d10c-0ea9-46ec-9741-b0e093b1930e')})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOp

In [26]:
# lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
# if not useOpenAI:
async for chunk in agent_executor.astream_log(
    {"input": "what is the weather in sf", "chat_history": []},
    include_names=["ChatOpenAI"],
):
    print(chunk)

RunLogPatch({'op': 'replace',
  'path': '',
  'value': {'final_output': None,
            'id': 'd7512e4d-58e5-4b41-8cca-60edf89baf3c',
            'logs': {},
            'name': 'AgentExecutor',
            'streamed_output': [],
            'type': 'chain'}})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOpenAI',
  'value': {'end_time': None,
            'final_output': None,
            'id': 'd001ea6e-2d26-4838-ad27-760f5c7b61b5',
            'metadata': {},
            'name': 'ChatOpenAI',
            'start_time': '2024-04-19T13:26:32.769+00:00',
            'streamed_output': [],
            'streamed_output_str': [],
            'tags': ['seq:step:3'],
            'type': 'llm'}})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOpenAI/streamed_output_str/-',
  'value': 'San'},
 {'op': 'add',
  'path': '/logs/ChatOpenAI/streamed_output/-',
  'value': AIMessageChunk(content='San', id='run-d001ea6e-2d26-4838-ad27-760f5c7b61b5')})
RunLogPatch({'op': 'add',
  'path': '/logs/ChatOp