# Local LLM

Here, we'll use a client that talks to a server hosting a local LLM.

**Attention** This setup is fine for prototyping and development, but it should not be used in production, where different users may send concurrent requests. At the time of writing, Ollama is designed for a single user and cannot handle concurrent requests; see this issue: https://github.com/ollama/ollama/issues/358
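
For dev usage, one way to avoid sending overlapping requests from a single client process is to queue calls behind a lock. The following is a minimal sketch under stated assumptions: `SerializedCaller` and `FakeBackend` are hypothetical names introduced here, and `FakeBackend` stands in for the remote model so that the example runs without a server. It only serializes calls made within one Python process, so it does not address the multi-user case described above.

```python
import asyncio


class SerializedCaller:
    """Hypothetical helper: allows one in-flight call to `acall` at a time."""

    def __init__(self, acall):
        self._acall = acall  # any async callable, e.g. model.ainvoke
        self._lock = None    # created lazily, inside the running event loop

    async def ainvoke(self, prompt):
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # later callers wait here until the lock frees
            return await self._acall(prompt)


class FakeBackend:
    """Stand-in for the remote model; tracks how many calls overlap."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    async def ainvoke(self, prompt):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0.01)  # simulated generation time
        self.in_flight -= 1
        return f"echo: {prompt}"


async def demo():
    backend = FakeBackend()
    guarded = SerializedCaller(backend.ainvoke)
    # Three calls are started together, but the lock lets them through one by one.
    replies = await asyncio.gather(*(guarded.ainvoke(f"p{i}") for i in range(3)))
    return replies, backend.max_in_flight
```

In a script, `asyncio.run(demo())` runs the sketch; in a notebook, `await demo()` does the same. Passing the real client's `ainvoke` method in place of `backend.ainvoke` is the intended use, assuming it is an async callable that takes the prompt as its only argument.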

In [1]:
from langchain_core.prompts import ChatPromptTemplate

In [7]:
from langserve import RemoteRunnable

model = RemoteRunnable("http://localhost:8000/ollama/")

Let's test out the standard interface of a chat model.

In [17]:
prompt = "Tell me a joke about cats"

In [18]:
model.invoke(prompt)

AIMessage(content='<think>\nAlright, so I need to come up with a joke about cats. Hmm, where do I start? Maybe I can think of something common that people like to say about cats. Oh, wait, "Do cats really have hearts?" That\'s always been on my mind.\n\nI should explain why that\'s funny or appropriate. So, if you cut their tail, they lose their tails, right? But in a cat\'s case, it\'s more about losing their mius rather than their heart. That\'s actually an interesting point because not all animals have hearts, so cats definitely don\'t either.\n\nBut wait, the joke is trying to play on words there. It\'s a pun on "tail" and "heart," so that creates some humor. I need to make sure it makes sense but also isn\'t obvious. The setup gives a reason for the punchline, making it relatable and funny.\n\nI think this works because it uses a wordplay that people often find amusing—mixing animal-related words in a humorous way. It\'s simple enough for anyone to understand but still maintains t

In [5]:
await model.ainvoke(prompt)

AIMessage(content='<think>\nOkay, so I need to come up with a three-sentence story about a cat. Let me think about how to approach this. First, the topic is cats, which are pretty common pets but can have various personalities and traits. Maybe I should focus on something relatable or interesting to make it engaging.\n\nI remember that cats are often depicted as loyal and protective, so perhaps I can center around their loyalty or maybe something like a problem they face. The three-sentence limit is tight, so each sentence needs to be concise yet impactful.\n\nLet me outline possible elements: the cat\'s appearance, their personality traits, and how they interact with others. For example, if I focus on their protective nature, showing them caring for a friend or family member. That could work well because it introduces a problem (the cat acting alone) that needs solving through the cat\'s actions.\n\nNow, structuring this into three sentences: first introducing the situation where the 

The batch API works, but because Ollama does not process requests in parallel, it is no faster than calling `.invoke` twice.

In [21]:
%%time
model.batch([prompt, prompt])

CPU times: user 7.65 ms, sys: 6.57 ms, total: 14.2 ms
Wall time: 5.51 s


[AIMessage(content='\nSure! Here is a three sentence story about a cat:\n\nMr. Whiskers was a sleek black cat with bright green eyes. He spent his days lounging in the sunbeams that streamed through the living room window, chasing the occasional fly, and purring contentedly. Despite his lazy demeanor, Mr. Whiskers was always on the lookout for a warm lap to curl up in.'),
 AIMessage(content='\nSure! Here is a three sentence story about a cat:\n\nMittens the cat purred contentedly on the windowsill, basking in the warm sunlight that streamed through the glass. Suddenly, a tiny bird perched on the ledge outside, tweeting nervously as it eyed the cat with suspicion. Without hesitation, Mittens pounced, snatching the bird in mid-air and devouring it in one quick motion.')]

In [22]:
%%time
for _ in range(2):
    model.invoke(prompt)

CPU times: user 9.72 ms, sys: 7.59 ms, total: 17.3 ms
Wall time: 5.56 s
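
The two wall times above are nearly identical (5.51 s for the batch, 5.56 s for the loop). `%%time` is an IPython cell magic, so the same comparison in a plain script needs explicit timing. The sketch below uses `time.perf_counter` for that. `SerialFakeModel` and `timed` are hypothetical names, and the fake model only simulates a backend that serves one request at a time; it makes no claim about how LangServe or Ollama schedule work internally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SerialFakeModel:
    """Stand-in for a backend that handles one request at a time."""

    def __init__(self, delay=0.05):
        self._delay = delay
        self._lock = threading.Lock()

    def invoke(self, prompt):
        with self._lock:  # only one request is served at any moment
            time.sleep(self._delay)  # simulated generation time
            return f"reply to: {prompt}"

    def batch(self, prompts):
        # Assumption of this sketch: requests are submitted concurrently,
        # but they still queue on the lock inside invoke().
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.invoke, prompts))


def timed(fn):
    """Return (result, elapsed wall-clock seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


fake = SerialFakeModel()
prompt = "Tell me a joke about cats"
batch_out, batch_s = timed(lambda: fake.batch([prompt, prompt]))
loop_out, loop_s = timed(lambda: [fake.invoke(prompt) for _ in range(2)])
print(f"batch: {batch_s:.3f}s  sequential: {loop_s:.3f}s")
```

With a serial backend, both timings come out at roughly twice the per-request delay, which mirrors the measurements in the cells above.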


In [23]:
await model.abatch([prompt, prompt])

[AIMessage(content="\nSure, here's a three sentence story about a cat:\n\nMittens the cat purred contentedly on the windowsill, basking in the warm sunlight that streamed through the glass. Her bright green eyes sparkled as she watched a bird flit and flutter outside, wishing she could join it in its flight. Just then, her owner entered the room with a bowl of creamy milk, causing Mittens to jump down from the windowsill and rub against their legs in excitement."),
 AIMessage(content='\nSure! Here is a three sentence story about a cat:\n\nMittens the cat purred contentedly on my lap, her soft fur a soothing balm for my frazzled nerves. As I stroked her back, she gazed up at me with big, round eyes, purring even louder. It was hard to resist the charm of this little ball of fluff, and I found myself smiling and scratching behind her ears.')]

Streaming is available by default.

In [24]:
for chunk in model.stream(prompt):
    print(chunk.content, end="|", flush=True)


|S|ure|,| here| is| a| |3| sentence| story| about| a| cat|:|
|
|M|itt|ens| the| cat| pur|red| content|edly| on| the| windows|ill|,| her| tail| tw|itch|ing| as| she| watched| the| birds| outside|.| Sud|den|ly|,| a| squ|ir|rel| sc|am|per|ed| by| and| Mitt|ens| was| on| high| alert|,| her| ears| per|ked| up| and| ready| to| p|ounce|.| With| light|ning| quick| ref|lex|es|,| Mitt|ens| jump|ed| from| the| windows|ill| and| ch|ased| after| the| squ|ir|rel|,| her| tail| streaming| behind| her|.||

In [25]:
async for chunk in model.astream(prompt):
    print(chunk.content, end="|", flush=True)


|The| cat| pur|red| content|edly| on| my| lap|,| its| soft| fur| a| so|othing| bal|m| for| my| fra|zz|led| n|erves|.| As| I| st|rok|ed| its| back|,| it| gaz|ed| up| at| me| with| soul|ful| eyes|,| p|urr|ing| loud|ly| in| appro|val|.| In| that| moment|,| all| was| right| with| the| world|.||
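
The cells above print each chunk as it arrives. When the full reply is also needed afterwards, the chunks can be joined into one string. This is a minimal sketch: `FakeChunk`, `fake_stream`, and `collect_text` are hypothetical names, and the fake stream replaces `model.stream(prompt)` so that the example runs without a server. It assumes every chunk exposes its text as a string in `.content`, as the outputs above suggest.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in for a streamed message chunk; only `.content` is modeled."""
    content: str


def fake_stream(prompt):
    # Hypothetical replacement for model.stream(prompt).
    for piece in ["The", " cat", " pur", "red", "."]:
        yield FakeChunk(piece)


def collect_text(chunks):
    """Concatenate the `.content` of every chunk into one string."""
    return "".join(chunk.content for chunk in chunks)


full_text = collect_text(fake_stream("Tell me a joke about cats"))
print(full_text)  # The cat purred.
```

With the real client, `collect_text(model.stream(prompt))` would be the equivalent call under the same assumption about `.content`.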

The event stream API is available as well.

In [26]:
i = 0
async for event in model.astream_events(prompt, version='v1'):
    print(event)
    if i > 10:
        print('...')
        break
    i += 1

{'event': 'on_chat_model_start', 'run_id': '6c2bdfc1-d482-4861-886c-c737a50771c3', 'name': '/ollama', 'tags': [], 'metadata': {}, 'data': {'input': 'Tell me a 3 sentence story about a cat.'}}
{'event': 'on_chat_model_stream', 'run_id': '6c2bdfc1-d482-4861-886c-c737a50771c3', 'tags': [], 'metadata': {}, 'name': '/ollama', 'data': {'chunk': AIMessageChunk(content='\n')}}
{'event': 'on_chat_model_stream', 'run_id': '6c2bdfc1-d482-4861-886c-c737a50771c3', 'tags': [], 'metadata': {}, 'name': '/ollama', 'data': {'chunk': AIMessageChunk(content='S')}}
{'event': 'on_chat_model_stream', 'run_id': '6c2bdfc1-d482-4861-886c-c737a50771c3', 'tags': [], 'metadata': {}, 'name': '/ollama', 'data': {'chunk': AIMessageChunk(content='ure')}}
{'event': 'on_chat_model_stream', 'run_id': '6c2bdfc1-d482-4861-886c-c737a50771c3', 'tags': [], 'metadata': {}, 'name': '/ollama', 'data': {'chunk': AIMessageChunk(content=',')}}
{'event': 'on_chat_model_stream', 'run_id': '6c2bdfc1-d482-4861-886c-c737a50771c3', 'tags