57 commits
fa23e9d
SimpleChatToolCalling: Test/Explore srvr initial hs using cmdline
hanishkvc Oct 10, 2025
75ce9e4
SimpleChatTools: Add boolean to allow user control of tools use
hanishkvc Oct 10, 2025
68fc28f
SimpleChatTC: Update test shell script a bit
hanishkvc Oct 10, 2025
85845a0
SimpleChatTC: Add skeleton for a javascript interpretor tool call
hanishkvc Oct 10, 2025
bbaae70
SimpleChatTC: More generic tooljs, SimpCalc, some main skeleton
hanishkvc Oct 10, 2025
f091568
SimpleChatTC: Bring in the tools meta into the main flow
hanishkvc Oct 10, 2025
9d8be85
SimpleChatTC: use tcpdump to dbg hs; check if ai aware of tools
hanishkvc Oct 10, 2025
2e4693c
SimpleChatTC: Skeleton to handle diff fields when streaming
hanishkvc Oct 10, 2025
27161cb
SimpleChatTC: Extract streamed field - assume only 1f at any time
hanishkvc Oct 10, 2025
788d56a
SimpleChatTC: Avoid null content, Fix oversight wrt finish_reason
hanishkvc Oct 10, 2025
4cbe1d2
SimpleChatTC: Show toolcall being generated by ai - Temp
hanishkvc Oct 10, 2025
174b0b1
SimpleChatTC: AssistantResponse class initial go
hanishkvc Oct 10, 2025
10b1013
SimpleChatTC: AssistantResponse everywhere initial go
hanishkvc Oct 10, 2025
e4e29a2
SimpleChatTC: twins wrt streamed response handling
hanishkvc Oct 10, 2025
d7f612f
SimpleChatTC: Saner/Robust AssistantResponse content_equiv
hanishkvc Oct 10, 2025
2a27697
SimpleChatTC:tooljs: Trap console.log and store in new result key
hanishkvc Oct 11, 2025
92b82ae
SimpleChatTC: Implement a simple toolcall handling flow
hanishkvc Oct 11, 2025
d8b1b36
SimpleChatTC: Cleanup initial/1st go toolcall flow
hanishkvc Oct 11, 2025
7a2bcfb
SimpleChatTC: Trap any exception raised during tool call
hanishkvc Oct 11, 2025
f10ab96
SimpleChatTC: More clearer description of toolcalls execution env
hanishkvc Oct 12, 2025
a1f1776
SimpleChatTC: Clarify some type definitions to avoid warnings
hanishkvc Oct 12, 2025
4ac6f0a
SimpleChatTC: Move tool calling to tools, try trap async failures
hanishkvc Oct 12, 2025
3796306
SimpleChatTC: Pass toolname to the tool handler
hanishkvc Oct 12, 2025
0ed8329
SimpleChatTC: Cleanup the function description a bit
hanishkvc Oct 12, 2025
aa81f51
SimpleChatTC: Update the readme.md wrt tool calling a bit
hanishkvc Oct 12, 2025
5ed2bc3
SimpleChatTC: ToolCall hs info in normal assistant-user chat flow
hanishkvc Oct 12, 2025
619d64d
SimpleChatTC: Add ui elements for tool call verify and trigger
hanishkvc Oct 12, 2025
226aa7d
SimpleChatTC: Let user trigger tool call, instead of automatic
hanishkvc Oct 12, 2025
2aabca2
SimpleChatTC: Update readme with bit more details, Cleaner UI
hanishkvc Oct 12, 2025
90b2491
SimpleChatTC: Tool Calling UI elements use up horizontal space
hanishkvc Oct 12, 2025
a8eadc4
SimpleChatTC: Update readme wrt --jinja argument and bit more
hanishkvc Oct 12, 2025
70bc1b4
SimpleChatTC: Move console.log trapping into its own module
hanishkvc Oct 13, 2025
f8ebe8f
SimpleChatTC:ToolsConsole:Cleanup a bit, add basic set of notes
hanishkvc Oct 13, 2025
cc60600
SimpleChatTC: Initial skeleton of a simple toolsworker
hanishkvc Oct 13, 2025
7ea9bf6
SimpleChatTC: Pass around structured objects wrt tool worker
hanishkvc Oct 13, 2025
4664748
SimpleChatTC: Actual tool call implementations simplified
hanishkvc Oct 13, 2025
5933b28
SimpleChatTC: Get ready for decoupled tool call response
hanishkvc Oct 13, 2025
50be171
SimpleChatTC: Web worker flow initial go cleanup
hanishkvc Oct 13, 2025
44cfebc
SimpleChatTC: Increase the sliding window context to Last4 QA
hanishkvc Oct 13, 2025
6f137f2
SimpleChatTC: Update readme.md wrt latest updates. 2k maxtokens
hanishkvc Oct 13, 2025
dbf050c
SimpleChatTC: update descs to indicate use of web workers
hanishkvc Oct 13, 2025
39c1c01
SimpleChatTC:ChatMessage: AssistantResponse into chat message class
hanishkvc Oct 14, 2025
e9a7871
SimpleChatTC:ChatMessageEx: UpdateStream logic
hanishkvc Oct 14, 2025
340ae0c
SimpleChatTC:ChatMessageEx:cleanup, HasToolCalls, ContentEquiv
hanishkvc Oct 14, 2025
0629f79
SimpleChatTC:ChatMessage: remove ResponseExtractStream
hanishkvc Oct 14, 2025
bb25aa0
SimpleChatTC:ChatMessageEx: add update_oneshot
hanishkvc Oct 14, 2025
ae00cb2
SimpleChatTC:ChatMessageEx: ods load, system prompt related
hanishkvc Oct 14, 2025
e1e1d42
SimpleChatTC:ChatMessageEx: RecentChat, GetSystemLatest
hanishkvc Oct 14, 2025
3b73b00
SimpleChatTC:ChatMessageEx: Upd Add, rm sysPromptAtBeginOnly hlpr
hanishkvc Oct 14, 2025
aa80bf0
SimpleChatTC:ChatMessageEx: Recent chat users upd
hanishkvc Oct 14, 2025
755505e
SimpleChatTC:ChatMessageEx: Cleanup remaining stuff
hanishkvc Oct 14, 2025
7fb5526
SimpleChatTC:Load allows old and new ChatMessage(Ex) formats
hanishkvc Oct 14, 2025
69cbc81
SimpleChatTC:ChatMessageEx: send tool_calls, only if needed
hanishkvc Oct 14, 2025
f379e65
SimpleChatTC:Propogate toolcall id through tool call chain
hanishkvc Oct 14, 2025
4efa232
SimpleChatTC:ChatMessageEx: Build tool role result fully
hanishkvc Oct 14, 2025
a644cb3
SimpleChatTC:ChatMessageEx:While at it also ns_delete
hanishkvc Oct 14, 2025
61c2314
SimpleChatTC:Readme: Updated wrt new relativelyProper toolCallsHS
hanishkvc Oct 14, 2025
13 changes: 13 additions & 0 deletions tools/server/public_simplechat/index.html
@@ -40,6 +40,19 @@
<p> You need to have javascript enabled.</p>
</div>

<hr>
<div id="tool-div">
<div>
<div class="sameline">
<textarea id="toolname-in" class="flex-grow" rows="1" placeholder="name of tool to run"></textarea>
<button id="tool-btn">run tool</button>
</div>
</div>
<div class="sameline">
<textarea id="toolargs-in" class="flex-grow" rows="2" placeholder="arguments to pass to the specified tool"></textarea>
</div>
</div>

<hr>
<div class="sameline">
<textarea id="user-in" class="flex-grow" rows="2" placeholder="enter your query to the ai model here" ></textarea>
137 changes: 129 additions & 8 deletions tools/server/public_simplechat/readme.md
@@ -7,7 +7,7 @@ by Humans for All.

To run from the build dir

-bin/llama-server -m path/model.gguf --path ../tools/server/public_simplechat
+bin/llama-server -m path/model.gguf --path ../tools/server/public_simplechat --jinja

Continue reading for the details.

@@ -33,6 +33,10 @@ Allows developer/end-user to control some of the behaviour by updating gMe members from the browser's devel/debug
console. In parallel, some of the settings directly useful to the end-user can also be changed using the provided
settings ui.

For GenAi/LLM models supporting tool / function calling, this allows one to interact with them and explore
ai driven augmenting of the knowledge used for generating answers, by using the predefined tools/functions.
The end user is given control over tool calling and over submitting the tool response.

NOTE: The current web service api doesn't expose the model context length directly, so the client logic
doesn't provide any adaptive culling of old messages, nor replacing them with a summary of their content
et al. However there is an optional sliding window based chat logic, which provides a simple minded culling of old messages from
@@ -64,6 +68,16 @@ next run this web front end in tools/server/public_simplechat
* cd ../tools/server/public_simplechat
* python3 -m http.server PORT

### for tool calling

remember to

* pass --jinja to llama-server to enable tool calling support from the server ai engine end.

* enable bTools in the settings page of the client side gui.

* use a GenAi/LLM model which supports tool calling.

### using the front end

Open this simple web front end from your local browser
@@ -78,6 +92,7 @@ Once inside
* try trim garbage in response or not
* amount of chat history in the context sent to server/ai-model
* oneshot or streamed mode.
* use built-in tool calling or not

* In completion mode
* one normally doesn't use a system prompt in completion mode.
@@ -116,6 +131,17 @@ Once inside
* the user input box will be disabled and a working message will be shown in it.
* if trim garbage is enabled, the logic will try to trim repeating text kind of garbage to some extent.

* tool calling flow when working with ai models which support tool / function calling
* if tool calling is enabled and the user query results in a need for one of the builtin tools to be
called, then the ai response might include a request for a tool call.
* the SimpleChat client will show details of the requested tool call (ie tool name and args passed)
and allow the user to trigger it as is, or after modifying things as needed.
NOTE: The tool sees the original tool call only, for now.
* in turn the returned / generated result is placed into the user query entry text area with appropriate
tags, ie <tool_response> generated result with meta data </tool_response>, as illustrated after this list.
* if the user is ok with the tool response, they can click submit to send the same to the GenAi/LLM.
The user can even modify the response generated by the tool, if required, before submitting.

* just refresh the page, to reset wrt the chat history and/or system prompt and start afresh.

* Using NewChat one can start independent chat sessions.
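
For illustration, after a tool run, the text placed into the user query entry might look something
like the following (the exact meta data layout shown here is an assumption, not the verbatim format):

    <tool_response>
    tool: simple_calculator, id: call_1
    result: 14
    </tool_response>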
@@ -158,6 +184,19 @@ It is attached to the document object. Some of these can also be updated using the settings ui.
inturn the machine goes into power saving mode or so, the platform may stop network connection,
leading to exception.

bTools - control whether tool calling is enabled or not

remember to enable this only for GenAi/LLM models which support tool/function calling.

the builtin tools' meta data is sent to the ai model in the requests sent to it.

in turn if the ai model requests a tool call to be made, the same will be done and the response
sent back to the ai model, under user control.

as tool calling will involve a bit of back and forth between the ai assistant and the end user, it
is recommended to set iRecentUserMsgCnt to 10 or more, so that enough context is retained while
chatting with ai models with tool support.
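
For example, these can be flipped from the browser's devel/debug console; a minimal sketch,
assuming gMe is reachable through the document object as noted in this readme, with the member
names as listed here:

    // enable tool calling; remember to use a tool capable GenAi/LLM model
    document["gMe"].bTools = true;
    // retain enough context for the tool call back and forth
    document["gMe"].iRecentUserMsgCnt = 10;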

apiEP - select between /completions and /chat/completions endpoint provided by the server/ai-model.

bCompletionFreshChatAlways - whether Completion mode collates complete/sliding-window history when
@@ -201,10 +240,10 @@ It is attached to the document object. Some of these can also be updated using the settings ui.
be set if needed using the settings ui.

iRecentUserMsgCnt - a simple minded SlidingWindow to limit context window load at Ai Model end.
-This is disabled by default. However if enabled, then in addition to latest system message, only
-the last/latest iRecentUserMsgCnt user messages after the latest system prompt and its responses
-from the ai model will be sent to the ai-model, when querying for a new response. IE if enabled,
-only user messages after the latest system message/prompt will be considered.
+This is set to 10 by default. So in addition to the latest system message, the last/latest
+iRecentUserMsgCnt user messages after the latest system prompt, and their responses from the
+ai model, will be sent to the ai-model when querying for a new response. Note that if enabled,
+only user messages after the latest system message/prompt will be considered.

This specified sliding window user message count also includes the latest user query.
<0 : Send entire chat history to server
@@ -244,9 +283,11 @@ full chat history. This way if there is any response with garbage/repeatation, it wont
mess with things beyond the next question/request/query, in some ways. The trim garbage
option also tries to help avoid issues with garbage in the context to an extent.

-Set max_tokens to 1024, so that a relatively large previous reponse doesnt eat up the space
-available wrt next query-response. However dont forget that the server when started should
-also be started with a model context size of 1k or more, to be on safe side.
+Set max_tokens to 2048, so that a relatively large previous response doesn't eat up the space
+available wrt the next query-response, while still allowing a good enough context size for some
+amount of the chat history in the current session to influence future answers. However don't
+forget that the server should be started with a model context size of 2k or more, to be on the
+safe side.
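
For example, one might start the server with an explicitly larger context; the -c/--ctx-size
value below is just an illustrative guess:

    bin/llama-server -m path/model.gguf -c 4096 --path ../tools/server/public_simplechat --jinja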

The /completions endpoint of tools/server doesn't take max_tokens, instead it takes the
internal n_predict, for now add the same here on the client side, maybe later add max_tokens
@@ -281,6 +322,86 @@ NOTE: Not tested, as there is no free tier api testing available. However logically it should
work.


### Tool Calling

ALERT: Given the simple minded way in which this is implemented, it can be dangerous in the worst
case. Always remember to manually verify all the tool calls requested and the responses generated,
to ensure everything is fine, when interacting with ai models with tools support.

#### Builtin Tools

The following tools/functions are currently provided by default:
* simple_calculator - which can solve simple arithmetic expressions
* run_javascript_function_code - which can be used to run some javascript code in the browser
context.

Currently the generated code / expression is run through a simple minded eval inside a web worker
mechanism. Use of a WebWorker helps avoid exposing the browser global scope to the generated code
directly. However any shared web worker scope isn't isolated. Either way always remember to cross
check the tool requests and the generated responses when using tool calling.
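
The core of that mechanism might be sketched as below; this is a minimal illustration and not the
exact SimpleChat implementation, the message shape ({ id, code }) and the result keys used here
are assumptions:

    // tools web worker side: trap console.log, eval the passed code, post back the output
    let gTrapped = [];
    console.log = (...args) => gTrapped.push(args.join(' '));

    onmessage = (ev) => {
        gTrapped = [];
        let error = null;
        try {
            // run the ai generated code/expression; it is expected to emit
            // its result through the trapped console.log
            eval(ev.data.code);
        } catch (e) {
            // trap any exception raised during the tool call
            error = String(e);
        }
        postMessage({ id: ev.data.id, stdout: gTrapped.join('\n'), error: error });
    };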

May add
* web_fetch along with a corresponding simple local web proxy/caching server logic that can bypass
the CORS restrictions applied when trying to fetch directly from the browser js runtime environment.
In turn maybe with a whitelist of allowed sites to access or so.


#### Extending with new tools

Provide descriptive meta data explaining the tool / function being provided for tool calling,
as well as its arguments.

Provide a handler which implements the specified tool / function call, or rather constructs
the code to be run to get the tool / function call job done, and in turn passes the same to the
provided web worker to get it executed. In your constructed code, remember to use console.log
to emit any response that should be sent back to the ai model.

Update the tc_switch to include an object entry for the tool, which in turn includes
* the meta data, as well as
* a reference to the handler; the handler should take toolCallId, toolName and toolArgs and
pass these along to the web worker as needed, and also
* the result key (was used previously, may use in future, but for now left as is)
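
A minimal sketch of such an entry follows; names like tc_switch, toolsWorker, the worker script
file name and the handler signature are taken from the description above and are assumptions,
the actual SimpleChat internals may differ:

    globalThis.tc_switch ??= {}; // normally provided by the SimpleChat tools module
    const toolsWorker = new Worker("toolsworker.mjs"); // hypothetical worker script name

    tc_switch["string_reverse"] = {
        // descriptive meta data, in the usual function calling format,
        // which gets sent to the ai model along with the builtin tools
        meta: {
            type: "function",
            function: {
                name: "string_reverse",
                description: "Reverse the characters of the given text",
                parameters: {
                    type: "object",
                    properties: {
                        text: { type: "string", description: "the text to reverse" }
                    },
                    required: ["text"]
                }
            }
        },
        // the handler constructs the code to run and passes it to the web worker;
        // console.log in the constructed code carries the response to the ai model
        handler: function (toolCallId, toolName, toolArgs) {
            let code = `console.log(${JSON.stringify(toolArgs.text)}.split('').reverse().join(''))`;
            toolsWorker.postMessage({ id: toolCallId, name: toolName, code: code });
        },
        // the result key (was used previously, may use in future, for now left as is)
        result: "stdout",
    };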

#### OLD: Mapping tool calls and responses to normal assistant - user chat flow

Instead of maintaining the tool_call request and the resultant response in the logically separate
parallel channel used for requesting tool_calls by the assistant and the resultant tool role
response, SimpleChatTC pushed them into the normal assistant - user chat flow itself, by including
the tool call as a tagged request with details in the assistant block, and in turn the tagged
response in the subsequent user block.

This allows the GenAi/LLM to be aware of the tool calls it made as well as the responses it got,
so that it can incorporate the results of the same in the subsequent chat / interactions.

NOTE: This flow was tested to be ok enough with the Gemma-3N-E4B-it-Q8_0 LLM ai model for now.
Logically, given the way current ai models work, most of them should understand things as needed,
but need to test this with other ai models later.

TODO:OLD: Need to think later whether to continue this simple flow, or at least use the tool role
wrt the tool call responses, or even go further and have the logically separate tool_calls request
structures also.

DONE: Rather, both the tool_calls structure wrt assistant messages and the tool role based tool
call result messages are now generated as needed.
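
For reference, the generated messages might look like the following, going by the usual
chat/completions tool calling conventions; the argument name "expr" and the ids here are made up
for the example:

    // assistant message carrying the tool call request
    let assistantMsg = {
        role: "assistant",
        content: "",
        tool_calls: [ {
            id: "call_1",
            type: "function",
            function: { name: "simple_calculator", arguments: '{"expr": "2 + 3 * 4"}' }
        } ]
    };
    // tool role message carrying the result back, tied through tool_call_id
    let toolMsg = { role: "tool", tool_call_id: "call_1", content: "14" };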


#### ToDo

WebFetch and Local web proxy/caching server

Try and trap promise based flows to ensure all generated results, or errors if any, are caught
before responding back to the ai model.

Trap error responses.

### Debugging the handshake

When working with a llama.cpp server based GenAi/LLM running locally

sudo tcpdump -i lo -s 0 -vvv -A host 127.0.0.1 and port 8080 | tee /tmp/td.log


## At the end

Also a thank you to all open source and open model developers, who strive for the common good.
3 changes: 3 additions & 0 deletions tools/server/public_simplechat/simplechat.css
@@ -21,6 +21,9 @@
.role-user {
background-color: lightgray;
}
.role-tool {
background-color: lightyellow;
}
.role-trim {
background-color: lightpink;
}