57 commits
fa23e9d
SimpleChatToolCalling: Test/Explore srvr initial hs using cmdline
hanishkvc Oct 10, 2025
75ce9e4
SimpleChatTools: Add boolean to allow user control of tools use
hanishkvc Oct 10, 2025
68fc28f
SimpleChatTC: Update test shell script a bit
hanishkvc Oct 10, 2025
85845a0
SimpleChatTC: Add skeleton for a javascript interpretor tool call
hanishkvc Oct 10, 2025
bbaae70
SimpleChatTC: More generic tooljs, SimpCalc, some main skeleton
hanishkvc Oct 10, 2025
f091568
SimpleChatTC: Bring in the tools meta into the main flow
hanishkvc Oct 10, 2025
9d8be85
SimpleChatTC: use tcpdump to dbg hs; check if ai aware of tools
hanishkvc Oct 10, 2025
2e4693c
SimpleChatTC: Skeleton to handle diff fields when streaming
hanishkvc Oct 10, 2025
27161cb
SimpleChatTC: Extract streamed field - assume only 1f at any time
hanishkvc Oct 10, 2025
788d56a
SimpleChatTC: Avoid null content, Fix oversight wrt finish_reason
hanishkvc Oct 10, 2025
4cbe1d2
SimpleChatTC: Show toolcall being generated by ai - Temp
hanishkvc Oct 10, 2025
174b0b1
SimpleChatTC: AssistantResponse class initial go
hanishkvc Oct 10, 2025
10b1013
SimpleChatTC: AssistantResponse everywhere initial go
hanishkvc Oct 10, 2025
e4e29a2
SimpleChatTC: twins wrt streamed response handling
hanishkvc Oct 10, 2025
d7f612f
SimpleChatTC: Saner/Robust AssistantResponse content_equiv
hanishkvc Oct 10, 2025
2a27697
SimpleChatTC:tooljs: Trap console.log and store in new result key
hanishkvc Oct 11, 2025
92b82ae
SimpleChatTC: Implement a simple toolcall handling flow
hanishkvc Oct 11, 2025
d8b1b36
SimpleChatTC: Cleanup initial/1st go toolcall flow
hanishkvc Oct 11, 2025
7a2bcfb
SimpleChatTC: Trap any exception raised during tool call
hanishkvc Oct 11, 2025
f10ab96
SimpleChatTC: More clearer description of toolcalls execution env
hanishkvc Oct 12, 2025
a1f1776
SimpleChatTC: Clarify some type definitions to avoid warnings
hanishkvc Oct 12, 2025
4ac6f0a
SimpleChatTC: Move tool calling to tools, try trap async failures
hanishkvc Oct 12, 2025
3796306
SimpleChatTC: Pass toolname to the tool handler
hanishkvc Oct 12, 2025
0ed8329
SimpleChatTC: Cleanup the function description a bit
hanishkvc Oct 12, 2025
aa81f51
SimpleChatTC: Update the readme.md wrt tool calling a bit
hanishkvc Oct 12, 2025
5ed2bc3
SimpleChatTC: ToolCall hs info in normal assistant-user chat flow
hanishkvc Oct 12, 2025
619d64d
SimpleChatTC: Add ui elements for tool call verify and trigger
hanishkvc Oct 12, 2025
226aa7d
SimpleChatTC: Let user trigger tool call, instead of automatic
hanishkvc Oct 12, 2025
2aabca2
SimpleChatTC: Update readme with bit more details, Cleaner UI
hanishkvc Oct 12, 2025
90b2491
SimpleChatTC: Tool Calling UI elements use up horizontal space
hanishkvc Oct 12, 2025
a8eadc4
SimpleChatTC: Update readme wrt --jinja argument and bit more
hanishkvc Oct 12, 2025
70bc1b4
SimpleChatTC: Move console.log trapping into its own module
hanishkvc Oct 13, 2025
f8ebe8f
SimpleChatTC:ToolsConsole:Cleanup a bit, add basic set of notes
hanishkvc Oct 13, 2025
cc60600
SimpleChatTC: Initial skeleton of a simple toolsworker
hanishkvc Oct 13, 2025
7ea9bf6
SimpleChatTC: Pass around structured objects wrt tool worker
hanishkvc Oct 13, 2025
4664748
SimpleChatTC: Actual tool call implementations simplified
hanishkvc Oct 13, 2025
5933b28
SimpleChatTC: Get ready for decoupled tool call response
hanishkvc Oct 13, 2025
50be171
SimpleChatTC: Web worker flow initial go cleanup
hanishkvc Oct 13, 2025
44cfebc
SimpleChatTC: Increase the sliding window context to Last4 QA
hanishkvc Oct 13, 2025
6f137f2
SimpleChatTC: Update readme.md wrt latest updates. 2k maxtokens
hanishkvc Oct 13, 2025
dbf050c
SimpleChatTC: update descs to indicate use of web workers
hanishkvc Oct 13, 2025
39c1c01
SimpleChatTC:ChatMessage: AssistantResponse into chat message class
hanishkvc Oct 14, 2025
e9a7871
SimpleChatTC:ChatMessageEx: UpdateStream logic
hanishkvc Oct 14, 2025
340ae0c
SimpleChatTC:ChatMessageEx:cleanup, HasToolCalls, ContentEquiv
hanishkvc Oct 14, 2025
0629f79
SimpleChatTC:ChatMessage: remove ResponseExtractStream
hanishkvc Oct 14, 2025
bb25aa0
SimpleChatTC:ChatMessageEx: add update_oneshot
hanishkvc Oct 14, 2025
ae00cb2
SimpleChatTC:ChatMessageEx: ods load, system prompt related
hanishkvc Oct 14, 2025
e1e1d42
SimpleChatTC:ChatMessageEx: RecentChat, GetSystemLatest
hanishkvc Oct 14, 2025
3b73b00
SimpleChatTC:ChatMessageEx: Upd Add, rm sysPromptAtBeginOnly hlpr
hanishkvc Oct 14, 2025
aa80bf0
SimpleChatTC:ChatMessageEx: Recent chat users upd
hanishkvc Oct 14, 2025
755505e
SimpleChatTC:ChatMessageEx: Cleanup remaining stuff
hanishkvc Oct 14, 2025
7fb5526
SimpleChatTC:Load allows old and new ChatMessage(Ex) formats
hanishkvc Oct 14, 2025
69cbc81
SimpleChatTC:ChatMessageEx: send tool_calls, only if needed
hanishkvc Oct 14, 2025
f379e65
SimpleChatTC:Propogate toolcall id through tool call chain
hanishkvc Oct 14, 2025
4efa232
SimpleChatTC:ChatMessageEx: Build tool role result fully
hanishkvc Oct 14, 2025
a644cb3
SimpleChatTC:ChatMessageEx:While at it also ns_delete
hanishkvc Oct 14, 2025
61c2314
SimpleChatTC:Readme: Updated wrt new relativelyProper toolCallsHS
hanishkvc Oct 14, 2025
13 changes: 13 additions & 0 deletions tools/server/public_simplechat/index.html
@@ -40,6 +40,19 @@
<p> You need to have javascript enabled.</p>
</div>

<hr>
<div id="tool-div">
<div>
<div class="sameline">
<textarea id="toolname-in" class="flex-grow" rows="1" placeholder="name of tool to run"></textarea>
<button id="tool-btn">run tool</button>
</div>
</div>
<div class="sameline">
<textarea id="toolargs-in" class="flex-grow" rows="2" placeholder="arguments to pass to the specified tool"></textarea>
</div>
</div>

<hr>
<div class="sameline">
<textarea id="user-in" class="flex-grow" rows="2" placeholder="enter your query to the ai model here" ></textarea>
137 changes: 129 additions & 8 deletions tools/server/public_simplechat/readme.md
@@ -7,7 +7,7 @@ by Humans for All.

To run from the build dir

-bin/llama-server -m path/model.gguf --path ../tools/server/public_simplechat
+bin/llama-server -m path/model.gguf --path ../tools/server/public_simplechat --jinja

Continue reading for the details.

@@ -33,6 +33,10 @@ Allows developer/end-user to control some of the behaviour by updating gMe members from the browser's devel/debug
console. In parallel, some of the settings directly useful to the end-user can also be changed using the provided
settings ui.

For GenAi/LLM models supporting tool / function calling, this allows one to interact with them and explore
ai driven augmenting of the knowledge used for generating answers, by using the predefined tools/functions.
The end user is given control over tool calling and over submitting the tool response.

NOTE: The current web service api doesn't expose the model context length directly, so the client logic
doesn't provide any adaptive culling of old messages, nor replacing them with a summary of their content
et al. However there is an optional sliding window based chat logic, which provides a simple minded culling of old messages from
@@ -64,6 +68,16 @@ next run this web front end in tools/server/public_simplechat
* cd ../tools/server/public_simplechat
* python3 -m http.server PORT

### for tool calling

remember to

* pass --jinja to llama-server to enable tool calling support from the server ai engine end.

* enable bTools in the settings page of the client side gui.

* use a GenAi/LLM model which supports tool calling.

### using the front end

Open this simple web front end from your local browser
@@ -78,6 +92,7 @@ Once inside
* try trim garbage in response or not
* amount of chat history in the context sent to server/ai-model
* oneshot or streamed mode.
* use built-in tool calling or not

* In completion mode
* one normally doesn't use a system prompt in completion mode.
@@ -116,6 +131,17 @@ Once inside
* the user input box will be disabled and a working message will be shown in it.
* if trim garbage is enabled, the logic will try to trim repeating text kind of garbage to some extent.

* tool calling flow when working with ai models which support tool / function calling
* if tool calling is enabled and the user query results in a need for one of the builtin tools to be
called, then the ai response might include a request for a tool call.
* the SimpleChat client will show details of the requested tool call (ie tool name and args passed)
and allow the user to trigger it as is, or after modifying things as needed.
NOTE: The tool sees the original tool call only, for now.
* in turn the returned / generated result is placed into the user query entry text area with appropriate
tags, ie <tool_response> generated result with meta data </tool_response>, as illustrated after this list.
* if the user is ok with the tool response, they can click submit to send the same to the GenAi/LLM.
The user can even modify the response generated by the tool, if required, before submitting.

* just refresh the page, to reset wrt the chat history and/or system prompt and start afresh.

* Using NewChat one can start independent chat sessions.
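
For illustration, after a tool run, the text placed into the user query entry might look something
like the following (the exact meta data layout shown here is an assumption, not the verbatim format):

    <tool_response>
    tool: simple_calculator, id: call_1
    result: 14
    </tool_response>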
@@ -158,6 +184,19 @@ It is attached to the document object. Some of these can also be updated using the settings ui.
inturn the machine goes into power saving mode or so, the platform may stop network connection,
leading to exception.

bTools - control whether tool calling is enabled or not

remember to enable this only for GenAi/LLM models which support tool/function calling.

the builtin tools' meta data is sent to the ai model in the requests sent to it.

in turn if the ai model requests a tool call to be made, the same will be done and the response
sent back to the ai model, under user control.

as tool calling will involve a bit of back and forth between the ai assistant and the end user, it
is recommended to set iRecentUserMsgCnt to 10 or more, so that enough context is retained while
chatting with ai models with tool support.
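
For example, these can be flipped from the browser's devel/debug console; a minimal sketch,
assuming gMe is reachable through the document object as noted in this readme, with the member
names as listed here:

    // enable tool calling; remember to use a tool capable GenAi/LLM model
    document["gMe"].bTools = true;
    // retain enough context for the tool call back and forth
    document["gMe"].iRecentUserMsgCnt = 10;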

apiEP - select between /completions and /chat/completions endpoint provided by the server/ai-model.

bCompletionFreshChatAlways - whether Completion mode collates complete/sliding-window history when
@@ -201,10 +240,10 @@ It is attached to the document object. Some of these can also be updated using the settings ui.
be set if needed using the settings ui.

iRecentUserMsgCnt - a simple minded SlidingWindow to limit context window load at Ai Model end.
-This is disabled by default. However if enabled, then in addition to latest system message, only
-the last/latest iRecentUserMsgCnt user messages after the latest system prompt and its responses
-from the ai model will be sent to the ai-model, when querying for a new response. IE if enabled,
-only user messages after the latest system message/prompt will be considered.
+This is set to 10 by default. So in addition to the latest system message, the last/latest
+iRecentUserMsgCnt user messages after the latest system prompt, and their responses from the
+ai model, will be sent to the ai-model when querying for a new response. Note that if enabled,
+only user messages after the latest system message/prompt will be considered.

This specified sliding window user message count also includes the latest user query.
<0 : Send entire chat history to server
@@ -244,9 +283,11 @@ full chat history. This way if there is any response with garbage/repeatation, it wont
mess with things beyond the next question/request/query, in some ways. The trim garbage
option also tries to help avoid issues with garbage in the context to an extent.

-Set max_tokens to 1024, so that a relatively large previous reponse doesnt eat up the space
-available wrt next query-response. However dont forget that the server when started should
-also be started with a model context size of 1k or more, to be on safe side.
+Set max_tokens to 2048, so that a relatively large previous response doesn't eat up the space
+available wrt the next query-response, while still allowing a good enough context size for some
+amount of the chat history in the current session to influence future answers. However don't
+forget that the server should be started with a model context size of 2k or more, to be on the
+safe side.
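
For example, one might start the server with an explicitly larger context; the -c/--ctx-size
value below is just an illustrative guess:

    bin/llama-server -m path/model.gguf -c 4096 --path ../tools/server/public_simplechat --jinja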

The /completions endpoint of tools/server doesn't take max_tokens, instead it takes the
internal n_predict, for now add the same here on the client side, maybe later add max_tokens
@@ -281,6 +322,86 @@ NOTE: Not tested, as there is no free tier api testing available. However logically it should
work.


### Tool Calling

ALERT: Given the simple minded way in which this is implemented, it can be dangerous in the worst
case. Always remember to manually verify all the tool calls requested and the responses generated,
to ensure everything is fine, when interacting with ai models with tools support.

#### Builtin Tools

The following tools/functions are currently provided by default:
* simple_calculator - which can solve simple arithmetic expressions
* run_javascript_function_code - which can be used to run some javascript code in the browser
context.

Currently the generated code / expression is run through a simple minded eval inside a web worker
mechanism. Use of a WebWorker helps avoid exposing the browser global scope to the generated code
directly. However any shared web worker scope isn't isolated. Either way always remember to cross
check the tool requests and the generated responses when using tool calling.
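
The core of that mechanism might be sketched as below; this is a minimal illustration and not the
exact SimpleChat implementation, the message shape ({ id, code }) and the result keys used here
are assumptions:

    // tools web worker side: trap console.log, eval the passed code, post back the output
    let gTrapped = [];
    console.log = (...args) => gTrapped.push(args.join(' '));

    onmessage = (ev) => {
        gTrapped = [];
        let error = null;
        try {
            // run the ai generated code/expression; it is expected to emit
            // its result through the trapped console.log
            eval(ev.data.code);
        } catch (e) {
            // trap any exception raised during the tool call
            error = String(e);
        }
        postMessage({ id: ev.data.id, stdout: gTrapped.join('\n'), error: error });
    };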

May add
* web_fetch along with a corresponding simple local web proxy/caching server logic that can bypass
the CORS restrictions applied when trying to fetch directly from the browser js runtime environment.
In turn maybe with a whitelist of allowed sites to access or so.


#### Extending with new tools

Provide descriptive meta data explaining the tool / function being provided for tool calling,
as well as its arguments.

Provide a handler which implements the specified tool / function call, or rather constructs
the code to be run to get the tool / function call job done, and in turn passes the same to the
provided web worker to get it executed. In your constructed code, remember to use console.log
to emit any response that should be sent back to the ai model.

Update the tc_switch to include an object entry for the tool, which in turn includes
* the meta data, as well as
* a reference to the handler; the handler should take toolCallId, toolName and toolArgs and
pass these along to the web worker as needed, and also
* the result key (was used previously, may use in future, but for now left as is)
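
A minimal sketch of such an entry follows; names like tc_switch, toolsWorker, the worker script
file name and the handler signature are taken from the description above and are assumptions,
the actual SimpleChat internals may differ:

    globalThis.tc_switch ??= {}; // normally provided by the SimpleChat tools module
    const toolsWorker = new Worker("toolsworker.mjs"); // hypothetical worker script name

    tc_switch["string_reverse"] = {
        // descriptive meta data, in the usual function calling format,
        // which gets sent to the ai model along with the builtin tools
        meta: {
            type: "function",
            function: {
                name: "string_reverse",
                description: "Reverse the characters of the given text",
                parameters: {
                    type: "object",
                    properties: {
                        text: { type: "string", description: "the text to reverse" }
                    },
                    required: ["text"]
                }
            }
        },
        // the handler constructs the code to run and passes it to the web worker;
        // console.log in the constructed code carries the response to the ai model
        handler: function (toolCallId, toolName, toolArgs) {
            let code = `console.log(${JSON.stringify(toolArgs.text)}.split('').reverse().join(''))`;
            toolsWorker.postMessage({ id: toolCallId, name: toolName, code: code });
        },
        // the result key (was used previously, may use in future, for now left as is)
        result: "stdout",
    };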

#### OLD: Mapping tool calls and responses to normal assistant - user chat flow

Instead of maintaining the tool_call request and the resultant response in the logically separate
parallel channel used for requesting tool_calls by the assistant and the resultant tool role
response, SimpleChatTC pushed them into the normal assistant - user chat flow itself, by including
the tool call as a tagged request with details in the assistant block, and in turn the tagged
response in the subsequent user block.

This allows the GenAi/LLM to be aware of the tool calls it made as well as the responses it got,
so that it can incorporate the results of the same in the subsequent chat / interactions.

NOTE: This flow was tested to be ok enough with the Gemma-3N-E4B-it-Q8_0 LLM ai model for now.
Logically, given the way current ai models work, most of them should understand things as needed,
but need to test this with other ai models later.

TODO:OLD: Need to think later whether to continue this simple flow, or at least use the tool role
wrt the tool call responses, or even go further and have the logically separate tool_calls request
structures also.

DONE: Rather, both the tool_calls structure wrt assistant messages and the tool role based tool
call result messages are now generated as needed.
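
For reference, the generated messages might look like the following, going by the usual
chat/completions tool calling conventions; the argument name "expr" and the ids here are made up
for the example:

    // assistant message carrying the tool call request
    let assistantMsg = {
        role: "assistant",
        content: "",
        tool_calls: [ {
            id: "call_1",
            type: "function",
            function: { name: "simple_calculator", arguments: '{"expr": "2 + 3 * 4"}' }
        } ]
    };
    // tool role message carrying the result back, tied through tool_call_id
    let toolMsg = { role: "tool", tool_call_id: "call_1", content: "14" };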


#### ToDo

WebFetch and Local web proxy/caching server

Try and trap promise based flows to ensure all generated results, or errors if any, are caught
before responding back to the ai model.

Trap error responses.

### Debugging the handshake

When working with a llama.cpp server based GenAi/LLM running locally

sudo tcpdump -i lo -s 0 -vvv -A host 127.0.0.1 and port 8080 | tee /tmp/td.log


## At the end

Also a thank you to all open source and open model developers, who strive for the common good.
3 changes: 3 additions & 0 deletions tools/server/public_simplechat/simplechat.css
@@ -21,6 +21,9 @@
.role-user {
background-color: lightgray;
}
.role-tool {
background-color: lightyellow;
}
.role-trim {
background-color: lightpink;
}