In [4]:
from llama_stack_client import LlamaStackClient
from IPython.display import display

client = LlamaStackClient(
    base_url="http://localhost:5001",
)

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, world client!"},
    ],
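    # stream=True,  # uncomment to receive the reply as a stream of chunks
)

display(response)

# With stream=True, iterate over the chunks instead of displaying the response:
# for chunk in response:
#     print(chunk)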


ChatCompletionResponse(completion_message=CompletionMessage(content='Hello!', role='assistant', stop_reason='end_of_turn', tool_calls=[]), logprobs=None)
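
The response printed above nests the reply text under `completion_message.content`. A minimal sketch of pulling out the text and stop reason follows. The `SimpleNamespace` object is only a stand-in that mirrors the printed fields so the snippet runs without a server, and `summarize_response` is a hypothetical helper name, not part of the client.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Return (text, stop_reason) from a non-streaming chat completion response."""
    message = response.completion_message
    return message.content, message.stop_reason


# Stand-in mirroring the ChatCompletionResponse printed above (not a real client object).
fake_response = SimpleNamespace(
    completion_message=SimpleNamespace(
        content="Hello!",
        role="assistant",
        stop_reason="end_of_turn",
        tool_calls=[],
    ),
    logprobs=None,
)

text, stop_reason = summarize_response(fake_response)
print(text, stop_reason)
```

With a live server, the same helper should work on the `response` object from the cell above, assuming its attributes match the printed representation.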

In [5]:
client.providers.list()

{'inference': [ProviderInfo(provider_id='groq', provider_type='remote::groq')],
 'memory': [ProviderInfo(provider_id='faiss', provider_type='inline::faiss')],
 'safety': [ProviderInfo(provider_id='llama-guard', provider_type='inline::llama-guard')],
 'agents': [ProviderInfo(provider_id='meta-reference', provider_type='inline::meta-reference')],
 'telemetry': [ProviderInfo(provider_id='meta-reference', provider_type='inline::meta-reference')],
 'eval': [ProviderInfo(provider_id='meta-reference', provider_type='inline::meta-reference')],
 'datasetio': [ProviderInfo(provider_id='localfs', provider_type='inline::localfs')],
 'scoring': [ProviderInfo(provider_id='basic', provider_type='inline::basic')]}
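
Each `provider_type` above has the form `<location>::<implementation>`, where the location is `inline` or `remote`. The sketch below flattens such a mapping into rows that are easier to scan. It assumes the dict-of-lists shape shown in the output; the `ProviderInfo` namedtuple and `flatten_providers` are hypothetical stand-ins so the snippet runs without a server.

```python
from collections import namedtuple

# Stand-in for the client's ProviderInfo type, with the two fields printed above.
ProviderInfo = namedtuple("ProviderInfo", ["provider_id", "provider_type"])


def flatten_providers(providers_by_api):
    """Turn {api: [ProviderInfo, ...]} into (api, provider_id, location, implementation) rows."""
    rows = []
    for api, providers in providers_by_api.items():
        for provider in providers:
            # "remote::groq" -> location "remote", implementation "groq"
            location, _, implementation = provider.provider_type.partition("::")
            rows.append((api, provider.provider_id, location, implementation))
    return rows


sample = {
    "inference": [ProviderInfo("groq", "remote::groq")],
    "memory": [ProviderInfo("faiss", "inline::faiss")],
}

for row in flatten_providers(sample):
    print(row)
```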

In [6]:
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        # {"role": "user", "content": "Explain to me how ASGI in python works"},
        {"role": "user", "content": "Hello World"},
    ],
    stream=True,
)

for chunk in response:
    print(chunk.event.delta, end='')


Hello World!
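
Printing each delta shows the text as it arrives, but the full reply is often needed afterwards. A minimal sketch of collecting the streamed pieces into one string follows, assuming `chunk.event.delta` is a plain string as the loop above suggests. The chunks here are `SimpleNamespace` stand-ins, and `collect_stream` is a hypothetical helper.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed response into a single string."""
    parts = []
    for chunk in chunks:
        delta = chunk.event.delta
        # Skip anything that is not plain text (for example a None delta).
        if isinstance(delta, str):
            parts.append(delta)
    return "".join(parts)


# Stand-in chunks shaped like the objects iterated over in the cell above.
fake_chunks = [
    SimpleNamespace(event=SimpleNamespace(delta=piece))
    for piece in ["Hello", " World", "!"]
]

print(collect_stream(fake_chunks))
```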

In [7]:
from llama_models.datatypes import SamplingParams

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Explain to me how ASGI in python works"},
    ],
    stream=True,
    sampling_params=SamplingParams(
        temperature=0,
    ),
)

for chunk in response:
    print(chunk.event.delta, end='')

ASGI (Asynchronous Server Gateway Interface) is a specification for building asynchronous web servers and applications in Python. It provides a standard way for web frameworks to communicate with web servers, allowing for efficient and scalable handling of HTTP requests.

Here's a high-level overview of how ASGI works:

**The ASGI Spec**

The ASGI spec defines a simple, text-based protocol for communication between a web server and a web application. It consists of three main components:

1. **ASGI protocol**: A text-based protocol that defines the format of messages exchanged between the web server and the web application.
2. **ASGI application**: A Python callable that implements the ASGI protocol and handles incoming HTTP requests.
3. **ASGI server**: A Python module that implements the ASGI protocol and manages the communication between the web server and the ASGI application.

**The ASGI Request-Response Cycle**

Here's a step-by-step breakdown of the ASGI request-response cycle:


In [8]:
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Explain to me how ASGI in python works"},
    ],
    stream=True,
    sampling_params=SamplingParams(
        top_p=1
    ),
)

for chunk in response:
    print(chunk.event.delta, end='')

ASGI (Asynchronous Server Gateway Interface) is a standard for Python web servers that allows them to communicate with web frameworks and other applications in an asynchronous manner. It's designed to be a replacement for the older WSGI (Web Server Gateway Interface) standard, which was synchronous.

Here's a high-level overview of how ASGI works:

**The Problem with WSGI**

WSGI was introduced in the early 2000s as a way for web frameworks to communicate with web servers. However, WSGI is synchronous, meaning that it blocks the execution of the web server until the web framework has finished processing the request. This can lead to performance issues and scalability problems, especially in modern web applications that require handling many concurrent requests.

**The Solution: ASGI**

ASGI addresses the limitations of WSGI by introducing an asynchronous programming model. Instead of blocking the execution of the web server, ASGI allows the web framework to yield control back to the we

In [9]:
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Explain to me how ASGI in python works"},
    ],
    stream=True,
    sampling_params=SamplingParams(
        max_tokens=50
    ),
)

for chunk in response:
    print(chunk.event.delta, end='')

ASGI (Asynchronous Server Gateway Interface) is a specification for building asynchronous web servers and applications in Python. It's similar to the WSGI (Web Server Gateway Interface) specification, but designed for use with asynchronous code.

Here's a high

In [10]:
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Explain to me how ASGI in python works"},
    ],
    stream=True,
    sampling_params=SamplingParams(
        max_tokens=50
    ),
    logprobs={
        "top_k": 10,
    },
)

for chunk in response:
    print(chunk.event.delta, end='')


ASGI (Asynchronous Server Gateway Interface) is a standard for Python web servers that allows them to communicate with web frameworks and other applications in an asynchronous manner. It's designed to be a replacement for the older WSGI (Web Server Gateway Interface
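
The cell above requests log-probabilities with `top_k=10` but only prints the text deltas. Once the log-probabilities for a position are available as a token-to-logprob mapping, they can be turned into ordinary probabilities with `math.exp`. This is a sketch under assumptions: the notebook does not show where the client exposes these values, so the mapping shape, the helper name `to_probabilities`, and the sample numbers are all illustrative.

```python
import math


def to_probabilities(logprobs_by_token):
    """Convert a {token: log-probability} mapping into (token, probability) pairs, most likely first."""
    pairs = [(token, math.exp(logprob)) for token, logprob in logprobs_by_token.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Illustrative values only; these are not output from the model above.
sample = {
    "The": math.log(0.25),
    "AS": math.log(0.5),
    "Async": math.log(0.125),
}

for token, probability in to_probabilities(sample):
    print(f"{token!r}: {probability:.3f}")
```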

In [11]:
# response = client.inference.chat_completion(
#     model_id="Llama3.2-3B-Instruct",
#     messages=[
#         {"role": "user", "content": "When's the next flight from Adelaide to Sydney?"},
#     ],
#     # stream=True,
#     tools=[
#         {
#             "tool_name": "get_flight_info",
#             "description": "Get the flight information for a given origin and destination",
#             "parameters": {
#                 "origin": {
#                     "param_type": "string",
#                     "description": "The origin airport code. E.g., 'ADL'",
#                     "required": True,
#                 },
#                 "destination": {
#                     "param_type": "string",
#                     "description": "The destination airport code. E.g., 'LAX'",
#                     "required": True,
#                 },
#             }
#         }
#     ]
# )

# # for chunk in response:
# #     print(chunk.event.delta, end='')
# response

```
ChatCompletion(
    id='chatcmpl-7f14606b-d091-4b12-9d13-e95831f04301',
    choices=[
        Choice(
            finish_reason='tool_calls',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content=None,
                role='assistant',
                function_call=None,
                tool_calls=[ChatCompletionMessageToolCall(id='call_4qg1', function=Function(arguments='{"origin":"ADL","destination":"SYD"}', name='get_flight_info'), type='function')]
            )
        )
    ],
    created=1733917567,
    model='llama3-8b-8192',
    object='chat.completion',
    system_fingerprint='fp_a97cfe35ae',
    usage=CompletionUsage(completion_tokens=76, prompt_tokens=972, total_tokens=1048, completion_time=0.063333333, prompt_time=0.11611327, queue_time=0.0061331509999999895, total_time=0.179446603),
    x_groq={'id': 'req_01jetrmtcmfs89v7qyw8fdx1v0'}
)
```
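
The output above ends with `finish_reason='tool_calls'`: the model does not answer directly but asks the caller to run `get_flight_info` with JSON-encoded arguments. A minimal sketch of decoding those arguments and dispatching to a local function follows. The `get_flight_info` body, the `TOOLS` registry, and `dispatch_tool_call` are hypothetical, and the tool call is a `SimpleNamespace` stand-in copied from the printed output.

```python
import json
from types import SimpleNamespace


def get_flight_info(origin, destination):
    """Hypothetical local tool; returns placeholder data instead of a real lookup."""
    return {"origin": origin, "destination": destination, "status": "lookup not implemented"}


# Registry mapping tool names (as declared in the request) to local callables.
TOOLS = {"get_flight_info": get_flight_info}


def dispatch_tool_call(tool_call):
    """Decode the JSON arguments of a tool call and invoke the matching local function."""
    name = tool_call.function.name
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    kwargs = json.loads(tool_call.function.arguments)
    return TOOLS[name](**kwargs)


# Stand-in mirroring the tool call printed above.
fake_call = SimpleNamespace(
    id="call_4qg1",
    type="function",
    function=SimpleNamespace(
        name="get_flight_info",
        arguments='{"origin":"ADL","destination":"SYD"}',
    ),
)

result = dispatch_tool_call(fake_call)
print(result)
```

Keeping the registry explicit means an unexpected tool name from the model raises an error instead of calling arbitrary code.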


In [12]:
# response = client.inference.chat_completion(
#     model_id="Llama3.2-3B-Instruct",
#     messages=[
#         {"role": "user", "content": "When's the next flight from Adelaide to Sydney?"},
#     ],
#     stream=True,
#     tools=[
#         {
#             "tool_name": "get_flight_info",
#             "description": "Get the flight information for a given origin and destination",
#             "parameters": {
#                 "origin": {
#                     "param_type": "string",
#                     "description": "The origin airport code. E.g., 'ADL'",
#                     "required": True,
#                 },
#                 "destination": {
#                     "param_type": "string",
#                     "description": "The destination airport code. E.g., 'LAX'",
#                     "required": True,
#                 },
#             }
#         }
#     ]
# )

# for chunk in response:
#     print(chunk.event.delta, end='')

```
ChatCompletionChunk(
    id='chatcmpl-189b0530-6bcb-4089-bad7-65f73104b182', 
    choices=[
        Choice(
            delta=ChoiceDelta(content=None, function_call=None, role='assistant', tool_calls=None), 
            finish_reason=None, 
            index=0, 
            logprobs=None
        )
    ], 
    created=1733955177, 
    model='llama3-8b-8192', 
    object='chat.completion.chunk', 
    system_fingerprint='fp_a97cfe35ae', 
    usage=None, 
    x_groq=XGroq(id='req_01jevwgjx2f3maj4rbzaaexagx', usage=None, error=None)
)
```
