
Commit f568f65

feat(api): update via SDK Studio
1 parent a3a4573 commit f568f65

File tree

393 files changed (+24976 −14432 lines changed)


.github/workflows/publish-pypi.yml

Lines changed: 1 addition & 1 deletion
@@ -28,4 +28,4 @@ jobs:
       run: |
         bash ./bin/publish-pypi
       env:
-        PYPI_TOKEN: ${{ secrets.LLAMA_STACK_CLIENT_PYPI_TOKEN || secrets.PYPI_TOKEN }}
+        PYPI_TOKEN: ${{ secrets.LLAMA_STACK_PYPI_TOKEN || secrets.PYPI_TOKEN }}

.github/workflows/release-doctor.yml

Lines changed: 1 addition & 1 deletion
@@ -18,4 +18,4 @@ jobs:
       run: |
         bash ./bin/check-release-environment
       env:
-        PYPI_TOKEN: ${{ secrets.LLAMA_STACK_CLIENT_PYPI_TOKEN || secrets.PYPI_TOKEN }}
+        PYPI_TOKEN: ${{ secrets.LLAMA_STACK_PYPI_TOKEN || secrets.PYPI_TOKEN }}
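Both workflow hunks make the same change: the publish and release-check jobs now read a `LLAMA_STACK_PYPI_TOKEN` secret and fall back to the generic `PYPI_TOKEN`. As a rough, purely illustrative Python analogue of that `||` fallback expression (not code from the repo):

```python
import os

# Mirrors `${{ secrets.LLAMA_STACK_PYPI_TOKEN || secrets.PYPI_TOKEN }}`:
# prefer the newly named secret, fall back to the generic one.
pypi_token = os.environ.get("LLAMA_STACK_PYPI_TOKEN") or os.environ.get("PYPI_TOKEN")

if not pypi_token:
    raise SystemExit("No PyPI token available in the environment")
```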

.stats.yml

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-configured_endpoints: 91
-openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/llamastack%2Fllama-stack-client-0e756984d87c3fd1eb96d486947b3bc2086d5afcf299e8119b6b89bbd86dbe75.yml
-openapi_spec_hash: 7c519a25bb9a094d4b4bda17bb20dd88
-config_hash: d1f21dfdbf5d9925eecf56b6c1fab755
+configured_endpoints: 96
+openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/llamastack%2Fllama-stack-client-df7a19394e9124c18ec4e888e2856d22b5ebfd6fe6fe6e929ff6cfadb2ae7e2a.yml
+openapi_spec_hash: 9428682672fdd7e2afee7af9ef849dc9
+config_hash: c2377844063fe8b7c43d8b79522fa6fc

README.md

Lines changed: 124 additions & 53 deletions
@@ -10,7 +10,7 @@ It is generated with [Stainless](https://www.stainless.com/).
 
 ## Documentation
 
-The full API of this library can be found in [api.md](api.md).
+The REST API documentation can be found on [llama-stack.readthedocs.io](https://llama-stack.readthedocs.io/en/latest/). The full API of this library can be found in [api.md](api.md).
 
 ## Installation
 
@@ -27,43 +27,37 @@ pip install git+ssh://git@github.com/llamastack/llama-stack-client-python.git
 The full API of this library can be found in [api.md](api.md).
 
 ```python
-import os
 from llama_stack_client import LlamaStackClient
 
-client = LlamaStackClient(
-    api_key=os.environ.get("LLAMA_STACK_CLIENT_API_KEY"),  # This is the default and can be omitted
-)
+client = LlamaStackClient()
 
-client.datasetio.append_rows(
-    dataset_id="REPLACE_ME",
-    rows=[{"foo": True}],
+model = client.models.register(
+    model_id="model_id",
 )
+print(model.identifier)
 ```
 
 While you can provide an `api_key` keyword argument,
 we recommend using [python-dotenv](https://pypi.org/project/python-dotenv/)
-to add `LLAMA_STACK_CLIENT_API_KEY="My API Key"` to your `.env` file
+to add `LLAMA_STACK_API_KEY="My API Key"` to your `.env` file
 so that your API Key is not stored in source control.
 
 ## Async usage
 
 Simply import `AsyncLlamaStackClient` instead of `LlamaStackClient` and use `await` with each API call:
 
 ```python
-import os
 import asyncio
 from llama_stack_client import AsyncLlamaStackClient
 
-client = AsyncLlamaStackClient(
-    api_key=os.environ.get("LLAMA_STACK_CLIENT_API_KEY"),  # This is the default and can be omitted
-)
+client = AsyncLlamaStackClient()
 
 
 async def main() -> None:
-    await client.datasetio.append_rows(
-        dataset_id="REPLACE_ME",
-        rows=[{"foo": True}],
+    model = await client.models.register(
+        model_id="model_id",
     )
+    print(model.identifier)
 
 
 asyncio.run(main())
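The hunk above also renames the recommended `.env` entry to `LLAMA_STACK_API_KEY`. A minimal sketch of the python-dotenv flow it points to (assuming a `.env` file containing that variable sits in the working directory):

```python
from dotenv import load_dotenv  # provided by the python-dotenv package

from llama_stack_client import LlamaStackClient

# Reads .env and populates os.environ, so the key is never hard-coded.
load_dotenv()

# Per the README, the client falls back to the environment variable when no
# api_key argument is passed.
client = LlamaStackClient()
```
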
@@ -85,28 +79,68 @@ pip install 'llama_stack_client[aiohttp] @ git+ssh://git@github.com/llamastack/l
 Then you can enable it by instantiating the client with `http_client=DefaultAioHttpClient()`:
 
 ```python
-import os
 import asyncio
 from llama_stack_client import DefaultAioHttpClient
 from llama_stack_client import AsyncLlamaStackClient
 
 
 async def main() -> None:
     async with AsyncLlamaStackClient(
-        api_key=os.environ.get(
-            "LLAMA_STACK_CLIENT_API_KEY"
-        ),  # This is the default and can be omitted
         http_client=DefaultAioHttpClient(),
     ) as client:
-        await client.datasetio.append_rows(
-            dataset_id="REPLACE_ME",
-            rows=[{"foo": True}],
+        model = await client.models.register(
+            model_id="model_id",
         )
+        print(model.identifier)
 
 
 asyncio.run(main())
 ```
 
+## Streaming responses
+
+We provide support for streaming responses using Server Side Events (SSE).
+
+```python
+from llama_stack_client import LlamaStackClient
+
+client = LlamaStackClient()
+
+stream = client.inference.chat_completion(
+    messages=[
+        {
+            "content": "string",
+            "role": "user",
+        }
+    ],
+    model_id="model_id",
+    stream=True,
+)
+for chat_completion_response in stream:
+    print(chat_completion_response.completion_message)
+```
+
+The async client uses the exact same interface.
+
+```python
+from llama_stack_client import AsyncLlamaStackClient
+
+client = AsyncLlamaStackClient()
+
+stream = await client.inference.chat_completion(
+    messages=[
+        {
+            "content": "string",
+            "role": "user",
+        }
+    ],
+    model_id="model_id",
+    stream=True,
+)
+async for chat_completion_response in stream:
+    print(chat_completion_response.completion_message)
+```
+
 ## Using types
 
 Nested request parameters are [TypedDicts](https://docs.python.org/3/library/typing.html#typing.TypedDict). Responses are [Pydantic models](https://docs.pydantic.dev) which also provide helper methods for things like:
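The hunk ends on the "Using types" intro, which says responses are Pydantic models. A small sketch of what that enables, assuming the standard Pydantic v2 helpers are available on these models (not something this diff confirms):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()

# Any response object works here; reusing the registration example above.
model = client.models.register(model_id="model_id")

# Assumed Pydantic v2 helpers: dump the response to plain data or JSON.
print(model.model_dump())        # dict of the response fields
print(model.model_dump_json())   # JSON string
```
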
@@ -125,21 +159,37 @@ from llama_stack_client import LlamaStackClient
 
 client = LlamaStackClient()
 
-response = client.inference.batch_chat_completion(
-    messages_batch=[
-        [
-            {
-                "content": "string",
-                "role": "user",
-            }
-        ]
+chat_completion_response = client.inference.chat_completion(
+    messages=[
+        {
+            "content": "string",
+            "role": "user",
+        }
     ],
     model_id="model_id",
     logprobs={},
 )
-print(response.logprobs)
+print(chat_completion_response.logprobs)
 ```
 
+## File uploads
+
+Request parameters that correspond to file uploads can be passed as `bytes`, or a [`PathLike`](https://docs.python.org/3/library/os.html#os.PathLike) instance or a tuple of `(filename, contents, media type)`.
+
+```python
+from pathlib import Path
+from llama_stack_client import LlamaStackClient
+
+client = LlamaStackClient()
+
+client.files.create(
+    file=Path("/path/to/file"),
+    purpose="assistants",
+)
+```
+
+The async client uses the exact same interface. If you pass a [`PathLike`](https://docs.python.org/3/library/os.html#os.PathLike) instance, the file contents will be read asynchronously automatically.
+
 ## Handling errors
 
 When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of `llama_stack_client.APIConnectionError` is raised.
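The file-upload section added in this hunk states that the async client uses the exact same interface and reads `PathLike` contents asynchronously. A sketch of that async variant, mirroring the sync example above (only the async boilerplate is new):

```python
import asyncio
from pathlib import Path

from llama_stack_client import AsyncLlamaStackClient


async def main() -> None:
    client = AsyncLlamaStackClient()
    # Same call as the sync example in the hunk; a PathLike argument is read
    # asynchronously by the client.
    await client.files.create(
        file=Path("/path/to/file"),
        purpose="assistants",
    )


asyncio.run(main())
```
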
@@ -156,9 +206,14 @@ from llama_stack_client import LlamaStackClient
 client = LlamaStackClient()
 
 try:
-    client.datasetio.append_rows(
-        dataset_id="REPLACE_ME",
-        rows=[{"foo": True}],
+    client.inference.chat_completion(
+        messages=[
+            {
+                "content": "string",
+                "role": "user",
+            }
+        ],
+        model_id="model_id",
     )
 except llama_stack_client.APIConnectionError as e:
     print("The server could not be reached")
@@ -202,9 +257,14 @@ client = LlamaStackClient(
 )
 
 # Or, configure per-request:
-client.with_options(max_retries=5).datasetio.append_rows(
-    dataset_id="REPLACE_ME",
-    rows=[{"foo": True}],
+client.with_options(max_retries=5).inference.chat_completion(
+    messages=[
+        {
+            "content": "string",
+            "role": "user",
+        }
+    ],
+    model_id="model_id",
 )
 ```
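The per-request override above pairs with a client-wide default; a sketch of setting it at construction time (the truncated `client = LlamaStackClient(` context line implies constructor options, but the exact `max_retries` argument name here is an assumption):

```python
from llama_stack_client import LlamaStackClient

# Assumed constructor option: apply a default retry policy to every request
# instead of overriding per request with .with_options(max_retries=...).
# The README notes that requests are retried twice by default.
client = LlamaStackClient(
    max_retries=0,  # disable retries entirely
)
```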

@@ -228,9 +288,14 @@ client = LlamaStackClient(
 )
 
 # Override per-request:
-client.with_options(timeout=5.0).datasetio.append_rows(
-    dataset_id="REPLACE_ME",
-    rows=[{"foo": True}],
+client.with_options(timeout=5.0).inference.chat_completion(
+    messages=[
+        {
+            "content": "string",
+            "role": "user",
+        }
+    ],
+    model_id="model_id",
 )
 ```
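Likewise for timeouts: besides the per-request `with_options(timeout=5.0)` override in the hunk, a client-wide default can be sketched with an `httpx.Timeout` for finer control (treating the constructor `timeout` argument as an assumption):

```python
import httpx
from llama_stack_client import LlamaStackClient

# Assumed constructor option: a 20-second overall cap with a 5-second connect
# timeout, applied to every request made by this client.
client = LlamaStackClient(
    timeout=httpx.Timeout(20.0, connect=5.0),
)
```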

@@ -244,10 +309,10 @@ Note that requests that time out are [retried twice by default](#retries).
 
 We use the standard library [`logging`](https://docs.python.org/3/library/logging.html) module.
 
-You can enable logging by setting the environment variable `LLAMA_STACK_CLIENT_LOG` to `info`.
+You can enable logging by setting the environment variable `LLAMA_STACK_LOG` to `info`.
 
 ```shell
-$ export LLAMA_STACK_CLIENT_LOG=info
+$ export LLAMA_STACK_LOG=info
 ```
 
 Or to `debug` for more verbose logging.
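Besides the renamed `LLAMA_STACK_LOG` environment variable, logging can also be enabled from code through the standard `logging` module the hunk mentions; a sketch, where the `"llama_stack_client"` logger name is assumed from the package name:

```python
import logging

# Route log records to stderr and turn up verbosity for the library logger.
logging.basicConfig(level=logging.INFO)
logging.getLogger("llama_stack_client").setLevel(logging.DEBUG)  # assumed logger name
```
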
@@ -272,16 +337,17 @@ The "raw" Response object can be accessed by prefixing `.with_raw_response.` to
 from llama_stack_client import LlamaStackClient
 
 client = LlamaStackClient()
-response = client.datasetio.with_raw_response.append_rows(
-    dataset_id="REPLACE_ME",
-    rows=[{
-        "foo": True
+response = client.inference.with_raw_response.chat_completion(
+    messages=[{
+        "content": "string",
+        "role": "user",
     }],
+    model_id="model_id",
 )
 print(response.headers.get('X-My-Header'))
 
-datasetio = response.parse()  # get the object that `datasetio.append_rows()` would have returned
-print(datasetio)
+inference = response.parse()  # get the object that `inference.chat_completion()` would have returned
+print(inference.completion_message)
 ```
 
 These methods return an [`APIResponse`](https://github.com/llamastack/llama-stack-client-python/tree/main/src/llama_stack_client/_response.py) object.
@@ -295,9 +361,14 @@ The above interface eagerly reads the full response body when you make the reque
 To stream the response body, use `.with_streaming_response` instead, which requires a context manager and only reads the response body once you call `.read()`, `.text()`, `.json()`, `.iter_bytes()`, `.iter_text()`, `.iter_lines()` or `.parse()`. In the async client, these are async methods.
 
 ```python
-with client.datasetio.with_streaming_response.append_rows(
-    dataset_id="REPLACE_ME",
-    rows=[{"foo": True}],
+with client.inference.with_streaming_response.chat_completion(
+    messages=[
+        {
+            "content": "string",
+            "role": "user",
+        }
+    ],
+    model_id="model_id",
 ) as response:
     print(response.headers.get("X-My-Header"))
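The hunk stops at the header access inside the context manager; the body-reading part lies outside it. A sketch of consuming the streamed body with `.iter_bytes()`, one of the methods listed in the prose above (the output filename is purely illustrative):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()

with client.inference.with_streaming_response.chat_completion(
    messages=[{"content": "string", "role": "user"}],
    model_id="model_id",
) as response:
    print(response.headers.get("X-My-Header"))

    # Stream the raw body to disk instead of reading it eagerly.
    with open("chat_completion_body.bin", "wb") as f:
        for chunk in response.iter_bytes():
            f.write(chunk)
```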

@@ -354,7 +425,7 @@ import httpx
 from llama_stack_client import LlamaStackClient, DefaultHttpxClient
 
 client = LlamaStackClient(
-    # Or use the `LLAMA_STACK_CLIENT_BASE_URL` env var
+    # Or use the `LLAMA_STACK_BASE_URL` env var
     base_url="http://my.test.server.example.com:8083",
     http_client=DefaultHttpxClient(
         proxy="http://my.test.proxy.example.com",
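The only change in this hunk is the comment now pointing at `LLAMA_STACK_BASE_URL`. A sketch of the environment-variable route that comment describes, with the behaviour inferred from the comment rather than verified here:

```python
import os

from llama_stack_client import LlamaStackClient

# Set before constructing the client; per the comment in the diff, this is the
# alternative to passing base_url=... explicitly.
os.environ["LLAMA_STACK_BASE_URL"] = "http://my.test.server.example.com:8083"

client = LlamaStackClient()
```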

SECURITY.md

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@ before making any information public.
 If you encounter security issues that are not directly related to SDKs but pertain to the services
 or products provided by Llama Stack Client, please follow the respective company's security reporting guidelines.
 
+### Llama Stack Client Terms and Policies
+
+Please contact llamastack@meta.com for any questions or concerns regarding the security of our services.
+
 ---
 
 Thank you for helping us keep the SDKs and systems they interact with secure.
