# Quick Model Download and Deployment with Kamiwaza SDK

This notebook demonstrates a simplified way to download and deploy models with the Kamiwaza SDK's all-in-one function. The previous notebook walked through the process step by step; here, a single function call covers the whole workflow.

The `download_and_deploy_model` function handles:
1. Finding the model
2. Downloading the appropriate files
3. Waiting for download completion
4. Deploying the model
5. Setting up the endpoint

All of this takes a single line of code.
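To make the sequence above concrete, here is a minimal sketch of the general "start, wait, deploy" pattern. It is not the SDK's implementation: `download_then_deploy` and the three callables it takes are hypothetical placeholders introduced only for illustration, and the real function may work quite differently internally.

```python
import time
from typing import Callable


def download_then_deploy(
    start_download: Callable[[], None],
    is_download_complete: Callable[[], bool],
    deploy: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 600.0,
) -> str:
    """Start a download, poll until it finishes, then deploy.

    All three callables are hypothetical stand-ins, not Kamiwaza SDK functions.
    """
    start_download()
    deadline = time.monotonic() + timeout
    # Poll until the download reports completion or the timeout expires.
    while not is_download_complete():
        if time.monotonic() >= deadline:
            raise TimeoutError("download did not finish before the timeout")
        time.sleep(poll_interval)
    # Deploy only after the files are confirmed present.
    return deploy()
```

The point of the pattern is ordering: deployment is attempted only once the download check succeeds, and a timeout keeps the wait from running forever.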

In [2]:
from kamiwaza_client import KamiwazaClient
client = KamiwazaClient("http://localhost:7777/api/")

## Search for a Model

Before downloading, we can search for the model to see which quantizations are available. This step is optional.

In [13]:
hf_repo = 'bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF'
client.models.search_models(hf_repo, exact=True)

[Model: Llama-3-8B-Instruct-Coder-v2-GGUF
 Repo ID: bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF
 Files: 26 available
   GITATTRIBUTES files: 1
   GGUF files: 23
   IMATRIX files: 1
   MD files: 1
 Available quantizations:
   - fp16
   - iq1_m
   - iq1_s
   - iq2_m
   - iq2_s
   - iq2_xs
   - iq2_xxs
   - iq3_m
   - iq3_s
   - iq3_xs
   - iq3_xxs
   - iq4_nl
   - iq4_xs
   - q2_k
   - q3_k
   - q4_k
   - q5_k
   - q6_k
   - q8_0
 Files: 0 downloading]

## Download and Deploy in One Step

Now for the simplified approach: we can download and deploy the model in a single function call. This function:

- Initiates the download with the specified quantization
- Monitors download progress with real-time updates
- Automatically deploys the model once download is complete
- Returns complete information about the model and deployment

You can specify any quantization level shown in the search results, or omit the parameter to use the default best option for your hardware.
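If you would rather choose a quantization programmatically than hard-code one, a small helper can pick the first match from a preference list. This is only a sketch: `pick_quantization` is a hypothetical helper, not part of the SDK, and the `available` list below is copied by hand from the search output above rather than read from the search result object.

```python
from typing import Iterable, Optional, Sequence


def pick_quantization(available: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred quantization that is available, else None."""
    normalized = {q.lower() for q in available}
    for q in preferred:
        if q.lower() in normalized:
            return q.lower()
    return None


# A subset of the quantizations listed in the search output above.
available = ["fp16", "iq4_xs", "q4_k", "q5_k", "q6_k", "q8_0"]
choice = pick_quantization(available, preferred=["q6_k", "q5_k", "q4_k"])
print(choice)  # q6_k
```

The result can then be passed as the `quantization` argument; if the helper returns `None`, omit the argument so the default is used.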


In [14]:
client.models.download_and_deploy_model(hf_repo, quantization='q6_k')

Initiating download for bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF with quantization q6_k...
Model files for bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF are already downloaded.
Deploying model bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF...
Model bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF successfully deployed!


{'model': Model: Llama-3-8B-Instruct-Coder-v2-GGUF
 Repo ID: bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF
 ID: d4d8e9ef-c465-4213-b870-b1e193b74c0c
 Created: 2025-03-06 19:31:12.933542
 Files: 26 available
   GGUF files: 23
   GITATTRIBUTES files: 1
   IMATRIX files: 1
   MD files: 1
 Files: 0 downloading,
 'target_files': [ModelFile: Llama-3-8B-Instruct-Coder-v2-IQ2_S.gguf
  ID: 06a67c8f-23ee-4277-a59b-a3b96d17a800
  Size: 2.57 GB,
  ModelFile: Llama-3-8B-Instruct-Coder-v2-Q4_K_M.gguf
  ID: 258665e7-8530-478c-a77f-a3a420037118
  Size: 4.58 GB,
  ModelFile: Llama-3-8B-Instruct-Coder-v2-fp16.gguf
  ID: 25f19f25-4569-4fce-8c81-48cd89e96254
  Size: 14.97 GB,
  ModelFile: Llama-3-8B-Instruct-Coder-v2-Q5_K_M.gguf
  ID: 26417de8-92f9-445b-8fcb-689fd993ed2d
  Size: 5.34 GB,
  ModelFile: Llama-3-8B-Instruct-Coder-v2-IQ2_XS.gguf
  ID: 27ce0dd6-8773-4a05-8376-b4b08c740d65
  Size: 2.43 GB,
  ModelFile: Llama-3-8B-Instruct-Coder-v2-IQ3_S.gguf
  ID: 3138718d-e64c-4ed3-b4a4-d243032daaa3
  Size: 3.43 

## Set Up the OpenAI Client

Once the model is deployed, we can get an OpenAI-compatible client to interact with it, just like in the step-by-step approach.

In [15]:
openai_client = client.openai.get_client(repo_id=hf_repo)

In [18]:
# Create a streaming chat completion
response = openai_client.chat.completions.create(
    messages=[
        {"role": "user", "content": "How many r's are in the word 'strawberry'? ONLY RESPOND WITH A SINGLE NUMBER"}
    ],
    model="model",
    stream=True
)

# Print each piece of the response as it arrives
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)


2025-03-07 13:55:29,195 - httpx - INFO - HTTP Request: POST http://localhost:51123/v1/chat/completions "HTTP/1.1 200 OK"


3

## Summary

The `download_and_deploy_model` function takes you from finding a model to running inference with a single call. This is especially useful for:

- Quick experimentation with different models
- Simplified deployment workflows
- Reducing boilerplate code in applications

When you're done, you can still stop the deployment using `client.serving.stop_deployment(repo_id=hf_repo)` as shown in the first notebook.
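One way to make sure the deployment is always stopped, even if inference code raises an error, is to wrap the two calls in a context manager. This is a sketch only: `deployed_model` is a hypothetical helper, and it assumes the two client calls behave as they are used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def deployed_model(client, repo_id, **deploy_kwargs):
    """Deploy a model on entry and stop the deployment on exit."""
    # Same call as in the notebook; extra keyword arguments are passed through.
    result = client.models.download_and_deploy_model(repo_id, **deploy_kwargs)
    try:
        yield result
    finally:
        # Runs on normal exit and when the body raises.
        client.serving.stop_deployment(repo_id=repo_id)
```

Usage would look like `with deployed_model(client, hf_repo, quantization='q6_k') as info: ...`, with the inference code placed inside the `with` block.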