# Quick Model Download and Deployment with Kamiwaza SDK

This notebook demonstrates a simpler way to download and deploy models with the Kamiwaza SDK's all-in-one function. The previous notebook walked through the process step by step; here, the SDK packages those steps into a single function call.

The `download_and_deploy_model` function handles:
1. Finding the model
2. Downloading the appropriate files
3. Waiting for download completion
4. Deploying the model
5. Setting up the endpoint

All with just one line of code!
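
To make the five steps concrete, here is a minimal sketch of how such a wrapper could be structured. It is not Kamiwaza's implementation. `download_and_deploy_sketch` and the callables passed into it (`find`, `start_download`, `download_done`, `deploy`) are hypothetical stand-ins for whatever the SDK does internally, and the endpoint URL in the demo is made up. The demo drives the wrapper with fake functions, so it runs without a Kamiwaza server.

```python
import time
from typing import Callable, Dict


def download_and_deploy_sketch(
    find: Callable[[str], str],
    start_download: Callable[[str, str], None],
    download_done: Callable[[str], bool],
    deploy: Callable[[str], str],
    repo_id: str,
    quantization: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 3600.0,
) -> Dict[str, str]:
    """Hypothetical orchestration of steps 1-5; not the SDK's actual code."""
    model_id = find(repo_id)                    # 1. find the model
    start_download(model_id, quantization)      # 2. download the files
    deadline = time.monotonic() + timeout_seconds
    while not download_done(model_id):          # 3. wait for completion
        if time.monotonic() > deadline:
            raise TimeoutError(f"download of {repo_id} did not finish in time")
        time.sleep(poll_seconds)
    endpoint = deploy(model_id)                 # 4-5. deploy and expose an endpoint
    return {"model_id": model_id, "endpoint": endpoint}


# Fake backend for illustration: the download "finishes" on the third poll.
polls = {"n": 0}


def fake_done(model_id: str) -> bool:
    polls["n"] += 1
    return polls["n"] >= 3


result = download_and_deploy_sketch(
    find=lambda repo: "model-123",
    start_download=lambda mid, q: None,
    download_done=fake_done,
    deploy=lambda mid: "http://localhost:8000/v1",  # hypothetical endpoint
    repo_id="bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF",
    quantization="q6_k",
    poll_seconds=0.0,
)
print(result)
```

Passing each step in as a function keeps the polling and timeout logic independent of any particular backend, and the timeout guards against a download that never completes.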

In [None]:
from kamiwaza_sdk import kamiwaza_sdk as kz

# Connect to the Kamiwaza API (adjust the URL if your instance is not running locally)
client = kz("http://localhost:7777/api/")

## Search for a Model

Before downloading, we can search for the model to see which quantization options are available. This step is optional.

In [None]:
hf_repo = 'bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF'
client.models.search_models(hf_repo, exact=True)
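
GGUF repositories usually encode the quantization in each filename, which is where labels like `q6_k` come from. The sketch below pulls those labels out of a list of filenames. The filenames are illustrative and not copied from the search output, the helper `quantizations_from_filenames` is hypothetical, and the regex assumes llama.cpp-style names ending in `-<QUANT>.gguf`; split files such as `...-00001-of-00002.gguf` are not handled.

```python
import re
from typing import List

# Assumed naming scheme: "<model>-<QUANT>.gguf", e.g. "...-Q6_K.gguf" or "...-Q8_0.gguf"
QUANT_PATTERN = re.compile(
    r"[-.]((?:I?Q\d+(?:_[A-Z0-9]+)*)|BF16|F16|F32)\.gguf$", re.IGNORECASE
)


def quantizations_from_filenames(filenames: List[str]) -> List[str]:
    """Return unique lowercase quantization labels, in first-seen order."""
    labels: List[str] = []
    for name in filenames:
        match = QUANT_PATTERN.search(name)
        if match:
            label = match.group(1).lower()
            if label not in labels:
                labels.append(label)
    return labels


# Illustrative filenames (hypothetical, not taken from the search output)
files = [
    "Llama-3-8B-Instruct-Coder-v2-Q4_K_M.gguf",
    "Llama-3-8B-Instruct-Coder-v2-Q6_K.gguf",
    "Llama-3-8B-Instruct-Coder-v2-Q8_0.gguf",
    "README.md",
]
print(quantizations_from_filenames(files))  # ['q4_k_m', 'q6_k', 'q8_0']
```

The labels are lowercased to match the `quantization='q6_k'` style used in the download call.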

## Download and Deploy in One Step

Now for the simplified approach: we can download and deploy the model with a single function call. This function:

- Initiates the download with the specified quantization
- Monitors download progress with real-time updates
- Automatically deploys the model once download is complete
- Returns complete information about the model and deployment

You can pass any quantization level shown in the search results, or omit the `quantization` parameter to let the SDK pick a default suited to your hardware.


In [None]:
client.models.download_and_deploy_model(hf_repo, quantization='q6_k')
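
Choosing a quantization is mostly a memory trade-off: fewer bits per weight means a smaller download and lower memory use, at some cost in output quality. The sketch below estimates the size of the weights for a few options and picks the largest that fits a memory budget. This is a hypothetical heuristic, not how Kamiwaza chooses its default. The bits-per-weight values are the nominal figures for llama.cpp's base block formats; mixed variants such as Q4_K_M land slightly higher, and the estimate ignores the KV cache and runtime overhead, which need memory on top of the weights. `BITS_PER_WEIGHT`, `estimate_weights_gb`, and `largest_quant_that_fits` are illustrative names.

```python
from typing import Optional

# Nominal bits per weight for llama.cpp base quantization formats (assumption:
# block-format figures only; mixed variants like q4_k_m land slightly higher).
BITS_PER_WEIGHT = {
    "q4_k": 4.5,
    "q5_k": 5.5,
    "q6_k": 6.5625,
    "q8_0": 8.5,
    "f16": 16.0,
}


def estimate_weights_gb(num_params: float, quant: str) -> float:
    """Approximate size of the quantized weights alone, in GB (1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def largest_quant_that_fits(num_params: float, budget_gb: float) -> Optional[str]:
    """Hypothetical heuristic: highest-precision format whose weights fit the budget."""
    best = None
    for quant in sorted(BITS_PER_WEIGHT, key=BITS_PER_WEIGHT.get):
        if estimate_weights_gb(num_params, quant) <= budget_gb:
            best = quant
    return best


# An 8B-parameter model with an 8 GB budget for the weights
for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_weights_gb(8e9, quant):.1f} GB")
print(largest_quant_that_fits(8e9, budget_gb=8.0))  # q6_k
```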

## Set Up the OpenAI Client

Once the model is deployed, we can get an OpenAI-compatible client to interact with it, just like in the step-by-step approach.

In [None]:
openai_client = client.openai.get_client(repo_id=hf_repo)
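
Because the client is OpenAI-compatible, the request it sends has the same JSON body as OpenAI's Chat Completions API. The sketch below builds that body by hand with the standard library, to show roughly what the streaming cell below sends; it does not contact the server. The helper `chat_completion_payload` is hypothetical, and the body contains only the three fields this notebook uses.

```python
import json


def chat_completion_payload(model: str, user_message: str, stream: bool = True) -> str:
    """JSON body for an OpenAI-style chat completions request (only the fields used here)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }
    return json.dumps(body)


payload = chat_completion_payload("model", "How many r's are in the word 'strawberry'?")
print(payload)
```

The `openai` client serializes these keyword arguments for you, so you normally never build this body yourself.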

In [None]:
# Create a streaming chat completion
response = openai_client.chat.completions.create(
    messages=[
        {"role": "user", "content": "How many r's are in the word 'strawberry'? ONLY RESPOND WITH A SINGLE NUMBER"}
    ],
    model="model",
    stream=True
)

# Print each text delta as it arrives; skip chunks with no choices or no content
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # finish the streamed output with a newline
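
If you need the full answer as a string instead of printing it piece by piece, you can join the deltas. The sketch below does this with a hypothetical `collect_stream` helper and tests it on fake chunks built with `SimpleNamespace`, which mimic only the `chunk.choices[0].delta.content` attributes the loop above reads.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of an OpenAI-style streaming response into one string."""
    parts = []
    for chunk in chunks:
        # Some servers send chunks with no choices (e.g. a final usage chunk)
        if chunk.choices and chunk.choices[0].delta.content is not None:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like a streaming chunk, for illustration only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk(None), fake_chunk("3"), SimpleNamespace(choices=[])]
print(collect_stream(fake_stream))  # 3
```

A real stream can be consumed only once, so with a live response you would call `collect_stream(response)` in place of the printing loop, not after it.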


## Summary

The `download_and_deploy_model` function provides a streamlined way to go from finding a model to using it for inference with a single call. This is especially useful for:

- Quick experimentation with different models
- Simplified deployment workflows
- Reducing boilerplate code in applications

When you're done, you can still stop the deployment using `client.serving.stop_deployment(repo_id=hf_repo)` as shown in the first notebook.
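
To make sure a deployment is stopped even when something fails mid-experiment, you can wrap deploy and stop in a context manager. The sketch below is hypothetical: `temporary_deployment` is not part of the SDK, and the demo uses fake callables that only record events. With the real SDK you might pass functions that call `client.models.download_and_deploy_model` and `client.serving.stop_deployment` from this notebook, as the comment shows, but that wiring is an untested assumption.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_deployment(
    deploy: Callable[[str], None], stop: Callable[[str], None], repo_id: str
) -> Iterator[str]:
    """Deploy on entry and always stop on exit, even if the body raises."""
    deploy(repo_id)
    try:
        yield repo_id
    finally:
        stop(repo_id)


# Possible wiring with the real SDK (untested assumption):
# with temporary_deployment(
#     lambda r: client.models.download_and_deploy_model(r, quantization='q6_k'),
#     lambda r: client.serving.stop_deployment(repo_id=r),
#     hf_repo,
# ) as repo:
#     ...

# Demo with fake callables that only record what happened
events = []
try:
    with temporary_deployment(
        lambda r: events.append(("deploy", r)),
        lambda r: events.append(("stop", r)),
        "bartowski/Llama-3-8B-Instruct-Coder-v2-GGUF",
    ) as repo:
        events.append(("use", repo))
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass
print([name for name, _ in events])  # ['deploy', 'use', 'stop']
```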