### Hugging Face Text Generation Integration with Aana SDK
This notebook shows how to run LLMs with Hugging Face Transformers and the Aana SDK.

In [1]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

Create an Aana SDK application and connect to the cluster.

In [2]:
from aana.sdk import AanaSDK

aana_app = AanaSDK().connect()

2025-04-22 11:33:41,911	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.
2025-04-22 11:33:46,237	INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
INFO 2025-04-22 11:33:48,978 serve 120992 -- Started Serve in namespace "serve".


Deploy the Gemma-3 model from Hugging Face Transformers as an Aana deployment. The model is loaded with 8-bit quantization, which reduces its GPU memory usage.

In [3]:
from transformers import BitsAndBytesConfig

from aana.deployments.hf_text_generation_deployment import (
    HfTextGenerationConfig,
    HfTextGenerationDeployment,
)

hf_text_generation_deployment = HfTextGenerationDeployment.options(
    num_replicas=1,  # The number of replicas of the model to deploy
    ray_actor_options={
        "num_gpus": 1
    },  # Allocate 1 GPU, should be > 0 if the model requires GPU
    user_config=HfTextGenerationConfig(
        model_id="google/gemma-3-1b-it",  # The model ID from the Hugging Face model hub
        model_kwargs={
            "trust_remote_code": True,  # Required for this particular model
            "quantization_config": BitsAndBytesConfig(  # Quantization configuration for the model, we are using 8-bit quantization
                load_in_8bit=True,
            ),
        },
    ).model_dump(mode="json"),
)

aana_app.register_deployment(
    name="hf_llm",  # Name of the deployment, which will be used to access it later
    instance=hf_text_generation_deployment,  # Instance of the deployment that we just created above
    deploy=True,  # Tell Aana to deploy the component immediately instead of waiting for aana_app.deploy()
)

INFO 2025-04-22 11:33:54,400 serve 120992 -- Connecting to existing Serve app in namespace "serve". New http options will not be applied.
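
As a rough back-of-the-envelope check on what 8-bit loading changes, the sketch below estimates the memory needed for the weights alone at 16 and 8 bits per parameter. The parameter count of one billion is a nominal assumption taken from the "1b" in the model name, and `weight_memory_gib` is a hypothetical helper, not part of Aana SDK or Transformers. Real usage will differ, since some layers may stay in higher precision and activations and the KV cache also take memory.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Estimate weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: roughly one billion parameters, based on the "1b" in the model name.
NOMINAL_PARAMS = 1e9

for label, bits in [("16-bit", 16), ("8-bit", 8)]:
    print(f"{label}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.2f} GiB")
```

Under these assumptions, the 8-bit weights take half the space of the 16-bit weights.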


Create an `AanaDeploymentHandle` to connect to the deployment. Use the same name, `hf_llm`, that was used when registering the deployment.

In [4]:
from aana.deployments.aana_deployment_handle import AanaDeploymentHandle

handle = await AanaDeploymentHandle.create("hf_llm")

INFO 2025-04-22 11:34:04,721 serve 120992 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x7a4ff6a4de40>.


`HfTextGenerationDeployment` can generate text from a fully formed prompt, that is, one with the chat template already applied.

In [5]:
prompt = "<bos><start_of_turn>user\nCan you provide ways to eat combinations of bananas and dragonfruits?<end_of_turn><start_of_turn>model\nSure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey.<end_of_turn><start_of_turn>user\nWhat about solving an 2x + 3 = 7 equation?<end_of_turn><start_of_turn>model\n"
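
To make the structure of such a prompt easier to see, here is a minimal sketch that assembles a string in the same shape as the one above from a list of messages. `build_gemma_prompt` is a hypothetical helper written for illustration only; it mirrors the turn markers used in this notebook's prompt and is not the tokenizer's official chat template, which may differ in whitespace details.

```python
def build_gemma_prompt(messages: list[dict[str, str]]) -> str:
    """Format chat messages into a prompt shaped like the one used in this notebook."""
    # The prompt above labels assistant turns as "model".
    role_names = {"user": "user", "assistant": "model"}
    parts = ["<bos>"]
    for message in messages:
        role = role_names[message["role"]]
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>")
    # Open a new model turn so that generation continues as the assistant.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


example_prompt = build_gemma_prompt([{"role": "user", "content": "Hi"}])
print(repr(example_prompt))
```

Writing the prompt by hand like this is error-prone, which is why the message-based interface shown later in the notebook is usually more convenient.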

In [6]:
async for item in handle.generate_stream(prompt=prompt):
    print(item["text"], end="")

Okay, let's tackle that 2x + 3 = 7 equation! This is a classic example of a step-by-step solution. Here's how we'll break it down:

**1. Isolate the Variable:**

* The goal is to get 'x' by itself on one side of the equation.
* Subtract 3 from both sides: 2x + 3 - 3 = 7 - 3
* Simplify: 2x = 4

**2. Solve for x:**

* Divide both sides by 2: 2x / 2 = 4 / 2
* Simplify: x = 2

**Therefore, the solution to the equation 2x + 3 = 7 is x = 2.**

**Let's check our answer:**

* 2(2) + 3 = 4 + 3 = 7  (This matches the original equation!)

**Key Concepts Used:**

* **Inverse Operations:**  We use subtraction (like subtracting 3) to undo the addition.
* **Equality:**  We're trying to get the equation to *equal* a specific value (7).

**Let
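
If you want the full text as a single string while still consuming the stream, the chunks can be accumulated as they arrive. The sketch below assumes only what the cell above shows, namely that each streamed item is a dict with a `"text"` key. `collect_text` and `fake_stream` are hypothetical names; `fake_stream` stands in for `handle.generate_stream(...)` so the example runs without a deployed model.

```python
import asyncio
from collections.abc import AsyncIterator


async def collect_text(stream: AsyncIterator[dict]) -> str:
    """Join the "text" field of every streamed item into one string."""
    chunks = []
    async for item in stream:
        chunks.append(item["text"])
    return "".join(chunks)


async def fake_stream():
    """Stand-in for a streaming deployment call (illustration only)."""
    for piece in ["2x = 4", ", so ", "x = 2"]:
        yield {"text": piece}


full_text = asyncio.run(collect_text(fake_stream()))
print(full_text)
```

Inside a notebook, where an event loop is already running, you would `await collect_text(handle.generate_stream(prompt=prompt))` instead of calling `asyncio.run`.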

In [7]:
await handle.generate(prompt=prompt)

{'text': "Okay, let's tackle that 2x + 3 = 7 equation! This is a classic example of a step-by-step solution. Here's how we'll break it down:\n\n**1. Isolate the Variable:**\n\n* The goal is to get 'x' by itself on one side of the equation.\n* Subtract 3 from both sides: 2x + 3 - 3 = 7 - 3\n* Simplify: 2x = 4\n\n**2. Solve for x:**\n\n* Divide both sides by 2: 2x / 2 = 4 / 2\n* Simplify: x = 2\n\n**Therefore, the solution to the equation 2x + 3 = 7 is x = 2.**\n\n**Let's check our answer:**\n\n* 2(2) + 3 = 4 + 3 = 7  (This matches the original equation!)\n\n**Key Concepts Used:**\n\n* **Inverse Operations:**  We use subtraction (like subtracting 3) to undo the addition.\n* **Equality:**  We're trying to get the equation to *equal* a specific value (7).\n\n**Let"}

You can also give `HfTextGenerationDeployment` a list of messages, and it will apply the chat template automatically.

In [8]:
messages = [
    {
        "role": "user",
        "content": "Can you provide ways to eat combinations of bananas and dragonfruits?",
    },
    {
        "role": "assistant",
        "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey.",
    },
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

Aana SDK provides the `ChatDialog` class to build a dialog object from a list of messages. The dialog object can then be passed to `HfTextGenerationDeployment` to generate a response.

In [9]:
from aana.core.models.chat import ChatDialog

dialog = ChatDialog.from_list(messages)

In [10]:
async for item in handle.chat_stream(dialog=dialog):
    print(item["text"], end="")

Okay, let's solve the equation 2x + 3 = 7. Here's how to do it step-by-step:

1. **Subtract 3 from both sides:**
   2x + 3 - 3 = 7 - 3
   2x = 4

2. **Divide both sides by 2:**
   2x / 2 = 4 / 2
   x = 2

**Therefore, the solution is x = 2**

Let me know if you'd like to try another equation!做到قدمه



Okay, let's tackle that equation:

**2x + 3 = 7**

Here's how to solve it:

1. **Subtract 3 from both sides:**
   2x + 3 - 3 = 7 - 3
   2x = 4

2. **Divide both sides by 2:**
   2x / 2 = 4 / 2
   x = 2

**Therefore, the solution is x = 2**

Let me know if you'd like to try another equation!


In [11]:
await handle.chat(dialog=dialog)

{'message': ChatMessage(content="Okay, let's solve the equation 2x + 3 = 7. Here's how to do it step-by-step:\n\n1. **Subtract 3 from both sides:**\n   2x + 3 - 3 = 7 - 3\n   2x = 4\n\n2. **Divide both sides by 2:**\n   2x / 2 = 4 / 2\n   x = 2\n\n**Therefore, the solution is x = 2**\n\nLet me know if you'd like to try another equation!做到قدمه\n\n\n\nOkay, let's tackle that equation:\n\n**2x + 3 = 7**\n\nHere's how to solve it:\n\n1. **Subtract 3 from both sides:**\n   2x + 3 - 3 = 7 - 3\n   2x = 4\n\n2. **Divide both sides by 2:**\n   2x / 2 = 4 / 2\n   x = 2\n\n**Therefore, the solution is x = 2**\n\nLet me know if you'd like to try another equation!\n", role='assistant')}
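
To continue the conversation for another turn, the assistant's reply and the next user message can be appended to the message list before building a new dialog. The sketch below shows only the list handling; `continue_conversation` is a hypothetical helper and the reply text is a placeholder rather than real model output.

```python
def continue_conversation(
    messages: list[dict[str, str]],
    assistant_reply: str,
    next_user_message: str,
) -> list[dict[str, str]]:
    """Return a new message list extended with the reply and the next user turn."""
    return [
        *messages,
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


history = [{"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"}]
# Placeholder reply; in the notebook this would come from the deployment's response.
history = continue_conversation(history, "The solution is x = 2.", "Now solve 3x - 1 = 8.")
print([message["role"] for message in history])
```

In the notebook, the reply text would presumably be read from the `content` field of the returned `ChatMessage` shown above, and the extended list passed to `ChatDialog.from_list` again.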

Congratulations! You have successfully deployed an LLM using Aana SDK. You can add Aana Endpoints to your application to interact with the deployed model.

Aana SDK also provides an OpenAI-compatible API for interacting with the deployed model, so you can access Aana applications with any OpenAI-compatible client. See the [OpenAI-compatible API docs](/docs/pages/openai_api.md) for more details.

You can also deploy LLMs using the [vLLM integration](/docs/pages/integrations.md#vllm) with Aana SDK. It is a more efficient way to deploy LLMs if you have a GPU.