### Hugging Face Text Generation Integration with Aana SDK
This notebook shows how to run LLMs with Hugging Face Transformers and the Aana SDK.

In [1]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

Create an Aana SDK instance and connect to the cluster.

In [2]:
from aana.sdk import AanaSDK


aana_app = AanaSDK().connect()

2024-06-25 08:28:10,466	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.

2024-06-25 08:28:15,308	INFO worker.py:1740 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265


Deploy the Phi-3 model from Hugging Face Transformers as an Aana deployment. We load the model with 4-bit quantization, which mainly reduces the GPU memory needed to hold the weights.

In [3]:
from transformers import BitsAndBytesConfig

from aana.deployments.hf_text_generation_deployment import (
    HfTextGenerationConfig,
    HfTextGenerationDeployment,
)

hf_text_generation_deployment = HfTextGenerationDeployment.options(
    num_replicas=1,  # The number of replicas of the model to deploy
    ray_actor_options={
        "num_gpus": 1
    },  # Allocate 1 GPU, should be > 0 if the model requires GPU
    user_config=HfTextGenerationConfig(
        model_id="microsoft/Phi-3-mini-4k-instruct",  # The model ID from the Hugging Face model hub
        model_kwargs={
            "trust_remote_code": True,  # Required for this particular model
            "quantization_config": BitsAndBytesConfig(  # Quantization configuration for the model, we are using 4-bit quantization
                load_in_8bit=False, load_in_4bit=True
            ),
        },
    ).model_dump(mode="json"),
)

aana_app.register_deployment(
    name="hf_llm",  # Name of the deployment, which will be used to access the deployment
    instance=hf_text_generation_deployment,  # Instance of the deployment that we just created above
    deploy=True,  # Tell Aana to deploy the component immediately instead of waiting for aana_app.deploy()
)



The new client HTTP config differs from the existing one in the following fields: ['location']. The new HTTP config is ignored.
2024-06-25 08:28:21,825	INFO handle.py:126 -- Created DeploymentHandle 'm20xyao6' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').
2024-06-25 08:28:21,827	INFO handle.py:126 -- Created DeploymentHandle 'trkpynix' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').
2024-06-25 08:28:40,970	INFO handle.py:126 -- Created DeploymentHandle 'ol9csrwj' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').
2024-06-25 08:28:40,973	INFO api.py:584 -- Deployed app 'hf_llm' successfully.
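
To give a rough sense of what the 4-bit setting in the configuration above changes, here is a back-of-the-envelope sketch of weight memory at different precisions. It assumes roughly 3.8 billion parameters for Phi-3-mini and counts weights only; activations, the KV cache, and quantization metadata are ignored, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: approximate parameter count for Phi-3-mini.
PHI3_MINI_PARAMS = 3.8e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PHI3_MINI_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink by about 4x when going from 16-bit to 4-bit, which is what lets the model fit on smaller GPUs.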


Create an `AanaDeploymentHandle` to connect to the deployment. We use the same name, `hf_llm`, that we used when deploying the model.

In [4]:
from aana.deployments.aana_deployment_handle import AanaDeploymentHandle

handle = await AanaDeploymentHandle.create("hf_llm")

2024-06-25 08:28:41,007	INFO handle.py:126 -- Created DeploymentHandle 'osqbzdrn' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').
2024-06-25 08:28:41,025	INFO pow_2_scheduler.py:260 -- Got updated replicas for Deployment(name='HfTextGenerationDeployment', app='hf_llm'): {'9bex2vuc'}.


`HfTextGenerationDeployment` can generate text from a fully formed prompt, that is, a prompt with the chat template already applied.

In [5]:
prompt = "<s><|user|>\nCan you provide ways to eat combinations of bananas and dragonfruits?<|end|>\n<|assistant|>\nSure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey.<|end|>\n<|user|>\nWhat about solving an 2x + 3 = 7 equation?<|end|>\n<|assistant|>\n"
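
For illustration, the layout of the prompt string above can be reproduced with a small helper. This is a minimal sketch: `build_phi3_prompt` is a hypothetical function, and the format is inferred only from the prompt shown here, not from an official specification. In practice, the tokenizer's `apply_chat_template` in Transformers is the more reliable way to produce it.

```python
def build_phi3_prompt(messages: list[dict[str, str]]) -> str:
    """Format messages in the layout seen in the prompt above.

    Assumption: each turn is rendered as "<|role|>\\n<content><|end|>\\n",
    the prompt starts with "<s>", and it ends with an open assistant turn.
    """
    prompt = "<s>"
    for message in messages:
        prompt += f"<|{message['role']}|>\n{message['content']}<|end|>\n"
    # Leave the assistant turn open so the model writes the reply.
    return prompt + "<|assistant|>\n"


example = build_phi3_prompt(
    [{"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"}]
)
print(repr(example))
```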

In [6]:
async for item in handle.generate_stream(prompt=prompt):
    print(item["text"], end="")

2024-06-25 08:28:41,081	INFO handle.py:126 -- Created DeploymentHandle 'ljwwkas3' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').


To solve the equation 2x + 3 = 7, follow these steps:

Step 1: Subthreshold the constant term from both sides of the equation.
2x + 3 - 3 = 7 - 3

Step 2: Simplify the equation.
2x = 4

Step 3: Divide both sides of the equation by the coefficient of x (which is 2).
2x / 2 = 4 / 2

Step 4: Simplify the equation to find the value of x.
x = 2

So, the solution to the equation 2x + 3 = 7 is x = 2. Here are some ways to combine bananas and dragonfruits in various dishes:

1. Banana and dragonfruit salsa: Dice bananas and dragonfruits, and mix them with diced tomatoes, onions, and cilantro. Add lime juice, salt, and pepper to taste.
2. Banana and dragonfruit ice cream: Blend bananas and dragonfruits with some
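
The streaming loop above only prints the chunks. If the full text is also needed afterwards, the chunks can be accumulated while they are printed. The sketch below uses `fake_generate_stream`, a hypothetical stand-in that yields dictionaries with a `"text"` key like the items shown above, so that it runs without a deployed model.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_generate_stream(chunks: list[str]) -> AsyncIterator[dict]:
    """Hypothetical stand-in for handle.generate_stream: yields {"text": ...} items."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control to the event loop between chunks
        yield {"text": chunk}


async def collect_stream(stream: AsyncIterator[dict]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    async for item in stream:
        print(item["text"], end="", flush=True)
        parts.append(item["text"])
    print()
    return "".join(parts)


full_text = asyncio.run(collect_stream(fake_generate_stream(["x", " = ", "2"])))
```

Inside the notebook, where top-level `await` is available, the same helper could be used as `full_text = await collect_stream(handle.generate_stream(prompt=prompt))`.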

In [7]:
await handle.generate(prompt=prompt)

2024-06-25 08:28:55,763	INFO handle.py:126 -- Created DeploymentHandle 'xu5hy2nn' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').


{'text': 'To solve the equation 2x + 3 = 7, follow these steps:\n\nStep 1: Subthreshold the constant term from both sides of the equation.\n2x + 3 - 3 = 7 - 3\n\nStep 2: Simplify the equation.\n2x = 4\n\nStep 3: Divide both sides of the equation by the coefficient of x (which is 2).\n2x / 2 = 4 / 2\n\nStep 4: Simplify the equation to find the value of x.\nx = 2\n\nSo, the solution to the equation 2x + 3 = 7 is x = 2. Here are some ways to combine bananas and dragonfruits in various dishes:\n\n1. Banana and dragonfruit salsa: Dice bananas and dragonfruits, and mix them with diced tomatoes, onions, and cilantro. Add lime juice, salt, and pepper to taste.\n2. Banana and dragonfruit ice cream: Blend bananas and dragonfruits with some'}

You can also give `HfTextGenerationDeployment` a list of messages, and it will apply the chat template automatically.

In [8]:
messages = [
    {
        "role": "user",
        "content": "Can you provide ways to eat combinations of bananas and dragonfruits?",
    },
    {
        "role": "assistant",
        "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey.",
    },
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

Aana SDK provides the `ChatDialog` class to build a dialog object from a list of messages. This dialog object can then be passed to `HfTextGenerationDeployment` to generate the response.

In [10]:
from aana.core.models.chat import ChatDialog

dialog = ChatDialog.from_list(messages)

In [11]:
async for item in handle.chat_stream(dialog=dialog):
    print(item["text"], end="")

2024-06-25 08:29:39,640	INFO handle.py:126 -- Created DeploymentHandle 'f6fd8wge' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').


To solve the equation 2x + 3 = 7, follow these steps:

Step 1: Subthreshold the constant term from both sides of the equation.
2x + 3 - 3 = 7 - 3

Step 2: Simplify the equation.
2x = 4

Step 3: Divide both sides of the equation by the coefficient of x (which is 2).
2x / 2 = 4 / 2

Step 4: Simplify the equation to find the value of x.
x = 2

So, the solution to the equation 2x + 3 = 7 is x = 2. Here are some ways to combine bananas and dragonfruits in various dishes:

1. Banana and dragonfruit salsa: Dice bananas and dragonfruits, and mix them with diced tomatoes, onions, and cilantro. Add lime juice, salt, and pepper to taste.
2. Banana and dragonfruit ice cream: Blend bananas and dragonfruits with some

In [12]:
await handle.chat(dialog=dialog)

2024-06-25 08:29:54,265	INFO handle.py:126 -- Created DeploymentHandle '26c64th3' for Deployment(name='HfTextGenerationDeployment', app='hf_llm').


{'message': ChatMessage(content='To solve the equation 2x + 3 = 7, follow these steps:\n\nStep 1: Subthreshold the constant term from both sides of the equation.\n2x + 3 - 3 = 7 - 3\n\nStep 2: Simplify the equation.\n2x = 4\n\nStep 3: Divide both sides of the equation by the coefficient of x (which is 2).\n2x / 2 = 4 / 2\n\nStep 4: Simplify the equation to find the value of x.\nx = 2\n\nSo, the solution to the equation 2x + 3 = 7 is x = 2. Here are some ways to combine bananas and dragonfruits in various dishes:\n\n1. Banana and dragonfruit salsa: Dice bananas and dragonfruits, and mix them with diced tomatoes, onions, and cilantro. Add lime juice, salt, and pepper to taste.\n2. Banana and dragonfruit ice cream: Blend bananas and dragonfruits with some', role='assistant')}
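
To continue the conversation, the assistant's reply can be appended to the message list together with the next user turn. The helper below is a hypothetical sketch that works on plain dictionaries in the same shape as `messages` above; it does not call Aana itself.

```python
def extend_conversation(
    messages: list[dict[str, str]], reply: str, next_user_message: str
) -> list[dict[str, str]]:
    """Return a new message list with the assistant reply and a follow-up user turn."""
    return [
        *messages,
        {"role": "assistant", "content": reply},
        {"role": "user", "content": next_user_message},
    ]


history = [{"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"}]
extended = extend_conversation(history, "x = 2", "Can you check the answer?")
print(len(extended))
```

In the notebook, the reply text would presumably come from `response["message"].content` (an assumption based on the `ChatMessage` fields printed above), and the extended list can be turned into a new dialog with `ChatDialog.from_list`.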

Congratulations! You have successfully deployed an LLM using Aana SDK. You can add Aana Endpoints to your application to interact with the deployed model.

Aana SDK also provides an OpenAI-compatible API for interacting with the deployed model, so you can access Aana applications with any OpenAI-compatible client. See the [OpenAI-compatible API docs](/docs/openai_api.md) for more details.
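
As a rough illustration of what an OpenAI-compatible client sends, the sketch below builds a request body in the standard OpenAI chat completions shape. It only constructs and prints the JSON and does not contact a server. Using the deployment name `hf_llm` as the `model` value is an assumption here; the linked docs are the authority on the exact base URL and model naming.

```python
import json


def build_chat_request(model: str, messages: list[dict[str, str]], stream: bool = False) -> dict:
    """Build a request body in the OpenAI chat completions format."""
    return {"model": model, "messages": messages, "stream": stream}


body = build_chat_request(
    model="hf_llm",  # assumption: the deployment name doubles as the model name
    messages=[{"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"}],
)
print(json.dumps(body, indent=2))
```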

You can also deploy LLMs using the [vLLM integration](/docs/integrations.md#vllm) with Aana SDK. It is a more efficient way to serve LLMs if you have a GPU.