## This demo app shows how to query Llama 3 hosted on OctoAI through a Gradio UI.

Open this notebook in <a href="https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/llama_api_providers/OctoAI_API_examples/Llama_Gradio.ipynb"><img data-canonical-src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" src="https://camo.githubusercontent.com/f5e0d0538a9c2972b5d413e0ace04cecd8efd828d133133933dfffec282a4e1b/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667"></a>

Since we are using OctoAI in this example, you'll need to obtain an OctoAI token:

- First, sign in to [OctoAI](https://octoai.cloud/) with your GitHub or Google account.
- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token). The free trial lasts a month or until you use up $10 in OctoAI credits, whichever comes first.

**Note:** After the free trial ends, you will need to enter billing information to keep using Llama models hosted on OctoAI.

To run this example:
- Run the notebook cells in order.
- Create your OctoAI API token as described above and paste it when prompted.
- Type your question in the chat box and click Submit.

The chat UI appears inline in the notebook and can also be opened in a browser at http://127.0.0.1:7860. The model's answer is shown in the chat window.
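If the inline UI does not render, a small standard-library sketch can check from a separate Python process whether the Gradio server is answering. It assumes Gradio's defaults (host `127.0.0.1`, port `7860`) unless the `GRADIO_SERVER_NAME` / `GRADIO_SERVER_PORT` environment variables override them; `local_ui_url` and `ui_is_up` are hypothetical helper names, not part of Gradio.

```python
import os
import urllib.error
import urllib.request


def local_ui_url(env=os.environ):
    """Return the URL the Gradio app is expected to be served on.

    Assumes Gradio's defaults (127.0.0.1:7860), which the GRADIO_SERVER_NAME
    and GRADIO_SERVER_PORT environment variables can override.
    """
    host = env.get("GRADIO_SERVER_NAME", "127.0.0.1")
    port = env.get("GRADIO_SERVER_PORT", "7860")
    return f"http://{host}:{port}"


def ui_is_up(url, timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a non-2xx status, so it is still running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
```

Because `launch(debug=True)` keeps the notebook cell busy, run the check from another terminal or process, e.g. `print(ui_is_up(local_ui_url()))`. If port 7860 is already taken, Gradio moves on to the next free port, so the URL printed by `launch()` is more reliable than this guess.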

Let's start by installing the necessary packages:
- octoai-sdk provides the client for OctoAI's hosted Llama endpoints
- gradio provides the chat UI

Then we set up the OctoAI token.

```python
# Uncomment to install the dependencies (needed once per environment)
# !pip install octoai-sdk gradio
```
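Optionally, a quick standard-library check that the packages are importable before moving on. The import names `octoai` (for the octoai-sdk package) and `gradio` match the imports used later in this notebook; `missing_modules` is a hypothetical helper, not part of either library.

```python
import importlib.util

# Import names used later in this notebook: octoai-sdk is imported as `octoai`.
REQUIRED_MODULES = ("octoai", "gradio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages, install them before continuing:", ", ".join(missing))
else:
    print("All required packages are importable.")
```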

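```python
from getpass import getpass
import os

# Prompt for the token without echoing it, then expose it as OCTOAI_TOKEN,
# the environment variable the OctoAI client reads when created without arguments.
OCTOAI_API_TOKEN = getpass("Enter your OctoAI API token: ")
os.environ["OCTOAI_TOKEN"] = OCTOAI_API_TOKEN
```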
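If you rerun the notebook often, a variant of the cell above can reuse an `OCTOAI_TOKEN` that is already set and prompt only when it is missing. This is a minimal sketch; `resolve_octoai_token` is a hypothetical helper, and the `env`/`prompt` parameters exist only so it can be exercised without a real terminal.

```python
import os
from getpass import getpass


def resolve_octoai_token(env=os.environ, prompt=getpass):
    """Return an OctoAI token, reusing OCTOAI_TOKEN from `env` if already set.

    Calls `prompt` only when the variable is missing or empty, then stores the
    answer back in `env` so the OctoAI client created later can read it.
    """
    token = env.get("OCTOAI_TOKEN", "").strip()
    if not token:
        token = prompt("Enter your OctoAI API token: ").strip()
        if not token:
            raise ValueError("An OctoAI API token is required.")
        env["OCTOAI_TOKEN"] = token
    return token
```

To use it, replace the `getpass()` line with `OCTOAI_API_TOKEN = resolve_octoai_token()`.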

```python
from octoai.client import Client

# Client() picks up the token from the OCTOAI_TOKEN environment variable set above.
client = Client()
```

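```python
import gradio as gr

SYSTEM_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def chat(message, history):
    # Rebuild the whole conversation in chat-completions format on every turn:
    # system prompt, then alternating user/assistant turns, then the new message.
    history_octoai_format = [{"role": "system", "content": SYSTEM_PROMPT}]
    for human, ai in history:
        history_octoai_format.append({"role": "user", "content": human})
        history_octoai_format.append({"role": "assistant", "content": ai})
    history_octoai_format.append({"role": "user", "content": message})

    completion = client.chat.completions.create(
        model="meta-llama-3-8b-instruct",
        messages=history_octoai_format,
        max_tokens=300,  # upper bound on the length of each reply
    )
    return completion.choices[0].message.content


# debug=True blocks the cell while the app runs; in Colab it is needed to
# print errors from the chat handler in the cell output.
gr.ChatInterface(chat).launch(debug=True)
```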
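The `chat` function above unpacks `history` as `(user, assistant)` pairs, which matches Gradio's "tuples" history format, and it resends the full history on every turn. Recent Gradio versions can instead pass a list of `{"role": ..., "content": ...}` dicts (`gr.ChatInterface(..., type="messages")`). The sketch below is a pure helper that accepts either shape and can optionally keep only the most recent exchanges; `to_chat_messages` and `max_turns` are hypothetical names, not part of Gradio or the OctoAI SDK.

```python
SYSTEM_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def to_chat_messages(message, history, system_prompt=SYSTEM_PROMPT, max_turns=None):
    """Build a chat-completions `messages` list from Gradio chat history.

    Accepts either history shape Gradio's ChatInterface may pass:
      - "tuples":   [[user_text, assistant_text], ...]
      - "messages": [{"role": "user", "content": ...}, ...]
    If `max_turns` is set, only the most recent user/assistant exchanges are kept.
    """
    turns = []
    for item in history:
        if isinstance(item, dict):
            # "messages" format: copy only the fields the API needs.
            turns.append({"role": item["role"], "content": item["content"]})
        else:
            # "tuples" format: one (user, assistant) pair per exchange.
            human, ai = item
            turns.append({"role": "user", "content": human})
            if ai is not None:
                turns.append({"role": "assistant", "content": ai})

    if max_turns is not None:
        # One exchange is two entries; turns[-0:] would keep everything, hence the guard.
        turns = turns[-2 * max_turns:] if max_turns > 0 else []
        # Never start the trimmed history with an orphaned assistant reply.
        while turns and turns[0]["role"] != "user":
            turns.pop(0)

    return (
        [{"role": "system", "content": system_prompt}]
        + turns
        + [{"role": "user", "content": message}]
    )
```

Inside `chat`, the message-building loop could then become `messages = to_chat_messages(message, history, max_turns=10)`, passed as `messages=messages` to `client.chat.completions.create`. Capping the turns keeps the prompt from growing without bound in long conversations, at the cost of the model forgetting older exchanges.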