# DSX Dev GenAI Text to Text Models

The following code demonstrates how to use text generation models through Dev GenAI. This notebook serves as an example; for more documentation on Dev GenAI and other products, refer to our [Dev GenAI Confluence page](https://confluence.dell.com/display/DSX/Request+GenAI+Model+Access).

#### Installing Packages

Run the following cell to install the packages this notebook requires.

In [None]:
!pip install 'openai>=1.45.0' requests python-dotenv certifi


### Initial setup
### Setup API key

Environment variables are loaded with `python-dotenv`, so sensitive values such as API keys are not hard-coded in the notebook.

In [None]:
from dotenv import load_dotenv

# First, add your API key to the .env file as DEV_GENAI_API_KEY='Insert_your_Dev_GenAI_Text_to_Text_API_key_here', then load the file
load_dotenv('.env', override=True)

# Alternatively, export an environment variable with: export DEV_GENAI_API_KEY="Insert_your_Dev_GenAI_Text_to_Text_API_key_here"
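
Before moving on, it can help to confirm that the key was actually loaded without echoing it into the notebook output. The following is a minimal sketch; `require_env` and `mask_secret` are hypothetical helper names introduced here for illustration and are not part of Dev GenAI or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it first."
        )
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key never appears in output."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

After `load_dotenv(...)` has run, `print(mask_secret(require_env("DEV_GENAI_API_KEY")))` would show only the tail of the key, or fail early with a readable message if the variable is missing.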

### Update certifi bundle with Dell certificates

In [None]:
import requests
import zipfile
import io
import certifi

def update_certifi():
    # URL to download the Dell certificates zip file
    url = "https://pki.dell.com//Dell%20Technologies%20PKI%202018%20B64_PEM.zip"
    print("Downloading Dell certificates zip from:", url)
    # A timeout keeps the cell from hanging if the PKI host is unreachable
    response = requests.get(url, timeout=30)
    # Use raise_for_status() for concise error checking
    response.raise_for_status()
    print("Downloaded certificate zip, size:", len(response.content), "bytes")

    # Determine the location of the certifi bundle
    cert_path = certifi.where()
    print("Certifi bundle path:", cert_path)

    # Define the names of the certificates within the zip file
    dell_root_cert_name = "Dell Technologies Root Certificate Authority 2018.pem"
    dell_issuing_cert_name = "Dell Technologies Issuing CA 101_new.pem"

    # Append the certificates directly from the zip archive in memory.
    print("Appending Dell certificates to certifi bundle...")
    try:
        with zipfile.ZipFile(io.BytesIO(response.content)) as z:
            # Read certificate contents directly from the zip file in memory
            # Ensure decoding from bytes to string (assuming UTF-8)
            root_cert_content = z.read(dell_root_cert_name).decode('utf-8')
            issuing_cert_content = z.read(dell_issuing_cert_name).decode('utf-8')

            # Append the certificates to the certifi bundle
            # (Make sure you have backup of certifi bundle if needed.)
            with open(cert_path, "a") as bundle:
                bundle.write("\n")
                bundle.write(root_cert_content)
                bundle.write("\n") # Ensure newline after first cert
                bundle.write(issuing_cert_content)
                bundle.write("\n") # Ensure newline after second cert

        print("Dell certificates successfully added to certifi bundle.")

    except KeyError as e:
        # Handle case where expected certificate file is not in the zip
        print(f"Error: expected certificate not found in the zip archive: {e}")
    except Exception as e:
        # Handle other potential errors during processing
        print(f"An error occurred during certificate appending: {e}")


update_certifi()
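
Because the cell above appends to the certifi bundle every time it runs, re-running the notebook can add the same certificates repeatedly. One possible way to avoid that is to check the bundle before writing. This is a sketch under that assumption; `append_cert_once` is a hypothetical helper, not part of `certifi`.

```python
from pathlib import Path


def append_cert_once(bundle_path: str, pem_text: str) -> bool:
    """Append pem_text to the bundle only if it is not already present.

    Returns True when the bundle was modified and False when it was left as is.
    """
    bundle = Path(bundle_path)
    # read_text() translates CRLF to LF, so normalize the certificate the same way
    existing = bundle.read_text(encoding="utf-8")
    normalized = pem_text.replace("\r\n", "\n").strip()
    if normalized in existing:
        return False
    with bundle.open("a", encoding="utf-8") as handle:
        handle.write("\n" + normalized + "\n")
    return True
```

Inside `update_certifi()`, each `bundle.write(...)` of a certificate could be swapped for a call such as `append_cert_once(cert_path, root_cert_content)`.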

### OpenAI client initialization
This section initializes the OpenAI client and points it at the Dev GenAI endpoint. The client is required to call the API from Python.

#### Useful Links:

* OpenAI API Documentation: [OpenAI API](https://platform.openai.com/docs/introduction)

In [None]:
from openai import OpenAI
import certifi
import httpx
import os

# Verify TLS against the certifi bundle that now includes the Dell certificates
http_client = httpx.Client(verify=certifi.where())
client = OpenAI(
    base_url='https://genai-api-dev.dell.com/v1',
    http_client=http_client,
    api_key=os.environ["DEV_GENAI_API_KEY"]
)

## I. Get Dev GenAI responses leveraging OpenAI package

### Get results from a simple prompt (OpenAI Completion)

This example shows how to use the OpenAI client to get responses from the `mixtral-8x7b-instruct-v01`, `llamaguard-7b`, `mistral-7b-instruct-v03`, `phi-3-mini-128k-instruct`, `phi-3-5-moe-instruct`, `llama-3-8b-instruct`, `llama-3-1-8b-instruct`, `llama-3-2-3b-instruct`, `codellama-13b-instruct`, `sqlcoder-7b-2`, `codestral-22b-v0-1`, and `llama-3-3-70b-instruct` models. The `completions.create` method takes a prompt, and the model generates a response.

In [None]:
streaming = True
max_output_tokens = 200

# Available Models list
available_models = ["mixtral-8x7b-instruct-v01", "llamaguard-7b", "mistral-7b-instruct-v03", "phi-3-mini-128k-instruct", "phi-3-5-moe-instruct", "llama-3-8b-instruct", "llama-3-1-8b-instruct", "llama-3-2-3b-instruct","codellama-13b-instruct", "sqlcoder-7b-2", "codestral-22b-v0-1", "llama-3-3-70b-instruct"]

# Run the same prompt against every model in the list
for model_selected in available_models:
    print(f"Model: {model_selected}")
    completion = client.completions.create(
        model=model_selected,
        max_tokens=max_output_tokens,
        prompt=f'Can you explain who the Los Angeles Dodgers are and what they are known for in fewer than {max_output_tokens} tokens?',
        stream=streaming)

    if streaming:
        for chunk in completion:
            print(chunk.choices[0].text, end='')
        print()  # End the streamed line before the next model's header
    else:
        print(completion.choices[0].text)
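
When the streamed text is needed as a single string rather than printed piece by piece, the chunks can be joined. The sketch below assumes each chunk exposes `choices[0].text` as in the loop above; `collect_completion_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real stream chunks so the example runs without network access.

```python
from types import SimpleNamespace


def collect_completion_text(chunks) -> str:
    """Join the text fragments of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or no text; skip them
        if not chunk.choices:
            continue
        text = chunk.choices[0].text
        if text:
            parts.append(text)
    return "".join(parts)


# Stand-in chunks that mimic the shape used in the cell above
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(text="Hello")]),
    SimpleNamespace(choices=[]),
    SimpleNamespace(choices=[SimpleNamespace(text=", world")]),
    SimpleNamespace(choices=[SimpleNamespace(text=None)]),
]
print(collect_completion_text(fake_stream))  # Hello, world
```

With a real stream, `collect_completion_text(completion)` would take the place of the printing loop.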


### Get results from OpenAI Chat Completion

This example shows how to use the Dev GenAI client to get responses from the chat models `mixtral-8x7b-instruct-v01`, `llamaguard-7b`, `mistral-7b-instruct-v03`, `phi-3-mini-128k-instruct`, `phi-3-5-moe-instruct`, `llama-3-8b-instruct`, `llama-3-1-8b-instruct`, `llama-3-2-3b-instruct`, `codellama-13b-instruct`, `codestral-22b-v0-1`, and `llama-3-3-70b-instruct`. The `chat.completions.create` method takes the conversation so far as a list of messages, and the model generates the next reply. In this case, the conversation is about a favourite condiment. This pattern is useful for building conversational AI applications.


In [None]:

streaming = False # To enable streaming, set streaming to True
available_models = ["mixtral-8x7b-instruct-v01", "llamaguard-7b", "mistral-7b-instruct-v03", "phi-3-mini-128k-instruct", "phi-3-5-moe-instruct", "llama-3-8b-instruct", "llama-3-1-8b-instruct", "llama-3-2-3b-instruct","codellama-13b-instruct", "codestral-22b-v0-1", "llama-3-3-70b-instruct"]
selected_model = available_models[5]

completion = client.chat.completions.create(
    model=selected_model,
    messages = [
            {"role": "user", "content": "What is your favourite condiment?"},
            {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
            {"role": "user", "content": "Do you have mayonnaise recipes?"}
        ],
    stream=streaming
)

if streaming:
    for chunk in completion:
        # Skip chunks without an id or without choices
        if not chunk.id or not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content is None and delta.role is not None:
            print(delta.role + ': ', end='')
        elif delta.content is not None:
            print(delta.content, end='')
else:
    print(completion.choices[0].message.role + ': ' + completion.choices[0].message.content)
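
To continue a conversation, the assistant's streamed reply usually has to be appended to `messages` as one entry. The sketch below shows one way to rebuild that entry from the deltas, assuming the chunk shape used above; `collect_chat_message` is a hypothetical helper, and the fallback role of `"assistant"` is an assumption for streams that never send a role.

```python
from types import SimpleNamespace


def collect_chat_message(chunks) -> dict:
    """Rebuild a {'role': ..., 'content': ...} message from streamed deltas."""
    role = None
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "role", None):
            role = delta.role
        if getattr(delta, "content", None):
            parts.append(delta.content)
    # Assumption: default to "assistant" when the stream carries no role
    return {"role": role or "assistant", "content": "".join(parts)}
```

The returned dictionary has the same shape as the entries in `messages`, so it could be appended directly before sending the next user turn.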

### Get results for a code generator model (OpenAI Completion)

This example shows how to use the OpenAI client to get responses from the `codellama-13b-instruct`, `sqlcoder-7b-2`, `codestral-22b-v0-1`, and `llama-3-sqlcoder-8b` models. The `completions.create` method takes a prompt, and the model generates a response. Each model is paired with a prompt written in the format it expects.

In [None]:
streaming = True
max_output_tokens = 200

sqlcoder_prompt = """### Task
                        Generate a SQL query to answer [QUESTION] join the two tables and count for unique ids[/QUESTION]
                        ### Database Schema
                        The query will run on a database with the following schema: test_schema.table1 and test_schema.table2
                        ### Answer
                        Given the database schema, here is the SQL query that [QUESTION] join the two tables and count for unique ids[/QUESTION]
                        [SQL]
                """
codellama_prompt = "Write a python function to generate the nth fibonacci number."

# prompt pieces for llama-3-sqlcoder-8b
user_question = "What are our top 3 products by revenue in the New York region?"
instructions = "- if the question cannot be answered given the database schema, return \"I do not know\"\n- recall that the current date in YYYY-MM-DD format is 2024-09-13"
create_table_statements = """
CREATE TABLE products (
  product_id INTEGER PRIMARY KEY, -- Unique ID for each product
  name VARCHAR(50), -- Name of the product
  price DECIMAL(10,2), -- Price of each unit of the product
  quantity INTEGER  -- Current quantity in stock
);

CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer
  name VARCHAR(50), -- Name of the customer
  address VARCHAR(100) -- Mailing address of the customer
);

CREATE TABLE salespeople (
  salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson 
  name VARCHAR(50), -- Name of the salesperson
  region VARCHAR(50) -- Geographic sales region 
);

CREATE TABLE sales (
  sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale
  product_id INTEGER, -- ID of product sold
  customer_id INTEGER,  -- ID of customer who made purchase
  salesperson_id INTEGER, -- ID of salesperson who made the sale
  sale_date DATE, -- Date the sale occurred 
  quantity INTEGER -- Quantity of product sold
);

CREATE TABLE product_suppliers (
  supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier
  product_id INTEGER, -- Product ID supplied
  supply_price DECIMAL(10,2) -- Unit price charged by supplier
);

-- sales.product_id can be joined with products.product_id
-- sales.customer_id can be joined with customers.customer_id 
-- sales.salesperson_id can be joined with salespeople.salesperson_id
-- product_suppliers.product_id can be joined with products.product_id
"""
sqlcoder_8b_prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
                           Generate a SQL query to answer this question: `{user_question}`
                           {instructions}
                           DDL statements:
                           {create_table_statements}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
                           The following SQL query best answers the question `{user_question}`:
                           ```sql
                           """


available_models = ["codellama-13b-instruct", "sqlcoder-7b-2", "codestral-22b-v0-1", "llama-3-sqlcoder-8b"]
# Let's select the model from available list
model_selected = available_models[0]


# Available Prompts per model list
models_prompts = {"codellama-13b-instruct" : codellama_prompt, 
                  "sqlcoder-7b-2": sqlcoder_prompt,
                  "llama-3-sqlcoder-8b": sqlcoder_8b_prompt,
                  "codestral-22b-v0-1": codellama_prompt
                 }

# Let's select the prompt from available list
prompt_selected = models_prompts[model_selected]


print(f"Model: {model_selected}")
completion = client.completions.create(
    model=model_selected,
    max_tokens=max_output_tokens,
    prompt=prompt_selected,
    stream=streaming)

if streaming:
    for chunk in completion:
        print(chunk.choices[0].text, end='')
else:
    print(completion.choices[0].text)
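
The SQL prompts above end with an opening `[SQL]` tag or an opening `sql` code fence, so a model may close its answer with a matching marker and then keep generating. The sketch below trims the output at the first such marker. It assumes the model emits a closing fence or a `[/SQL]` tag, which is not guaranteed; `extract_sql` is a hypothetical helper.

```python
FENCE = "`" * 3  # three backticks, built this way to keep this example readable


def extract_sql(completion_text: str) -> str:
    """Keep only the text before the first closing fence or [/SQL] tag."""
    for stop_marker in (FENCE, "[/SQL]"):
        index = completion_text.find(stop_marker)
        if index != -1:
            completion_text = completion_text[:index]
    return completion_text.strip()
```

With `streaming = False`, this could be applied as `extract_sql(completion.choices[0].text)`.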

## II. Get Dev GenAI responses leveraging Python requests

### II a. Python request Completion

This example shows how to get responses from the `mixtral-8x7b-instruct-v01`, `llamaguard-7b`, `mistral-7b-instruct-v03`, `phi-3-mini-128k-instruct`, `phi-3-5-moe-instruct`, `llama-3-8b-instruct`, `llama-3-1-8b-instruct`, `llama-3-2-3b-instruct`, and `llama-3-3-70b-instruct` models using the Python `requests` library. The request sends headers that include the API key and a JSON body that includes the prompt to the `v1/completions` endpoint, and the model generates a response.


In [None]:
import json
import requests
import os
import certifi

streaming = True
max_output_tokens = 200

# Available Models list
available_models = ["mixtral-8x7b-instruct-v01", "llamaguard-7b", "mistral-7b-instruct-v03", "phi-3-mini-128k-instruct", "phi-3-5-moe-instruct", "llama-3-8b-instruct", "llama-3-1-8b-instruct", "llama-3-2-3b-instruct", "llama-3-3-70b-instruct"]

# Let's select the model from available list
model_selected = available_models[0]

def stream_and_yield_response(response):
    for chunk in response.iter_lines():
        decoded_chunk = chunk.decode("utf-8")
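        if decoded_chunk == "data: [DONE]":
            # End-of-stream sentinel; nothing to yield
            continue
        elif decoded_chunk.startswith("data: {"):
            # Slice off the prefix; str.lstrip("data:") strips a character set, not a prefix
            payload = decoded_chunk[len("data: "):]
            json_payload = json.loads(payload)
            yield json_payload['choices'][0]['text']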


# function from confluence
def llm_api(data):
    """
    Create a request to Dev GenAI Text to Text model with API key in header.
    """
    
    url = "https://genai-api-dev.dell.com/v1/completions"
      
    headers = {
        'accept': 'application/json',
        'api-key': os.environ["DEV_GENAI_API_KEY"],
        'Content-Type': 'application/json'
    }
      
    try:
        response = requests.post(url, headers=headers, json=data, stream=data['stream'], verify=certifi.where())
        response.raise_for_status()
        
        if data['stream']:
            for result in stream_and_yield_response(response):
                print(result, end='')
        else:
            response_dict = response.json()
            result = response_dict['choices'][0]['text']
            print(result)
           
    except requests.exceptions.HTTPError as err:
        print('Error code:', err.response.status_code)
        print('Error message:', err.response.text)
    except Exception as err:
        print('Error:', err)

# Model instruction and Parameters
instruction_text = f'Can you explain who the Los Angeles Dodgers are and what they are known for in fewer than {max_output_tokens} tokens?'
  
data = {
    'prompt': instruction_text,
    'temperature': 0.5,
    'top_p': 0.95,
    'max_tokens': max_output_tokens,
    'stream': streaming,
    'model': model_selected
    }

# API Call
llm_api(data)
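
The streaming branch reads the response line by line, where each event line starts with `data:` and the stream ends with `data: [DONE]`. The sketch below isolates that parsing step so it can be checked without a network call. `parse_sse_line` is a hypothetical helper, and the sample payload is made up for illustration. It removes the prefix by slicing because `str.lstrip` strips a set of characters rather than a literal prefix.

```python
import json


def parse_sse_line(raw_line: bytes):
    """Return the decoded JSON payload of one event line, or None if there is none."""
    line = raw_line.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None  # blank keep-alive lines and comments carry no payload
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    return json.loads(payload)


# Made-up event lines shaped like the ones handled above
sample_lines = [
    b'data: {"choices": [{"text": "Hello"}]}',
    b"",
    b'data: {"choices": [{"text": " there"}]}',
    b"data: [DONE]",
]
events = [parse_sse_line(line) for line in sample_lines]
print("".join(e["choices"][0]["text"] for e in events if e))  # Hello there
```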

### II b. Python request Chat Completion

This example shows how to get responses from the `mixtral-8x7b-instruct-v01`, `llamaguard-7b`, `mistral-7b-instruct-v03`, `phi-3-mini-128k-instruct`, `phi-3-5-moe-instruct`, `llama-3-8b-instruct`, `llama-3-1-8b-instruct`, `llama-3-2-3b-instruct`, and `llama-3-3-70b-instruct` models using the Python `requests` library. The request sends headers that include the API key and a JSON body that includes the chat conversation to the `v1/chat/completions` endpoint, and the model generates a response. In this case, the conversation is about the Los Angeles Dodgers, who won the World Series in 2020. This pattern is useful for building conversational AI applications.

In [None]:
import json
import requests
import os
import certifi

streaming = True
max_output_tokens = 200

# Available Models list
available_models = ["mixtral-8x7b-instruct-v01", "llamaguard-7b", "mistral-7b-instruct-v03", "phi-3-mini-128k-instruct", "phi-3-5-moe-instruct", "llama-3-8b-instruct", "llama-3-1-8b-instruct", "llama-3-2-3b-instruct", "llama-3-3-70b-instruct"]

# Let's select the model from available list
model_selected = available_models[0]

def stream_and_yield_response(response):
    for chunk in response.iter_lines():
        decoded_chunk = chunk.decode("utf-8")
        if decoded_chunk == "data: [DONE]":
            # End-of-stream sentinel; nothing to yield
            continue
        elif decoded_chunk.startswith("data: {"):
            # Slice off the prefix; str.lstrip("data:") strips a character set, not a prefix
            payload = decoded_chunk[len("data: "):]
            json_payload = json.loads(payload)
            delta = json_payload['choices'][0]['delta']

            # The first chunk typically carries the role; later chunks carry content
            if delta.get('role') is not None and not delta.get('content'):
                yield delta['role'] + ': '
            elif delta.get('content') is not None:
                yield delta['content']

# function from confluence
def llm_api(data):
    """
    Creates a request to Dev GenAI Text to Text model with API key in header.
    """

    url = "https://genai-api-dev.dell.com/v1/chat/completions"

    headers = {
        'accept': 'application/json',
        'api-key': os.environ["DEV_GENAI_API_KEY"],
        'Content-Type': 'application/json'
    }

    try:
        response = requests.post(url, headers=headers, json=data, stream=data['stream'], verify=certifi.where())
        response.raise_for_status()

        if data['stream']:
            for result in stream_and_yield_response(response):
                print(result, end='')
        else:
            response_dict = response.json()
            result = response_dict['choices'][0]['message']['role'] + ': ' + response_dict['choices'][0]['message']['content']
            print(result)

    except requests.exceptions.HTTPError as err:
        print('Error code:', err.response.status_code)
        print('Error message:', err.response.text)
    except Exception as err:
        print('Error:', err)

# Model instruction and Parameters
messages =  [{'role': 'user', 'content': f'You are a helpful assistant who needs to answer in less than {max_output_tokens} tokens'},
             {'role': 'assistant', 'content': 'The Los Angeles Dodgers won the World Series in 2020.'},
             {'role': 'user', 'content': 'Who are the Los Angeles Dodgers?'}]

data = {
    'messages': messages,
    'temperature': 0.5,
    'top_p': 0.95,
    'max_tokens': max_output_tokens,
    'stream': streaming,
    'model': model_selected
    }

# API Call
llm_api(data)