## Improving Customer Support Automation


### Prerequisites for this notebook

1. Install the remyxai-cli: `pip install git+https://github.com/remyxai/remyxai-cli.git`
2. Make sure the `REMYXAI_API_KEY` environment variable is set.
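
A quick way to confirm the second prerequisite before running anything else is to check the environment from Python. The sketch below is only an illustration: `require_env` is a hypothetical helper, not part of remyxai-cli, and it assumes nothing about how the client reads the key beyond the variable name given above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook.")
    return value
```

Calling `require_env("REMYXAI_API_KEY")` in the first cell fails fast with a readable message instead of surfacing later as an HTTP error.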

In [3]:
import logging
import traceback
from remyxai.client.myxboard import MyxBoard
from remyxai.client.remyx_client import RemyxAPI
from remyxai.api.evaluations import EvaluationTask

### Step 1: Find out which model scores highest on customer prompts

To do that, we select several recent models for evaluation. Make sure every model you select is supported by Remyx; the full list of supported models is here: ___.

In [12]:
# Replace with your preferred models
model_ids = [
    'microsoft/Phi-3-mini-4k-instruct',
    'BioMistral/BioMistral-7B',
    'codellama/CodeLlama-7b-Instruct-hf',
    'gorilla-llm/gorilla-openfunctions-v2',
    'meta-llama/Llama-2-7b-hf',
    'mistralai/Mistral-7B-Instruct-v0.3',
    'meta-llama/Meta-Llama-3-8B',
    'meta-llama/Meta-Llama-3-8B-Instruct',
    'Qwen/Qwen2-1.5B',
    'Qwen/Qwen2-1.5B-Instruct',
]

Create a MyxBoard. All results for this evaluation will be stored in this board.

In [15]:
myx_board_name = "customer_support_myxboard"
myx_board = MyxBoard(model_repo_ids=model_ids, name=myx_board_name)
myx_board

<remyxai.client.myxboard.MyxBoard at 0x7fe3238edcf0>

Initialize the `RemyxAPI` class, which lets us call the services without defining the URLs ourselves.

In [17]:
remyx_api = RemyxAPI()
remyx_api

<remyxai.client.remyx_client.RemyxAPI at 0x7fe3238ede70>

Now define the tasks (services) we want to use. We will use Myxmatch, which is one of the Remyx evaluation tasks. To learn more about how Remyx works, visit: ___.

In [18]:
tasks = [EvaluationTask.MYXMATCH]

Next, define the prompts customers will send to our customer support agent. They should be as close as possible to the prompts customers will use in production; only then can the evaluation results help us understand the models' limitations and pick the best model.

Use at least 3 prompts.

In [20]:
prompts = [
        "I want to return an item. It was damaged on delivery.",
        "Why hasn’t my package arrived? It was supposed to be delivered yesterday.",
        "Your product is defective. It doesn't work as advertised.",
    ]
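
Since the evaluation is only as good as the prompts, it can help to sanity-check the list before submitting it. The following is a minimal sketch; `validate_prompts` is a hypothetical helper (not part of remyxai-cli) that enforces the "at least 3 prompts" guideline above and drops blank or repeated entries.

```python
def validate_prompts(prompts, minimum=3):
    """Strip whitespace, drop blanks and exact duplicates, and enforce a minimum count."""
    cleaned = []
    for prompt in prompts:
        text = prompt.strip()
        # Keep the first occurrence of each non-empty prompt, preserving order.
        if text and text not in cleaned:
            cleaned.append(text)
    if len(cleaned) < minimum:
        raise ValueError(f"Need at least {minimum} distinct prompts, got {len(cleaned)}.")
    return cleaned
```

Running `prompts = validate_prompts(prompts)` before the evaluation loop keeps accidental duplicates from being submitted as separate jobs.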


The `remyx_api.evaluate` function calls the Remyx evaluation API. Myxmatch takes about 10 minutes to complete. The API is asynchronous: instead of waiting for the tasks to complete, it returns as soon as all the tasks have been started in the background.

In [13]:
for prompt in prompts:
    remyx_api.evaluate(myx_board, tasks, prompt=prompt)

Starting evaluation...
Starting evaluation...
Starting evaluation...


ERROR:root:HTTP error occurred: 403 Client Error: FORBIDDEN for url: https://engine.remyx.ai/api/v1.0/task/job-status/myxmatch-task-87ba56c7-cff9-459b-84a7-d1c1073f2550
ERROR:root:Failed to update MyxBoard: 403

Let's check the status of the evaluation tasks we just started.

In [33]:
from remyxai.api.evaluations import download_evaluation, list_evaluations

evaluations = list_evaluations()
evaluations

[{'eval_type': 'myxmatch',
  'name': 'customer_support_myxboard',
  'status': 'FINISHED'},
 {'eval_type': 'myxmatch', 'name': 'test_myxboard', 'status': 'FINISHED'},
 {'eval_type': 'myxmatch', 'name': 'movies knowledge', 'status': 'FINISHED'}]

Once the `status` of `customer_support_myxboard` shows `FINISHED`, we can download the evaluation and look at it in more detail.
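
Because the evaluation runs in the background, re-running the status cell by hand gets tedious. The sketch below polls until the board is done. It is an illustration under stated assumptions: `wait_until_finished` is a hypothetical helper, it assumes the fetch function returns a list of dicts with `name` and `status` keys as in the output above, and `FINISHED` is the only status value it knows about because it is the only one this notebook shows.

```python
import time


def wait_until_finished(fetch_evaluations, name, poll_seconds=60,
                        timeout_seconds=1800, sleep=time.sleep):
    """Poll until the evaluation called `name` reports FINISHED, or raise on timeout."""
    waited = 0
    while True:
        for entry in fetch_evaluations():
            if entry.get("name") == name and entry.get("status") == "FINISHED":
                return entry
        if waited >= timeout_seconds:
            raise TimeoutError(f"{name!r} did not finish within {timeout_seconds} seconds.")
        # Not finished yet: wait before asking again.
        sleep(poll_seconds)
        waited += poll_seconds
```

In this notebook the call would be `wait_until_finished(list_evaluations, "customer_support_myxboard")`. Passing the fetch function in as an argument keeps the helper independent of the client library.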

In [36]:
results = download_evaluation("myxmatch", "customer_support_myxboard")
results['message']

{'models': [{'model': 'Qwen2-1.5B', 'rank': 1},
  {'model': 'Phi-3-mini-4k-instruct', 'rank': 2},
  {'model': 'Meta-Llama-3-8B-Instruct', 'rank': 3},
  {'model': 'Meta-Llama-3-8B', 'rank': 4},
  {'model': 'CodeLlama-7b-Instruct-hf', 'rank': 5},
  {'model': 'Qwen2-1.5B-Instruct', 'rank': 6},
  {'model': 'Mistral-7B-Instruct-v0.3', 'rank': 7},
  {'model': 'BioMistral-7B', 'rank': 8},
  {'model': 'gorilla-openfunctions-v2', 'rank': 9},
  {'model': 'Llama-2-7b-hf', 'rank': 10}],
 'prompt': "Your product is defective. It doesn't work as advertised."}

### Qwen2-1.5B Won

Now we know the ranking of the models: `Qwen2-1.5B` ranks first for its responses to the given prompts. To get the best responses from a customer support AI agent, `Qwen2-1.5B` should be our first choice among the models evaluated.
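
To use the ranking in later code, it helps to turn the downloaded result back into full repository IDs. The sketch below is a hypothetical helper, `top_models`, written against the structure of `results['message']` shown above (a `models` list of dicts with `model` and `rank` keys). It assumes the short model name is the part of the repo ID after the slash, which holds for every entry in this run.

```python
def top_models(message, model_ids, k=1):
    """Return the k best-ranked models as (rank, repo_id) pairs."""
    # Map short names such as 'Qwen2-1.5B' back to repo ids such as 'Qwen/Qwen2-1.5B'.
    by_short_name = {repo_id.split("/")[-1]: repo_id for repo_id in model_ids}
    ranked = sorted(message["models"], key=lambda entry: entry["rank"])
    # Fall back to the short name if a model is not in model_ids.
    return [(entry["rank"], by_short_name.get(entry["model"], entry["model"]))
            for entry in ranked[:k]]
```

For example, `top_models(results['message'], model_ids, k=3)` gives the three best-ranked models in order, ready to pass to whatever deployment step comes next.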