Add code to API repo only which will call Harmony with a cloud based third party LLM such as OpenAI or Google Vertex #4

@woodthom2

Description

Depends on FE issue harmonydata/app#14

If you look at the API call
https://api.harmonydata.ac.uk/docs#/Text/match_text_match_post
there is already a parameter in the API to say which model we use:

[image: the existing model parameter in the API docs]

but we don't use it.
I would like the Harmony API repo to allow you to send "openai" and "gpt-4" (or whatever) so that it uses one of those.
Once the API has this implemented, we can add it to the front end as a dropdown, like in this mockup:

[image: mockup of the model dropdown]

The dropdown should default to the HuggingFace model which Harmony is already using, but if the user changes it, then we use the third-party LLMs.
I don't think we have enough users for this to be expensive, but if that changes, we can start asking users to paste their own OpenAI API key.
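To make the idea above concrete, here is a minimal sketch of how the API could map the requested framework and model onto a vectorisation function, falling back to the existing HuggingFace model when nothing is sent. Everything named here (`get_vectoriser`, `VECTORISERS`, the `"default"` model label and the two placeholder vectorisers) is hypothetical and not taken from the Harmony codebase; the real lookup would plug in Harmony's actual embedding calls.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A vectoriser takes a list of texts and returns one vector per text.
Vectoriser = Callable[[List[str]], List[List[float]]]


def huggingface_vectoriser(texts: List[str]) -> List[List[float]]:
    # Placeholder standing in for the HuggingFace model Harmony already uses.
    return [[float(len(text)), 0.0] for text in texts]


def openai_vectoriser(texts: List[str]) -> List[List[float]]:
    # Placeholder standing in for a call to a third-party embedding API.
    return [[float(len(text)), 1.0] for text in texts]


# Hypothetical registry keyed by (framework, model), both lowercase.
DEFAULT_KEY: Tuple[str, str] = ("huggingface", "default")
VECTORISERS: Dict[Tuple[str, str], Vectoriser] = {
    DEFAULT_KEY: huggingface_vectoriser,
    ("openai", "text-embedding-ada-002"): openai_vectoriser,
}


def get_vectoriser(framework: Optional[str] = None,
                   model: Optional[str] = None) -> Vectoriser:
    """Pick a vectoriser; use the default when no model is requested."""
    if not framework and not model:
        return VECTORISERS[DEFAULT_KEY]
    key = ((framework or "").lower(), (model or "").lower())
    if key not in VECTORISERS:
        # Reject unknown combinations instead of silently using the default.
        raise ValueError(f"Unsupported model: {framework}/{model}")
    return VECTORISERS[key]
```

Rejecting unknown combinations, rather than quietly falling back, is one possible design choice: it means the front-end dropdown and the API cannot drift apart unnoticed.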

Rationale

The problem is that psychologists are using the tool but are sometimes unhappy with the matching, e.g. sometimes it thinks words are similar when they are not.
This is because the LLM I am running is an open-source one from HuggingFace. Google's and OpenAI's LLMs are better, but they are third-party and need to be called via an API.

I have a writeup of how the other LLMs perform on a test dataset here: https://github.com/harmonydata/matching/blob/main/analyse_results.ipynb

For example, Vertex AI Gecko and OpenAI Ada 2 and Ada 3 outperform the current Harmony model on datasets such as GAD-7:

[two images: performance comparison]

What code needs to be edited?

I think it will be in text_router.py in the API repo:

https://github.com/harmonydata/harmonyapi/blob/main/routers/text_router.py#L210

Using OpenAI or other LLMs for vectorisation

Any word vector representation can be used by Harmony. The example below works for OpenAI's text-embedding-ada-002 model as of July 2023, provided you have created a paid OpenAI account. However, since LLMs are progressing rapidly, we have chosen not to integrate Harmony directly with the OpenAI client libraries, but instead to let you pass Harmony any vectorisation function of your choice.

import openai  # legacy interface: openai.Embedding is only available in openai<1.0
import numpy as np
from harmony import match_instruments_with_function, example_instruments

model_name = "text-embedding-ada-002"

def convert_texts_to_vector(texts):
    # Request one embedding per input text from the OpenAI API.
    vectors = openai.Embedding.create(input=texts, model=model_name)["data"]
    # Stack the embeddings into a 2D array, one row per text.
    return np.asarray([vector["embedding"] for vector in vectors])

instruments = example_instruments["CES_D English"], example_instruments["GAD-7 Portuguese"]
# Match the instruments using our own vectorisation function.
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments_with_function(instruments, None, convert_texts_to_vector)
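The `openai.Embedding.create` call above belongs to the pre-1.0 `openai` package. From version 1.0 onwards the same request goes through a client object (`client.embeddings.create`). The sketch below shows that shape with the client passed in as an argument, so it can be tried without an API key. `embed_texts` and the fake client are hypothetical names used for illustration only; they are not part of Harmony or of the OpenAI library.

```python
from types import SimpleNamespace


def embed_texts(client, texts, model_name="text-embedding-ada-002"):
    """Return one embedding (a list of floats) per text, in input order."""
    response = client.embeddings.create(input=texts, model=model_name)
    # Sort by index so that row i always corresponds to texts[i].
    items = sorted(response.data, key=lambda item: item.index)
    return [list(item.embedding) for item in items]


class FakeEmbeddings:
    """Stand-in for the embeddings endpoint (illustration only, no network)."""

    def create(self, input, model):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text)), 1.0])
            for i, text in enumerate(input)
        ]
        # Return the items in reverse to show that embed_texts restores order.
        return SimpleNamespace(data=list(reversed(data)))


fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
vectors = embed_texts(fake_client, ["Feeling nervous", "Sad"])
```

With the real library, the client would be created with `from openai import OpenAI` and `client = OpenAI()`, and the result would be wrapped in `np.asarray(...)` before being handed to Harmony, as in the example above.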

Labels: enhancement (New feature or request)
