Description
Depends on FE issue harmonydata/app#14
If you look at the API call
https://api.harmonydata.ac.uk/docs#/Text/match_text_match_post
you will see there is already a parameter in the API to say which model we use,
but we don't use it.
I would like the Harmony API repo to allow you to send "openai" and "gpt-4" (or whichever model) in that parameter so that it uses one of those models.
Once the API has this implemented, we can add it to the front end as a dropdown like in this mockup:

The dropdown should default to the HuggingFace model which Harmony is already using,
but if the user changes it, then we use the third-party LLMs.
I don't think we have enough users for this to be expensive, but if that changes, we can start to ask users to paste their OpenAI API key.
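If users are eventually asked to paste their own key, the API needs a rule for which key to use. Below is a minimal sketch, assuming the front end sends an optional user-supplied key. It prefers the user's key and otherwise falls back to a key configured on the server. The function and exception names are hypothetical. `OPENAI_API_KEY` is the environment variable the OpenAI client libraries read by default.

```python
import os
from typing import Optional


class MissingAPIKeyError(Exception):
    """Raised when a third-party model is selected but no API key is available (hypothetical)."""


def resolve_openai_key(user_supplied_key: Optional[str]) -> str:
    # Prefer a key the user pasted into the front end, ignoring blank input
    if user_supplied_key and user_supplied_key.strip():
        return user_supplied_key.strip()
    # Otherwise fall back to the server's own key from the environment
    server_key = os.environ.get("OPENAI_API_KEY")
    if server_key:
        return server_key
    raise MissingAPIKeyError("No OpenAI API key supplied and none configured on the server")
```

Keeping the server key as the fallback means nothing changes for users while usage is low. Requiring a user key later would only mean removing the fallback branch.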
Rationale
The problem is that psychologists are using the tool but are sometimes unhappy with the matching, e.g. sometimes it thinks words are similar when they are not.
This is because the LLM I am running is an open-source one from HuggingFace. Google's and OpenAI's LLMs are better, but they are third-party and need to be called via an API.
I have a writeup of how the other LLMs perform on a test dataset here: https://github.com/harmonydata/matching/blob/main/analyse_results.ipynb
For example, Vertex AI Gecko and OpenAI Ada 2 and Ada 3 outperform current Harmony on datasets such as GAD-7:


What code needs to be edited?
I think it will be in text_router.py in the API repo:
https://github.com/harmonydata/harmonyapi/blob/main/routers/text_router.py#L210
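One way the router could act on the model parameter is a lookup table from framework name to vectorisation function. The table would default to the HuggingFace model when the request does not specify one. This is a sketch under assumptions. The framework strings, function names and placeholder vectorisers are hypothetical and do not reflect the current contents of text_router.py. Real implementations would call HuggingFace or OpenAI. The selected function would then be passed as the vectorisation function to `match_instruments_with_function`, as in the OpenAI example further down.

```python
from typing import Callable, Dict, List, Optional

# A vectorisation function maps a list of texts to one vector per text
Vectoriser = Callable[[List[str]], List[List[float]]]

DEFAULT_FRAMEWORK = "huggingface"


def huggingface_vectoriser(texts: List[str]) -> List[List[float]]:
    # Placeholder (hypothetical): would call the HuggingFace model Harmony already uses
    raise NotImplementedError


def openai_vectoriser(texts: List[str]) -> List[List[float]]:
    # Placeholder (hypothetical): would call the OpenAI embeddings endpoint
    raise NotImplementedError


VECTORISERS: Dict[str, Vectoriser] = {
    "huggingface": huggingface_vectoriser,
    "openai": openai_vectoriser,
}


def get_vectoriser(framework: Optional[str]) -> Vectoriser:
    # Fall back to HuggingFace when the request leaves the framework empty
    key = (framework or DEFAULT_FRAMEWORK).lower()
    if key not in VECTORISERS:
        raise ValueError(f"Unsupported framework {framework!r}; expected one of {sorted(VECTORISERS)}")
    return VECTORISERS[key]
```

Rejecting unknown names with an explicit error, rather than silently using the default, makes a typo in the dropdown values visible instead of quietly returning HuggingFace matches.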
Using OpenAI or other LLMs for vectorisation
Any word vector representation can be used by Harmony. The example below works with OpenAI's text-embedding-ada-002 model as of July 2023, provided you have created a paid OpenAI account. However, since LLMs are progressing rapidly, we have chosen not to integrate Harmony directly with the OpenAI client libraries, but instead to let you pass Harmony any vectorisation function of your choice.
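```python
# Requires the pre-1.0 OpenAI Python library (e.g. pip install "openai<1.0");
# openai.Embedding was removed in openai>=1.0.
import openai
import numpy as np
from harmony import match_instruments_with_function, example_instruments
model_name = "text-embedding-ada-002"

def convert_texts_to_vector(texts):
    # Send all texts to the OpenAI embeddings endpoint in one request
    vectors = openai.Embedding.create(input=texts, model=model_name)["data"]
    # Stack the returned embeddings into an (n_texts, n_dims) array
    return np.asarray([v["embedding"] for v in vectors])

instruments = example_instruments["CES_D English"], example_instruments["GAD-7 Portuguese"]
# None: no free-text query; Harmony uses convert_texts_to_vector to embed the questions
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments_with_function(instruments, None, convert_texts_to_vector)
```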
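Because Harmony accepts any vectorisation function, the contract is simply a list of texts in and one vector per text out. The sketch below illustrates that contract without an OpenAI account. It uses a deliberately crude, hypothetical hashed bag-of-words vectoriser (not an LLM) and compares the resulting vectors with a cosine-similarity matrix, the usual way to compare embedding vectors. Whether Harmony compares vectors in exactly this way is an assumption here. The functions return plain lists, so to pass the vectoriser to Harmony you would wrap its output in `np.asarray`, as in the example above.

```python
import hashlib
import math
import re
from typing import List


def hashed_bag_of_words(texts: List[str], dims: int = 1024) -> List[List[float]]:
    """Toy vectoriser (hypothetical): hash each lower-cased word into one of `dims` buckets."""
    vectors = []
    for text in texts:
        vec = [0.0] * dims
        for word in re.findall(r"[a-z']+", text.lower()):
            # md5 gives a bucket that is stable across runs, unlike Python's hash()
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
            vec[bucket] += 1.0
        vectors.append(vec)
    return vectors


def cosine_similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity: dot(a, b) / (|a| * |b|), 0.0 for empty vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    matrix = []
    for a, na in zip(vectors, norms):
        row = []
        for b, nb in zip(vectors, norms):
            dot = sum(x * y for x, y in zip(a, b))
            row.append(dot / (na * nb) if na and nb else 0.0)
        matrix.append(row)
    return matrix


questions = [
    "Feeling nervous, anxious or on edge",
    "Feeling nervous or anxious",
    "I had trouble keeping my mind on what I was doing",
]
sim = cosine_similarity_matrix(hashed_bag_of_words(questions))
```

A vectoriser like this only measures word overlap. For example, it would rate "I feel nervous" and "I don't feel nervous" as very similar. This is the same kind of false match described in the Rationale, and it is why stronger embedding models should improve matching.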