Add code to the API repo only, which will call Harmony with a cloud-based third-party LLM such as OpenAI or Google Vertex (#4)

## Description

Depends on FE issue https://github.com/harmonydata/app/issues/14

If you look at the API call
https://api.harmonydata.ac.uk/docs#/Text/match_text_match_post
there is already a parameter in the API to say which model we use:


![image](https://github.com/harmonydata/harmonyapi/assets/15210965/381690ba-a4f5-4e12-9d63-afcce8cd8937)

but we don't use it.
I would like the Harmony API repo to allow you to send "openai" and "gpt-4" (or whatever) so that it uses one of those.
Once the API has this implemented, we can add it to the front end as a dropdown, like in this mockup:
![image](https://github.com/harmonydata/harmonyapi/assets/15210965/4a71707c-6727-40c0-8a71-89c3710542fe)

So the dropdown should default to the HuggingFace model which Harmony is already using,
but if the user changes it, then we use the third-party LLMs.
I don't think we have enough users for this to be expensive, but if that changes, we can start asking users to paste in their own OpenAI API key.
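
A rough sketch of the selection behaviour described above, assuming the API receives a framework and a model name in the request. Every name here (`select_vectoriser`, `VECTORISERS`, the placeholder functions and the `"default"` key) is hypothetical and does not exist in the repo; the placeholders return dummy vectors where real code would call the HuggingFace model or the OpenAI API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a vectorisation function maps texts to one vector per text.
Vectoriser = Callable[[List[str]], List[List[float]]]

# Assumed key standing in for the HuggingFace model Harmony already uses.
DEFAULT_MODEL: Tuple[str, str] = ("huggingface", "default")


def huggingface_vectoriser(texts: List[str]) -> List[List[float]]:
    # Placeholder: real code would call the existing HuggingFace model.
    return [[float(len(text))] for text in texts]


def openai_vectoriser(texts: List[str]) -> List[List[float]]:
    # Placeholder: real code would call the OpenAI embeddings API.
    return [[float(len(text)), 1.0] for text in texts]


# Registry keyed by (framework, model); the keys are illustrative only.
VECTORISERS: Dict[Tuple[str, str], Vectoriser] = {
    DEFAULT_MODEL: huggingface_vectoriser,
    ("openai", "text-embedding-ada-002"): openai_vectoriser,
}


def select_vectoriser(
    framework: Optional[str] = None, model: Optional[str] = None
) -> Vectoriser:
    """Return the vectoriser for the requested model, or the default if none is given."""
    if not framework and not model:
        return VECTORISERS[DEFAULT_MODEL]
    key = (framework, model)
    if key not in VECTORISERS:
        raise ValueError(f"Unsupported model: {framework}/{model}")
    return VECTORISERS[key]
```

Rejecting unknown combinations with an error, rather than silently falling back to the default, is one possible design choice; it would let the front end tell the user that the selected model is not available.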

## Rationale

The problem is that psychologists are using the tool but are sometimes unhappy with the matching, e.g. sometimes it thinks words are similar when they are not.
This is because the LLM I am running is an open-source one from HuggingFace. Google's and OpenAI's LLMs are better, but they are third party and need to be called via an API.

I have a writeup of how the other LLMs perform on a test dataset here: https://github.com/harmonydata/matching/blob/main/analyse_results.ipynb

For example, Vertex AI Gecko and OpenAI Ada 2 and Ada 3 outperform current Harmony on datasets such as GAD-7:

![image](https://github.com/harmonydata/harmonyapi/assets/15210965/4f44a0de-e786-4f9b-89ef-0fa2507bd712)

![image](https://github.com/harmonydata/harmonyapi/assets/15210965/e2493bf9-aebf-4423-8862-c68ebd9c4662)

## What code needs to be edited?

I think it will be in `text_router.py` in the API repo:

https://github.com/harmonydata/harmonyapi/blob/main/routers/text_router.py#L210

## Using OpenAI or other LLMs for vectorisation

Any word vector representation can be used by Harmony. The example below works for OpenAI's [text-embedding-ada-002](https://openai.com/blog/new-and-improved-embedding-model) model as of July 2023, provided you have created a paid OpenAI account. However, since LLMs are progressing rapidly, we have chosen not to integrate Harmony directly into the OpenAI client libraries, but instead allow you to pass Harmony any vectorisation function of your choice.

```
import numpy as np
import openai  # uses the pre-1.0 openai package interface (openai.Embedding)
from harmony import match_instruments_with_function, example_instruments

model_name = "text-embedding-ada-002"


def convert_texts_to_vector(texts):
    # Request one embedding per input text and stack them into a 2D array.
    vectors = openai.Embedding.create(input=texts, model=model_name)["data"]
    return np.asarray([vector["embedding"] for vector in vectors])


instruments = example_instruments["CES_D English"], example_instruments["GAD-7 Portuguese"]
all_questions, similarity, query_similarity, new_vectors_dict = match_instruments_with_function(
    instruments, None, convert_texts_to_vector
)
```
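
Judging from the example above, a vectorisation function takes a list of texts and returns a 2D array with one row per text. The following is a minimal offline stand-in with that shape, which might be handy for testing the plumbing without an OpenAI account. It is a toy hashed bag-of-words, not a real LLM, and `toy_texts_to_vector` and `VECTOR_SIZE` are made-up names for illustration.

```python
import hashlib

import numpy as np

VECTOR_SIZE = 16  # arbitrary size chosen for this toy example


def toy_texts_to_vector(texts):
    """Hash each word into a fixed-size bag-of-words vector (not a real LLM)."""
    vectors = np.zeros((len(texts), VECTOR_SIZE))
    for row, text in enumerate(texts):
        for word in text.lower().split():
            # Use the first byte of the word's MD5 digest to pick a column.
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vectors[row, digest[0] % VECTOR_SIZE] += 1.0
    return vectors


vectors = toy_texts_to_vector(["Feeling nervous", "feeling nervous", "Trouble relaxing"])
print(vectors.shape)  # (3, 16)
```

Assuming Harmony only needs the returned array, `toy_texts_to_vector` could be passed to `match_instruments_with_function` in place of `convert_texts_to_vector`, though the resulting matches would not be meaningful.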
