### Relevant Links
- https://openai.com/blog/introducing-text-and-code-embeddings/
- https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
- https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
- https://platform.openai.com/docs/api-reference/completions/create

In [49]:
import openai
import json
from importlib import reload

with open("openai_api.json") as f:
    creds = json.load(f)

openai.api_version = creds["api_version"]
openai.api_base = creds["api_base"]
openai.api_type = creds["api_type"]
openai.api_key = creds["api_key"]

In [89]:
# List of models:
# [(m.id, m.status) for m in openai.Model.list()["data"]]

### Completion API

In [250]:
def get_prompt(p_context, p_question):
    p_prompt = f"""Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know".

Context:
{p_context}
Question: {p_question}
Answer:
"""
    return p_prompt

def complete(p_prompt):
    response = openai.Completion.create(
        prompt=p_prompt,
        deployment_id="text-chat-davinci-002",  # token limit: ~4000
        temperature=0.0,
        max_tokens=150,
        stop="\n\n",
    )
    # Prompt tokens + max_tokens must stay below the model's token limit.
    # return response
    return response["choices"][0]["text"]
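
The comment in `complete` notes that the prompt length plus `max_tokens` has to stay under the model's token limit. A minimal sketch of that check follows, assuming the prompt's token count has already been computed with a tokenizer. The helper names `fits_token_budget` and `max_completion_tokens` are hypothetical, and the default of 4000 simply mirrors the approximate limit mentioned in the code comment above.

```python
def fits_token_budget(prompt_token_count, max_tokens, token_limit=4000):
    """Return True if the prompt plus the requested completion fits the limit.

    token_limit=4000 is an assumption taken from the "~4000" note above,
    not a value read from the API.
    """
    return prompt_token_count + max_tokens < token_limit


def max_completion_tokens(prompt_token_count, token_limit=4000):
    """Largest max_tokens value that still satisfies the strict budget (never negative)."""
    return max(0, token_limit - prompt_token_count - 1)
```

A check like this could run before calling `complete`, so that an oversized prompt is shortened rather than sent to the API.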

In [242]:
# context taken from: https://www.microsoft.com/en-us/research/project/project-florence-vl/
context = """One of the core aspirations in artificial intelligence is to develop
algorithms that endow computers with an ability to effectively learn from
multi-modality (or multi-channel) data. This data is similar to sights and
sounds attained from vision and language that help humans make sense of the
world around us. For example, computers could mimic this ability by searching
the most similar images for a text query (or vice versa) and by describing the
content of an image using natural language.
Azure Florence-Vision and Language, short for Florence-VL, is launched to
achieve this goal, where we aim to build new foundation models for Multimodal
Intelligence. Florence-VL, as part of Project
Florence, is funded by the Microsoft AI
Cognitive Service team since 2020. Motivated by the strong demand from real
applications and recent research progresses on computer vision, natural
language processing, and vision-language understanding, we strive to advance
the state of the art on vision-language modeling and develop the best computer
vision technologies as part of our mission to empower everyone on the planet
to achieve more.
"""
question = "What is florence-vl about"
# question = "What does florence-vl stand for"

prompt = get_prompt(context, question)
print(complete(prompt))

Florence-VL, as part of Project Florence, is about building new foundation models for Multimodal Intelligence. The goal is to endow computers with an ability to effectively learn from multi-modality (or multi-channel) data, similar to sights and sounds attained from vision and language that help humans make sense of the world around us. The project is funded by the Microsoft AI Cognitive Service team since 2020 and aims to advance the state of the art on vision-language modeling and develop the best computer vision technologies as part of the mission to empower everyone on the planet to achieve more. 


### Embeddings

In [198]:
import pandas as pd
import numpy as np
from openai.embeddings_utils import cosine_similarity  # get_embedding is defined locally in a later cell

#### Text Similarity

In [96]:
# sentences = [
#     "feline friends say",
#     "canine companions say",
#     "meow",
#     "woof",
# ]
# df = pd.DataFrame()
# df["sentence"] = sentences
# df["embedding"] = df["sentence"].apply(lambda x: openai.Embedding.create(input=x, deployment_id="text-similarity-davinci-001"))
# df

#### Text Search

In [200]:
import tiktoken

encoding = tiktoken.get_encoding("gpt2")

def get_tokens(text):
    return encoding.encode(text)

tokens = get_tokens("What is florence-vl about")
for token in tokens:
    print(f"{token}:", encoding.decode([token]))

2061: What
318:  is
781:  fl
382: ore
1198: nce
12: -
19279: vl
546:  about


In [221]:
import re
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.microsoft.com/en-us/research/project/project-florence-vl/", timeout=30)
r.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
soup = BeautifulSoup(r.text, features='html.parser')
text = soup.get_text()
text = re.sub(r"[\n\t]+", "\n", text)
lines = [line for line in text.split("\n") if len(line) > 30]  # small lines don't have much information
text = "\n".join(lines)

filepath = "data/project-florence-vl.txt"
with open(filepath, "w") as f:
    f.write(text)

In [232]:
def get_chunks(text):
    SIZE = 4000  # chunk size in characters, not tokens
    chunks = []
    for i in range(0, len(text), SIZE):  # naive fixed-width chunking; may split words
        chunk = text[i:i+SIZE]
        chunks.append(chunk)
    return chunks

with open(filepath) as f:
    text = f.read()
chunks = get_chunks(text)
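
Fixed-width slicing as in `get_chunks` can cut a word or sentence in half (one of the chunks printed later starts with `he common interface`). One possible alternative, sketched here under the assumption that splitting on whitespace is acceptable, is to build each chunk from whole words and repeat a few words between neighbouring chunks. The function `get_word_chunks` and its parameters `max_chars` and `overlap_words` are hypothetical and not part of the notebook.

```python
def get_word_chunks(text, max_chars=4000, overlap_words=20):
    """Split text into chunks made of whole words.

    Each chunk is at most max_chars long (unless a single word is longer),
    and the last overlap_words words of a chunk are repeated at the start
    of the next one. Note: text.split() collapses newlines and repeated spaces.
    """
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start
        length = 0
        # Grow the chunk word by word until the next word would exceed max_chars.
        while end < len(words):
            extra = len(words[end]) + (1 if end > start else 0)
            if end > start and length + extra > max_chars:
                break
            length += extra
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        # Step back by overlap_words, but always advance by at least one word.
        start = max(end - overlap_words, start + 1)
    return chunks
```

The overlap trades some duplicated tokens for a better chance that a sentence relevant to the question appears intact in at least one chunk.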

In [233]:
df = pd.DataFrame.from_dict({
    "file": [filepath] * len(chunks),
    "chunk_text": chunks,
})
df["chunk_token_count"] = df["chunk_text"].apply(lambda x: len(get_tokens(x)))
df

index,file,chunk_text,chunk_token_count
0,data/project-florence-vl.txt,Project Florence-VL - Microsoft Research\n ...,847
1,data/project-florence-vl.txt,on model trained on 800M image-text pairs. GIT...,879
2,data/project-florence-vl.txt,he common interface for all pre-training and d...,825
3,data/project-florence-vl.txt,that the more and more accurate pseudo labels ...,288


In [234]:
def get_embedding(text, model):
    response = openai.Embedding.create(input=text, deployment_id=model)
    return response["data"][0]["embedding"]

df["embedding"] = df["chunk_text"].apply(lambda x: get_embedding(x, "text-search-davinci-doc-001"))  # token limit: ~2000
df

index,file,chunk_text,chunk_token_count,embedding
0,data/project-florence-vl.txt,Project Florence-VL - Microsoft Research\n ...,847,"[0.0013549693394452333, 0.008274282328784466, ..."
1,data/project-florence-vl.txt,on model trained on 800M image-text pairs. GIT...,879,"[-0.002092509064823389, 0.0050740959122776985,..."
2,data/project-florence-vl.txt,he common interface for all pre-training and d...,825,"[-0.010482553392648697, 0.006804130505770445, ..."
3,data/project-florence-vl.txt,that the more and more accurate pseudo labels ...,288,"[-0.004963461309671402, 0.007704003248363733, ..."


In [261]:
def get_qa_df(p_df, p_question):
    question_embedding = get_embedding(p_question, "text-search-davinci-query-001")  # token limit: ~2000
    qa_df = p_df.copy()
    # Use the DataFrame passed in, not the global df
    qa_df["similarity"] = qa_df["embedding"].apply(lambda x: cosine_similarity(x, question_embedding))
    return qa_df

# question = "what are multi channel videos"
# question = "what is the full form of GLIP and what is it about"
question = "what is the difference between project florence and project florence-vl"
qa_df = get_qa_df(df, question)
qa_df

index,file,chunk_text,chunk_token_count,embedding,similarity
0,data/project-florence-vl.txt,Project Florence-VL - Microsoft Research\n ...,847,"[0.0013549693394452333, 0.008274282328784466, ...",0.364001
1,data/project-florence-vl.txt,on model trained on 800M image-text pairs. GIT...,879,"[-0.002092509064823389, 0.0050740959122776985,...",0.233602
2,data/project-florence-vl.txt,he common interface for all pre-training and d...,825,"[-0.010482553392648697, 0.006804130505770445, ...",0.244051
3,data/project-florence-vl.txt,that the more and more accurate pseudo labels ...,288,"[-0.004963461309671402, 0.007704003248363733, ...",0.233082
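
The `similarity` column holds the cosine similarity between each chunk embedding and the question embedding: the dot product of the two vectors divided by the product of their lengths. As a rough illustration of what that score measures, here is a standard-library sketch. The function `cosine` is a hypothetical stand-in rather than the `openai.embeddings_utils` implementation, and the vectors are toy values, not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have many more dimensions.
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # parallel to query, so similarity is 1
orthogonal = [0.0, 3.0, 0.0]      # shares no direction with query, so similarity is 0
```

Because the score depends only on direction and not on vector length, a higher value means the chunk and the question point the same way in embedding space.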


In [263]:
def get_answer(context, question):
    prompt = get_prompt(context, question)
    return complete(prompt)

answers = list(qa_df["chunk_text"].apply(lambda x: get_answer(x, question)))
for i, answer in enumerate(answers):
    print(f"Answer{i}:", answer)

Answer0: Project Florence is the overarching project that includes Florence-VL as a subproject. Florence-VL is focused on building new foundation models for Multimodal Intelligence, while the goals of the larger Project Florence are not specified in the text.
Answer1: I don't know. 
Answer2: I don't know.
Answer3: I don't know.
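
In the run above, only the chunk with the highest similarity (0.364001) produced an answer; the other three returned "I don't know". One way to avoid calling the completion API on every chunk is to query only the top-ranked ones. The sketch below uses plain Python lists. `top_k_chunks` is a hypothetical helper, the chunk texts are placeholders, and the similarity values are copied from the table above.

```python
def top_k_chunks(chunks, similarities, k=1):
    """Return the k chunks with the highest similarity, best first."""
    ranked = sorted(zip(similarities, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]


# Similarity values from the table above; chunk texts replaced by placeholders.
similarities = [0.364001, 0.233602, 0.244051, 0.233082]
chunks = ["chunk 0", "chunk 1", "chunk 2", "chunk 3"]

best = top_k_chunks(chunks, similarities, k=2)
```

With the notebook's DataFrame, a comparable selection could be written as `qa_df.nlargest(1, "similarity")` before calling `get_answer` on the selected rows.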
