# Batch Q&A runs using WikipediaQA
This notebook performs [document Q&A](https://python.langchain.com/en/latest/use_cases/question_answering.html) on Wikipedia articles. It uses [Wikipedia-API](https://pypi.org/project/Wikipedia-API/) to search, retrieve, and beautify Wikipedia articles, [LangChain](https://python.langchain.com/en/latest/index.html) for the Q&A framework, and OpenAI & HuggingFace models for embeddings and LLMs.

For more details, see the accompanying [blog post](https://georgesung.github.io/ai/llm-qa-eval-wikipedia/) and [GitHub repo](https://github.com/georgesung/LLM-WikipediaQA).

## Compute requirements
If you're only running OpenAI models for both embeddings and LLM, any CPU instance will work.

If you're running the open source embeddings and/or LLM models, make sure you have enough hardware resources. For Colab instances, choose the A100 GPU with high RAM. Otherwise, figure out what instance will work for you, e.g. [g5.12xlarge](https://aws.amazon.com/ec2/instance-types/g5/) (4x A10Gs), or an instance with an A100.
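Before picking models, it can help to confirm what GPU, if any, the instance exposes. The snippet below is a minimal sketch using only the standard library; the helper name `list_nvidia_gpus` is hypothetical, and it assumes `nvidia-smi` is on the `PATH` whenever an NVIDIA driver is installed.

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list:
    """Return one 'name, total memory' string per visible NVIDIA GPU, or [] if none."""
    # No nvidia-smi binary usually means no NVIDIA driver, i.e. a CPU-only instance
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver present but not usable; treat it the same as having no GPU
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
if gpus:
    print("GPUs found:", gpus)
else:
    print("No NVIDIA GPU found; stick to the OpenAI embeddings and LLM")
```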

In [2]:
!nvidia-smi

Mon May  8 12:00:33 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    40W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Batch run config

If you're running OpenAI models (ada embeddings and/or GPT 3.5), put your API key in this line:
```
os.environ["OPENAI_API_KEY"] = "hello"
```

To choose which embeddings and LLMs to run, trim the following lists down to the models you want:
```
llms = [LLM_OPENAI_GPT35, LLM_FASTCHAT_T5_XL, LLM_FLAN_T5_XL, LLM_FLAN_T5_XXL]

embs = [EMB_INSTRUCTOR_XL, EMB_OPENAI_ADA]
```

Each sweep varies one component and holds the other constant: the embedding sweep uses a fixed LLM, and the LLM sweep uses a fixed embedding. For example, in the [blog post](https://georgesung.github.io/ai/llm-qa-eval-wikipedia/):
* When sweeping over `embs = [EMB_INSTRUCTOR_XL, EMB_OPENAI_ADA]`, the LLM is fixed to `LLM_OPENAI_GPT35`
* When sweeping over `llms = [LLM_OPENAI_GPT35, LLM_FASTCHAT_T5_XL, LLM_FLAN_T5_XL, LLM_FLAN_T5_XXL]`, the embedding is fixed to `EMB_INSTRUCTOR_XL`
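The two sweeps described above can be sketched as a pair of config lists. This is an illustration only: `build_sweep_configs` is a hypothetical helper, not part of the notebook, and the model names are written as plain strings so the block runs on its own. The config keys mirror the ones the notebook passes to `WikipediaQA`.

```python
def build_sweep_configs(llms, embs, fixed_llm, fixed_emb, load_in_8bit=False):
    """Return (llm_sweep, emb_sweep): one config dict per run in each sweep."""
    # LLM sweep: vary the LLM, hold the embedding constant
    llm_sweep = [
        {"embedding": fixed_emb, "llm": llm, "question_check": True, "load_in_8bit": load_in_8bit}
        for llm in llms
    ]
    # Embedding sweep: vary the embedding, hold the LLM constant
    emb_sweep = [
        {"embedding": emb, "llm": fixed_llm, "question_check": True, "load_in_8bit": load_in_8bit}
        for emb in embs
    ]
    return llm_sweep, emb_sweep


llm_sweep, emb_sweep = build_sweep_configs(
    llms=["gpt-3.5-turbo", "lmsys/fastchat-t5-3b-v1.0", "google/flan-t5-xl", "google/flan-t5-xxl"],
    embs=["hkunlp/instructor-xl", "text-embedding-ada-002"],
    fixed_llm="gpt-3.5-turbo",
    fixed_emb="hkunlp/instructor-xl",
)
for config in llm_sweep + emb_sweep:
    print(config["embedding"], "+", config["llm"])
```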


In [3]:
# Optional: OpenAI API key
import os
os.environ["OPENAI_API_KEY"] = "hello"

In [4]:
# Constants
EMB_OPENAI_ADA = "text-embedding-ada-002"
EMB_INSTRUCTOR_XL = "hkunlp/instructor-xl"

LLM_OPENAI_GPT35 = "gpt-3.5-turbo"
LLM_FLAN_T5_XXL = "google/flan-t5-xxl"
LLM_FLAN_T5_XL = "google/flan-t5-xl"
LLM_FASTCHAT_T5_XL = "lmsys/fastchat-t5-3b-v1.0"

In [5]:
# LLMs/embeddings over which to sweep
llms = [LLM_FASTCHAT_T5_XL, LLM_FLAN_T5_XL]  # full list: [LLM_OPENAI_GPT35, LLM_FASTCHAT_T5_XL, LLM_FLAN_T5_XL, LLM_FLAN_T5_XXL]
embs = [EMB_INSTRUCTOR_XL,]  # full list: [EMB_OPENAI_ADA, EMB_INSTRUCTOR_XL]

# Embeddings/LLMs to fix during sweep of the above
fixed_emb = EMB_INSTRUCTOR_XL
fixed_llm = LLM_FASTCHAT_T5_XL  # LLM_OPENAI_GPT35 was used in blog post

## Begin batch run execution

In [6]:
!pip install transformers langchain
!pip install accelerate bitsandbytes
!pip install chromadb beautifulsoup4 openai
!pip install tiktoken
!pip install sentence_transformers
!pip install wikipedia-api
!pip install InstructorEmbedding

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting langchain
  Downloading langchain-0.0.161-py3-none-any.whl (758 kB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Col

In [7]:
import os
import re
import time

import requests
import wikipediaapi
from InstructorEmbedding import INSTRUCTOR
from langchain import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.docstore.document import Document
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter
from langchain.vectorstores import Chroma
from transformers import pipeline



In [8]:
class WikipediaQA:
    question_check_template = """Given the following pieces of context, determine if the question is able to be answered by the information in the context.
Respond with 'yes' or 'no'.
{context}
Question: {question}
"""
    QUESTION_CHECK_PROMPT = PromptTemplate(
        template=question_check_template, input_variables=["context", "question"]
    )
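    def __init__(self, config: dict = None):
        # Use None as the default so instances never share one mutable dict
        self.config = config if config is not None else {}
        self.embedding = None
        self.vectordb = None
        self.llm = None
        self.qa = None
        # Self-check chain, created in search_and_read_page() for local LLMs
        self.q_check = None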
    
    # The following class methods are useful to create global GPU model instances
    # This way we don't need to reload models in an interactive app,
    # and the same model instance can be used across multiple user sessions
    @classmethod
    def create_instructor_xl(cls):
        return HuggingFaceInstructEmbeddings(model_name=EMB_INSTRUCTOR_XL, model_kwargs={"device": "cuda"})
    
    @classmethod
    def create_flan_t5_xxl(cls, load_in_8bit=False):
        # Local flan-t5-xxl, optionally with 8-bit quantization for inference
        # Wrap it in HF pipeline for use with LangChain
        return pipeline(
            task="text2text-generation",
            model="google/flan-t5-xxl",
            model_kwargs={"device_map": "auto", "load_in_8bit": load_in_8bit, "max_length": 512, "temperature": 0.}
        )
    
    @classmethod
    def create_flan_t5_xl(cls, load_in_8bit=False):
        return pipeline(
            task="text2text-generation",
            model="google/flan-t5-xl",
            model_kwargs={"device_map": "auto", "load_in_8bit": load_in_8bit, "max_length": 512, "temperature": 0.}
        )
    
    @classmethod
    def create_fastchat_t5_xl(cls, load_in_8bit=False):
        return pipeline(
            task="text2text-generation",
            model = "lmsys/fastchat-t5-3b-v1.0",
            model_kwargs={"device_map": "auto", "load_in_8bit": load_in_8bit, "max_length": 512, "temperature": 0.}
        )

    def init_models(self) -> None:
        """ Initialize new models based on config """
        load_in_8bit = self.config["load_in_8bit"]
        
        if self.config["embedding"] == EMB_OPENAI_ADA:
            # OpenAI ada embeddings API
            self.embedding = OpenAIEmbeddings()
        elif self.config["embedding"] == EMB_INSTRUCTOR_XL:
            # Local INSTRUCTOR-XL embeddings
            if self.embedding is None:
                self.embedding = WikipediaQA.create_instructor_xl()
        else:
            raise ValueError("Invalid config")

        if self.config["llm"] == LLM_OPENAI_GPT35:
            # OpenAI GPT 3.5 API
            pass
        elif self.config["llm"] == LLM_FLAN_T5_XL:
            if self.llm is None:
                self.llm = WikipediaQA.create_flan_t5_xl(load_in_8bit=load_in_8bit)
        elif self.config["llm"] == LLM_FLAN_T5_XXL:
            if self.llm is None:
                self.llm = WikipediaQA.create_flan_t5_xxl(load_in_8bit=load_in_8bit)
        elif self.config["llm"] == LLM_FASTCHAT_T5_XL:
            if self.llm is None:
                self.llm = WikipediaQA.create_fastchat_t5_xl(load_in_8bit=load_in_8bit)
        else:
            raise ValueError("Invalid config")

    def search_and_read_page(self, search_query: str) -> tuple[str, str]:
        """
        Searches wikipedia for the given query, take the first result
        Then chunks the text of it and indexes it into a vector store

        Returns the title and text of the page
        """
        # Search Wikipedia and get first result
        wiki_wiki = wikipediaapi.Wikipedia('en')
        docs = {}
        # Pass the query via params so that spaces and special characters are URL-encoded
        search_response = requests.get(
            "https://en.wikipedia.org/w/api.php",
            params={"action": "query", "format": "json", "list": "search", "srsearch": search_query},
        ).json()
        wiki_title = search_response["query"]["search"][0]["title"]
        wiki_text = wiki_wiki.page(wiki_title).text
        docs[wiki_title] = wiki_text

        # Create new vector store and index it
        self.vectordb = None
        documents = [Document(page_content=docs[title]) for title in docs]

        # Split by section, then split by token limit
        text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
        texts = text_splitter.split_documents(documents)
        text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=10, encoding_name="cl100k_base")  # may be inexact
        texts = text_splitter.split_documents(texts)

        self.vectordb = Chroma.from_documents(documents=texts, embedding=self.embedding)

        # Create the LangChain chain
        if self.config["llm"] == LLM_OPENAI_GPT35:
            # Use ChatGPT API
            self.qa = RetrievalQA.from_chain_type(llm=OpenAI(model_name=LLM_OPENAI_GPT35, temperature=0.), chain_type="stuff",\
                                        retriever=self.vectordb.as_retriever(search_kwargs={"k":4}))
        else:
            # Use local LLM
            hf_llm = HuggingFacePipeline(pipeline=self.llm)
            self.qa = RetrievalQA.from_chain_type(llm=hf_llm, chain_type="stuff",\
                                        retriever=self.vectordb.as_retriever(search_kwargs={"k":4}))
            if self.config["question_check"]:
                self.q_check = RetrievalQA.from_chain_type(llm=hf_llm, chain_type="stuff",\
                             retriever=self.vectordb.as_retriever(search_kwargs={"k":4}))
                self.q_check.combine_documents_chain.llm_chain.prompt = WikipediaQA.QUESTION_CHECK_PROMPT

        return wiki_title, wiki_text

    def get_answer(self, question: str) -> str:
        if self.config["llm"] != LLM_OPENAI_GPT35 and self.config["question_check"]:
            # For local LLMs, do a self-check to see if question can be answered
            # If unanswerable, respond with "I don't know"
            answerable = self.q_check.run(question)
            if self.config["llm"] == LLM_FASTCHAT_T5_XL:
                answerable = self._clean_fastchat_t5_output(answerable)
            if answerable != "yes":
                return "I don't know"
        
        # Answer the question
        answer = self.qa.run(question)
        if self.config["llm"] == LLM_FASTCHAT_T5_XL:
            answer = self._clean_fastchat_t5_output(answer)
        return answer
    
    def _clean_fastchat_t5_output(self, answer: str) -> str:
        # Remove <pad> tags, double spaces, trailing newline
        answer = re.sub(r"<pad>\s+", "", answer)
        answer = re.sub(r"  ", " ", answer)
        answer = re.sub(r"\n$", "", answer)
        return answer
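The self-check flow in `get_answer` can be tried without loading any model. The sketch below mirrors the same logic with plain functions: `answer_with_self_check`, `fake_check`, and `fake_answer` are hypothetical stand-ins for the two `RetrievalQA` chains, and the raw strings are assumed to resemble FastChat-T5 output (a leading `<pad>` tag and a trailing newline). The cleaning regexes are copied from `_clean_fastchat_t5_output`.

```python
import re


def clean_fastchat_t5_output(text: str) -> str:
    """Same cleanup as the class method: drop <pad> tags, double spaces, trailing newline."""
    text = re.sub(r"<pad>\s+", "", text)
    text = re.sub(r"  ", " ", text)
    return re.sub(r"\n$", "", text)


def answer_with_self_check(question, check_fn, answer_fn, clean_fn=lambda s: s) -> str:
    """Ask the check chain first; only run the answer chain if it replies exactly 'yes'."""
    answerable = clean_fn(check_fn(question))
    if answerable != "yes":
        return "I don't know"
    return clean_fn(answer_fn(question))


# Hypothetical stand-ins for self.q_check.run and self.qa.run
def fake_check(question: str) -> str:
    return "<pad> yes\n" if "GPT-4" in question else "<pad> no\n"


def fake_answer(question: str) -> str:
    return "<pad> OpenAI.\n"


print(answer_with_self_check("Who created GPT-4?", fake_check, fake_answer, clean_fastchat_t5_output))
print(answer_with_self_check("Who played in the finals?", fake_check, fake_answer, clean_fastchat_t5_output))
```

Note that the comparison with `"yes"` is exact, so a check response such as `"Yes"` or `"yes."` is treated as unanswerable.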

In [9]:
# Questions to ask from different Wikipedia articles
article_questions = {
    "GPT-4": [
        "Who created GPT-4?",
        "How is GPT-4 better than its predecessors?",
        "Who played in the finals?",  # should say "I don't know"
    ],
    "2022 FIFA World Cup": [
        "Where was the event held? Name the city and country",
        "Who won the tournament?",
        "Who played in the finals?",
        "Who had the broadcasting rights?",
        "Which two teams qualified for the knock-out round from Group D?",
        "How is GPT-4 better than its predecessors?",  # should say "I don't know"
    ],
    "Diablo IV": [
        "What classes are available?",
        "When is the release date?",
        "How is the vanishing gradient issue addressed?",  # should say "I don't know"
    ],
    "Stable Diffusion": [
        "How does Stable Diffusion work?",
        "Who created Stable Diffusion?",
        "Describe the model architecture",
        "Who played in the finals?",  # should say "I don't know"
    ],
}

## Evaluate LLMs

Select the LLM(s) to run below. If running LLMs locally using Colab, try the high RAM option with a premium GPU. Otherwise, you may not have enough hardware resources.

If you run out of GPU memory, try setting `"load_in_8bit": True` below. This is useful if you're running the FLAN_T5_XXL (11B) model.
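As a rough guide to why 8-bit loading helps, the weights alone take about (number of parameters) x (bytes per parameter). The sketch below applies that arithmetic to the 11B parameter count quoted above; it ignores activations, the KV cache, and any quantization overhead, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


FLAN_T5_XXL_PARAMS = 11e9  # parameter count quoted in the text above

for dtype_name, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    gb = weight_memory_gb(FLAN_T5_XXL_PARAMS, bytes_per_param)
    print(f"{dtype_name}: ~{gb:.0f} GB")
```

Under these assumptions the weights need roughly 44 GB at 32-bit precision, 22 GB at 16-bit, and 11 GB at 8-bit, which is why the 8-bit option matters on a single 40 GB A100.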

If running CPU only, make sure to select the OpenAI ada embeddings along with GPT 3.5.

In [10]:
t0 = time.time()
answers = {}

for llm in llms:
    config = {"embedding": fixed_emb, "llm": llm, "question_check": True, "load_in_8bit": False}
    qa = WikipediaQA(config)
    qa.init_models()
    answers[llm] = {}
    
    for article in article_questions:
        qa.search_and_read_page(article)
        answers[llm][article] = {}
        
        for question in article_questions[article]:
            print(f"Getting answer using {llm} for {article} - {question}")
            answer = qa.get_answer(question)
            answers[llm][article][question] = answer
            
print(f"Time taken: {int(time.time() - t0)} sec")


load INSTRUCTOR_Transformer
max_seq_length  512





Getting answer using lmsys/fastchat-t5-3b-v1.0 for GPT-4 - Who created GPT-4?




Getting answer using lmsys/fastchat-t5-3b-v1.0 for GPT-4 - How is GPT-4 better than its predecessors?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for GPT-4 - Who played in the finals?


Token indices sequence length is longer than the specified maximum sequence length for this model (2153 > 2048). Running this sequence through the model will result in indexing errors


Getting answer using lmsys/fastchat-t5-3b-v1.0 for 2022 FIFA World Cup - Where was the event held? Name the city and country
Getting answer using lmsys/fastchat-t5-3b-v1.0 for 2022 FIFA World Cup - Who won the tournament?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for 2022 FIFA World Cup - Who played in the finals?




Getting answer using lmsys/fastchat-t5-3b-v1.0 for 2022 FIFA World Cup - Who had the broadcasting rights?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for 2022 FIFA World Cup - Which two teams qualified for the knock-out round from Group D?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for 2022 FIFA World Cup - How is GPT-4 better than its predecessors?




Getting answer using lmsys/fastchat-t5-3b-v1.0 for Diablo IV - What classes are available?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for Diablo IV - When is the release date?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for Diablo IV - How is the vanishing gradient issue addressed?




Getting answer using lmsys/fastchat-t5-3b-v1.0 for Stable Diffusion - How does Stable Diffusion work?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for Stable Diffusion - Who created Stable Diffusion?
Getting answer using lmsys/fastchat-t5-3b-v1.0 for Stable Diffusion - Describe the model architecture
Getting answer using lmsys/fastchat-t5-3b-v1.0 for Stable Diffusion - Who played in the finals?
load INSTRUCTOR_Transformer
max_seq_length  512



Token indices sequence length is longer than the specified maximum sequence length for this model (674 > 512). Running this sequence through the model will result in indexing errors


Getting answer using google/flan-t5-xl for GPT-4 - Who created GPT-4?
Getting answer using google/flan-t5-xl for GPT-4 - How is GPT-4 better than its predecessors?
Getting answer using google/flan-t5-xl for GPT-4 - Who played in the finals?




Getting answer using google/flan-t5-xl for 2022 FIFA World Cup - Where was the event held? Name the city and country
Getting answer using google/flan-t5-xl for 2022 FIFA World Cup - Who won the tournament?
Getting answer using google/flan-t5-xl for 2022 FIFA World Cup - Who played in the finals?
Getting answer using google/flan-t5-xl for 2022 FIFA World Cup - Who had the broadcasting rights?
Getting answer using google/flan-t5-xl for 2022 FIFA World Cup - Which two teams qualified for the knock-out round from Group D?
Getting answer using google/flan-t5-xl for 2022 FIFA World Cup - How is GPT-4 better than its predecessors?




Getting answer using google/flan-t5-xl for Diablo IV - What classes are available?
Getting answer using google/flan-t5-xl for Diablo IV - When is the release date?
Getting answer using google/flan-t5-xl for Diablo IV - How is the vanishing gradient issue addressed?




Getting answer using google/flan-t5-xl for Stable Diffusion - How does Stable Diffusion work?
Getting answer using google/flan-t5-xl for Stable Diffusion - Who created Stable Diffusion?
Getting answer using google/flan-t5-xl for Stable Diffusion - Describe the model architecture
Getting answer using google/flan-t5-xl for Stable Diffusion - Who played in the finals?
Time taken: 815 sec


In [11]:
# Print answers in markdown table format

# Header
md_str = "|Article|Question|"
for llm in llms:
    md_str += f"{llm}|"
md_str += "\n|"
for _ in range(len(llms)+2):
    md_str += "--|"
md_str += "\n"

# Content: one table row per (article, question) pair
for article in article_questions:
    for question in article_questions[article]:
        # Start every row with a pipe so all rows share the same format
        md_str += f"|{article}|{question}|"
        for llm in llms:
            # Update formatting to work with markdown
            answer = answers[llm][article][question]
            answer = answer.replace("\n", "<br>")
            md_str += f"{answer}|"
        md_str += "\n"
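Since the same table-building logic is used for both the LLM sweep and the embedding sweep, it could be factored into one function. The following is a sketch under that assumption; `answers_to_markdown` is a hypothetical helper, and the pipe escaping is an addition not present in the notebook, included because a literal `|` inside an answer would otherwise split the cell.

```python
def answers_to_markdown(answers, article_questions, models):
    """Build a markdown table with one row per (article, question) and one column per model."""
    lines = ["|" + "|".join(["Article", "Question", *models]) + "|"]
    lines.append("|" + "--|" * (len(models) + 2))
    for article, questions in article_questions.items():
        for question in questions:
            cells = [article, question]
            for model in models:
                answer = answers[model][article][question]
                # Keep each answer on one table row and escape pipes inside it
                cells.append(answer.replace("\n", "<br>").replace("|", "\\|"))
            lines.append("|" + "|".join(cells) + "|")
    return "\n".join(lines) + "\n"


# Tiny made-up input, just to show the shape of the output
demo_questions = {"GPT-4": ["Who created GPT-4?"]}
demo_answers = {"model-a": {"GPT-4": {"Who created GPT-4?": "OpenAI"}}}
print(answers_to_markdown(demo_answers, demo_questions, ["model-a"]))
```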

In [12]:
# Render markdown
from IPython.display import display, Markdown
display(Markdown(md_str))

|Article|Question|lmsys/fastchat-t5-3b-v1.0|google/flan-t5-xl|
|--|--|--|--|
|GPT-4|Who created GPT-4?|OpenAI.|OpenAI|
|GPT-4|How is GPT-4 better than its predecessors?|GPT-4 is better than its predecessors because it can take images as well as text as input.|GPT-4 can take images as well as text as input|
|GPT-4|Who played in the finals?|I don't know|I don't know|
|2022 FIFA World Cup|Where was the event held? Name the city and country|The event was held in Doha, Qatar.|Qatar|
|2022 FIFA World Cup|Who won the tournament?|Argentina won the tournament.|Argentina|
|2022 FIFA World Cup|Who played in the finals?|Argentina and France.|Argentina and France|
|2022 FIFA World Cup|Who had the broadcasting rights?|The broadcasting rights for the 2022 FIFA World Cup were held by Bell Media (Canada), Fox (U.S. English), and NBCUniversal (U.S. Spanish).|Fox Sports|
|2022 FIFA World Cup|Which two teams qualified for the knock-out round from Group D?|France and Australia.|I don't know|
|2022 FIFA World Cup|How is GPT-4 better than its predecessors?|I don't know|I don't know|
|Diablo IV|What classes are available?|Barbarian, Sorceress, Druid, Rogue, and Necromancer.|Barbarian, Sorceress, Druid, Rogue, and Necromancer|
|Diablo IV|When is the release date?|June 6, 2023.|June 6, 2023|
|Diablo IV|How is the vanishing gradient issue addressed?|I don't know|I don't know|
|Stable Diffusion|How does Stable Diffusion work?|Stable Diffusion works by using a latent diffusion model, a kind of deep generative neural network, to generate detailed images conditioned on text descriptions. The model can generate new images from scratch through the use of a text prompt describing elements to be included or omitted from the output, or existing images can be re-drawn by the model to incorporate new elements described by a text prompt through its diffusion-denoising mechanism. It also allows the use of prompts to partially alter existing images via inpainting and outpainting.|It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt|
|Stable Diffusion|Who created Stable Diffusion?|The start-up company Stability AI in collaboration with a number of academic researchers and non-profit organizations.|Stability AI|
|Stable Diffusion|Describe the model architecture|Stable Diffusion uses a kind of diffusion model (DM), called a latent diffusion model (LDM) developed by the CompVis group at LMU Munich. Introduced in 2015, diffusion models are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a smaller dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output from forward diffusion backwards to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs.|Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder|
|Stable Diffusion|Who played in the finals?|I don't know|I don't know|
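Several questions are deliberately unanswerable from their article (the ones commented `# should say "I don't know"`), so one simple way to read a results table like the one above is to count where each model abstained correctly and where it abstained when it should not have. The sketch below is an illustration under that framing, not part of the original evaluation; `abstention_report` is a hypothetical helper, and the sample answers are the `google/flan-t5-xl` responses to three of the 2022 FIFA World Cup questions.

```python
IDK = "I don't know"


def abstention_report(model_answers, unanswerable):
    """Compare 'I don't know' responses against the set of (article, question) pairs
    that are expected to be unanswerable."""
    report = {"correct_abstentions": 0, "missed_abstentions": [], "unneeded_abstentions": []}
    for article, qa_pairs in model_answers.items():
        for question, answer in qa_pairs.items():
            abstained = answer.strip() == IDK
            should_abstain = (article, question) in unanswerable
            if abstained and should_abstain:
                report["correct_abstentions"] += 1
            elif should_abstain:
                # The model answered a question the article cannot answer
                report["missed_abstentions"].append((article, question))
            elif abstained:
                # The model gave up on a question the article can answer
                report["unneeded_abstentions"].append((article, question))
    return report


unanswerable = {("2022 FIFA World Cup", "How is GPT-4 better than its predecessors?")}
flan_t5_xl_answers = {
    "2022 FIFA World Cup": {
        "Who won the tournament?": "Argentina",
        "Which two teams qualified for the knock-out round from Group D?": "I don't know",
        "How is GPT-4 better than its predecessors?": "I don't know",
    }
}
print(abstention_report(flan_t5_xl_answers, unanswerable))
```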


## Evaluate embeddings

In [14]:
t0 = time.time()
answers = {}

for emb in embs:
    config = {"embedding": emb, "llm": fixed_llm, "question_check": True, "load_in_8bit": False}
    qa = WikipediaQA(config)
    qa.init_models()
    answers[emb] = {}
    
    for article in article_questions:
        qa.search_and_read_page(article)
        answers[emb][article] = {}
        
        for question in article_questions[article]:
            print(f"Getting answer using {emb} for {article} - {question}")
            answer = qa.get_answer(question)
            answers[emb][article][question] = answer
            #print(f"{llm} -- {article} -- {question}:\n{answer}\n")
            
print(f"Time taken: {int(time.time() - t0)} sec")

load INSTRUCTOR_Transformer
max_seq_length  512




Getting answer using hkunlp/instructor-xl for GPT-4 - Who created GPT-4?




Getting answer using hkunlp/instructor-xl for GPT-4 - How is GPT-4 better than its predecessors?
Getting answer using hkunlp/instructor-xl for GPT-4 - Who played in the finals?


Token indices sequence length is longer than the specified maximum sequence length for this model (2153 > 2048). Running this sequence through the model will result in indexing errors


Getting answer using hkunlp/instructor-xl for 2022 FIFA World Cup - Where was the event held? Name the city and country
Getting answer using hkunlp/instructor-xl for 2022 FIFA World Cup - Who won the tournament?
Getting answer using hkunlp/instructor-xl for 2022 FIFA World Cup - Who played in the finals?




Getting answer using hkunlp/instructor-xl for 2022 FIFA World Cup - Who had the broadcasting rights?
Getting answer using hkunlp/instructor-xl for 2022 FIFA World Cup - Which two teams qualified for the knock-out round from Group D?
Getting answer using hkunlp/instructor-xl for 2022 FIFA World Cup - How is GPT-4 better than its predecessors?




Getting answer using hkunlp/instructor-xl for Diablo IV - What classes are available?
Getting answer using hkunlp/instructor-xl for Diablo IV - When is the release date?
Getting answer using hkunlp/instructor-xl for Diablo IV - How is the vanishing gradient issue addressed?




Getting answer using hkunlp/instructor-xl for Stable Diffusion - How does Stable Diffusion work?
Getting answer using hkunlp/instructor-xl for Stable Diffusion - Who created Stable Diffusion?
Getting answer using hkunlp/instructor-xl for Stable Diffusion - Describe the model architecture
Getting answer using hkunlp/instructor-xl for Stable Diffusion - Who played in the finals?
Time taken: 1406 sec


In [15]:
# Print answers in markdown table format

# Header
md_str = "|Article|Question|"  # first column holds the article title
for emb in embs:
    md_str += f"{emb}|"
md_str += "\n|"
for _ in range(len(embs)+2):
    md_str += "--|"
md_str += "\n"

# Content: one table row per (article, question) pair
for article in article_questions:
    for question in article_questions[article]:
        # Start every row with a pipe so all rows share the same format
        md_str += f"|{article}|{question}|"
        for emb in embs:
            # Update formatting to work with markdown
            answer = answers[emb][article][question]
            answer = answer.replace("\n", "<br>")
            md_str += f"{answer}|"
        md_str += "\n"

In [16]:
# Render markdown
from IPython.display import display, Markdown
display(Markdown(md_str))

|Article|Question|hkunlp/instructor-xl|
|--|--|--|
|GPT-4|Who created GPT-4?|OpenAI.|
|GPT-4|How is GPT-4 better than its predecessors?|GPT-4 is better than its predecessors because it can take images as well as text as input.|
|GPT-4|Who played in the finals?|I don't know|
|2022 FIFA World Cup|Where was the event held? Name the city and country|The event was held in Doha, Qatar.|
|2022 FIFA World Cup|Who won the tournament?|Argentina won the tournament.|
|2022 FIFA World Cup|Who played in the finals?|Argentina and France.|
|2022 FIFA World Cup|Who had the broadcasting rights?|The broadcasting rights for the 2022 FIFA World Cup were held by Bell Media (Canada), Fox (U.S. English), and NBCUniversal (U.S. Spanish).|
|2022 FIFA World Cup|Which two teams qualified for the knock-out round from Group D?|France and Australia.|
|2022 FIFA World Cup|How is GPT-4 better than its predecessors?|I don't know|
|Diablo IV|What classes are available?|Barbarian, Sorceress, Druid, Rogue, and Necromancer.|
|Diablo IV|When is the release date?|June 6, 2023.|
|Diablo IV|How is the vanishing gradient issue addressed?|I don't know|
|Stable Diffusion|How does Stable Diffusion work?|Stable Diffusion works by using a latent diffusion model, a kind of deep generative neural network, to generate detailed images conditioned on text descriptions. The model can generate new images from scratch through the use of a text prompt describing elements to be included or omitted from the output, or existing images can be re-drawn by the model to incorporate new elements described by a text prompt through its diffusion-denoising mechanism. It also allows the use of prompts to partially alter existing images via inpainting and outpainting.|
|Stable Diffusion|Who created Stable Diffusion?|The start-up company Stability AI in collaboration with a number of academic researchers and non-profit organizations.|
|Stable Diffusion|Describe the model architecture|Stable Diffusion uses a kind of diffusion model (DM), called a latent diffusion model (LDM) developed by the CompVis group at LMU Munich. Introduced in 2015, diffusion models are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a smaller dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output from forward diffusion backwards to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs.|
|Stable Diffusion|Who played in the finals?|I don't know|
