Prerequisites:
- Put your API key in a `.env` file like this:
```
OPENAI_API_KEY=sk-xxxx
```
- Install the necessary Python packages:

In [None]:
#!pip install python-dotenv
#!pip install openai
#!pip install --upgrade langchain

### Key findings
- I used a GPT-4 model, "gpt-4-1106-preview", despite its higher cost and latency, because GPT-4 models were much more accurate than GPT-3.5 models on "out of scope" questions when controlling for prompt length.
- Moving the questions very close to the instructions in the prompt made the model more likely to flag irrelevant questions as "out of scope" without verbose explanations.

### Future opportunities 
- With GPT-3.5, irrelevant questions yielded more verbose, wandering responses. But it can still work well with fewer than 3 questions.
- In practice, GPT-3.5 could be paired with an additional classifier that filters out irrelevant questions, such as a fine-tuned small language model or a simple BERT classifier.
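
The pre-filtering idea in the last bullet can be sketched as follows. This is only an illustration of where such a filter would sit in the pipeline: the word-overlap heuristic, the `STOP_WORDS` list, the `looks_in_scope` name, and the `min_overlap` threshold are all hypothetical stand-ins for a trained classifier (fine-tuned small model or BERT), not something tested in this notebook.

```python
import re
from typing import Set

# Hypothetical stop-word list; a real filter would be a trained classifier.
STOP_WORDS: Set[str] = {
    "what", "does", "the", "a", "an", "is", "of", "about",
    "how", "to", "who", "in", "are", "why", "it", "where",
}


def _tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def looks_in_scope(question: str, paragraph: str, min_overlap: float = 0.5) -> bool:
    """Crude relevance check: share of question tokens that also occur in the paragraph."""
    question_tokens = _tokens(question)
    if not question_tokens:
        return False
    overlap = len(question_tokens & _tokens(paragraph)) / len(question_tokens)
    return overlap >= min_overlap


paragraph = (
    "LinkedIn reports that the Bay Area has seen the second-biggest "
    "worker population gain of any area in the U.S."
)
relevant = looks_in_scope("What does LinkedIn report about the Bay Area recently?", paragraph)
irrelevant = looks_in_scope("Who is the biggest competitor of Tesla?", paragraph)
print(relevant, irrelevant)
```

Questions rejected by such a filter could be answered with "out of scope" directly, so only the remaining questions are sent to the cheaper model. A lexical heuristic like this one is brittle and is shown only to make the pipeline shape concrete.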

### How to play with the code
Go to section "**Test your case here**", and play with:
```
answers = answer_questions(questions, paragraphs)
```

In [1]:
import os
import json
from typing import List
import time

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

In [2]:
# model initialization
llm_model = "gpt-4-1106-preview"
chat = ChatOpenAI(temperature=0.0, model=llm_model)

In [3]:
# Test your connection
SIMPLE_TEMPLATE = """What is {thing}?"""
prompt_template = ChatPromptTemplate.from_template(SIMPLE_TEMPLATE)
prompt = prompt_template.format_messages(
                    thing="1+2"
                    )

t0 = time.time()
response = chat(prompt)
print(f"response: {response.content}; {time.time() - t0}")

response: 1 + 2 equals 3.; 1.0373380184173584


In [10]:
TEMPLATE_STRING = """
For the following text,

text: ```{input_text}```

Please answer the following questions:
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{{
{qa_format}
}}
"""

def answer_questions(questions: List[str], paragraphs: str) -> List[str]:
    """Takes a list of question strings and a paragraph string, builds a prompt query for an AI assistant bot, sends the prompt, and returns the list of extracted answer strings.

    Args:
        questions (List[str]): List of question strings.
        paragraphs (str): Paragraph text.

    Returns:
        List[str]: List of answer strings extracted from the AI response.

    Raises:
        Exception: If no questions are provided, or the paragraph is too short or too long.

    Steps:
        1. Validate the input.
        2. Build a QA schema for the questions.
        3. Format the prompt with the paragraph and QA schema.
        4. Send the prompt and get the AI response.
        5. Extract the list of answers from the response.

    The internal _build_qa_schema function formats the questions into the expected QA schema for the prompt.
    The chat() function sends the prompt to the actual AI service.

    """    
    def _build_qa_schema(questions: List[str]) -> str:
        qa = []
        for i, q in enumerate(questions):
            qa.append(f'"question_{i}": string // for question: "{q}", if the question is irrelevant to the input text, do not waste your time, just say "out of scope" right away')
        return '\n'.join(qa)    

    ### -- basic test --
    PARAGRAPH_LENGTH_MIN = 20
    PARAGRAPH_LENGTH_MAX = 5000
    
    if not questions:
        raise Exception("Please provide at least one question in `questions`")
        
    if len(paragraphs) < PARAGRAPH_LENGTH_MIN:
        raise Exception(f"Your paragraph should be longer than {PARAGRAPH_LENGTH_MIN} characters")

    if len(paragraphs) > PARAGRAPH_LENGTH_MAX:
        raise Exception(f"Your paragraph should be within {PARAGRAPH_LENGTH_MAX} characters")
        
    ### -- main function --
    prompt_template = ChatPromptTemplate.from_template(TEMPLATE_STRING)
    prompt = prompt_template.format_messages(
                        input_text=paragraphs,
                        qa_format=_build_qa_schema(questions)
                        )

    response = chat(prompt)
    
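    ### post process the output
    # The reply is wrapped in a markdown fence: drop the leading "```json\n" (8 chars)
    # and the trailing "\n```" (4 chars) before parsing the JSON object.
    answers = json.loads(response.content[8:-4])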
    return list(answers.values())
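
The fixed slice `response.content[8:-4]` only works when the reply starts and ends with exactly the expected json code fence. A more tolerant parser could look like the sketch below; `extract_answers` and `FENCE` are hypothetical names that are not part of the notebook, and the function is a possible drop-in for the post-processing step rather than the method used for the results reported here.

```python
import json
import re
from typing import List

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3

# A JSON object wrapped in a fence, with an optional "json" language tag.
_FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*\})\s*" + FENCE, flags=re.DOTALL)


def extract_answers(raw: str) -> List[str]:
    """Return the answer values from a model reply, fenced or not."""
    match = _FENCED_JSON.search(raw)
    # Fall back to parsing the whole reply when no fence is found.
    payload = match.group(1) if match else raw.strip()
    return list(json.loads(payload).values())


fenced = FENCE + 'json\n{"question_0": "In the suburbs.", "question_1": "out of scope"}\n' + FENCE
print(extract_answers(fenced))
```

Because the regular expression searches anywhere in the reply, text before or after the fence is ignored. A reply that contains no valid JSON still raises `json.JSONDecodeError`, which the caller would need to handle.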
     

# Test your case here

For paragraphs shorter than 1000 characters, each question added to `questions` takes roughly 650 milliseconds.

In [16]:
paragraphs = """
If you’ve always dreamed about living in San Francisco, now might be the right time to make your move. \
LinkedIn reports that the Bay Area has seen the second-biggest worker population gain of any area in the U.S. \
And this makes sense — many companies are calling workers back into the office now that the pandemic is past us (perhaps to work under a hybrid model), and San Francisco salaries make it an attractive place regardless of work-from-home policies.

Interestingly, a lot of these rebounding employees are landing in the suburbs. \
In fact, San Francisco rents are slightly down year-over-year (decreasing by 4.3%) as of September 2023, in part due to the growth outside the city proper. \
You can chalk some of this up to San Francisco’s notoriously high real estate prices (i.e., prices are high in the city, so not many people move in, therefore rent prices don’t increase much).
"""

questions = [
    # relevant questions -- expected the accurate answer from paragraphs
    "What does LinkedIn report about the Bay Area recently?", # LinkedIn reports that the Bay Area has seen the second-biggest worker population gain of any area in the U.S.
    "Why does it make sense that the Bay Area has gained workers?", # It makes sense because many companies are calling workers back into the office now that the pandemic is past us, and San Francisco salaries make it an attractive place regardless of work-from-home policies.
    "Where are many of the rebounding employees landing?", # Many of the rebounding employees are landing in the suburbs.
    
    # irrelevant questions -- expected "out of scope"
    "What does Airbnb report about the Bay Area recently?", # out of scope
    "Who is the biggest competitor of Tesla?", # out of scope
    "What does LinkedIn report about the Seattle Area recently?", # out of scope
    "How to build a large language model (LLM) using pytorch?", # out of scope
]

In [18]:
# This is a normal case
t0 = time.time()
answers = answer_questions(questions, paragraphs)
print(f"response: { json.dumps(answers, indent=2) } \ntime: { round(time.time() - t0, 2) } seconds")

response: [
  "LinkedIn reports that the Bay Area has seen the second-biggest worker population gain of any area in the U.S.",
  "It makes sense that the Bay Area has gained workers because many companies are calling workers back into the office now that the pandemic is past, possibly to work under a hybrid model, and San Francisco salaries are attractive.",
  "Many of the rebounding employees are landing in the suburbs.",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope"
] 
time: 5.18 seconds


In [14]:
# This raises an exception because the paragraph is too short
answers = answer_questions(questions, paragraphs[:15])

Exception: Your paragraph should be longer than 20 characters

In [15]:
# This raises an exception because no questions are provided
answers = answer_questions([], paragraphs)

Exception: Please provide at least one question in `questions`

# Long article (more than a few paragraphs)

I use the Tesla earnings call text from page 3 of
https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q3-2023-Update-3.pdf

This test covers the scenario where the LLM is given all the information.
Some questions are valid, some are irrelevant but related to the context, and some are easy questions, like 1+2=?, that tempt the LLM to answer.

The results are correct.
```
response: [
  "The main objectives of the company in Q3-2023 were reducing cost per vehicle, free cash flow generation while maximizing delivery volumes, and continued investment in AI and other growth projects.",
  "The cost of goods sold per vehicle decreased to approximately $37,500 in Q3.",
  "The $37,500 refers to the reduced cost of goods sold per vehicle for the company in Q3.",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope"
] 
time: 5.04 seconds
```

In [62]:
paragraphs = """
Our main objectives remained unchanged in Q3-2023: reducing cost per
vehicle, free cash flow generation while maximizing delivery volumes and
continued investment in AI and other growth projects.

Our cost of goods sold per vehicle decreased to ~$37,500 in Q3. While
production cost at our new factories remained higher than our established
factories, we have implemented necessary upgrades in Q3 to enable
further unit cost reductions. We continue to believe that an industry leader needs to be a cost leader.

During a high interest rate environment, we believe focusing on
investments in R&D and capital expenditures for future growth, while
maintaining positive free cash flow, is the right approach. Year-to-date,
our free cash flow reached $2.3B while our cash and investments position continues to improve.

We have more than doubled the size of our AI training compute to
accommodate for our growing dataset as well as our Optimus robot
project. Our humanoid robot is currently being trained for simple tasks
through AI rather than hard-coded software, and its hardware is being
further upgraded.

Lastly, with a combined gross profit generation of over $0.5B in Q3, our
Energy Generation and Storage business and Services and Other business
have become meaningful contributors to our profitability.
"""

In [63]:
questions = [
    # relevant questions -- expected the accurate answer from paragraphs
    "What were the main objectives of the company in Q3-2023?",
    "How much did the cost of goods sold per vehicle decrease to in Q3?",
    "Please explain $37,500?",
    
    # irrelevant questions -- expected "out of scope"
    "What is 1 + 2?",
    "Who is the biggest competitor of Tesla?",
    "What is the cost of goods sold per vehicle in Q4?",
    "How to build a large language model (LLM) using pytorch?"
]
    
t0 = time.time()
answers = answer_questions(questions, paragraphs)
print(f"response: { json.dumps(answers, indent=2) } \ntime: { round(time.time() - t0, 2) } seconds")

response: [
  "The main objectives of the company in Q3-2023 were reducing cost per vehicle, free cash flow generation while maximizing delivery volumes, and continued investment in AI and other growth projects.",
  "The cost of goods sold per vehicle decreased to approximately $37,500 in Q3.",
  "The $37,500 refers to the reduced cost of goods sold per vehicle for the company in Q3.",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope"
] 
time: 5.04 seconds


# Number of Questions vs Latency vs Accuracy

Latency by number of questions:
```
20 questions ~ 15.35s
10 questions ~ 8.7s
5 questions ~ 5.77s
```

Latency increases almost linearly with the number of questions.
So, in practice, we should **limit the number of questions to control latency**.
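
To make the near-linear relationship concrete, a least-squares line can be fitted to the three timings listed above. This is a minimal sketch: `fit_line` is a hypothetical helper written for this illustration, and three data points from single runs are far too few for a reliable estimate.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Timings reported above: number of questions and latency in seconds.
num_questions = [5, 10, 20]
latency_s = [5.77, 8.7, 15.35]

slope, intercept = fit_line(num_questions, latency_s)
print(f"~{slope:.2f} s per question, ~{intercept:.1f} s fixed overhead")
```

On these numbers the fit comes out at roughly 0.64 seconds per additional question on top of about 2.4 seconds of fixed overhead per call.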

All the questions are **answered accurately**.
Although the temperature was set to 0, the LLM's responses differ slightly across the three cases (20, 10, and 5 questions),
because the prompt as a whole is different in each case.

In [65]:
# Use 20 questions
questions = [
    # relevant questions -- expected the accurate answer from paragraphs
    "What were the unchanged main objectives of the company in Q3-2023?",
    "What was the cost of goods sold per vehicle in Q3?",
    "How did the production costs at the new factories compare to the established ones?",
    "What actions were taken in the new factories in Q3?",
    "How does the company view the relationship between industry leadership and cost leadership?",
    "What strategy is deemed appropriate in a high interest rate environment?",
    "What was the company's free cash flow by the end of Q3?",
    "How has the company's cash and investments position changed by Q3?",
    "By what factor was the AI training compute increased?",
    "What is the purpose of expanding the AI training compute?",
    "What is the Optimus robot project?",
    "How is the humanoid robot, Optimus, being trained?",
    "What upgrades are being made to the Optimus robot's hardware?",
    "What was the combined gross profit of the Energy Generation and Storage business and Services and Other business in Q3?",
    "How have these two businesses contributed to the company's profitability?",
    "What is the significance of maximizing delivery volumes for the company?",
    "How does the company balance investment in growth projects with maintaining positive cash flow?",
    "In what ways is the company investing in AI?",
    "What are the implications of the cost reductions achieved in Q3?",
    "How does the company's strategy align with its goals for future growth?",
]
    
t0 = time.time()
answers = answer_questions(questions, paragraphs)
print(f"response: { json.dumps(answers, indent=2) } \ntime: { round(time.time() - t0, 2) } seconds")

response: [
  "reducing cost per vehicle, free cash flow generation while maximizing delivery volumes and continued investment in AI and other growth projects",
  "~$37,500",
  "Production cost at new factories remained higher than at established factories",
  "Implemented necessary upgrades to enable further unit cost reductions",
  "The company believes that an industry leader needs to be a cost leader",
  "Focusing on investments in R&D and capital expenditures for future growth, while maintaining positive free cash flow",
  "Year-to-date, the free cash flow reached $2.3B",
  "The cash and investments position continues to improve",
  "More than doubled",
  "To accommodate for the growing dataset as well as the Optimus robot project",
  "A humanoid robot project where the robot is being trained for simple tasks through AI",
  "Through AI rather than hard-coded software",
  "Its hardware is being further upgraded",
  "Over $0.5B",
  "They have become meaningful contributors to the co

In [66]:
# Use 10 questions
t0 = time.time()
answers = answer_questions(questions[:10], paragraphs)
print(f"response: { json.dumps(answers, indent=2) } \ntime: { round(time.time() - t0, 2) } seconds")

response: [
  "reducing cost per vehicle, free cash flow generation while maximizing delivery volumes and continued investment in AI and other growth projects",
  "~$37,500",
  "Production cost at new factories remained higher than at established factories",
  "Implemented necessary upgrades to enable further unit cost reductions",
  "The company believes that an industry leader needs to be a cost leader",
  "Focusing on investments in R&D and capital expenditures for future growth, while maintaining positive free cash flow",
  "Year-to-date, the free cash flow reached $2.3B",
  "The cash and investments position continues to improve",
  "More than doubled",
  "To accommodate for the growing dataset as well as the Optimus robot project"
] 
time: 8.7 seconds


In [67]:
# Use 5 questions
t0 = time.time()
answers = answer_questions(questions[:5], paragraphs)
print(f"response: { json.dumps(answers, indent=2) } \ntime: { round(time.time() - t0, 2) } seconds")

response: [
  "The unchanged main objectives of the company in Q3-2023 were reducing cost per vehicle, free cash flow generation while maximizing delivery volumes, and continued investment in AI and other growth projects.",
  "The cost of goods sold per vehicle in Q3 was approximately $37,500.",
  "The production costs at the new factories remained higher than at the established factories.",
  "The actions taken in the new factories in Q3 included implementing necessary upgrades to enable further unit cost reductions.",
  "The company believes that an industry leader needs to be a cost leader."
] 
time: 5.77 seconds


## Remove some paragraphs to invalidate some more questions

The results are still correct.
```
[
  "The main objectives of the company in Q3-2023 were reducing cost per vehicle, free cash flow generation while maximizing delivery volumes, and continued investment in AI and other growth projects.",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope"
]
```

In [60]:
paragraphs = """
Our main objectives remained unchanged in Q3-2023: reducing cost per
vehicle, free cash flow generation while maximizing delivery volumes and
continued investment in AI and other growth projects.
"""

questions = [
    # relevant questions -- expected the accurate answer from paragraphs
    "What were the main objectives of the company in Q3-2023?",
    "How much did the cost of goods sold per vehicle decrease to in Q3?",
    "Please explain $37,500?",
    
    # irrelevant questions -- expected "out of scope"
    "What is 1 + 2?",
    "Who is the biggest competitor of Tesla?",
    "What is the cost of goods sold per vehicle in Q4?",
    "How to build a large language model (LLM) using pytorch?"
]
    
t0 = time.time()
answers = answer_questions(questions, paragraphs)
print(f"response: { json.dumps(answers, indent=2) } \ntime: { round(time.time() - t0, 2) } seconds")

response: [
  "The main objectives of the company in Q3-2023 were reducing cost per vehicle, free cash flow generation, maximizing delivery volumes, and continued investment in AI and other growth projects.",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope",
  "out of scope"
] 
time: 5.03 seconds
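
Instead of checking the output by eye, the "out of scope" behavior can be scored automatically. The sketch below uses the answers printed above and the expectation that only the first question is answerable from the shortened paragraph; `out_of_scope_accuracy` is a hypothetical helper and not part of the notebook.

```python
from typing import List

OUT_OF_SCOPE = "out of scope"


def out_of_scope_accuracy(answers: List[str], expected_out_of_scope: List[bool]) -> float:
    """Fraction of questions whose out-of-scope flag matches the expectation."""
    if not answers or len(answers) != len(expected_out_of_scope):
        raise ValueError("answers and expectations must be non-empty and of equal length")
    hits = sum(
        (answer.strip().lower() == OUT_OF_SCOPE) == expected
        for answer, expected in zip(answers, expected_out_of_scope)
    )
    return hits / len(answers)


# Answers copied from the run above.
answers = [
    "The main objectives of the company in Q3-2023 were reducing cost per vehicle, "
    "free cash flow generation, maximizing delivery volumes, and continued investment "
    "in AI and other growth projects.",
    "out of scope",
    "out of scope",
    "out of scope",
    "out of scope",
    "out of scope",
    "out of scope",
]
# Only the first question can be answered from the shortened paragraph.
expected = [False] + [True] * 6

accuracy = out_of_scope_accuracy(answers, expected)
print(f"out-of-scope accuracy: {accuracy:.0%}")
```

This only checks whether each answer is flagged as out of scope or not. It does not verify that the in-scope answers are factually correct.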
