## Get Training Data
Step 3 is to generate the question/answer pairs that the model will be trained on. We'll use the OpenAI API (`gpt-4o-mini`) for this.

## Environment Setup
This step uses the following libraries:
|Library|License|
|-|-|
| [python-dotenv](https://github.com/theskumar/python-dotenv) | BSD-3-Clause |
| [pydantic](https://github.com/pydantic/pydantic) | MIT |
| [OpenAI](https://github.com/openai/openai-python) | Apache 2.0 |

In [1]:
import json
import os
from pathlib import Path

from openai import OpenAI
from dotenv import load_dotenv
from pydantic import BaseModel

In [12]:
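load_dotenv()  # read variables from a local .env file into the environment
OAI_KEY = os.environ["PAT_API_KEY"]  # raises KeyError if the variable is not set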

In [6]:
CHUNKED_JSONL = Path("FM5_0/QuantFactory/Llama-3.2-1B-GGUF/data/chunked/chunked.jsonl")
N_PASSAGES    = 30
OPENAI_MODEL  = "gpt-4o-mini"

In [7]:
passages = []
with CHUNKED_JSONL.open() as f:
    for line in f:
        passages.append(json.loads(line)["text"])
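
The loading cell reads every line of the file, while `N_PASSAGES` is defined but not applied there. A minimal sketch of a capped loader follows, assuming the cap is meant to limit how many passages are read; `load_passages` and its `limit` parameter are hypothetical names, not part of the original notebook.

```python
import json
from itertools import islice
from pathlib import Path
from typing import List, Optional


def load_passages(path: Path, limit: Optional[int] = None) -> List[str]:
    """Read the "text" field from each non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        # Skip blank lines so a stray empty line does not break json.loads.
        rows = (json.loads(line) for line in f if line.strip())
        # islice stops after `limit` rows; limit=None reads the whole file.
        return [row["text"] for row in islice(rows, limit)]
```

In the notebook this could be called as `load_passages(CHUNKED_JSONL, N_PASSAGES)`.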

Using OpenAI's Structured Outputs feature, we can pass a Pydantic class to make sure the response adheres to a fixed format. Here that format is a simple question/answer pair.

In [8]:
class QA(BaseModel):
    question: str
    answer: str
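
To see what the schema enforces without calling the API, the same class can validate a raw JSON string locally. This is a small sketch assuming Pydantic v2 (`model_validate_json`); the JSON strings are made-up examples, not real model output.

```python
from pydantic import BaseModel, ValidationError


class QA(BaseModel):
    question: str
    answer: str


# Hypothetical model output, for illustration only.
raw = '{"question": "What is FM 5-0?", "answer": "The Army planning manual."}'
pair = QA.model_validate_json(raw)
print(pair.question)

# A reply that is missing a required field is rejected instead of accepted silently.
try:
    QA.model_validate_json('{"question": "What is FM 5-0?"}')
except ValidationError as err:
    print(f"rejected with {len(err.errors())} error(s)")
```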

Now create the API client and set the basic generation settings.

In [13]:
client = OpenAI(api_key=OAI_KEY)

In [14]:
TEMPERATURE   = 0.4
PROMPT        = ("Read the passage and emit exactly ONE question/answer pair with source information: "
                 "question – a factual question (5‑25 words). "
                 "answer – verbatim answer text from the passage (≤ 50 words).")
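
The prompt states length and verbatim constraints, but the schema only guarantees two string fields, so the model can still break those limits. A rough sketch of a local check is shown below; `check_pair` is a hypothetical helper, words are counted by whitespace splitting (an assumption), and the passage is an invented example.

```python
from typing import List


def check_pair(question: str, answer: str, passage: str) -> List[str]:
    """Return a list of prompt constraints that the pair violates."""
    problems = []
    n_question = len(question.split())
    if not 5 <= n_question <= 25:
        problems.append(f"question has {n_question} words, expected 5-25")
    n_answer = len(answer.split())
    if n_answer > 50:
        problems.append(f"answer has {n_answer} words, expected at most 50")
    # Collapse whitespace on both sides so line breaks in the passage
    # do not cause a false "not verbatim" result.
    if " ".join(answer.split()) not in " ".join(passage.split()):
        problems.append("answer is not verbatim from the passage")
    return problems


# Invented passage, for illustration only.
passage = "FM 5-0 is the Army's reference manual for planning."
print(check_pair("What is FM 5-0?", "the Army's reference manual for planning", passage))
# → ['question has 4 words, expected 5-25']
```

An empty list means the pair satisfies all three constraints under these counting rules.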

Then send a single passage as a test request to check that the response has the expected shape.

In [15]:
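resp = client.responses.parse(
    model=OPENAI_MODEL,
    temperature=TEMPERATURE,  # defined above; pass it so the setting takes effect
    input=[
        {"role": "system", "content": PROMPT},       # task instructions
        {"role": "user", "content": passages[0]},    # first passage as a test
    ],
    text_format=QA,  # parse the reply into the QA model
)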


In [16]:
print(resp.output_parsed)

question='What is FM 5-0?' answer="FM 5-0, Planning and Orders Production, is the Army's comprehensive reference manual for planning."
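
Once a single response looks right, the same request could be repeated over the first `N_PASSAGES` passages. The sketch below is one possible shape for that loop, not the notebook's actual code: `collect_pairs` and `ask` are hypothetical names, and `ask` stands in for a function that wraps the `client.responses.parse` call and returns its `output_parsed` value (assumed here to possibly be `None` when no pair could be parsed).

```python
import json
from typing import Any, Callable, List, Optional, Sequence


def collect_pairs(
    passages: Sequence[str],
    ask: Callable[[str], Optional[Any]],
    limit: int,
) -> List[str]:
    """Return one JSON line per passage for which `ask` produced a pair."""
    lines = []
    for passage in passages[:limit]:
        pair = ask(passage)  # object with .question and .answer, or None
        if pair is None:
            continue  # skip passages without a usable pair
        record = {"question": pair.question, "answer": pair.answer}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines
```

The returned lines can be joined with newlines and written to a JSONL file for the training step.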
