# <font color=red>LangChain Example: Generating Causal Reasoning Pre-training Data</font>
- https://docs.langchain.com/docs

<span style="font-family:'Comic Sans MS', cursive, sans-serif;"><font color=orange>
## Demo 1 - A tiny example that generates data for a {subject}, e.g., biology
</font></span>
This program demonstrates one way to generate causal reasoning examples that might be used to pre-train a model such as AuroraGPT.<br>
The program uses GPT-4 "as-is", i.e., it does not ground its output in "trusted documents" to check that everything is correct.<br>
It also does not try to avoid hallucination with prompt directives such as "don't make things up".
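One detail of the code is easy to miss. `QUESTION` contains its own `{subject}` placeholder, but it reaches the model as the *value* of `{question}` in `TEMPLATE`. Python's `str.format` substitutes in a single pass, and as far as we understand LangChain's default f-string `PromptTemplate` format behaves the same way, so placeholders inside a substituted value are not expanded. `QUESTION` therefore has to be filled in before it is passed as `question`; otherwise the model sees the literal text `{subject}`. The sketch below shows the effect with plain `str.format` and two simplified stand-in templates (`outer` and `inner` are hypothetical names, not part of the program).

```python
# Simplified, hypothetical stand-ins for TEMPLATE and QUESTION.
outer = "Subject: {subject}\nQuestion: {question}"
inner = "Give a causal reasoning example in {subject}."

# Single pass: the {subject} inside `inner` is inserted literally, not expanded.
naive = outer.format(subject="biology", question=inner)
print(naive)

# Filling the inner template first produces the intended text.
fixed = outer.format(subject="biology", question=inner.format(subject="biology"))
print(fixed)
```

The first `print` still shows `{subject}` on the question line; the second shows `biology` in both places.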

```python
import random

# Classic LangChain API. Recent releases deprecate LLMChain and move ChatOpenAI
# to the separate langchain-openai package, so these imports may need
# adjusting for the installed version.
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable.

NUM_EXAMPLES_TO_GEN = 11
MAX_PREV_EXAMPLES = 8   # at most this many earlier examples are shown per request
SUBJECT = "biology"

# MODEL = "gpt-3.5-turbo"
MODEL = "gpt-4"
TEMPERATURE = 0.5

TEMPLATE = """
You are generating data which will be used to fine-tune a Large Language Model.
The model will be used to work with advanced high-school students studying {subject}.
You will be generating single prompt/response pairs which present examples of
causal reasoning that advanced high-school students in {subject} can understand.
There may also be some previously generated examples provided in the prompt to
help you to ensure uniqueness and diversity.
Please keep each response to a "reasonable" length, i.e. no more than 100 words.
The prompt/response pair that you create should be in this format:
Prompt: <prompt goes here>
Response: <response goes here>

Question:
{question}
"""

QUESTION = """
Please generate a single prompt/response pair which presents an example of
causal reasoning that advanced high-school students in {subject} can understand.
Be sure to avoid duplicating the previous examples listed above.
Be sure to develop the response using a step-by-step process.
"""

llm = ChatOpenAI(model_name=MODEL, temperature=TEMPERATURE)  # optionally add max_tokens=100
prompt_template = PromptTemplate.from_template(TEMPLATE)
answer_chain = LLMChain(llm=llm, prompt=prompt_template)

def generate_one_example(prev_examples):
    """Ask the model for one new prompt/response pair, showing it a random
    sample of up to MAX_PREV_EXAMPLES earlier pairs to encourage diversity."""
    if len(prev_examples) > MAX_PREV_EXAMPLES:
        prev_examples = random.sample(prev_examples, MAX_PREV_EXAMPLES)
    prevs = "".join(f"Previous example:\n{example}\n" for example in prev_examples)
    # QUESTION has its own {subject} placeholder. Fill it here: the template
    # inserts {question} verbatim and does not expand fields nested inside it.
    question = prevs + QUESTION.format(subject=SUBJECT)
    return answer_chain.run(subject=SUBJECT, question=question)

prev_examples = []
for i in range(NUM_EXAMPLES_TO_GEN):
    print(f"Generating example {i}")
    example = generate_one_example(prev_examples)
    prev_examples.append(example)

for i, pe in enumerate(prev_examples):
    print(f"\nCausal Reasoning Example {i}\n----------------------------")
    print(f"{pe}\n")
```
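The program only prints the raw model text. Before the pairs could serve as training data, they would typically be split into fields, filtered, and saved. The sketch below is a minimal offline illustration under our own assumptions: `parse_pair` and `to_jsonl` are hypothetical helpers, the model is assumed to follow the `Prompt:` / `Response:` format requested in `TEMPLATE` exactly, JSON Lines (one JSON object per line) is assumed as the storage format, and outputs that don't match are skipped rather than repaired. The duplicate check only catches prompts that are identical after lowercasing and whitespace normalization; it is a cheap backstop to the "avoid duplication" instruction, not a semantic similarity test.

```python
import json
import re

# "Prompt:" then the prompt text, then "Response:" then everything after it.
# DOTALL lets both parts span several lines; the lazy (.*?) stops at the
# first "Response:" label.
PAIR_RE = re.compile(r"Prompt:\s*(.*?)\s*Response:\s*(.*)", re.DOTALL)

def parse_pair(text):
    """Return {"prompt": ..., "response": ...}, or None if text doesn't match."""
    match = PAIR_RE.search(text)
    if match is None:
        return None
    prompt, response = match.group(1).strip(), match.group(2).strip()
    if not prompt or not response:
        return None
    return {"prompt": prompt, "response": response}

def to_jsonl(examples, subject):
    """Convert raw model outputs to JSON Lines text, skipping outputs that
    don't parse and prompts already seen (after normalization)."""
    seen = set()
    lines = []
    for text in examples:
        pair = parse_pair(text)
        if pair is None:
            continue
        key = " ".join(pair["prompt"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        pair["subject"] = subject
        lines.append(json.dumps(pair))
    return "\n".join(lines)

# Usage with made-up outputs: one well-formed pair, one exact repeat,
# and one refusal that does not follow the format.
raw = [
    "Prompt: Why do leaves wilt during a drought?\n"
    "Response: Less water reaches the cells, so turgor pressure drops and they sag.",
    "Prompt: Why do leaves wilt during a drought?\nResponse: Same question again.",
    "Sorry, I can't help with that.",
]
jsonl_text = to_jsonl(raw, "biology")
print(jsonl_text)
```

In the notebook, the same helper could be applied to the generated list, e.g. `to_jsonl(prev_examples, "biology")`, and the result written to a `.jsonl` file.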