In [1]:
"""
A script using langchain to generate the ground-truth test dataset using Gemini API 

# TODO: Implement Group Feedback
"""

Initialize Globals for Notebook:

In [2]:
SYS = """
You are a financial advisor responsible for helping train an AI language model
to provide comprehensive, sound financial advice based on a company's financial 
history. You are tasked with writing questions and ground-truth answers for the 
task's benchmark dataset.

You will be provided a set of historical data on a given company. Given this data, 
you should come up with a question that would effectively test an LLM's ability to
give coherent and correct information about a company. The LLM may also be asked to
give some subjective advice about a company's financial outlook. In these cases, while
there isn't necessarily a "correct" answer, any LLM answer should be supported clearly
by the provided data. The questions you create should have these goals in mind, and the 
answers you generate should appropriately address the goals.

Format your output in the following format:

Do not include anything else in your response. 

Here is an example of what your output could look like:

<<Example>>

What do AAPL's earnings reports say about its growth potential?

Investors can be confident about AAPL's long-term growth potential. It has shown 
consistent growth year-over-year, with revenue figures increasing by at least 2 percent
in every year. 

Here is the user input:

{format_instructions}

{query}

Don't wrap the JSON output in anything (markdown, etc). Just return the JSON object itself.
"""

STOCK_INFO_PATH = "/Users/willferguson/Downloads/GT Fall 2024/CS 6220/cs6220-project/general_question_answering/data/10-K"
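The absolute path above only resolves on the original author's machine. As a minimal sketch, the directory could instead be read from an environment variable with a fallback. The variable name `STOCK_INFO_PATH` and the helper `resolve_data_dir` are assumptions for illustration; nothing in this notebook defines them.

```python
import os
from pathlib import Path

# Hypothetical variable name; the notebook itself does not define it.
ENV_VAR = "STOCK_INFO_PATH"


def resolve_data_dir(default: str, env_var: str = ENV_VAR) -> Path:
    """Return the 10-K directory from the environment, else the given default."""
    return Path(os.environ.get(env_var, default)).expanduser()
```

With this in place, `STOCK_INFO_PATH = resolve_data_dir("data/10-K")` would let each collaborator point the notebook at their own copy of the filings.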

Install and import Necessary Libraries

In [79]:
%pip install langchain-google-genai

Note: you may need to restart the kernel to use updated packages.


In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated in favor of pydantic itself
import pandas as pd
import json
import os
import re
from tqdm import tqdm
from time import sleep
from dotenv import load_dotenv

load_dotenv()
# TODO: Put other needed imports here



True

Configure the AI chat model that the company information will be fed into. You'll need to set up gcloud services [here](https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to) and create an API key, which the code below reads from the `GOOGLE_API_KEY` environment variable.

In [4]:
class EvaluationSample(BaseModel):
    question: str = Field(description="The question the LLM generates based on given stock info.")
    answer: str = Field(description="The answer to the LLM generated question")
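The parser configured in the next cell hands back the model's JSON as a plain dictionary, and as far as I can tell it does not guarantee that both fields are present and non-empty. A minimal sketch of a check that could run before a sample is stored is shown here; `validate_sample` is a hypothetical helper, not part of LangChain or this notebook.

```python
def validate_sample(output):
    """Return (question, answer) if the parsed output is usable, else raise ValueError."""
    if not isinstance(output, dict):
        raise ValueError(f"expected a JSON object, got {type(output).__name__}")
    cleaned = []
    for key in ("question", "answer"):
        value = output.get(key)
        # Reject missing keys, non-string values, and whitespace-only strings
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key!r}")
        cleaned.append(value.strip())
    return tuple(cleaned)
```

Raising `ValueError` keeps a malformed response from silently producing a row with a blank question or answer.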

In [5]:
llm = ChatGoogleGenerativeAI(
    model='gemini-1.5-flash',
    temperature=0.5 # Make model a bit more creative than default
)

parser = JsonOutputParser(pydantic_object=EvaluationSample)

prompt = PromptTemplate(
    template=SYS,
    input_variables=['query'],
    partial_variables={'format_instructions': parser.get_format_instructions()}
)

chain = prompt | llm | parser

# Confirm the key is loaded without echoing the secret itself
print("GOOGLE_API_KEY set:", bool(os.getenv("GOOGLE_API_KEY")))

GOOGLE_API_KEY set: True


In [6]:
labels = [
    "Question ID",
    "Company",
    "Question",
    "Answer"
]
entries = []
no = 1

for file in tqdm(os.listdir(STOCK_INFO_PATH)):
    with open(os.path.join(STOCK_INFO_PATH, file), 'r') as js:
        data = json.load(js)  # The with-block closes the file on exit

    # Remove items that aren't necessary
    delete_keys = [
        'cik',
        'htm_filing_link',
        'filing_html_index',
        'filing_type',
        'item_1', # Business description (probably known to model in most cases)
        'item_1B', # Staff Comments
        'item_3', # Legal Proceedings
        'item_4', # Mine Safety
        'item_9', # Accounting Details
        'item_9A', # Procedurals
        'item_10', # Executives
        'item_11', # Compensation of Exec
        'item_14', # More accounting
        'item_15', # Schedules
        'item_16' # Summary
    ]
    for key in delete_keys:
        data.pop(key, None) # Remove keys that won't help the LLM; missing keys are ignored

    # For Now, Generate 3 Samples per company
    for i in range(3):
        try:
            output = chain.invoke({'query': json.dumps(data)})
            question_id = f"qa_{data['company'].replace(' ', '_')}_{no}"
            entries.append([question_id, data['company'], output['question'], output['answer']])
            no += 1
        except Exception as e:
            # If the output returned by the LLM is no good, log the error and skip this pass (might happen once per run)
            tqdm.write(f"Skipping sample {i + 1} for {file}: {e!r}")
        
        sleep(5) # Sleep Some so I don't get rate limited

 24%|██▍       | 112/460 [51:00<2:10:12, 22.45s/it]
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised InternalServerError: 500 An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting.
100%|██████████| 460/460 [3:19:34<00:00, 26.03s/it]  
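The log above shows the client library retrying one server error on its own, while the loop drops any sample whose call still fails. A minimal sketch of an outer retry with exponential backoff, which would give a failed sample a few more chances before it is skipped, might look like this. `invoke_with_backoff` and its default values are assumptions for illustration, not something the notebook or LangChain provides.

```python
import time


def invoke_with_backoff(call, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Inside the loop, the call could then be written as `output = invoke_with_backoff(lambda: chain.invoke({'query': json.dumps(data)}))`. The `sleep` parameter exists only so the waiting can be replaced when testing.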


Output the sample questions into a pandas dataframe, exported into CSV format:

In [7]:
df = pd.DataFrame(
    data=entries,
    columns=labels
)

df.to_csv('./evaluation_dataset.csv', index=False)  # "Question ID" already identifies each row