# Generate Dataset from Your Documentation

![image](./imgs/‎GenAIEnterprises.‎015.png)

![image](./imgs/‎GenAIEnterprises.‎016.png)

![image](./imgs/‎GenAIEnterprises.‎017.png)

# Initial Setup

In [9]:
import boto3
import os
import openai
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from IPython.display import display, Markdown
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

def get_api_key(ssm_client, parameter_path):
    '''Get the OpenAI API key from the SSM Parameter Store'''
    try:
        response = ssm_client.get_parameter(
            Name=parameter_path,
            WithDecryption=True
        )
        return response['Parameter']['Value']
    except ssm_client.exceptions.ParameterNotFound:
        raise Exception(f'Parameter {parameter_path} not found in SSM Parameter Store')

# Create an SSM client using Boto3
region_name = os.getenv('AWS_REGION', 'us-east-1') 
ssm = boto3.client('ssm', region_name=region_name)

openai_api_key = get_api_key(ssm_client=ssm, parameter_path='/openai/api_key')
langchain_api_key = get_api_key(ssm_client=ssm, parameter_path='/langchain/api_key')


os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
os.environ['OPENAI_API_KEY'] = openai_api_key
os.environ["LANGCHAIN_API_KEY"] = langchain_api_key
openai.api_key = openai_api_key

# Default chat model (16k-context variant of gpt-3.5-turbo)
llm_model = "gpt-3.5-turbo-16k"

# Create the vector store and embedding function
embedding = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory='docs/chroma/',
    embedding_function=embedding
)

# Model Evaluation

In [11]:
baseline_llm_model = "gpt-3.5-turbo"
ft_llm_model = "ft:gpt-3.5-turbo-0613:neurons-lab::7xi7PZeg"

In [12]:
import json

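def load_data(path: str):
    """Load a JSONL file into a list of dicts, one dict per line."""
    data_dicts = []
    # Use a context manager so the file handle is always closed
    with open(path, "r") as fp:
        for line in fp:
            data_dicts.append(json.loads(line))
    return data_dicts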

train_dicts = load_data("data/qa_pairs_train.jsonl")
eval_dicts = load_data("data/qa_pairs_val.jsonl")
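Each line of these JSONL files is expected to hold one JSON object with a `query` and a `response` key, since those are the keys the later cells read. The sketch below shows that shape on a made-up in-memory sample and adds a basic key check. The sample records and the `parse_qa_lines` helper are illustrative assumptions, not part of the original notebook.

```python
import io
import json

# Hypothetical sample; the real data files are not shown in this notebook.
sample_jsonl = io.StringIO(
    '{"query": "What is a buddy?", "response": "A colleague who helps new starters."}\n'
    "\n"
    '{"query": "When is payday?", "response": "On the 28th of the month."}\n'
)

REQUIRED_KEYS = {"query", "response"}


def parse_qa_lines(lines):
    """Parse JSONL lines into dicts, skipping blank lines and checking keys."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_no} is missing keys: {sorted(missing)}")
        records.append(record)
    return records


records = parse_qa_lines(sample_jsonl)
print(len(records), records[0]["query"])
```

Validating the keys up front makes a malformed record fail at load time rather than halfway through a long evaluation run.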

## Manual Evaluation

In [82]:
from langchain.schema import HumanMessage, SystemMessage
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

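# MMR first fetches `fetch_k` candidates, then returns the `k` most relevant and diverse ones.
# NOTE: with fetch_k=5 and k=7, at most 5 documents can be returned. The values are kept
# because the results below were produced with them; prefer fetch_k >= k in new runs.
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={'fetch_k': 5, 'k': 7})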

def query_model(model, query, retriever=None, temperature=0):
    """Query the model with the given data and return the result.
    
    Args:
        model (str): The model name to use.
        query: The query to use.
        retriever (Retriever): The retriever to use.
        temperature (float): The temperature to use when querying the model.
    
    Returns:
        dict: The query and result."""

    llm = ChatOpenAI(model_name=model, temperature=temperature)

    if not retriever:
        messages = [
            SystemMessage(
                content="Provide answers to questions based on the company handbook to help employees quickly find the information they need. Ensure that your responses are concise and directly address the questions asked without providing additional information."
            ),
            HumanMessage(
                content=query
            ),
        ]
        result = llm(messages)
        return result.content
    else:
        qa_chain = RetrievalQA.from_chain_type(
            llm,
            retriever=retriever,
            return_source_documents=True,
        )
        result = qa_chain(query)
        return result["result"]
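The retriever in this cell uses maximal marginal relevance (MMR) search. As a rough illustration of how `fetch_k` and `k` interact, here is a minimal pure-Python sketch of greedy MMR selection. It is not LangChain's implementation: the `mmr_select` function, the `lambda_mult` default, and the toy 2-D vectors are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k, lambda_mult=0.5):
    """Greedily pick up to k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine_similarity(query_vec, candidate_vecs[i])
            # Redundancy is the similarity to the closest already-selected item
            redundancy = max(
                (cosine_similarity(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy candidate pool standing in for the fetch_k = 5 documents returned by the vector store
query = [1.0, 0.0]
pool = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]

picked = mmr_select(query, pool, k=7)
print(len(picked))
```

Because selection can only draw from the fetched pool, asking for `k=7` out of five candidates yields five results, which is why `fetch_k` is normally set at least as large as `k`.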
        
        


In [132]:

query = train_dicts[48]['query']
expected_response = train_dicts[48]['response']

display(Markdown("## Query:"))
display(Markdown(query))
display(Markdown("## Expected Response:"))
display(Markdown(expected_response))

display(Markdown("## Baseline LLM:"))
baseline_result = query_model(baseline_llm_model, query)
display(Markdown(f"### Result: \n {baseline_result}"))

display(Markdown("## FineTuned LLM:"))
ft_result = query_model(ft_llm_model, query)
display(Markdown(f"### Result: \n {ft_result}"))

display(Markdown("## FineTuned LLM + RAG:"))
ft_rag_result = query_model(ft_llm_model, query, retriever=retriever)
display(Markdown(f"### Result: \n {ft_rag_result}"))



## Query:

What is the purpose of a buddy in the Made Tech company?

## Expected Response:

The purpose of a buddy in the Made Tech company is to help new team members connect and find their way, especially during their first 3 months with the company. Buddies are friendly and helpful individuals who proactively check in with new starters, offer their help and answer questions, explain how things work in the company, and provide guidance and support. They play an important role in helping new team members integrate into Made Tech.

## Baseline LLM:

### Result: 
 The purpose of a buddy in Made Tech is to provide support and guidance to new employees during their onboarding process.

## FineTuned LLM:

### Result: 
 The purpose of a buddy in the Made Tech company is to provide support and guidance to new joiners during their first few weeks. Buddies are responsible for helping new joiners settle in, answering any questions they may have, and assisting them in finding the right people to speak to. They also organize regular catch-ups with new joiners to check on their progress and well-being. Overall, buddies play a crucial role in making new joiners feel welcome and helping them navigate their initial experiences at Made Tech.

## FineTuned LLM + RAG:

### Result: 
 The purpose of a buddy in the Made Tech company is to help new team members connect and find their way, especially in their first 3 months with the company when everything is new to them. Buddies are friendly individuals who provide support and guidance to new team members, both in-person when back in the office (e.g., showing them good lunch places, taking them out for coffee or lunch, helping them navigate the local area) and remotely through video calls or "coffee chats". Being a buddy is an important role in helping new team members integrate into Made Tech.

## Embedding Distance

To measure semantic similarity (or dissimilarity) between a prediction and a reference label string, you can compute a vector distance between their two embedded representations using the `embedding_distance` evaluator.

Note: This returns a **distance** score, meaning that the lower the number, the **more** similar the prediction is to the reference, according to their embedded representation.
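To make the distance score concrete, the sketch below computes cosine distance (one minus cosine similarity) by hand on small made-up vectors. It assumes the evaluator is using a cosine metric, which is believed to be its default. Real embedding vectors have hundreds of dimensions, and the `cosine_distance` helper is illustrative only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Made-up 3-D "embeddings" for illustration
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_distance([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])

print(round(same_direction, 6), orthogonal, opposite)
```

The metric ignores vector length and depends only on direction, so a scaled copy of a vector has a distance of (approximately) zero.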

In [154]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.evaluation import load_evaluator

embedding_model = HuggingFaceEmbeddings()
hf_evaluator = load_evaluator("embedding_distance", embeddings=embedding_model)


In [155]:
hf_evaluator.evaluate_strings(prediction="I shall go", reference="I shan't go")

{'score': 0.5486443280477362}

In [156]:
hf_evaluator.evaluate_strings(prediction="I shall go", reference="I will go")

{'score': 0.21018880025138598}

In [142]:
display(Markdown("## Baseline LLM:"))
display(hf_evaluator.evaluate_strings(prediction=baseline_result, reference=expected_response))

display(Markdown("## FineTuned LLM:"))
display(hf_evaluator.evaluate_strings(prediction=ft_result, reference=expected_response))

display(Markdown("## FineTuned LLM + RAG:"))
display(hf_evaluator.evaluate_strings(prediction=ft_rag_result, reference=expected_response))

## Baseline LLM:

{'score': 0.060356865075396504}

## FineTuned LLM:

{'score': 0.025603949751298205}

## FineTuned LLM + RAG:

{'score': 0.02472507271870661}

In [90]:
import pandas as pd
from tqdm.notebook import tqdm

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

pd.set_option("display.max_colwidth", None)

EVAL_PROMPT_TMPL = PromptTemplate.from_template(
    """\
We provide a question and the 'ground-truth' answer. We also provide \
the predicted answer.

Evaluate whether the predicted answer is correct, given its similarity \
to the ground-truth. If details provided in predicted answer are reflected \
in the ground-truth answer, return "YES". To return "YES", the details don't \
need to exactly match. Be lenient in evaluation if the predicted answer \
is missing a few details. Try to make sure that there are no blatant mistakes. \
Otherwise, return "NO".

Question: {question}
Ground-truth Answer: {gt_answer}
Predicted Answer: {pred_answer}
Evaluation Result: \
"""
)


def eval_match_gt(query, gt_response, pred_response):
    llm = ChatOpenAI(model_name="gpt-4", temperature=0)
    fmt_prompt = EVAL_PROMPT_TMPL.format(
        question=query,
        gt_answer=gt_response,
        pred_answer=pred_response,
    )

    messages = [
        HumanMessage(
            content=fmt_prompt
        ),
    ]

    result = llm(messages)
    # Check only the reply text; str(result) would also include the message's other fields
    if "yes" in result.content.lower():
        return 1
    else:
        return 0


def run_evals(eval_dicts):
    """Run evals - fine-tuned model, RAG system, and base model."""

    raw_responses = []
    for eval_dict in tqdm(eval_dicts):
        print(eval_dict)
        gt_response = eval_dict["response"]
        ft_rag_response = str(query_model(ft_llm_model, eval_dict["query"], retriever=retriever))
        ft_response = str(query_model(ft_llm_model, eval_dict["query"]))
        rag_response = str(query_model(baseline_llm_model, eval_dict["query"], retriever=retriever))
        base_response = str(query_model(baseline_llm_model, eval_dict["query"]))

        # try evaluations
        ft_rag_eval = eval_match_gt(eval_dict["query"], gt_response, ft_rag_response)
        ft_eval = eval_match_gt(eval_dict["query"], gt_response, ft_response)
        rag_eval = eval_match_gt(eval_dict["query"], gt_response, rag_response)
        base_eval = eval_match_gt(eval_dict["query"], gt_response, base_response)

        response_dict = {
            "query": eval_dict["query"],
            "gt_response": gt_response,
            "ft_rag_response": ft_rag_response,
            "ft_response": ft_response,
            "rag_response": rag_response,
            "base_response": base_response,
            "ft_rag_eval": ft_rag_eval,
            "ft_eval": ft_eval,
            "rag_eval": rag_eval,
            "base_eval": base_eval,
        }

        raw_responses.append(response_dict)

    raw_responses_df = pd.DataFrame(raw_responses)

    eval_dict = {
        "ft_rag_score": raw_responses_df["ft_rag_eval"].mean(),
        "ft_score": raw_responses_df["ft_eval"].mean(),
        "rag_score": raw_responses_df["rag_eval"].mean(),
        "base_score": raw_responses_df["base_eval"].mean(),
    }

    sub_responses_df = raw_responses_df[
        [
            "query",
            "gt_response",
            "ft_rag_response",
            "ft_response",
            "rag_response",
            "base_response",
        ]
    ]

    return eval_dict, raw_responses_df, sub_responses_df
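The judge in `eval_match_gt` is asked to return "YES" or "NO", and the verdict is detected with a substring check. A substring check can misfire when a "NO" reply happens to contain a word such as "yesterday". One possible stricter alternative, shown here as a sketch with a hypothetical `parse_verdict` helper, is to look only at the first token of the reply.

```python
def parse_verdict(text: str) -> int:
    """Return 1 if the judge's reply starts with YES (case-insensitive), else 0."""
    tokens = text.strip().strip("\"'").split()
    if not tokens:
        return 0
    # Drop punctuation and quotes that may cling to the first word
    first = tokens[0].strip(".,:;!\"'").upper()
    return 1 if first == "YES" else 0


# Made-up judge replies for illustration
replies = ["YES", "Yes.", '"YES"', "NO, the answer refers to yesterday", ""]
verdicts = [parse_verdict(reply) for reply in replies]
print(verdicts)
```

The trade-off is that a reply which buries its verdict later in the text is counted as 0, so this suits prompts like the one above that ask for the verdict as the whole answer.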

In [94]:
eval_dict, raw_response_df, sub_responses_df = run_evals(eval_dicts)

  0%|          | 0/100 [00:00<?, ?it/s]

{'query': 'What is the policy at Made Tech regarding false allegations of discrimination?', 'response': 'Making a false allegation deliberately and in bad faith at Made Tech will be treated as misconduct.'}
{'query': 'What types of working hours patterns are supported by Made Tech?', 'response': 'Made Tech supports reduced working hours patterns, but does not support compressed hours or other patterns such as annualised hours.'}
{'query': 'What is the other path to a salary adjustment at Made Tech apart from the annual review?', 'response': 'The other path to a salary adjustment at Made Tech apart from the annual review is through the Internal mobility process, which includes promotions or lateral moves within the financial year.'}
{'query': 'What happens if the 28th of the month falls on a weekend or during a bank holiday at Made Tech?', 'response': 'If the 28th of the month falls on a weekend or during a bank holiday at Made Tech, the payday is moved to the Friday before.'}
{'query':

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..


{'query': "How should chalet members record their time spent on 'New business and revenue' activities?", 'response': "Chalet members should record their time spent on 'New business and revenue' activities as 'Bench/Chalet' in their timesheets."}
{'query': 'What are the expectations for Made Tech team members while working remotely in terms of team ceremonies and Slack communication?', 'response': 'While working remotely, Made Tech team members are expected to ensure that they are available for team ceremonies as normal and are able to dial in without interruption. They are also expected to be a good Slack citizen and check in regularly and visibly with their team throughout the working day in public channels. Additionally, they should ensure that they are not expected for any in-person meetings or sessions while working remotely.'}
{'query': 'What are the benefits for Made Tech employees to link their Microsoft Certification Account to the Made Tech Azure Partner Portal?', 'response': 

Failed to patch https://api.langchain.plus/runs/307efa4c-0157-4851-9dae-7c7c8486bb85 in LangSmith API. 400 Client Error: Bad Request for url: https://api.langchain.plus/runs/307efa4c-0157-4851-9dae-7c7c8486bb85
{"detail":"Cannot update a run that has already finished"}


{'query': 'What are the monthly meet-ups organized by the People team at Made Tech?', 'response': 'The monthly meet-ups organized by the People team at Made Tech are social activities that take place outside of the work environment and office spaces. These meet-ups are a great way for the team to spend some social time together and meet new people. They are organized around the current office locations of Made Tech, which include London, Manchester, Bristol, and Swansea.'}
{'query': 'What is the recommended procedure before applying for the Cycle to Work scheme at Made Tech?', 'response': 'It is recommended to find your perfect bike and accessories before applying for the Cycle to Work scheme at Made Tech. This way, you will know exactly what amount to apply for and avoid applying for too much or too little. It is important to get this right because once you have been approved, you cannot amend your application.'}
{'query': 'How can Made Tech employees adjust their pension contribution

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..


{'query': "What activities are encouraged during 'Chalet Time' at Made Tech?", 'response': "During 'Chalet Time' at Made Tech, activities that are encouraged include building skills, contributing to Communities of Practice, and growing Made Tech's business. It is a time to focus on other activities that are not directly related to client work."}
{'query': 'What are the approval processes in place for expense claims at Made Tech?', 'response': 'Claims that align with the policy will be approved, but any claims that are outside of the guidelines will need approval from the Head of Department. Expenses that are aligned to a specific delivery or account should be agreed with the Delivery Manager and/or account team in advance. Expenses for events and conferences need to be approved in advance by the relevant Head of Department.'}
{'query': 'What is the preferred format for internal office documents at Made Tech?', 'response': 'The preferred format for internal office documents at Made Tech

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..


{'query': 'How can chalet members contribute to new revenue and business as per the Made Tech handbook?', 'response': 'Chalet members can contribute to new revenue and business by participating in various activities outlined in the Made Tech handbook. The Bids team will provide details of bids that chalet members can contribute to, allowing them to contribute to the revenue generation process. Additionally, the Marketing team will share tasks that chalet members can do to promote Made Tech, such as creating blog posts or contacting relevant teams for input. These tasks will be posted in the Chalet Community Board, and chalet members can engage in them to contribute to the growth of the business.'}
{'query': 'What platforms should every member of Made Tech have access to, with two-factor authentication?', 'response': 'Every member of Made Tech should have access to Google Mail, Github, Slack, 1Password, Trello (unless using Google Auth to login to Trello), and HiBob, all with two-factor

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..


{'query': 'What is the process to submit expense claims at Made Tech?', 'response': "Expense claims at Made Tech need to be submitted via Xero. If an employee does not have a login for Xero, they can drop a message in #ops-finance or email finance@madetech.com to request one. All expense claims must be made within 90 days and will typically be paid within 2 weeks. There are different approval processes in place depending on the type of expense. Normal expenses that align with the policy will be approved, but any claims that are outside of the guidelines will require approval from the employee's Head of Department. Expenses that are aligned to a specific delivery or account should be agreed upon with the Delivery Manager and/or account team in advance. Expenses for events and conferences need to be approved in advance by the relevant Head of Department. For any other information, employees should refer to the policy document and direct any questions to #ops-finance or finance@madetech.c

In [95]:
display(eval_dict)
display(sub_responses_df)

{'ft_rag_score': 0.84, 'ft_score': 0.59, 'rag_score': 0.75, 'base_score': 0.46}

Unnamed: 0,query,gt_response,ft_rag_response,ft_response,rag_response,base_response
0,What is the policy at Made Tech regarding false allegations of discrimination?,Making a false allegation deliberately and in bad faith at Made Tech will be treated as misconduct.,"Making a false allegation deliberately and in bad faith will be treated as misconduct according to Made Tech's equality, diversity, and inclusion policy.",Made Tech takes false allegations of discrimination seriously and they may result in disciplinary action.,Making a false allegation deliberately and in bad faith will be treated as misconduct according to Made Tech's policy.,"Made Tech takes false allegations of discrimination very seriously. Any employee found to have made a false allegation of discrimination may be subject to disciplinary action, up to and including termination of employment."
1,What types of working hours patterns are supported by Made Tech?,"Made Tech supports reduced working hours patterns, but does not support compressed hours or other patterns such as annualised hours.","Made Tech supports a variety of working hours patterns, including but not limited to:\n- 9-day fortnight\n- Part-time\n- Job share\n- Flexible hours\n- Remote working\n\nThey also offer a flexible benefits package that allows individuals to choose the benefits that suit their lifestyle, including the option to buy and sell up to 5 days of annual leave.","Made Tech supports a variety of working hours patterns, including full-time, part-time, and job share arrangements. They also offer flexible working hours, allowing employees to start and finish work at different times.",The provided context does not mention specific working hour patterns supported by Made Tech. It is best to reach out to Made Tech directly for information on their supported working hour patterns.,"Made Tech supports flexible working hours patterns, including full-time, part-time, and job-sharing arrangements."
2,What is the other path to a salary adjustment at Made Tech apart from the annual review?,"The other path to a salary adjustment at Made Tech apart from the annual review is through the Internal mobility process, which includes promotions or lateral moves within the financial year.","The other path to a salary adjustment at Made Tech apart from the annual review is through the Internal mobility process, which includes promotions or lateral moves within the financial year.",The other path to a salary adjustment at Made Tech apart from the annual review is through the promotion process.,"The other path to a salary adjustment at Made Tech is through the Internal mobility process, which includes promotions or lateral moves within the financial year.","Employees can request a salary adjustment outside of the annual review process by submitting a formal request to their line manager. This request should include a clear justification for the adjustment, such as significant changes in job responsibilities or market conditions. The line manager will review the request and make a decision based on the company's salary adjustment policy."
3,What happens if the 28th of the month falls on a weekend or during a bank holiday at Made Tech?,"If the 28th of the month falls on a weekend or during a bank holiday at Made Tech, the payday is moved to the Friday before.","If the 28th of the month falls on a weekend or during a bank holiday at Made Tech, the payday will be moved to the last working day before the 28th.","If the 28th of the month falls on a weekend or during a bank holiday at Made Tech, the company will pay employees on the last working day before the 28th.","If the 28th of the month falls on a weekend or during a bank holiday at Made Tech, the holiday requests will be reviewed and confirmed on the next working day after the weekend or bank holiday.","If the 28th of the month falls on a weekend or during a bank holiday at Made Tech, the salary payment will be made on the last working day before the 28th."
4,What happens to the old laptops once they have been returned at Made Tech?,"The old laptops at Made Tech are either traded in or recycled, depending on their condition. If a laptop is not eligible for trade-in, such as if it has an iCloud lock, is too damaged, not working, or has no value, it undergoes a full disk wipe and is recycled by Green Machine Computers. On the other hand, if a laptop has trade-in value, it undergoes a full disk wipe, and preparations are made to maximize its trade-in valuation, such as removing stickers and cleaning the device. The laptop is then valued through Stormfront's trade-in service and sent to a trade-in partner for analysis. Any payment received from the trade-in is credited to Made Tech's next invoice.","Once the old laptops have been returned at Made Tech, they are either traded in or recycled, depending on their condition and trade-in value. If a laptop is not eligible for trade-in (e.g. has icloud lock, is too damaged, not working, no value), then Systemagic performs a full disk wipe and the laptop is recycled by Green Machine Computers. If a laptop has a trade-in value, then Systemagic performs a full disk wipe, prepares it to be in the best condition to maximize trade-in valuation (e.g. stickers removed, residue scrubbed off, device cleaned), and sends it to a trade-in partner for analysis. Any payment received from the trade-in is credited to MadeTech's next invoice.",The old laptops at Made Tech are sent to a third-party company called Tier 1. Tier 1 then refurbishes the laptops and sells them to new owners.,"Once the old laptops have been returned at Made Tech, they are either traded in or recycled, depending on their condition and trade-in value. If a laptop is not eligible for trade-in (e.g., has icloud lock, is too damaged, not working, no value), it undergoes a full disk wipe and is recycled by Green Machine Computers. 
If a laptop has a trade-in value, it also undergoes a full disk wipe and is prepared to be in the best condition for trade-in valuation. Stickers are removed, residue is scrubbed off, and the device is cleaned. The laptop is then valued through Stormfront's trade-in service and sent to a trade-in partner for analysis. Any payment received from the trade-in is credited to Made Tech's next invoice.","Once old laptops are returned at Made Tech, they are securely wiped of all data and then either recycled or donated to a charitable organization."
...,...,...,...,...,...,...
95,How should chalet members record their time spent on 'Learning time' activities?,Chalet members should record their time spent on 'Learning time' activities as 'Bench/Chalet' in their timesheets.,Chalet members should record their time spent on 'Learning time' activities as 'Bench/Chalet' in their timesheets.,Chalet members should record their time spent on 'Learning time' activities as 'Chalet' in their timesheets.,Chalet members should record their time spent on 'Learning time' activities as 'Learning Time' in their timesheets.,Chalet members should record their time spent on 'Learning time' activities by logging it in the designated time tracking system or by submitting a timesheet to their supervisor.
96,What are the expectations from non-engineers during the on-boarding process at Made Tech?,Non-engineers at Made Tech are expected to complete the Github tutorial during the on-boarding process.,The information provided does not specify the expectations from non-engineers during the onboarding process at Made Tech.,"Non-engineers at Made Tech are expected to complete the ""Introduction to Agile"" course on the Made Tech Academy platform during the on-boarding process.","The provided context does not explicitly mention the expectations from non-engineers during the onboarding process at Made Tech. Therefore, it is unclear what specific expectations non-engineers would have during onboarding. It would be best to reach out to Made Tech directly for more information on the onboarding process and expectations for non-engineers.","During the on-boarding process at Made Tech, non-engineers are expected to familiarize themselves with the company's values, culture, and processes. They should actively participate in team meetings, ask questions, and seek guidance from their colleagues. Non-engineers should also complete any required training and comply with company policies and procedures."
97,What is the process to submit expense claims at Made Tech?,"Expense claims at Made Tech need to be submitted via Xero. If an employee does not have a login for Xero, they can drop a message in #ops-finance or email finance@madetech.com to request one. All expense claims must be made within 90 days and will typically be paid within 2 weeks. There are different approval processes in place depending on the type of expense. Normal expenses that align with the policy will be approved, but any claims that are outside of the guidelines will require approval from the employee's Head of Department. Expenses that are aligned to a specific delivery or account should be agreed upon with the Delivery Manager and/or account team in advance. Expenses for events and conferences need to be approved in advance by the relevant Head of Department. For any other information, employees should refer to the policy document and direct any questions to #ops-finance or finance@madetech.com.","Expense claims at Made Tech need to be submitted via Xero. If you don't have a login for Xero, you can drop a message in #ops-finance or email finance@madetech.com to request one. \n\nAll claims must be made within 90 days and will typically be paid within 2 weeks. \n\nThere are different approval processes depending on the type of expense:\n- Normal expenses that align with the policy will be approved. However, any claims that are outside of the guidelines will need approval from your Head of Department.\n- Expenses that are aligned to a specific delivery or account (e.g. client entertainment, travel to meetings) should be agreed with the Delivery Manager and/or account team in advance.\n- Expenses for events and conferences need to be approved in advance by the relevant Head of Department.\n\nFor more information, you can read the policy document provided and direct any questions to #ops-finance or finance@madetech.com.","To submit expense claims at Made Tech, you need to use the Xero Expenses app. 
You can download this app on your phone or access it through the Xero website. When submitting an expense claim, you should select the appropriate category for your expense. If you are unsure about which category to choose, you can refer to the Expense Category Guide. It is important to note that all expenses should be submitted within 30 days of the expense being incurred. If you have any questions or need assistance with your expense claim, you can reach out to the Finance team.","Expense claims at Made Tech need to be submitted via Xero. If you don't have a login for Xero, you can drop a message in #ops-finance or email finance@madetech.com to request access. \n\nClaims must be made within 90 days and will typically be paid within 2 weeks. \n\nThere are different approval processes depending on the type of expense:\n- Normal expenses: Claims that align with the policy will be approved. Claims outside of the guidelines will need approval from your Head of Department.\n- Account expenses: Expenses aligned to a specific delivery or account should be agreed with the Delivery Manager and/or account team in advance.\n- Events and conferences: Expenses need to be approved in advance by the relevant Head of Department.\n\nFor more information, please refer to the policy document and direct any questions to #ops-finance or finance@madetech.com.","To submit expense claims at Made Tech, employees should follow these steps:\n\n1. Fill out the expense claim form accurately and completely.\n2. Attach all relevant receipts and supporting documentation.\n3. Submit the completed form and receipts to the Finance department.\n4. Allow for processing time, and reimbursement will be made via the chosen payment method."
98,How can employees at Made Tech expense Railcards for business travel?,"To expense Railcards for business travel at Made Tech, employees need to follow these steps:\n\n1. Purchase a railcard.\n2. Add the railcard to your TravelPerk account.\n3. Take a screenshot to show that the railcard has been added.\n4. Share the image and receipt on Xero when submitting the expense claim.\n\nIt's important to note that Made Tech will reimburse the price of any yearly railcard purchased from April 26th, 2023 onwards. If an employee leaves Made Tech, they will not be charged for the railcard and do not need to return it. However, railcards cannot be expensed during the notice period.","To expense Railcards for business travel at Made Tech, employees need to purchase a railcard and add it to their TravelPerk account. The price of any yearly railcard purchased from April 26th, 2023 onwards will be reimbursed by Made Tech. Once the railcard has been added, employees should take a screenshot to show that it has been added to their TravelPerk account. The image and receipt of the railcard purchase can then be shared on Xero when expensing it back. It's important to note that railcards cannot be expensed during the notice period and if an employee leaves Made Tech, they will not be charged for the railcard and do not need to return it.",Employees at Made Tech can expense Railcards for business travel by purchasing the Railcard themselves and then claiming the cost back through the expenses system. The Railcard should be purchased as a personal expense and not through the company.,"Employees at Made Tech can expense Railcards for business travel by following these steps:\n\n1. Purchase a yearly railcard from April 26th, 2023 onwards.\n2. Add the railcard to your TravelPerk account.\n3. Take a screenshot to show that the railcard has been added.\n4. 
Share the image of the railcard and the receipt on Xero when expensing it back.\n\nPlease note that railcards cannot be expensed during your notice period. If you leave Made Tech, you will not be charged for the railcard and you do not need to return it.",Employees at Made Tech can expense Railcards for business travel by submitting the original receipt along with the expense claim form to the Finance department.


# Evaluate LLM models in LangSmith

## Create LangSmith LLM Dataset

In [96]:

from langsmith import Client

client = Client()
dataset = client.create_dataset(dataset_name="MadeTech Dataset", description="Made Tech Handbook dataset with LLM inputs and outputs", data_type="llm")

for eval_dict in eval_dicts:
    client.create_example(
        inputs={"input": eval_dict["query"]},
        outputs={"output": eval_dict["response"]},
        dataset_id=dataset.id,
        # Or dataset_name="MadeTech Dataset"
    )
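
The loop above assumes that every entry in `eval_dicts` carries a non-empty `query` and `response`. A minimal sketch of a pre-check that could run before uploading: `to_example_payloads` is a hypothetical helper (not part of LangSmith) that builds the same `inputs`/`outputs` dictionaries and counts the records it skips. The sample records are illustrative only.

```python
def to_example_payloads(eval_dicts):
    """Build (inputs, outputs) pairs in the shape passed to client.create_example.

    Hypothetical helper for illustration: records with a missing or empty
    "query" or "response" are skipped instead of being uploaded.
    """
    payloads = []
    skipped = 0
    for record in eval_dicts:
        query = (record.get("query") or "").strip()
        response = (record.get("response") or "").strip()
        if not query or not response:
            skipped += 1
            continue
        payloads.append(({"input": query}, {"output": response}))
    return payloads, skipped


# Illustrative records only, not taken from the handbook dataset
sample = [
    {"query": "How are expenses submitted?", "response": "Via Xero."},
    {"query": "", "response": "orphan answer"},
    {"query": "question without an answer"},
]
payloads, skipped = to_example_payloads(sample)
print(len(payloads), skipped)
```

Each `(inputs, outputs)` pair can then be passed to `client.create_example(inputs=..., outputs=..., dataset_id=dataset.id)` exactly as in the cell above.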

## Evaluate Baseline LLM in LangSmith

In [None]:
import langsmith
from langchain.chat_models import ChatOpenAI
from langchain import smith

# Replace with the LLM you want to test
my_llm = ChatOpenAI(model=baseline_llm_model, temperature=0)

# Define the evaluators to apply
eval_config = smith.RunEvalConfig(
    evaluators=[
        "cot_qa"
    ],
    custom_evaluators=[],
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0)
)

client = langsmith.Client()
chain_results = client.run_on_dataset(
    dataset_name="MadeTech Dataset",
    llm_or_chain_factory=my_llm,
    evaluation=eval_config,
    project_name="test-baseline-llm-22",
    verbose=True,
    tags=["baseline", "llm"],
)

## Evaluate Fine-Tuned LLM in LangSmith

In [109]:
import langsmith
from langchain.chat_models import ChatOpenAI
from langchain import smith

# Replace with the LLM you want to test
my_llm = ChatOpenAI(model=ft_llm_model, temperature=0)

# Define the evaluators to apply
eval_config = smith.RunEvalConfig(
    evaluators=[
        "cot_qa"
    ],
    custom_evaluators=[],
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0)
)

client = langsmith.Client()
chain_results = client.run_on_dataset(
    dataset_name="MadeTech Dataset",
    llm_or_chain_factory=my_llm,
    evaluation=eval_config,
    project_name="test-fine-tuned-llm-23",
    verbose=True,
    tags=["ft", "llm"],
)

View the evaluation results for project 'test-fine-tuned-llm-23' at:
https://smith.langchain.com/o/f09d95d4-78e8-5dcb-a136-09de2b589be4/projects/p/57515a66-a598-465c-9871-6e9bdd9d3790
[--------------------------------------->          ] 81/100

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-f5H7bPv9fgPptJoKt2f4cPHG on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues..


[------------------------------------------->      ] 89/100

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-f5H7bPv9fgPptJoKt2f4cPHG on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues..


[--------------------------------------------->    ] 93/100

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-f5H7bPv9fgPptJoKt2f4cPHG on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues..


[------------------------------------------------->] 100/100
 Eval quantiles:
                         0.25  0.50  0.75
COT Contextual Accuracy   0.0   0.0   1.0
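
Assuming the `cot_qa` evaluator grades each answer as 1 (correct) or 0 (incorrect), and that the quantiles are computed with linear interpolation, the table only shows where the share of correct answers sits relative to 25%, 50%, and 75%. A median of 0.0 with an upper quartile of 1.0 would then mean that more than a quarter but fewer than half of the 100 answers were graded correct. The sketch below reproduces that pattern with made-up score lists. The counts are hypothetical and are not the actual per-example results.

```python
from statistics import quantiles


def quartiles(scores):
    """Return the 25th, 50th and 75th percentiles using linear interpolation."""
    return quantiles(scores, n=4, method="inclusive")


# Hypothetical gradings for 100 examples (1 = correct, 0 = incorrect)
forty_correct = [1] * 40 + [0] * 60
seventy_correct = [1] * 70 + [0] * 30

print(quartiles(forty_correct))    # [0.0, 0.0, 1.0]
print(quartiles(seventy_correct))  # [0.0, 1.0, 1.0]
```

Because quartiles of binary scores are this coarse, the mean score (the fraction of correct answers) in the LangSmith project view is a more precise number to compare runs on.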


# RAG Evaluation

## Create LangSmith KV Dataset

In [110]:
from langsmith import Client

client = Client()
dataset = client.create_dataset(
    dataset_name="MadeTech Dataset KV",
    description="Made Tech Handbook dataset with key-value inputs and outputs",
    data_type="kv",
)

for eval_dict in eval_dicts:
    client.create_example(
        inputs={"input": eval_dict["query"]},
        outputs={"output": eval_dict["response"]},
        dataset_id=dataset.id,
        # Or dataset_name="MadeTech Dataset KV"
    )

## Evaluate Baseline RAG in LangSmith

In [116]:
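import langsmith
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain import smith

# MMR retriever over the Chroma store created in the initial setup.
# Note: fetch_k (candidates fetched) is smaller than k (documents requested),
# so MMR can likely select at most 5 documents per query.
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={'fetch_k': 5, 'k': 7})
llm = ChatOpenAI(model_name=baseline_llm_model, temperature=0)

def chain_constructor():
    """Factory that returns a fresh RetrievalQA chain for run_on_dataset."""
    return RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
    )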

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
)

# Define the evaluators to apply
eval_config = smith.RunEvalConfig(
    evaluators=[
        "cot_qa"
    ],
    custom_evaluators=[],
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0)
)

client = langsmith.Client()
chain_results = client.run_on_dataset(
    dataset_name="MadeTech Dataset KV",
    llm_or_chain_factory=chain_constructor,
    evaluation=eval_config,
    project_name="test-rag-baseline-27",
    verbose=True,
    tags=["rag", "baseline"],
)


View the evaluation results for project 'test-rag-baseline-27' at:
https://smith.langchain.com/o/f09d95d4-78e8-5dcb-a136-09de2b589be4/projects/p/8af439de-859f-47df-8ec3-c092afa78729
[------------------------------------------------> ] 98/100

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).


[------------------------------------------------->] 99/100

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).


[------------------------------------------------->] 100/100
 Eval quantiles:
                         0.25  0.50  0.75
COT Contextual Accuracy   0.0   1.0   1.0
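
The retriever in this cell uses MMR (maximal marginal relevance) with `fetch_k=5` and `k=7`. As generally implemented, MMR first fetches `fetch_k` candidates by similarity and then picks up to `k` of them, trading relevance to the query against redundancy with documents already picked. With `fetch_k` smaller than `k`, no more than `fetch_k` documents can be selected. The library-free sketch below illustrates the selection step. The name `mmr_select`, the `lambda_mult` default, and the toy vectors are assumptions for illustration, and this is not LangChain's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy 2-D "embeddings": five fetched candidates (fetch_k=5), asking for k=7
query = [1.0, 0.0]
fetched = [[1.0, 0.0], [0.99, 0.1], [0.7, 0.7], [0.0, 1.0], [0.9, -0.3]]
picked = mmr_select(query, fetched, k=7)
print(len(picked))  # 5: k cannot exceed the number of fetched candidates
```

If seven context documents are actually wanted, `fetch_k` would need to be at least 7, and usually noticeably larger so that MMR has alternatives to choose from.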


## Evaluate Fine-Tuned RAG in LangSmith

In [121]:
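import langsmith
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain import smith

# Same MMR retriever settings as the baseline RAG run, so only the LLM differs.
# Note: fetch_k (candidates fetched) is smaller than k (documents requested),
# so MMR can likely select at most 5 documents per query.
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={'fetch_k': 5, 'k': 7})
llm = ChatOpenAI(model_name=ft_llm_model, temperature=0)

def chain_constructor():
    """Factory that returns a fresh RetrievalQA chain for run_on_dataset."""
    return RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
    )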

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
)

# Define the evaluators to apply
eval_config = smith.RunEvalConfig(
    evaluators=[
        "cot_qa"
    ],
    custom_evaluators=[],
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0)
)

client = langsmith.Client()
chain_results = client.run_on_dataset(
    dataset_name="MadeTech Dataset KV",
    llm_or_chain_factory=chain_constructor,
    evaluation=eval_config,
    project_name="test-rag-ft-28",
    verbose=True,
    tags=["ft", "rag"],
)

View the evaluation results for project 'test-rag-ft-28' at:
https://smith.langchain.com/o/f09d95d4-78e8-5dcb-a136-09de2b589be4/projects/p/c9be9a2b-0c22-4e97-9512-e605d9b6a17d
[----------------------->                          ] 48/100

Failed to post https://api.langchain.plus/runs in LangSmith API. 409 Client Error: Conflict for url: https://api.langchain.plus/runs
{"detail":"Run with this id already exists"}
Error in LangChainTracer.on_chat_model_start callback: Failed to post https://api.langchain.plus/runs in LangSmith API. 409 Client Error: Conflict for url: https://api.langchain.plus/runs
{"detail":"Run with this id already exists"}
Error in LangChainTracer.on_chat_model_start callback: Failed to post https://api.langchain.plus/runs in LangSmith API. 409 Client Error: Conflict for url: https://api.langchain.plus/runs
{"detail":"Run with this id already exists"}
Error in LangChainTracer.on_chain_end callback: Failed to patch https://api.langchain.plus/runs/11013266-88cc-4d08-a861-80dfa6beb8cf in LangSmith API. 400 Client Error: Bad Request for url: https://api.langchain.plus/runs/11013266-88cc-4d08-a861-80dfa6beb8cf
{"detail":"Cannot update a run that has already finished"}


[----------------------->                          ] 49/100

Error in LangChainTracer.on_chain_start callback: Failed to post https://api.langchain.plus/runs in LangSmith API. 409 Client Error: Conflict for url: https://api.langchain.plus/runs
{"detail":"Run with this id already exists"}


[------------------------------------------->      ] 87/100

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).


[------------------------------------------------->] 100/100
 Eval quantiles:
                         0.25  0.50  0.75
COT Contextual Accuracy   0.0   1.0   1.0
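
The `Retrying ... in 4.0 seconds` lines in the outputs above show the OpenAI calls being retried after rate-limit and timeout errors. The runs still reach 100/100, only more slowly. The general pattern behind such messages is retry with exponential backoff. A minimal sketch of that pattern follows. `call_with_backoff`, `TransientError`, and the delay values are hypothetical stand-ins, not LangChain's or OpenAI's actual implementation.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a rate-limit or timeout error (hypothetical)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=10.0, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing waits."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Small random jitter so parallel workers do not retry in lockstep
            sleep(delay + random.uniform(0, 0.1 * delay))


# Demo: a function that fails twice before succeeding
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"


waits = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, len(waits))
```

With a low tokens-per-minute quota such as the one reported in the rate-limit messages, backoff only smooths over bursts. Running the evaluation with less concurrency is the more direct way to stay under the limit.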
