# Chatbot Project Documentation

## Introduction
The goal of this Jupyter Notebook is to enable new team members to easily prototype a new chain using LangChain. This chain aims to improve the performance of an existing chatbot or test its current performance. The notebook will serve as both documentation and a prototyping tool, ensuring a comprehensive understanding of the codebase and facilitating smooth onboarding for new developers.
This chatbot is designed to interact with the Central Bank of Bahrain (CBB) rulebook, providing users with accurate and relevant information based on their queries. By leveraging LangChain, we can seamlessly integrate multiple components to create a robust chatbot solution.


# Table of Contents

1. [Project Setup](#Project-Setup)
    - [Install dependencies](#Install-dependencies)
2. [Initializing a configuration file](#Initializing-a-configuration-file)
3. [Understanding the Codebase](#Understanding-the-Codebase)
4. [Load environment configuration](#Load-environment-configuration) 
5. [Running Streamlit App](#Running-Streamlit-App)
6. [Chain Generation and Execution](#Chain-Generation-and-Execution)
7. [Prototyping and Testing](#Prototyping-and-Testing)
8. [Simple Test Evaluation](#Simple-Test-Evaluation)


# Project Setup

## Prerequisites

1. Python Installation
2. Jupyter Installation
3. Anaconda Installation (Optional)

To run this project, you need to install the required dependencies. Use the following commands to set up your environment:

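1. **Create and activate a virtual environment** (optional but recommended):
```bash
python -m venv env
source env/bin/activate  # on Windows: env\Scripts\activate
```

2. **Install dependencies** with the command in the next section.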


## Install dependencies
Run the following command to install all required packages:


In [None]:
!pip install -r ../requirements.txt

# Initializing a configuration file

#### Skip this step if you already have a configuration file

Files:
- config_{experiment-name}.json

Functionality:
- Creates the configuration file used to generate the chain or run tests.

**The arguments for chain_config must match the parameters passed to the specified solution class.**

In the following code, choose configuration variables for your experiment:


In [1]:
import json

# Define your configuration dictionary

config = {
    "exe_simple_test": False,
    "exe_automated_test": True,
    "exe_streamlit_app": False,
    "chain_config":{
        "solution_class":"GPTPineconeSolution",
        "args":{
            "pinecone_index_name":"rulebook-small-v0",
            "embed_model":"text-embedding-3-small",
            "gen_model":"gpt-4o",
            "search_type": "mmr",
            "search_k": 8,
            "search_lambda": 0.25,
            "pinecone_key": "<your-pinecone-api-key>"  # placeholder: supply your own key and never commit a real one
        }
    },
    "automated_test_config":{
        "evaluators":["de_contextual_recall","de_faithfulness","de_noise_awarness","de_correctness"],
        "dataset_name":"cbb-test-dataset-v2",
        "per_q_repeat":1,
        "split_data":True,
        "splits": ["specific"],
        "evaluator_model":"gpt-4o",
        "experiment_name": "config_{experiment-name}" # name your experiment
    }
}

# Define your filename
json_file_path = f"config/{config['automated_test_config']['experiment_name']}.json"

# Write the configuration to a JSON file
with open(json_file_path, 'w') as f:
    json.dump(config, f, indent=4)

print(f"Configuration saved to {json_file_path}")

Configuration saved to config/config_{experiment-name}.json
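
Because the `args` in `chain_config` must match the parameters of the chosen solution class, a signature check before building the chain can catch mismatches early. The sketch below is a minimal illustration that uses only the standard library. `DemoSolution` and `check_chain_args` are hypothetical names that are not part of the project, and the real solution classes may accept different parameters.

```python
import inspect


class DemoSolution:
    """Hypothetical stand-in for a solution class (not part of the project)."""

    def __init__(self, pinecone_index_name, embed_model, gen_model, search_k=4):
        self.pinecone_index_name = pinecone_index_name
        self.embed_model = embed_model
        self.gen_model = gen_model
        self.search_k = search_k


def check_chain_args(solution_cls, args):
    """Return (missing, unexpected) argument names for the class constructor."""
    params = inspect.signature(solution_cls).parameters
    # Only consider parameters that can be passed by keyword
    named = {
        name: p for name, p in params.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    required = {name for name, p in named.items() if p.default is p.empty}
    missing = sorted(required - set(args))
    unexpected = sorted(set(args) - set(named))
    return missing, unexpected


# Example: "gen_model" is absent and "search_type" is not a constructor parameter
missing, unexpected = check_chain_args(
    DemoSolution,
    {
        "pinecone_index_name": "rulebook-small-v0",
        "embed_model": "text-embedding-3-small",
        "search_type": "mmr",
    },
)
print("missing:", missing)
print("unexpected:", unexpected)
```

If a constructor accepts `**kwargs`, the "unexpected" list from this sketch is only advisory, since such a class may legitimately take extra keys.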


# Understanding the Codebase

To comprehend the codebase, it's crucial to understand the key components and their roles. This section provides an overview of these components:

- **Chain Configuration**: Defines the structure and parameters of the LangChain chain used for the chatbot.
- **Solution Classes**: Each solution class builds a new LangChain chain from the JSON configuration. New solution classes can be added to support other vector databases, generation models, APIs, and so on.
- **Testing and Evaluation**: Provides a structured way to define, prototype, and test new solution chains.

Each of these components is configurable via the JSON file created in the previous step. Let's dive into how these configurations are loaded and utilized in the project.
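
To make the relationship between the JSON configuration and the solution classes concrete, here is a minimal sketch of one common way to map a `solution_class` name to a class and instantiate it with `args`. It is an assumption about the general pattern, not the project's actual `generate_chain` implementation; `EchoSolution`, `SOLUTION_REGISTRY`, and `build_solution` are hypothetical names.

```python
class EchoSolution:
    """Hypothetical solution class used only for illustration."""

    def __init__(self, gen_model, search_k=4):
        self.gen_model = gen_model
        self.search_k = search_k

    def invoke(self, question):
        # A real solution would retrieve context and call a model here
        return f"[{self.gen_model}, k={self.search_k}] {question}"


# Maps the "solution_class" string from the config to a Python class
SOLUTION_REGISTRY = {"EchoSolution": EchoSolution}


def build_solution(chain_config):
    """Instantiate the class named in chain_config with its args."""
    name = chain_config["solution_class"]
    if name not in SOLUTION_REGISTRY:
        raise ValueError(f"Unknown solution_class: {name!r}")
    return SOLUTION_REGISTRY[name](**chain_config["args"])


demo_chain_config = {
    "solution_class": "EchoSolution",
    "args": {"gen_model": "gpt-4o", "search_k": 8},
}
demo_solution = build_solution(demo_chain_config)
print(demo_solution.invoke("hello"))
```

With a registry like this, supporting a new vector database or model would mean adding one class and one registry entry, while the configuration format stays the same.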


# Load environment configuration

This section loads the environment configuration from the specified JSON file and configures logging.

Files:
- .env file
- main.py
- config_{experiment-name}.json

For this section, create a `.env` file that stores the following keys:
`OPENAI_API_KEY`, `LANGCHAIN_API_KEY`, `PINECONE_API_KEY`, `INDEX_NAME`.
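
A missing key in the `.env` file usually surfaces later as an authentication error, so it can help to check for the keys up front. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, and it assumes the variables have already been loaded into the process environment (for example by `dotenv.load_dotenv()`).

```python
import os

# Key names taken from the list above
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "LANGCHAIN_API_KEY", "PINECONE_API_KEY", "INDEX_NAME"]


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"OPENAI_API_KEY": "set", "INDEX_NAME": ""}
print(missing_env_vars(environ=example_env))
```

Calling `missing_env_vars()` with no arguments checks the real environment instead of the example mapping.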

**This code cell configures logging if needed:**

In [2]:
# configure logging
import logging

def configure_logging():
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] {%(module)s::%(funcName)s} %(message)s",
        handlers=[
            logging.StreamHandler()
        ]
    )
    logger = logging.getLogger("Prototype")
    return logger

# Create the logger
logger = configure_logging()


**In the following code, we will load the configuration file:**

In [3]:
import json, os
import dotenv
from common.utils import store_env_var, load_config
from common.constants import CONFIG_VAR_NAME

# Environment variable holding the path to the configuration file
store_env_var(name='config_path', value=json_file_path)
# Runtime environment variable: either 'evaluation' or 'streamlit_app'
store_env_var(name="runtime", value='evaluation')

# Load environment variables
dotenv.load_dotenv()

config = load_config()
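
For readers who want a mental model of this step, the sketch below shows one plausible way to read a JSON configuration from a path stored in an environment variable. It is not the project's `load_config` or `store_env_var`; the function name `load_config_from_env` and the variable name `DEMO_CONFIG_PATH` are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_config_from_env(var_name="DEMO_CONFIG_PATH"):
    """Read the JSON file whose path is stored in the given environment variable."""
    path = os.environ.get(var_name)
    if not path:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage with a temporary file so the example does not touch real project files
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config_demo.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(
            {"exe_simple_test": False,
             "chain_config": {"solution_class": "DemoSolution", "args": {}}},
            f,
        )
    os.environ["DEMO_CONFIG_PATH"] = demo_path
    demo_config = load_config_from_env()

print(demo_config["chain_config"]["solution_class"])
```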


# Running Streamlit App

To run the Streamlit app for interactive testing and prototyping, ensure you have Streamlit installed in your environment.

Set `"exe_streamlit_app"` to `true` in your configuration file, then run the following command with the path to that file.

The command launches the app in your default web browser, where you can interact with the chatbot and test its responses in real time.


In [None]:
!python main.py -c config/config_{experiment-name}.json

# Chain Generation and Execution

This section outlines the process of generating the LangChain chain from the loaded configuration. The chain is then executed to produce responses to user queries.

Let's walk through the code to generate and execute the chain:


In [4]:
from common.chain_generator import generate_chain

# generate chain from config file
solution = generate_chain(config["chain_config"])
    
# Prompt for the chatbot
test_input = "What is the proper definition of 'financial instrument' according to the cbb notebook?"

# Invoke the chain to print the chatbot's message
result = solution.invoke(test_input)
print("Result:")
print(f"Response: {result.response}")
print(f"Context: {result.context}")

2024-07-23 12:05:03,298 [INFO] HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-23 12:05:14,576 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Result:
Response: The Central Bank of Bahrain (CBB) defines a "financial instrument" as follows:

"A financial instrument is any contract that gives rise to both a financial asset of one entity and a financial liability or equity instrument of another entity. Financial instruments include both primary financial instruments (or cash instruments) and derivative financial instruments. A financial asset is any asset that is cash, the right to receive cash or another financial asset; or the contractual right to exchange financial assets on potentially favourable terms, or an equity instrument. A financial liability is the contractual obligation to deliver cash or another financial asset or to exchange financial liabilities under conditions that are potentially unfavourable." (Source: CA-8.1.3, Apr 08)

Additionally, specific types of financial instruments are listed as follows:

"(a) Transferable securities;
(b) Islamic financial instruments;
(c) Money market instruments;
(d) Holdings in co

# Prototyping and Testing

Prototyping and testing are essential steps to ensure the chatbot performs as expected. This section provides a framework for running tests and evaluating the chatbot's responses.

To check that `automated_test_config` contains all required keys, run the following code:

In [5]:
import sys

def check_required_test_configs(config, logger):
    # Keys needed for the automated testing
    required_keys = [ "evaluators", "dataset_name", "per_q_repeat", "split_data", "splits", "evaluator_model", "experiment_name"]
    
    # checking for any missing keys
    missing_keys = [key for key in required_keys if key not in config]
    
    # log missing keys
    if missing_keys:
        logger.error(f"Automated test config keys are missing: {missing_keys}. Aborting...")
        sys.exit(1)
        
# ensuring all keys are in the config file
check_required_test_configs(config['automated_test_config'], logger)


**The code to execute an automated test:**

In [None]:
from common.chain_generator import generate_chain
from evaluation.evaluators import setup_evaluators
from evaluation.test import run_test
from common.constants import LANGSMITH_SPLIT_TIME_BUFF

# generate chain from config file
solution = generate_chain(config["chain_config"])

# load automated test configuration under its own name, so `config`
# keeps the full configuration for the cells that follow
test_config = config["automated_test_config"]

# build all evaluators
available_evaluators = setup_evaluators(judge_model=test_config["evaluator_model"])
# map evaluators to string-names
available_evaluators = {key.name: value for key, value in available_evaluators.items()}
# keep only the evaluators specified in config
evaluators = [
    evaluator_func
    for evaluator_name, evaluator_func in available_evaluators.items()
    if evaluator_name in test_config["evaluators"]
]

# execute automated test
run_test(
    solution=solution,
    dataset_name=test_config["dataset_name"],
    split_data=test_config["split_data"],
    splits=test_config["splits"],
    per_q_repeat=test_config["per_q_repeat"],
    split_time_buff=LANGSMITH_SPLIT_TIME_BUFF,
    evaluators=evaluators,
    experiment_name=test_config["experiment_name"],
)
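
The filtering step above keeps only evaluators whose names appear in the config, which means a misspelled name is dropped silently. A stricter selection that reports unknown names could look like the sketch below. `select_evaluators` and the demo evaluator functions are hypothetical and stand in for whatever `setup_evaluators` returns in the project.

```python
def select_evaluators(available, requested):
    """Return the evaluators for `requested` names, raising on unknown names."""
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise KeyError(f"Unknown evaluators: {unknown}. Available: {sorted(available)}")
    return [available[name] for name in requested]


# Hypothetical evaluator functions for illustration only
demo_available = {
    "de_faithfulness": lambda run: 1.0,
    "de_correctness": lambda run: 0.5,
}

selected = select_evaluators(demo_available, ["de_correctness"])
print(len(selected))

try:
    # A misspelled name is reported instead of being skipped
    select_evaluators(demo_available, ["de_corectness"])
except KeyError as err:
    print("error:", err)
```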


**Alternatively, pass the whole automated test config dict to `run_test`:**

In [None]:

# execute automated test
run_test(solution= solution, 
        automated_test_config=config["automated_test_config"]
    )

# Simple Test Evaluation

Evaluating the chatbot's performance involves analyzing its responses to test queries. This section demonstrates a simple check: it sends one question through the chain with `invoke` and with the `test` function, then prints both results.

Let's set up a simple evaluation framework:


In [6]:
from common.chain_generator import generate_chain
# generate chain from config file
solution = generate_chain(config["chain_config"])

# ask chain a simple question
print("Invoking: ")
result = solution.invoke("What is the proper definition of 'financial instrument' according to the cbb notebook?")
print(result.response)

# using the 'test' function
inputs = {'question': 'What is the proper definition of \'financial instrument\' according to the cbb notebook?'}
print("Testing: ")
result_test = solution.test(inputs)
print(result_test)
     

Invoking: 


2024-07-23 12:05:36,159 [INFO] HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-23 12:05:51,079 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The Central Bank of Bahrain (CBB) defines a financial instrument as follows:

1. **Volume 4 Definition:**
   "For the purposes of Volume 4, a financial instrument means any of the following:
   - Transferable securities;
   - Islamic financial instruments;
   - Money market instruments;
   - Holdings in collective investment undertakings;
   - Derivative contracts other than commodity derivatives;
   - Derivative contracts relating to commodities settled in cash;
   - Derivative contracts relating to commodities;
   - Credit derivatives;
   - Financial contracts for differences;
   - Other derivative contracts;
   - Interests in real estate property;
   - Certificates representing certain securities; and
   - Rights or Interests in Financial Instruments." (Source: [AU-1.5 Definition of Financial Instruments](https://cbben.thomsonreuters.com/rulebook/au-15-definition-financial-instruments))

2. **General Definition:**
   "A financial instrument is any contract that gives rise to both a 

2024-07-23 12:05:51,610 [INFO] HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-23 12:05:57,825 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


{'answer': 'The definition of a "financial instrument" according to the Central Bank of Bahrain (CBB) rulebook is as follows:\n\n"For the purposes of Volume 4, a financial instrument means any of the following:\n\n(a) Transferable securities;\n(b) Islamic financial instruments;\n(c) Money market instruments;\n(d) Holdings in collective investment undertakings;\n(e) Derivative contracts other than commodity derivatives;\n(f) Derivative contracts relating to commodities settled in cash;\n(g) Derivative contracts relating to commodities;\n(h) Credit derivatives;\n(i) Financial contracts for differences;\n(j) Other derivative contracts;\n(k) Interests in real estate property;\n(l) Certificates representing certain securities; and\n(m) Rights or Interests in Financial Instruments."\n\n(Referenced from: AU-1.5 Definition of Financial Instruments, [link](https://cbben.thomsonreuters.com/rulebook/au-15-definition-financial-instruments))\n\nAdditionally, another definition provided in the ruleb
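
To run several questions and collect the results for later review, a small loop around `invoke` is usually enough. The sketch below assumes only that the solution object exposes `invoke(question)` and returns an object with a `.response` attribute, as in the cells above. `StubSolution`, `collect_responses`, and the sample questions are hypothetical and exist so the example runs without API keys.

```python
from types import SimpleNamespace


class StubSolution:
    """Hypothetical stand-in that mimics `invoke` returning an object with `.response`."""

    def invoke(self, question):
        return SimpleNamespace(response=f"stub answer to: {question}", context=[])


def collect_responses(solution, questions):
    """Invoke the solution once per question and return a list of result rows."""
    results = []
    for question in questions:
        result = solution.invoke(question)
        results.append({"question": question, "response": result.response})
    return results


demo_questions = [
    "What is a financial instrument?",
    "Which volume defines financial instruments?",
]
demo_results = collect_responses(StubSolution(), demo_questions)
for row in demo_results:
    print(row["question"], "->", row["response"])
```

Replacing `StubSolution()` with the object returned by `generate_chain` would run the same loop against the real chain, at the cost of one embedding call and one completion call per question.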