# Development: MLflow

Notebook for developing the project's MLflow code.

In [1]:
# IMPORTS --------------------------------------------------------------------------------------------------------------

# Uncomment the lines below if any dependencies are missing.
# ! python -m pip install uv
# ! python -m uv pip install langchain_openai mlflow python-dotenv langchain pandas langchain_community

import os
import sys

# Add the parent of the working directory to the import path (works on any OS).
sys.path.append(os.path.dirname(os.getcwd()))

import pandas as pd
from dotenv import load_dotenv
from ml_flow import (mlflow_server, create_example_llm, evaluate_llm, create_agent, evaluate_agent, get_info_on_runs,
                     delete_all_runs)

_ = load_dotenv()

import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

To begin, we start the MLflow server:

In [2]:
server_process = mlflow_server()
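
`mlflow_server` comes from the project's `ml_flow` module, and its body is not shown in this notebook. A minimal sketch of one way such a helper could work, assuming it launches the `mlflow server` CLI as a background process. The names `build_server_command` and `start_server`, and the host and port defaults, are illustrative assumptions rather than the project's actual code:

```python
import subprocess


def build_server_command(host="127.0.0.1", port=5000):
    """Return the argument list for the `mlflow server` CLI."""
    return ["mlflow", "server", "--host", host, "--port", str(port)]


def start_server(host="127.0.0.1", port=5000):
    """Launch the tracking server in the background and return the process handle."""
    # Popen returns immediately; call .terminate() on the handle to stop the server.
    return subprocess.Popen(build_server_command(host, port))
```

Keeping the handle, as the cell above does with `server_process`, makes it possible to shut the server down once the notebook is finished.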

## LLM and Standard MLflow

Let's generate an example LLM:

In [7]:
example_model = create_example_llm()

We'll then read in an evaluation set:

In [8]:
# Build the path with os.path so it does not depend on Windows separators.
data_folder_path = os.path.join(os.path.dirname(os.getcwd()), 'data')
file_name = 'Evaluation Dataset - LLM.csv'

file_path = os.path.join(data_folder_path, file_name)
eval_set = pd.read_csv(file_path)

eval_set = eval_set.rename(columns={'question': 'inputs', 'target': 'targets'})

display(eval_set)

| Unnamed: 0 | inputs | context | targets |
|---|---|---|---|
| 0 | How much money does client 1 have in shares? | Client 1 has 20 shares. 70% of their shares ar... | Client 1 has £14,000 worth of NVDA shares (70%... |
| 1 | How much money does client 2 have in shares? | Client 2 has 10 shares. 30% of their shares ar... | Client 2 has £3,000 worth of NVDA shares (30% ... |
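
The data path in the cell above is resolved relative to the notebook's working directory. As an alternative sketch, `pathlib` can build the same path without depending on the path separator. The helper name `data_file` and the example folder `/project/notebooks` are hypothetical, and the sketch assumes the `data` folder sits next to the notebook's folder:

```python
from pathlib import Path


def data_file(file_name, notebook_dir=None):
    """Build the path to a file in the `data` folder next to the notebook's folder."""
    base = Path(notebook_dir) if notebook_dir is not None else Path.cwd()
    # The notebook is assumed to live one level below the project root.
    return base.parent / "data" / file_name


path = data_file("Evaluation Dataset - LLM.csv", notebook_dir="/project/notebooks")
print(path.as_posix())
```

The resulting `Path` can be passed directly to `pd.read_csv`.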


Let's demonstrate that the model works:

In [9]:
question = eval_set['inputs'][0]
context = eval_set['context'][0]

print(f"Question: {question}")
print('')
print('Answer: ' + example_model.invoke({'inputs': question, 'context': context}))

Question: How much money does client 1 have in shares?



KeyboardInterrupt: 

We then evaluate the model and log the results to MLflow:

In [None]:
results = evaluate_llm(example_model, eval_set, "openai:/gpt-3.5-turbo", "mlflow_llm_development")

2024/05/31 10:12:36 INFO mlflow.tracking.fluent: Experiment with name 'mlflow_llm_development' does not exist. Creating a new experiment.


2024/05/31 10:12:53 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2024/05/31 10:12:55 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
  from .autonotebook import tqdm as notebook_tqdm
100%|██████████| 1/1 [00:00<00:00,  1.11it/s]
100%|██████████| 1/1 [00:01<00:00,  1.15s/it]
100%|██████████| 1/1 [00:01<00:00,  1.38s/it]
100%|██████████| 1/1 [00:01<00:00,  1.29s/it]
100%|██████████| 2/2 [00:01<00:00,  1.94it/s]
100%|██████████| 2/2 [00:01<00:00,  1.33it/s]
100%|██████████| 2/2 [00:01<00:00,  1.27it/s]
100%|██████████| 2/2 [00:01<00:00,  1.41it/s]


And we can then take a look at the results:

In [None]:
output_df = pd.DataFrame(results.tables['eval_results_table'])
display(output_df)

Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 324.36it/s]


| Unnamed: 0 | inputs | context | targets | outputs | token_count | flesch_kincaid_grade_level/v1/score | ari_grade_level/v1/score | faithfulness/v1/score | faithfulness/v1/justification | answer_similarity/v1/score | answer_similarity/v1/justification | answer_correctness/v1/score | answer_correctness/v1/justification | answer_relevance/v1/score | answer_relevance/v1/justification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | How much money does client 1 have in shares? | Client 1 has 20 shares. 70% of their shares ar... | Client 1 has £14,000 worth of NVDA shares (70%... | Client 1 has £14,000 worth of NVDA shares (70%... | 66 | 5 | 8.1 | 5 | The output correctly states the distribution o... | 4 | The output aligns closely with the provided ta... | 5 | The output provided by the model is completely... | 5 | The output provided directly addresses all asp... |
| 1 | How much money does client 2 have in shares? | Client 2 has 10 shares. 30% of their shares ar... | Client 2 has £3,000 worth of NVDA shares (30% ... | Client 2 has £7,300 in shares. This is calcula... | 76 | 7 | 10.2 | 5 | The output correctly calculates the amount of ... | 4 | The output closely aligns with the provided ta... | 5 | The output provided by the model is correct an... | 5 | The output directly addresses all aspects of t... |


In [None]:
output_df['outputs'][1]

'Client 2 has £7,300 in shares. This is calculated by taking 30% of their shares in NVDA (3 shares x £1000 = £3000) and 70% of their shares in AAPL (7 shares x £190 = £1330), then adding these two amounts together (£3000 + £1330 = £4330).'
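
The figures quoted in this output can be checked with a few lines of arithmetic. The sketch below uses only the share counts and prices that appear in the output string itself (3 NVDA shares at £1,000 and 7 AAPL shares at £190). Whether those prices match the truncated `context` column is not verified here.

```python
# Share counts and prices as quoted in the model output above.
holdings = {"NVDA": (3, 1000), "AAPL": (7, 190)}

# Value of each position = number of shares x price per share.
values = {ticker: count * price for ticker, (count, price) in holdings.items()}
total = sum(values.values())

print(values)  # {'NVDA': 3000, 'AAPL': 1330}
print(total)   # 4330

# The headline figure in the output (£7,300) does not match its own breakdown.
stated_total = 7300
print(stated_total == total)  # False
```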

In [None]:
txt, _ = get_info_on_runs('mlflow_llm_development')
print(txt)

------------------------------------------------------------------------------------------------------------------------
Run ID: 2e6cb7e2740f488e8a1ffacd0b171c1e
Parameters: {'model': 'first=PromptTemplate(input_variables=[\'context\', \'inputs\'], template="You\'re a investment manager. Using the context provided, reply to the question below to the best of your ability:\\nQuestion:\\n{inputs}\\nContext:\\n{context}") middle=[ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x00000241E8C9CBD0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x00000241E8DC3D10>, model_name=\'gpt-3.5-turbo-0125\', temperature=0.0, openai_api_key=SecretStr(\'**********\'), openai_proxy=\'\')] last=RunnableLambda(_get_content)'}
Metrics: {'answer_correctness/v1/mean': 5.0, 'answer_correctness/v1/p90': 5.0, 'answer_correctness/v1/variance': 0.0, 'answer_relevance/v1/mean': 5.0, 'answer_relevance/v1/p90': 5.0, 'answer_relevance/v1/variance': 0.0, 'answer_
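
The run-level metrics above (`mean`, `variance`, `p90`) summarise the per-row scores in the results table. A rough sketch of that aggregation, using the `answer_correctness` scores from the two rows shown. The population variance and the linear-interpolation percentile used here are assumptions and may differ in detail from how MLflow computes them:

```python
import statistics


def percentile(values, q):
    """Percentile with linear interpolation between the closest ranks (q in 0..100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarise(scores):
    """Aggregate per-row scores into run-level summary statistics."""
    return {
        "mean": statistics.mean(scores),
        "variance": statistics.pvariance(scores),
        "p90": percentile(scores, 90),
    }


# Per-row answer_correctness scores from the results table.
answer_correctness = [5.0, 5.0]
print(summarise(answer_correctness))
```

With both rows scored 5, the mean and 90th percentile are 5.0 and the variance is 0.0, which agrees with the logged `answer_correctness/v1` metrics.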

In [None]:
delete_all_runs('mlflow_llm_development')
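
`delete_all_runs` is a project helper whose implementation is not shown. One way to write such a helper is sketched below, under the assumption that it relies on `MlflowClient` (`get_experiment_by_name`, `search_runs`, `delete_run`). The function name `delete_runs_in_experiment` is hypothetical, and the client is passed in as an argument so the logic can be exercised with a stand-in object:

```python
def delete_runs_in_experiment(client, experiment_name):
    """Delete every active run in the named experiment and return the deleted run IDs.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        # Nothing to delete if the experiment was never created.
        return []

    deleted = []
    for run in client.search_runs(experiment_ids=[experiment.experiment_id]):
        client.delete_run(run.info.run_id)
        deleted.append(run.info.run_id)
    return deleted
```

With MLflow installed, this could be called as `delete_runs_in_experiment(MlflowClient(), 'mlflow_llm_development')`, with `MlflowClient` imported from `mlflow.tracking`. `delete_run` marks runs as deleted rather than removing them permanently.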

## Agent Model Evaluation

We can instantiate a simple agent to answer our queries:

In [None]:
example_agent = create_agent()

Let's then evaluate that agent against a set of evaluation questions:

In [None]:
eval_set = pd.read_csv(os.path.join(os.path.dirname(os.getcwd()), 'data', 'Evaluation Dataset - Agent.csv'))
display(eval_set)

evaluate_agent(example_agent, eval_set['question'], 'mlflow_agent_development')

| Unnamed: 0 | question |
|---|---|
| 0 | Return the values for client_id 1 in the sql d... |
| 1 | Tell me the latest AAPL stock price. |
| 2 | Return me the stock allocation for client 5. |
| 3 | Return me the stock allocation for client 8. |
| 4 | Return me the stock allocation for every client. |
| 5 | Give me all the stock allocations from all cli... |
| 6 | Give me a sentence from the apple 10-k report. |
| 7 | What is the net sales of iPhones in 2021. |
| 8 | Add a new client to the database with random s... |
| 9 | Give me all the stock allocations from all cli... |


Evaluating agent on questions...: 100%|██████████| 12/12 [00:55<00:00,  4.61s/it]
2024/05/31 10:14:09 INFO mlflow.tracking.fluent: Experiment with name 'mlflow_agent_development' does not exist. Creating a new experiment.


We can then inspect the agent's performance in MLflow:

In [None]:
txt, _ = get_info_on_runs('mlflow_agent_development')
print(txt)

------------------------------------------------------------------------------------------------------------------------
Run ID: 27d970c3d142428f9823a1589065c0ef
Metrics: {'ari_score_mean': 5.6, 'ari_score_variance': 0.0, 'response_time_mean': 4.18, 'response_time_variance': 0.29, 'success_rate_mean': 0.0, 'success_rate_variance': 0.0}
Tags: {'mlflow.runName': 'rogue-conch-287', 'mlflow.source.name': 'c:\\Code\\GenAIGroupProject\\.venv\\Lib\\site-packages\\ipykernel_launcher.py', 'mlflow.source.type': 'LOCAL', 'mlflow.user': 'MichaelBerney'}
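
Metrics such as `response_time_mean`, `response_time_variance`, and `success_rate_mean` can be derived by timing each call to the agent. The sketch below is a hypothetical stand-in for that measurement, not the project's `evaluate_agent`. It assumes the agent is any callable that takes a question string, and that a call counts as successful when it does not raise an exception:

```python
import statistics
import time


def time_agent(agent, questions):
    """Call `agent` on each question and summarise response time and success rate."""
    durations, successes = [], []
    for question in questions:
        start = time.perf_counter()
        try:
            agent(question)
            successes.append(1.0)
        except Exception:
            # A raised exception is counted as a failed response.
            successes.append(0.0)
        durations.append(time.perf_counter() - start)
    return {
        "response_time_mean": statistics.mean(durations),
        "response_time_variance": statistics.pvariance(durations),
        "success_rate_mean": statistics.mean(successes),
    }


def toy_agent(question):
    # Toy stand-in for a real agent: fails on empty questions.
    if not question:
        raise ValueError("empty question")
    return question.upper()


print(time_agent(toy_agent, ["Tell me the latest AAPL stock price.", ""]))
```

The resulting dictionary has the same shape as the values passed to `mlflow.log_metrics`, so it could be logged to a run in one call.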


In [None]:
delete_all_runs('mlflow_agent_development')

## Using Real Agent

Here, our real agent class is used and tracked with MLflow:

In [None]:
from agent import call_agent
from ml_flow import mlflow_server, evaluate_agent, get_info_on_runs, delete_all_runs

query = 'What is the best stock to invest into?'

response, agent = call_agent(query)

server_process = mlflow_server()
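# Use one experiment name throughout so the runs that are logged are the ones inspected and deleted.
experiment_name = 'mlflow_agent'
evaluate_agent(agent, query, experiment_name)

txt, metrics = get_info_on_runs(experiment_name)
print(txt)

delete_all_runs(experiment_name)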



> Entering new AgentExecutor chain...
Since the question involves specific financial data for Nvidia, I should use the query_embeddings tool to search for this information in the company's 10-K reports.

Action: query_embeddings

Action Input: "Net sales Nvidia Q4 2021"

ModuleNotFoundError: No module named 'vector_database_rawtxt'
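
The `ModuleNotFoundError` above appears only after the agent has started its chain and selected a tool. A small pre-flight check, sketched here with the standard library's `importlib.util.find_spec`, can surface a missing module before any agent call is made. The helper name `missing_modules` is hypothetical, and the module list is taken from the error message:

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


required = ["vector_database_rawtxt"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
```

Running this check at the top of the cell, after `sys.path` has been extended, shows whether the module is absent or simply not on the import path.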