# Chatbot with MLflow Tracking 

## Setting Up the Core Libraries for the Chatbot and Tracking System

In this section, I set up the tools the rest of the notebook depends on. The goal is to build a small interactive chatbot that talks to an OpenAI model such as GPT-3.5-Turbo while MLflow records what happens in the background. The `os` and `time` modules read environment variables and measure latency. `mlflow` logs the key metrics and parameters for each chat session, such as response time, tokens used, and model settings. The `openai` package provides access to the language model itself, while `tiktoken` counts tokens so I can see how much text is being processed. Finally, `colorama` adds color to the console output so the speaker labels are easier to pick out. Together, these imports form the foundation for a chat experience that's both interactive and measurable.

In [None]:
# Import required libraries
import os
import time
import mlflow
from openai import OpenAI
import tiktoken as tk
from colorama import Fore, Style, init

## Defining Core Configuration and Model Parameters

Here, I define the settings that control how the chatbot behaves and how MLflow connects to the tracking server. The first line reads my OpenAI API key from the `OPENAI_API_BOOK_KEY` environment variable, which keeps the key out of the notebook instead of hardcoding it. Next, I set the model name to `gpt-3.5-turbo`, which will handle all the chat responses. `MLFLOW_URI` points to my local MLflow server, where the tracking data is stored and can be viewed later. Then I configure the sampling parameters `TEMPERATURE`, `TOP_P`, `FREQUENCY_PENALTY`, and `PRESENCE_PENALTY`, which influence how creative or consistent the model's responses are. `MAX_TOKENS` limits how long each response can be, while `DEBUG` lets me toggle extra output for troubleshooting. Together, these constants define the chatbot's behavior and are reused by every later cell.

In [3]:
# Define constants and configuration parameters
API_KEY = os.getenv("OPENAI_API_BOOK_KEY")
MODEL = "gpt-3.5-turbo"
MLFLOW_URI = "http://localhost:5000"
TEMPERATURE = 0.7
TOP_P = 1
FREQUENCY_PENALTY = 0
PRESENCE_PENALTY = 0
MAX_TOKENS = 800
DEBUG = False
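
Because `os.getenv` returns `None` when a variable is unset, a missing key would only surface later as an authentication error from the API. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# In the notebook, the configuration line could then read:
# API_KEY = require_env("OPENAI_API_BOOK_KEY")
```

Failing at configuration time keeps the error close to its cause, instead of letting it appear on the first chat request.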

## Initializing Colorama for Colored Console Output

In this step, I initialize `colorama`, a lightweight Python library that adds color and styling to text displayed in the terminal. By calling `init()`, I make sure the color codes work properly across different operating systems, especially on Windows where console color support can vary. This small setup step helps make the chatbot's messages stand out visually, allowing prompts, responses, and status updates to appear in different colors for better readability during live interaction.

In [4]:
# Initialize colorama for colored console output
init()

## Connecting to the OpenAI API

At this point, I create an instance of the OpenAI client using my stored API key. The client holds the credentials and is the object I use to send messages to the model and receive responses in return. By initializing it once here, I make sure that every later request, whether it's a simple chat prompt or a logged conversation, goes through the same authenticated client.

In [5]:
# Initialize OpenAI client
client = OpenAI(api_key=API_KEY)

## Configuring MLflow Tracking and Experiment Setup

Here, I connect MLflow to the tracking server and define the experiment where all my chatbot sessions will be logged. The first line sets the tracking URI, which tells MLflow where to send and store the recorded data, such as metrics, parameters, and artifacts. The next line creates or switches to an experiment named `"GenAI_Week9"`, grouping all the related runs under a single project space. This setup ensures that each chat session is properly organized, making it easy for me to compare performance, tune parameters, and review results later through the MLflow dashboard.

In [6]:
# Set MLflow tracking URI and experiment
mlflow.set_tracking_uri(MLFLOW_URI)
mlflow.set_experiment("GenAI_Week9")

2025/11/08 14:00:01 INFO mlflow.tracking.fluent: Experiment with name 'GenAI_Week9' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/2', creation_time=1762632001448, experiment_id='2', last_update_time=1762632001448, lifecycle_stage='active', name='GenAI_Week9', tags={}>

## Creating Helper Functions for Colorful Chat Display

In this part, I define two simple helper functions that make the chat interface more readable by adding color to the console text. The first function, `print_user_input()`, prints a green `You:` label in front of whatever I type, clearly marking it as the user's message. The second one, `print_ai_output()`, prints a blue `AI Assistant:` label in front of the model's reply, so it's easy to tell who said what during the conversation. In both cases the color is reset right after the label, so the message text itself appears in the terminal's default color. These cues also help when reviewing output or running multiple chat sessions in the same terminal window.

In [7]:
# Define helper functions for colored text display
def print_user_input(text):
    print(f"{Fore.GREEN}You: {Style.RESET_ALL}", text)

def print_ai_output(text):
    print(f"{Fore.BLUE}AI Assistant:{Style.RESET_ALL}", text)

## Adding a Token Counter for Measuring Text Length

In this section, I define a function that counts how many tokens a given piece of text contains. Tokens are the small text chunks that language models process, and knowing how many are used helps track cost, performance, and response size. The function first loads a tokenizer with `tiktoken`, defaulting to the `cl100k_base` encoding that matches `gpt-3.5-turbo`. It then encodes the input string and returns the number of tokens found. Passing `disallowed_special=()` means that text which happens to look like a special token is encoded as ordinary text instead of raising an error. This small utility will later help me log token usage for each chat message and analyze how efficiently the model is generating responses.

In [8]:
# Define a function to count tokens using tiktoken
def count_tokens(string: str, encoding_name="cl100k_base") -> int:
    # Get encoding for the provided name
    encoding = tk.get_encoding(encoding_name)
    
    # Encode the string to tokens
    encoded_string = encoding.encode(string, disallowed_special=())
    
    # Count number of tokens
    num_tokens = len(encoded_string)
    return num_tokens
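
Later in the notebook, the whole conversation is measured by counting the tokens in `str(conversation)`, which also counts the brackets, quotes, and `role`/`content` keys of the Python list. The sketch below shows one alternative: summing over message contents only. The helper `conversation_token_count` and the whitespace-based stand-in counter are assumptions for illustration; in the notebook, `count_tokens` would be passed as the counter.

```python
from typing import Callable, Dict, List


def conversation_token_count(
    conversation: List[Dict[str, str]],
    counter: Callable[[str], int],
) -> int:
    """Sum token counts over message contents, ignoring dict syntax and role keys."""
    return sum(counter(message["content"]) for message in conversation)


def whitespace_counter(text: str) -> int:
    """Stand-in counter for illustration only: counts whitespace-separated words."""
    return len(text.split())


demo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello there"},
]

# In the notebook this would be: conversation_token_count(conversation, count_tokens)
total = conversation_token_count(demo, whitespace_counter)
```

Even with a real tokenizer, this figure is still an estimate, since the chat format adds a few tokens of per-message overhead that a content-only count does not include.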

## Building the Core Function to Generate and Log AI Responses

This function is the heart of the chatbot: it sends messages to the OpenAI model, receives the response, measures performance, and logs everything through MLflow. When I call `generate_text()`, it starts by recording the current time so I can later calculate how long the model took to reply. The function then sends the full conversation history to the model through the Chat Completions API (`client.chat.completions.create`), along with the configured parameters that shape the tone and creativity of the response. Once the model replies, I measure the total latency and extract the AI's message. To keep track of efficiency, I count tokens for the latest user input, the entire conversation, and the model's response. These metrics, along with the model configuration details, are logged to the active MLflow run so I can analyze patterns like response time or token usage across runs. If debugging is enabled, the function prints the active run ID and pauses before continuing. In the end, it returns the AI-generated text, completing one full cycle of interaction, measurement, and tracking.

In [9]:
# Define a function to generate text using OpenAI API
def generate_text(conversation, max_tokens=100) -> str:
    # Record start time for latency calculation
    start_time = time.time()
    
    # Call the OpenAI Chat Completions API
    response = client.chat.completions.create(
        model=MODEL,
        messages=conversation,
        temperature=TEMPERATURE,
        max_tokens=max_tokens,
        top_p=TOP_P,
        frequency_penalty=FREQUENCY_PENALTY,
        presence_penalty=PRESENCE_PENALTY
    )
    
    # Measure total latency
    latency = time.time() - start_time
    
    # Extract message content
    message_response = response.choices[0].message.content
    
    # Count token usage
    prompt_tokens = count_tokens(conversation[-1]['content'])
    conversation_tokens = count_tokens(str(conversation))
    completion_tokens = count_tokens(message_response)
    
    # Retrieve the active MLflow run (None if called outside mlflow.start_run())
    run = mlflow.active_run()
    
    # Display debug information if enabled and a run is active
    if DEBUG and run is not None:
        print(f"Run ID: {run.info.run_id}")
        input("Press Enter to continue...")
    
    # Log metrics to MLflow
    mlflow.log_metrics({
        "request_count": 1,
        "request_latency": latency,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "conversation_tokens": conversation_tokens
    })
    
    # Log model parameters to MLflow
    mlflow.log_params({
        "model": MODEL,
        "temperature": TEMPERATURE,
        "top_p": TOP_P,
        "frequency_penalty": FREQUENCY_PENALTY,
        "presence_penalty": PRESENCE_PENALTY
    })
    
    # Return AI-generated message
    return message_response
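
The token counts above are local estimates made with `tiktoken`. The Chat Completions response also carries a `usage` object with the counts the API itself reports, which can be logged alongside the estimates. The sketch below is one way to read it; the helper `usage_metrics`, the `api_` metric names, and the stand-in response object are assumptions for illustration, not part of the original notebook.

```python
from types import SimpleNamespace


def usage_metrics(response) -> dict:
    """Extract API-reported token counts from a chat completion response."""
    usage = getattr(response, "usage", None)
    if usage is None:
        # Nothing to report, e.g. when the response carries no usage data
        return {}
    return {
        "api_prompt_tokens": usage.prompt_tokens,
        "api_completion_tokens": usage.completion_tokens,
        "api_total_tokens": usage.total_tokens,
    }


# Stand-in for a real response object, used here only to show the shape
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=21, completion_tokens=9, total_tokens=30)
)
metrics = usage_metrics(fake_response)

# Inside generate_text(), the result could be logged with, for example:
# mlflow.log_metrics(usage_metrics(response), step=turn)
```

Here `turn` is a hypothetical per-conversation counter. Passing it as `step` to `mlflow.log_metrics` would record each turn as a separate point in the metric history, which makes per-turn latency and token usage easier to chart.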

## Enabling MLflow Autologging for Automatic Tracking

Here, I turn on MLflow's autologging feature, which automatically captures information during each run without requiring extra manual code. With this enabled, MLflow records data from the supported libraries it detects; as the log line below shows, that means the `openai` package in this notebook. It acts as a safety net that reduces the chance of missing key data if I forget to log something explicitly. This makes the experiment tracking smoother and provides a more complete record of each chatbot interaction for later analysis.

In [10]:
# Enable MLflow autologging
mlflow.autolog()

2025/11/08 14:00:01 INFO mlflow.tracking.fluent: Autologging successfully enabled for openai.


## Running the Interactive Chat Loop Inside an MLflow Run

Here, I start an MLflow run and hold the conversation inside it, so every metric and parameter logged by `generate_text()` is attached to that run. The conversation begins with a system message that sets the assistant's role. On each turn, I read a line of input and stop if it is `exit`, `quit`, `q`, or `e`. Otherwise, I append the input to the history, pass the full history to `generate_text()` with `MAX_TOKENS` as the response limit, print the reply, and store the reply in the history so the model sees the whole exchange on the next turn.

In [11]:
# Start an MLflow run and interact with the AI assistant
with mlflow.start_run() as run:
    # Initialize the system message for conversation context
    conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
    ]
    
    # Enter interactive conversation loop
    while True:
        user_input = input("User: ")
        
        # Exit loop on specific user commands
        if user_input.lower() in ["exit", "quit", "q", "e"]:
            break
        
        # Append user message to conversation
        conversation.append({"role": "user", "content": user_input})
        
        # Generate AI response
        ai_output = generate_text(conversation, MAX_TOKENS)
        
        # Display AI response with color
        print_ai_output(ai_output)
        
        # Add AI response to conversation history
        conversation.append({"role": "assistant", "content": ai_output})

AI Assistant: In traditional programming, a developer writes explicit instructions that tell a computer how to perform a specific task. The developer needs to anticipate all possible scenarios and write code to handle them accordingly. The program follows these predefined rules to produce the desired output.

On the other hand, in machine learning, a computer is trained to learn from data without being explicitly programmed for specific tasks. Instead of writing explicit instructions, a machine learning algorithm learns patterns and relationships from the data provided to it. The model created during the training phase can then make predictions or decisions based on new, unseen data.

In summary, the main difference is that traditional programming relies on explicit instructions provided by the developer, while machine learning leverages data to learn and make decisions. Machine learning is particularly useful for tasks where it is difficult to explicitly define rules or where the data
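
The loop above needs a live API key and keyboard input, which makes it hard to exercise on its own. The sketch below mirrors the same control flow with the input source and the reply function passed in, so the history handling can be checked offline. The names `run_chat`, `EXIT_COMMANDS`, and `echo_reply` are hypothetical; in the notebook, the reply function would be a call to `generate_text`.

```python
from typing import Callable, Dict, Iterable, List

EXIT_COMMANDS = {"exit", "quit", "q", "e"}


def run_chat(
    inputs: Iterable[str],
    reply_fn: Callable[[List[Dict[str, str]]], str],
) -> List[Dict[str, str]]:
    """Run the chat loop over a sequence of inputs and return the full history."""
    conversation = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_input in inputs:
        # Stop on the same exit commands as the interactive loop
        if user_input.lower() in EXIT_COMMANDS:
            break
        conversation.append({"role": "user", "content": user_input})
        reply = reply_fn(conversation)
        conversation.append({"role": "assistant", "content": reply})
    return conversation


def echo_reply(conversation: List[Dict[str, str]]) -> str:
    """Stand-in for generate_text(): echoes the latest user message."""
    return "echo: " + conversation[-1]["content"]


history = run_chat(["Hi", "quit", "never sent"], echo_reply)
```

With the stand-in reply function, `history` holds the system message, one user message, and one assistant reply; the input after `quit` is never processed.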

## Analysis
When I asked, "Explain how machine learning differs from traditional programming," the model returned a clear, well-organized answer that contrasts rule-based programming with data-driven learning in a simple, logical way. The tone is instructive, showing that the model can communicate technical ideas effectively. The response itself shows that the chatbot works, while the logged metrics and visible run details show that MLflow tracking works as well. Together with the smooth display formatting, this confirms that the full interaction pipeline, from user input to AI response to experiment tracking, runs end to end, making this an effective first validation of the system.