# Nebius Managed Service for MLflow Demo

This demo shows you how key use cases of using the Nebius Managed Service for MLflow for AI development.

## Getting Started

1. Launch your instance of the Managed Service for MLflow with [the MLflow quickstart](https://docs.nebius.com/mlflow/quickstart).
2. Set up your API key to connect to Nebius AI Studio with [the AI Studio quickstart](https://docs.nebius.com/studio/inference/quickstart).

> **Note:** Launching an MLflow cluster may take 30-60 minutes to be fully provisioned and ready to use.

In [1]:
!pip install mlflow==2.20.2 python-dotenv openai



### Secrets and Environment Variables

You will need to set the following environment variables where this notebook is running, so that the code in the following cells can connect to both Nebius Managed Service for MLflow and Nebius AI Studio. 

MLflow:<br>
`MLFLOW_TRACKING_SERVER_CERT_PATH`<br>
`MLFLOW_TRACKING_URI`<br>
`MLFLOW_TRACKING_USERNAME`<br>
`MLFLOW_TRACKING_PASSWORD`<br>

AI Studio:<br>
`NEBIUS_API_KEY`

### Environment Setup

To set the environment variables, run the following cell. You may choose to set them interactively or by loading from a `.env` file.

In [1]:
%reload_ext autoreload
%autoreload 2

import os
import sys
from pathlib import Path
from dotenv import load_dotenv

# Add the parent directory to Python path
sys.path.append(str(Path.cwd().parent))

from env_setup import setup_env_from_file, setup_env_interactive, verify_env_setup

# Option 1: Interactive setup
# setup_env_interactive()

# Option 2: Load from .env file
setup_env_from_file('../.env')

# Verify the setup
verify_env_setup()

Load ENV variables from file: ../.env
All required environment variables are set.


True

### Check connection to MLflow


In [2]:
import mlflow 

# List experiments in MLflow
mlflow.search_experiments()

[<Experiment: artifact_location='mlflow-artifacts:/2', creation_time=1744716009071, experiment_id='2', last_update_time=1744716009071, lifecycle_stage='active', name='Autolog', tags={}>,
 <Experiment: artifact_location='mlflow-artifacts:/1', creation_time=1744712739337, experiment_id='1', last_update_time=1744712739337, lifecycle_stage='active', name='CrewAI Agent Debugging Demo', tags={}>,
 <Experiment: artifact_location='mlflow-artifacts:/0', creation_time=1744710198517, experiment_id='0', last_update_time=1744710198517, lifecycle_stage='active', name='Default', tags={}>]

### Set up Nebius AI Studio client

In [3]:
import openai

API_KEY = os.environ.get("NEBIUS_API_KEY")

# Instantiate the client instance
nebius_client = openai.OpenAI(api_key=API_KEY,
                              base_url="https://api.studio.nebius.ai/v1/")


# Example 1: MLflow Tracking Quickstart

https://mlflow.org/docs/latest/getting-started/intro-quickstart/

In [21]:
import mlflow
from mlflow.models import infer_signature

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the model hyperparameters
params = {
    "solver": "lbfgs",
    "max_iter": 1000,
    "multi_class": "auto",
    "random_state": 8888,
}

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)



### Log the model and its metadata to MLflow

- Initiate an MLflow run context to start a new run that we will log the model and metadata to.
- Log model parameters and performance metrics.
- Tag the run for easy retrieval.
- Register the model in the MLflow Model Registry while logging (saving) the model.


In [22]:
# Create a new MLflow Experiment
mlflow.set_experiment("MLflow Quickstart")

# Start an MLflow run
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the loss metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart",
    )

Registered model 'tracking-quickstart' already exists. Creating a new version of this model...
2025/03/25 11:59:02 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: tracking-quickstart, version 4
Created version '4' of model 'tracking-quickstart'.


🏃 View run colorful-auk-46 at: https://public-tracking.mlflow-e00e997y0znyf15mwg.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/1/runs/263bbfe1281f4a0bb5e60cedee2d2523
🧪 View experiment at: https://public-tracking.mlflow-e00e997y0znyf15mwg.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/1


In [23]:
model_info.model_uri

'runs:/263bbfe1281f4a0bb5e60cedee2d2523/iris_model'

## Load the model as a Python Function (pyfunc) and use it for inference

- Loading the model using MLflow's pyfunc flavor.
- Running Predict on new data using the loaded model

In [24]:
# Load the model back for predictions as a generic Python Function model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

loaded_model

Downloading artifacts:   0%|          | 0/7 [00:00<?, ?it/s]

mlflow.pyfunc.loaded_model:
  artifact_path: iris_model
  flavor: mlflow.sklearn
  run_id: 263bbfe1281f4a0bb5e60cedee2d2523

In [25]:
# Get predictions 

predictions = loaded_model.predict(X_test)

iris_feature_names = datasets.load_iris().feature_names

result = pd.DataFrame(X_test, columns=iris_feature_names)
result["actual_class"] = y_test
result["predicted_class"] = predictions

result[:4]

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),actual_class,predicted_class
0,6.1,2.8,4.7,1.2,1,1
1,5.7,3.8,1.7,0.3,0,0
2,7.7,2.6,6.9,2.3,2,2
3,6.0,2.9,4.5,1.5,1,1


# Example 2: MLflow Tracing for LLM Observability

Traces enhances LLM observability in your Generative AI (GenAI) applications by capturing detailed information about the execution details. 

https://mlflow.org/docs/latest/tracing/

## Automatic Tracing of LLM calls

In [35]:
!env | grep MLFLOW_TRACKING_SERVER_CERT_PATH



MLFLOW_TRACKING_SERVER_CERT_PATH=/home/jovyan/ca.pem


In [6]:
import mlflow
import openai

tracing_experiment =mlflow.set_experiment("MLflow Tracing")

# Enable MLflow automatic tracing for OpenAI with one line of code!
mlflow.openai.autolog()


# Time to call the LLM -- tracing is done automatically
nebius_client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    temperature=0.95,
    messages=[
        {"role": "system", "content": "You are a chatbot."},
        {"role": "user", "content": "What is the weather like today?"},
    ],
)


ChatCompletion(id='chatcmpl-508058828bc24bd7b2eef9e48eea3ae1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="I'm just a chatbot, I don't have have access to real-time information or a physical location, so I don't know the current weather. But I can suggest some ways for you to find out the weather:\n\n1. Check online weather websites or apps, such as AccuWeather or Weather.com, for the current weather conditions in your area.\n2. Look out the window to see the current weather conditions.\n3. Check your local news or weather forecast for updates on the weather.\n\nIf you'd like to know the weather for a specific location, please let me know and I can try to provide you with general information about the typical weather patterns for that area.", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1742904463, model='meta-llama/Llama-3.3-

## Manual Tracing

1. Instrument a function with @mlflow.trace decorator.
2. Instrument any block of code using mlflow.start_span context manager.
3. Grouping or annotating traces using a tag.
4. Disabling trace globally.

https://mlflow.org/docs/latest/tracing/#manual-tracing

In [7]:
import mlflow
from mlflow.entities import SpanType
import openai
import time

In [8]:
# Use MLflow tracing to trace the execution of a function

@mlflow.trace(span_type="func", attributes={"key": "value"})
def add_1(x):
    return x + 1


@mlflow.trace(span_type="func", attributes={"key": "value"})
def minus_1(x):
    return x - 1


@mlflow.trace(name="Trace Test")
def trace_test(x):
    step1 = add_1(x)
    return minus_1(step1)


trace_test(4)

4

In [11]:
# Integrate tracing into your LLM workflow

mlflow.openai.autolog()

@mlflow.trace(span_type=SpanType.CHAIN)
def run(question):
    messages = build_messages(question)
    # MLflow automatically generates a span for OpenAI invocation
    response = nebius_client.chat.completions.create(
        # model="gpt-4o-mini",
        model="meta-llama/Llama-3.3-70B-Instruct",
        max_tokens=100,
        messages=messages,
    )
    return parse_response(response)


@mlflow.trace
def build_messages(question):
    return [
        {"role": "system", "content": "You are a helpful chatbot."},
        {"role": "user", "content": question},
    ]

@mlflow.trace
def parse_response(response):
    return response.choices[0].message.content


run("What is MLflow?")

'**MLflow** is an open-source platform for managing the end-to-end machine learning (ML) lifecycle. It helps data scientists and engineers to track, reproduce, and deploy ML models efficiently. MLflow provides a standard way to manage ML workflows, including data preparation, model training, model evaluation, and model deployment.\n\n### Key Features of MLflow\n\n1. **Experiment Tracking**: MLflow allows users to track and manage experiments, including hyperparameter tuning, model selection, and results comparison.\n2'

In [12]:
# Get the timestamp in milliseconds
one_hour_ago = int(time.time() - 3600) * 1000  # 3600 seconds = 1 hour in milliseconds

one_hour_ago

1742901029000

In [13]:
# Search and analyze traces

mlflow.search_traces(
    tracing_experiment.experiment_id, 
    filter_string=f"timestamp_ms < {one_hour_ago}",
)[:3]

Unnamed: 0,request_id,trace,timestamp_ms,status,execution_time_ms,request,response,request_metadata,spans,tags
0,0383cf8068844fa6a9245fdc233aabd5,Trace(request_id=0383cf8068844fa6a9245fdc233aa...,1742854437706,TraceStatus.OK,3088,"{'kwargs': {'provider': 'huggingface-hub', 'co...","{'config': {'path': 'intfloat/e5-base', 'instr...","{'mlflow.trace_schema.version': '2', 'mlflow.t...","[{'name': 'Embeddings.load', 'context': {'span...",{'mlflow.source.name': '/Users/mikhailrozhkov/...
1,9969d703bbe84fdba40f8edfe4fce715,Trace(request_id=9969d703bbe84fdba40f8edfe4fce...,1742853900983,TraceStatus.OK,3869,"{'kwargs': {'provider': 'huggingface-hub', 'co...","{'config': {'autoid': 'uuid5', 'path': 'intflo...","{'mlflow.trace_schema.version': '2', 'mlflow.t...","[{'name': 'Embeddings.load', 'context': {'span...",{'mlflow.source.name': '/Users/mikhailrozhkov/...
2,776d1d9d3a204c41975729429a535faf,Trace(request_id=776d1d9d3a204c41975729429a535...,1742853720840,TraceStatus.OK,3595,"{'kwargs': {'provider': 'huggingface-hub', 'co...","{'config': {'autoid': 'uuid5', 'path': 'intflo...","{'mlflow.trace_schema.version': '2', 'mlflow.t...","[{'name': 'Embeddings.load', 'context': {'span...",{'mlflow.source.name': '/Users/mikhailrozhkov/...


# Example 3:  Tracing LangGraph 

- Tracing LangGraph with MLflow https://mlflow.org/docs/latest/tracing/integrations/langgraph 
- Example: Code generation with RAG and self-correction with https://langchain-ai.github.io/langgraph/tutorials/code_assistant/langgraph_code_assistant/

For this example set additional vars: 
- OPENAI_API_KEY
- ANTHROPIC_API_KEY

In [5]:
!pip install langchain_openai langchain langgraph langchain_core

Collecting langgraph
  Using cached langgraph-0.3.21-py3-none-any.whl.metadata (7.7 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.0.10 (from langgraph)
  Downloading langgraph_checkpoint-2.0.23-py3-none-any.whl.metadata (4.6 kB)
Collecting langgraph-prebuilt<0.2,>=0.1.1 (from langgraph)
  Downloading langgraph_prebuilt-0.1.7-py3-none-any.whl.metadata (5.0 kB)
Collecting langgraph-sdk<0.2.0,>=0.1.42 (from langgraph)
  Downloading langgraph_sdk-0.1.60-py3-none-any.whl.metadata (1.8 kB)
Collecting xxhash<4.0.0,>=3.5.0 (from langgraph)
  Downloading xxhash-3.5.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (12 kB)
Collecting langchain_core
  Downloading langchain_core-0.2.43-py3-none-any.whl.metadata (6.2 kB)
Collecting ormsgpack<2.0.0,>=1.8.0 (from langgraph-checkpoint<3.0.0,>=2.0.10->langgraph)
  Downloading ormsgpack-1.9.1-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.metadata (43 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Using cached langs

In [6]:
from typing import Literal

import mlflow

from langchain_core.messages import AIMessage, ToolCall
from langchain_core.outputs import ChatGeneration, ChatResult
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Enabling tracing for LangGraph (LangChain)
mlflow.langchain.autolog()

# Optional: Set a tracking URI and an experiment
# mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("LangGraph")


@tool
def get_weather(city: Literal["nyc", "sf"]):
    """Use this to get weather information."""
    if city == "nyc":
        return "It might be cloudy in nyc"
    elif city == "sf":
        return "It's always sunny in sf"


llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_weather]
graph = create_react_agent(llm, tools)

# Invoke the graph
result = graph.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in tokyo?"}]}
)

# Example 4: Tracing CrewAI Agents

- Example https://mlflow.org/docs/latest/tracing/integrations/crewai 

In [4]:
!pip install crewai crewai_tools



In [5]:
import mlflow

# Turn on auto tracing by calling mlflow.crewai.autolog()
mlflow.crewai.autolog()

mlflow.set_experiment("CrewAI")

2025/04/15 13:28:13 INFO mlflow.tracking.fluent: Experiment with name 'CrewAI' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/3', creation_time=1744716493013, experiment_id='3', last_update_time=1744716493013, lifecycle_stage='active', name='CrewAI', tags={}>

2025-04-15 21:10:15,569 - 13915959296 - __init__.py-__init__:364 - ERROR: Exception while exporting Span batch.
Traceback (most recent call last):
  File "/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/urllib3/connectionpool.py", line 534, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/urllib3/connection.py", line 516, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/http/client.py", line 1378, in getresponse
    response.begin()
  File "/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/http/client.py", line 318,

In [6]:
from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
from crewai_tools import SerperDevTool, WebsiteSearchTool

from textwrap import dedent

content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content, metadata={"preference": "personal"}
)

search_tool = WebsiteSearchTool()

class TripAgents:
    def city_selection_agent(self):
        return Agent(
            role="City Selection Expert",
            goal="Select the best city based on weather, season, and prices",
            backstory="An expert in analyzing travel data to pick ideal destinations",
            tools=[
                search_tool,
            ],
            verbose=True,
        )

    def local_expert(self):
        return Agent(
            role="Local Expert at this city",
            goal="Provide the BEST insights about the selected city",
            backstory="""A knowledgeable local guide with extensive information
        about the city, it's attractions and customs""",
            tools=[search_tool],
            verbose=True,
        )

class TripTasks:
    def identify_task(self, agent, origin, cities, interests, range):
        return Task(
            description=dedent(
                f"""
                Analyze and select the best city for the trip based
                on specific criteria such as weather patterns, seasonal
                events, and travel costs. This task involves comparing
                multiple cities, considering factors like current weather
                conditions, upcoming cultural or seasonal events, and
                overall travel expenses.
                Your final answer must be a detailed
                report on the chosen city, and everything you found out
                about it, including the actual flight costs, weather
                forecast and attractions.

                Traveling from: {origin}
                City Options: {cities}
                Trip Date: {range}
                Traveler Interests: {interests}
            """
            ),
            agent=agent,
            expected_output="Detailed report on the chosen city including flight costs, weather forecast, and attractions",
        )

    def gather_task(self, agent, origin, interests, range):
        return Task(
            description=dedent(
                f"""
                As a local expert on this city you must compile an
                in-depth guide for someone traveling there and wanting
                to have THE BEST trip ever!
                Gather information about key attractions, local customs,
                special events, and daily activity recommendations.
                Find the best spots to go to, the kind of place only a
                local would know.
                This guide should provide a thorough overview of what
                the city has to offer, including hidden gems, cultural
                hotspots, must-visit landmarks, weather forecasts, and
                high level costs.
                The final answer must be a comprehensive city guide,
                rich in cultural insights and practical tips,
                tailored to enhance the travel experience.

                Trip Date: {range}
                Traveling from: {origin}
                Traveler Interests: {interests}
            """
            ),
            agent=agent,
            expected_output="Comprehensive city guide including hidden gems, cultural hotspots, and practical travel tips",
        )


class TripCrew:
    def __init__(self, origin, cities, date_range, interests):
        self.cities = cities
        self.origin = origin
        self.interests = interests
        self.date_range = date_range

    def run(self):
        agents = TripAgents()
        tasks = TripTasks()

        city_selector_agent = agents.city_selection_agent()
        local_expert_agent = agents.local_expert()

        identify_task = tasks.identify_task(
            city_selector_agent,
            self.origin,
            self.cities,
            self.interests,
            self.date_range,
        )
        gather_task = tasks.gather_task(
            local_expert_agent, self.origin, self.interests, self.date_range
        )

        crew = Crew(
            agents=[city_selector_agent, local_expert_agent],
            tasks=[identify_task, gather_task],
            verbose=True,
            memory=True,
            knowledge={
                "sources": [string_source],
                "metadata": {"preference": "personal"},
                "collection_name":"knowledge",
            },
        )

        result = crew.kickoff()
        return result


trip_crew = TripCrew("California", "Tokyo", "Dec 12 - Dec 20", "sports")


/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pydantic/_internal/_config.py:291: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.8/migration/
  "cipher": algorithms.TripleDES,
  "class": algorithms.TripleDES,
  warn(
/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/crewai_tools/tools/scrapegraph_scrape_tool/scrapegraph_scrape_tool.py:34: PydanticDeprecatedSince20: Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.8/migration/
  @validator("website_url")
/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/crewai_tool

In [7]:
# Run the crew

result = trip_crew.run()
result.dict()



[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Task:[00m [92m
Analyze and select the best city for the trip based
on specific criteria such as weather patterns, seasonal
events, and travel costs. This task involves comparing
multiple cities, considering factors like current weather
conditions, upcoming cultural or seasonal events, and
overall travel expenses.
Your final answer must be a detailed
report on the chosen city, and everything you found out
about it, including the actual flight costs, weather
forecast and attractions.

Traveling from: California
City Options: Tokyo
Trip Date: Dec 12 - Dec 20
Traveler Interests: sports
[00m


Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.23it/s]




[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mI need to gather comprehensive information regarding flight costs, weather forecasts, and sports events in Tokyo during the travel dates of December 12 to December 20, 2023. I will first search for flight prices and weather forecasts.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"flights from San Francisco to Tokyo December 12 to December 20 2023\", \"website\": \"https://www.kayak.com\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
JNTO - Official Tourism Guide for Japan Travel Trade Media Meetings & Events Newsletter Sign-up Coronavirus (COVID-19) Advisory Information Select Language 简体中文 繁體中文（香港） 繁體中文（臺灣） India (English) Bahasa Indonesia 한국어 ภาษาไทย Tiếng Việt Singapore (English) Philippines (English) Malaysia (English) Australia/New Zealand (English) Français Deutsch Italiano Español United Kingdom (English) N

Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.47it/s]




[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mI need to gather information regarding the weather forecast and local sporting events in Tokyo during the travel dates of December 12 to December 20. I will now search for that information.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Tokyo weather forecast December 12 to December 20 2023\", \"website\": \"https://weather.com\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
spectacular cherry blossom illuminations in Tokyo in 2025Sakura are gorgeous in the day, but the light-ups at parks and shrines prove that the blossoms look even better at night Content from our partnersThings to do5 reasons Zekkei Fireworks at Mt Fuji make for a unique Japanese springtime experienceRestaurantsPaid contentDig into the rise of game meat in TokyoDig into the rise of game meat in TokyoThings to doPaid content10 places to experienc

[91m 

I encountered an error while trying to use the tool. This was the error: 404 Client Error: Not Found for url: https://www.japan.travel/en/uk/events/.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mI need to gather more specific information regarding local sports events happening in Tokyo from December 12 to December 20, 2023. This will help provide a comprehensive analysis for the final answer regarding the trip from California to Tokyo.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m##

[91m 

I encountered an error while trying to use the tool. This was the error: HTTPSConnectionPool(host='www.tokyotimes.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3575f0090>, 'Connection to www.tokyotimes.com timed out. (connect timeout=30)')).
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mThought: I need to find alternative sources to gather information on specific sports events in Tokyo during the travel dates of December 12 to Dec

[91m 

I encountered an error while trying to use the tool. This was the error: 404 Client Error: Not Found for url: https://www.japan.travel/en/uk/events/.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mThought: I need to gather comprehensive information regarding the upcoming sports events in Tokyo for the dates of December 12 to December 20, 2023, from a reliable source.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"upcoming sports events in 

Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.78it/s]




[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mThought: Since the previous attempts to find information on sporting events in Tokyo were unsuccessful, I will now attempt to gather information about activities that may include sports from another source to ensure a comprehensive approach.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Tokyo events December 2023\", \"website\": \"https://www.timeout.com/tokyo\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Tokyo Weekender | Japan's No.1 English Lifestyle Magazine Music Upcoming Concerts and Music Events in Tokyo: April 2025 Anime & Manga AnimeJapan 2025: All Major Announcements and Previews Things To Do in Tokyo The Best Things To Do in Tokyo This Week Tokyo Weekender's guide to the best events, activities and things to do in the city Travel Guides Japan Hiking Guide: Top Spots from Hokkaido to Okinawa History & 



[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Thought:[00m [92mThought: I need to gather specific weather forecast information for Tokyo during the travel dates of December 12 to December 20, 2023, to provide a comprehensive overview.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Tokyo weather forecast December 12 to December 20 2023\", \"website\": \"https://weather.com\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
spectacular cherry blossom illuminations in Tokyo in 2025Sakura are gorgeous in the day, but the light-ups at parks and shrines prove that the blossoms look even better at night Content from our partnersThings to do5 reasons Zekkei Fireworks at Mt Fuji make for a unique Japanese springtime experienceRestaurantsPaid contentDig into the rise of game meat in TokyoDig into the rise of game meat in TokyoThings to doPaid content10 places to experience Dutch history in



[1m[95m# Agent:[00m [1m[92mCity Selection Expert[00m
[95m## Final Answer:[00m [92m
**Comprehensive City Guide to Tokyo, Japan (December 12 - December 20, 2023)**

---

**1. Arrival and Flight Costs:**
Traveling from California, the average round-trip flight price from San Francisco to Tokyo is approximately **$900**. Major airlines such as **United**, **ANA**, and **American Airlines** have routes available. Booking early can secure better prices, and keeping flexibility in dates can help find lower tickets.

---

**2. Weather Forecast:**
Tokyo typically experiences mild but cold weather in December. Here’s the forecast to help you pack accordingly:
- **December 12:** High of 15°C (59°F), Low of 7°C (45°F), Mostly Sunny
- **December 13:** High of 14°C (57°F), Low of 8°C (46°F), Partly Cloudy
- **December 14:** High of 14°C (57°F), Low of 6°C (43°F), Rain Expected
- **December 15:** High of 12°C (54°F), Low of 5°C (41°F), Showers Possible
- **December 16:** High of 13°C (55°F

  if mode == Mode.FUNCTIONS:
  if mode == Mode.FUNCTIONS:


[1m[95m# Agent:[00m [1m[92mLocal Expert at this city[00m
[95m## Task:[00m [92m
As a local expert on this city you must compile an
in-depth guide for someone traveling there and wanting
to have THE BEST trip ever!
Gather information about key attractions, local customs,
special events, and daily activity recommendations.
Find the best spots to go to, the kind of place only a
local would know.
This guide should provide a thorough overview of what
the city has to offer, including hidden gems, cultural
hotspots, must-visit landmarks, weather forecasts, and
high level costs.
The final answer must be a comprehensive city guide,
rich in cultural insights and practical tips,
tailored to enhance the travel experience.

Trip Date: Dec 12 - Dec 20
Traveling from: California
Traveler Interests: sports
[00m




[1m[95m# Agent:[00m [1m[92mLocal Expert at this city[00m
[95m## Thought:[00m [92mI need to gather more detailed insights specifically related to sports culture in Tokyo, including local sports bars, events, and dining recommendations that cater to sports enthusiasts. This will enhance the guide for John's trip.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Tokyo sports culture hidden gems, local sports bars, dining recommendations\", \"website\": \"https://www.timeout.com/tokyo\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Shrines with Hidden Samurai Histories Travel Artober March Edition: Tokyo’s Art and Photography Festival Returns This Month Art & Culture Health & Beauty See All 5 Essential Japanese Products To Stay Warm Health & Beauty New Year, New Japanese Skincare Brands You Need To Try Health & Beauty Japanese Perfumes To Relive Your Favorite Memories of Japan Health & Beauty 10 



[1m[95m# Agent:[00m [1m[92mLocal Expert at this city[00m
[95m## Final Answer:[00m [92m
# Comprehensive City Guide to Tokyo, Japan (December 12 - December 20, 2023)

---

**1. Arrival and Flight Costs:**
Traveling from California, the average round-trip flight price from San Francisco to Tokyo is approximately **$900**. Major airlines such as **United**, **ANA**, and **American Airlines** have routes available. Booking early can secure better prices, and keeping flexibility in dates can help find lower tickets.

---

**2. Weather Forecast:**
Tokyo typically experiences mild but cold weather in December. Here’s the forecast to help you pack accordingly:
- **December 12:** High of 15°C (59°F), Low of 7°C (45°F), Mostly Sunny
- **December 13:** High of 14°C (57°F), Low of 8°C (46°F), Partly Cloudy
- **December 14:** High of 14°C (57°F), Low of 6°C (43°F), Rain Expected
- **December 15:** High of 12°C (54°F), Low of 5°C (41°F), Showers Possible
- **December 16:** High of 13°C (55

  if mode == Mode.FUNCTIONS:
  if mode == Mode.FUNCTIONS:


/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pydantic/main.py:1087: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.8/migration/


{'raw': "# Comprehensive City Guide to Tokyo, Japan (December 12 - December 20, 2023)\n\n---\n\n**1. Arrival and Flight Costs:**\nTraveling from California, the average round-trip flight price from San Francisco to Tokyo is approximately **$900**. Major airlines such as **United**, **ANA**, and **American Airlines** have routes available. Booking early can secure better prices, and keeping flexibility in dates can help find lower tickets.\n\n---\n\n**2. Weather Forecast:**\nTokyo typically experiences mild but cold weather in December. Here’s the forecast to help you pack accordingly:\n- **December 12:** High of 15°C (59°F), Low of 7°C (45°F), Mostly Sunny\n- **December 13:** High of 14°C (57°F), Low of 8°C (46°F), Partly Cloudy\n- **December 14:** High of 14°C (57°F), Low of 6°C (43°F), Rain Expected\n- **December 15:** High of 12°C (54°F), Low of 5°C (41°F), Showers Possible\n- **December 16:** High of 13°C (55°F), Low of 6°C (43°F), Mostly Sunny\n- **December 17:** High of 11°C (52°

## Updated example 

In [5]:
!pip install crewai==0.108.0 crewai-tools duckduckgo-search langchain-openai



In [18]:
import os
from textwrap import dedent
from crewai import Agent, Crew, Task, Process
from crewai_tools import WebsiteSearchTool, DuckDuckGoSearchRunTool
# You might need specific LangChain components depending on your LLM setup
from langchain_openai import ChatOpenAI # Example for OpenAI or compatible APIs like Ollama

print("Libraries imported.")


ImportError: cannot import name 'DuckDuckGoSearchRunTool' from 'crewai_tools' (/Users/mikhailrozhkov/.pyenv/versions/3.11.6/lib/python3.11/site-packages/crewai_tools/__init__.py)

In [16]:
# Turn on auto tracing by calling mlflow.crewai.autolog()
mlflow.crewai.autolog()

mlflow.set_experiment("CrewAI Agents")

2025/03/29 20:39:42 INFO mlflow.tracking.fluent: Experiment with name 'CrewAI Agents' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/12', creation_time=1743277182597, experiment_id='12', last_update_time=1743277182597, lifecycle_stage='active', name='CrewAI Agents', tags={}>

In [8]:
# --- VERY IMPORTANT: Configure Your LLM ---
# Option A: Set OpenAI API Key as Environment Variable (Recommended for Simplicity)
# Make sure your OPENAI_API_KEY environment variable is set *before* launching Jupyter/VSCode.
# CrewAI will automatically pick it up if no specific 'llm' is passed to Agents/Crew.
# You can uncomment the line below to set it temporarily for this session *only*
# os.environ["OPENAI_API_KEY"] = "sk-YOUR_API_KEY_HERE"

# Option B: Configure for a Local or Specific LLM (Example using Ollama via ChatOpenAI)
# Ensure Ollama is running (e.g., `ollama run llama3`)
# ollama_llm = ChatOpenAI(
#     model="llama3", # Or whichever model you are running in Ollama
#     base_url="http://localhost:11434/v1",
#     api_key="NA" # Standard practice for Ollama via langchain-openai
# )
# print("LLM Configuration (Example - Ollama): Set up.")
# If using Option B, you'll need to pass `llm=ollama_llm` when creating Agents below.

# Check if the key is set (optional check for Option A)
api_key = os.environ.get("OPENAI_API_KEY")
if api_key:
    print("OPENAI_API_KEY found. Agents will use the default OpenAI LLM unless specified otherwise.")
else:
    print("WARNING: OPENAI_API_KEY environment variable not found.")
    print("Ensure an LLM is configured either via environment variables or by passing an 'llm' object to Agents.")

OPENAI_API_KEY found. Agents will use the default OpenAI LLM unless specified otherwise.


## Setup WebsiteSearchTool

In [9]:
# --- Cell 3: Instantiate Tools ---
search_tool = WebsiteSearchTool()
# If you were using Serper:
# from crewai_tools import SerperDevTool
# os.environ["SERPER_API_KEY"] = "YOUR_SERPER_KEY" # Set environment variable
# search_tool = SerperDevTool()

print("Tools instantiated (WebsiteSearchTool).")

Tools instantiated (WebsiteSearchTool).


In [10]:
# --- Cell 4: Define Agent Creation Class ---
class MLflowIntegrationAgents:
    def trend_scout_agent(self, llm_config=None): # Pass LLM config if not relying on default
        return Agent(
            role="AI Technology Trend Scout",
            goal=dedent(
                """Identify a specific, promising new tool, library, or technique
                within a given area of AI/ML technology."""
            ),
            backstory=dedent(
                """An expert researcher constantly scanning the horizon
                for emerging AI/ML technologies, publications, and popular
                open-source projects. You prioritize novelty and potential impact."""
            ),
            tools=[search_tool],
            allow_delegation=False,
            verbose=True,
            llm=llm_config # Pass the specific LLM object here if needed
        )

    def technical_analyst_agent(self, llm_config=None): # Pass LLM config if not relying on default
        return Agent(
            role="AI Technical Analyst",
            goal=dedent(
                """Analyze the technical capabilities, architecture, pros, cons,
                and primary use cases of a specific AI tool or technique."""
            ),
            backstory=dedent(
                """A meticulous engineer who dives deep into documentation,
                tutorials, and technical blogs to understand how technologies
                work under the hood. You focus on practical implementation details."""
            ),
            tools=[search_tool],
            allow_delegation=False,
            verbose=True,
            llm=llm_config # Pass the specific LLM object here if needed
        )

    def mlflow_integration_assessor_agent(self, llm_config=None): # Pass LLM config if not relying on default
        return Agent(
            role="MLflow Integration Assessor",
            goal=dedent(
                """Assess how a given AI tool/technique could be integrated
                into a standard MLflow workflow. Identify potential logging points,
                artifact types, customizability needs, or existing MLflow plugins."""
            ),
            backstory=dedent(
                """A seasoned MLOps engineer with deep expertise in MLflow.
                You understand the ML lifecycle and how various components
                can be tracked and managed using MLflow runs, artifacts, models,
                and parameters. You think practically about integration points."""
            ),
            tools=[search_tool], # May need search for finding MLflow plugins
            allow_delegation=False,
            verbose=True,
            llm=llm_config # Pass the specific LLM object here if needed
        )

print("MLflowIntegrationAgents class defined.")
# Example instantiation check (if using Option B from Cell 2):
# agent_creator = MLflowIntegrationAgents()
# scout = agent_creator.trend_scout_agent(llm_config=ollama_llm)
# print(scout) # Check agent creation

MLflowIntegrationAgents class defined.


In [11]:
# --- Cell 5: Define Task Creation Class ---
class MLflowIntegrationTasks:
    # Task 1: No context needed as it's the first step
    def scout_task(self, agent, area_of_interest):
        return Task(
            description=dedent(
                f"""
                Identify one specific, noteworthy, and relatively new open-source tool,
                library, or technique related to '{area_of_interest}'.
                Provide its name and a brief (1-2 sentence) description of what it does.
                Focus on things gaining traction or offering novel capabilities.
                Avoid general concepts; find a concrete example.

                Example Areas: Vector Databases, LLM Serving Frameworks,
                               Data Versioning for ML, Explainable AI Libraries.

                Your final output MUST be the name of the tool/technique and the brief description ONLY.
            """
            ),
            agent=agent,
            expected_output="The name of a specific tool/library/technique and a 1-2 sentence description.",
            # cache=True # Optional: Cache the output of this task
        )

    # Task 2: Depends on the output of scout_task
    def analyze_task(self, agent, area_of_interest, context_task): # Added context_task parameter
        return Task(
            description=dedent(
                f"""
                Based on the tool/technique identified in the previous step for the area '{area_of_interest}',
                perform a technical analysis. Research its core features, how it works
                (high-level architecture if possible), main benefits, and potential drawbacks or limitations.
                Use web search to find its documentation, tutorials, or technical articles.

                Your final output must be a bulleted list summarizing:
                - Core Features
                - How it Works (Briefly)
                - Key Benefits
                - Potential Drawbacks/Limitations
            """
            ),
            agent=agent,
            expected_output="Bulleted list summarizing features, workings, benefits, and drawbacks.",
            context=[context_task] # Explicitly state dependency on the previous task
            # cache=True # Optional: Cache the output of this task
        )

    # Task 3: Depends on the output of analyze_task
    def assess_mlflow_integration_task(self, agent, context_task): # Added context_task parameter
        return Task(
            description=dedent(
                f"""
                Considering the identified tool/technique and its technical analysis from the previous steps:
                Assess how this tool/technique could be integrated with or tracked by MLflow.
                Think about the typical ML lifecycle (data prep, training, evaluation, deployment, monitoring).
                Specifically suggest:
                1.  What parameters related to this tool could be logged to MLflow?
                2.  What metrics could be tracked?
                3.  What kind of artifacts could be logged (e.g., config files, model files specific to the tool, evaluation plots)?
                4.  Are there any known MLflow plugins or standard integration patterns? (Perform a quick search if unsure).
                5.  What are the key challenges or considerations for integration?

                Your final output must be a report addressing these 5 points clearly.
                Focus solely on the MLflow integration aspect.
            """
            ),
            agent=agent,
            expected_output="A report detailing potential MLflow integration points (params, metrics, artifacts), known integrations, and challenges.",
            context=[context_task] # Explicitly state dependency on the previous task
        )

print("MLflowIntegrationTasks class defined.")

MLflowIntegrationTasks class defined.


In [17]:
# --- Cell 6: Define the Crew Class (with MLflow additions) ---
import mlflow
import time
import json # Needed for logging dicts
from crewai import __version__ as crewai_version # Get CrewAI version

class MLflowIntegrationCrew:
    def __init__(self, area_of_interest):
        self.area_of_interest = area_of_interest
        # Determine LLM config (as before)
        self.llm_config = None # Default to env vars

        # Store agent/task definitions for logging
        self.agents_config = {}
        self.tasks_config = {}

    def _log_configs(self):
        """Logs agent and task configurations as artifacts."""
        print("Logging configurations to MLflow...")
        # Log agent configs
        for name, agent_obj in self.agents_config.items():
            # Convert Agent object to a dictionary (might need refinement based on Agent structure)
            # Simple example: extracting key attributes
            agent_dict = {
                "role": agent_obj.role,
                "goal": agent_obj.goal,
                "backstory": agent_obj.backstory,
                "tools": [tool.name for tool in agent_obj.tools] if agent_obj.tools else [],
                "llm": str(agent_obj.llm) if agent_obj.llm else "Default",
                "verbose": agent_obj.verbose,
                "allow_delegation": agent_obj.allow_delegation
            }
            mlflow.log_dict(agent_dict, f"agent_configs/{name}_config.json")

        # Log task configs
        for name, task_obj in self.tasks_config.items():
            task_dict = {
                "description": task_obj.description,
                "expected_output": task_obj.expected_output,
                "agent": task_obj.agent.role if task_obj.agent else "N/A", # Log agent role
                 # Context might be complex to serialize, log its task name/description if possible
                "context_task_descriptions": [ctx.description for ctx in task_obj.context] if task_obj.context else [],
            }
            mlflow.log_dict(task_dict, f"task_configs/{name}_config.json")

    def run(self):
        start_time = time.time()
        final_result = None
        status = "FAILURE" # Default status

        # --- MLflow Run Start ---
        # You might want to set experiment name outside the class or pass it in
        # mlflow.set_experiment("CrewAI MLflow Demo")
        with mlflow.start_run(run_name=f"CrewAI_Scout_{self.area_of_interest[:30]}") as run:
            print(f"MLflow Run started: {run.info.run_id}")

            # 1. Log Parameters & Tags
            print("Logging parameters and tags...")
            mlflow.log_param("area_of_interest", self.area_of_interest)
            mlflow.log_param("crew_process", "sequential") # Assuming sequential
            mlflow.log_param("crewai_version", crewai_version)
            mlflow.set_tag("crew_input_area", self.area_of_interest)
            mlflow.set_tag("crewai_version", crewai_version)
            # Add LLM info if available/consistent
            # llm_model_name = self.llm_config.model_name if self.llm_config else "Default"
            # mlflow.log_param("llm_model", llm_model_name)
            # mlflow.set_tag("llm_model", llm_model_name)

            # Instantiate agents & tasks, store their configs
            agent_creator = MLflowIntegrationAgents()
            task_creator = MLflowIntegrationTasks()

            self.agents_config = {
                "trend_scout": agent_creator.trend_scout_agent(self.llm_config),
                "tech_analyst": agent_creator.technical_analyst_agent(self.llm_config),
                "mlflow_assessor": agent_creator.mlflow_integration_assessor_agent(self.llm_config),
            }
            # Assign agents to local variables for clarity if needed
            trend_scout = self.agents_config["trend_scout"]
            tech_analyst = self.agents_config["tech_analyst"]
            mlflow_assessor = self.agents_config["mlflow_assessor"]

            # Define tasks sequentially and store configs
            scout_task = task_creator.scout_task(trend_scout, self.area_of_interest)
            analyze_task = task_creator.analyze_task(tech_analyst, self.area_of_interest, context_task=scout_task)
            assess_task = task_creator.assess_mlflow_integration_task(mlflow_assessor, context_task=analyze_task)

            self.tasks_config = {
                "scout_task": scout_task,
                "analyze_task": analyze_task,
                "assess_task": assess_task
            }

            # Log Agent/Task Counts
            mlflow.log_metric("agent_count", len(self.agents_config))
            mlflow.log_metric("task_count", len(self.tasks_config))
            mlflow.set_tag("crew_agents", "|".join(self.agents_config.keys()))

            # Log full configs as artifacts
            self._log_configs()

            # Form the crew
            crew = Crew(
                agents=list(self.agents_config.values()),
                tasks=list(self.tasks_config.values()),
                process=Process.sequential,
                verbose=True, # Keep verbose True to capture logs if needed
                # llm=self.llm_config # Optional crew-level LLM
            )

            # 2. Execute the Crew & Capture Output/Logs
            print("\nAttempting to kickoff the crew...")
            try:
                # --- Capture verbose output ---
                # This part can be tricky in standard notebooks.
                # Option A: Use a context manager if available (e.g., from libraries like `io`, `contextlib`)
                # Option B: Run as script and redirect stdout/stderr (easier)
                # Option C: Rely on parsing the 'final_result' if it contains verbose output (less common now)

                # Simple approach: run and hope final_result is useful, log separately if needed
                final_result = crew.kickoff()
                status = "SUCCESS"
                print("Crew kickoff successful.")

            except Exception as e:
                status = "FAILURE"
                final_result = f"Error Type: {type(e).__name__}\nError Details: {e}\n\nTraceback:\n{traceback.format_exc()}"
                print(f"\n--- Crew Execution Failed ---")
                print(final_result)
                # Optionally re-raise the exception if you want the notebook cell to fail
                # raise e
            finally:
                # 3. Log Results & Metrics
                end_time = time.time()
                execution_time = end_time - start_time
                print(f"Execution Time: {execution_time:.2f} seconds")
                print("Logging results and metrics...")

                mlflow.log_metric("execution_time_seconds", execution_time)
                mlflow.set_tag("crew_status", status)
                mlflow.log_metric("success", 1 if status == "SUCCESS" else 0)

                # Log final output as text artifact
                if isinstance(final_result, str):
                    mlflow.log_text(final_result, "final_output.txt")
                else:
                     # Try logging as JSON if it's dict-like, else convert to string
                    try:
                        mlflow.log_dict(dict(final_result), "final_output.json") # Requires result implement items()
                    except:
                         mlflow.log_text(str(final_result), "final_output.txt")

                # --- How to log intermediate steps/verbose log? ---
                # If running as a script redirecting output:
                # with open("execution_log.txt", "r") as f:
                #    mlflow.log_artifact("execution_log.txt")

                # Placeholder - Add logic here if you capture verbose logs some other way
                print("Placeholder: Add logic to capture and log verbose execution output if needed.")

                print(f"MLflow Run completed: {run.info.run_id}")

        return final_result, status # Return status along with result




Running the crew with MLflow Tracing...
MLflow Run started: 6e32160883d942f49e54bf3dcf5e1b45
Logging parameters and tags...
Logging configurations to MLflow...





Attempting to kickoff the crew...


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Task:[00m [92m
Identify one specific, noteworthy, and relatively new open-source tool,
library, or technique related to 'Explainable AI (XAI) libraries for Transformers'.
Provide its name and a brief (1-2 sentence) description of what it does.
Focus on things gaining traction or offering novel capabilities.
Avoid general concepts; find a concrete example.

Example Areas: Vector Databases, LLM Serving Frameworks,
               Data Versioning for ML, Explainable AI Libraries.

Your final output MUST be the name of the tool/technique and the brief description ONLY.
[00m


[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific websi

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific websi



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://arxiv.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
announcements from the TensorFlow team and community. Explore the ecosystem Discover production-tested tools to accelerate modeling, deployment, and other workflows. Library TensorFlow.js Train and run models directly in the browser using JavaScript or Node.js. Library LiteRT Deploy ML on mobile and edge devices such as Android, iOS, Raspberry Pi, and Edge TPU. API tf.data Preprocess data and create input pipelines for ML models. Library TFX Create production ML pipelines and implement MLOps best practices. API tf.keras Create ML models with TensorFlow's high-level API. Resource Kaggle Models Find pre-trained models ready for fine-tuning and deployment. Resource TensorFlow

[91m 

I encountered an error while trying to use the tool. This was the error: 502 Server Error: Bad Gateway for url: https://towardsdatascience.com/.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI library Transformers\", \"website\": \"https://towardsdatascience.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: 502 Server Error: Bad Gateway fo



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformer\", \"website\": \"https://www.kdnuggets.com\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward neural network that dire

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformer libraries\", \"website\": \"https://www.github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific websit

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  2.58it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers\", \"website\": \"https://www.researchgate.net\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward neural network that 

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  2.89it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://www.semanticscholar.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
announcements from the TensorFlow team and community. Explore the ecosystem Discover production-tested tools to accelerate modeling, deployment, and other workflows. Library TensorFlow.js Train and run models directly in the browser using JavaScript or Node.js. Library LiteRT Deploy ML on mobile and edge devices such as Android, iOS, Raspberry Pi, and Edge TPU. API tf.data Preprocess data and create input pipelines for ML models. Library TFX Create production ML pipelines and implement MLOps best practices. API tf.keras Create ML models with TensorFlow's high-level API. Resource Kaggle Models Find pre-trained models ready for fine-tuning and deployment. Resou

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  3.31it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward neural network that direct



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
I tried reusing the same input, I must stop using this action input. I'll try something else instead.




You ONLY have access to the following tools, and should NEVER make up tools that are not listed here:

Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.

IMPORTANT: Use the following format in your response:

```
Thought: you should always think about what to do
Action: the action 



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers\", \"website\": \"https://arxiv.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward neural network that directly in

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.41it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers\", \"website\": \"https://www.microsoft.com/en-us/research/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward neural 



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers\", \"website\": \"https://www.kdnuggets.com\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward neural network that dir

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  2.03it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI Transformers libraries\", \"website\": \"https://www.nvidia.com/en-us/research/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The latest in Machine Learning | Papers With Code Subscribe to the PwC Newsletter × Join the community × You need to log in to edit. You can create a new account if you don't have one. Top New Greatest Trending Research Subscribe InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity bytedance/infiniteyou • • 20 Mar 2025 Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. Image Generation 1,262 3.92 stars / hour Paper Code VGGT: Visual Geometry Grounded Transformer facebookresearch/vggt • • 14 Mar 2025 We present VGGT, a feed-forward 

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  3.08it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://doi.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
announcements from the TensorFlow team and community. Explore the ecosystem Discover production-tested tools to accelerate modeling, deployment, and other workflows. Library TensorFlow.js Train and run models directly in the browser using JavaScript or Node.js. Library LiteRT Deploy ML on mobile and edge devices such as Android, iOS, Raspberry Pi, and Edge TPU. API tf.data Preprocess data and create input pipelines for ML models. Library TFX Create production ML pipelines and implement MLOps best practices. API tf.keras Create ML models with TensorFlow's high-level API. Resource Kaggle Models Find pre-trained models ready for fine-tuning and deployment. Resource TensorFlow D



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Final Answer:[00m [92m
I was unable to identify a specific, noteworthy, and relatively new open-source tool, library, or technique related to Explainable AI (XAI) libraries for Transformers.[00m




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Task:[00m [92m
Based on the tool/technique identified in the previous step for the area 'Explainable AI (XAI) libraries for Transformers',
perform a technical analysis. Research its core features, how it works
(high-level architecture if possible), main benefits, and potential drawbacks or limitations.
Use web search to find its documentation, tutorials, or technical articles.

Your final output must be a bulleted list summarizing:
- Core Features
- How it Works (Briefly)
- Key Benefits
- Potential Drawbacks/Limitations
[00m


Inserting batches in chromadb: 100%|████████████████████████████████████████████| 4/4 [00:06<00:00,  1.54s/it]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mI need to find information on Explainable AI (XAI) libraries specifically designed for Transformers. I will look for documentation, tutorials, or technical articles focusing on these tools.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://huggingface.co/docs\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. 🖼️ Computer Vision : image classificatio

[91m 

I encountered an error while trying to use the tool. This was the error: 502 Server Error: Bad Gateway for url: https://towardsdatascience.com/.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I need to gather more specific information regarding the explainable AI libraries for transformers. I will search for further resources that focus on any existing libraries of this type.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainab

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.22it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I need to continue searching for additional information about Explainable AI libraries for Transformers to fulfill the requirement of analyzing core features, workings, benefits, and limitations.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://arxiv.org/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. 🖼️ Computer Vision : image classif

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I need to perform another search to collect more focused information on Explainable AI libraries for Transformers, particularly ones that detail their architecture, benefits, and limitations.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  2.61it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I need to continue searching for concrete resources regarding Explainable AI libraries for Transformers to identify their features, workings, benefits, and drawbacks.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://medium.com/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. 🖼️ Computer Vision : image classification, object detection, s

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.45it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I need to search for more relevant literature on Explainable AI libraries for Transformers in order to collect comprehensive information on their features, workings, benefits, and limitations.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://www.analyticsvidhya.com/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. 🖼️ Computer Vision : im

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  4.62it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: As I haven't been able to gather substantial information yet, I will conduct one more search to find specific resources on Explainable AI libraries for Transformers.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://www.researchgate.net/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. 🖼️ Computer Vision : image classification, object det

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  4.60it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: It seems that I am facing challenges in retrieving specific information about Explainable AI libraries for Transformers. I need to make a final attempt to collect useful insights by searching on another platform that may have relevant documentation or articles.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries Transformer models\", \"website\": \"https://www.semanticscholar.org/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, tra

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  2.71it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I have exhausted multiple sources to find specific information about Explainable AI libraries for Transformers. It may be better to synthesize the known capabilities and general insights based on prior knowledge and the information gathered.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Explainable AI libraries for Transformers\", \"website\": \"https://www.microsoft.com/en-us/research\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: 📝 Natural Language Processing : text classification, named entity recognition, question answering, language modeling, summarization, translation, mult



[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Final Answer:[00m [92m
Here is a summary of the general aspects of Explainable AI (XAI) libraries for Transformers:

- **Core Features:**
  - Provides model interpretability for transformer-based architectures.
  - Incorporation of techniques such as Layer-wise Relevance Propagation (LRP), SHAP values, and LIME.
  - User-friendly APIs consistent with existing transformer libraries (e.g., Hugging Face Transformers).
  - Supports visualization tools to enhance understanding of model decisions.
  - Compatibility with multi-modal data sources like text and images.

- **How it Works (Briefly):**
  - XAI libraries typically integrate with transformers to provide an explanation interface. When a model makes a prediction, these libraries analyze and decompose the output to highlight input features that influenced the result, often visualizing these contributions. This explanation can involve statistical measures and relevanc

[1m[95m# Agent:[00m [1m[92mMLflow Integration Assessor[00m
[95m## Task:[00m [92m
Considering the identified tool/technique and its technical analysis from the previous steps:
Assess how this tool/technique could be integrated with or tracked by MLflow.
Think about the typical ML lifecycle (data prep, training, evaluation, deployment, monitoring).
Specifically suggest:
1.  What parameters related to this tool could be logged to MLflow?
2.  What metrics could be tracked?
3.  What kind of artifacts could be logged (e.g., config files, model files specific to the tool, evaluation plots)?
4.  Are there any known MLflow plugins or standard integration patterns? (Perform a quick search if unsure).
5.  What are the key challenges or considerations for integration?

Your final output must be a report addressing these 5 points clearly.
Focus solely on the MLflow integration aspect.
[00m


Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.89it/s]




[1m[95m# Agent:[00m [1m[92mMLflow Integration Assessor[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"MLflow integration Explainable AI libraries for transformers\", \"website\": \"https://www.mlflow.org/\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The library is integrated with 🤗 transformers . You can also explore TRL-related models, datasets, and demos in the TRL Hugging Face organization . Learn Learn post-training with TRL and other libraries in 🤗 smol course . Contents The documentation is organized into the following sections: Getting Started : installation and quickstart guide. Conceptual Guides : dataset formats, training FAQ, and understanding logs. How-to Guides : reducing memory usage, speeding up training, distributing

is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API



[1m[95m# Agent:[00m [1m[92mMLflow Integration Assessor[00m
[95m## Final Answer:[00m [92m
**Report on Integration of Explainable AI with MLflow**

1. **Parameters to Log to MLflow:**
   - **Model Parameters**: Configuration settings for transformer models such as number of layers, hidden units, dropout rate, and optimizer specifications.
   - **Explanation Method Parameters**: Settings for XAI techniques used, e.g., parameters for SHAP (number of samples, kernel approximation), LIME (perturbation type, sample size), or LRP (layer selection).
   - **Preprocessing Parameters**: Tokenization settings, padding, and truncation options used for input data preparation.

2. **Metrics to Track:**
   - **Model Performance Metrics**: Accuracy, F1-score, precision, recall, ROC-AUC for evaluating model predictions.
   - **Explanation Fidelity Metrics**: Metrics to quantify the quality of explanations provided, such as LIME’s stability score or SHAP’s feature importance scores.
   - **Comp

Crew kickoff successful.
Execution Time: 119.47 seconds
Logging results and metrics...
Placeholder: Add logic to capture and log verbose execution output if needed.
MLflow Run completed: 6e32160883d942f49e54bf3dcf5e1b45
🏃 View run CrewAI_Scout_Explainable AI (XAI) libraries at: https://public-tracking.mlflow-e00e997y0znyf15mwg.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/12/runs/6e32160883d942f49e54bf3dcf5e1b45
🧪 View experiment at: https://public-tracking.mlflow-e00e997y0znyf15mwg.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/12


########################
## Crew Run Status: SUCCESS ##
########################

Final Output (also logged to MLflow):
**Report on Integration of Explainable AI with MLflow**

1. **Parameters to Log to MLflow:**
   - **Model Parameters**: Configuration settings for transformer models such as number of layers, hidden units, dropout rate, and optimizer specifications.
   - **Explanation Method Parameters**: Setti

### Run 

#### Suggested area Topics:

LLM Focused:

- "Tools for fine-tuning open-source LLMs" (e.g., Axolotl, Unsloth, libraries for PEFT)
- "Frameworks for evaluating LLM outputs" (e.g., Ragas, DeepEval, TruLens)
- "LLM Observability platforms" (e.g., LangSmith, Helicone, Weights & Biases integrations)
- "Prompt engineering & management toolkits" (e.g., Promptfoo, LangChain/LlamaIndex prompt templates, specialized libraries)
- "Quantization libraries for LLMs" (e.g., AutoGPTQ, bitsandbytes wrappers, specific toolkits)

RAG Focused:
- "Scalable Vector Databases for RAG" (e.g., Qdrant, Weaviate, Milvus alternatives or new entrants)
- "Frameworks implementing advanced RAG techniques" (e.g., Self-RAG, Corrective RAG, ReAct pattern libraries)
- "Evaluation frameworks specifically for RAG pipelines" (e.g., Ragas, TruLens focus on RAG metrics)

Agentic AI Focused:
- "Alternative AI Agent frameworks to LangChain/CrewAI" (e.g., Autogen, BabyAGI variants, new research frameworks)
- "Tools for creating and managing agent actions/tools" (Libraries simplifying tool definition and secure execution)
- "Frameworks for multi-agent system orchestration" (e.g., Autogen, specialized simulation environments)

General GenAI Dev Tools:
- "Orchestration tools for complex LLM/RAG pipelines" (e.g., Kestra, Dagster integrations for LLMs)
- "Synthetic data generation tools using LLMs" (Libraries focused on generating structured/unstructured data)

Tips for Choosing:
- Specificity: More specific topics often yield better results than very broad ones.
- Recency: Focus on areas where new tools are actively emerging.
- MLflow Relevance: While all can be tracked, areas like "Observability," "Evaluation," "Fine-tuning," and "Orchestration" have very direct conceptual links to MLflow's core purpose.


In [19]:
# --- Cell 8: Run the Crew and Display Results (Modified) ---
import traceback # Needed for logging exception tracebacks

print("Running the crew with MLflow Tracing...")
area = "Evaluation frameworks specifically for RAG pipelines"
integration_crew = MLflowIntegrationCrew(area_of_interest=area)

# Run and capture status
final_result, run_status = integration_crew.run()

print("\n\n########################")
print(f"## Crew Run Status: {run_status} ##")
print("########################\n")

print("Final Output (also logged to MLflow):")
print(final_result)

# You can now go to the MLflow UI (usually `mlflow ui` in terminal) to see the run details.

Running the crew with MLflow Tracing...
MLflow Run started: 5150278cd9af488596c2557c538da876
Logging parameters and tags...
Logging configurations to MLflow...

Attempting to kickoff the crew...


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Task:[00m [92m
Identify one specific, noteworthy, and relatively new open-source tool,
library, or technique related to 'Evaluation frameworks specifically for RAG pipelines'.
Provide its name and a brief (1-2 sentence) description of what it does.
Focus on things gaining traction or offering novel capabilities.
Avoid general concepts; find a concrete example.

Example Areas: Vector Databases, LLM Serving Frameworks,
               Data Versioning for ML, Explainable AI Libraries.

Your final output MUST be the name of the tool/technique and the brief description ONLY.
[00m


[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Evaluation frameworks for RAG pipelines\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Evaluation frameworks for RAG pipelines\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Evaluation frameworks for RAG pipelines\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:01<00:00,  1.14s/it]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Evaluation frameworks for RAG pipelines\", \"website\": \"https://pypi.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation Inference

toolbox that supports both. Our library is designed with a focus on usability over performance , simple over easy , and customizability over abstractions . The library has three main co

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation frameworks RAG pipelines\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Too



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation frameworks RAG pipelines\", \"website\": \"https://arxiv.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation Inference

toolbox that supports both. Our library is designed with a focus on usability over performance , simple over easy , and customizability over abstractions . The library has three main compo

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  3.47it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation frameworks RAG pipelines\", \"website\": \"https://paperswithcode.com\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation Inference

toolbox that supports both. Our library is designed with a focus on usability over performance , simple over easy , and customizability over abstractions . The library has three m



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation frameworks for RAG pipelines\", \"website\": \"https://arxiv.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation Inference

toolbox that supports both. Our library is designed with a focus on usability over performance , simple over easy , and customizability over abstractions . The library has three main c

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.45it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation frameworks RAG\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation Inference

between documentation themes Sign Up to get started TRL - Transformer Reinforcement Learning TRL is a full stack library where we provide a set of tools to train transformer language models wit



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation framework RAG pipelines\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
toolbox that supports both. Our library is designed with a focus on usability over performance , simple over easy , and customizability over abstractions . The library has three main components: State-of-the-art diffusion pipelines for inference with just a few lines of code. There are many pipelines in 🤗 Diffusers, check out the table in the pipeline overview for a complete list of available pipelines and the task they solve. Interchangeable noise schedulers for balancing trade-offs between

pipeline for sentiment-analysis pipe = pipeline( 'sentiment-analysis' )

Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker

[91m 

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
Tool Arguments: {'search_query': {'description': 'Mandatory search query you want to use to search a specific website', 'type': 'str'}, 'website': {'description': 'Mandatory valid website URL you want to search on', 'type': 'str'}}
Tool Description: A tool that can be used to semantic search a query from a specific URL content.
[00m


[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"evaluation frameworks RAG open-source\", \"website\": \"https://github.com\"}"[00m
[95m## Tool Output:[00m [92m

I encountered an error while trying to use the tool. This was the error: GITHUB.
 Tool Search in a specific website accepts these inputs: Tool Name: Search in a specific website
T

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  3.29it/s]




[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"RAG evaluation framework tools\", \"website\": \"https://www.arxiv.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation Inference

models and datasets. With a single line of code, you get access to dozens of evaluation methods for different domains (NLP, Computer Vision, Reinforcement Learning, and more!). Be it on you



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"open-source evaluation framework RAG\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
HUGS make it easy to keep your applications at the cutting edge of Generative AI by offering updates when new battle-tested open models become available. Built for Open Models Compatible with a wide range of popular open AI models, including: LLMs: Llama, Gemma, Mistral, Mixtral, Qwen, Deepseek (soon), T5 (soon), Yi (soon), Phi (soon), Command R (soon) (Soon) Multimodal Models: Idefics, Llava (Soon) Embedding Models: BGE, GTE, Mixbread, Arctic, Jina, Nomic Getting Started To start using HUGS,

Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Da



[1m[95m# Agent:[00m [1m[92mAI Technology Trend Scout[00m
[95m## Final Answer:[00m [92m
Hugging Face's "Evaluate" library is an emerging tool designed to support diverse evaluation metrics across various domains, including NLP and RAG pipelines. It allows users to consistently and reproducibly evaluate models with a single line of code, providing access to dozens of metrics through an interactive and user-friendly interface.[00m




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Task:[00m [92m
Based on the tool/technique identified in the previous step for the area 'Evaluation frameworks specifically for RAG pipelines',
perform a technical analysis. Research its core features, how it works
(high-level architecture if possible), main benefits, and potential drawbacks or limitations.
Use web search to find its documentation, tutorials, or technical articles.

Your final output must be a bulleted list summarizing:
- Core Features
- How it Works (Briefly)
- Key Benefits
- Potential Drawbacks/Limitations
[00m




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mI need to gather specific information about the "Evaluate" library by Hugging Face, particularly in relation to evaluation frameworks for RAG pipelines. I will search for documentation and technical articles on this library to compile a comprehensive analysis.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Evaluate library site:huggingface.co\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation 🤗 Evaluate Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Opti

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 4/4 [00:05<00:00,  1.41s/it]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mI need to perform another search to find more detailed information specifically about the "Evaluate" library documentation, which I can then analyze for the required insights.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Hugging Face Evaluate library documentation\", \"website\": \"https://huggingface.co/docs/evaluate\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation Using the `evaluator` Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks 

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.49it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mI need to gather specific technical content and documentation about the "Evaluate" library from Hugging Face. It appears I did not get useful detail in my previous searches. Now I need to attempt another search focusing on general technical articles that might discuss the library in-depth.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"technical article Hugging Face Evaluate library\", \"website\": \"https://huggingface.co/blog\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Hugging Face – The AI community building the future. The AI community building the future. The platform where the machine learning community collaborates on models, datasets, and applications. Explore AI Apps or Browse 1M+ models Trending onthis week Models Updated 3 days ago • 60.5k • 1.98k Updated 1 day ago • 27.9k • 813 Updated 9 days ago • 1

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 4/4 [00:04<00:00,  1.21s/it]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Hugging Face Evaluate library features\", \"website\": \"https://huggingface.co/docs/evaluate\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation Visualization methods Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) Leaderboards Lighteval Optimum PEFT Safetensors Sentence Transformers TRL Tasks Text Embeddings Inference Text Generation

Evaluate documentation Loading methods Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions 

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.69it/s]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mI need to continue searching for the specific features and technical details regarding the "Evaluate" library since I haven't gathered sufficient information yet.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Hugging Face Evaluate library\", \"website\": \"https://huggingface.co\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Hugging Face – The AI community building the future. The AI community building the future. The platform where the machine learning community collaborates on models, datasets, and applications. Explore AI Apps or Browse 1M+ models Trending onthis week Models Updated 3 days ago • 60.5k • 1.97k Updated 1 day ago • 27.9k • 810 Updated 9 days ago • 11.6k • 790 Updated 6 days ago • 48.4k • 1.03k Updated 4 days ago • 467 Browse 1M+ models Spaces 541 InfiniteYou-FLUX 📸 Flexible Photo Recrafting While 

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 4/4 [00:05<00:00,  1.46s/it]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mI continue to gather relevant details on the "Evaluate" library, but I still need to acquire specific insights regarding its functionalities, benefits, and limitations. I will make one final search on a comprehensive source to gather the necessary information.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Hugging Face Evaluate library analysis\", \"website\": \"https://huggingface.co/docs/evaluate\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation Using the `evaluator` Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedicated) 

Inserting batches in chromadb: 100%|████████████████████████████████████████████| 4/4 [00:05<00:00,  1.26s/it]




[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Thought:[00m [92mThought: I am not obtaining the requisite detailed information about the "Evaluate" library from Hugging Face. I will conduct one more specific search on their main documentation site to gather all necessary insights for my analysis.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Overview of Evaluate library\", \"website\": \"https://huggingface.co/docs/evaluate\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
The goal of the 🤗 Evaluate library is to support different types of evaluation, depending on different goals, datasets and models. Here are the types of evaluations that are currently supported with a few examples for each: Metrics A metric measures the performance of a model on a given dataset. This is often based on an existing ground truth (i.e. a set of references), but there are also *referenceless metrics



[1m[95m# Agent:[00m [1m[92mAI Technical Analyst[00m
[95m## Final Answer:[00m [92m
- Core Features:
  - Supports a wide range of evaluation metrics tailored for different types of models and datasets.
  - Allows for both standard evaluation using ground truth data and referenceless evaluation using pretrained models.
  - User-friendly interface for easy access and manipulation of evaluation functions.
  - Ability to create custom evaluation suites for complex evaluations.
  - Integration with existing Hugging Face tools and libraries for seamless model evaluation.

- How it Works (Briefly):
  - The Evaluate library functions by allowing users to input models and datasets, and then apply various metrics to assess model performance.
  - Provides a simple API to load metrics and apply them directly to model outputs.
  - Supports both batch and streaming evaluation, making it versatile for different model testing environments.

- Key Benefits:
  - Simplifies the evaluation proces

[1m[95m# Agent:[00m [1m[92mMLflow Integration Assessor[00m
[95m## Task:[00m [92m
Considering the identified tool/technique and its technical analysis from the previous steps:
Assess how this tool/technique could be integrated with or tracked by MLflow.
Think about the typical ML lifecycle (data prep, training, evaluation, deployment, monitoring).
Specifically suggest:
1.  What parameters related to this tool could be logged to MLflow?
2.  What metrics could be tracked?
3.  What kind of artifacts could be logged (e.g., config files, model files specific to the tool, evaluation plots)?
4.  Are there any known MLflow plugins or standard integration patterns? (Perform a quick search if unsure).
5.  What are the key challenges or considerations for integration?

Your final output must be a report addressing these 5 points clearly.
Focus solely on the MLflow integration aspect.
[00m


Inserting batches in chromadb: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  1.73it/s]




[1m[95m# Agent:[00m [1m[92mMLflow Integration Assessor[00m
[95m## Thought:[00m [92mI need to conduct a search to gather information on potential MLflow integrations, existing plugins, and patterns related to the Evaluate library described in the context provided. This will help me formulate a detailed report addressing all required points.[00m
[95m## Using tool:[00m [92mSearch in a specific website[00m
[95m## Tool Input:[00m [92m
"{\"search_query\": \"Evaluate library MLflow integration\", \"website\": \"https://mlflow.org\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Evaluate documentation Using the `evaluator` with custom pipelines Evaluate 🏡 View all docs AWS Trainium & Inferentia Accelerate Amazon SageMaker Argilla AutoTrain Bitsandbytes Chat UI Competitions Dataset viewer Datasets Diffusers Distilabel Evaluate Gradio Hub Hub Python Library Hugging Face Generative AI Services (HUGS) Huggingface.js Inference API (serverless) Inference Endpoints (dedic



[1m[95m# Agent:[00m [1m[92mMLflow Integration Assessor[00m
[95m## Final Answer:[00m [92m
**Report: Integration of the Evaluate Library with MLflow**

1. **Parameters Related to the Tool to be Logged to MLflow:**
   - `evaluation_metric`: The type of evaluation metric used (e.g., accuracy, F1 score, etc.).
   - `batch_size`: The size of the data batch during evaluation.
   - `model_type`: The type of model being evaluated (e.g., BERT, ResNet).
   - `dataset_name`: The specific dataset used for evaluation in the context of reproducibility.
   - `evaluation_strategy`: Whether the evaluation is ground truth or referenceless.
   - `custom_metrics`: Any custom metrics defined by the user for specific evaluations.

2. **Metrics to be Tracked:**
   - **Standard Metrics**: Accuracy, precision, recall, F1 score, AUC-ROC, etc.
   - **Custom Metrics**: Any user-defined metrics included in the evaluation suite.
   - **Performance Metrics**: Latency and throughput for batch and streaming 

Crew kickoff successful.
Execution Time: 124.86 seconds
Logging results and metrics...
Placeholder: Add logic to capture and log verbose execution output if needed.
MLflow Run completed: 5150278cd9af488596c2557c538da876
🏃 View run CrewAI_Scout_Evaluation frameworks specific at: https://public-tracking.mlflow-e00e997y0znyf15mwg.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/12/runs/5150278cd9af488596c2557c538da876
🧪 View experiment at: https://public-tracking.mlflow-e00e997y0znyf15mwg.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/12


########################
## Crew Run Status: SUCCESS ##
########################

Final Output (also logged to MLflow):
**Report: Integration of the Evaluate Library with MLflow**

1. **Parameters Related to the Tool to be Logged to MLflow:**
   - `evaluation_metric`: The type of evaluation metric used (e.g., accuracy, F1 score, etc.).
   - `batch_size`: The size of the data batch during evaluation.
   - `model_

# Example 5: AI Stack Assistant 

In [8]:
# Turn on auto tracing by calling mlflow.crewai.autolog()
mlflow.crewai.autolog()

mlflow.set_experiment("AI-Stack")

2025/04/15 13:40:03 INFO mlflow.tracking.fluent: Experiment with name 'AI-Stack' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/4', creation_time=1744717203269, experiment_id='4', last_update_time=1744717203269, lifecycle_stage='active', name='AI-Stack', tags={}>

In [17]:
from crewai import Agent, Crew, Task, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool
from textwrap import dedent
import os

# Initialize search tools
search_tool = WebsiteSearchTool()
serper_tool = SerperDevTool()  # Requires SERPER_API_KEY in environment variables

class AIToolResearchAgents:
    def researcher_agent(self):
        """Agent responsible for discovering relevant AI tools for specific tasks."""
        return Agent(
            role="AI Tool Researcher",
            goal="Discover and evaluate relevant AI tools (libraries, frameworks) for specific tasks, considering the existing AI stack used in the project.",
            backstory=dedent("""
                You are an AI research specialist with extensive knowledge of the AI ecosystem. 
                Your expertise lies in finding the most appropriate tools for specific tasks, 
                considering compatibility with existing technologies. You excel at evaluating 
                open-source projects, understanding their purpose, and identifying their potential 
                use cases. You always back your recommendations with solid research.
            """),
            tools=[search_tool, serper_tool],
            verbose=True,
            allow_delegation=False,
            max_iter=3
        )

    def analyst_agent(self):
        """Agent responsible for analyzing tools and creating detailed reports."""
        return Agent(
            role="Technical Analyst",
            goal="Thoroughly analyze AI tools and prepare comprehensive reports highlighting strengths, weaknesses, and integration opportunities.",
            backstory=dedent("""
                You are a technical expert specializing in AI technologies evaluation. 
                You have a deep understanding of AI libraries and frameworks, and can 
                quickly assess their technical merits. Your analytical skills allow you 
                to identify pros and cons of each tool, and you're known for creating 
                clear, well-structured reports that help teams make informed decisions 
                about which technologies to adopt.
            """),
            tools=[search_tool, serper_tool],
            verbose=True,
            allow_delegation=False,
            max_iter=3
        )

class AIToolResearchTasks:
    def search_tools_task(self, agent, task, ai_stack):
        """Task to discover relevant tools for a specific task considering the existing AI stack."""
        return Task(
            description=dedent(f"""
                Research and identify 3-5 relevant open-source AI tools (libraries, frameworks) for the task: '{task}'.
                Consider compatibility with the existing AI stack: '{ai_stack}'.
                
                For each tool found, identify:
                - Name
                - Primary URL (GitHub repository or official website)
                - Primary use case
                - Brief description
                
                Your output should be formatted as a JSON list of objects:
                [
                  {{
                    "name": "Tool Name",
                    "url": "https://github.com/org/repo",
                    "use_case": "Primary use case in one sentence",
                    "description": "Brief description of the tool (2-3 sentences)"
                  }},
                  ...
                ]
                
                Focus on finding the most appropriate and well-maintained tools for the task.
                Consider GitHub stars, recent activity, documentation quality, and community support.
            """),
            agent=agent,
            expected_output="A JSON list containing details of 3-5 relevant AI tools with their names, URLs, use cases, and descriptions.",
            output_file="output/tool_candidates.json"
        )

    def analyze_tools_task(self, agent, task, ai_stack):
        """Task to perform in-depth analysis of discovered tools."""
        return Task(
            description=dedent(f"""
                Read the tool list from the previous task's result and perform a detailed analysis of each tool.
                
                For each tool:
                1. Research its capabilities, limitations, community adoption, documentation quality
                2. Evaluate how well it addresses the specified task: '{task}'
                3. Consider its compatibility with '{ai_stack}'
                4. Identify pros and cons
                
                Generate a detailed JSON analysis with these fields for each tool:
                - name: Tool name
                - url: Primary URL
                - description: Comprehensive description
                - features: List of key features
                - pros: List of advantages
                - cons: List of limitations
                - integration_complexity: "Low"/"Medium"/"High"
                - community_support: Assessment of community and maintenance
                - documentation_quality: Assessment of documentation
                - recommendation_score: 1-10 integer
                - recommendation_justification: Explanation of the score
                
                Your output should be a JSON list of these detailed analysis objects.
            """),
            agent=agent,
            expected_output="A JSON list containing detailed analysis of each tool with comprehensive information about features, pros, cons, and recommendation scores.",
            output_file="output/tool_analysis.json"
        )

    def create_report_task(self, agent, task, ai_stack):
        """Task to create a comprehensive report with recommendations."""
        return Task(
            description=dedent(f"""
                Read the analysis from the previous task's result and create a comprehensive Markdown report.
                
                The report should include:
                
                1. An introduction explaining the task ('{task}') and existing stack ('{ai_stack}')
                2. For each tool, create a section with:
                   - Tool name and URL as a heading
                   - Description
                   - Features (as bullet points)
                   - Pros (as bullet points)
                   - Cons (as bullet points)
                   - Integration complexity
                   - Recommendation score with justification
                3. A summary/conclusion comparing the tools and providing final recommendations
                
                Use proper Markdown formatting with headings, bullet points, and emphasis where appropriate.
                Sort tools by recommendation score (descending).
                
                Your output should be a complete, well-formatted Markdown document.
            """),
            agent=agent,
            expected_output="A comprehensive Markdown report analyzing each tool with recommendations, properly formatted with headings, bullet points, and clear sections.",
            output_file="output/tool_recommendation_report.md"
        )

class AIToolResearchCrew:
    def __init__(self, task, ai_stack):
        """
        Initialize the crew with the task description and existing AI stack.
        
        Args:
            task (str): Description of the task requiring AI tools
            ai_stack (str): Comma-separated list of existing tools/frameworks used
        """
        self.task = task
        self.ai_stack = ai_stack
        
        # Ensure output directory exists
        os.makedirs("output", exist_ok=True)

    def run(self):
        """Execute the research, analysis, and reporting process."""
        # Initialize agents
        agents = AIToolResearchAgents()
        researcher = agents.researcher_agent()
        analyst = agents.analyst_agent()
        
        # Initialize tasks
        tasks = AIToolResearchTasks()
        search_task = tasks.search_tools_task(researcher, self.task, self.ai_stack)
        
        # Create the crew first
        crew = Crew(
            agents=[researcher, analyst],
            tasks=[search_task],
            verbose=True,
            process=Process.sequential,
            memory=True
        )
        
        # Now add the dependent tasks that need context
        analyze_task = tasks.analyze_tools_task(analyst, self.task, self.ai_stack)
        crew.tasks.append(analyze_task)
        
        report_task = tasks.create_report_task(analyst, self.task, self.ai_stack)
        crew.tasks.append(report_task)
        
        result = crew.kickoff()
        return result

# # Example usage:
# if __name__ == "__main__":
#     task_description = "Develop a conversational RAG system that can answer questions based on a large PDF document collection"
#     existing_stack = "LangChain, OpenAI, Pinecone, FastAPI"
    
#     crew = AIToolResearchCrew(task_description, existing_stack)
#     result = crew.run()
    
#     print("\n=== Research Complete ===")
#     print(f"Check the output directory for detailed results.")
#     print("Final Report Path: output/tool_recommendation_report.md")

In [29]:
# Example usage:

task_description = "Develop a conversational RAG system that can answer questions based on a large PDF document collection"
existing_stack = "LangChain, OpenAI, Pinecone, FastAPI"

ai_dev_crew = AIToolResearchCrew(task_description, existing_stack)
result = ai_dev_crew.run()




🏃 View run Tool_Research_1744744212 at: https://public-tracking.mlflow-e00rfd6s4w3cdtk9fx.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/5/runs/5145b0c944ae4c16a9cb7e37da282df2
🧪 View experiment at: https://public-tracking.mlflow-e00rfd6s4w3cdtk9fx.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/5


In [19]:

print("\n=== Research Complete ===")
print(f"Check the output directory for detailed results.")
print("Final Report Path: output/tool_recommendation_report.md")


=== Research Complete ===
Check the output directory for detailed results.
Final Report Path: output/tool_recommendation_report.md


## Add "autolog" for LLM calls tracing

## Organize Runs and add artifacts tracking

In [25]:
from crewai import Agent, Crew, Task, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool
from textwrap import dedent
import os
import mlflow
import json
import time
import logging
import io
import sys
import contextlib

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Initialize search tools
search_tool = WebsiteSearchTool()
serper_tool = SerperDevTool()  # Requires SERPER_API_KEY in environment variables

class AIToolResearchAgents:
    def researcher_agent(self):
        """Agent responsible for discovering relevant AI tools for specific tasks."""
        return Agent(
            role="AI Tool Researcher",
            goal="Discover and evaluate relevant AI tools (libraries, frameworks) for specific tasks, considering the existing AI stack used in the project.",
            backstory=dedent("""
                You are an AI research specialist with extensive knowledge of the AI ecosystem. 
                Your expertise lies in finding the most appropriate tools for specific tasks, 
                considering compatibility with existing technologies. You excel at evaluating 
                open-source projects, understanding their purpose, and identifying their potential 
                use cases. You always back your recommendations with solid research.
            """),
            tools=[search_tool, serper_tool],
            verbose=True,
            allow_delegation=False,
            max_iter=3
        )

    def analyst_agent(self):
        """Agent responsible for analyzing tools and creating detailed reports."""
        return Agent(
            role="Technical Analyst",
            goal="Thoroughly analyze AI tools and prepare comprehensive reports highlighting strengths, weaknesses, and integration opportunities.",
            backstory=dedent("""
                You are a technical expert specializing in AI technologies evaluation. 
                You have a deep understanding of AI libraries and frameworks, and can 
                quickly assess their technical merits. Your analytical skills allow you 
                to identify pros and cons of each tool, and you're known for creating 
                clear, well-structured reports that help teams make informed decisions 
                about which technologies to adopt.
            """),
            tools=[search_tool, serper_tool],
            verbose=True,
            allow_delegation=False,
            max_iter=3
        )

class AIToolResearchTasks:
    def search_tools_task(self, agent, task, ai_stack):
        """Task to discover relevant tools for a specific task considering the existing AI stack."""
        return Task(
            description=dedent(f"""
                Research and identify 3-5 relevant open-source AI tools (libraries, frameworks) for the task: '{task}'.
                Consider compatibility with the existing AI stack: '{ai_stack}'.
                
                For each tool found, identify:
                - Name
                - Primary URL (GitHub repository or official website)
                - Primary use case
                - Brief description
                
                Your output should be formatted as a JSON list of objects:
                [
                  {{
                    "name": "Tool Name",
                    "url": "https://github.com/org/repo",
                    "use_case": "Primary use case in one sentence",
                    "description": "Brief description of the tool (2-3 sentences)"
                  }},
                  ...
                ]
                
                Focus on finding the most appropriate and well-maintained tools for the task.
                Consider GitHub stars, recent activity, documentation quality, and community support.
            """),
            agent=agent,
            expected_output="A JSON list containing details of 3-5 relevant AI tools with their names, URLs, use cases, and descriptions.",
            output_file="output/tool_candidates.json"
        )

    def analyze_tools_task(self, agent, task, ai_stack):
        """Task to perform in-depth analysis of discovered tools."""
        return Task(
            description=dedent(f"""
                Read the tool list from the previous task's result and perform a detailed analysis of each tool.
                
                For each tool:
                1. Research its capabilities, limitations, community adoption, documentation quality
                2. Evaluate how well it addresses the specified task: '{task}'
                3. Consider its compatibility with '{ai_stack}'
                4. Identify pros and cons
                
                Generate a detailed JSON analysis with these fields for each tool:
                - name: Tool name
                - url: Primary URL
                - description: Comprehensive description
                - features: List of key features
                - pros: List of advantages
                - cons: List of limitations
                - integration_complexity: "Low"/"Medium"/"High"
                - community_support: Assessment of community and maintenance
                - documentation_quality: Assessment of documentation
                - recommendation_score: 1-10 integer
                - recommendation_justification: Explanation of the score
                
                Your output should be a JSON list of these detailed analysis objects.
            """),
            agent=agent,
            expected_output="A JSON list containing detailed analysis of each tool with comprehensive information about features, pros, cons, and recommendation scores.",
            output_file="output/tool_analysis.json"
        )

    def create_report_task(self, agent, task, ai_stack):
        """Task to create a comprehensive report with recommendations."""
        return Task(
            description=dedent(f"""
                Read the analysis from the previous task's result and create a comprehensive Markdown report.
                
                The report should include:
                
                1. An introduction explaining the task ('{task}') and existing stack ('{ai_stack}')
                2. For each tool, create a section with:
                   - Tool name and URL as a heading
                   - Description
                   - Features (as bullet points)
                   - Pros (as bullet points)
                   - Cons (as bullet points)
                   - Integration complexity
                   - Recommendation score with justification
                3. A summary/conclusion comparing the tools and providing final recommendations
                
                Use proper Markdown formatting with headings, bullet points, and emphasis where appropriate.
                Sort tools by recommendation score (descending).
                
                Your output should be a complete, well-formatted Markdown document.
            """),
            agent=agent,
            expected_output="A comprehensive Markdown report analyzing each tool with recommendations, properly formatted with headings, bullet points, and clear sections.",
            output_file="output/tool_recommendation_report.md"
        )

class AIToolResearchCrew:
    def __init__(self, task, ai_stack):
        """
        Initialize the crew with the task description and existing AI stack.
        
        Args:
            task (str): Description of the task requiring AI tools
            ai_stack (str): Comma-separated list of existing tools/frameworks used
        """
        self.task = task
        self.ai_stack = ai_stack
        self.run_id = None
        self.execution_log = io.StringIO()
        self.crew = None
        
        # Ensure output directory exists
        os.makedirs("output", exist_ok=True)

    def run(self):
        """Execute the research, analysis, and reporting process with MLflow tracking."""
        # Set up MLflow
        mlflow.set_experiment("AI Tool Research Crew")
        
        # Start MLflow run
        with mlflow.start_run(run_name=f"Tool_Research_{int(time.time())}") as mlflow_run:
            self.run_id = mlflow_run.info.run_id
            
            # Log parameters
            mlflow.log_param("task", self.task)
            mlflow.log_param("ai_stack", self.ai_stack)
            
            # Capture console output
            with contextlib.redirect_stdout(self.execution_log), contextlib.redirect_stderr(self.execution_log):
                try:
                    # Initialize agents
                    logger.info("Initializing agents...")
                    agents = AIToolResearchAgents()
                    researcher = agents.researcher_agent()
                    analyst = agents.analyst_agent()
                    
                    # Log agent configurations
                    mlflow.log_dict(
                        {
                            "researcher": {
                                "role": researcher.role,
                                "goal": researcher.goal,
                                "backstory": researcher.backstory,
                                "allow_delegation": researcher.allow_delegation,
                                "max_iter": researcher.max_iter
                            },
                            "analyst": {
                                "role": analyst.role,
                                "goal": analyst.goal,
                                "backstory": analyst.backstory,
                                "allow_delegation": analyst.allow_delegation,
                                "max_iter": analyst.max_iter
                            }
                        },
                        "agent_configurations.json"
                    )
                    
                    # Track execution time
                    start_time = time.time()
                    
                    # Initialize tasks
                    logger.info("Setting up tasks...")
                    tasks = AIToolResearchTasks()
                    search_task = tasks.search_tools_task(researcher, self.task, self.ai_stack)
                    
                    # Create the crew
                    logger.info("Creating crew...")
                    crew = Crew(
                        agents=[researcher, analyst],
                        tasks=[search_task],
                        verbose=True,
                        process=Process.sequential,
                        memory=True
                    )
                    
                    # Add dependent tasks
                    analyze_task = tasks.analyze_tools_task(analyst, self.task, self.ai_stack)
                    crew.tasks.append(analyze_task)
                    
                    report_task = tasks.create_report_task(analyst, self.task, self.ai_stack)
                    crew.tasks.append(report_task)
                    
                    # Start the crew
                    logger.info("Starting crew execution...")
                    result = crew.kickoff()
                    logger.info("Crew execution completed.")
                    
                    # Log execution time
                    execution_time = time.time() - start_time
                    mlflow.log_metric("execution_time_seconds", execution_time)
                    
                    # Log task artifacts if they exist
                    self._log_artifacts()
                    
                    # Set success tag
                    if os.path.exists("output/tool_recommendation_report.md"):
                        mlflow.set_tag("status", "SUCCESS")
                        mlflow.log_metric("success", 1)
                    else:
                        mlflow.set_tag("status", "FAILED")
                        mlflow.log_metric("success", 0)
                    
                    # Log crew metrics
                    if hasattr(crew, "usage_metrics") and crew.usage_metrics:
                        metrics = crew.usage_metrics
                        if hasattr(metrics, "total_tokens"):
                            mlflow.log_metric("total_tokens", metrics.total_tokens)
                        if hasattr(metrics, "prompt_tokens"):
                            mlflow.log_metric("prompt_tokens", metrics.prompt_tokens)
                        if hasattr(metrics, "completion_tokens"):
                            mlflow.log_metric("completion_tokens", metrics.completion_tokens)
                        if hasattr(metrics, "total_cost"):
                            mlflow.log_metric("total_cost", metrics.total_cost)

                    self.crew = crew
                    
                    return result
                
                except Exception as e:
                    logger.error(f"Error during crew execution: {str(e)}", exc_info=True)
                    mlflow.set_tag("status", "FAILED")
                    mlflow.log_metric("success", 0)
                    raise e
                
                finally:
                    # Log execution trace
                    logger.info("Logging execution trace...")
                    execution_log = self.execution_log.getvalue()
                    mlflow.log_text(execution_log, "execution_log.txt")

    def _log_artifacts(self):
        """Log task output artifacts to MLflow."""
        artifact_files = [
            ("output/tool_candidates.json", "Task 1: Tool Discovery"),
            ("output/tool_analysis.json", "Task 2: Tool Analysis"),
            ("output/tool_recommendation_report.md", "Task 3: Final Report")
        ]
        
        for file_path, description in artifact_files:
            if os.path.exists(file_path):
                logger.info(f"Logging artifact: {file_path}")
                mlflow.log_artifact(file_path)
                
                # For JSON files, also log as parameters for easier viewing
                if file_path.endswith('.json'):
                    try:
                        with open(file_path, 'r') as f:
                            data = json.load(f)
                            
                        if file_path == "output/tool_candidates.json":
                            # Log tool names as parameters
                            tool_names = [tool.get('name', f"Tool {i+1}") for i, tool in enumerate(data)]
                            mlflow.log_param("discovered_tools", ", ".join(tool_names))
                            
                        elif file_path == "output/tool_analysis.json":
                            # Log recommendation scores as metrics
                            for i, tool in enumerate(data):
                                tool_name = tool.get('name', f"Tool {i+1}")
                                score = tool.get('recommendation_score', 0)
                                if isinstance(score, (int, float)):
                                    mlflow.log_metric(f"{tool_name}_score", score)
                    except Exception as e:
                        logger.warning(f"Error logging JSON data from {file_path}: {str(e)}")
            else:
                logger.warning(f"Artifact file not found: {file_path}")

# Example usage:
# if __name__ == "__main__":
    # task_description = "Develop a conversational RAG system that can answer questions based on a large PDF document collection"
    # existing_stack = "LangChain, OpenAI, Pinecone, FastAPI"
    
    # crew = AIToolResearchCrew(task_description, existing_stack)
    # result = crew.run()
    
    # print("\n=== Research Complete ===")
    # print(f"Check the output directory for detailed results.")
    # print("Final Report Path: output/tool_recommendation_report.md")
    # print(f"MLflow Run ID: {crew.run_id}")
    # print(f"View details at: http://localhost:5000/#/experiments/1/runs/{crew.run_id}")

In [30]:
# Let's run it once without MLflow to see the typical verbose console output.

# Turn on auto tracing by calling mlflow.crewai.autolog()
mlflow.crewai.autolog()
mlflow.set_experiment("Autolog")

ai_dev_crew = AIOpsResearchCrew(task_description, existing_stack)
result = ai_dev_crew.run()

NameError: name 'AIOpsResearchCrew' is not defined

In [None]:
ai_dev_crew

In [26]:
task_description = "Develop a conversational RAG system that can answer questions based on a large PDF document collection"
existing_stack = "LangChain, OpenAI, Pinecone, FastAPI"

crew = AIToolResearchCrew(task_description, existing_stack)
result = crew.run()




🏃 View run Tool_Research_1744718194 at: https://public-tracking.mlflow-e00rfd6s4w3cdtk9fx.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/5/runs/fd43eb499a324e8190f976820803fb7d
🧪 View experiment at: https://public-tracking.mlflow-e00rfd6s4w3cdtk9fx.backbone-e00rt16p71ew33dg9e.msp.eu-north1.nebius.cloud/#/experiments/5


In [27]:
print("\n=== Research Complete ===")
print(f"Check the output directory for detailed results.")
print("Final Report Path: output/tool_recommendation_report.md")
print(f"MLflow Run ID: {crew.run_id}")


=== Research Complete ===
Check the output directory for detailed results.
Final Report Path: output/tool_recommendation_report.md
MLflow Run ID: fd43eb499a324e8190f976820803fb7d


## Add metrics

In [28]:
# crew = AIToolResearchCrew(task_description, existing_stack)
# result = crew.kickoff()

print(crew.usage_metrics)

AttributeError: 'AIToolResearchCrew' object has no attribute 'usage_metrics'

## Update prompts

## Evaluate quality of the Agent

In [None]:
import os
import json
import time
import logging
import io
import sys
import contextlib

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class AIOpsResearchCrew:
    def __init__(self, task, ai_stack, llm_config=None, evaluator_llm_config=None):
        """
        Initialize the crew with the task description and existing AI stack.
        
        Args:
            task (str): Description of the task requiring AI tools
            ai_stack (str): Comma-separated list of existing tools/frameworks used
            llm_config (Optional): LLM configuration to use for the agents
            evaluator_llm_config (Optional): LLM configuration to use for the evaluator (defaults to strong model)
        """
        self.task = task
        self.ai_stack = ai_stack
        self.llm_config = llm_config
        self.evaluator_llm_config = evaluator_llm_config
        self.run_id = None
        
        # Get LLM model name for logging
        self.llm_name = getattr(llm_config, "model_name", "default") if llm_config else "default"
        
        # Ensure output directory exists
        os.makedirs("output", exist_ok=True)
        
        # Initialize MLflow
        mlflow.set_experiment("AIOps Tool Research")

    def run(self, evaluate=True):
        """
        Execute the research, analysis, and reporting process.
        
        Args:
            evaluate (bool): Whether to evaluate the final report
            
        Returns:
            The result from the crew execution
        """
        # Start MLflow tracking
        with mlflow.start_run(run_name=f"Tool_Research_{self.llm_name}_{int(time.time())}") as run:
            self.run_id = run.info.run_id
            
            # Log parameters
            mlflow.log_param("task", self.task)
            mlflow.log_param("ai_stack", self.ai_stack)
            mlflow.log_param("llm_model", self.llm_name)
            
            # Capture logs
            log_buffer = io.StringIO()
            start_time = time.time()
            
            with contextlib.redirect_stdout(log_buffer), contextlib.redirect_stderr(log_buffer):
                try:
                    # Initialize agents
                    agents = AIOpsResearchAgents()
                    researcher = agents.researcher_agent(self.llm_config)
                    analyst = agents.analyst_agent(self.llm_config)
                    
                    # Initialize tasks
                    tasks = AIOpsResearchTasks()
                    search_task = tasks.search_tools_task(researcher, self.task, self.ai_stack)
                    
                    # Create the crew
                    crew = Crew(
                        agents=[researcher, analyst],
                        tasks=[search_task],
                        verbose=True,
                        process=Process.sequential,
                        memory=True
                    )
                    
                    # Add dependent tasks
                    analyze_task = tasks.analyze_tools_task(analyst, self.task, self.ai_stack)
                    crew.tasks.append(analyze_task)
                    
                    report_task = tasks.create_report_task(analyst, self.task, self.ai_stack)
                    crew.tasks.append(report_task)
                    
                    # Execute the crew
                    result = crew.kickoff()
                    
                    # Log artifacts
                    if os.path.exists("output/tool_candidates.json"):
                        mlflow.log_artifact("output/tool_candidates.json")
                    
                    if os.path.exists("output/tool_analysis.json"):
                        mlflow.log_artifact("output/tool_analysis.json")
                        
                    if os.path.exists("output/tool_recommendation_report.md"):
                        mlflow.log_artifact("output/tool_recommendation_report.md")
                        mlflow.set_tag("status", "SUCCESS")
                    else:
                        mlflow.set_tag("status", "FAILED")
                    
                    # Evaluate the report if requested
                    if evaluate and os.path.exists("output/tool_recommendation_report.md"):
                        # Create evaluator agent with a strong model
                        evaluator = agents.evaluator_agent()
                        
                        # Create and run evaluation task
                        eval_task = tasks.evaluate_report_task(evaluator, self.task, self.ai_stack)
                        eval_crew = Crew(
                            agents=[evaluator],
                            tasks=[eval_task],
                            verbose=True
                        )
                        eval_result = eval_crew.kickoff()
                        
                        # Log evaluation results
                        if os.path.exists("output/report_evaluation.json"):
                            mlflow.log_artifact("output/report_evaluation.json")
                            
                            # Also log scores as metrics for easier comparison
                            try:
                                with open("output/report_evaluation.json", 'r') as f:
                                    eval_data = json.load(f)
                                    
                                for score_name, score_value in eval_data.items():
                                    if score_name.endswith('_score'):
                                        mlflow.log_metric(score_name, score_value)
                                
                                # Log reasoning as text
                                if 'reasoning' in eval_data:
                                    mlflow.log_text(eval_data['reasoning'], "evaluation_reasoning.txt")
                            except Exception as e:
                                logger.error(f"Error logging evaluation metrics: {str(e)}")
                    
                    # Log execution time
                    execution_time = time.time() - start_time
                    mlflow.log_metric("execution_time_seconds", execution_time)
                    
                    # Log usage metrics if available
                    if hasattr(crew, "usage_metrics") and crew.usage_metrics:
                        metrics = crew.usage_metrics
                        if hasattr(metrics, "total_tokens"):
                            mlflow.log_metric("total_tokens", metrics.total_tokens)
                        if hasattr(metrics, "total_cost"):
                            mlflow.log_metric("total_cost", metrics.total_cost)
                    
                    return result
                    
                except Exception as e:
                    logger.error(f"Error executing crew: {str(e)}", exc_info=True)
                    mlflow.set_tag("status", "ERROR")
                    mlflow.log_text(str(e), "error.txt")
                    raise e
                
                finally:
                    # Always log the execution logs
                    mlflow.log_text(log_buffer.getvalue(), "execution_log.txt")

In [None]:
import os
import json
import time
import logging
import io
import sys
import contextlib

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class AIOpsResearchCrew:
    def __init__(self, task, ai_stack):
        """
        Initialize the crew with the task description and existing AI stack.
        
        Args:
            task (str): Description of the task requiring AI tools
            ai_stack (str): Comma-separated list of existing tools/frameworks used
        """
        self.task = task
        self.ai_stack = ai_stack
        self.run_id = None
        self.execution_log = io.StringIO()
        self.crew = None
        
        # Ensure output directory exists
        os.makedirs("output", exist_ok=True)

    def run(self):
        """Execute the research, analysis, and reporting process with MLflow tracking."""
        
        # --- UPDATE: Start MLflow run ---
        with mlflow.start_run(run_name=f"Tool_Research_{int(time.time())}") as mlflow_run:
            self.run_id = mlflow_run.info.run_id
            
            # --- UPDATE: Log parameters ---
            mlflow.log_param("task", self.task)
            mlflow.log_param("ai_stack", self.ai_stack)
            
            # --- UPDATE: Capture console output ---
            with contextlib.redirect_stdout(self.execution_log), contextlib.redirect_stderr(self.execution_log):
                try:
                    # Initialize agents
                    logger.info("Initializing agents...")
                    agents = AIOpsResearchAgents()
                    researcher = agents.researcher_agent()
                    analyst = agents.analyst_agent()
                    
                    # Initialize tasks
                    logger.info("Setting up tasks...")
                    tasks = AIOpsResearchTasks()
                    search_task = tasks.search_tools_task(researcher, self.task, self.ai_stack)
                    
                    # Create the crew
                    logger.info("Creating crew...")
                    crew = Crew(
                        agents=[researcher, analyst],
                        tasks=[search_task],
                        verbose=True,
                        process=Process.sequential,
                        memory=True
                    )
                    self.crew = crew
                    
                    # Add dependent tasks
                    analyze_task = tasks.analyze_tools_task(analyst, self.task, self.ai_stack)
                    crew.tasks.append(analyze_task)
                    
                    report_task = tasks.create_report_task(analyst, self.task, self.ai_stack)
                    crew.tasks.append(report_task)
                    
                    # Start the crew
                    logger.info("Starting crew execution...")
                    result = crew.kickoff()
  
                    # --- UPDATE: Log metrics ---
                    logger.info("Starting log metrics...")
                    if hasattr(crew, "usage_metrics") and crew.usage_metrics:
                        mlflow.log_metrics(json.loads(crew.usage_metrics.json()))

                    
                    # --- UPDATE: Log task artifacts if they exist ---
                    self._log_artifacts()

                    # --- UPDATE: Set success tag ---
                    if os.path.exists("output/tool_recommendation_report.md"):
                        mlflow.set_tag("status", "SUCCESS")
                    else:
                        mlflow.set_tag("status", "FAILED")
                    
                    return result
                
                # --- UPDATE: Log execution status and logs 
                except Exception as e:
                    logger.error(f"Error during crew execution: {str(e)}", exc_info=True)
                    mlflow.set_tag("status", "FAILED")
                    mlflow.log_metric("success", 0)
                    raise e
                
                finally:
                    # Log execution trace
                    logger.info("Logging execution trace...")
                    execution_log = self.execution_log.getvalue()
                    mlflow.log_text(execution_log, "execution_log.txt")

    def _log_artifacts(self):
        """Log task output artifacts to MLflow."""
        artifact_files = [
            ("output/tool_candidates.json", "Task 1: Tool Discovery"),
        ]
        
        for file_path, description in artifact_files:
            if os.path.exists(file_path):
                logger.info(f"Logging artifact: {file_path}")
                mlflow.log_artifact(file_path)
                
                # For JSON files, also log as parameters for easier viewing
                if file_path.endswith('.json'):
                    try:
                        with open(file_path, 'r') as f:
                            data = json.load(f)
                            
                        # Log tool names as parameters
                        tool_names = [tool.get('name', f"Tool {i+1}") for i, tool in enumerate(data)]
                        mlflow.log_param("discovered_tools", ", ".join(tool_names))

                    except Exception as e:
                        logger.warning(f"Error logging JSON data from {file_path}: {str(e)}")
            else:
                logger.warning(f"Artifact file not found: {file_path}")