# Ragas

Ragas is a library that provides tools to supercharge the evaluation of Large Language Model (LLM) applications.

In [None]:
pip install ragas

In [2]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.dataset_schema import  SingleTurnSample, MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolCall

In [3]:
llm = AzureChatOpenAI(
        deployment_name="gpt-4o",
        api_version="2023-06-01-preview"
    )

embeddings = AzureOpenAIEmbeddings(model="ada-002", openai_api_version="2024-06-01")

evaluator_llm = LangchainLLMWrapper(llm)
evaluator_embeddings = LangchainEmbeddingsWrapper(embeddings)

In [34]:
agent_samples = {
    "rag": [
        SingleTurnSample(
            user_input="cual es el objetivo del proyecto de Fabian?",
            response="El objetivo del proyecto de Fabián es desarrollar un chatbot especializado que pueda ser entrenado con documentos propietarios de una empresa y que, en base al contenido ingestado, interprete correctamente consultas de usuarios y proporcione respuestas precisas y relevantes.",
            retrieved_contexts=[
                '1. Descripci´ on t´ ecnica-conceptual del proyecto a realizar\nEl objetivo de este proyecto es desarrollar un chatbot especializado que pueda ser entrenado\ncon documentos propietarios de una empresa, y que, en base al contenido ingestado, interprete\ncorrectamente consultas de usuarios y proporcione respuestas precisas y relevantes.', 
                'Acta de constituci´ on del proyecto\nBuenos Aires, 23 de abril de 2024\nPor medio de la presente se acuerda con el Ing. Fabi´ an Alejandro Massotto que su Trabajo\nFinal de la Carrera de Especializaci´ on en Inteligencia Artificial se titular´ a “Desarrollo de un\nchatbot especializado para optimizar la b´ usqueda de informaci´ on en documentos propietarios” y',
                'La propuesta de valor de este proyecto radica en su capacidad para mejorar la eficiencia operativa\nde una empresa. Al facilitar el acceso a la informaci´ on, se reduce el tiempo dedicado a la\nb´ usqueda y se incrementa el tiempo disponible para tareas cr´ ıticas y estrat´ egicas. Adem´ as,\nla capacidad de adecuar el chatbot seg´ un las necesidades y documentos de cada empresa lo'
            ], 
        ),
        SingleTurnSample(
            user_input="cuando se entrego la planificacion del proyecto?",
            response="La planificación del proyecto se completó el 28 de mayo de 2024.",
            retrieved_contexts=[
                'Plan de proyecto del Trabajo Final\nCarrera de Especializaci´ on en Inteligencia Artificial\nIng. Fabi´ an Alejandro Massotto\nRegistros de cambios\nRevisi´ on Detalles de los cambios realizados Fecha\n0 Creaci´ on del documento 23 de abril de 2024\n1 Se completa hasta el punto 5 inclusive 7 de mayo de 2024\n2 Se completa hasta el punto 9 inclusive 14 de mayo de 2024\n3 Se completa hasta el punto 12 inclusive 21 de mayo de 2024\n4 Se completa el plan 28 de mayo de 2024\nP´ agina 3 de 18', 
                '- Se har´ a entrega de la documentaci´ on correspondiente.\nIdentificaci´ on de las t´ ecnicas y procedimientos ´ utiles e in´ utiles que se emplearon, los\nproblemas que surgieron y c´ omo se solucionaron:\n- Persona a cargo: Ing. Fabi´ an Alejandro Massotto.\n- Se documentar´ a en la memoria t´ ecnica.\nActo de agradecimiento a todos los interesados:\n- Persona a cargo: Ing. Fabi´ an Alejandro Massotto.\n- Se realizar´ a durante la presentaci´ on final organizada por las autoridades del posgrado.',
                'Se adjunta a esta acta la planificaci´ on inicial.\nDr. Ing. Ariel Lutenberg\nDirector posgrado FIUBAIng. Sebasti´ an Andr´ es Mendez\nExxonMobil\nEsp. Ing. Ezequiel Guinsburg\nDirector del Trabajo Final\nP´ agina 4 de 18'
            ], 
        ),
        SingleTurnSample(
            user_input="tiempo de realizacion estimado?",
            response="El tiempo de realización estimado del proyecto es desde el 23 de abril de 2024 hasta el 10 de abril de 2025.",
            retrieved_contexts=[
                'consistir´ a en la implementaci´ on de un modelo de inteligencia artificial que interprete consultas de\nusuarios y proporcione respuestas precisas y relevantes basadas en el contenido de documentos\nde car´ acter muy espec´ ıfico. El trabajo tendr´ a un presupuesto preliminar estimado de 607 horas\ny un costo estimado de $16.996.000, con fecha de inicio el 23 de abril de 2024 y fecha de\npresentaci´ on p´ ublica el 10 de abril de 2025.\nSe adjunta a esta acta la planificaci´ on inicial.',
                '3.4. Desarrollo de la l´ ogica para b´ usqueda por similitud. (30 h)\n3.5. Desarrollo de una API para posterior integraci´ on con el frontend . (25 h)\n3.6. Entrenamiento inicial del modelo. (15 h)\n3.7. Pruebas y ajustes. (40 h)\n4. Desarrollo de la interfaz de usuario. (125 h)\n4.1. Configuraci´ on del entorno de desarrollo. (5 h)\n4.2. Dise˜ no de la interfaz. (20 h)', 
                'Informe de avance.\nMemoria del trabajo final.\n9. Desglose del trabajo en tareas\n1. Planificaci´ on general. (32 h)\n1.1. Redacci´ on de la descripci´ on t´ ecnica-conceptual, prop´ osito, alcance y supuestos del\nproyecto. (8 h)\n1.2. Definici´ on de requerimientos, historias de usuarios y entregables. (8 h)\n1.3. Estimaci´ on de tiempos y presupuesto. (8 h)\n1.4. Definici´ on de gesti´ on de riesgos, calidad y procesos de cierre. (8 h)\n2. Preparaci´ on de los datos e investigaci´ on previa. (120 h)'
            ], 
        ),
        SingleTurnSample(
            user_input="presupuesto estimado?",
            response="El presupuesto estimado del proyecto es de $16.996.000.",
            retrieved_contexts=[
                'consistir´ a en la implementaci´ on de un modelo de inteligencia artificial que interprete consultas de\nusuarios y proporcione respuestas precisas y relevantes basadas en el contenido de documentos\nde car´ acter muy espec´ ıfico. El trabajo tendr´ a un presupuesto preliminar estimado de 607 horas\ny un costo estimado de $16.996.000, con fecha de inicio el 23 de abril de 2024 y fecha de\npresentaci´ on p´ ublica el 10 de abril de 2025.\nSe adjunta a esta acta la planificaci´ on inicial.', 
                '12. Presupuesto detallado del proyecto\nA continuaci´ on, se detalla el presupuesto calculado para el proyecto, expresado en pesos\nargentinos.\nCOSTOS DIRECTOS\nDescripci´ on Cantidad Valor unitario Valor total\nHoras de ingenier´ ıa 607 $20.000 $12.140.000\nSUBTOTAL $12.140.000\nCOSTOS INDIRECTOS\nDescripci´ on Cantidad Valor unitario Valor total', 
                'Informe de avance.\nMemoria del trabajo final.\n9. Desglose del trabajo en tareas\n1. Planificaci´ on general. (32 h)\n1.1. Redacci´ on de la descripci´ on t´ ecnica-conceptual, prop´ osito, alcance y supuestos del\nproyecto. (8 h)\n1.2. Definici´ on de requerimientos, historias de usuarios y entregables. (8 h)\n1.3. Estimaci´ on de tiempos y presupuesto. (8 h)\n1.4. Definici´ on de gesti´ on de riesgos, calidad y procesos de cierre. (8 h)\n2. Preparaci´ on de los datos e investigaci´ on previa. (120 h)'
            ], 
        ),
        SingleTurnSample(
            user_input="que riesgos presenta el proyecto?",
            response="""El proyecto presenta los siguientes riesgos:
                1. Baja adopción por parte de los usuarios: Si los usuarios no adoptan el chatbot, el proyecto no logrará su objetivo de mejorar la eficiencia operativa. Severidad: 6, Probabilidad de ocurrencia: 4.
                2. El proyecto no se finaliza en tiempo y forma: Debido a imprevistos, el proyecto podría retrasarse. Severidad: 3.
                3. Problemas de integración: Para mitigarlo, se realizarán pruebas de integración tempranas. Severidad: 9, Probabilidad de ocurrencia después de mitigación: 2.""",
            retrieved_contexts=[
                '4- Baja adopci´ on por parte de los usuarios. 64 24\n5- El proyecto no se finaliza en tiempo y forma. 33 9\nCriterio adoptado:\nSe tomar´ an medidas de mitigaci´ on en los riesgos cuyos n´ umeros de RPN sean mayores a 40.\nNota: los valores marcados con (*) en la tabla corresponden luego de haber aplicado la\nmitigaci´ on.\nc) Plan de mitigaci´ on de los riesgos que originalmente exced´ ıan el RPN m´ aximo establecido:\nRiesgo 1: utilizar modelos m´ as avanzados y actualizados, como por ejemplo GPT-4o.', 
                'Severidad (S): 6.\nSi los usuarios no adoptan el chatbot, el proyecto no lograr´ a su objetivo de mejorar la\neficiencia operativa.\nProbabilidad de ocurrencia (O): 4.\nAunque se espera que el chatbot sea ´ util, siempre existe el riesgo de resistencia al cambio\no preferencia por m´ etodos tradicionales.\nRiesgo 5: el proyecto no se finaliza en tiempo y forma debido a imprevistos.\nSeveridad (S): 3.\nAl tratarse de un proyecto de car´ acter personal, los tiempos de desarrollo son flexibles.', 
                'Riesgo 3: realizar pruebas de integraci´ on desde las primeras fases del desarrollo para identificar\ny resolver problemas a medida que surgen.\nSeveridad (S*): 9.\nLa severidad no se ve afectada.\nProbabilidad de ocurrencia (O*): 2.\nLas pruebas tempranas disminuir´ an significativamente la probabilidad de problemas de\nintegraci´ on.\n14. Gesti´ on de la calidad'
            ], 
        )
    ],
    "sql": [
        SingleTurnSample(
            user_input="cual es el producto mas vendido?",
            response="El producto más vendido es el que tiene el ProductID 715, con un total de 10 ventas.",
            retrieved_contexts=[
                """
                ```sql
                    SELECT TOP 1 sod.ProductID, COUNT(sod.ProductID) AS TotalSales
                    FROM SalesLT.SalesOrderDetail sod
                    GROUP BY sod.ProductID
                    ORDER BY TotalSales DESC;
                ```
                """, 
                "[(715, 10)]",
            ], 
        ),
        SingleTurnSample(
            user_input="nombre y precio de los 5 productos mas vendidos",
            response="""Los nombres, precios y ventas de los 5 productos más vendidos son:
                1. Classic Vest, S - $63.50
                2. Short-Sleeve Classic Jersey, XL - $53.99
                3. Bike Wash - Dissolver - $7.95
                4. Water Bottle - 30 oz. - $4.99
                5. AWC Logo Cap - $8.99""",
            retrieved_contexts=[
                """
                ```sql
                    SELECT TOP 5 p.Name, p.ListPrice
                    FROM SalesLT.Product p
                    JOIN SalesLT.SalesOrderDetail sod ON p.ProductID = sod.ProductID
                    GROUP BY p.Name, p.ListPrice
                    ORDER BY SUM(sod.OrderQty) DESC
                ```
                """, 
                "[('Classic Vest, S', Decimal('63.5000')), ('Short-Sleeve Classic Jersey, XL', Decimal('53.9900')), ('Bike Wash - Dissolver', Decimal('7.9500')), ('Water Bottle - 30 oz.', Decimal('4.9900')), ('AWC Logo Cap', Decimal('8.9900'))]",
            ], 
        ),
        SingleTurnSample(
            user_input="quien es el cliente con mas compras?",
            response="El cliente con más compras es Terry Eminhizer, con un total de 119960.8240 en compras.",
            retrieved_contexts=[
                """
                ```sql
                    SELECT TOP 1 C.CustomerID, C.FirstName, C.LastName, SUM(SOH.TotalDue) AS TotalCompras
                    FROM SalesLT.Customer C
                    JOIN SalesLT.SalesOrderHeader SOH ON C.CustomerID = SOH.CustomerID
                    GROUP BY C.CustomerID, C.FirstName, C.LastName
                    ORDER BY TotalCompras DESC;
                ```
                """, 
                "[(29736, 'Terry', 'Eminhizer', Decimal('119960.8240'))]",
            ], 
        ),
        SingleTurnSample(
            user_input="cual es el producto mas economico?",
            response="El producto más económico es el \"Patch Kit/8 Patches\" con un precio de 2.2900.",
            retrieved_contexts=[
                """
                ```sql
                    SELECT TOP 1 Name, ListPrice
                    FROM SalesLT.Product
                    ORDER BY ListPrice ASC;
                ```
                """, 
                "[('Patch Kit/8 Patches', Decimal('2.2900'))]",
            ], 
        ),
        SingleTurnSample(
            user_input="cuantos productos hay a la venta?",
            response="Hay 197 productos a la venta.",
            retrieved_contexts=[
                """
                ```sql
                    SELECT COUNT(*)
                    FROM SalesLT.Product
                    WHERE SellEndDate IS NULL;
                ```
                """, 
                "[(197,)]",
            ], 
        )
    ],
    "csv": [
        SingleTurnSample(
            user_input="cuantos superheroes llamados Fabian?",
            response="""Hay 5 superhéroes llamados Fabian:
                1. Fabian LaMuerto (Marvel)
                2. Fabian Stankowicz (Marvel)
                3. Fabian Marechal-Julbin (Marvel)
                4. Fabian Cortez (Marvel)
                5. Fabian Nicieza (Marvel)""",
            retrieved_contexts=[
                """
                ```python
                    import pandas as pd
                    from io import StringIO
                    from azure.storage.blob import BlobServiceClient

                    connection_string = "***"    
                    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
                    blob_container = "csv"

                    def load_csv_file(file_name, blob_container, blob_service_client):
                        blob_client = blob_service_client.get_blob_client(container=blob_container, blob=file_name)
                        blob_data = blob_client.download_blob().content_as_text()
                        csv_data = StringIO(blob_data)
                        return pd.read_csv(csv_data)

                    dc_df = load_csv_file('dc.csv', blob_container, blob_service_client)
                    marvel_df = load_csv_file('marvel.csv', blob_container, blob_service_client)

                    # Filter superheroes named 'Fabian'
                    dc_fabian = dc_df[dc_df['Name'].str.contains('Fabian', case=False, na=False)]
                    marvel_fabian = marvel_df[marvel_df['Name'].str.contains('Fabian', case=False, na=False)]

                    # Combine results
                    fabian_heroes = pd.concat([dc_fabian, marvel_fabian])
                    result = fabian_heroes
                ```
                """, 
                """
                Id                    Name  Identity Alignment   Eyes   Hair   Sex Alive  Appearances First_appeared     Planet Universe
                2199    3191         Fabian LaMuerto    Public       Bad  Black  Black  Male   Yes            2     1998, July  Earth-616   Marvel
                4336    6288       Fabian Stankowicz    Public      Good   Blue   Grey  Male   Yes           52    1982, March  Earth-616   Marvel
                4698    6823  Fabian Marechal-Julbin    Secret       Bad   Blue  Brown  Male    No           16    1991, March  Earth-616   Marvel
                5076    7405           Fabian Cortez    Public       Bad   Blue  Brown  Male    No           49  1991, October  Earth-616   Marvel
                10248  14781          Fabian Nicieza  Non-dual   Neutral   Blue   Grey  Male   Yes            1      1993, May  Earth-616   Marvel
                """,
            ], 
        ),
        SingleTurnSample(
            user_input="son buenos o malos?",
            response="""De los superhéroes llamados Fabian:
                - Fabian LaMuerto es malo.
                - Fabian Stankowicz es bueno.
                - Fabian Marechal-Julbin es malo.
                - Fabian Cortez es malo.
                - Fabian Nicieza es neutral.""",
            retrieved_contexts=[
                """
                ```python
                    import pandas as pd
                    from azure.storage.blob import BlobServiceClient
                    from io import StringIO

                    def load_csv_file(file_name, blob_container, blob_service_client):
                        blob_client = blob_service_client.get_blob_client(container=blob_container, blob=file_name)
                        blob_data = blob_client.download_blob().content_as_text()
                        csv_data = StringIO(blob_data)
                        return pd.read_csv(csv_data)

                    # Set up the Azure Blob service client
                    connection_string = "***"    
                    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
                    blob_container = "csv"

                    # Load the marvel.csv file
                    marvel_df = load_csv_file("marvel.csv", blob_container, blob_service_client)

                    # Filter the DataFrame for superheroes named "Fabian"
                    fabians = marvel_df[marvel_df['Name'].str.contains('Fabian', case=False, na=False)]

                    # Retrieve their alignments
                    result = fabians[['Name', 'Alignment']]
                ```
                """, 
                """
                                Name Alignment
                2199          Fabian LaMuerto       Bad
                4336        Fabian Stankowicz      Good
                4698   Fabian Marechal-Julbin       Bad
                5076            Fabian Cortez       Bad
                10248          Fabian Nicieza   Neutral
                """,
            ], 
        ),
        SingleTurnSample(
            user_input="quien tiene mas villanos, Marvel o DC?",
            response="Marvel tiene más villanos que DC.",
            retrieved_contexts=[
                """
                ```python
                    from azure.storage.blob import BlobServiceClient
                    import pandas as pd
                    from io import StringIO

                    def load_csv_file(file_name, blob_container, blob_service_client):
                        blob_client = blob_service_client.get_blob_client(container=blob_container, blob=file_name)
                        blob_data = blob_client.download_blob().content_as_text()
                        csv_data = StringIO(blob_data)
                        return pd.read_csv(csv_data)

                    connection_string = "***"    
                    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
                    blob_container = "csv"

                    dc_df = load_csv_file("dc.csv", blob_container, blob_service_client)
                    marvel_df = load_csv_file("marvel.csv", blob_container, blob_service_client)

                    dc_villains_count = dc_df[dc_df['Alignment'] == 'Bad'].shape[0]
                    marvel_villains_count = marvel_df[marvel_df['Alignment'] == 'Bad'].shape[0]

                    result = "Marvel" if marvel_villains_count > dc_villains_count else "DC"
                ```
                """, 
                "Marvel",
            ], 
        ),
        SingleTurnSample(
            user_input="cuantos villanos hay en Marvel?",
            response="Hay 7258 villanos en Marvel.",
            retrieved_contexts=[
                """
                ```python
                    from azure.storage.blob import BlobServiceClient
                    import pandas as pd
                    from io import StringIO

                    def load_csv_file(file_name, blob_container, blob_service_client):
                        blob_client = blob_service_client.get_blob_client(container=blob_container, blob=file_name)
                        blob_data = blob_client.download_blob().content_as_text()
                        csv_data = StringIO(blob_data)
                        return pd.read_csv(csv_data)

                    connection_string = "***"    
                    blob_service_client = BlobServiceClient.from_connection_string(connection_string)

                    file_name = "marvel.csv"
                    blob_container = "csv"
                    marvel_df = load_csv_file(file_name, blob_container, blob_service_client)

                    result = marvel_df[marvel_df['Alignment'] == 'Bad'].shape[0]
                ```
                """, 
                "7258",
            ], 
        ),
        SingleTurnSample(
            user_input="hay algun personaje que exista en ambos Marvel y DC?",
            response="Sí, hay personajes que existen en ambos Marvel y DC. Algunos de ellos son Growler y Mordred.",
            retrieved_contexts=[
                """
                ```python
                    import pandas as pd
                    from azure.storage.blob import BlobServiceClient
                    from io import StringIO

                    # Azure storage setup
                    connection_string = "***"    
                    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
                    blob_container = "csv"

                    def load_csv_file(file_name, blob_container, blob_service_client):
                        blob_client = blob_service_client.get_blob_client(container=blob_container, blob=file_name)
                        blob_data = blob_client.download_blob().content_as_text()
                        csv_data = StringIO(blob_data)
                        return pd.read_csv(csv_data)

                    # Load the data
                    dc_df = load_csv_file('dc.csv', blob_container, blob_service_client)
                    marvel_df = load_csv_file('marvel.csv', blob_container, blob_service_client)

                    # Check for common characters by name
                    common_characters = pd.merge(dc_df, marvel_df, on='Name', how='inner')

                    result = common_characters[['Name']]
                ```
                """, 
                """
                Name
                0         Growler
                1    Fidel Castro
                2         Mordred
                3         Cossack
                4     Jack Jordan
                ..            ...
                443        Gareth
                444         Bruno
                445         Bruno
                446          Axis
                447      Lancelot
                """,
            ], 
        )
    ],
    "api": [
        SingleTurnSample(
            user_input="cuantos repositorios tiene el usuario fabimass?",
            response="El usuario fabimass tiene 25 repositorios.",
            retrieved_contexts=[
                """
                ```python
                    import requests

                    username = 'fabimass'
                    url = f'https://api.github.com/users/{username}/repos'
                    response = requests.get(url)

                    if response.status_code == 200:
                        repos = response.json()
                        result = len(repos)
                    else:
                        result = f'Error: {response.status_code}'

                    result
                ```
                """, 
                "25",
            ], 
        ),
        SingleTurnSample(
            user_input="cuantas estrellas tiene el usuario fabimass?",
            response="El usuario fabimass ha dado estrellas a 5 repositorios.",
            retrieved_contexts=[
                """
                ```python
                    import requests

                    username = "fabimass"
                    url = f"https://api.github.com/users/{username}/starred"
                    headers = {
                        "Accept": "application/vnd.github.star+json"
                    }

                    response = requests.get(url, headers=headers)
                    if response.status_code == 200:
                        starred_repos = response.json()
                        result = len(starred_repos)
                    else:
                        result = f"Error: {response.status_code}"

                    result
                ```
                """, 
                "5",
            ], 
        ),
        SingleTurnSample(
            user_input="cual es el repo de fabimass mas activo?",
            response="El repositorio más activo de fabimass es \"ai-chatbot-multiagent\".",
            retrieved_contexts=[
                """
                ```python
                    import requests

                    username = 'fabimass'
                    url = f'https://api.github.com/users/{username}/repos'
                    params = {
                        'sort': 'pushed',
                        'direction': 'desc'
                    }

                    response = requests.get(url, params=params)
                    repos = response.json()

                    if repos:
                        result = repos[0]
                    else:
                        result = "No repositories found."

                    result
                ```
                """, 
                "{'id': 798025203, 'node_id': 'R_kgDOL5Dl8w', 'name': 'ai-chatbot-multiagent', 'full_name': 'fabimass/ai-chatbot-multiagent', 'private': False, 'owner': {'login': 'fabimass', 'id': 62024284, 'node_id': 'MDQ6VXNlcjYyMDI0Mjg0', 'avatar_url': 'https://avatars.githubusercontent.com/u/62024284?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/fabimass', 'html_url': 'https://github.com/fabimass', 'followers_url': 'https://api.github.com/users/fabimass/followers', 'following_url': 'https://api.github.com/users/fabimass/following{/other_user}', 'gists_url': 'https://api.github.com/users/fabimass/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/fabimass/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/fabimass/subscriptions', 'organizations_url': 'https://api.github.com/users/fabimass/orgs', 'repos_url': 'https://api.github.com/users/fabimass/repos', 'events_url': 'https://api.github.com/users/fabimass/events{/privacy}', 'received_events_url': 'https://api.github.com/users/fabimass/received_events', 'type': 'User', 'user_view_type': 'public', 'site_admin': False}, 'html_url': 'https://github.com/fabimass/ai-chatbot-multiagent', 'description': 'An AI-powered chatbot aimed to deliver accurate and contextually relevant responses by combining the strengths of pre-trained language models with specialized agents', 'fork': False, 'url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent', 'forks_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/forks', 'keys_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/keys{/key_id}', 'collaborators_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/collaborators{/collaborator}', 'teams_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/teams', 'hooks_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/hooks', 'issue_events_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/issues/events{/number}', 'events_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/events', 'assignees_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/assignees{/user}', 'branches_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/branches{/branch}', 'tags_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/tags', 'blobs_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/blobs{/sha}', 'git_tags_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/tags{/sha}', 'git_refs_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/refs{/sha}', 'trees_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/trees{/sha}', 'statuses_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/statuses/{sha}', 'languages_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/languages', 'stargazers_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/stargazers', 'contributors_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/contributors', 'subscribers_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/subscribers', 'subscription_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/subscription', 'commits_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/commits{/sha}', 'git_commits_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/commits{/sha}', 'comments_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/comments{/number}', 'issue_comment_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/issues/comments{/number}', 'contents_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/contents/{+path}', 'compare_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/compare/{base}...{head}', 'merges_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/merges', 'archive_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/{archive_format}{/ref}', 'downloads_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/downloads', 'issues_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/issues{/number}', 'pulls_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/pulls{/number}', 'milestones_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/milestones{/number}', 'notifications_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/notifications{?since,all,participating}', 'labels_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/labels{/name}', 'releases_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/releases{/id}', 'deployments_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/deployments', 'created_at': '2024-05-09T00:10:24Z', 'updated_at': '2025-01-19T18:12:21Z', 'pushed_at': '2025-01-19T18:12:20Z', 'git_url': 'git://github.com/fabimass/ai-chatbot-multiagent.git', 'ssh_url': 'git@github.com:fabimass/ai-chatbot-multiagent.git', 'clone_url': 'https://github.com/fabimass/ai-chatbot-multiagent.git', 'svn_url': 'https://github.com/fabimass/ai-chatbot-multiagent', 'homepage': '', 'size': 6431, 'stargazers_count': 0, 'watchers_count': 0, 'language': 'Python', 'has_issues': True, 'has_projects': True, 'has_downloads': True, 'has_wiki': True, 'has_pages': False, 'has_discussions': False, 'forks_count': 0, 'mirror_url': None, 'archived': False, 'disabled': False, 'open_issues_count': 0, 'license': {'key': 'gpl-3.0', 'name': 'GNU General Public License v3.0', 'spdx_id': 'GPL-3.0', 'url': 'https://api.github.com/licenses/gpl-3.0', 'node_id': 'MDc6TGljZW5zZTk='}, 'allow_forking': True, 'is_template': False, 'web_commit_signoff_required': False, 'topics': ['ai-agents', 'ai-assistant', 'artificial-intelligence', 'chatbot', 'conversational-ai', 'langchain'], 'visibility': 'public', 'forks': 0, 'open_issues': 0, 'watchers': 0, 'default_branch': 'main'}",
            ], 
        ),
        SingleTurnSample(
            user_input="cual es el ultimo commit hecho en el repositorio ai-chatbot-multiagent?",
            response="El último commit en el repositorio \"ai-chatbot-multiagent\" tiene el mensaje \"cleanup logs\" y fue hecho el 19 de enero de 2025 por fabimass.",
            retrieved_contexts=[
                """
                ```python
                    import requests

                    username = 'fabimass'
                    repo = 'ai-chatbot-multiagent'
                    url = f'https://api.github.com/repos/{username}/{repo}/commits'

                    response = requests.get(url)
                    commits = response.json()

                    if commits:
                        result = commits[0]
                    else:
                        result = 'No commits found'
                ```
                """, 
                "{'sha': '643901ce8a89088360dfc79c47895c0a408d3cd6', 'node_id': 'C_kwDOL5Dl89oAKDY0MzkwMWNlOGE4OTA4ODM2MGRmYzc5YzQ3ODk1YzBhNDA4ZDNjZDY', 'commit': {'author': {'name': 'fabimass', 'email': 'massotto.fabian@gmail.com', 'date': '2025-01-19T17:48:38Z'}, 'committer': {'name': 'fabimass', 'email': 'massotto.fabian@gmail.com', 'date': '2025-01-19T17:48:38Z'}, 'message': 'cleanup logs', 'tree': {'sha': '95c0daca15c2918a12266bb5279ac341a27b42d0', 'url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/trees/95c0daca15c2918a12266bb5279ac341a27b42d0'}, 'url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/git/commits/643901ce8a89088360dfc79c47895c0a408d3cd6', 'comment_count': 0, 'verification': {'verified': False, 'reason': 'unsigned', 'signature': None, 'payload': None, 'verified_at': None}}, 'url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/commits/643901ce8a89088360dfc79c47895c0a408d3cd6', 'html_url': 'https://github.com/fabimass/ai-chatbot-multiagent/commit/643901ce8a89088360dfc79c47895c0a408d3cd6', 'comments_url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/commits/643901ce8a89088360dfc79c47895c0a408d3cd6/comments', 'author': {'login': 'fabimass', 'id': 62024284, 'node_id': 'MDQ6VXNlcjYyMDI0Mjg0', 'avatar_url': 'https://avatars.githubusercontent.com/u/62024284?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/fabimass', 'html_url': 'https://github.com/fabimass', 'followers_url': 'https://api.github.com/users/fabimass/followers', 'following_url': 'https://api.github.com/users/fabimass/following{/other_user}', 'gists_url': 'https://api.github.com/users/fabimass/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/fabimass/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/fabimass/subscriptions', 'organizations_url': 'https://api.github.com/users/fabimass/orgs', 'repos_url': 'https://api.github.com/users/fabimass/repos', 'events_url': 'https://api.github.com/users/fabimass/events{/privacy}', 'received_events_url': 'https://api.github.com/users/fabimass/received_events', 'type': 'User', 'user_view_type': 'public', 'site_admin': False}, 'committer': {'login': 'fabimass', 'id': 62024284, 'node_id': 'MDQ6VXNlcjYyMDI0Mjg0', 'avatar_url': 'https://avatars.githubusercontent.com/u/62024284?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/fabimass', 'html_url': 'https://github.com/fabimass', 'followers_url': 'https://api.github.com/users/fabimass/followers', 'following_url': 'https://api.github.com/users/fabimass/following{/other_user}', 'gists_url': 'https://api.github.com/users/fabimass/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/fabimass/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/fabimass/subscriptions', 'organizations_url': 'https://api.github.com/users/fabimass/orgs', 'repos_url': 'https://api.github.com/users/fabimass/repos', 'events_url': 'https://api.github.com/users/fabimass/events{/privacy}', 'received_events_url': 'https://api.github.com/users/fabimass/received_events', 'type': 'User', 'user_view_type': 'public', 'site_admin': False}, 'parents': [{'sha': '0ba34539faca23fb7e40c264746afb453ddebdf5', 'url': 'https://api.github.com/repos/fabimass/ai-chatbot-multiagent/commits/0ba34539faca23fb7e40c264746afb453ddebdf5', 'html_url': 'https://github.com/fabimass/ai-chatbot-multiagent/commit/0ba34539faca23fb7e40c264746afb453ddebdf5'}]}",
            ], 
        ),
        SingleTurnSample(
            user_input="que informacion conoces del usuario fabimass?",
            response="El usuario fabimass se llama Fabian Massotto, es de Argentina y tiene un blog en https://fabianmassotto.vercel.app/. Tiene 25 repositorios públicos, 2 seguidores y sigue a 1 usuario. Fue creado el 10 de marzo de 2020.",
            retrieved_contexts=[
                """
                ```python
                    import requests

                    username = "fabimass"
                    url = f"https://api.github.com/users/{username}"

                    response = requests.get(url)

                    if response.status_code == 200:
                        result = response.json()
                    else:
                        result = None
                ```
                """, 
                "{'login': 'fabimass', 'id': 62024284, 'node_id': 'MDQ6VXNlcjYyMDI0Mjg0', 'avatar_url': 'https://avatars.githubusercontent.com/u/62024284?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/fabimass', 'html_url': 'https://github.com/fabimass', 'followers_url': 'https://api.github.com/users/fabimass/followers', 'following_url': 'https://api.github.com/users/fabimass/following{/other_user}', 'gists_url': 'https://api.github.com/users/fabimass/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/fabimass/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/fabimass/subscriptions', 'organizations_url': 'https://api.github.com/users/fabimass/orgs', 'repos_url': 'https://api.github.com/users/fabimass/repos', 'events_url': 'https://api.github.com/users/fabimass/events{/privacy}', 'received_events_url': 'https://api.github.com/users/fabimass/received_events', 'type': 'User', 'user_view_type': 'public', 'site_admin': False, 'name': 'Fabian Massotto', 'company': None, 'blog': 'https://fabianmassotto.vercel.app/', 'location': 'Argentina', 'email': None, 'hireable': None, 'bio': None, 'twitter_username': None, 'public_repos': 25, 'public_gists': 0, 'followers': 2, 'following': 1, 'created_at': '2020-03-10T14:57:19Z', 'updated_at': '2025-01-04T15:28:49Z'}"
            ], 
        )
    ]
}

In [None]:
chatbot_samples = [
    [
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cual es la capital de Francia?"),
        AIMessage(content="No lo se"),
    ],
    [
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cual es la capital de Francia?"),
        AIMessage(content="No lo se"),
    ],
    [
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cual es la capital de Francia?"),
        AIMessage(content="No lo se"),
    ],
    [
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cuantos repositorios tiene el usuario fabimass?"),
        AIMessage(content="El usuario fabimass tiene 25 repositorios"),
        HumanMessage(content="cual es la capital de Francia?"),
        AIMessage(content="No lo se"),
    ],
]

## [Context Precision](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/context_precision)

Context Precision is a metric that measures the proportion of relevant chunks in the `retrieved_contexts`. It is calculated as the mean of the precision@k for each chunk in the context. Precision@k is the ratio of the number of relevant chunks at rank k to the total number of chunks at rank k.

$$
\text{Context Precision@K} = \frac{\sum_{k=1}^{K} \left( \text{Precision@k} \times v_k \right)}{\text{Total number of relevant items in the top } K \text{ results}}
$$

$$
\text{Precision@k} = {\text{true positives@k} \over  (\text{true positives@k} + \text{false positives@k})}
$$

Where $K$ is the total number of chunks in `retrieved_contexts` and $v_k \in \{0, 1\}$ is the relevance indicator at rank $k$.

The following metrics uses LLM to identify if a retrieved context is relevant or not.

In [7]:
from ragas.metrics import LLMContextPrecisionWithoutReference

context_precision = LLMContextPrecisionWithoutReference(llm=evaluator_llm)

In [38]:
print("RAG agent:")
for sample in samples["rag"]:
    print(await context_precision.single_turn_ascore(sample))

RAG agent:
0.99999999995


In [35]:
print("SQL agent:")
for sample in agent_samples["sql"]:
    print(await context_precision.single_turn_ascore(sample))

SQL agent:
0.9999999999
0.99999999995
0.99999999995
0.49999999995
0.99999999995


In [22]:
print("CSV agent:")
for sample in agent_samples["csv"]:
    print(await context_precision.single_turn_ascore(sample))

CSV agent:
0.99999999995
0.49999999995
0.9999999999
0.99999999995
0.49999999995


In [32]:
print("API agent:")
for sample in agent_samples["api"]:
    print(await context_precision.single_turn_ascore(sample))

API agent:
0.99999999995
0.99999999995
0.99999999995
0.49999999995
0.49999999995


## [Response Relevancy](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/answer_relevance)

The `ResponseRelevancy` metric measures how relevant a response is to the user input. Higher scores indicate better alignment with the user input, while lower scores are given if the response is incomplete or includes redundant information.  

This metric is calculated using the `user_input` and the `response` as follows:  

1. Generate a set of artificial questions (default is 3) based on the response. These questions are designed to reflect the content of the response.  
2. Compute the cosine similarity between the embedding of the user input ($E_o$) and the embedding of each generated question ($E_{g_i}$).  
3. Take the average of these cosine similarity scores to get the **Answer Relevancy**:  

$$
\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \text{cosine similarity}(E_{g_i}, E_o)
$$  

$$
\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \frac{E_{g_i} \cdot E_o}{\|E_{g_i}\| \|E_o\|}
$$  

Where:  
- $E_{g_i}$: Embedding of the $i^{th}$ generated question.  
- $E_o$: Embedding of the user input.  
- $N$: Number of generated questions (default is 3).  

**Note**: While the score usually falls between 0 and 1, it is not guaranteed due to cosine similarity's mathematical range of -1 to 1.

An answer is considered relevant if it directly and appropriately addresses the original question. This metric focuses on how well the answer matches the intent of the question, without evaluating factual accuracy. It penalizes answers that are incomplete or include unnecessary details.

In [10]:
from ragas.metrics import ResponseRelevancy

scorer = ResponseRelevancy(llm=evaluator_llm, embeddings=evaluator_embeddings)

In [42]:
print("RAG agent:")
for sample in agent_samples["rag"]:
    print(await scorer.single_turn_ascore(sample))

RAG agent:
0.9678979668947822


In [36]:
print("SQL agent:")
for sample in agent_samples["sql"]:
    print(await scorer.single_turn_ascore(sample))

SQL agent:
0.9589774927638101
0.8832749159883928
0.9422496884815134
0.9275096994863962
0.9612148571283591


In [21]:
print("CSV agent:")
for sample in agent_samples["csv"]:
    print(await scorer.single_turn_ascore(sample))

CSV agent:
0.9315784215807223
0.7411933726693859
0.9655352038974514
0.9732890637929135
0.9577689828663951


In [33]:
print("API agent:")
for sample in agent_samples["api"]:
    print(await scorer.single_turn_ascore(sample))

API agent:
0.9826498514154302
0.931099881808738
0.9602784588053447
0.9346806414165897
0.85745108966607


## Topic Adherence

AI systems deployed in real-world applications are expected to adhere to domains of interest while interacting with users but LLMs sometimes may answer general queries by ignoring this limitation. The topic adherence metric evaluates the ability of the AI to stay on predefined domains during the interactions. This metric is particularly important in conversational AI systems, where the AI is expected to only provide assistance to queries related to predefined domains.

`TopicAdherenceScore` requires a predefined set of topics that the AI system is expected to adhere to which is provided using `reference_topics` along with `user_input`. The metric can compute precision, recall, and F1 score for topic adherence, defined as 
    
$$
\text{Precision } = {|\text{Queries that are answered and are adheres to any present reference topics}| \over |\text{Queries that are answered and are adheres to any present reference topics}| + |\text{Queries that are answered and do not adheres to any present reference topics}|}
$$

$$
\text{Recall } = {|\text{Queries that are answered and are adheres to any present reference topics}| \over |\text{Queries that are answered and are adheres to any present reference topics}| + |\text{Queries that were refused and should have been answered}|}
$$

$$
\text{F1 Score } = {2 \times \text{Precision} \times \text{Recall} \over \text{Precision} + \text{Recall}}
$$

In [81]:

from ragas.metrics import TopicAdherenceScore





sample = MultiTurnSample(user_input=sample_input, reference_topics=["github"])
scorer = TopicAdherenceScore(llm = evaluator_llm, mode="precision")
await scorer.multi_turn_ascore(sample)

0.9999999999

In [73]:
from ragas.metrics import ToolCallAccuracy
from ragas.dataset_schema import  MultiTurnSample
from ragas.messages import HumanMessage,AIMessage,ToolMessage,ToolCall

sample = [
    HumanMessage(content="What's the weather like in New York right now?"),
    AIMessage(content="The current temperature in New York is 75°F and it's partly cloudy.", tool_calls=[
        ToolCall(name="weather_check", args={"location": "New York"})
    ]),
    HumanMessage(content="Can you translate that to Celsius?"),
    AIMessage(content="Let me convert that to Celsius for you.", tool_calls=[
        ToolCall(name="temperature_conversion", args={"temperature_fahrenheit": 75})
    ]),
    ToolMessage(content="75°F is approximately 23.9°C."),
    AIMessage(content="75°F is approximately 23.9°C.")
]

sample = MultiTurnSample(
    user_input=sample,
    reference_tool_calls=[
        ToolCall(name="fabi", args={"location": "New York"}),
        ToolCall(name="temperature_conversion", args={"temperature_fahrenheit": 75})
    ]
)

scorer = ToolCallAccuracy()
await scorer.multi_turn_ascore(sample)

0.0