This guide explains how to use the DataSet API to manage datasets for evaluating your AI Assistants. You'll find examples for each endpoint, demonstrating how to interact with the API using Python.

# Initialization

Before you begin, set up your environment with the necessary parameters:

### Set the base URL and API token

1. `global_url`:  Set this to the URL of your Globant Enterprise AI environment (e.g., "https://eval-api.saia.ai"). This is represented by the `$BASE_URL` variable.

2. `global_token`:  Provide your organization's API token, represented by the `$SAIA_PROJECT_APITOKEN` variable.
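Rather than pasting the token into the notebook, both settings can be read from the environment. The sketch below assumes the values are exported as `BASE_URL` and `SAIA_PROJECT_APITOKEN`, the variable names mentioned above. The `load_settings` helper and its fallback URL are illustrative and not part of the API.

```python
import os

def load_settings(env=None):
    """Return (base_url, token) read from environment-style variables."""
    env = os.environ if env is None else env
    # Fall back to the example URL from this guide when BASE_URL is not set.
    base_url = env.get("BASE_URL", "https://eval-api.saia.ai").rstrip("/")
    token = env.get("SAIA_PROJECT_APITOKEN")
    if not token:
        raise RuntimeError("SAIA_PROJECT_APITOKEN is not set")
    return base_url, token

# Explicit mapping so the snippet runs without real credentials.
demo_url, demo_token = load_settings(
    {"BASE_URL": "https://eval-api.saia.ai/", "SAIA_PROJECT_APITOKEN": "example-token"}
)
print(demo_url)
```

Calling `load_settings()` with no argument reads from `os.environ` instead of the demo mapping.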

In [3]:
global_url = "https://eval-api.saia.ai"
global_token = "<YOUR_API_TOKEN>"  # Replace with your organization's API token; never commit a real token


import requests
import urllib3
import json

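# The requests below pass verify=False, which skips TLS certificate verification
# (avoid this in production); suppress the warnings urllib3 would otherwise emit.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)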

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {global_token}",
    "Accept": "application/json; charset=UTF-8"
}

# Working with Datasets

The DataSet API provides a comprehensive set of endpoints for managing the datasets you'll use to evaluate your AI Assistants. You can create, retrieve, update, and delete datasets, as well as manage the individual rows within each dataset.
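The dataset endpoints in this section follow three URL shapes. As a minimal sketch, the hypothetical `dataset_url` helper below builds them from the paths used in this guide; the helper name and its parameters are illustrative only.

```python
def dataset_url(base_url, dataset_id=None, collection=False):
    """Build a DataSet API URL following the paths used in this guide."""
    root = f"{base_url.rstrip('/')}/dataSetApi"
    if collection:
        return f"{root}/dataSets"          # list all datasets
    if dataset_id is None:
        return f"{root}/dataSet"           # create a dataset
    return f"{root}/dataSet/{dataset_id}"  # retrieve, update, or delete one dataset

base = "https://eval-api.saia.ai"
print(dataset_url(base, collection=True))
print(dataset_url(base))
print(dataset_url(base, "fbaa7af6-1f11-4e7b-b24d-fd5071759fc4"))
```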

## Listing Datasets

This endpoint retrieves a list of all datasets.

In [10]:
# Service URL
url = f"{global_url}/dataSetApi/dataSets"

try:
    # Make the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Displays the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

Service response:
[{'dataSetActive': True, 'dataSetCreateDate': '2025-05-12', 'dataSetDescription': 'geai_01', 'dataSetId': 'fbaa7af6-1f11-4e7b-b24d-fd5071759fc4', 'dataSetName': 'geai_01', 'dataSetType': 'T', 'dataSetUpdateDate': '2025-05-12', 'rows': [{'dataSetRowContextDocument': 'Globant Enterprise AI Overview', 'dataSetRowExpectedAnswer': 'Globant Enterprise AI is a business platform designed to facilitate the implementation of AI assistants tailored to specific needs and areas of expertise. It allows users to create AI assistants that can integrate and interact with current operations, processes, systems, and documents, paving the way for innovation and productivity. A key feature of Globant Enterprise AI is the ability to select a Large Language Model (LLM) and switch to another without needing to change the definitions of existing assistants. This platform acts as a secure bridge, connecting enterprise applications to LLMs while ensuring the protection of data, which will not b
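Because each dataset in the response carries its full `rows` list, the raw output can be long. The sketch below condenses such a response into one tuple per dataset. It uses only field names visible in the output above; `summarize_datasets` is a hypothetical helper, and the sample data is a shortened stand-in for a real response.

```python
def summarize_datasets(datasets):
    """Return (id, name, active, row_count) tuples for a list-datasets response."""
    summary = []
    for ds in datasets:
        summary.append((
            ds.get("dataSetId"),
            ds.get("dataSetName"),
            ds.get("dataSetActive"),
            len(ds.get("rows") or []),  # tolerate a missing or null rows field
        ))
    return summary

# Shortened stand-in for response.json()
sample = [{
    "dataSetActive": True,
    "dataSetId": "fbaa7af6-1f11-4e7b-b24d-fd5071759fc4",
    "dataSetName": "geai_01",
    "dataSetType": "T",
    "rows": [{"dataSetRowContextDocument": "Globant Enterprise AI Overview"}],
}]

for entry in summarize_datasets(sample):
    print(entry)
```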

## Creating a Dataset

This endpoint creates a new dataset.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet"

# JSON data to be sent in the request body
payload = {
    "dataSetName": "Test Dataset",
    "dataSetDescription": "the policy you use",
    "dataSetType": "T",
    "dataSetActive": True,
    "rows": []
}

# Make the POST request
try:
    response = requests.post(url, json=payload, headers=headers, verify=False)

    # Check the response status code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4, ensure_ascii=False))  # Print the response in JSON format

        # Load the JSON response
        response_data = response.json()

        # Extract the dataSetId
        data_set_id = response_data.get("dataSetId")

        # Save it in a global variable or file
        # Global variable
        global_data_set_id = data_set_id  # This will be available in the Notebook

    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")
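The pattern of checking the status code and then pulling an ID out of the JSON body recurs throughout this guide. A minimal sketch of that step as a reusable function follows; `extract_field` is a hypothetical helper, and `FakeResponse` is a stand-in for `requests.Response` so the snippet runs without a network call.

```python
def extract_field(response, field):
    """Return response.json()[field] when the status code is 200, else None."""
    if response.status_code != 200:
        print(f"Request error: {response.status_code}")
        print(response.text)
        return None
    return response.json().get(field)

class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only in this demo."""
    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body
        self.text = str(body)

    def json(self):
        return self._body

ok = FakeResponse(200, {"dataSetId": "e2057ff6-4b34-45ea-a464-8e51474c2c63"})
demo_id = extract_field(ok, "dataSetId")
print(demo_id)
```

With a real response, `extract_field(response, "dataSetId")` would yield the value that the cell above stores in `global_data_set_id`.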

## Retrieving a Dataset

This endpoint retrieves a specific dataset by its `dataSetId`.

In [9]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}"

try:
    # Make the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Displays the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

Service response:
{'dataSetActive': True, 'dataSetCreateDate': '2025-05-12', 'dataSetDescription': 'geai_01', 'dataSetId': 'fbaa7af6-1f11-4e7b-b24d-fd5071759fc4', 'dataSetName': 'geai_01', 'dataSetType': 'T', 'dataSetUpdateDate': '2025-05-12', 'rows': [{'dataSetRowContextDocument': 'Globant Enterprise AI Overview', 'dataSetRowExpectedAnswer': 'Globant Enterprise AI is a business platform designed to facilitate the implementation of AI assistants tailored to specific needs and areas of expertise. It allows users to create AI assistants that can integrate and interact with current operations, processes, systems, and documents, paving the way for innovation and productivity. A key feature of Globant Enterprise AI is the ability to select a Large Language Model (LLM) and switch to another without needing to change the definitions of existing assistants. This platform acts as a secure bridge, connecting enterprise applications to LLMs while ensuring the protection of data, which will not be

## Updating a Dataset

This endpoint updates an existing dataset.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}"

# JSON data to be sent in the request body
payload = {
    "dataSetName": "TEST_B",
    "dataSetDescription": "TEST_B",
    "dataSetType": "T",
    "dataSetActive": True
}

# Make the PUT request
try:
    response = requests.put(url, json=payload, headers=headers, verify=False)

    # Check the response status code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

To operate on an existing dataset rather than one created in this notebook, set its ID manually:

In [8]:
global_data_set_id = "fbaa7af6-1f11-4e7b-b24d-fd5071759fc4"

## Deleting a Dataset

This endpoint deletes a dataset.

In [6]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}"

try:
    # Make the DELETE request
    response = requests.delete(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Displays the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

Service response:
{}


# Working with Dataset Rows

The DataSet API also provides endpoints for managing individual rows within a dataset. These endpoints allow you to:

* Create new rows in a dataset.
* List rows for a specific dataset.
* Retrieve, update, and delete individual dataset rows.
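The row endpoints listed above nest under a dataset's URL. The sketch below builds them from the paths used in this section; `row_url` is a hypothetical helper and the IDs are placeholders.

```python
def row_url(base_url, dataset_id, row_id=None, collection=False):
    """Build a dataset-row URL following the paths used in this guide."""
    root = f"{base_url.rstrip('/')}/dataSetApi/dataSet/{dataset_id}"
    if collection:
        return f"{root}/dataSetRows"        # list rows of a dataset
    if row_id is None:
        return f"{root}/dataSetRow"         # insert a new row
    return f"{root}/dataSetRow/{row_id}"    # retrieve, update, or delete one row

base = "https://eval-api.saia.ai"
print(row_url(base, "DATASET_ID", collection=True))
print(row_url(base, "DATASET_ID"))
print(row_url(base, "DATASET_ID", "ROW_ID"))
```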

## Inserting a Dataset

This endpoint creates a new dataset to which you can then add rows.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet"

# JSON data to be sent in the request body
payload = {
    "dataSetName": "TEST_A",
    "dataSetDescription": "TEST_A",
    "dataSetType": "T",
    "dataSetActive": True,
    "rows": []
}

# Make the POST request
try:
    response = requests.post(url, json=payload, headers=headers, verify=False)

    # Check the response status code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format

        # Load the JSON response
        response_data = response.json()

        # Extract the dataSetId
        data_set_id = response_data.get("dataSetId")

        # Store it in a global variable or file
        global_data_set_id = data_set_id  # This will be available in the Notebook

    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

## Listing Dataset Rows

You can retrieve a list of rows for a specific dataset using the following endpoint:

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRows"

try:
    # Make the GET request
    response = requests.get(url,  headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Show the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

## Inserting a Dataset Row

This endpoint adds a new row to a dataset.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow"

# JSON data to be sent in the request body
payload = {
    "dataSetRowExpectedAnswer": "ANSWER_B",
    "dataSetRowContextDocument": "",
    "dataSetRowInput": "INPUT_B",
    "expectedSources": [
        {
            "dataSetExpectedSourceName": "NAME_B",
            "dataSetexpectedSourceExtention": "JSON",
            "dataSetExpectedSourceValue": "VALUE_B"
        }
    ],
    "filterVariables": [
        {
            "dataSetMetadataType": "V",
            "dataSetRowFilterKey": "KEY_B",
            "dataSetRowFilterValue": "VALUE_B",
            "dataSetRowFilterOperator": "OPERATOR_B"
        }
    ]
}

# Make the POST request
try:
    response = requests.post(url, json=payload, headers=headers, verify=False)

    # Check the response status code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format

        # Load the JSON response
        response_data = response.json()

        # Extract the dataSetRowId
        data_set_row_id = response_data.get("dataSetRowId")

        # Store it in a global variable or a file
        # Global variable
        global_data_set_row_id = data_set_row_id  # This will be available in the Notebook

    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")
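When inserting many rows, assembling the payload in a function keeps the field names in one place. The sketch below uses the field names from the payload above; `build_row_payload` and its parameter names are hypothetical, and which fields the service treats as required is not stated in this guide.

```python
import json

def build_row_payload(row_input, expected_answer, context_document="",
                      expected_sources=None, filter_variables=None):
    """Assemble a dataset-row payload with the field names used in this guide."""
    return {
        "dataSetRowInput": row_input,
        "dataSetRowExpectedAnswer": expected_answer,
        "dataSetRowContextDocument": context_document,
        "expectedSources": list(expected_sources or []),
        "filterVariables": list(filter_variables or []),
    }

payload = build_row_payload(
    "INPUT_B",
    "ANSWER_B",
    expected_sources=[{
        "dataSetExpectedSourceName": "NAME_B",
        "dataSetexpectedSourceExtention": "JSON",
        "dataSetExpectedSourceValue": "VALUE_B",
    }],
)
print(json.dumps(payload, indent=4))
```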


## Retrieving a Dataset Row

This endpoint retrieves a specific dataset row by its `dataSetRowId`.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}"

try:
    # Make the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Display the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


## Updating a Dataset Row

This endpoint updates an existing dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}"

# JSON data to be sent in the request body
payload = {
    "dataSetRowInput": "INPUT_C",
    "dataSetRowExpectedAnswer": "ANSWER_C",
    "dataSetRowContextDocument": ""
}

# Make the PUT request
try:
    response = requests.put(url, json=payload, headers=headers, verify=False)

    # Check the response code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


## Deleting a Dataset Row

This endpoint deletes a dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}"

try:
    # Make the DELETE request
    response = requests.delete(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Show the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


# Uploading Data via Files

The DataSet API allows you to efficiently populate datasets by uploading data directly from JSON files. You can upload either a complete dataset or multiple rows to an existing dataset.
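This guide does not show the contents of the upload files. The sketch below assumes, without confirmation from the source, that a dataset file mirrors the JSON payload used to create a dataset. It writes such a file to a temporary directory and reads it back the same way the upload cells do, so the file can be checked locally before sending.

```python
import json
import os
import tempfile

# Assumed structure: same fields as the create-dataset and insert-row payloads.
dataset = {
    "dataSetName": "Test Dataset",
    "dataSetDescription": "Example created from a file",
    "dataSetType": "T",
    "dataSetActive": True,
    "rows": [
        {
            "dataSetRowInput": "INPUT_B",
            "dataSetRowExpectedAnswer": "ANSWER_B",
            "dataSetRowContextDocument": "",
            "expectedSources": [],
            "filterVariables": [],
        }
    ],
}

file_name = os.path.join(tempfile.gettempdir(), "DataSetExample.json")

# Write the file, then load it again to confirm it is valid JSON.
with open(file_name, "w", encoding="utf-8") as file:
    json.dump(dataset, file, indent=4, ensure_ascii=False)

with open(file_name, "r", encoding="utf-8") as file:
    json_data = json.load(file)

print(json_data["dataSetName"], len(json_data["rows"]))
```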

## Uploading a Complete Dataset

This endpoint allows you to create a new dataset by uploading data from a JSON file.

In [None]:
# API URL
url = f"{global_url}/dataSetApi/dataSet/FileUpload"

# File to be uploaded
file_name = "DataSetExample.json"  # Replace with your file path

# Send the POST request with JSON data
try:
    # Read the JSON file
    with open(file_name, 'r') as file:
        json_data = json.load(file)  # Load JSON from the file

    # Send the POST request with the JSON data
    response = requests.post(url, json=json_data, headers=headers, verify=False)

    # Print the response code and content
    print(f"Response Code: {response.status_code}")
    print(response.text)

except FileNotFoundError:
    print(f"The file {file_name} was not found in the notebook directory.")
except requests.exceptions.RequestException as e:
    print(f"Request error occurred: {e}")
except json.JSONDecodeError:
    print("Error decoding JSON from the file. Please check the file format.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

## Uploading Multiple Dataset Rows

You can use this endpoint to add multiple rows to an existing dataset by uploading data from a JSON file.

In [None]:
# API URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/FileUpload"

# File to be uploaded
file_name = "Test.json"  # Replace with your file path

# Send the POST request with JSON data
try:
    # Read the JSON file
    with open(file_name, 'r') as file:
        json_data = json.load(file)  # Load JSON from the file

    # Send the POST request with the JSON data
    response = requests.post(url, json=json_data, headers=headers, verify=False)

    # Print the response code and content
    print(f"Response Code: {response.status_code}")
    print(response.text)

except FileNotFoundError:
    print(f"The file {file_name} was not found in the notebook directory.")
except requests.exceptions.RequestException as e:
    print(f"Request error occurred: {e}")
except json.JSONDecodeError:
    print("Error decoding JSON from the file. Please check the file format.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


# Managing Expected Sources and Filter Variables

The API also offers endpoints for managing expected sources and filter variables associated with dataset rows. You can do the following:

*  **Expected Sources:** Create, list, retrieve, update, and delete expected sources for a dataset row.
*  **Filter Variables:** Create, list, retrieve, update, and delete filter variables for a dataset row.
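Both resource types share the same URL layout beneath a dataset row, differing only in the final path segment. As a sketch, the hypothetical `row_child_url` helper below builds the paths used in the following sections; the IDs are placeholders.

```python
def row_child_url(base_url, dataset_id, row_id, kind, child_id=None, collection=False):
    """Build an expected-source or filter-variable URL for a dataset row.

    kind must be "ExpectedSource" or "FilterVariable".
    """
    if kind not in ("ExpectedSource", "FilterVariable"):
        raise ValueError(f"Unknown kind: {kind}")
    root = (
        f"{base_url.rstrip('/')}/dataSetApi/dataSet/{dataset_id}"
        f"/dataSetRow/{row_id}/dataSetRow{kind}"
    )
    if collection:
        return f"{root}s"           # list endpoint uses the plural segment
    if child_id is None:
        return root                 # insert endpoint
    return f"{root}/{child_id}"     # retrieve, update, or delete one item

base = "https://eval-api.saia.ai"
print(row_child_url(base, "DATASET_ID", "ROW_ID", "ExpectedSource", collection=True))
print(row_child_url(base, "DATASET_ID", "ROW_ID", "FilterVariable", "VAR_ID"))
```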

## Working with Expected Sources

Expected sources provide information about the expected origins of the data in a dataset row. You can use the following endpoints to manage them:

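### Inserting a Dataset Row

Expected sources and filter variables belong to a dataset row, so first add a row to attach them to. This endpoint adds a new row to a dataset.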

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow"

# JSON data to send in the request body
payload = {
    "dataSetRowExpectedAnswer": "ANSWER_B",
    "dataSetRowContextDocument": "",
    "dataSetRowInput": "INPUT_B",
    "expectedSources": [],
    "filterVariables": []
}

# Make the POST request
try:
    response = requests.post(url, json=payload, headers=headers, verify=False)

    # Check the response code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format

        # Load the JSON response
        response_data = response.json()

        # Extract the dataSetRowId
        data_set_row_id = response_data.get("dataSetRowId")

        # Save it to a global variable or file
        # Global variable
        global_data_set_row_id = data_set_row_id  # This will be available in the notebook

    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Listing Expected Sources

This endpoint retrieves a list of all expected sources for a specific dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowExpectedSources"

try:
    # Make the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Display the content of the response in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Inserting an Expected Source

This endpoint adds a new expected source to a dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowExpectedSource"

# JSON data to send in the POST request
payload = {
    "dataSetExpectedSourceName": "NAME_C",
    "dataSetexpectedSourceExtention": "JSON",
    "dataSetExpectedSourceValue": "VALUE_C"
}

# Make the POST request
try:
    response = requests.post(url, json=payload, headers=headers, verify=False)

    # Check the response code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format

        # Load the JSON response
        response_data = response.json()

        # Extract the dataSetExpectedSourceId
        data_set_row_expected_source_id = response_data.get("dataSetExpectedSourceId")

        # Store it in a global variable or file
        # Global variable
        global_data_set_row_expected_source_id = data_set_row_expected_source_id  # This will be available in the Notebook

    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Retrieving an Expected Source

This endpoint retrieves a specific expected source by its `dataSetExpectedSourceId`.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowExpectedSource/{global_data_set_row_expected_source_id}"

try:
    # Make the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)  # Print the response content in case of an error

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Updating an Expected Source

This endpoint updates an existing expected source.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowExpectedSource/{global_data_set_row_expected_source_id}"

# JSON data to send in the body of the request
payload = {
    "dataSetExpectedSourceName": "NAME_D",
    "dataSetexpectedSourceExtention": "TXT",
    "dataSetExpectedSourceValue": "VALUE_D"
}

# Make the PUT request
try:
    response = requests.put(url, json=payload, headers=headers, verify=False)

    # Check the response code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Deleting an Expected Source

This endpoint removes an expected source from a dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowExpectedSource/{global_data_set_row_expected_source_id}"

try:
    # Perform the DELETE request
    response = requests.delete(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")

## Working with Filter Variables

Filter variables allow you to define criteria for filtering dataset rows during evaluation. You can manage them using the following endpoints:

### Listing Filter Variables

This endpoint retrieves a list of all filter variables for a specific dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowFilterVariables"

try:
    # Perform the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Inserting a Filter Variable

This endpoint adds a new filter variable to a dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowFilterVariable"

# JSON data to be sent in the request body
payload = {
    "dataSetMetadataType": "V",
    "dataSetRowFilterKey": "KEY_C",
    "dataSetRowFilterValue": "VALUE_C",
    "dataSetRowFilterOperator": "OPERATOR_C"
}

# Make the POST request
try:
    response = requests.post(url, json=payload, headers=headers, verify=False)

    # Check the response code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format

        # Load the JSON response
        response_data = response.json()

        # Extract the dataSetRowFilterVarId
        data_set_row_filter_variable_id = response_data.get("dataSetRowFilterVarId")

        # Save it in a global variable or file
        # Global variable
        global_data_set_row_filter_variable_id = data_set_row_filter_variable_id  # This will be available in the Notebook

    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Retrieving a Filter Variable

This endpoint retrieves a specific filter variable by its `dataSetRowFilterVarId`.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowFilterVariable/{global_data_set_row_filter_variable_id}"

try:
    # Make the GET request
    response = requests.get(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Updating a Filter Variable

This endpoint updates an existing filter variable.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowFilterVariable/{global_data_set_row_filter_variable_id}"

# JSON data to be sent in the body of the PUT request
payload = {
    "dataSetMetadataType": "F",
    "dataSetRowFilterKey": "KEY_D",
    "dataSetRowFilterValue": "VALUE_D",
    "dataSetRowFilterOperator": "OPERATOR_D"
}

# Perform the PUT request
try:
    response = requests.put(url, json=payload, headers=headers, verify=False)

    # Check the response code
    if response.status_code == 200:
        print("Request successful. Server response:")
        print(json.dumps(response.json(), indent=4))  # Print the response in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")


### Deleting a Filter Variable

This endpoint removes a filter variable from a dataset row.

In [None]:
# Service URL
url = f"{global_url}/dataSetApi/dataSet/{global_data_set_id}/dataSetRow/{global_data_set_row_id}/dataSetRowFilterVariable/{global_data_set_row_filter_variable_id}"

try:
    # Perform the DELETE request
    response = requests.delete(url, headers=headers, verify=False)

    # Check if the request was successful
    if response.status_code == 200:
        print("Service response:")
        print(response.json())  # If the response is in JSON format
    else:
        print(f"Request error: {response.status_code}")
        print(response.text)

except requests.exceptions.RequestException as e:
    print(f"Error connecting to the service: {e}")