# Utilities for Label Studio

For a better understanding, follow the co-located [`README.md`](./README.md); this notebook is a companion of that general guide.

Table of contents:

- CSV/JSON Dataset for Import
  - Pre-Annotated Dataset as JSON
- API Usage
  - List projects and their information
  - List the tasks (labeled / samples to be labeled) of a project
  - Label tasks programmatically in bulk
- SDK Usage
  - Create a project
  - Add/Import tasks: empty or pre-annotated
  - Filters: see [Prepare and manage data with filters](https://labelstud.io/guide/sdk#Prepare-and-manage-data-with-filters)
- ML Backend

In [1]:
import os
import csv
import json
import pandas as pd
import random

from dotenv import load_dotenv
# Load .env file
load_dotenv()

True

## CSV/JSON Dataset for Import

This section shows how to create a CSV/JSON of the image URLs. Those CSV/JSON files need to be uploaded to Label Studio.
The URLs point to the images of the desired dataset and they are served with the co-located `./serve_local_files.py`.
Note that `SERVER_DIRECTORY` must be the same here and in `./serve_local_files.py`; in both cases it points to the root folder relative to which the URL paths are computed.

The JSON file additionally contains pre-annotated data.
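
To make the `SERVER_DIRECTORY` requirement concrete, here is a minimal sketch of how a file path could be mapped to the URL under which it is served. The helper name `path_to_url` and the example paths are hypothetical; `PureWindowsPath` is used only so that the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def path_to_url(file_path, server_directory, base_url="http://localhost:8000/"):
    """Map a file below server_directory to the URL under which it is served."""
    # relative_to() raises ValueError if file_path is not inside server_directory,
    # which is the situation that arises when the two settings disagree.
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(server_directory))
    return base_url + relative.as_posix()


# Hypothetical paths, for illustration only
print(path_to_url("C:/data/flowers/test/Image_1.jpg", "C:/data"))
# http://localhost:8000/flowers/test/Image_1.jpg
```

If the server were started from a different root, the relative part of the URL would no longer match what the server can resolve, and the images would fail to load in Label Studio.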

In [36]:
def list_image_files(directory, server_directory, base_url="http://localhost:8000/"):
    """
    Recursively lists the URLs under which a local server exposes the images in the given directory and its subdirectories.

    :param directory: Path to the directory.
    :param server_directory: Path to the directory from which the server is started.
    :param base_url: Base URL of the local server, including the trailing slash.
    :return: List of URLs to image files served from a local server.
    """
    
    # List of common image extensions
    image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tiff', '.webp']

    # Recursively walk through the directory
    image_urls = []
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if any(filename.lower().endswith(ext) for ext in image_extensions):
                # Convert file path to a URL path
                relative_path = os.path.relpath(os.path.join(dirpath, filename), server_directory)
                web_path = relative_path.replace('\\', '/')
                full_url = base_url + web_path
                
                image_urls.append(full_url)
                
    return image_urls

In [37]:
def save_to_csv(image_paths, output_file):
    """
    Save list of image paths to a CSV file.

    :param image_paths: List of image paths.
    :param output_file: Path to the output CSV file.
    """
    with open(output_file, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(["image_path"])  # Writing the header
        for path in image_paths:
            csv_writer.writerow([path])

In [38]:
DIRECTORY_PATH = 'C:/Users/Msagardi/git_repositories/tool_guides/labelstudio/data/flowers/test/'
SERVER_DIRECTORY = 'C:/Users/Msagardi/git_repositories/tool_guides/labelstudio/data'
image_paths = list_image_files(DIRECTORY_PATH, SERVER_DIRECTORY)
print(image_paths[:5]) # ['http://localhost:8000/flowers/test/Image_1.jpg', 'http://localhost:8000/flowers/test/Image_10.jpg', ...

['http://localhost:8000/flowers/test/Image_1.jpg', 'http://localhost:8000/flowers/test/Image_10.jpg', 'http://localhost:8000/flowers/test/Image_100.jpg', 'http://localhost:8000/flowers/test/Image_101.jpg', 'http://localhost:8000/flowers/test/Image_102.jpg']


In [22]:
output_csv_path = 'image_paths.csv'
save_to_csv(image_paths, output_csv_path)

In [35]:
# Now, in ./serve_local_files.py, we need to set
#   SERVER_DIRECTORY
# with the same path as here.
# Then, we execute it:
#   python serve_local_files.py
# With that, we're going to get the images served in the URLs

For reference, that `serve_local_files.py` contains the following code:

```python
from flask import Flask, send_from_directory
from flask_cors import CORS

SERVER_DIRECTORY = 'C:/Users/Msagardi/git_repositories/tool_guides/labelstudio/data/'

app = Flask(__name__)
CORS(app)  # This will enable CORS for all routes

@app.route('/<path:path>')
def serve_file(path):
    return send_from_directory(SERVER_DIRECTORY, path)

if __name__ == '__main__':
    app.run(port=8000)

```

### Pre-Annotated Dataset as JSON

In [29]:
def create_dataset_dataframe(image_paths):
    """
    Create a DataFrame with given image paths, and two additional columns:
    prediction and cluster.

    :param image_paths: List of image URLs.
    :return: DataFrame with columns: image_paths, prediction, cluster.
    """
    
    flower_types = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']
    predictions = [random.choice(flower_types) for _ in image_paths]
    clusters = [random.randint(0, 5) for _ in image_paths]
    
    df = pd.DataFrame({
        'image_paths': image_paths,
        'prediction': predictions,
        'cluster': clusters
    })

    return df

df = create_dataset_dataframe(image_paths)

In [30]:
df.head()

|   | image_paths | prediction | cluster |
|---|---|---|---|
| 0 | http://localhost:8000/flowers/test/Image_1.jpg | sunflower | 0 |
| 1 | http://localhost:8000/flowers/test/Image_10.jpg | tulip | 4 |
| 2 | http://localhost:8000/flowers/test/Image_100.jpg | tulip | 1 |
| 3 | http://localhost:8000/flowers/test/Image_101.jpg | rose | 0 |
| 4 | http://localhost:8000/flowers/test/Image_102.jpg | daisy | 2 |


In [32]:
def save_to_json_preannotated(df, output_filepath):
    """
    Save DataFrame to a JSON file suitable for importing into Label Studio.

    :param df: DataFrame with columns: image_paths, prediction, cluster.
    :param output_filepath: Path to the output JSON file.
    """
    
    output_data = []

    for _, row in df.iterrows():
        item = {
            "data": {
                "image_path": row['image_paths'],
                "cluster": row['cluster']  # can be accessed in the Filters!
            },
            "predictions": [
                {
                    "result": [
                        {
                            "from_name": "class",
                            "to_name": "image",
                            "type": "choices",
                            "value": {
                                "choices": [row['prediction']]
                            }
                        }
                    ]
                }
            ]
        }
        output_data.append(item)
    
    with open(output_filepath, 'w') as outfile:
        json.dump(output_data, outfile, indent=4)


output_filepath = "images_paths_preannotated.json"
save_to_json_preannotated(df, output_filepath)
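
Before uploading the JSON file, a quick sanity check of its content may save a failed import. The following is only a sketch under stated assumptions: the function name `check_preannotated_tasks` is hypothetical, and it checks only the structure produced by `save_to_json_preannotated`, not the complete Label Studio schema.

```python
def check_preannotated_tasks(tasks, allowed_choices, data_key="image_path"):
    """Return a list of human-readable problems found in the task list."""
    problems = []
    for index, task in enumerate(tasks):
        # Every task needs the data field that the labeling config refers to
        if data_key not in task.get("data", {}):
            problems.append(f"task {index}: missing data['{data_key}']")
        # Every predicted choice should be one of the known classes
        for prediction in task.get("predictions", []):
            for result in prediction.get("result", []):
                for choice in result.get("value", {}).get("choices", []):
                    if choice not in allowed_choices:
                        problems.append(f"task {index}: unknown choice '{choice}'")
    return problems


def make_task(data, choice):
    """Build one task in the same shape as the exported JSON (illustration only)."""
    result = {"from_name": "class", "to_name": "image", "type": "choices",
              "value": {"choices": [choice]}}
    return {"data": data, "predictions": [{"result": [result]}]}


# Two in-memory tasks; the second one is deliberately wrong
sample_tasks = [
    make_task({"image_path": "http://localhost:8000/flowers/test/Image_1.jpg", "cluster": 0}, "tulip"),
    make_task({"cluster": 3}, "orchid"),
]
flower_types = ["daisy", "dandelion", "rose", "sunflower", "tulip"]
for problem in check_preannotated_tasks(sample_tasks, flower_types):
    print(problem)
```

For the real file, the list passed in would be the result of `json.load` on `images_paths_preannotated.json`.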

## API Usage

Label Studio exposes a REST API that we can interact with.
For all calls, we need to set our `LABEL_STUDIO_API_TOKEN`, obtained in the Label Studio account settings.
Here, the token is in the environment variables.

All API calls are listed here: [Label Studio API](https://labelstud.io/api).
Examples shown here:

1. List projects and their information.
2. List the tasks (labeled / samples to be labeled) of a project.
3. Label tasks programmatically in bulk.

There are many more options, though!
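
The token handling described above can be sketched as a small helper that fails early when the token is missing, instead of sending requests with `Token None`. The helper name `build_auth_headers` and the demonstration variable `DEMO_LABEL_STUDIO_TOKEN` are hypothetical; only the `Authorization: Token ...` header format is taken from the cells in this notebook.

```python
import os


def build_auth_headers(env_var="LABEL_STUDIO_API_TOKEN"):
    """Read the API token from the environment and build the request headers."""
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(f"{env_var} is not set; add it to the .env file or the environment.")
    return {"Authorization": f"Token {token}"}


# Demonstration with a dummy value under a hypothetical variable name,
# so that the real LABEL_STUDIO_API_TOKEN is left untouched
os.environ["DEMO_LABEL_STUDIO_TOKEN"] = "dummy-token"
print(build_auth_headers("DEMO_LABEL_STUDIO_TOKEN"))
# {'Authorization': 'Token dummy-token'}
```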

In [3]:
import requests

In [4]:
# Access the LABEL_STUDIO_API_TOKEN
LABEL_STUDIO_API_TOKEN = os.getenv("LABEL_STUDIO_API_TOKEN")

In [8]:
# Base URL
base_url = "http://localhost:8080"

# Setup headers with the API token
headers = {
    "Authorization": f"Token {LABEL_STUDIO_API_TOKEN}"
}

In [13]:
## -- List projects and their information
response = requests.get(f"{base_url}/api/projects", headers=headers)

if response.status_code == 200:
    projects = response.json()
    # projects = {count, next, previous, results}
    for project_dict in projects["results"]: # for all projects
        #print(project_dict)  # dict
        project_json = json.dumps(project_dict, indent=4) # JSON string, for nice print
        print(project_json)
else:
    print(f"Failed to retrieve projects. Status code: {response.status_code}")
    print(response.text)  # This might give additional info about the error.

{
    "id": 6,
    "title": "Flowers",
    "description": "",
    "label_config": "<View>\n  <Image name=\"image\" value=\"$image_path\" zoom=\"true\" zoomControl=\"true\" rotateControl=\"true\"/>\n  <Choices name=\"class\" toName=\"image\">\n    <Choice value=\"daisy\"/>\n    <Choice value=\"dandelion\"/>\n    <Choice value=\"rose\"/>\n    <Choice value=\"sunflower\"/>\n    <Choice value=\"tulip\"/>\n  </Choices>\n</View>",
    "expert_instruction": "",
    "show_instruction": false,
    "show_skip_button": true,
    "enable_empty_annotation": true,
    "show_annotation_history": false,
    "organization": 1,
    "color": "#FFFFFF",
    "maximum_annotations": 1,
    "is_published": false,
    "model_version": "undefined",
    "is_draft": false,
    "created_by": {
        "id": 1,
        "first_name": "",
        "last_name": "",
        "email": "mxagar@gmail.com",
        "avatar": null
    },
    "created_at": "2023-09-29T17:03:54.864681Z",
    "min_annotations_to_start_training":

In [24]:
## -- List the tasks in a project

def get_all_tasks(project_id, limit=None):
    page = 1
    tasks = []
    num_fetched_tasks = 0

    def determine_page_size(project_id):
        response = requests.get(f"{base_url}/api/projects/{project_id}/tasks?page=1", headers=headers)
        
        if response.status_code != 200:
            print(f"Failed to retrieve tasks to determine page size. Status code: {response.status_code}")
            return None

        tasks = response.json()
        return len(tasks)

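    page_size = determine_page_size(project_id)
    if page_size is None:
        # The first request failed, so there is nothing to paginate over
        return tasks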

    while True:
        #response = requests.get(f"{base_url}/api/projects/{project_id}/tasks", headers=headers)
        response = requests.get(f"{base_url}/api/projects/{project_id}/tasks?page={page}", headers=headers)
        if response.status_code != 200:
            print(f"Failed to retrieve tasks on page {page}. Status code: {response.status_code}")
            break

        page_data = response.json()
        tasks.extend(page_data)
        num_fetched_tasks += len(page_data)

        # Stop once the requested number of tasks has been reached
        if limit is not None and num_fetched_tasks >= limit:
            tasks = tasks[:limit]
            break
        # Stop on the last page: empty, or shorter than a full page
        if not page_data or len(page_data) < page_size:
            break

        page += 1

    return tasks
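
The pagination logic above can also be separated from the HTTP call, which makes it easy to try out without a running Label Studio instance. This is a minimal sketch, not the notebook's method: `iter_pages` and `fetch_fake_page` are hypothetical names, and the fetcher is assumed to return an empty list when a page does not exist (for example, by mapping a non-200 response to `[]`).

```python
def iter_pages(fetch_page, limit=None):
    """Yield items from fetch_page(page) until an empty page or `limit` items."""
    page, yielded = 1, 0
    while True:
        items = fetch_page(page)
        if not items:
            return  # no more pages
        for item in items:
            if limit is not None and yielded >= limit:
                return  # requested number of items reached
            yield item
            yielded += 1
        page += 1


# Stand-in for the real HTTP call: three fake pages of task dictionaries
FAKE_PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}, {"id": 4}], 3: [{"id": 5}]}


def fetch_fake_page(page):
    return FAKE_PAGES.get(page, [])


print([task["id"] for task in iter_pages(fetch_fake_page)])           # [1, 2, 3, 4, 5]
print([task["id"] for task in iter_pages(fetch_fake_page, limit=3)])  # [1, 2, 3]
```

With the real API, `fetch_page` would wrap the `requests.get(...)` call on `/api/projects/{project_id}/tasks?page={page}` used in `get_all_tasks`.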

In [26]:
project_id = 6 # can be obtained with previous call
tasks = get_all_tasks(project_id)
print(f"Total tasks: {len(tasks)}")

for task_dict in tasks[:10]:
    #print(task_dict) # dict
    task_json = json.dumps(task_dict, indent=4) # nice formatting
    print(task_json)

Total tasks: 1848
{
    "id": 1802,
    "data": {
        "image_path": "http://localhost:8000/flowers/test/Image_89.jpg",
        "cluster": 1
    },
    "meta": {},
    "created_at": "2023-10-02T13:34:51.031380Z",
    "updated_at": "2023-10-02T13:34:51.031380Z",
    "is_labeled": false,
    "overlap": 1,
    "inner_id": 1802,
    "total_annotations": 0,
    "cancelled_annotations": 0,
    "total_predictions": 1,
    "comment_count": 0,
    "unresolved_comment_count": 0,
    "last_comment_updated_at": null,
    "project": 6,
    "updated_by": null,
    "file_upload": 6,
    "comment_authors": [],
    "annotations": [],
    "predictions": [
        {
            "id": 878,
            "model_version": "undefined",
            "created_ago": "2\u00a0hours, 59\u00a0minutes",
            "result": [
                {
                    "from_name": "class",
                    "to_name": "image",
                    "type": "choices",
                    "value": {
                      

In [29]:
## -- Programmatically annotate tasks in bulk

# Fetch (all) tasks for the specified project
# Note: limited to the first 90 tasks!
project_id = 6 # can be obtained with previous call
tasks = get_all_tasks(project_id, limit=90)

# Define the list of flower classes
flower_classes = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']

# Iterate through each task and update its annotations
for task in tasks:
    task_id = task['id']
    cluster = task['data']['cluster']

    # Filter by cluster, if desired
    if cluster == 1:
        # Create a random annotation for the task
        annotation_class = random.choice(flower_classes)
        payload = {
            "result": [{
                "from_name": "class",
                "to_name": "image",
                "type": "choices",
                "value": {"choices": [annotation_class]}
            }],
            "last_action": "prediction",
            "task": task_id,
            "project": project_id
        }
        
        response = requests.post(f"{base_url}/api/tasks/{task_id}/annotations", headers=headers, json=payload)
        if response.status_code != 201:
            print(f"Failed to update task {task_id}. Status code: {response.status_code}")
            print(response.text)
        else:
            print(f"Successfully updated task {task_id} with annotation: {annotation_class}")


Successfully updated task 1802 with annotation: tulip
Successfully updated task 1818 with annotation: sunflower
Successfully updated task 1816 with annotation: daisy
Successfully updated task 1814 with annotation: sunflower
Successfully updated task 1813 with annotation: tulip
Successfully updated task 1824 with annotation: dandelion
Successfully updated task 1823 with annotation: rose
Successfully updated task 1836 with annotation: sunflower
Successfully updated task 1833 with annotation: dandelion
Successfully updated task 1843 with annotation: daisy
Successfully updated task 1840 with annotation: daisy
Successfully updated task 1780 with annotation: sunflower
Successfully updated task 1782 with annotation: daisy
Successfully updated task 1770 with annotation: dandelion
Successfully updated task 1777 with annotation: daisy
Successfully updated task 1761 with annotation: sunflower
Successfully updated task 1767 with annotation: dandelion
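
The two steps of the bulk-labeling loop above, selecting tasks by cluster and building the annotation payload, can be isolated into small functions and checked without sending any request. This is a sketch with hypothetical helper names (`tasks_in_cluster`, `build_choice_annotation`) and fake tasks; the payload mirrors only the `result` part of the one used in the cell above.

```python
def tasks_in_cluster(tasks, cluster):
    """Keep only the tasks whose data carries the given cluster value."""
    return [task for task in tasks if task.get("data", {}).get("cluster") == cluster]


def build_choice_annotation(choice, from_name="class", to_name="image"):
    """Build the `result` body for a single-choice annotation."""
    return {
        "result": [{
            "from_name": from_name,
            "to_name": to_name,
            "type": "choices",
            "value": {"choices": [choice]},
        }]
    }


# Fake tasks, shaped like the dictionaries returned by the tasks endpoint
fake_tasks = [
    {"id": 1, "data": {"cluster": 1}},
    {"id": 2, "data": {"cluster": 4}},
    {"id": 3, "data": {"cluster": 1}},
]
selected = tasks_in_cluster(fake_tasks, cluster=1)
print([task["id"] for task in selected])  # [1, 3]
print(build_choice_annotation("rose"))
```

Each payload would then be posted to `/api/tasks/{task_id}/annotations`, as in the loop above.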


## SDK Usage

In [31]:
# Import the SDK and the client module
from label_studio_sdk import Client

In [32]:
# Access the LABEL_STUDIO_API_TOKEN
LABEL_STUDIO_API_TOKEN = os.getenv("LABEL_STUDIO_API_TOKEN")
LABEL_STUDIO_URL = 'http://localhost:8080'

In [34]:
# Connect to the Label Studio API and check the connection
ls = Client(url=LABEL_STUDIO_URL, api_key=LABEL_STUDIO_API_TOKEN)
ls.check_connection() # {'status': 'UP'}

{'status': 'UP'}

### Create a Project

In [35]:
# Create a project with a template
# More templates: https://labelstud.io/templates
# After running the code, check the new project in the web UI
project = ls.start_project(
    title='Flowers 2',
    label_config='''
    <View>
    <Image name="image" value="$image_path" zoom="true" zoomControl="true" rotateControl="true"/>
    <Choices name="class" toName="image">
        <Choice value="daisy"/>
        <Choice value="dandelion"/>
        <Choice value="rose"/>
        <Choice value="sunflower"/>
        <Choice value="tulip"/>
    </Choices>
    </View>
    '''
)

### Add/Import Tasks: Empty or Pre-Annotated

In Label Studio, tasks are *imported*.
We follow the [Label Studio JSON format](https://labelstud.io/guide/tasks#Basic-Label-Studio-JSON-format), but as Python objects.

```python
project.import_tasks(
    [
        {'image_path': 'http://localhost:8000/flowers/test/Image_1.jpg'},
        {'image_path': 'http://localhost:8000/flowers/test/Image_10.jpg'}
    ]
)
```

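As far as I can tell, the keys of each task dictionary should match the names used in the XML labeling configuration:

- `image_path`: the variable referenced by `value="$image_path"` in the `<Image>` tag.
- `class`: the `name` of the `<Choices>` tag, used when pre-annotations are imported from a field.
- etc.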
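
One way to check which field names a labeling configuration expects is to parse the XML and collect the `$variables` and the tag names. This is a minimal sketch using only the standard library; `config_fields` is a hypothetical helper and not part of the Label Studio SDK.

```python
import xml.etree.ElementTree as ET

# Labeling configuration of the flowers project, as used in this notebook
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image_path" zoom="true" zoomControl="true" rotateControl="true"/>
  <Choices name="class" toName="image">
    <Choice value="daisy"/>
    <Choice value="dandelion"/>
    <Choice value="rose"/>
    <Choice value="sunflower"/>
    <Choice value="tulip"/>
  </Choices>
</View>
"""


def config_fields(label_config):
    """Return (data variables, tag names) found in a labeling configuration."""
    root = ET.fromstring(label_config.strip())
    data_variables, tag_names = set(), set()
    for element in root.iter():
        value = element.get("value", "")
        if value.startswith("$"):
            data_variables.add(value[1:])  # "$image_path" -> "image_path"
        if element.get("name"):
            tag_names.add(element.get("name"))
    return data_variables, tag_names


data_variables, tag_names = config_fields(LABEL_CONFIG)
print(sorted(data_variables))  # ['image_path']
print(sorted(tag_names))       # ['class', 'image']
```

Here `image_path` is the key each imported task must provide, and `class` is the name a pre-annotation field would need to carry.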

In [39]:
# Recall we already have a web server serving all images with URLs
print(image_paths[:5]) # ['http://localhost:8000/flowers/test/Image_1.jpg', 'http://localhost:8000/flowers/test/Image_10.jpg', ...

['http://localhost:8000/flowers/test/Image_1.jpg', 'http://localhost:8000/flowers/test/Image_10.jpg', 'http://localhost:8000/flowers/test/Image_100.jpg', 'http://localhost:8000/flowers/test/Image_101.jpg', 'http://localhost:8000/flowers/test/Image_102.jpg']


In [40]:
# -- If we have a list of image URLs, we can programmatically add/import tasks
# Check the web UI to see the updates in there
project.import_tasks(
    [{'image_path': image_paths[i]} for i in range(10)]
)

[1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858]

In [41]:
# -- We can create predictions for tasks == pre-annotations
task_ids = project.get_tasks_ids()
project.create_prediction(task_ids[0], result='tulip', score=0.9)

{'id': 925,
 'model_version': '',
 'created_ago': '0\xa0minutes',
 'result': [{'from_name': 'class',
   'to_name': 'image',
   'type': 'choices',
   'value': {'choices': ['tulip']}}],
 'score': 0.9,
 'cluster': None,
 'neighbors': None,
 'mislabeling': 0.0,
 'created_at': '2023-10-03T08:58:16.166245Z',
 'updated_at': '2023-10-03T08:58:16.166245Z',
 'task': 1849,
 'project': 7}

In [42]:
# -- We can also create/import tasks with pre-annotations/predictions

flower_classes = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']

project.import_tasks(
    [{'image_path': image_paths[i], 'class': random.choice(flower_classes),} for i in range(10,20)],
    preannotated_from_fields=['class']
)

[1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868]
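
To connect this with the DataFrame created earlier, its rows could be converted into task dictionaries of the same shape as in the cell above. This is a sketch under assumptions: `rows_to_sdk_tasks` is a hypothetical helper, and the rows are plain dictionaries such as those returned by `df.to_dict("records")`, written out by hand here so that the example runs without pandas.

```python
def rows_to_sdk_tasks(rows, url_key="image_paths", label_key="prediction"):
    """Convert row dictionaries into task dictionaries with `image_path` and `class` keys."""
    return [{"image_path": row[url_key], "class": row[label_key]} for row in rows]


# Hand-written rows with the same columns as the DataFrame in this notebook
rows = [
    {"image_paths": "http://localhost:8000/flowers/test/Image_1.jpg", "prediction": "sunflower", "cluster": 0},
    {"image_paths": "http://localhost:8000/flowers/test/Image_10.jpg", "prediction": "tulip", "cluster": 4},
]
sdk_tasks = rows_to_sdk_tasks(rows)
print(sdk_tasks)

# The result could then be imported as in the cell above, for example:
# project.import_tasks(sdk_tasks, preannotated_from_fields=['class'])
```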