# Utilities for Label Studio

For a better understanding, read the co-located [`README.md`](./README.md) first; this notebook is a companion to that general guide.

Table of contents:

- CSV/JSON Dataset for Import
  - Pre-Annotated Dataset as JSON
- API Usage

In [1]:
import os
import csv
import json
import pandas as pd
import random

from dotenv import load_dotenv
# Load .env file
load_dotenv()

True

In [2]:
# Access the LABEL_STUDIO_API_TOKEN
LABEL_STUDIO_API_TOKEN = os.getenv("LABEL_STUDIO_API_TOKEN")
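`os.getenv` returns `None` without complaint when the variable is missing, and that only surfaces later as an authentication failure in the API calls. A minimal fail-fast sketch; `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the .env file or export it before running the notebook."
        )
    return value

# Hypothetical usage in this notebook:
# LABEL_STUDIO_API_TOKEN = require_env("LABEL_STUDIO_API_TOKEN")
```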

## CSV/JSON Dataset for Import

This section shows how to create CSV and JSON files listing the image URLs; these files are then imported into Label Studio.
The URLs point to the images of the desired dataset, which are served by the co-located `./serve_local_files.py`.
Note that `SERVER_DIRECTORY` must be the same here and in `./serve_local_files.py`: in both places it points to the root folder that the URL paths are relative to.

The JSON file additionally contains pre-annotated data.
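Why `SERVER_DIRECTORY` has to match on both sides becomes clear when one URL is traced in both directions: the notebook strips `SERVER_DIRECTORY` from the file path to build the URL, and the server joins the URL path back onto its own `SERVER_DIRECTORY`. A minimal sketch with `pathlib` and `urllib.parse`, assuming forward-slash paths; the helper names and example paths are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def path_to_url(file_path, server_directory, base_url="http://localhost:8000/"):
    """Notebook side: make file_path relative to server_directory and append it to base_url."""
    relative = PurePosixPath(file_path).relative_to(server_directory)
    return base_url.rstrip("/") + "/" + relative.as_posix()

def url_to_path(url, server_directory):
    """Server side: join the URL path back onto the server's own root folder."""
    return str(PurePosixPath(server_directory) / urlparse(url).path.lstrip("/"))

url = path_to_url("/data/flowers/test/Image_1.jpg", "/data")
print(url)                                # http://localhost:8000/flowers/test/Image_1.jpg
print(url_to_path(url, "/data"))          # /data/flowers/test/Image_1.jpg (the intended file)
print(url_to_path(url, "/data/flowers"))  # /data/flowers/flowers/test/Image_1.jpg (wrong file)
```

With mismatched roots the server looks for a path that does not exist, so the image would most likely fail to load in Label Studio.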

In [19]:
def list_image_files(directory, server_directory, base_url="http://localhost:8000/"):
    """
    Recursively list the local-server URLs of all images in the given directory and its subdirectories.

    :param directory: Path to the directory to scan.
    :param server_directory: Path to the directory from which the server is started (the URL root).
    :param base_url: Base URL of the local server; must end with '/'.
    :return: List of URLs to image files served from a local server.
    """
    
    # List of common image extensions
    image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tiff', '.webp']

    # Recursively walk through the directory
    image_urls = []
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if any(filename.lower().endswith(ext) for ext in image_extensions):
                # Convert file path to a URL path
                relative_path = os.path.relpath(os.path.join(dirpath, filename), server_directory)
                web_path = relative_path.replace('\\', '/')
                full_url = base_url + web_path
                
                image_urls.append(full_url)
                
    return image_urls
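`os.walk` yields file names in whatever order the operating system returns them, which here is lexicographic (`Image_1`, `Image_10`, `Image_100`, ...), and file names containing spaces or `#` end up unescaped in the URLs. A minimal alternative sketch using `pathlib`, a natural sort, and `urllib.parse.quote`; `list_image_urls` and `natural_key` are hypothetical names, not part of the original notebook.

```python
import re
from pathlib import Path
from urllib.parse import quote

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}

def natural_key(path):
    """Split digit runs out of the path so 'Image_2' sorts before 'Image_10'."""
    parts = re.split(r"(\d+)", path.as_posix())
    # With a capturing group, odd indices are always the digit runs
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]

def list_image_urls(directory, server_directory, base_url="http://localhost:8000/"):
    """Like list_image_files, but naturally sorted and percent-encoded."""
    root = Path(server_directory).resolve()
    files = [p for p in Path(directory).rglob("*")
             if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS]
    urls = []
    for p in sorted(files, key=natural_key):
        relative = p.resolve().relative_to(root).as_posix()
        urls.append(base_url.rstrip("/") + "/" + quote(relative))  # '/' stays unescaped
    return urls
```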

In [20]:
def save_to_csv(image_paths, output_file):
    """
    Save list of image paths to a CSV file.

    :param image_paths: List of image paths.
    :param output_file: Path to the output CSV file.
    """
    with open(output_file, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(["image_path"])  # Writing the header
        for path in image_paths:
            csv_writer.writerow([path])
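The header written by `save_to_csv` matters: on CSV import, Label Studio turns each column into a task data field, so the `image_path` column is what a labeling config references as `$image_path`. A minimal sketch that writes the same layout to an in-memory buffer and reads it back as one dictionary per row; `io.StringIO` stands in for the real file and `rows_as_tasks` is a hypothetical name.

```python
import csv
import io

def rows_as_tasks(image_urls):
    """Write the image_path CSV layout to memory and read each row back as a dict."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["image_path"])  # same header as save_to_csv
    for url in image_urls:
        writer.writerow([url])
    buffer.seek(0)
    return [dict(row) for row in csv.DictReader(buffer)]

print(rows_as_tasks(["http://localhost:8000/flowers/test/Image_1.jpg"]))
# [{'image_path': 'http://localhost:8000/flowers/test/Image_1.jpg'}]
```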

In [21]:
DIRECTORY_PATH = 'C:/Users/Msagardi/git_repositories/tool_guides/labelstudio/data/flowers/test/'
SERVER_DIRECTORY = 'C:/Users/Msagardi/git_repositories/tool_guides/labelstudio/data'
image_paths = list_image_files(DIRECTORY_PATH, SERVER_DIRECTORY)
print(image_paths[:5]) # ['http://localhost:8000/flowers/test/Image_1.jpg', 'http://localhost:8000/flowers/test/Image_10.jpg', ...

['http://localhost:8000/flowers/test/Image_1.jpg', 'http://localhost:8000/flowers/test/Image_10.jpg', 'http://localhost:8000/flowers/test/Image_100.jpg', 'http://localhost:8000/flowers/test/Image_101.jpg', 'http://localhost:8000/flowers/test/Image_102.jpg']


In [22]:
output_csv_path = 'image_paths.csv'
save_to_csv(image_paths, output_csv_path)

In [35]:
# Next, set SERVER_DIRECTORY in ./serve_local_files.py
# to the same path as here, then run it:
#   python serve_local_files.py
# The images are then served at the URLs generated above.
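Before importing, it can help to confirm that the server actually answers for the generated URLs. A minimal standard-library sketch; `check_urls` is a hypothetical helper, and it relies on the server accepting `HEAD` requests (Flask adds `HEAD` automatically to `GET` routes).

```python
import urllib.request

def check_urls(urls, timeout=5.0):
    """Return the URLs that did not answer a HEAD request with an HTTP 2xx status."""
    failed = []
    for url in urls:
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                if not 200 <= response.status < 300:
                    failed.append(url)
        except OSError:
            # HTTPError (e.g. 404), URLError (e.g. connection refused) and timeouts are all OSError subclasses
            failed.append(url)
    return failed

# Hypothetical usage once serve_local_files.py is running:
# print(check_urls(image_paths[:5]))  # an empty list means every URL was served
```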

For reference, `serve_local_files.py` contains the following code:

```python
from flask import Flask, send_from_directory
from flask_cors import CORS

SERVER_DIRECTORY = 'C:/Users/Msagardi/git_repositories/tool_guides/labelstudio/data/'

app = Flask(__name__)
CORS(app)  # This will enable CORS for all routes

@app.route('/<path:path>')
def serve_file(path):
    return send_from_directory(SERVER_DIRECTORY, path)

if __name__ == '__main__':
    app.run(port=8000)

```
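If installing Flask and `flask_cors` is not an option, roughly the same behaviour (serve files under `SERVER_DIRECTORY` and send `Access-Control-Allow-Origin: *`, which is what `CORS(app)` sends by default for simple requests) can be approximated with the standard library. A minimal sketch, not the repository's script; `CORSRequestHandler` and `make_server` are hypothetical names, and it requires Python 3.7+ for the handler's `directory` argument and `ThreadingHTTPServer`.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds a permissive CORS header to every response."""

    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

def make_server(directory, port=8000):
    """Build (but do not start) a server that serves `directory` on localhost:`port`."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("localhost", port), handler)

# Hypothetical usage, with the same SERVER_DIRECTORY as in the notebook:
# make_server(SERVER_DIRECTORY).serve_forever()
```

Unlike `flask_cors`, this sketch does not answer CORS preflight (`OPTIONS`) requests; plain image loading does not need them.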

### Pre-Annotated Dataset as JSON

In [29]:
def create_dataset_dataframe(image_paths):
    """
    Create a DataFrame with the given image paths and two additional columns,
    prediction and cluster, filled with random placeholder values.

    :param image_paths: List of image URLs.
    :return: DataFrame with columns: image_paths, prediction, cluster.
    """
    
    flower_types = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']
    predictions = [random.choice(flower_types) for _ in image_paths]
    clusters = [random.randint(0, 5) for _ in image_paths]
    
    df = pd.DataFrame({
        'image_paths': image_paths,
        'prediction': predictions,
        'cluster': clusters
    })

    return df

df = create_dataset_dataframe(image_paths)
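Because `create_dataset_dataframe` draws from the global `random` module, the placeholder predictions change on every run. A minimal sketch that uses a seeded `random.Random` instance so the pre-annotations are reproducible; `make_placeholder_records` and the seed value are hypothetical, and the records convert to the same columns with `pd.DataFrame(records)`.

```python
import random

FLOWER_TYPES = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']

def make_placeholder_records(image_paths, seed=0, n_clusters=6):
    """Return one dict per image with a random but reproducible prediction and cluster id."""
    rng = random.Random(seed)  # private generator: leaves the global random state untouched
    return [
        {
            'image_paths': path,
            'prediction': rng.choice(FLOWER_TYPES),
            'cluster': rng.randrange(n_clusters),  # 0..n_clusters-1, same range as randint(0, 5)
        }
        for path in image_paths
    ]

records = make_placeholder_records(['a.jpg', 'b.jpg'], seed=42)
print(records)  # identical on every run with seed=42
```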

In [30]:
df.head()

|   | image_paths | prediction | cluster |
|---|---|---|---|
| 0 | http://localhost:8000/flowers/test/Image_1.jpg | sunflower | 0 |
| 1 | http://localhost:8000/flowers/test/Image_10.jpg | tulip | 4 |
| 2 | http://localhost:8000/flowers/test/Image_100.jpg | tulip | 1 |
| 3 | http://localhost:8000/flowers/test/Image_101.jpg | rose | 0 |
| 4 | http://localhost:8000/flowers/test/Image_102.jpg | daisy | 2 |


In [32]:
def save_to_json_preannotated(df, output_filepath):
    """
    Save DataFrame to a JSON file suitable for importing into Label Studio.

    :param df: DataFrame with columns: image_paths, prediction, cluster.
    :param output_filepath: Path to the output JSON file.
    """
    
    output_data = []

    for _, row in df.iterrows():
        item = {
            "data": {
                "image_path": row['image_paths'],
                "cluster": int(row['cluster'])  # plain int keeps json.dump safe; data fields can be used in Data Manager filters
            },
            "predictions": [
                {
                    "result": [
                        {
                            "from_name": "class",
                            "to_name": "image",
                            "type": "choices",
                            "value": {
                                "choices": [row['prediction']]
                            }
                        }
                    ]
                }
            ]
        }
        output_data.append(item)
    
    with open(output_filepath, 'w') as outfile:
        json.dump(output_data, outfile, indent=4)


output_filepath = "images_paths_preannotated.json"
save_to_json_preannotated(df, output_filepath)
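The `from_name`/`to_name` values in each prediction have to match tag names in the project's labeling config, and each predicted choice has to be one of that config's `Choice` values; otherwise the pre-annotations will likely not show up as expected. A minimal sketch, assuming a labeling config with an `Image` tag named `image` and a `Choices` tag named `class`, consistent with the names used above; `LABEL_CONFIG` and `find_prediction_problems` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed labeling config matching save_to_json_preannotated:
# <Image name="image"> reads the "image_path" data field, <Choices name="class"> labels it.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image_path"/>
  <Choices name="class" toName="image">
    <Choice value="daisy"/>
    <Choice value="dandelion"/>
    <Choice value="rose"/>
    <Choice value="sunflower"/>
    <Choice value="tulip"/>
  </Choices>
</View>
"""

def find_prediction_problems(tasks, label_config=LABEL_CONFIG):
    """Return human-readable problems where task predictions do not fit the labeling config."""
    root = ET.fromstring(label_config.strip())
    controls = {c.get("name"): c for c in root.iter("Choices")}
    problems = []
    for i, task in enumerate(tasks):
        for prediction in task.get("predictions", []):
            for result in prediction.get("result", []):
                control = controls.get(result.get("from_name"))
                if control is None:
                    problems.append(f"task {i}: unknown from_name {result.get('from_name')!r}")
                    continue
                if result.get("to_name") != control.get("toName"):
                    problems.append(f"task {i}: to_name {result.get('to_name')!r} does not match")
                allowed = {c.get("value") for c in control.iter("Choice")}
                for value in result.get("value", {}).get("choices", []):
                    if value not in allowed:
                        problems.append(f"task {i}: choice {value!r} not in config")
    return problems

# Hypothetical usage with the file written above:
# with open(output_filepath) as f:
#     print(find_prediction_problems(json.load(f)))  # [] when everything lines up
```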

## API Usage