


# Roboflow Filename Checker

This notebook helps you check which image files from your local directory are missing from your Roboflow dataset.

## Purpose
- Scan local image folders (car, left, right camera angles)
- Extract filename identifiers in the format `GX010152_36_378`
- Generate Roboflow search queries to check file existence
- Output queries in manageable batches for UI search

## Usage
1. Update the `base_path` variable to point to your local image directory
2. Run all cells to generate search queries
3. Copy the generated queries and paste them into Roboflow's search interface
4. Check which files are missing from your dataset


## Step 1: Collect Local Files

This cell scans your local directory structure to find all image files in the specified folders (car, left, right camera angles).

**What it does:**
- Sets the base path to your local image directory
- Defines subfolder names for different camera positions
- Walks through each folder recursively and collects every filename it finds, regardless of file type
- Prints the total count of files found

**Note:** Update the `base_path` variable to match your local directory structure.


In [9]:
import os


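# Base directory that holds the camera-angle subfolders (update for your machine)
base_path = r'C:\Users\gbo10\OneDrive\measurement_paper_images\images used for imageJ\check\stabilized\shai\measurements\1\carapace'

# Define the paths to the folders, one per camera angle
folders = [
    os.path.join(base_path, 'car'),
    os.path.join(base_path, 'left'),
    os.path.join(base_path, 'right'),
]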

# Collect all file names (os.walk descends into subfolders)
local_files = []
for folder in folders:
    for _root, _dirs, files in os.walk(folder):
        local_files.extend(files)

print(f"Found {len(local_files)} files locally.")


Found 0 files locally.
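
A count of 0, as in the output above, is what `os.walk` produces when a folder does not exist: it yields nothing instead of raising an error. The following is a minimal sketch, not part of the original notebook, that reports missing folders and keeps only image files. The helper name `collect_image_files` and the extension set `IMAGE_EXTENSIONS` are assumptions; adjust the extensions to match your data.

```python
import os

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_image_files(folders):
    """Return image filenames found under the given folders (recursive)."""
    found = []
    for folder in folders:
        if not os.path.isdir(folder):
            # os.walk would silently yield nothing for a missing folder
            print(f"Warning: folder not found: {folder}")
            continue
        for _root, _dirs, files in os.walk(folder):
            for name in files:
                # Compare extensions case-insensitively
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    found.append(name)
    return found
```

It could stand in for the collection loop with `local_files = collect_image_files(folders)`.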


## Step 2: Extract File Identifiers

This cell processes the collected filenames to extract the unique identifiers that Roboflow uses.

**What it does:**
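- Splits each filename on underscores and joins the 2nd, 3rd, and 4th fields, dropping anything after the first `.` in the 4th field
- Assumes each local filename has exactly one underscore-delimited field before the identifier, so that fields 2 to 4 form a core identifier such as `GX010152_36_378`
- Does not fit names that begin with the identifier: for a Roboflow export name like `GX010152_36_378-jpg_gamma_jpg.rf.d49b41f3c5a08c7aa8fd8a1779b49804.jpg`, the same fields give `36_378-jpg_gamma`
- Creates a clean list of identifiers for searching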

**Output:** A list of processed filename identifiers ready for Roboflow queries.


In [10]:
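# Assuming local_files contains the list of filenames.
# Join the 2nd, 3rd, and 4th underscore-separated fields and drop the
# extension from the 4th field.
local_files_processed = []
for file in local_files:
    parts = file.split('_')
    if len(parts) < 4:
        # Too few fields to index; report the name instead of raising IndexError
        print(f"Skipping unexpected filename: {file}")
        continue
    local_files_processed.append(f"{parts[1]}_{parts[2]}_{parts[3].split('.')[0]}")

# Print the result to verify
print(local_files_processed)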


[]
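
Positional splitting depends on where the identifier sits in the filename. As an alternative sketch (not the notebook's method), a regular expression can pull out the identifier wherever it appears. This assumes every identifier has the shape of `GX010152_36_378`: `GX` plus digits, followed by two underscore-separated digit groups. The pattern and the helper name `extract_identifiers` are assumptions for illustration.

```python
import re

# Assumed identifier shape: "GX" + digits, then two underscore-separated digit groups
IDENTIFIER_PATTERN = re.compile(r"GX\d+_\d+_\d+")


def extract_identifiers(filenames):
    """Return unique identifiers in first-seen order; skip names with no match."""
    identifiers = []
    for name in filenames:
        match = IDENTIFIER_PATTERN.search(name)
        if match:
            identifiers.append(match.group(0))
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(identifiers))
```

Removing duplicates matters here because the same identifier may appear under more than one camera-angle folder, which would otherwise repeat terms in the search queries.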


## Step 3: Generate Single Search Query

This cell creates one large search query containing all filenames joined with "or" operators.

**What it does:**
- Joins all processed identifiers with " or " separator
- Creates a single long query string like: `filename:GX010152_36_378 or filename:GX010155_18_219 or ...`
- Prints the complete query for copying

**Usage:** This query may be too long for some search interfaces, so consider using the batched version below instead.


In [11]:
# Construct one search query from the identifiers extracted in Step 2
search_query = " or ".join(f"filename:{filename}" for filename in local_files_processed)

# Print the query to use in the UI
print(search_query)
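
Since the single query can grow long, it may help to check its size before pasting it. The sketch below wraps the join in a hypothetical helper, `build_query`, whose `operator` parameter is an assumption added for illustration. The notebook uses both `or` and `OR` without stating whether Roboflow treats them the same, so the parameter makes it easy to try either.

```python
def build_query(identifiers, operator="or"):
    """Join identifiers into one `filename:` search query string."""
    return f" {operator} ".join(f"filename:{ident}" for ident in identifiers)


# Example identifiers for illustration only
query = build_query(["GX010152_36_378", "GX010155_18_219"])
print(len(query), "characters")
print(query)
```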





## Step 4: Generate Batched Search Queries

This cell splits the identifiers into smaller batches and creates multiple search queries that are easier to handle in the Roboflow UI.

**What it does:**
- Divides the identifier list into chunks of 5 (configurable via `batch_size`)
- Creates separate queries for each batch using " OR " separator
- Prints each batch query on a separate line
- Makes it easier to copy/paste manageable chunks into Roboflow

**Recommended:** Use these batched queries instead of the single large query above for better UI compatibility.


In [12]:


# Define the batch size
batch_size = 5  # Adjust this number based on what works in your UI

# Split the identifiers into batches and print one query per batch
for i in range(0, len(local_files_processed), batch_size):
    batch = local_files_processed[i:i + batch_size]
    search_query = " OR ".join(f"filename:{filename}" for filename in batch)
    # Paste each printed query into the Roboflow search box and run the search
    print(search_query)
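
Once the searches have been run, the identifiers that Roboflow returned can be compared with the local list to see what is missing. The following is a minimal sketch that assumes you record the found identifiers by hand in a list (here the hypothetical `found_in_roboflow`). The helper `find_missing` is illustrative and does not call any Roboflow API.

```python
def find_missing(local_identifiers, found_identifiers):
    """Return local identifiers absent from the found ones, in original order."""
    found = set(found_identifiers)
    return [ident for ident in local_identifiers if ident not in found]


# Example values for illustration only
local_example = ["GX010152_36_378", "GX010155_18_219"]
found_in_roboflow = ["GX010152_36_378"]
print(find_missing(local_example, found_in_roboflow))
```

Using a set for the found identifiers keeps each membership check fast even when the local list is long.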
