## 02 - Fruit and Vegetable Disease (Healthy vs Rotten) - Kaggle + Vertex AI Training (AutoML) Example

* Kaggle page:  https://www.kaggle.com/datasets/muhammad0subhan/fruit-and-vegetable-disease-healthy-vs-rotten
* dataset: https://www.kaggle.com/datasets/muhammad0subhan/fruit-and-vegetable-disease-healthy-vs-rotten/data
* notebook: https://www.kaggle.com/code/osamaabobakr/fruit-and-vegetable-disease-healthy-vs-rotten

by: Justin Marciszewski | justinjm@google.com | AI/ML Specialist CE

refs:

* https://cloud.google.com/vertex-ai/docs/training-overview
* https://cloud.google.com/vertex-ai/docs/tutorials/image-classification-automl/overview
* https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl_image_classification_online_prediction.ipynb


## Setup



### Install packages

In [None]:
packages = [
    ('numpy', 'numpy'),
    ('os', 'os-sys'), # os is built-in, this is for demonstration
    ('cv2', 'opencv-python'),
    ('re', 're'), # re is built-in, this is for demonstration
    ('random', 'random'), # random is built-in, this is for demonstration
    ('matplotlib.pyplot', 'matplotlib'),
    ('seaborn', 'seaborn'),
    ('kaggle.api.kaggle_api_extended', 'kaggle'),
    ('sklearn.model_selection', 'scikit-learn'),
    ('sklearn.utils', 'scikit-learn'),
    ('keras', 'keras'),
    ('tensorflow.keras', 'tensorflow'),
    ('tensorflow.keras.layers', 'tensorflow'),
    ('tensorflow.keras.models', 'tensorflow'),
    ('tensorflow.keras.applications', 'tensorflow'),
    ('tensorflow.keras.preprocessing.image', 'tensorflow')
]

import importlib
install = False
for package in packages:
    try:
        importlib.import_module(package[0])
    except ImportError:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user

if install:
    print("Installation of missing packages complete. Please run the next cell to restart the kernel before proceeding.")

### Restart Kernel (If Installs Occurred)
After the kernel restarts, continue from the cell that follows this one.

In [None]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Setup 

### Set constants

In [2]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'demos-vertex-ai'

In [4]:
LOCATION = "us-central1"  
REGION = 'us-central1' 

SERIES = "02-kaggle-vertex-ai"
EXPERIMENT = "02-automl" # notebook number 

BUCKET_NAME = f"{PROJECT_ID}-fruit-and-veg-image-model"

## model training 
DESIRED_LABELS = [
    'Apple__Healthy', 'Apple__Rotten',
    'Banana__Healthy', 'Banana__Rotten',
    'Bellpepper__Healthy', 'Bellpepper__Rotten'
]
NUM_CLASSES = len(DESIRED_LABELS)

### Packages

In [6]:
# Data Ingestion
from datetime import datetime
import os
from pathlib import Path
import subprocess
import time
import json
import re
import random
import tempfile
import threading
import pandas as pd

from google.cloud import storage
from google.cloud.exceptions import NotFound

from kaggle.api.kaggle_api_extended import KaggleApi

# Data pre-processing
from PIL import Image  # For image loading and preprocessing

# Modeling 
from google.cloud import aiplatform
import base64
import tensorflow as tf
import numpy as np

### Parameters

In [7]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
URI = f"gs://{BUCKET_NAME}/{SERIES}/{EXPERIMENT}" 
DIR = f"temp/{EXPERIMENT}"

LOCAL_DATA_DIR = f"{DIR}/data"
LOCAL_CSV_IMAGE_DATA_PATH = f"{LOCAL_DATA_DIR}/labels.csv"

DATASET_CSV = f"{URI}/{TIMESTAMP}/labels.csv"

DATASET_DISPLAY_NAME = f"{SERIES}-{TIMESTAMP}"

### Experiment Tracking 

In [8]:
FRAMEWORK = 'tf'
TASK = 'classification'
MODEL_TYPE = 'tl'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'
RUN_NAME = f'run-{TIMESTAMP}'

### Create local directories for staging files

* data files from creating labels.csv
* build files for creating custom container and running a custom job 
* model training output files and example input images for local inference

In [9]:
! rm -rf $LOCAL_DATA_DIR
! mkdir -p $LOCAL_DATA_DIR

In [10]:
if not os.path.exists(f"{DIR}/build"):
    os.makedirs(f"{DIR}/build")

In [11]:
if not os.path.exists(f"{DIR}/output"):
    os.makedirs(f"{DIR}/output")

## Clients 

In [12]:
#  Google Cloud Storage client
storage_client = storage.Client(project=PROJECT_ID)
aiplatform.init(project=PROJECT_ID, location=REGION)

## Create Storage Bucket

In [13]:
def check_and_create_bucket(bucket_name, location):
    try:
        storage_client.get_bucket(bucket_name)
        print(f"Bucket {bucket_name} already exists.")
    except NotFound:
        bucket = storage_client.create_bucket(bucket_or_name=bucket_name, location=location)
        print(f"Bucket {bucket_name} created.")

In [14]:
check_and_create_bucket(BUCKET_NAME, LOCATION)

Bucket demos-vertex-ai-fruit-and-veg-image-model created.


## Get Data from Kaggle

### Setup Kaggle credentials

You will need a Kaggle account and a `kaggle.json` credentials file in the directory `/home/jupyter/.config/kaggle`.

Steps:

* manually download your credential file from kaggle.com -> Profile
* run this command in a terminal to move it to the correct location: `mv kaggle.json .config/kaggle/kaggle.json`
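
Before authenticating, it can help to confirm that the credentials file is usable. The sketch below assumes the usual `kaggle.json` layout with `username` and `key` fields; the helper name `check_kaggle_credentials` is hypothetical and not part of the Kaggle client.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none).

    Hypothetical helper for illustration; not part of the Kaggle client.
    """
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]

    try:
        creds = json.loads(path.read_text())
    except json.JSONDecodeError as err:
        return [f"{path} is not valid JSON: {err}"]
    if not isinstance(creds, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    # Assumption: the file holds the two fields issued with a Kaggle API token.
    for field in ("username", "key"):
        if not creds.get(field):
            problems.append(f"missing or empty field: {field}")

    # The token is a secret, so flag files that other users can read.
    if path.stat().st_mode & (stat.S_IRGRP | stat.S_IROTH):
        problems.append("file is readable by other users; consider chmod 600")

    return problems
```

For the location used in this notebook, the call would be `check_kaggle_credentials(Path.home() / ".config/kaggle/kaggle.json")`; an empty list means no problems were detected.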


### Download images 

In [15]:
# Set up Kaggle credentials 
os.environ['KAGGLE_USERNAME'] = 'YOUR_KAGGLE_USERNAME' 
os.environ['KAGGLE_KEY'] = 'YOUR_KAGGLE_API_KEY'

# Initialize the Kaggle API
api = KaggleApi()
api.authenticate()

# Specify the dataset you want to download
dataset_slug = 'muhammad0subhan/fruit-and-vegetable-disease-healthy-vs-rotten'

# Download the dataset
api.dataset_download_files(dataset_slug, path=LOCAL_DATA_DIR, unzip=True)

Dataset URL: https://www.kaggle.com/datasets/muhammad0subhan/fruit-and-vegetable-disease-healthy-vs-rotten


### Convert images

In [16]:
from concurrent.futures import ThreadPoolExecutor  # used by process_directory below


def convert_image_to_rgb_and_jpeg(image_path):
    """Converts and saves an image to RGB JPEG format, overwriting the original."""
    try:
        img = Image.open(image_path)

        if img.mode != 'RGB':
            img = img.convert('RGB')

        img.save(image_path, format='JPEG')  # Overwrite the original
        # print(f'Converted and saved: {image_path}')

    except Exception as e:
        print(f'Error processing {image_path}: {e}')

def process_directory(root_dir, subdirs_to_convert, max_workers=None):
    """Processes images within specified subdirectories using multithreading."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for root, dirs, files in os.walk(root_dir):
            # Filter directories based on the provided list
            dirs[:] = [d for d in dirs if d in subdirs_to_convert]

            for file in files:
                if file.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):  # Add more extensions if needed
                    image_path = Path(root) / file
                    executor.submit(convert_image_to_rgb_and_jpeg, image_path)
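
`process_directory` relies on a detail of `os.walk`: assigning to `dirs[:]` in place prunes the traversal, so subdirectories that are not listed are never entered. A minimal sketch of that behavior on a throwaway directory tree (the folder and file names are made up for illustration):

```python
import os
import tempfile
from pathlib import Path


def walk_selected(root_dir, keep):
    """Return files under root_dir, descending only into directories named in keep."""
    found = []
    for root, dirs, files in os.walk(root_dir):
        # In-place assignment is what prunes the walk; rebinding `dirs` would not.
        dirs[:] = [d for d in dirs if d in keep]
        for name in files:
            found.append(str(Path(root).relative_to(root_dir) / name))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for label in ("Apple__Healthy", "Apple__Rotten", "Mango__Healthy"):
        (Path(tmp) / label).mkdir()
        (Path(tmp) / label / "img.jpg").touch()

    selected = walk_selected(tmp, keep={"Apple__Healthy", "Apple__Rotten"})
    print(selected)
```

The filter is applied at every depth, so a nested folder inside a label directory would also be skipped unless its name appears in `keep`.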

In [17]:
root_directory = f"{LOCAL_DATA_DIR}/Fruit And Vegetable Diseases Dataset"
subdirectories_to_convert = DESIRED_LABELS

process_directory(root_directory, subdirectories_to_convert)

## Load to GCS

Load only a subset of images (set by the `DESIRED_LABELS` list) for demonstration purposes. To include all the images in the Kaggle dataset, add the remaining labels to `DESIRED_LABELS`.

In [18]:
# Loop over each subdirectory (label) and copy the contents using gsutil
for subdir in DESIRED_LABELS:
    source = f'"{LOCAL_DATA_DIR}/Fruit And Vegetable Diseases Dataset/{subdir}/*"'
    destination = f"{URI}/data/{subdir}/"
    print(destination)
    command = f"gsutil -m cp -r {source} {destination} > /dev/null 2>&1"
    
    # Execute the command using subprocess
    process = subprocess.run(command, shell=True)
    
    if process.returncode == 0:
        print(f"Successfully copied {subdir}")
    else:
        print(f"Failed to copy {subdir}")

gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/
Successfully copied Apple__Healthy
gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Rotten/
Successfully copied Apple__Rotten
gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Banana__Healthy/
Successfully copied Banana__Healthy
gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Banana__Rotten/
Successfully copied Banana__Rotten
gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Bellpepper__Healthy/
Successfully copied Bellpepper__Healthy
gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Bellpepper__Rotten/
Successfully copied Bellpepper__Rotten


## Prepare data 

refs:

* https://cloud.google.com/vertex-ai/docs/image-data/classification/prepare-data 

### Create csv labels file and upload for use in model training

Create a CSV file called `labels.csv` in which each row has the form `gs://bucket/path/filename.jpg,label`.

The file must not contain a header row and must be stored in GCS.
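
As a concrete illustration of that layout, the sketch below writes two rows with Python's `csv` module and reads them back. The bucket and file names are placeholders, not paths from this notebook.

```python
import csv
import io

rows = [
    ("gs://my-bucket/data/Apple__Healthy/apple (1).jpg", "Apple__Healthy"),
    ("gs://my-bucket/data/Banana__Rotten/banana (7).jpg", "Banana__Rotten"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)  # only data rows are written, no header
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the (uri, label) pairs unchanged.
parsed = [tuple(row) for row in csv.reader(io.StringIO(csv_text))]
```

Using the `csv` module rather than string concatenation means a file name that happens to contain a comma is quoted automatically.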

In [19]:
def get_file_list(bucket_name):
    # get list of all files from bucket
    bucket = storage_client.bucket(bucket_name)
    blobs = bucket.list_blobs()
    file_list = ['gs://' + bucket_name + '/' + blob.name for blob in blobs]
    
    return file_list

In [20]:
file_list = get_file_list(BUCKET_NAME)
file_list[:10]

['gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (1).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (10).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (100).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (101).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (102).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (103).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (104).jpg',
 'gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (105).jpg',
 'gs://demos-vertex-ai-frui

In [21]:
def create_dataframe(file_list, filter_pattern):
    # Keep only filenames with a .jpg extension
    image_files = [file for file in file_list if file.endswith('.jpg')]
    df = pd.DataFrame(image_files, columns=['filename'])
    
    # Keep only the labels selected in the constants above (for demonstration purposes)
    df = df[df['filename'].str.contains(filter_pattern, regex=True)]
    
    # Extract the label from the GCS path. With the layout
    # gs://<bucket>/<series>/<experiment>/data/<label>/<file>, the label is element 6 of the '/'-split.
    df['label'] = df['filename'].apply(lambda x: x.split('/')[6])
    
    return df
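
The fixed index in `create_dataframe` works only while the label folder stays at the same depth below the bucket. A less position-dependent sketch, assuming the label is always the name of the folder that directly contains the image (the helper name `label_from_uri` is hypothetical):

```python
from pathlib import PurePosixPath


def label_from_uri(gcs_uri):
    """Return the name of the folder that directly contains the object."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {gcs_uri}")
    # Strip the scheme so PurePosixPath sees bucket/dir/.../file.
    return PurePosixPath(gcs_uri[len("gs://"):]).parent.name


uri = ("gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/"
       "02-automl/data/Apple__Healthy/FreshApple (1).jpg")
print(label_from_uri(uri))
```

In the DataFrame this could be applied as `df['filename'].map(label_from_uri)`.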

In [22]:
pd.options.display.max_colwidth = 100 # set option to view long strings 

df_labels = create_dataframe(file_list, 
                             filter_pattern = '|'.join(DESIRED_LABELS))
df_labels.head()

| | filename | label |
|---|---|---|
| 0 | gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy... | Apple__Healthy |
| 1 | gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy... | Apple__Healthy |
| 2 | gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy... | Apple__Healthy |
| 3 | gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy... | Apple__Healthy |
| 4 | gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy... | Apple__Healthy |


In [23]:
df_labels.shape[0]

3458

In [24]:
df_labels['label'].value_counts()

label
Banana__Healthy        796
Bellpepper__Healthy    603
Bellpepper__Rotten     591
Apple__Rotten          579
Banana__Rotten         570
Apple__Healthy         319
Name: count, dtype: int64

### Save labels.csv

Save `labels.csv` locally and upload it to the GCS bucket, where the Vertex AI dataset import in the next step reads it.

In [25]:
df_labels.to_csv(LOCAL_CSV_IMAGE_DATA_PATH, index=False, header=False)

In [26]:
bucket = storage_client.bucket(BUCKET_NAME)
blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/{TIMESTAMP}/labels.csv")
blob.upload_from_filename(LOCAL_CSV_IMAGE_DATA_PATH)

## Create Vertex AI Dataset

Create a managed Vertex AI dataset. 

refs:

* https://cloud.google.com/vertex-ai/docs/image-data/classification/create-dataset#aiplatform_create_dataset_image_sample-python_vertex_ai_sdk

In [31]:
dataset = aiplatform.ImageDataset.create(
        display_name=f"{SERIES}_{EXPERIMENT}_{TIMESTAMP}",
        gcs_source=[DATASET_CSV],
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification, 
        sync=True,
    )

Creating ImageDataset
Create ImageDataset backing LRO: projects/746038361521/locations/us-central1/datasets/7361767459390488576/operations/1075536948531036160
ImageDataset created. Resource name: projects/746038361521/locations/us-central1/datasets/7361767459390488576
To use this ImageDataset in another session:
ds = aiplatform.ImageDataset('projects/746038361521/locations/us-central1/datasets/7361767459390488576')
Importing ImageDataset data: projects/746038361521/locations/us-central1/datasets/7361767459390488576
Import ImageDataset data backing LRO: projects/746038361521/locations/us-central1/datasets/7361767459390488576/operations/994472155238367232
ImageDataset data imported. Resource name: projects/746038361521/locations/us-central1/datasets/7361767459390488576


## Model Training

Submit the AutoML training job to Vertex AI

refs:

* https://cloud.google.com/vertex-ai/docs/image-data/classification/train-model#aiplatform_create_training_pipeline_image_classification_sample-python_vertex_ai_sdk
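
Two of the arguments passed to `job.run` in this section are easy to misread: the three split fractions are meant to add up to 1, and the budget is expressed in milli node hours, where a value of 1,000 means one node hour. A small sanity check of the values used in this notebook (the variable names are illustrative):

```python
import math

splits = {"training": 0.4, "validation": 0.3, "test": 0.3}
budget_milli_node_hours = 8000

# Floating-point sums can be off by a rounding error, so compare with a tolerance.
splits_ok = math.isclose(sum(splits.values()), 1.0)

# 1,000 milli node hours correspond to 1 node hour.
budget_node_hours = budget_milli_node_hours / 1000

print(f"splits sum to 1: {splits_ok}, budget: {budget_node_hours} node hours")
```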



In [32]:
job = aiplatform.AutoMLImageTrainingJob(
    display_name=f"{SERIES}_{EXPERIMENT}_{TIMESTAMP}",
    model_type="CLOUD",
    prediction_type="classification",
    multi_label=False,
)

In [1]:
## manual set here if needed 
# dataset = aiplatform.ImageDataset(dataset_id)

In [None]:
model = job.run(
    dataset=dataset,
    model_display_name=f"{SERIES}_{EXPERIMENT}_{TIMESTAMP}",
    training_fraction_split=0.4,
    validation_fraction_split=0.3,
    test_fraction_split=0.3,
    budget_milli_node_hours=8000,
    disable_early_stopping=False,
    sync=True)

View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/7325155474532204544?project=746038361521
AutoMLImageTrainingJob projects/746038361521/locations/us-central1/trainingPipelines/7325155474532204544 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLImageTrainingJob projects/746038361521/locations/us-central1/trainingPipelines/7325155474532204544 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLImageTrainingJob projects/746038361521/locations/us-central1/trainingPipelines/7325155474532204544 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLImageTrainingJob projects/746038361521/locations/us-central1/trainingPipelines/7325155474532204544 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLImageTrainingJob projects/746038361521/locations/us-central1/trainingPipelines/7325155474532204544 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLImageTrainingJob projects/746038361521/locations/us-central1/trainingPipeline

In [None]:
model.wait()

## Evaluate Model


refs:

* https://cloud.google.com/vertex-ai/docs/image-data/classification/evaluate-model

In [35]:
# # get model resource ID
# models = aiplatform.Model.list(filter=f"display_name={SERIES}_{EXPERIMENT}_{TIMESTAMP}")

model_evaluations = model.list_model_evaluations()

for model_evaluation in model_evaluations:
    print(model_evaluation.to_dict())

{'name': 'projects/746038361521/locations/us-central1/models/3254898565356453888@1/evaluations/1553650045042032640', 'metricsSchemaUri': 'gs://google-cloud-aiplatform/schema/modelevaluation/classification_metrics_1.0.0.yaml', 'metrics': {'logLoss': 0.031703793, 'confusionMatrix': {'rows': [[170.0, 0.0, 0.0, 0.0, 1.0, 0.0], [0.0, 92.0, 0.0, 0.0, 0.0, 4.0], [1.0, 0.0, 153.0, 5.0, 2.0, 3.0], [0.0, 0.0, 3.0, 175.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 238.0, 0.0], [0.0, 0.0, 2.0, 0.0, 1.0, 171.0]], 'annotationSpecs': [{'id': '1047400549055463424', 'displayName': 'Banana__Rotten'}, {'id': '3353243558269157376', 'displayName': 'Apple__Healthy'}, {'id': '3929704310572580864', 'displayName': 'Bellpepper__Rotten'}, {'id': '5659086567482851328', 'displayName': 'Bellpepper__Healthy'}, {'displayName': 'Banana__Healthy', 'id': '6235547319786274816'}, {'displayName': 'Apple__Rotten', 'id': '8541390328999968768'}]}, 'confidenceMetrics': [{'confidenceThreshold': 0.0, 'maxPredictions': 0.0, 'precision': 0.1
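
The confusion matrix in the output above is enough to derive overall accuracy and per-class recall by hand. The sketch below copies the `rows` and `annotationSpecs` values from that output. It assumes each row corresponds to a ground-truth label and each column to a predicted label; that orientation is an assumption worth checking against the metrics schema referenced in `metricsSchemaUri`.

```python
labels = [
    "Banana__Rotten", "Apple__Healthy", "Bellpepper__Rotten",
    "Bellpepper__Healthy", "Banana__Healthy", "Apple__Rotten",
]
rows = [
    [170, 0, 0, 0, 1, 0],
    [0, 92, 0, 0, 0, 4],
    [1, 0, 153, 5, 2, 3],
    [0, 0, 3, 175, 0, 0],
    [1, 0, 0, 0, 238, 0],
    [0, 0, 2, 0, 1, 171],
]

total = sum(sum(row) for row in rows)
correct = sum(rows[i][i] for i in range(len(rows)))
accuracy = correct / total

# Assumption: row i holds the images whose true label is labels[i].
recall = {label: rows[i][i] / sum(rows[i]) for i, label in enumerate(labels)}

print(f"accuracy: {accuracy:.4f} ({correct}/{total})")
for label, value in recall.items():
    print(f"{label:<20} recall: {value:.4f}")
```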

## Get Predictions 

refs:

* https://cloud.google.com/vertex-ai/docs/image-data/classification/get-predictions



### Deploy model

In [38]:
endpoint = model.deploy()

Creating Endpoint
Create Endpoint backing LRO: projects/746038361521/locations/us-central1/endpoints/4047822370843394048/operations/1056818862579777536
Endpoint created. Resource name: projects/746038361521/locations/us-central1/endpoints/4047822370843394048
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/746038361521/locations/us-central1/endpoints/4047822370843394048')
Deploying model to Endpoint : projects/746038361521/locations/us-central1/endpoints/4047822370843394048
Deploy Endpoint model backing LRO: projects/746038361521/locations/us-central1/endpoints/4047822370843394048/operations/1897866092991217664
Endpoint model deployed. Resource name: projects/746038361521/locations/us-central1/endpoints/4047822370843394048


### Make prediction

In [37]:
test_item = !gsutil cat $DATASET_CSV | head -n1
if len(str(test_item[0]).split(",")) == 3:
    _, test_item, test_label = str(test_item[0]).split(",")
else:
    test_item, test_label = str(test_item[0]).split(",")

print(test_item, test_label)

gs://demos-vertex-ai-fruit-and-veg-image-model/02-kaggle-vertex-ai/02-automl/data/Apple__Healthy/FreshApple (1).jpg Apple__Healthy


In [39]:
import base64

import tensorflow as tf

with tf.io.gfile.GFile(test_item, "rb") as f:
    content = f.read()

# The format of each instance should conform to the deployed model's prediction input schema.
instances = [{"content": base64.b64encode(content).decode("utf-8")}]

prediction = endpoint.predict(instances=instances)

print(prediction)

Prediction(predictions=[{'confidences': [1.08934461e-09, 1.0, 4.14907761e-11, 1.8561274e-11, 3.0299411e-08, 1.9773011e-10], 'ids': ['1047400549055463424', '3353243558269157376', '3929704310572580864', '5659086567482851328', '6235547319786274816', '8541390328999968768'], 'displayNames': ['Banana__Rotten', 'Apple__Healthy', 'Bellpepper__Rotten', 'Bellpepper__Healthy', 'Banana__Healthy', 'Apple__Rotten']}], deployed_model_id='303897317335891968', metadata=None, model_version_id='1', model_resource_name='projects/746038361521/locations/us-central1/models/3254898565356453888', explanations=None)
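
The response pairs each entry in `confidences` with the entry at the same position in `displayNames`, so the predicted class is the name at the index of the highest confidence. A minimal sketch using the values from the output above, copied into a plain dictionary:

```python
prediction_result = {
    "confidences": [1.08934461e-09, 1.0, 4.14907761e-11,
                    1.8561274e-11, 3.0299411e-08, 1.9773011e-10],
    "displayNames": ["Banana__Rotten", "Apple__Healthy", "Bellpepper__Rotten",
                     "Bellpepper__Healthy", "Banana__Healthy", "Apple__Rotten"],
}

confidences = prediction_result["confidences"]
# Index of the largest confidence value.
best = max(range(len(confidences)), key=confidences.__getitem__)
top_label = prediction_result["displayNames"][best]
top_confidence = confidences[best]

print(top_label, top_confidence)
```

With the live response, the same dictionary should be available as `prediction.predictions[0]`.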


## Cleanup - danger zone

In [None]:
# undeploy endpoint only 
endpoint.undeploy_all()

In [None]:
## ! warning - running the code below deletes objects that require 
## long running processes to recreate

# Delete the dataset using the Vertex dataset object
## dataset.delete()

# Delete the endpoint using the Vertex endpoint object
## endpoint.delete()

# Delete the model using the Vertex model object
##model.delete()

# Delete the AutoML training job
## job.delete()

# Delete Cloud Storage objects that were created
## delete_bucket = False  # Set True for deletion
## if delete_bucket:
##     ! gsutil -m rm -r gs://$BUCKET_NAME


## Download Model for local inference

### Helper functions 

Helpers for downloading the model, fetching a sample image, and preprocessing it for prediction.

In [None]:
def download_blobs_with_prefix(bucket_name, prefix, local_directory):
    
    bucket = storage_client.bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=prefix)

    for blob in blobs:
        # Skip "directory" objects
        if blob.name.endswith("/"):
            continue

        # Calculate the relative path within the prefix
        relative_path = blob.name[len(prefix):] 

        # Create the local directory for the relative path
        local_file_directory = os.path.join(local_directory, os.path.dirname(relative_path))
        os.makedirs(local_file_directory, exist_ok=True)

        # Download the blob
        local_file_path = os.path.join(local_directory, relative_path)
        blob.download_to_filename(local_file_path)
        print(f"Blob {blob.name} downloaded to {local_file_path}.")

        
def download_random_jpg(bucket_name, pattern):

    bucket = storage_client.bucket(bucket_name)
    # Get list of blobs (files) with the pattern
    blobs = [blob for blob in bucket.list_blobs() if re.search(pattern, blob.name)]
    
    if not blobs:
        print("No files found with the pattern:", pattern)
        return None
    
    # Choose a random blob
    random_blob = random.choice(blobs)

    # Download the blob
    local_filename = random_blob.name 
    local_directory = os.path.dirname(local_filename)
    os.makedirs(local_directory, exist_ok=True)  # Ensure directory exists
    
    random_blob.download_to_filename(local_filename)
    print(f"Downloaded {local_filename} from bucket {bucket_name}")

    return local_filename


def preprocess_image(image_path, target_size=(224, 224)):
    """Preprocesses an image for model prediction."""
    img = Image.open(image_path).convert('RGB')  # Ensure RGB format
    img = img.resize(target_size)
    img_array = np.array(img, dtype=np.float32) / 255.0  # Normalize & set to float32
    return img_array  # Shape (height, width, 3); no batch dimension is added here

### 4. Prepare an image for prediction

#### download a random image 

Filter to only 3 foods for demonstration purposes 

In [None]:
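# Same set of labels as before. The raw f-string avoids an invalid "\." escape,
# and the pattern only matches .jpg objects stored directly in a label folder.
downloaded_file = download_random_jpg(
    bucket_name=BUCKET_NAME, 
    pattern=rf'/({"|".join(DESIRED_LABELS)})/[^/]+\.jpg$') 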

In [None]:
## display image to sanity check
display(Image.open(downloaded_file))

In [None]:
## and pre-process image for prediction
preprocessed_image = preprocess_image(downloaded_file)
# preprocessed_image

### 5. Make a prediction

#### Get predicted class 

Finally, download `label_map.json` to look up the predicted class name, so that the prediction output is human-readable.

### Cleanup downloaded image

Delete the downloaded image file to keep the local directory clean.

In [None]:
if os.path.exists(downloaded_file):  # Check if the file exists
    os.remove(downloaded_file)
    print(f"Deleted downloaded image file: {downloaded_file}")
else:
    print(f"Downloaded image file not found: {downloaded_file}")