"""

# Interactive Image Classification with Jupyter Notebooks in Red Hat OpenShift AI



**Date**: Nov 14, 2024  

**Authors**: Alex Krikos & Ramakrishna Yekulla  



### Related Topics:



- Developer Tools

- Event-Driven



### Related Products:



- Red Hat OpenShift

- Red Hat OpenShift AI



## Introduction



This tutorial demonstrates how to use Jupyter Notebooks within Red Hat OpenShift AI to interactively classify images of cats and dogs. We leverage TensorFlow and ipywidgets to simulate real-time data streaming and visualization.



## Prerequisites:



- Access to Red Hat Developer Sandbox.

- An active Red Hat OpenShift cluster.

- Basic knowledge of Python programming.

- A GitHub account for accessing code repositories.



---

# JupyterLab Setup and Dataset Preparation

## Step 1: Launch JupyterLab

1. **Navigate to the OpenShift AI Dashboard:**
   - Visit the **OpenShift AI Dashboard** within the OpenShift AI platform.

2. **Access Data Science Projects:**
   - Go to **"Data Science Projects"**.
   - Select your project from the list.

3. **Create a New Workbench:**
   - Navigate to the **"Workbenches"** tab.
   - Click **"Create Workbench"**.
   
   - **Configuration:**
     - **Name**: Set a suitable name for your workbench.
     - **Notebook Image**: Choose **TensorFlow**.
     - **Deployment Size**: Select **"Medium"**.
     - **Cluster Storage**: Allocate **20Gi**.

   - Click **"Create Workbench"** and wait for the status to show **"Running"**.

## Step 2: Obtain and Prepare the Dataset

### Purpose:
- Install the Kaggle library and configure the Kaggle API key for downloading datasets programmatically.

### Why:
- Using the Kaggle API key enables direct access to large datasets for machine learning, enhancing data acquisition efficiency.

### Steps:
### Configure Kaggle API Key:

**Steps to configure:**

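1. **Download your Kaggle API key:** 
   - Access your Kaggle account settings and download the `kaggle.json` file.

2. **Upload the key to your workbench:**
   - Place `kaggle.json` where the next cell expects it (`../../kaggle.json`, relative to the notebook), or change `kaggle_json_path` to match where you put the file.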
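
Before downloading, it can help to confirm that the key file is in place and readable only by you. The following is a minimal sketch, not part of the original notebook; the helper name `check_kaggle_credentials` is hypothetical, and it assumes the file holds the `username` and `key` fields that Kaggle writes into `kaggle.json`.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none).

    Hypothetical helper for illustration; it only inspects the file locally.
    """
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]

    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    # kaggle.json is expected to carry a username and an API key.
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"missing field: {field}")

    # Flag the file if group or other users have any access to it.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    return problems


# Example: check the location the notebook copies the key to.
for problem in check_kaggle_credentials(Path.home() / ".kaggle" / "kaggle.json"):
    print("kaggle.json:", problem)
```

An empty result means the file passed these basic checks; it does not prove that the key itself is still valid on Kaggle's side.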





In [None]:
!pip install kaggle
import os

# Construct the relative path to kaggle.json
kaggle_json_path = os.path.join("../../", "kaggle.json") 

# Create the .kaggle directory in your home directory
os.makedirs(os.path.expanduser("~/.kaggle"), exist_ok=True)

# Copy kaggle.json to the .kaggle directory
!cp "{kaggle_json_path}" ~/.kaggle/

# Set the correct permissions
!chmod 600 ~/.kaggle/kaggle.json

# Download the dataset
!kaggle datasets download -d salader/dogs-vs-cats --force

# Unzip the dataset
!unzip -oq dogs-vs-cats.zip -d ./data


## Step 3: Build and Train the Model

### **Why This Step Matters**:
- **Monitoring Performance**: Observing training logs allows us to assess if the model is effectively learning from the data.
- **Identifying Problems**: We can spot signs of overfitting or underfitting, which are critical for model optimization.
- **Guiding Adjustments**: Insights from logs help in making informed decisions about tweaking model parameters or architecture.

### **Process**:

1. **Data Preparation**:
   - Set up data generators to preprocess images, which includes scaling, augmenting, or normalizing data for better training.

2. **Model Architecture**:
   - Design the CNN with layers like convolutional, pooling, and fully connected layers to learn image features.

3. **Model Training**:
   - Compile the model with an optimizer, loss function, and metrics like accuracy.
   - Train the model using the prepared dataset, adjusting weights through backpropagation to minimize loss.

#### **Dataset Overview**:

- **Output**: `"Found 20000 images belonging to 2 classes."`
  - **Explanation**: The training directory holds 20,000 images in two class subdirectories, one for cats and one for dogs.
  - **Significance**: This confirms that the generator found the data and that there is enough of it to train on. The message does not report per-class counts, so class balance, which matters for model fairness, has to be checked separately.
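
The "Found 20000 images" message gives only a total. A per-class count can be sketched as below; this is an illustrative helper rather than part of the original notebook. The function names and the list of image suffixes are assumptions, and the directory layout (one subdirectory per class under `./data/train`) follows what `flow_from_directory` expects.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Map each class subdirectory of `root` to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def imbalance_ratio(counts):
    """Ratio of the largest class to the smallest (1.0 means balanced)."""
    sizes = [n for n in counts.values() if n > 0]
    return max(sizes) / min(sizes) if sizes else float("nan")


# Example usage in the notebook (requires the dataset to be unzipped first):
#   counts = count_images_per_class("./data/train")
#   print(counts, imbalance_ratio(counts))
```

A ratio close to 1.0 suggests the classes are balanced; a much larger value is a hint to rebalance or reweight before training.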

### **Training Metrics**:

- **Epochs**: The model trains for 5 epochs, giving it repeated chances to adjust its weights.
- **Steps per Epoch**: Each epoch runs 50 steps, and each step processes one batch of 8 images. An epoch therefore covers 400 images, not the full set of 20,000.
- **Accuracy**: The fraction of correct classifications. The value 0.5723 at the first epoch is the model's initial performance, close to the 0.5 that random guessing would achieve on balanced classes.
- **Loss**: A measure of prediction error, where a decrease over time signifies improvement.
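
The relationship between batch size, steps per epoch, and dataset coverage is simple arithmetic. The sketch below works it out for the settings used in this tutorial (20,000 images, `batch_size=8`, `steps_per_epoch=50`); the function names are hypothetical and chosen only for this illustration.

```python
import math


def epoch_coverage(dataset_size, batch_size, steps_per_epoch):
    """Return (images seen in one epoch, fraction of the dataset that is)."""
    images_seen = min(batch_size * steps_per_epoch, dataset_size)
    return images_seen, images_seen / dataset_size


def full_pass_steps(dataset_size, batch_size):
    """Number of steps needed for one epoch to cover every image once."""
    return math.ceil(dataset_size / batch_size)


seen, fraction = epoch_coverage(dataset_size=20000, batch_size=8, steps_per_epoch=50)
print(f"{seen} images per epoch ({fraction:.0%} of the training set)")
print(f"steps for a full pass: {full_pass_steps(20000, 8)}")
```

With these settings each epoch touches 400 images, and a full pass over the data would take 2,500 steps.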

#### **Epoch by Epoch Analysis**:

- **Epoch 1**: Begins with a high loss (0.7998) and an accuracy of 57.23%, marking the baseline performance.
- **Epochs 2-5**: Demonstrate slight progress in accuracy and a trend of reducing loss, with some normal fluctuations.

#### **Key Insights**:

- **Model Learning**: A decreasing loss shows the model is absorbing information, but the improvement rate hints at potential for further optimization.
- **Overfitting/Underfitting**: Neither is directly visible here because the logs report training metrics only. Detecting them requires comparing the training metrics against metrics from held-out validation data.
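
One rough way to reason about overfitting and underfitting is to compare training and validation accuracy. The sketch below assumes a dictionary shaped like the `history` attribute of the object returned by `model.fit` when validation data is supplied (the notebook as written does not pass validation data). The function name, both thresholds, and the example numbers are illustrative assumptions, not measured results.

```python
def diagnose_fit(history, gap_threshold=0.10, low_accuracy=0.70):
    """Give a coarse reading of the final-epoch training/validation accuracy.

    `history` is expected to hold per-epoch lists under the keys
    "accuracy" and "val_accuracy". The thresholds are arbitrary starting
    points for illustration and should be tuned per project.
    """
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]

    # Training far ahead of validation suggests memorization.
    if train_acc - val_acc > gap_threshold:
        return "possible overfitting"
    # Both scores low suggests the model has not learned enough yet.
    if train_acc < low_accuracy and val_acc < low_accuracy:
        return "possible underfitting"
    return "no obvious problem"


# Made-up numbers, for illustration only.
example = {"accuracy": [0.60, 0.95], "val_accuracy": [0.58, 0.70]}
print(diagnose_fit(example))
```

This is a heuristic, not a diagnosis: plotting the full loss and accuracy curves usually tells more than the final epoch alone.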

#### **Strategies for Enhancement**:

- **Model Architecture**: Consider different or additional layers, or altering existing layer parameters.
- **Learning Rate**: Tuning this can lead to better convergence and learning stability.
- **Data Augmentation**: Enhances the dataset's variety, helping the model generalize better from the training data.

#### **Continuous Monitoring**:

- Regularly check these metrics to make timely adjustments in your training strategy.

**This step-by-step analysis of the training logs equips learners to interpret training output, diagnose issues such as overfitting or underfitting, and plan improvements to their models.**


In [None]:
from tensorflow.keras.layers import Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Set up the data generator; a small batch size keeps each training step fast
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
        './data/train',
        target_size=(150, 150),
        batch_size=8,  # Small batch size for quick steps
        class_mode='binary')

# Define the CNN with an explicit Input layer
model = Sequential([
    Input(shape=(150, 150, 3)),  # Explicit input shape definition
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(128, activation='relu'),  # Modest layer width keeps training fast
    Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train briefly (5 epochs of 50 steps each) so the demo finishes quickly
model.fit(train_generator, steps_per_epoch=50, epochs=5)

## Step 4: Interactive Real-Time Data Streaming and Visualization

**Purpose**: Simulate real-time data interaction and demonstrate how AI can be used interactively in a simplified context. We leverage TensorFlow and ipywidgets to simulate real-time data streaming and visualization.

**Process**:
1. **Define Prediction Helpers**: Implement functions that load an image, run the model on it, and display the image with its predicted label, so users can visually inspect predictions on cat or dog images.
2. **Utilize Widgets for Simulation**: Use Jupyter Notebook widgets to simulate real-time data streaming. This approach is particularly useful for educational and demonstration purposes, showing how data can be dynamically processed and visualized in an interactive environment.
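
The model ends in a single sigmoid unit, so each prediction is one number between 0 and 1. Because `flow_from_directory` assigns class indices in alphabetical order of the subdirectory names, `cats` maps to 0 and `dogs` to 1, and the output can be read as the estimated probability of "Dog". The sketch below shows one way to turn that number into a label plus a confidence value; the function name and the returned confidence are additions for illustration and do not appear in the notebook.

```python
def label_from_probability(p_dog, threshold=0.5):
    """Convert a sigmoid output into a (label, confidence) pair.

    p_dog is the model output for one image, read as P(dog).
    Values above the threshold are labeled "Dog", everything else "Cat".
    """
    if not 0.0 <= p_dog <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {p_dog}")
    if p_dog > threshold:
        return "Dog", p_dog
    return "Cat", 1.0 - p_dog


label, confidence = label_from_probability(0.75)
print(f"Prediction: {label} ({confidence:.0%} confident)")
```

Showing the confidence next to the label makes borderline predictions (values near 0.5) easy to spot.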




In [None]:
# Import necessary libraries for handling images
from tensorflow.keras.preprocessing import image
import numpy as np
import matplotlib.pyplot as plt
import os
import random

# Function to load and preprocess images
def load_and_preprocess_image(file_path):
    img = image.load_img(file_path, target_size=(150, 150))
    img_tensor = image.img_to_array(img)                    # Convert image to array
    img_tensor = np.expand_dims(img_tensor, axis=0)         # Adjust shape for model input
    img_tensor /= 255.0                                    # Normalize the image
    return img_tensor

# Function to display predictions
def predict_and_visualize(model, file_path):
    img_tensor = load_and_preprocess_image(file_path)
    prediction = model.predict(img_tensor)
    plt.imshow(img_tensor[0])
    plt.title(f'Prediction: {"Dog" if prediction[0][0] > 0.5 else "Cat"}')
    plt.show()

# Function to select a random image and make a prediction
def predict_random_image(model, directory):
    if os.path.exists(directory):
        random_image = random.choice(os.listdir(directory))
        full_path = os.path.join(directory, random_image)
        predict_and_visualize(model, full_path)
    else:
        print("Directory not found:", directory)
        
# Update these paths according to your actual directory structure
cat_path = './data/test/cats/'
dog_path = './data/test/dogs/'

# Assuming 'model' is your trained model, and 'cat_path' and 'dog_path' are defined
# Example usage
predict_random_image(model, cat_path)


In [None]:
predict_random_image(model, dog_path)


## Step 5: Testing the Dataset with Random Image Prediction

**Purpose**: This step is crucial for evaluating the trained model's performance by testing it on unseen data in a real-world-like scenario. By randomly selecting images and predicting their classes, we can visually assess the model's accuracy and reliability.

**Process**:
1. **Random Image Selection**: The function `predict_random_image` takes a directory path as input and checks if the path exists. It then randomly selects an image from this directory, ensuring that each test is unbiased and represents a realistic use case.
2. **Image Loading and Preprocessing**: The randomly chosen image file is then loaded and preprocessed to match the input format expected by the model. This involves resizing the image and normalizing its pixel values.
3. **Model Prediction**: The preprocessed image tensor is fed into the model to predict whether the image is of a 'Cat' or 'Dog'. This step directly utilizes the neural network to interpret the image data.
4. **Visualization**: The image along with its predicted class is displayed. This visual feedback is crucial for understanding the model's decision-making process and immediately seeing the result of the prediction.
5. **Interactive Testing**: By running the function with different directories (e.g., cats and dogs), users can interactively test how the model performs across varied inputs, making this a dynamic tool for demonstration and educational purposes.

**Why This Step**:
Testing the model with a random selection of images simulates how the model might perform in a production environment where inputs are not predetermined. It helps in identifying potential biases, underfitting, or overfitting issues in the model. Additionally, visual feedback from test predictions is an excellent way to demonstrate the model's capabilities to a non-technical audience, making complex machine learning concepts more accessible and understandable.
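
Looking at one random image at a time gives an impression, not a measurement. A small tally over many images can complement the visual check. The sketch below is an assumption-laden illustration: `tally_predictions` and `class_accuracy` are hypothetical names, and `predict_label` stands for any function that maps an image path to `"Cat"` or `"Dog"`.

```python
from collections import Counter


def tally_predictions(paths, predict_label):
    """Count how often each label is predicted for the given image paths."""
    return Counter(predict_label(path) for path in paths)


def class_accuracy(tally, expected_label):
    """Fraction of predictions in `tally` that match the expected label."""
    total = sum(tally.values())
    return tally[expected_label] / total if total else 0.0


# In the notebook, predict_label could wrap the existing helpers, for example:
#   def predict_label(path):
#       p = model.predict(load_and_preprocess_image(path))[0][0]
#       return "Dog" if p > 0.5 else "Cat"
#
# and the tally for the cat test folder would then be:
#   cat_files = [os.path.join(cat_path, f) for f in os.listdir(cat_path)]
#   print(class_accuracy(tally_predictions(cat_files, predict_label), "Cat"))
```

Running the tally once per class folder shows whether errors are concentrated in one class, which is harder to see from individual samples.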


In [None]:
import os
import random

import matplotlib.pyplot as plt

def predict_random_image(model, directory):
    if os.path.exists(directory):
        random_image = random.choice(os.listdir(directory))
        full_path = os.path.join(directory, random_image)
        img_tensor = load_and_preprocess_image(full_path)
        prediction = model.predict(img_tensor)
        plt.imshow(img_tensor[0])
        plt.title(f'Prediction: {"Dog" if prediction[0][0] > 0.5 else "Cat"}')
        plt.show()

# Example usage
predict_random_image(model, './data/test/cats/')
predict_random_image(model, './data/test/dogs/')


## Step 6: Interactive Real-Time Image Prediction with Widgets

**Purpose**: This step integrates interactive web widgets to provide a user-friendly interface for real-time image prediction, showcasing how TensorFlow and Jupyter Notebook widgets can be used to enhance the interactivity and accessibility of AI applications.

**Process**:
1. **Import Necessary Libraries**:
   - Libraries such as `ipywidgets` for interactive controls, `matplotlib.pyplot` for visualization, `os` for operating system interface, `random` for randomness, `tensorflow.keras.preprocessing.image` for image handling, and `numpy` for numerical operations are essential for the functionality of this code.
2. **Define Image Loading and Preprocessing Function**:
   - `load_and_preprocess_image`: Loads and processes images to match the input specifications of the neural network, including resizing to 150x150 pixels and normalizing pixel values to the range [0,1].
3. **Define Prediction and Visualization Function**:
   - `predict_and_visualize`: Utilizes the trained model to classify the image as either 'Cat' or 'Dog' and visualizes the image alongside its classification, providing immediate visual feedback on the prediction outcome.
4. **Setup Interactive Widgets for User Input and Display**:
   - A button widget labeled "Predict" is used to initiate the prediction of a randomly selected image from either the cats or dogs directory.
   - An output widget displays the results and handles any necessary clearing of previous outputs for clarity.
5. **Implement Random Image Prediction Functionality**:
   - `predict_random_image`: Collects all image paths from specified directories, randomly selects one, and performs prediction and visualization. This simulates a realistic scenario where the model might be used in a production environment to classify new, unseen images.
6. **Widget Interaction Setup**:
   - Link the button to trigger the random image prediction function, allowing users to interactively test the model’s performance on various images with a single click.
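
When collecting image paths for random selection, a directory listing can include files that are not images, and a folder may be missing altogether. The following sketch shows one defensive way to gather candidates; it is not the notebook's implementation. The function names and the suffix list are assumptions made for this example.

```python
import random
from pathlib import Path


def gather_image_paths(base_dir, folders, suffixes=(".jpg", ".jpeg", ".png")):
    """Collect image files from each folder under base_dir.

    Missing folders are skipped, and files whose extension is not in
    `suffixes` are ignored.
    """
    paths = []
    for folder in folders:
        directory = Path(base_dir) / folder
        if not directory.is_dir():
            continue
        paths.extend(
            sorted(
                p
                for p in directory.iterdir()
                if p.is_file() and p.suffix.lower() in suffixes
            )
        )
    return paths


def pick_random_image(paths, rng=random):
    """Return one random path, or None when there is nothing to choose from."""
    return rng.choice(paths) if paths else None


# Example usage with the notebook's layout (returns None if no data is present):
print(pick_random_image(gather_image_paths("./data/test", ["cats", "dogs"])))
```

Returning `None` for an empty selection lets the button callback print a friendly message instead of raising an exception.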

**Why This Step**:
- **Enhancing User Engagement**: Using interactive widgets makes the application more accessible and engaging for users, who can actively participate in the testing process.
- **Demonstrating Model Capabilities**: This setup provides a practical demonstration of the model's capabilities in a dynamic, real-world application, allowing for the assessment of its robustness and accuracy.
- **Educational Tool**: It serves as an excellent educational tool, helping users understand machine learning concepts through direct interaction and immediate feedback.

**Output**:
- The interactive session will display images with their predicted labels in real-time as the user clicks the "Predict" button. This dynamic interaction helps in understanding how well the model performs across a random set of images and provides insights into potential improvements for model training.


In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
import os
import random
from tensorflow.keras.preprocessing import image
import numpy as np

# Assuming 'model' is your trained model
def load_and_preprocess_image(file_path):
    img = image.load_img(file_path, target_size=(150, 150))
    img_tensor = image.img_to_array(img)
    img_tensor = np.expand_dims(img_tensor, axis=0)
    img_tensor /= 255.0
    return img_tensor

def predict_and_visualize(model, file_path):
    img_tensor = load_and_preprocess_image(file_path)
    prediction = model.predict(img_tensor)
    plt.imshow(img_tensor[0])
    plt.title(f'Prediction: {"Dog" if prediction[0][0] > 0.5 else "Cat"}')
    plt.show()

# Path settings
base_path = './data/test/'
animal_types = {'Cats': 'cats/', 'Dogs': 'dogs/'}

# Widgets
predict_button = widgets.Button(description="Predict")
output = widgets.Output()

def predict_random_image(b):
    # Gather all images from both categories
    all_images = []
    for animal, folder in animal_types.items():
        full_path = os.path.join(base_path, folder)
        all_images.extend([os.path.join(full_path, file) for file in os.listdir(full_path)])
    
    if all_images:
        random_image_path = random.choice(all_images)
        with output:
            clear_output(wait=True)
            predict_and_visualize(model, random_image_path)
    else:
        with output:
            clear_output(wait=True)
            print("No images found.")

# Link the button to the random image prediction function
predict_button.on_click(predict_random_image)

# Display widgets
display(predict_button, output)



## Step 7: Call to Action: Addressing Misclassification in Your AI Model

Misclassification in machine learning models can significantly hinder your model's accuracy and reliability. To combat this, it's crucial to **verify dataset balance**, **align preprocessing methods**, and **tweak model parameters**. These steps are essential for ensuring that your model not only learns well but also generalizes well to new, unseen data.

### Why You Should Experiment with Training Adjustments
Before making these changes, ensure you go back to **Step 3: Build and Train the Model** in your workflow. Adjusting the training process, such as the number of epochs and steps per epoch, can provide quicker feedback on model performance, allowing you to iteratively improve your model in a more controlled and informed manner. Here’s what you can do:

1. **Adjust the Number of Epochs to Optimize Training Speed**  
   Changing the number of epochs can help you find the sweet spot where your model learns enough to perform well without overfitting. This is crucial for building a robust model that performs consistently.

2. **Try Different Values for Steps per Epoch**  
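   Modifying `steps_per_epoch` changes how many batches are drawn in one epoch, and therefore how many weight updates happen before the epoch's metrics are reported. With a batch size of 8, 100 steps cover 800 images per epoch; larger values expose the model to more of the dataset in each epoch at the cost of longer training time.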
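
Finding the "sweet spot" for the number of epochs is often framed as early stopping: keep training while validation loss improves, and stop once it has failed to improve for a set number of epochs. The sketch below illustrates that logic in plain Python on a list of per-epoch validation losses. The function name and the example losses are hypothetical; in practice Keras offers the `tf.keras.callbacks.EarlyStopping` callback (with `monitor` and `patience` arguments) for the same purpose, which requires validation data to be passed to `model.fit`.

```python
def best_epoch(val_losses, patience=2):
    """Return (epoch number, loss) of the best epoch under patience-based stopping.

    Scans the losses in order and stops once `patience` epochs have passed
    without a new minimum. Epoch numbers start at 1.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")

    best_index, best_loss = 0, val_losses[0]
    for index, loss in enumerate(val_losses):
        if loss < best_loss:
            best_index, best_loss = index, loss
        elif index - best_index >= patience:
            break  # no improvement for `patience` epochs: stop here
    return best_index + 1, best_loss


# Made-up validation losses, for illustration only.
losses = [0.70, 0.62, 0.58, 0.60, 0.61, 0.55]
print(best_epoch(losses, patience=2))
```

Note that with `patience=2` the scan stops before reaching the lower loss in the sixth epoch, which shows the trade-off: a small patience saves time but can stop before a later improvement.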

### Example Code to Modify Your Training Process
Make these modifications in your Jupyter Notebook or another Python environment as part of **"Step 3: Build and Train the Model"**. Here’s how you might modify the training to see how these changes can impact your model's learning curve and overall performance:

```python
# Adjust the number of epochs and steps per epoch
model.fit(train_generator, steps_per_epoch=100, epochs=10)
