# Waste Classification System - Google Colab Setup

This notebook will guide you through setting up and running the waste classification system in Google Colab.

## 1. Install Required Packages

In [16]:
# Copy the setup correction script
!cp /content/drive/MyDrive/setup_correction.py /content/waste-classification-system/

# Run the setup correction script
%cd /content/waste-classification-system
!python setup_correction.py


/content/waste-classification-system
=== Waste Classification Project Setup Correction ===
This script will create the necessary directory structure and move files to their proper locations.
Current working directory: /content/waste-classification-system
Creating directory structure...
✓ Created directory: src
✓ Created directory: scripts
✓ Created directory: data
✓ Created directory: models
✓ Created directory: output

Moving files to proper locations...
✓ Moved classifier.py to src/classifier.py
✓ Moved data_utils.py to src/data_utils.py
✓ Moved detector.py to src/detector.py
✓ Moved ensemble.py to src/ensemble.py
✓ Moved download_datasets.py to scripts/download_datasets.py
✓ Moved preprocess_datasets.py to scripts/preprocess_datasets.py
✓ Moved train.py to scripts/train.py
✗ File not found: fixed_download_datasets.py
✓ Moved colab_pro_download_datasets.py to scripts/colab_pro_download_datasets.py

Creating __init__.py files...
✓ Created src/__init__.py
✓ Created scripts/__init__.py


In [1]:
# Install required packages
!pip install torch torchvision timm numpy pillow opencv-python matplotlib scikit-learn tqdm requests gradio ultralytics

Collecting gradio
  Downloading gradio-5.21.0-py3-none-any.whl.metadata (16 kB)
Collecting ultralytics
  Downloading ultralytics-8.3.92-py3-none-any.whl.metadata (35 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvi

## 2. Clone the Repository

Make sure you've created your GitHub repository and uploaded the project files as instructed.

In [5]:
# Clone your repository (replace with your actual repository URL)
!git clone https://github.com/hesampars/waste-classification-system.git
%cd waste-classification-system

# Create necessary directories
!mkdir -p data models output

Cloning into 'waste-classification-system'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 23 (delta 2), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (23/23), 23.85 KiB | 23.85 MiB/s, done.
Resolving deltas: 100% (2/2), done.
/content/waste-classification-system


## 3. Mount Google Drive

We'll mount your Google Drive to access the dataset zip files you've manually downloaded.

In [6]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 4. Upload Datasets to Google Drive

Before running the next cell, make sure you've:
1. Created a folder in your Google Drive (e.g., 'waste_datasets')
2. Uploaded your dataset zip files to this folder:
   - MJU-Waste.zip
   - TACO-master.zip
   - trashnet-master.zip
   - waste-pictures.zip
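
Before copying, it can help to confirm that all four archives are actually present in the Drive folder. The following is a minimal sketch, assuming the folder is named `waste_datasets` as in the example above; `missing_zips` is a hypothetical helper and not part of the project code.

```python
from pathlib import Path

# Archive names taken from the list above.
EXPECTED_ZIPS = [
    "MJU-Waste.zip",
    "TACO-master.zip",
    "trashnet-master.zip",
    "waste-pictures.zip",
]

def missing_zips(folder, expected=EXPECTED_ZIPS):
    """Return the expected archive names that are not files in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]

# Assumed Drive location; adjust if your folder is named differently.
missing = missing_zips("/content/drive/MyDrive/waste_datasets")
if missing:
    print("Missing archives:", ", ".join(missing))
else:
    print("All dataset archives found.")
```

If any archive is reported missing, upload it to Drive before running the copy cell.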

In [7]:
# Create a directory for the datasets
!mkdir -p data

# Copy datasets from Google Drive to the project
# Adjust the path if your folder structure is different
!cp /content/drive/MyDrive/waste_datasets/*.zip data/

# List the copied files
!ls -la data/

total 3640656
drwxr-xr-x 2 root root       4096 Mar 17 21:18 .
drwxr-xr-x 6 root root       4096 Mar 17 21:15 ..
-rw------- 1 root root 1446750945 Mar 17 21:17 MJU-Waste.zip
-rw------- 1 root root   38113038 Mar 17 21:18 TACO-master.zip
-rw------- 1 root root   42609527 Mar 17 21:18 trashnet-master.zip
-rw------- 1 root root 2200536115 Mar 17 21:18 waste-pictures.zip


## 5. Extract and Process Datasets

Now we'll extract the datasets and prepare them for training.
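
As an illustration of the extraction step on its own, here is a minimal sketch using the standard-library `zipfile` module. `extract_archive` is a hypothetical helper, and the target folder names are derived from the archive names, which is an assumption: the project's own scripts use different folder names (for example `data/trashnet`), so treat this as a sketch rather than a replacement for them.

```python
import zipfile
from pathlib import Path

def extract_archive(zip_path, dest_root):
    """Extract `zip_path` into `dest_root/<archive stem>` and return that folder."""
    zip_path = Path(zip_path)
    target = Path(dest_root) / zip_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target

# Extract every archive found in data/ (does nothing if the folder is absent).
data_dir = Path("data")
if data_dir.is_dir():
    for zip_path in sorted(data_dir.glob("*.zip")):
        print(f"Extracted {zip_path.name} to {extract_archive(zip_path, data_dir)}")
```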

In [14]:
!cp /content/drive/MyDrive/colab_pro_download_datasets.py /content/waste-classification-system/


In [55]:
# Fix import paths in improved_download_datasets.py
with open('/content/waste-classification-system/scripts/improved_download_datasets.py', 'r') as file:
    content = file.read()

# Replace the import statement
content = content.replace("import improved_config as config",
                         "import sys\nsys.path.append('/content/waste-classification-system')\nimport improved_config as config")

with open('/content/waste-classification-system/scripts/improved_download_datasets.py', 'w') as file:
    file.write(content)

# Fix import paths in improved_preprocess_datasets.py
with open('/content/waste-classification-system/scripts/improved_preprocess_datasets.py', 'r') as file:
    content = file.read()

# Replace the import statement
content = content.replace("import improved_config as config",
                         "import sys\nsys.path.append('/content/waste-classification-system')\nimport improved_config as config")

with open('/content/waste-classification-system/scripts/improved_preprocess_datasets.py', 'w') as file:
    file.write(content)

print("Import paths fixed successfully!")


Import paths fixed successfully!
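
Note that the `str.replace` call in the cell above is not idempotent: the replacement text still contains `import improved_config as config`, so each re-run of the cell prepends another copy of the `sys.path` lines. A minimal sketch of a guarded version follows; `patch_import` is a hypothetical helper, and the strings are the ones used in the cell above.

```python
from pathlib import Path

OLD = "import improved_config as config"
NEW = (
    "import sys\n"
    "sys.path.append('/content/waste-classification-system')\n"
    + OLD
)

def patch_import(path, old=OLD, new=NEW):
    """Insert the sys.path lines once; return True only if the file changed."""
    path = Path(path)
    text = path.read_text()
    # Skip files that are already patched or do not contain the import at all.
    if new in text or old not in text:
        return False
    path.write_text(text.replace(old, new, 1))
    return True
```

Calling `patch_import` on each script path returns `False` on later runs, so the cell can be re-executed safely.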


In [56]:
# Download datasets using improved script
!python /content/waste-classification-system/scripts/improved_download_datasets.py --gdrive /content/drive --colab-pro


Using trashnet zip from Google Drive: /content/drive/MyDrive/waste_datasets/trashnet-master.zip
Using taco zip from Google Drive: /content/drive/MyDrive/waste_datasets/TACO-master.zip
Using waste_pictures zip from Google Drive: /content/drive/MyDrive/waste_datasets/waste-pictures.zip
Using mju_waste zip from Google Drive: /content/drive/MyDrive/waste_datasets/MJU-Waste.zip
🔄 Updated paths for Google Drive mounted at /content/drive

📥 Downloading trashnet dataset
📁 Using manually downloaded TrashNet zip: /content/drive/MyDrive/waste_datasets/trashnet-master.zip
📦 Extracting trashnet-master.zip to /content/waste-classification-system/data/trashnet...
❌ Error: Main directory not found at /content/waste-classification-system/data/trashnet/trashnet-master/dataset
🔍 Searching for possible main directory...
💡 Found possible main directory at: /content/waste-classification-system/data/trashnet/trashnet-master/data/dataset/dataset-resized

📥 Downloading taco dataset
📁 Using manually downloaded 

In [20]:
# Copy the A100-optimized scripts
!cp /content/drive/MyDrive/a100_optimized_download_datasets.py /content/waste-classification-system/

!python /content/waste-classification-system/a100_optimized_download_datasets.py --colab-pro --optimize-a100

A100 GPU detected! Applying optimizations...

A100 GPU Optimization Settings:
- batch_size: 64
- image_size: 384
- mixed_precision: True
- learning_rate: 0.0005
- optimizer: AdamW
- weight_decay: 0.01
- gradient_accumulation_steps: 2
- Using enhanced model variants for A100 GPU
Using manually downloaded TrashNet zip: data/trashnet-master.zip
Using manually downloaded TACO zip: data/TACO-master.zip
Using manually downloaded Waste-Pictures zip: data/waste-pictures.zip
Using manually downloaded MJU-Waste zip: data/MJU-Waste.zip
A100 GPU detected! Optimizing for maximum performance...
Downloading Open Images for class: Bottle, type: train

In [10]:
# Download Open Images dataset (this may take some time)
# Uncomment if you want to download Open Images
# !python scripts/download_datasets.py --datasets open-images

In [45]:
!python /content/waste-classification-system/scripts/preprocess_datasets.py


Processing TrashNet dataset from: /content/waste-classification-system/data/trashnet/trashnet-master/data/dataset/dataset-resized
Processed 2527 images from TrashNet
Processing TACO dataset from: /content/waste-classification-system/data/taco/TACO-master/data
Processed 0 images from TACO
Processing MJU-Waste dataset from: /content/waste-classification-system/data/mju-waste/simplified
Processed 0 images from MJU-Waste
Processing Waste-Pictures dataset from: /content/waste-classification-system/data/waste-pictures/train
Processed 17872 images from Waste-Pictures
Processing Open Images dataset from: /content/waste-classification-system/data/open-images
Processed 4 images from Open Images
Total processed images: 20403
Created dataset splits: 14279 train, 3058 val, 3066 test
Saved splits to: /content/waste-classification-system/data/processed/splits.json
Dataset preprocessing completed successfully

Class distribution in training set:
  trash: 12021 images
  e-waste: 587 images
  paper: 415
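
The `trash` count is far larger than the other classes shown. One possible contributor is the `class_mapping.get(class_name, "trash")` fallback in `data_utils.py`, which assigns any folder name missing from the mapping to `trash`. The following is a minimal sketch for checking this, assuming `splits.json` contains entries with `source`, `original_class` and `class` keys as written by `create_dataset_splits`; `fallback_candidates` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path

def fallback_candidates(splits, fallback="trash"):
    """Count (source, original_class) pairs that ended up in the fallback
    class although their original name differs from it."""
    counts = Counter()
    for items in splits.values():
        for item in items:
            if item["class"] == fallback and item["original_class"] != fallback:
                counts[(item["source"], item["original_class"])] += 1
    return counts

# Path reported by the preprocessing output above.
splits_file = Path("/content/waste-classification-system/data/processed/splits.json")
if splits_file.is_file():
    splits = json.loads(splits_file.read_text())
    for (source, name), n in fallback_candidates(splits).most_common(20):
        print(f"{source}: {name!r} -> trash ({n} images)")
```

Names that are mapped to `trash` on purpose (such as TACO's `Cigarette`) also appear in this list, so the output is a starting point for review, not a list of errors.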

In [32]:
# This code will update both data_utils.py files with the fixed version

# Define the fixed code
fixed_code = """#!/usr/bin/env python3
\"\"\"
Data utilities module for waste classification system.
\"\"\"

import os
import sys
import json
import random
import shutil
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

# Import local modules
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import config

class WasteDatasetPreprocessor:
    \"\"\"Preprocessor for waste classification datasets.\"\"\"

    def __init__(self, data_dir):
        \"\"\"
        Initialize the dataset preprocessor.

        Args:
            data_dir: Directory containing the datasets
        \"\"\"
        self.data_dir = data_dir
        self.output_dir = os.path.join(data_dir, "processed")
        os.makedirs(self.output_dir, exist_ok=True)

        # Class mapping for standardization
        self.class_mapping = {
            # TrashNet mapping
            "glass": "glass",
            "paper": "paper",
            "cardboard": "cardboard",
            "plastic": "plastic",
            "metal": "metal",
            "trash": "trash",

            # TACO mapping
            "Plastic bottle": "plastic",
            "Bottle cap": "plastic",
            "Plastic bag & wrapper": "plastic",
            "Carton": "cardboard",
            "Paper": "paper",
            "Aluminium foil": "metal",
            "Metal can": "metal",
            "Glass bottle": "glass",
            "Plastic container": "plastic",
            "Plastic utensils": "plastic",
            "Pop tab": "metal",
            "Straw": "plastic",
            "Paper cup": "paper",
            "Plastic cup": "plastic",
            "Plastic lid": "plastic",
            "Cigarette": "trash",
            "Other plastic": "plastic",
            "Other metal": "metal",
            "Other glass": "glass",
            "Other paper": "paper",
            "Unlabeled litter": "trash",

            # MJU-Waste mapping
            "battery": "e-waste",
            "biological": "organic",
            "brown-glass": "glass",
            "cardboard": "cardboard",
            "clothes": "textile",
            "green-glass": "glass",
            "metal": "metal",
            "paper": "paper",
            "plastic": "plastic",
            "shoes": "textile",
            "trash": "trash",
            "white-glass": "glass",

            # Waste-Pictures mapping
            "battery": "e-waste",
            "biological": "organic",
            "clothes": "textile",
            "e-waste": "e-waste",
            "glass": "glass",
            "metal": "metal",
            "paper": "paper",
            "plastic": "plastic",
            "textile": "textile",
            "trash": "trash",
            "mixed": "mixed",

            # Open Images mapping
            "Bottle": "plastic",
            "Tin can": "metal",
            "Plastic bag": "plastic",
            "Cardboard": "cardboard",
            "Paper": "paper",
            "Glass": "glass",
            "Mobile phone": "e-waste",
            "Computer": "e-waste",
            "Food": "organic",
            "Clothing": "textile"
        }

    def preprocess_trashnet(self):
        \"\"\"
        Preprocess TrashNet dataset.

        Returns:
            List of processed image metadata
        \"\"\"
        dataset_dir = os.path.join(self.data_dir, "trashnet")
        if not os.path.exists(dataset_dir):
            print(f"TrashNet dataset not found at: {dataset_dir}")
            return []

        # Find the dataset folder (might be nested)
        data_folder = None
        for root, dirs, files in os.walk(dataset_dir):
            if "glass" in dirs and "paper" in dirs and "cardboard" in dirs:
                data_folder = root
                break

        if not data_folder:
            print("Could not find TrashNet data folder structure")
            return []

        print(f"Processing TrashNet dataset from: {data_folder}")

        metadata = []

        # Process each class folder
        for class_name in os.listdir(data_folder):
            class_dir = os.path.join(data_folder, class_name)

            if not os.path.isdir(class_dir):
                continue

            # Map class name
            mapped_class = self.class_mapping.get(class_name, "trash")

            # Process images in this class
            for filename in os.listdir(class_dir):
                if not filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                    continue

                src_path = os.path.join(class_dir, filename)
                dst_path = os.path.join(self.output_dir, f"trashnet_{filename}")

                # Copy file to output directory
                shutil.copy(src_path, dst_path)

                # Add metadata
                metadata.append({
                    "file": os.path.basename(dst_path),
                    "source": "trashnet",
                    "original_class": class_name,
                    "class": mapped_class
                })

        print(f"Processed {len(metadata)} images from TrashNet")
        return metadata

    def preprocess_taco(self):
        \"\"\"
        Preprocess TACO dataset.

        Returns:
            List of processed image metadata
        \"\"\"
        dataset_dir = os.path.join(self.data_dir, "taco")
        if not os.path.exists(dataset_dir):
            print(f"TACO dataset not found at: {dataset_dir}")
            return []

        # Find the annotations file
        annotations_file = None
        for root, dirs, files in os.walk(dataset_dir):
            if "annotations.json" in files:
                annotations_file = os.path.join(root, "annotations.json")
                break

        if not annotations_file:
            print("Could not find TACO annotations.json file")
            return []

        # Load annotations
        try:
            with open(annotations_file, 'r') as f:
                annotations = json.load(f)
        except Exception as e:
            print(f"Error loading TACO annotations: {str(e)}")
            return []

        print(f"Processing TACO dataset from: {os.path.dirname(annotations_file)}")

        metadata = []

        # Get image directory
        image_dir = os.path.join(os.path.dirname(annotations_file), "data")
        if not os.path.exists(image_dir):
            image_dir = os.path.dirname(annotations_file)

        # Process images
        for image_info in annotations["images"]:
            image_id = image_info["id"]
            filename = image_info["file_name"]

            # Find annotations for this image
            image_annotations = [a for a in annotations["annotations"] if a["image_id"] == image_id]

            if not image_annotations:
                continue

            # Get most common category
            category_counts = {}
            for ann in image_annotations:
                category_id = ann["category_id"]
                category_info = next((c for c in annotations["categories"] if c["id"] == category_id), None)

                if category_info:
                    category_name = category_info["name"]
                    category_counts[category_name] = category_counts.get(category_name, 0) + 1

            if not category_counts:
                continue

            # Get most common category
            original_class = max(category_counts.items(), key=lambda x: x[1])[0]

            # Map class name
            mapped_class = self.class_mapping.get(original_class, "trash")

            # Source and destination paths
            src_path = os.path.join(image_dir, filename)
            dst_path = os.path.join(self.output_dir, f"taco_{os.path.basename(filename)}")

            # Check if source file exists
            if not os.path.exists(src_path):
                continue

            # Copy file to output directory
            shutil.copy(src_path, dst_path)

            # Add metadata
            metadata.append({
                "file": os.path.basename(dst_path),
                "source": "taco",
                "original_class": original_class,
                "class": mapped_class
            })

        print(f"Processed {len(metadata)} images from TACO")
        return metadata

    def preprocess_mju_waste(self):
        \"\"\"
        Preprocess MJU-Waste dataset.

        Returns:
            List of processed image metadata
        \"\"\"
        dataset_dir = os.path.join(self.data_dir, "mju-waste")
        if not os.path.exists(dataset_dir):
            print(f"MJU-Waste dataset not found at: {dataset_dir}")
            return []

        # Find the dataset folder (might be nested)
        data_folder = None
        for root, dirs, files in os.walk(dataset_dir):
            if "battery" in dirs or "biological" in dirs or "cardboard" in dirs:
                data_folder = root
                break

        if not data_folder:
            print("Could not find MJU-Waste data folder structure")
            return []

        print(f"Processing MJU-Waste dataset from: {data_folder}")

        metadata = []

        # Process each class folder
        for class_name in os.listdir(data_folder):
            class_dir = os.path.join(data_folder, class_name)

            if not os.path.isdir(class_dir):
                continue

            # Map class name
            mapped_class = self.class_mapping.get(class_name, "trash")

            # Process images in this class
            for filename in os.listdir(class_dir):
                if not filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                    continue

                src_path = os.path.join(class_dir, filename)
                dst_path = os.path.join(self.output_dir, f"mju_{filename}")

                # Copy file to output directory
                shutil.copy(src_path, dst_path)

                # Add metadata
                metadata.append({
                    "file": os.path.basename(dst_path),
                    "source": "mju-waste",
                    "original_class": class_name,
                    "class": mapped_class
                })

        print(f"Processed {len(metadata)} images from MJU-Waste")
        return metadata

    def preprocess_waste_pictures(self):
        \"\"\"
        Preprocess Waste-Pictures dataset.

        Returns:
            List of processed image metadata
        \"\"\"
        dataset_dir = os.path.join(self.data_dir, "waste-pictures")
        if not os.path.exists(dataset_dir):
            print(f"Waste-Pictures dataset not found at: {dataset_dir}")
            return []

        # Find the dataset folder (might be nested)
        data_folder = None
        for root, dirs, files in os.walk(dataset_dir):
            if "battery" in dirs or "biological" in dirs or "clothes" in dirs:
                data_folder = root
                break

        if not data_folder:
            print("Could not find Waste-Pictures data folder structure")
            return []

        print(f"Processing Waste-Pictures dataset from: {data_folder}")

        metadata = []

        # Process each class folder
        for class_name in os.listdir(data_folder):
            class_dir = os.path.join(data_folder, class_name)

            if not os.path.isdir(class_dir):
                continue

            # Map class name
            mapped_class = self.class_mapping.get(class_name, "trash")

            # Process images in this class
            for filename in os.listdir(class_dir):
                if not filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                    continue

                src_path = os.path.join(class_dir, filename)
                dst_path = os.path.join(self.output_dir, f"waste_pictures_{filename}")

                # Copy file to output directory
                shutil.copy(src_path, dst_path)

                # Add metadata
                metadata.append({
                    "file": os.path.basename(dst_path),
                    "source": "waste-pictures",
                    "original_class": class_name,
                    "class": mapped_class
                })

        print(f"Processed {len(metadata)} images from Waste-Pictures")
        return metadata

    def preprocess_open_images(self):
        \"\"\"
        Preprocess Open Images dataset.

        Returns:
            List of processed image metadata
        \"\"\"
        download_dir = os.path.join(self.data_dir, "open-images")
        if not os.path.exists(download_dir):
            print(f"Open Images dataset not found at: {download_dir}")
            return []

        print(f"Processing Open Images dataset from: {download_dir}")

        metadata = []

        # Process each class folder
        for split in ["train", "validation", "test"]:
            split_dir = os.path.join(download_dir, split)

            if not os.path.exists(split_dir):
                continue

            for class_name in os.listdir(split_dir):
                class_dir = os.path.join(split_dir, class_name)

                if not os.path.isdir(class_dir):
                    continue

                # Map class name
                mapped_class = self.class_mapping.get(class_name, "trash")

                # Process images in this class
                for filename in os.listdir(class_dir):
                    if not filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                        continue

                    src_path = os.path.join(class_dir, filename)
                    dst_path = os.path.join(self.output_dir, f"openimages_{filename}")

                    # Copy file to output directory
                    shutil.copy(src_path, dst_path)

                    # Add metadata
                    metadata.append({
                        "file": os.path.basename(dst_path),
                        "source": "open-images",
                        "original_class": class_name,
                        "class": mapped_class
                    })

        print(f"Processed {len(metadata)} images from Open Images")
        return metadata

    def process_all_datasets(self):
        \"\"\"
        Process all available datasets and create train/val/test splits.

        Returns:
            Dictionary with train/val/test splits
        \"\"\"
        # Process each dataset and collect metadata
        all_metadata = []

        # Process TrashNet
        trashnet_metadata = self.preprocess_trashnet()
        all_metadata.extend(trashnet_metadata)

        # Process TACO
        taco_metadata = self.preprocess_taco()
        all_metadata.extend(taco_metadata)

        # Process MJU-Waste
        mju_waste_metadata = self.preprocess_mju_waste()
        all_metadata.extend(mju_waste_metadata)

        # Process Waste-Pictures
        waste_pictures_metadata = self.preprocess_waste_pictures()
        all_metadata.extend(waste_pictures_metadata)

        # Process Open Images
        open_images_metadata = self.preprocess_open_images()
        all_metadata.extend(open_images_metadata)

        # Create dataset splits
        if all_metadata:
            print(f"Total processed images: {len(all_metadata)}")
            return self.create_dataset_splits(all_metadata)
        else:
            print("No images were processed. Check dataset paths.")
            return None

    def create_dataset_splits(self, metadata, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15):
        \"\"\"
        Create train/val/test splits from metadata.

        Args:
            metadata: List of image metadata
            train_ratio: Ratio of training data
            val_ratio: Ratio of validation data
            test_ratio: Ratio of test data (currently unused; the test split receives all remaining items)

        Returns:
            Dictionary with train/val/test splits
        \"\"\"
        # Shuffle metadata
        random.shuffle(metadata)

        # Group by class
        class_groups = {}
        for item in metadata:
            class_name = item["class"]
            if class_name not in class_groups:
                class_groups[class_name] = []
            class_groups[class_name].append(item)

        # Create stratified splits
        train_data = []
        val_data = []
        test_data = []

        for class_name, items in class_groups.items():
            # Calculate split sizes
            n_items = len(items)
            n_train = int(n_items * train_ratio)
            n_val = int(n_items * val_ratio)

            # Split data
            train_data.extend(items[:n_train])
            val_data.extend(items[n_train:n_train+n_val])
            test_data.extend(items[n_train+n_val:])

        # Shuffle again
        random.shuffle(train_data)
        random.shuffle(val_data)
        random.shuffle(test_data)

        # Create splits dictionary
        splits = {
            "train": train_data,
            "val": val_data,
            "test": test_data
        }

        # Save splits to file
        splits_file = os.path.join(self.output_dir, "splits.json")
        with open(splits_file, 'w') as f:
            json.dump(splits, f, indent=2)

        print(f"Created dataset splits: {len(train_data)} train, {len(val_data)} val, {len(test_data)} test")
        print(f"Saved splits to: {splits_file}")

        return splits

class WasteDataset(Dataset):
    \"\"\"Dataset for waste classification.\"\"\"

    def __init__(self, data_dir, split="train", transform=None):
        \"\"\"
        Initialize the dataset.

        Args:
            data_dir: Directory containing the processed data
            split: Data split to use (train, val, test)
            transform: Transforms to apply to images
        \"\"\"
        self.data_dir = data_dir
        self.split = split
        self.transform = transform

        # Load splits
        splits_file = os.path.join(data_dir, "splits.json")
        if not os.path.exists(splits_file):
            raise FileNotFoundError(f"Splits file not found: {splits_file}")

        with open(splits_file, 'r') as f:
            splits = json.load(f)

        self.data = splits[split]

        # Get class names from all splits so label indices match across train/val/test
        self.classes = sorted({item["class"] for items in splits.values() for item in items})
        self.class_to_idx = {cls: i for i, cls in enumerate(self.classes)}

        print(f"Loaded {len(self.data)} images for {split} split")
        print(f"Classes: {self.classes}")

    def __len__(self):
        \"\"\"Return the number of items in the dataset.\"\"\"
        return len(self.data)

    def __getitem__(self, idx):
        \"\"\"
        Get an item from the dataset.

        Args:
            idx: Index of the item

        Returns:
            Tuple of (image, label)
        \"\"\"
        item = self.data[idx]
        image_file = os.path.join(self.data_dir, item["file"])

        # Load image
        image = Image.open(image_file).convert("RGB")

        # Apply transforms
        if self.transform:
            image = self.transform(image)

        # Get label
        label = self.class_to_idx[item["class"]]

        return image, label

def get_data_loaders(data_dir, batch_size=32, image_size=224, num_workers=4):
    \"\"\"
    Get data loaders for training and validation.

    Args:
        data_dir: Directory containing the processed data
        batch_size: Batch size
        image_size: Image size
        num_workers: Number of workers for data loading

    Returns:
        Dictionary with train, val, and test data loaders
    \"\"\"
    # Define transforms
    train_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    val_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    # Create datasets
    train_dataset = WasteDataset(data_dir, split="train", transform=train_transform)
    val_dataset = WasteDataset(data_dir, split="val", transform=val_transform)
    test_dataset = WasteDataset(data_dir, split="test", transform=val_transform)

    # Create data loaders
    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True,
        num_workers=num_workers, pin_memory=True
    )

    val_loader = DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=True
    )

    test_loader = DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=True
    )

    # Return data loaders
    return {
        "train": train_loader,
        "val": val_loader,
        "test": test_loader,
        "classes": train_dataset.classes
    }
"""

# Update both files
file_paths = [
    "/content/waste-classification-system/src/data_utils.py",
    "/content/waste-classification-system/data_utils.py"
]

for file_path in file_paths:
    try:
        with open(file_path, 'w') as file:
            file.write(fixed_code)
        print(f"Successfully updated {file_path}")
    except Exception as e:
        print(f"Error updating {file_path}: {str(e)}")

print("\nBoth files have been updated. Now try running the preprocessing step again:")
print("!python /content/waste-classification-system/scripts/preprocess_datasets.py")


Successfully updated /content/waste-classification-system/src/data_utils.py
Successfully updated /content/waste-classification-system/data_utils.py

Both files have been updated. Now try running the preprocessing step again:
!python /content/waste-classification-system/scripts/preprocess_datasets.py
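
`create_dataset_splits` shuffles with the module-level `random` state, so the membership of the train/val/test splits changes between runs. Below is a minimal sketch of the same per-class split with a fixed seed. `stratified_split` and its `seed` parameter are illustrative and not part of the project code.

```python
import random
from collections import defaultdict

def stratified_split(metadata, train_ratio=0.7, val_ratio=0.15, seed=42):
    """Split items per class; the test split receives whatever remains."""
    rng = random.Random(seed)  # local generator, global random state untouched
    groups = defaultdict(list)
    for item in metadata:
        groups[item["class"]].append(item)

    splits = {"train": [], "val": [], "test": []}
    for class_name in sorted(groups):  # fixed class order for determinism
        items = groups[class_name][:]
        rng.shuffle(items)
        n_train = int(len(items) * train_ratio)
        n_val = int(len(items) * val_ratio)
        splits["train"].extend(items[:n_train])
        splits["val"].extend(items[n_train:n_train + n_val])
        splits["test"].extend(items[n_train + n_val:])

    # Mix classes within each split, still using the seeded generator.
    for part in splits.values():
        rng.shuffle(part)
    return splits
```

With the same input order and the same seed, the function returns identical splits, which makes training runs easier to compare.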


In [46]:
import os
import zipfile
import shutil
import subprocess
import glob

def download_and_organize_all_datasets():
    print("Downloading and organizing all datasets...")

    # Create data directory
    data_dir = "/content/waste-classification-system/data"
    os.makedirs(data_dir, exist_ok=True)

    # Fix TrashNet (already working)
    print("\n=== TrashNet Dataset ===")
    # Skip if already fixed

    # Download TACO dataset images
    taco_ok = fix_taco_dataset()

    # Fix MJU-Waste dataset
    mju_ok = fix_mju_waste_dataset()

    # Fix Open Images dataset
    open_images_ok = fix_open_images_dataset()

    # Report each dataset's actual result instead of assuming success
    print("\n=== Dataset Download Summary ===")
    print("TrashNet: Already fixed")
    print(f"TACO: {'Downloaded from GitHub' if taco_ok else 'FAILED'}")
    print(f"MJU-Waste: {'Organized from zip file' if mju_ok else 'FAILED'}")
    print(f"Open Images: {'Created structure with placeholders' if open_images_ok else 'FAILED'}")

    if taco_ok and mju_ok and open_images_ok:
        print("\nAll datasets have been downloaded and organized. Now run the preprocessing step:")
        print("!python /content/waste-classification-system/scripts/preprocess_datasets.py")

def fix_taco_dataset():
    print("\n=== Fixing TACO Dataset ===")
    taco_dir = "/content/waste-classification-system/data/taco"
    os.makedirs(taco_dir, exist_ok=True)

    # Clone TACO repository
    taco_github_dir = os.path.join(taco_dir, "TACO-github")
    if not os.path.exists(taco_github_dir):
        subprocess.run(["git", "clone", "https://github.com/pedropro/TACO.git", taco_github_dir])

    # Create data directory
    taco_data_dir = os.path.join(taco_github_dir, "data")
    os.makedirs(taco_data_dir, exist_ok=True)

    # Download and extract batch files, reporting failures instead of assuming success
    ok = True
    for batch_name in ["batch_1.tar.gz", "batch_2.tar.gz"]:
        batch_path = os.path.join(taco_data_dir, batch_name)
        url = f"https://github.com/pedropro/TACO/releases/download/v1.0/{batch_name}"

        if not os.path.exists(batch_path):
            subprocess.run(["wget", "-P", taco_data_dir, url])

        # Only extract archives that were actually downloaded
        if os.path.exists(batch_path):
            result = subprocess.run(["tar", "-xzf", batch_path, "-C", taco_data_dir])
            ok = ok and result.returncode == 0
        else:
            print(f"Could not download {batch_name}")
            ok = False

    if ok:
        print(f"TACO dataset downloaded and extracted to {taco_data_dir}")
    else:
        print("TACO dataset could not be fully downloaded or extracted")
    return ok

def fix_mju_waste_dataset():
    print("\n=== Fixing MJU-Waste Dataset ===")
    mju_dir = "/content/waste-classification-system/data/mju-waste"
    mju_temp_dir = "/content/waste-classification-system/data/mju-waste-temp"
    os.makedirs(mju_dir, exist_ok=True)
    os.makedirs(mju_temp_dir, exist_ok=True)

    # Look for the zip file in Google Drive
    drive_dir = "/content/drive/MyDrive/waste_datasets"
    zip_path = None

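    # Guard against a missing folder: os.listdir raises FileNotFoundError otherwise
    if os.path.isdir(drive_dir):
        for filename in os.listdir(drive_dir):
            if "mju" in filename.lower() and filename.lower().endswith(".zip"):
                zip_path = os.path.join(drive_dir, filename)
                break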

    if not zip_path:
        print("Could not find MJU-Waste zip file in Google Drive")
        return False

    print(f"Found MJU-Waste zip file: {zip_path}")

    # Extract the zip file
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(mju_temp_dir)

    # Examine the structure of the extracted files
    print("Examining MJU-Waste directory structure:")
    for root, dirs, files in os.walk(mju_temp_dir):
        if files and any(f.lower().endswith(('.jpg', '.jpeg', '.png')) for f in files):
            rel_path = os.path.relpath(root, mju_temp_dir)
            print(f"Found images in: {rel_path}")

            # Create a corresponding directory in the output folder
            if rel_path != '.':
                target_dir = os.path.join(mju_dir, os.path.basename(root))
                os.makedirs(target_dir, exist_ok=True)

                # Copy image files
                for file in files:
                    if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                        shutil.copy(os.path.join(root, file), os.path.join(target_dir, file))

                print(f"Copied {len([f for f in files if f.lower().endswith(('.jpg', '.jpeg', '.png'))])} images to {target_dir}")

    # Check if we found any images
    image_count = 0
    for root, dirs, files in os.walk(mju_dir):
        image_count += len([f for f in files if f.lower().endswith(('.jpg', '.jpeg', '.png'))])

    if image_count == 0:
        print("No images found in MJU-Waste dataset. Creating simplified structure...")

        # Create some basic class directories
        for class_name in ["battery", "biological", "cardboard", "glass", "metal", "paper", "plastic", "trash"]:
            class_dir = os.path.join(mju_dir, class_name)
            os.makedirs(class_dir, exist_ok=True)

            # Create a placeholder file
            with open(os.path.join(class_dir, "placeholder.txt"), 'w') as f:
                f.write(f"Placeholder for {class_name} class")

        print("Created simplified MJU-Waste structure")
    else:
        print(f"Successfully organized MJU-Waste dataset with {image_count} images")

    return True

def fix_open_images_dataset():
    print("\n=== Fixing Open Images Dataset ===")
    open_images_dir = "/content/waste-classification-system/data/open-images"
    os.makedirs(open_images_dir, exist_ok=True)

    # Create directories for splits
    for split in ["train", "validation", "test"]:
        os.makedirs(os.path.join(open_images_dir, split), exist_ok=True)

    # Create class directories
    waste_classes = ["Bottle", "Tin_can", "Plastic_bag", "Cardboard", "Paper", "Glass"]

    for split in ["train", "validation", "test"]:
        for class_name in waste_classes:
            class_dir = os.path.join(open_images_dir, split, class_name)
            os.makedirs(class_dir, exist_ok=True)

            # Create placeholder files (since we can't easily download Open Images)
            num_placeholders = 100 if split == "train" else 20
            for i in range(num_placeholders):
                with open(os.path.join(class_dir, f"placeholder_{i}.txt"), 'w') as f:
                    f.write(f"Placeholder for {class_name} in {split} split")

            print(f"Created {num_placeholders} placeholders for {class_name} in {split} split")

    print("Created Open Images structure with placeholders")
    print("Note: For actual Open Images data, you would need to download it separately using the OIDv4 Toolkit")
    print("See: https://github.com/EscVM/OIDv4_ToolKit")

    return True

# Run the function
download_and_organize_all_datasets()


Downloading and organizing all datasets...

=== TrashNet Dataset ===

=== Fixing TACO Dataset ===
TACO dataset downloaded and extracted to /content/waste-classification-system/data/taco/TACO-github/data

=== Fixing MJU-Waste Dataset ===
Found MJU-Waste zip file: /content/drive/MyDrive/waste_datasets/MJU-Waste.zip
Examining MJU-Waste directory structure:
Found images in: DepthImages
Copied 2475 images to /content/waste-classification-system/data/mju-waste/DepthImages
Found images in: JPEGImages
Copied 2475 images to /content/waste-classification-system/data/mju-waste/JPEGImages
Found images in: SegmentationClass
Copied 2475 images to /content/waste-classification-system/data/mju-waste/SegmentationClass
Successfully organized MJU-Waste dataset with 7425 images

=== Fixing Open Images Dataset ===
Created 100 placeholders for Bottle in train split
Created 100 placeholders for Tin_can in train split
Created 100 placeholders for Plastic_bag in train split
Created 100 placeholders for Cardboa

In [47]:
import os
import shutil
import glob
import json

def fix_dataset_paths():
    print("Fixing dataset paths for preprocessing...")

    # Base data directory
    data_dir = "/content/waste-classification-system/data"

    # 1. Fix Open Images dataset
    print("\n=== Fixing Open Images Dataset ===")
    open_images_dir = os.path.join(data_dir, "open-images")

    # Check where the images actually are
    image_count = 0
    image_locations = []

    for root, dirs, files in os.walk(open_images_dir):
        for file in files:
            if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                image_count += 1
                image_locations.append(root)
                if image_count <= 5:  # Just show a few examples
                    print(f"Found image: {os.path.join(root, file)}")

    print(f"Found {image_count} images in Open Images dataset")

    if image_count > 0:
        # Get unique directories containing images
        unique_dirs = set(image_locations)
        print(f"Images found in {len(unique_dirs)} different directories")

        # Create the expected structure
        for split in ["train", "validation", "test"]:
            split_dir = os.path.join(open_images_dir, split)
            os.makedirs(split_dir, exist_ok=True)

            # Look for class directories in this split
            for root, dirs, files in os.walk(split_dir):
                for dir_name in dirs:
                    class_dir = os.path.join(root, dir_name)

                    # Check if this directory contains images
                    image_files = glob.glob(os.path.join(class_dir, "*.jpg")) + \
                                 glob.glob(os.path.join(class_dir, "*.jpeg")) + \
                                 glob.glob(os.path.join(class_dir, "*.png"))

                    if image_files:
                        print(f"Class {dir_name} in {split} split has {len(image_files)} images")

    # 2. Fix MJU-Waste dataset
    print("\n=== Fixing MJU-Waste Dataset ===")
    mju_dir = os.path.join(data_dir, "mju-waste")

    # Check the actual structure
    if os.path.exists(os.path.join(mju_dir, "JPEGImages")):
        print("Found JPEGImages directory in MJU-Waste")

        # Create class directories based on image filenames or other metadata
        # This is a simplified approach - you might need to adjust based on actual naming conventions
        jpeg_dir = os.path.join(mju_dir, "JPEGImages")
        classes = ["battery", "biological", "cardboard", "glass", "metal", "paper", "plastic", "trash"]

        for class_name in classes:
            class_dir = os.path.join(mju_dir, class_name)
            os.makedirs(class_dir, exist_ok=True)

            # Look for images that might belong to this class based on filename
            for img_file in os.listdir(jpeg_dir):
                if class_name.lower() in img_file.lower() and img_file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    src_path = os.path.join(jpeg_dir, img_file)
                    dst_path = os.path.join(class_dir, img_file)
                    if not os.path.exists(dst_path):
                        shutil.copy(src_path, dst_path)

            # Check how many images were copied
            image_files = glob.glob(os.path.join(class_dir, "*.jpg")) + \
                         glob.glob(os.path.join(class_dir, "*.jpeg")) + \
                         glob.glob(os.path.join(class_dir, "*.png"))

            print(f"Copied {len(image_files)} images to {class_name} class directory")

    # 3. Fix TACO dataset
    print("\n=== Fixing TACO Dataset ===")
    taco_dir = os.path.join(data_dir, "taco")
    taco_github_dir = os.path.join(taco_dir, "TACO-github")

    # Check if the GitHub repository was cloned successfully
    if os.path.exists(taco_github_dir):
        print("Found TACO GitHub repository")

        # Look for annotations.json
        annotations_file = None
        for root, dirs, files in os.walk(taco_github_dir):
            if "annotations.json" in files:
                annotations_file = os.path.join(root, "annotations.json")
                break

        if annotations_file:
            print(f"Found annotations file: {annotations_file}")

            # Load annotations
            try:
                with open(annotations_file, 'r') as f:
                    annotations = json.load(f)

                # Check if images exist
                image_dir = os.path.join(os.path.dirname(annotations_file), "data")
                if not os.path.exists(image_dir):
                    image_dir = os.path.dirname(annotations_file)

                image_count = 0
                for image_info in annotations["images"]:
                    filename = image_info["file_name"]
                    src_path = os.path.join(image_dir, filename)

                    if os.path.exists(src_path):
                        image_count += 1

                print(f"Found {image_count} images referenced in annotations")

                # If no images found, look for batch files
                if image_count == 0:
                    batch_files = []
                    for root, dirs, files in os.walk(taco_github_dir):
                        for file in files:
                            if file.startswith("batch_") and (file.endswith(".tar.gz") or file.endswith(".zip")):
                                batch_files.append(os.path.join(root, file))

                    if batch_files:
                        print(f"Found {len(batch_files)} batch files that might contain images")
                        print("Please make sure these batch files are extracted to the correct location")
                        for batch_file in batch_files:
                            print(f"  - {batch_file}")

            except Exception as e:
                print(f"Error loading TACO annotations: {str(e)}")

    print("\nDataset path fixing completed. Now run the preprocessing step again:")
    print("!python /content/waste-classification-system/scripts/preprocess_datasets.py")

# Run the function
fix_dataset_paths()


Fixing dataset paths for preprocessing...

=== Fixing Open Images Dataset ===
Found image: /content/waste-classification-system/data/open-images/train/images/classes.png
Found image: /content/waste-classification-system/data/open-images/train/OIDv4_ToolKit/classes.png
Found image: /content/waste-classification-system/data/open-images/OIDv4_ToolKit/images/classes.png
Found image: /content/waste-classification-system/data/open-images/OIDv4_ToolKit/images/rectangle.png
Found image: /content/waste-classification-system/data/open-images/test/images/rectangle.png
Found 6 images in Open Images dataset
Images found in 5 different directories
Class images in train split has 1 images
Class OIDv4_ToolKit in train split has 1 images
Class images in test split has 1 images
Class OIDv4_ToolKit in test split has 1 images

=== Fixing MJU-Waste Dataset ===
Found JPEGImages directory in MJU-Waste
Copied 0 images to battery class directory
Copied 0 images to biological class directory
Copied 0 images to 

In [48]:
import os
import shutil
import glob
import json

def fix_dataset_paths():
    print("Fixing dataset paths for preprocessing...")

    # Base data directory
    data_dir = "/content/waste-classification-system/data"

    # 1. Fix Open Images dataset
    print("\n=== Fixing Open Images Dataset ===")
    open_images_dir = os.path.join(data_dir, "open-images")

    # Create the expected structure by copying images from OIDv4_ToolKit structure
    for split in ["train", "validation", "test"]:
        split_dir = os.path.join(open_images_dir, split)
        if os.path.exists(split_dir):
            for class_dir in os.listdir(split_dir):
                class_path = os.path.join(split_dir, class_dir)
                if os.path.isdir(class_path):
                    # Create a corresponding directory directly under open-images
                    target_dir = os.path.join(open_images_dir, class_dir)
                    os.makedirs(target_dir, exist_ok=True)

                    # Copy images
                    for img_file in os.listdir(class_path):
                        if img_file.lower().endswith(('.jpg', '.jpeg', '.png')):
                            src_path = os.path.join(class_path, img_file)
                            dst_path = os.path.join(target_dir, f"{split}_{img_file}")
                            if not os.path.exists(dst_path):
                                shutil.copy(src_path, dst_path)

                    print(f"Copied images from {class_path} to {target_dir}")

    # 2. Fix MJU-Waste dataset
    print("\n=== Fixing MJU-Waste Dataset ===")
    mju_dir = os.path.join(data_dir, "mju-waste")

    # Check if JPEGImages directory exists
    jpeg_dir = os.path.join(mju_dir, "JPEGImages")
    if os.path.exists(jpeg_dir):
        print(f"Found JPEGImages directory: {jpeg_dir}")

        # Create class directories
        classes = ["battery", "biological", "cardboard", "glass", "metal", "paper", "plastic", "trash"]

        # Get all image files
        image_files = glob.glob(os.path.join(jpeg_dir, "*.jpg")) + \
                     glob.glob(os.path.join(jpeg_dir, "*.jpeg")) + \
                     glob.glob(os.path.join(jpeg_dir, "*.png"))

        print(f"Found {len(image_files)} images in JPEGImages directory")

        # Distribute images to class directories by list position (equal contiguous chunks).
        # This is a placeholder only: the assignment ignores image content, so the
        # resulting class labels are arbitrary.
        for class_name in classes:
            class_dir = os.path.join(mju_dir, class_name)
            os.makedirs(class_dir, exist_ok=True)

            # Copy a portion of images to each class for demonstration
            start_idx = classes.index(class_name) * (len(image_files) // len(classes))
            end_idx = (classes.index(class_name) + 1) * (len(image_files) // len(classes))

            for i in range(start_idx, min(end_idx, len(image_files))):
                img_path = image_files[i]
                dst_path = os.path.join(class_dir, os.path.basename(img_path))
                if not os.path.exists(dst_path):
                    shutil.copy(img_path, dst_path)

            print(f"Copied {min(end_idx, len(image_files)) - start_idx} images to {class_name} class")

    # 3. Fix TACO dataset
    print("\n=== Fixing TACO Dataset ===")
    taco_dir = os.path.join(data_dir, "taco")

    # Look for the TACO-github directory
    taco_github_dir = os.path.join(taco_dir, "TACO-github")
    if os.path.exists(taco_github_dir):
        print(f"Found TACO GitHub directory: {taco_github_dir}")

        # Look for batch files and extract them if needed
        batch_files = []
        for root, dirs, files in os.walk(taco_github_dir):
            for file in files:
                if file.startswith("batch_") and file.endswith(".tar.gz"):
                    batch_files.append(os.path.join(root, file))

        if batch_files:
            print(f"Found {len(batch_files)} batch files")

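            # Extract batch files to the TACO data directory
            # (separate name so the base data_dir is not overwritten)
            taco_data_dir = os.path.join(taco_github_dir, "data")
            os.makedirs(taco_data_dir, exist_ok=True)

            for batch_file in batch_files:
                print(f"Extracting {batch_file} to {taco_data_dir}")
                try:
                    # shutil.unpack_archive avoids passing unquoted paths to a shell
                    shutil.unpack_archive(batch_file, taco_data_dir)
                except Exception as e:
                    print(f"Could not extract {batch_file}: {e}")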

        # Look for annotations.json
        annotations_file = None
        for root, dirs, files in os.walk(taco_github_dir):
            if "annotations.json" in files:
                annotations_file = os.path.join(root, "annotations.json")
                break

        if annotations_file:
            print(f"Found annotations file: {annotations_file}")

            # Create class directories based on annotations
            try:
                with open(annotations_file, 'r') as f:
                    annotations = json.load(f)

                # Get categories
                categories = {}
                for cat in annotations["categories"]:
                    categories[cat["id"]] = cat["name"]

                print(f"Found {len(categories)} categories in annotations")

                # Create class directories
                for cat_id, cat_name in categories.items():
                    # Map category name to standard class
                    if "plastic" in cat_name.lower():
                        std_class = "plastic"
                    elif "glass" in cat_name.lower():
                        std_class = "glass"
                    elif "metal" in cat_name.lower():
                        std_class = "metal"
                    elif "paper" in cat_name.lower() or "cardboard" in cat_name.lower():
                        std_class = "paper"
                    else:
                        std_class = "trash"

                    # Create directory
                    class_dir = os.path.join(taco_dir, std_class)
                    os.makedirs(class_dir, exist_ok=True)

                    # Find images for this category
                    cat_annotations = [a for a in annotations["annotations"] if a["category_id"] == cat_id]

                    # Look for these images in the data directory
                    for ann in cat_annotations[:10]:  # Limit to 10 per category for demonstration
                        img_id = ann["image_id"]
                        img_info = next((img for img in annotations["images"] if img["id"] == img_id), None)

                        if img_info:
                            filename = img_info["file_name"]

                            # Look for this file in the data directory
                            for root, dirs, files in os.walk(os.path.join(taco_github_dir, "data")):
                                if os.path.basename(filename) in files:
                                    src_path = os.path.join(root, os.path.basename(filename))
                                    dst_path = os.path.join(class_dir, os.path.basename(filename))

                                    if not os.path.exists(dst_path):
                                        shutil.copy(src_path, dst_path)

                                    break

                    # Check how many images were copied
                    image_files = glob.glob(os.path.join(class_dir, "*.jpg")) + \
                                 glob.glob(os.path.join(class_dir, "*.jpeg")) + \
                                 glob.glob(os.path.join(class_dir, "*.png"))

                    print(f"Class {std_class} has {len(image_files)} images")

            except Exception as e:
                print(f"Error processing TACO annotations: {str(e)}")

    print("\nDataset path fixing completed. Now run the preprocessing step again:")
    print("!python /content/waste-classification-system/scripts/preprocess_datasets.py")

# Run the function
fix_dataset_paths()


Fixing dataset paths for preprocessing...

=== Fixing Open Images Dataset ===
Copied images from /content/waste-classification-system/data/open-images/train/__pycache__ to /content/waste-classification-system/data/open-images/__pycache__
Copied images from /content/waste-classification-system/data/open-images/train/Paper to /content/waste-classification-system/data/open-images/Paper
Copied images from /content/waste-classification-system/data/open-images/train/logs to /content/waste-classification-system/data/open-images/logs
Copied images from /content/waste-classification-system/data/open-images/train/objects to /content/waste-classification-system/data/open-images/objects
Copied images from /content/waste-classification-system/data/open-images/train/Bottle to /content/waste-classification-system/data/open-images/Bottle
Copied images from /content/waste-classification-system/data/open-images/train/info to /content/waste-classification-system/data/open-images/info
Copied images from /

In [49]:
# Preprocess all datasets
!python /content/waste-classification-system/scripts/preprocess_datasets.py

Processing TrashNet dataset from: /content/waste-classification-system/data/trashnet/trashnet-master/data/dataset/dataset-resized
Processed 2527 images from TrashNet
Processing TACO dataset from: /content/waste-classification-system/data/taco/TACO-master/data
Processed 0 images from TACO
Processing MJU-Waste dataset from: /content/waste-classification-system/data/mju-waste
Processed 9897 images from MJU-Waste
Processing Waste-Pictures dataset from: /content/waste-classification-system/data/waste-pictures/train
Processed 17872 images from Waste-Pictures
Processing Open Images dataset from: /content/waste-classification-system/data/open-images
Processed 4 images from Open Images
Total processed images: 30300
Created dataset splits: 21207 train, 4541 val, 4552 test
Saved splits to: /content/waste-classification-system/data/processed/splits.json
Dataset preprocessing completed successfully

Class distribution in training set:
  trash: 17435 images
  e-waste: 803 images
  paper: 632 images
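
The MJU-Waste total reported above (9897) is larger than the 7425 images copied from the zip file. A quick check, using only numbers from the earlier outputs, suggests the preprocessing script may be counting every image under `data/mju-waste`: the three copied directories plus the per-class copies made in cell 48. This is an inference from the counts, not something the script output confirms, and it assumes all 2475 files in `JPEGImages` matched the glob patterns in cell 48.

```python
# Counts taken from the notebook outputs and code above
images_per_source_dir = 2475   # DepthImages, JPEGImages, SegmentationClass each
source_dirs = 3
num_classes = 8                # length of the class list in cell 48

# Images copied out of the zip file by cell 46
copied_from_zip = images_per_source_dir * source_dirs

# Cell 48 copies len(image_files) // len(classes) images into each class folder
per_class = images_per_source_dir // num_classes
class_copies = per_class * num_classes

total = copied_from_zip + class_copies
print(copied_from_zip, per_class, class_copies, total)
# prints: 7425 309 2472 9897
```

If the inference holds, only 2475 of the 9897 MJU-Waste entries are distinct `JPEGImages` photos, and the class folders carry position-based labels, so these counts are worth rechecking before training.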


In [50]:
# Upload improved scripts
from google.colab import files
uploaded = files.upload()  # Upload the three improved Python files


Saving improved_config.py to improved_config.py
Saving improved_download_datasets.py to improved_download_datasets.py
Saving improved_preprocess_datasets.py to improved_preprocess_datasets.py


In [51]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [52]:
# Set up directory structure
!mkdir -p /content/waste-classification-system/scripts
!mkdir -p /content/waste-classification-system/src
!mkdir -p /content/waste-classification-system/data

# Copy improved scripts to appropriate locations
!cp improved_config.py /content/waste-classification-system/
!cp improved_download_datasets.py /content/waste-classification-system/scripts/
!cp improved_preprocess_datasets.py /content/waste-classification-system/scripts/


cp: 'improved_config.py' and '/content/waste-classification-system/improved_config.py' are the same file


In [53]:
# Download datasets using improved script
!python /content/waste-classification-system/scripts/improved_download_datasets.py --gdrive /content/drive --colab-pro


Traceback (most recent call last):
  File "/content/waste-classification-system/scripts/improved_download_datasets.py", line 26, in <module>
    import improved_config as config
ModuleNotFoundError: No module named 'improved_config'
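
The traceback above arises because Python places the directory of the script being run (here `scripts/`) at the front of `sys.path`, while `improved_config.py` was copied to the project root in cell 52. A minimal sketch of one possible fix follows, assuming the script sits exactly one directory below the project root. The helper names `project_root_for` and `ensure_importable` are hypothetical and not part of the project.

```python
import sys
from pathlib import Path


def project_root_for(script_path):
    """Return the directory one level above the folder containing the script."""
    return Path(script_path).parent.parent


def ensure_importable(root):
    """Put `root` at the front of sys.path if it is not there already."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root_str


# Path of the failing script, taken from the traceback above
script = "/content/waste-classification-system/scripts/improved_download_datasets.py"
root = project_root_for(script)
ensure_importable(root)
print(root)
```

Inside `improved_download_datasets.py` the same call would use `__file__` and would need to run before `import improved_config as config`. Setting `PYTHONPATH=/content/waste-classification-system` in front of the `python` command is an alternative that needs no code change.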


## 6. Train Models

Now we can train our classification models. You can choose to train individual models or all of them.

In [None]:
# Check if GPU is available
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# Train ConvNeXt Large model
# Uncomment to run
# !python scripts/train.py --model convnext_large --epochs 20

In [None]:
# Train EfficientNetV2-L model
# Uncomment to run
# !python scripts/train.py --model tf_efficientnetv2_l --epochs 20

In [None]:
# Train Swin Transformer Large model
# Uncomment to run
# !python scripts/train.py --model swin_large_patch4_window7_224 --epochs 20

In [None]:
# Train all models (this will take a long time)
# Uncomment to run
# !python scripts/train.py --model all --epochs 20

## 7. Save Trained Models to Google Drive

After training, we should save the models to Google Drive so they're not lost when the Colab session ends.

In [None]:
# Create a directory in Google Drive for the models
!mkdir -p /content/drive/MyDrive/waste_classification_models

# Copy the trained models to Google Drive
!cp -r models/* /content/drive/MyDrive/waste_classification_models/
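
The same backup can be done from Python, which makes it easier to notice an empty `models/` folder before the session ends. The sketch below is an illustration only: `backup_models` is a hypothetical helper, the file name `example_model.pth` is made up, and temporary directories stand in for `models/` and the Drive folder so the example runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def backup_models(src_dir, dst_dir):
    """Copy everything under src_dir into dst_dir and return the copied file paths."""
    src = Path(src_dir)
    if not src.is_dir() or not any(src.iterdir()):
        raise FileNotFoundError(f"No model files found in {src}")
    # dirs_exist_ok=True (Python 3.8+) allows repeated backups into the same folder
    shutil.copytree(src, dst_dir, dirs_exist_ok=True)
    return sorted(p for p in Path(dst_dir).rglob("*") if p.is_file())


# Demonstration with temporary directories standing in for the real locations
with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / "models"
    models.mkdir()
    (models / "example_model.pth").write_bytes(b"weights")  # hypothetical file

    drive = Path(tmp) / "drive" / "waste_classification_models"
    copied = backup_models(models, drive)
    copied_names = [p.name for p in copied]
    print(copied_names)
```

In the notebook, the call would presumably be `backup_models("models", "/content/drive/MyDrive/waste_classification_models")`, run from the project directory with Drive mounted.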

## 8. Run the Application

Finally, we can run the application with the Gradio interface.

In [None]:
# Run the application
!python app.py