<a href="https://colab.research.google.com/github/hackdavid/recipe-generation-using-fridge-image/blob/main/course_work_ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone https://github.com/hackdavid/recipe-generation-using-fridge-image.git

Cloning into 'recipe-generation-using-fridge-image'...
remote: Enumerating objects: 52, done.
remote: Counting objects: 100% (52/52), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 52 (delta 19), reused 49 (delta 16), pack-reused 0 (from 0)
Receiving objects: 100% (52/52), 61.14 KiB | 6.11 MiB/s, done.
Resolving deltas: 100% (19/19), done.


In [None]:
# Install required packages
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
%pip install datasets transformers pillow pyyaml wandb scikit-learn matplotlib seaborn tqdm psutil

print("✓ Dependencies installed")

Looking in indexes: https://download.pytorch.org/whl/cu118
✓ Dependencies installed


In [None]:
# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("⚠️  GPU not available. Please enable GPU: Runtime → Change runtime type → GPU")


CUDA available: False
⚠️  GPU not available. Please enable GPU: Runtime → Change runtime type → GPU
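
The check above reports that no GPU is attached, so anything that runs afterwards falls back to the CPU. A minimal sketch of resolving the device once and reusing the result is shown below. The helper name `pick_device` is hypothetical and is not part of the cloned repository.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the helper still runs where PyTorch is missing.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed straight to `torch.device(...)` or `.to(...)`.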


In [None]:
import sys

# Make the cloned repository importable (trainer/ and models/ live at its root)
sys.path.append('/content/recipe-generation-using-fridge-image')

In [None]:
from trainer.hf_dataset import HuggingFaceStreamDataset, get_hf_data_loaders
from trainer.config import load_config
from trainer.metrics import calculate_metrics
from trainer.validation import validate
from models import create_resnet50, create_se_resnet50

In [None]:
import os

# Create checkpoints directory
os.makedirs('checkpoints', exist_ok=True)
print("\n✓ Checkpoints directory created")


✓ Checkpoints directory created


In [None]:
# Set your config file path here
CONFIG_PATH = '/content/exp1.yaml'  # Change this to your config file

# Verify config exists
import os
if os.path.exists(CONFIG_PATH):
    print(f"✓ Config file found: {CONFIG_PATH}")
    # Display a preview of the first 15 lines
    with open(CONFIG_PATH, 'r') as f:
        all_lines = f.readlines()
    lines = all_lines[:15]
    print("\nConfig file preview:")
    print("="*50)
    print(''.join(lines))
    # Flag truncation only when the file really has more than 15 lines
    if len(all_lines) > 15:
        print("... (truncated)")
else:
    print(f"⚠️  Config file not found: {CONFIG_PATH}")
    print("Please check the path or update CONFIG_PATH")

✓ Config file found: /content/exp1.yaml

Config file preview:
# ResNet-50 Training Configuration
# Ingredient Recognition Model

# Dataset Configuration
data:
  data_source: "huggingface"  # Options: "huggingface" or "folder"
  dataset_name: "ibrahimdaud/raw-food-recognition"  # HuggingFace dataset name
  train_split: "train"  # Training split name (ignored if use_custom_split=true)
  val_split: "validation"  # Validation split name (ignored if use_custom_split=true)
  use_custom_split: false  # If true, ignore predefined splits and create 80/20 train/val split in streaming mode
  train_ratio: 0.8  # Ratio for training split when use_custom_split=true (default: 0.8 = 80%)
  data_dir: ""  # Only used if data_source is "folder"
  image_size: 224
  num_workers: 4
  class_mapping_path: "trainer/class_mapping.json"  # Path to class mapping JSON file (generate first using generate_class_mapping.py)

... (truncated)
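
The preview points `class_mapping_path` at `trainer/class_mapping.json` and notes that the file has to be generated first. A small pre-flight check, sketched below, would surface a missing file before a long run starts. The helper `load_class_mapping` is hypothetical, and the sketch assumes the file holds a single JSON object. The real layout written by `generate_class_mapping.py` is not shown in this notebook.

```python
import json
import os


def load_class_mapping(path: str):
    """Return the parsed JSON stored at `path`, or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Path taken from the config preview; it is relative to the working directory.
mapping = load_class_mapping("trainer/class_mapping.json")
if mapping is None:
    print("Class mapping not found; generate it before training")
else:
    print(f"Loaded {len(mapping)} entries from the class mapping")
```

Because the path is relative, the result depends on the current working directory.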


In [None]:
# Load configuration
print(f"Loading configuration from: {CONFIG_PATH}")
print(f"Current working directory: {os.getcwd()}")
cfg = load_config(CONFIG_PATH)

# Print configuration summary
print("\n" + "="*50)
print("Configuration Summary")
print("="*50)
print(f"Model: {cfg['model']}")
print(f"Dataset: {cfg.get('dataset_name', 'N/A')}")
print(f"Epochs: {cfg['epochs']}")
print(f"Batch size: {cfg['batch_size']}")
print(f"Learning rate: {cfg['lr']}")
print(f"Optimizer: {cfg['optimizer']}")
print(f"Scheduler: {cfg['scheduler'].get('type', 'StepLR')}")
print(f"Wandb: {'Enabled' if cfg['use_wandb'] else 'Disabled'}")
print("="*50)


Loading configuration from: /content/exp1.yaml
Current working directory: /content

Configuration Summary
Model: resnet50
Dataset: ibrahimdaud/raw-food-recognition
Epochs: 50
Batch size: 32
Learning rate: 0.001
Optimizer: Adam
Scheduler: StepLR
Wandb: Enabled
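
The summary reports a `StepLR` scheduler with a base learning rate of 0.001 but does not print the scheduler's parameters. As a rough illustration of step decay, the sketch below evaluates the closed form `base_lr * gamma ** (epoch // step_size)`. The values `STEP_SIZE = 10` and `GAMMA = 0.1` are assumptions chosen for the example, not values read from `exp1.yaml`.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate under step decay: base_lr * gamma ** (epoch // step_size)."""
    return base_lr * gamma ** (epoch // step_size)


BASE_LR = 0.001  # from the configuration summary above
STEP_SIZE = 10   # assumed for illustration only
GAMMA = 0.1      # assumed for illustration only

for epoch in (0, 9, 10, 25, 49):
    lr = step_lr(BASE_LR, epoch, STEP_SIZE, GAMMA)
    print(f"epoch {epoch:2d}: lr = {lr:.1e}")
```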


In [None]:
# huggingface login
!huggingface-cli login


    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
The token `write_access` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `write_acces
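
The interactive prompt above needs the token pasted by hand in every new session. One non-interactive alternative, sketched here, reads the token from an `HF_TOKEN` environment variable and passes it to `huggingface_hub.login`. The helper names `resolve_hf_token` and `login_with_env_token` are hypothetical, and the sketch assumes the variable has been set beforehand.

```python
import os
from typing import Optional


def resolve_hf_token(env=os.environ) -> Optional[str]:
    """Return the token stored in HF_TOKEN, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_with_env_token() -> bool:
    """Log in with the HF_TOKEN value; return False when no token is available."""
    token = resolve_hf_token()
    if token is None:
        return False
    # Imported here so the module loads even without huggingface_hub installed.
    from huggingface_hub import login
    login(token=token)
    return True


if resolve_hf_token() is None:
    print("HF_TOKEN is not set; use the interactive login instead")
```

Calling `login_with_env_token()` in a cell would then replace the `huggingface-cli login` prompt when the variable is present.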

In [None]:
# Set up sys.argv to simulate command line call
original_argv = sys.argv.copy()
sys.argv = ['train.py', CONFIG_PATH]

try:
    # Import and run main function
    from trainer.train import main

    print("="*50)
    print("Starting Training")
    print("="*50)
    print(f"Using GPU: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
    print("="*50)

    # Run training
    main()
finally:
    # Restore original argv
    sys.argv = original_argv

Starting Training
Using GPU: False
Logging to file: /content/recipe-generation-using-fridge-image/logs/resnet50_2025-12-05_21-44-02.log
Loading configuration from: /content/exp1.yaml

Configuration Summary
Config file: /content/exp1.yaml
Model: resnet50
Data directory: 
Dataset name: ibrahimdaud/raw-food-recognition
Epochs: 50
Batch size: 32
Learning rate: 0.001
Optimizer: Adam
Scheduler: StepLR
Wandb: Enabled
Using device: cpu
✓ Wandb API key set from config


wandb: Currently logged in as: ibrahimdaud03 (ibrahimdaud) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


✓ Wandb initialized: https://wandb.ai/ibrahimdaud/ingredient-recognition/runs/ac8tzcks

Loading datasets...
Class mapping file not found: trainer/class_mapping.json
Training will proceed without mapping (may fail if labels are strings)
Loading HuggingFace dataset: ibrahimdaud/raw-food-recognition


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/7.79k [00:00<?, ?B/s]

data/train-00000-of-00006.parquet:   0%|          | 0.00/423M [00:00<?, ?B/s]

data/train-00001-of-00006.parquet:   0%|          | 0.00/424M [00:00<?, ?B/s]

data/train-00002-of-00006.parquet:   0%|          | 0.00/425M [00:00<?, ?B/s]

data/train-00003-of-00006.parquet:   0%|          | 0.00/389M [00:00<?, ?B/s]

data/train-00004-of-00006.parquet:   0%|          | 0.00/418M [00:00<?, ?B/s]

KeyboardInterrupt: 
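
The training cell swaps `sys.argv` by hand and restores it in a `finally` block, which is why the interrupted run above still leaves the original arguments in place. The same pattern can be wrapped in a context manager, as in the sketch below. The name `patched_argv` is hypothetical and is not part of the repository.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(*args: str):
    """Temporarily replace sys.argv, restoring it even if the body raises."""
    original = sys.argv
    sys.argv = list(args)
    try:
        yield
    finally:
        sys.argv = original


# Inside the block, code that parses sys.argv sees the simulated command line.
with patched_argv("train.py", "/content/exp1.yaml"):
    print(sys.argv)
```

With this helper the training call reduces to `with patched_argv('train.py', CONFIG_PATH): main()`.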

In [None]:
from google.colab import drive
drive.mount('/content/drive')