<a href="https://colab.research.google.com/github/plg2001/FondamentiAI/blob/main/Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

🔷 Project Introduction

Breast cancer is one of the leading causes of cancer-related morbidity worldwide, and early detection remains a crucial factor for improving patient outcomes. In this project, we aim to develop a multimodal deep learning framework capable of analyzing mammography and breast ultrasound images for tumor classification, representation learning, and synthetic data generation.

The project is structured into three main components:

1. Mammography Analysis (CBIS-DDSM & INBreast)

We first extract radiomic feature vectors from the CBIS-DDSM dataset and use them to train classification models capable of distinguishing benign from malignant lesions. INBreast is employed as an external testing set to evaluate the generalization capability of the trained models.
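The key constraint in this protocol is that everything learned, including feature scaling and model weights, comes from CBIS-DDSM and is then applied, frozen, to INBreast. The sketch below illustrates that idea with plain NumPy; it is not the project's classifier. The feature matrices are synthetic stand-ins for radiomic vectors. The logistic-regression model and all names (`fit_standardizer`, `train_logreg`, `accuracy`) are hypothetical. The constant offset added to the external set is only a crude simulation of a site or scanner shift.

```python
import numpy as np

def fit_standardizer(X):
    """Per-feature mean/std estimated on the training (CBIS-DDSM) features only."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return mu, sigma

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Batch gradient-descent logistic regression (label 0 = benign, 1 = malignant)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(X, y, w, b):
    return float(np.mean((sigmoid(X @ w + b) >= 0.5) == y))

rng = np.random.default_rng(0)
n_features = 10  # hypothetical; real radiomic vectors are much longer

# Hypothetical latent lesion properties; labels depend only on these.
Z_cbis = rng.normal(size=(200, n_features))
Z_inb = rng.normal(size=(80, n_features))
y_cbis = (Z_cbis[:, 0] + 0.5 * Z_cbis[:, 1] > 0).astype(float)
y_inb = (Z_inb[:, 0] + 0.5 * Z_inb[:, 1] > 0).astype(float)

# Observed features: the external site adds a constant offset (simulated scanner shift).
X_cbis = Z_cbis
X_inb = Z_inb + 0.3

# 1) Fit the preprocessing and the model on the CBIS-DDSM-like set only.
mu, sigma = fit_standardizer(X_cbis)
w, b = train_logreg((X_cbis - mu) / sigma, y_cbis)

# 2) Apply the frozen pipeline, unchanged, to the external INBreast-like set.
internal_acc = accuracy((X_cbis - mu) / sigma, y_cbis, w, b)
external_acc = accuracy((X_inb - mu) / sigma, y_inb, w, b)
print(f"internal accuracy: {internal_acc:.2f}, external accuracy: {external_acc:.2f}")
```

`mu` and `sigma` are computed once on the training set and reused. Re-fitting them on INBreast would no longer measure how the CBIS-trained pipeline transfers as-is. The gap between the two accuracies is the kind of generalization drop the external test is meant to expose.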

2. Ultrasound Image Generation (BUSI Dataset)

To complement the mammography-based analysis, we implement a diffusion-model-based generator (“Mini-BUSGen”) trained on the BUSI breast ultrasound dataset. This module enables the creation of realistic synthetic ultrasound images conditioned on lesion type.
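The notebook does not spell out Mini-BUSGen's formulation. If it follows a standard DDPM-style, label-conditioned diffusion model, the core training ingredients are the closed-form forward noising step and a denoiser trained to predict the added noise given the lesion label. The NumPy sketch below shows those pieces. The 1000-step linear schedule, the null-label trick for classifier-free guidance, and every name in it are assumptions. The denoising network itself is omitted.

```python
import numpy as np

T = 1000                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)      # abar_t = prod_{s<=t} (1 - beta_s)

CLASSES = ["benign", "malignant", "normal"]  # BUSI lesion categories
NULL_LABEL = len(CLASSES)                    # hypothetical "no condition" token

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def make_training_example(x0, label, rng, p_uncond=0.1):
    """Build one (input, target) pair for a noise-predicting, label-conditioned denoiser.

    With probability p_uncond the label is replaced by NULL_LABEL, so the same
    network also learns the unconditional model (classifier-free guidance).
    """
    t = int(rng.integers(0, T))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    cond = NULL_LABEL if rng.random() < p_uncond else label
    return (x_t, t, cond), noise  # the network is trained to predict `noise`

def guided_eps(eps_cond, eps_uncond, w=3.0):
    """One common form of classifier-free guidance on the predicted noise."""
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(64, 64))  # stand-in for a grayscale ultrasound image in [-1, 1]
(x_t, t, cond), target = make_training_example(x0, CLASSES.index("malignant"), rng)
print(x_t.shape, t, cond)
```

At sampling time the network would be evaluated twice per step, once with the real label and once with `NULL_LABEL`. The two predictions are then combined with `guided_eps`. A larger `w` trades sample diversity for closer adherence to the requested lesion type.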

3. Multimodal Integration (Future Stage)

Synthetic and real ultrasound images can be integrated with mammographic representations to build a multimodal diagnostic framework inspired by recent foundational models for breast cancer screening.
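The notebook leaves the integration method open. One simple option, shown here only as an assumption, is late fusion. Each modality is embedded separately, and the embeddings are L2-normalized so neither dominates by scale. They are then concatenated and fed to a classification head. The embedding sizes, the random stand-in embeddings, and the untrained linear head are all hypothetical. Attention-based cross-modal fusion would be a different design.

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    """Scale each row to (approximately) unit Euclidean norm."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def late_fusion(mammo_emb, us_emb):
    """Concatenate L2-normalized per-modality embeddings into one joint vector."""
    return np.concatenate([l2_normalize(mammo_emb), l2_normalize(us_emb)], axis=-1)

rng = np.random.default_rng(0)
mammo = rng.normal(size=(4, 128))  # hypothetical mammography embeddings (batch of 4)
us = rng.normal(size=(4, 64))      # hypothetical ultrasound embeddings (real or synthetic)
fused = late_fusion(mammo, us)     # joint representation, shape (4, 192)

W = rng.normal(scale=0.01, size=(fused.shape[1], 2))  # untrained head: benign vs malignant
logits = fused @ W
print(fused.shape, logits.shape)
```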

This notebook serves as the starting point of the pipeline.
We begin by importing all necessary libraries and loading the datasets from Google Drive.

In [16]:
# --- 1. Core Libraries & File System ---
import os
import glob
import logging
from pathlib import Path

# --- 2. Data Handling ---
import pandas as pd
import numpy as np

# --- 3. Image Processing & DICOM ---
import SimpleITK as sitk  # Used by PyRadiomics
import pydicom             # For standard inspection of DICOM files

# --- 4. Radiomics ---
from radiomics import featureextractor, setVerbosity

# --- 5. Deep Learning (PyTorch) ---
# Imported now for use in Phases 2, 3 and 4
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

# --- 6. Transformers (Hugging Face) ---
# For the ViT, Swin and MMT models (Phase 3)
# pip install transformers
try:
    import transformers
except ImportError:
    print("Transformers library not found. Install with: pip install transformers")

# --- 7. Utilities & Plotting ---
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

# --- 8. Google Colab integration ---
from google.colab import drive

# --- Setup Logging ---
# Set PyRadiomics verbosity to CRITICAL to suppress excessive output
setVerbosity(logging.CRITICAL)

print("All project libraries have been imported.")
print(f"PyTorch version: {torch.__version__}")



All project libraries have been imported.
PyTorch version: 2.6.0+cu124
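The `try/except` around `transformers` only covers one optional package. The sketch below checks every top-level module imported above without actually importing it, using `importlib.util.find_spec`. The module list is copied from the cell, and `missing_modules` is a hypothetical helper. Some pip package names differ from module names: PyRadiomics, for example, is installed as `pyradiomics` but imported as `radiomics`.

```python
import importlib.util

# Top-level modules imported in the cell above (pip names may differ).
REQUIRED = ["pandas", "numpy", "SimpleITK", "pydicom", "radiomics",
            "torch", "torchvision", "transformers", "tqdm", "matplotlib"]

def missing_modules(names):
    """Return the module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```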


In [15]:
# ============================
# 📁 MOUNT GOOGLE DRIVE
# ============================

drive.mount('/content/drive')

# Base path for datasets
DATASET_ROOT = "/content/drive/MyDrive/datasets"

# Paths for each dataset
CBIS_PATH = os.path.join(DATASET_ROOT, "CBIS-DDSM")
BUSI_PATH = os.path.join(DATASET_ROOT, "BUSI")
INBREAST_PATH = os.path.join(DATASET_ROOT, "INBreast")

print("✔ Google Drive mounted.")
print("✔ Dataset paths:")
print("   CBIS-DDSM  →", CBIS_PATH)
print("   BUSI       →", BUSI_PATH)
print("   INBreast   →", INBREAST_PATH)

# ============================
# 📂 CHECK CONTENTS
# ============================
print("\n🔍 Checking dataset folders...")

for path in [CBIS_PATH, BUSI_PATH, INBREAST_PATH]:
    if os.path.exists(path):
        print(f"✔ Found: {path}")
        # The recursive glob matches both files and sub-folders, so this is an entry count
        print("  Entries:", len(glob.glob(os.path.join(path, "**", "*"), recursive=True)))
    else:
        print(f"❌ NOT FOUND: {path}. Please verify your Google Drive structure.")



Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✔ Google Drive mounted.
✔ Dataset paths:
   CBIS-DDSM  → /content/drive/MyDrive/datasets/CBIS-DDSM
   BUSI       → /content/drive/MyDrive/datasets/BUSI
   INBreast   → /content/drive/MyDrive/datasets/INBreast

🔍 Checking dataset folders...
❌ NOT FOUND: /content/drive/MyDrive/datasets/CBIS-DDSM. Please verify your Google Drive structure.
✔ Found: /content/drive/MyDrive/datasets/BUSI
  Entries: 1581
❌ NOT FOUND: /content/drive/MyDrive/datasets/INBreast. Please verify your Google Drive structure.
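The folder check counts files and sub-folders together. Because BUSI is the only dataset found so far, a per-class inventory of images and masks is a more informative sanity check. The sketch below assumes the commonly distributed BUSI layout: one folder per class (`benign`, `malignant`, `normal`), each containing images such as `benign (1).png` plus one or more masks such as `benign (1)_mask.png`. If the Drive copy is organized differently, the `_mask` test and the folder-as-label rule will need adjusting. `busi_inventory` is a hypothetical helper, and the temporary-directory demo only exercises it on dummy files.

```python
import tempfile
from collections import Counter
from pathlib import Path

def busi_inventory(root):
    """Count images and masks per class folder below `root`.

    Assumes <class>/<name>.png images and <class>/<name>_mask*.png masks;
    the parent folder name is used as the class label. Directories are skipped.
    """
    images, masks = Counter(), Counter()
    for p in Path(root).rglob("*.png"):
        if not p.is_file():
            continue
        label = p.parent.name
        if "_mask" in p.stem:
            masks[label] += 1
        else:
            images[label] += 1
    return images, masks

# Demo on dummy files that follow the assumed BUSI naming pattern.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["benign/benign (1).png", "benign/benign (1)_mask.png",
                "malignant/malignant (1).png", "malignant/malignant (1)_mask.png",
                "malignant/malignant (1)_mask_1.png"]:
        f = Path(tmp, rel)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    demo_images, demo_masks = busi_inventory(tmp)
print(dict(demo_images), dict(demo_masks))

# On Colab, using the same path as BUSI_PATH in the cell above.
BUSI_ROOT = "/content/drive/MyDrive/datasets/BUSI"
if Path(BUSI_ROOT).exists():
    images, masks = busi_inventory(BUSI_ROOT)
    for label in sorted(images):
        print(f"{label:>10}: {images[label]} images, {masks[label]} masks")
```

Counting masks separately also shows which images carry more than one annotation. That matters later when images and masks are paired for training.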
