# PoroDet workflow for running on a local machine
This notebook guides you through the complete PoroDet workflow on your local machine. It uses pop-up windows (GUI dialogs) to make selecting files and folders easy. We recommend running this notebook in **VS Code (Visual Studio Code)**. Download and install VS Code here: https://code.visualstudio.com/download

**Step 1: Install the PoroDet package**  
Install PoroDet from its GitHub repository using a terminal (VS Code provides a built-in terminal).  
**Windows:**  
Run the following command in the terminal:  
C:\users\home> pip install git+https://github.com/Deep7285/Porodet.git  
 
**Linux:**  
1. Create a conda environment. See how to create one here: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
2. Activate the environment, then run the following command in the terminal:  
(base) user1@my_pc:~$ conda activate my_new_env  
(my_new_env) user1@my_pc:~$ pip install git+https://github.com/Deep7285/Porodet.git
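
After installing, you can confirm that the package is visible to the Python environment (kernel) the notebook runs in. A minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical and not part of PoroDet:

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None


# Run this in the same environment/kernel that the notebook uses
if is_installed("porodet"):
    print("porodet is installed in this environment.")
else:
    print("porodet not found: re-run the pip install command in this environment.")
```

If the check fails inside VS Code, the notebook is usually running on a different interpreter than the terminal where you ran `pip`; select the matching kernel and try again.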


**Step 2: Import & Setup**  
Run the following code cell to load the package and set up its dependencies.  

**Note: If you have already trained a model, run this cell and then jump directly to the detection step (Step 6).**

In [1]:
import os
import porodet
import tkinter as tk
from tkinter import filedialog, messagebox

# Helper to hide the root tkinter window
def get_gui_root():
    root = tk.Tk()
    root.withdraw() 
    root.attributes('-topmost', True) # Bring the dialog to the front
    return root

print(f"PoroDet Package Loaded (Version: {porodet.__version__})")
print("Ready for the next command...")



PoroDet Package Loaded (Version: 0.1.0)
Ready for the next command...


**Step 3: Augmented data checkpoint**  
The following cell helps you avoid generating duplicate augmented datasets. It asks whether you already have augmented data and, if so, lets you select that folder. Note that it only records the folder you pick; it does not inspect the folder's contents.  
1. If you already have augmented data, select its folder and proceed to model training (Step 5).  
2. If not, run the data augmentation cell (Step 4).  
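
Since the checkpoint cell only records the folder you select, a quick sanity check on its contents can catch a wrong selection. A minimal sketch, assuming augmented images and their `_mask.png` masks sit side by side in one flat folder (the actual layout written by `porodet.augmentation` may differ). `count_image_mask_pairs` and `looks_augmented` are hypothetical helpers, and the set of image extensions is an assumption:

```python
from pathlib import Path
from typing import Tuple

# Assumed image extensions; adjust to the formats in your dataset
IMAGE_SUFFIXES = {".tif", ".tiff", ".png"}


def count_image_mask_pairs(folder: str) -> Tuple[int, int]:
    """Count image files and '_mask.png' files directly inside `folder` (hypothetical helper)."""
    n_images = n_masks = 0
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # ignore subfolders, e.g. training output folders
        if path.name.endswith("_mask.png"):
            n_masks += 1
        elif path.suffix.lower() in IMAGE_SUFFIXES:
            n_images += 1
    return n_images, n_masks


def looks_augmented(folder: str, min_images: int = 1) -> bool:
    """Heuristic: at least `min_images` images and as many masks as images."""
    n_images, n_masks = count_image_mask_pairs(folder)
    return n_images >= min_images and n_images == n_masks


# Example: check the folder chosen in the checkpoint cell (augmented_data_dir)
# if augmented_data_dir and not looks_augmented(augmented_data_dir):
#     print("Selected folder does not look like an augmented dataset.")
```

This only compares counts; it does not verify that each mask belongs to a specific image.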

In [None]:
# Initialize the variable to track if augmentation is needed
augment_needed = True
augmented_data_dir = None

# Ask user if they already have augmented data
root = get_gui_root()
has_data = messagebox.askyesno("Data Check", "Do you already have augmented data in a folder?")
root.destroy()

if has_data:
    print("Please select your existing Augmented Data folder")
    root = get_gui_root()
    augmented_data_dir = filedialog.askdirectory(title="Select Augmented Data Folder")
    root.destroy()
    
    if augmented_data_dir:
        print(f"Data Exists in: {augmented_data_dir}")
        print("Skipping Augmentation Step. Users can proceed to the model training step.")
        augment_needed = False
    else:
        print("No folder selected. You might need to run Augmentation.")
else:
    print("No augmented data found.")
    print("Please run the Step 4: Data Augmentation cell.")

Please select your existing Augmented Data folder
Data Exists in: C:/Users/deepa/Downloads/augmented_images_20260214_144918
Skipping Augmentation Step. Users can proceed to the model training step.


**Step 4: Data Augmentation (run if needed)**  
This cell asks you to select the folder where you saved the raw TEM images and their binary masks.  
**Name the raw files consistently so that each image can be matched with its mask; this helps when generating the augmented dataset. A sample dataset is available in "PoroDet/Sample_Dataset/" in the package repository.**  
1. Select your raw images folder (containing `.tif` images and `_mask.png` masks).  
2. The cell creates a new folder containing augmented versions of the raw dataset.  
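
Because augmentation relies on matching each image with its mask, it can help to check the pairing before running the cell. A minimal sketch, assuming the convention `<name>.tif` paired with `<name>_mask.png` (inferred from the file types listed above; check the sample dataset for the exact naming). `pair_images_with_masks` is a hypothetical helper, not part of PoroDet. Note that `Path.glob("*.tif")` is case-sensitive on Linux, so `.TIF` or `.tiff` files would need extra patterns:

```python
from pathlib import Path
from typing import Dict, List, Tuple


def pair_images_with_masks(folder: str) -> Tuple[Dict[str, Tuple[Path, Path]], List[str]]:
    """Match each '<stem>.tif' with '<stem>_mask.png' (assumed naming convention).

    Returns (pairs, unmatched): pairs maps stem -> (image_path, mask_path);
    unmatched lists image stems that have no matching mask.
    """
    root = Path(folder)
    pairs: Dict[str, Tuple[Path, Path]] = {}
    unmatched: List[str] = []
    for image_path in sorted(root.glob("*.tif")):
        mask_path = root / f"{image_path.stem}_mask.png"
        if mask_path.is_file():
            pairs[image_path.stem] = (image_path, mask_path)
        else:
            unmatched.append(image_path.stem)
    return pairs, unmatched


# Example (hypothetical path):
# pairs, unmatched = pair_images_with_masks("C:/path/to/raw_images")
# print(f"{len(pairs)} matched pairs; images without masks: {unmatched}")
```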

In [5]:
if augment_needed:
    print("Initializing Data Augmentation")
    print("Please select your RAW images files folder.")
    
    # Import the main GUI function directly from the module
    from porodet.augmentation import main as run_augmentation_gui
    
    # Run the tool
    run_augmentation_gui()
    
    print("\n Augmentation Complete.")
    print("Check the newly created output folder; its name usually starts with 'augmented_images_'.")
    print("Run Step 3 again and select this new folder as your training dataset.")
else:
    print("Skipping Data Augmentation (Data already provided).")

Initializing Data Augmentation
Please select your RAW images files folder.
TEM Image Augmentation
Waiting for folder selection dialog to appear
If no dialog appears, check your taskbar or behind other windows.
Dialog will timeout in 15 seconds if not used.
You Selected directory: C:/Users/deepa/Downloads/100_Res
10 augmentations per image will be created
Found 26 raw images
Generating 10 augmentations for each image


Processing Images: 100%|██████████| 26/26 [01:15<00:00,  2.89s/it]

Augmentation complete. Files saved to: C:/Users/deepa/Downloads\augmented_images_20260214_144918

 Augmentation Complete.
Check the newly created output folder; its name usually starts with 'augmented_images_'.
Run Step 3 again and select this new folder as your training dataset.

**Step 5: Model Training**  
1. The following code cell trains the U-Net model from scratch.  
2. You can adjust the hyperparameters for training and fine-tuning at the top of the cell. Choose values that fit your system (especially GPU/CPU memory) to avoid kernel crashes.  
3. After adjusting the hyperparameters, run the cell. A pop-up window asks you to select the training dataset folder (select the `augmented_images_...` folder).  
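
The training cell splits the data by *original* image rather than by individual file: it runs scikit-learn's `KFold` over the list of originals (grouped, judging by its name, by `get_original_and_augmented_groups`), so all augmented copies of an image stay in the same fold and validation images are never near-duplicates of training images. A dependency-free sketch of the same group-wise K-fold idea follows; it will not reproduce the exact fold assignment of `KFold(shuffle=True, random_state=42)`, and `group_kfold_splits` and the example file names are hypothetical:

```python
import random
from typing import Dict, List, Sequence, Tuple


def group_kfold_splits(originals: Sequence[str], n_folds: int,
                       seed: int = 42) -> List[Tuple[List[str], List[str]]]:
    """Split original-image names into K (train, val) folds.

    Splitting on originals, not on augmented files, keeps every augmented
    copy of an image on the same side of the split (no train/val leakage).
    """
    if n_folds < 2 or n_folds > len(originals):
        raise ValueError("need 2 <= n_folds <= number of originals")
    names = list(originals)
    random.Random(seed).shuffle(names)  # reproducible shuffle
    # Deal names into n_folds chunks whose sizes differ by at most one
    folds = [names[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val = sorted(folds[k])
        train = sorted(name for j, fold in enumerate(folds) if j != k for name in fold)
        splits.append((train, val))
    return splits


# Example with hypothetical file names: 6 originals, each with 3 augmented copies
groups: Dict[str, List[str]] = {
    f"Image_{i}": [f"Image_{i}_aug{j}.tif" for j in range(3)] for i in range(1, 7)
}
for fold_id, (train, val) in enumerate(group_kfold_splits(list(groups), n_folds=2), start=1):
    train_files = [f for name in train for f in groups[name]]
    val_files = [f for name in val for f in groups[name]]
    print(f"Fold {fold_id}: {len(train_files)} train files, {len(val_files)} val files")
```

This is also why the cell refuses to run when there are fewer original images than `FOLDS`: each fold needs at least one original for validation.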

In [None]:
import os
import csv
import torch
import tkinter as tk
from tkinter import filedialog
from datetime import datetime
from sklearn.model_selection import KFold

# Import the internal tools directly
from porodet.training import train_nanopore_detector, get_original_and_augmented_groups

# Adjust the hyperparameters here as needed
BATCH_SIZE = 1          # Batch size (keep it low if you get memory errors or kernel crashes)
EPOCHS = 10             # More epochs can improve training but take longer
LEARNING_RATE = 1e-5    # Learning rate
FOLDS = 2               # Number of K-fold cross-validation folds (standard is 3)
PATIENCE = 5            # Early stopping: epochs to wait for the loss to improve
RESIZE_TO = (512, 512)  # Lower resolution saves memory (default is (1024, 1024)).
                        # If your system allows, increase this back to (1024, 1024) for better results; it requires more GPU memory.

# Select the data directory (containing the augmented data)
print("Initializing Training...")
root = tk.Tk()
root.withdraw()
root.attributes('-topmost', True)
print("Select the training set folder...")
data_dir = filedialog.askdirectory(title="Select Training Directory")
root.destroy()

if not data_dir:
    print("No directory selected. Training cancelled.")
else:
    # Output directory with a timestamp to avoid overwriting earlier runs
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    base_output_dir = os.path.join(data_dir, f'nanopore_model_{timestamp}_custom')
    os.makedirs(base_output_dir, exist_ok=True)

    print(f"\n Starting training with custom parameters:")
    print(f"   - Batch Size: {BATCH_SIZE}")
    print(f"   - Epochs: {EPOCHS}")
    print(f"   - Folds: {FOLDS}")
    print(f"   - Image Size: {RESIZE_TO}")
    print(f"   - Output: {base_output_dir}")

    # Prepare data groups: each original image with its augmented copies
    original_groups = get_original_and_augmented_groups(data_dir)
    original_images = list(original_groups.keys())

    if len(original_images) < FOLDS:
        print(f"Error: Not enough original images ({len(original_images)}) for {FOLDS} folds.")
    else:
        # Run K-fold cross-validation over the original images
        kf = KFold(n_splits=FOLDS, shuffle=True, random_state=42)
        fold_histories = []

        for fold_id, (train_idx, val_idx) in enumerate(kf.split(original_images), start=1):
            train_originals = [original_images[i] for i in train_idx]
            val_originals = [original_images[i] for i in val_idx]

            fold_output_dir = os.path.join(base_output_dir, f'fold_{fold_id}')
            os.makedirs(fold_output_dir, exist_ok=True)

            print(f"\nStarting Fold {fold_id}/{FOLDS}")

            # Call the training function with the hyperparameters defined above
            model, history = train_nanopore_detector(
                data_dir=data_dir,
                output_dir=fold_output_dir,
                train_originals=train_originals,
                val_originals=val_originals,
                batch_size=BATCH_SIZE,
                epochs=EPOCHS,
                learning_rate=LEARNING_RATE,
                patience=PATIENCE,
                weight_decay=1e-4,      # Keep the default or change
                prob_threshold=0.5,     # Higher = stricter detection (fewer pixels classed as pores)
                resize_to=RESIZE_TO,
                fold_id=fold_id,
            )

            best_loss = min(history["val_loss"]) if len(history["val_loss"]) > 0 else None
            fold_histories.append({"fold": fold_id, "best_val_loss": best_loss})

            # Free GPU memory before the next fold
            del model
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

        # Save a summary CSV with the best validation loss of each fold
        # (the model checkpoints themselves are saved by train_nanopore_detector)
        summary_csv = os.path.join(base_output_dir, "Model_training_summary.csv")
        with open(summary_csv, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["fold", "best_val_loss"])
            for fh in fold_histories:
                writer.writerow([fh["fold"], fh["best_val_loss"]])

        print(f"\n Model Training Complete. Training Summary saved to: {summary_csv}")

Initializing Training...
Select the training set folder...

 Starting training with custom parameters:
   - Batch Size: 1
   - Epochs: 10
   - Folds: 2
   - Image Size: (512, 512)
   - Output: C:/Users/deepa/Downloads/augmented_images_20260214_144918\nanopore_model_20260214_151437_custom

Starting Fold 1/2

Fold 1: Using device cpu
Fold 1: 13 originals for training, 13 for validation
Fold 1: Validation originals: ['Image_1', 'Image_10', 'Image_11', 'Image_14', 'Image_17', 'Image_18', 'Image_2', 'Image_20', 'Image_23', 'Image_25', 'Image_26', 'Image_7', 'Image_8']
Training set contains 143 images
Validation set contains 143 images

Starting training with verbose metrics to monitor progress
Early stopping patience: 5
Learning rate: 1e-05
Weight decay: 0.0001


Fold 1 | Epoch 1/10 [train]: 100%|██████████| 143/143 [11:07<00:00,  4.67s/it, loss=0.6205]


Fold 1 & Epoch 1: Train Loss 0.6981, Train Acc 0.5598 | Val Loss 0.6564, Val Acc 0.8864 | Val Prec 0.1342, Rec 0.6229, F1 0.2208, IoU 0.1241PR-AUC 0.14935351264237393, ROC-AUC 0.8257952284324483
Fold 1: Saved new best model (val loss 0.6564)


Fold 1 | Epoch 2/10 [train]: 100%|██████████| 143/143 [13:08<00:00,  5.52s/it, loss=0.5256] 


Fold 1 & Epoch 2: Train Loss 0.5556, Train Acc 0.9308 | Val Loss 0.4960, Val Acc 0.9592 | Val Prec 0.3002, Rec 0.4353, F1 0.3553, IoU 0.2161PR-AUC 0.25911860922376156, ROC-AUC 0.8769097544762908
Fold 1: Saved new best model (val loss 0.4960)


Fold 1 | Epoch 3/10 [train]: 100%|██████████| 143/143 [09:03<00:00,  3.80s/it, loss=0.4533]


Fold 1 & Epoch 3: Train Loss 0.4788, Train Acc 0.9712 | Val Loss 0.4213, Val Acc 0.9720 | Val Prec 0.4255, Rec 0.2411, F1 0.3078, IoU 0.1819PR-AUC 0.25083973091231615, ROC-AUC 0.8638905298627639
Fold 1: Saved new best model (val loss 0.4213)


Fold 1 | Epoch 4/10 [train]: 100%|██████████| 143/143 [09:20<00:00,  3.92s/it, loss=0.4506]


**Step 6: Nanoporosity Detection (Inference on New Images)**  
This step uses the trained model to detect nanopores in new images and generates a binary mask and a probability heatmap for each image.   
**You don't need to retrain the model every time; a trained model can be used directly for detection.**  
1. If you already have a trained model, run Step 2 and then jump directly to this step.  
2. Run the cell. A pop-up window asks you to select the trained model file (`.pth`); look in the training output folder (`nanopore_model_<timestamp>_custom`).  
3. After you select the model, another pop-up window asks you to select the new TEM image(s) for detection.  
4. The output is saved to a new folder.
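
The `prob_threshold=0.5` argument in the training cell suggests how a probability heatmap becomes a binary mask: each pixel's pore probability is compared with a threshold. A minimal NumPy sketch of that step, assuming probabilities in [0, 1], a `>=` comparison, and 255 for pore pixels (PoroDet's detector may use a different convention); `probability_to_mask` is hypothetical:

```python
import numpy as np


def probability_to_mask(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert a per-pixel pore probability map to a binary uint8 mask.

    Pixels with probability >= threshold become 255 (pore); all others become 0.
    """
    if prob.min() < 0 or prob.max() > 1:
        raise ValueError("probabilities must lie in [0, 1]")
    return np.where(prob >= threshold, 255, 0).astype(np.uint8)


# Example: a tiny 2x2 probability map
prob = np.array([[0.1, 0.7],
                 [0.5, 0.49]])
mask = probability_to_mask(prob)
print(mask)
```

Raising the threshold keeps only high-confidence pixels (fewer false positives, more missed pores); lowering it does the opposite.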

In [None]:
print("Initializing Detection...")
print("Select the trained model file (.pth).")
print("Select the new TEM image(s) for detection.")

# Import the detector GUI entry point
from porodet.detector import main as run_detector_gui

# Run it
run_detector_gui()

print("\nDetection Complete.")
print("Check the folder containing your image(s) for the results.")

Initializing Detection...
Select the trained model file (.pth).
Select the new TEM image(s) for detection.




**Step 7: Detailed Analysis**  
Calculates porosity, generates histograms, and classifies features as pores or cracks.  
1. A pop-up window asks you to select the mask image (`_mask.png`) generated in the detection step.  
2. A second pop-up window asks you to select an output folder for the report.  
3. The results are saved as CSV files and histogram plots.
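
In 2D, porosity is commonly estimated as the area fraction of pore pixels, $\phi = N_\text{pore} / N_\text{total}$. A minimal NumPy sketch, assuming nonzero mask pixels are pores; PoroDet's analyser may define porosity differently (for example, by excluding a scale-bar region). Both helpers are hypothetical, and `nm_per_pixel` must come from your image's scale bar:

```python
import numpy as np


def porosity_fraction(mask: np.ndarray) -> float:
    """Area fraction of pore pixels in a binary mask (nonzero = pore)."""
    if mask.size == 0:
        raise ValueError("empty mask")
    return float(np.count_nonzero(mask)) / mask.size


def pixels_to_area_nm2(n_pixels: int, nm_per_pixel: float) -> float:
    """Convert a pixel count to a physical area in nm^2 (each pixel covers nm_per_pixel^2)."""
    return n_pixels * nm_per_pixel ** 2


# Example: a 4x4 mask with two pore pixels
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0, :2] = 255
print(f"Porosity: {porosity_fraction(mask):.3f}")
```

To apply it to a `_mask.png` file, you could load the mask with Pillow, e.g. `np.array(Image.open(path).convert("L"))`.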

In [None]:
print("Initializing the Nanoporosity Analysis...")
print("Select the MASK file (_mask.png) generated in the detection step.")
print("Select an output folder to save the report.")

# Import the analyser GUI entry point
from porodet.analyser import main as run_analysis_gui

# Run it
run_analysis_gui()

print("\nAnalysis Complete. Check the output folder for the results.")