This project began as a replication study of the "ScarNet" paper, which reported 92.5% accuracy in classifying acne scars. During the study, significant methodological challenges were identified, including reproducibility issues and a high risk of data leakage, which likely inflated the original results.
In response, a superior solution was engineered using Transfer Learning with a fine-tuned ResNet18. The final strategy involved merging visually similar classes, applying advanced RandAugment data augmentation, and implementing a 2-stage progressive fine-tuning process.
To ensure a scientifically sound measure of performance, the methodology was subjected to K-Fold Cross-Validation. This rigorous validation yielded a final, honest performance metric of 64.4% ± 5.85% average accuracy, which represents the true expected performance of the model on this challenging, limited dataset. A single training run produced a particularly effective model that achieved 86% accuracy on its dedicated test set, and it is this model that is provided in the repository.
- Core Technique: ResNet18 with Transfer Learning.
- Robust Performance Metric: 64.4% ± 5.85% (5-Fold Cross-Validation Average Accuracy).
- Peak Performance (Provided Model): 86% Accuracy on its clean test set.
- Data Strategy: A 4-class problem, strategically merging the 'Rolling' and 'Boxcar' classes due to their high visual similarity.
- Key Methods (a combined sketch follows this list):
  - Rigorous evaluation using K-Fold cross-validation.
  - A two-stage progressive fine-tuning process.
  - Advanced data augmentation with `RandAugment`.
  - Handling of class imbalance with `WeightedRandomSampler`.
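A condensed sketch of how these pieces fit together in PyTorch is shown below. It is illustrative only: the image size, batch size, and dataset path are assumptions, not necessarily the values used in train_scarnet.py.

```python
# Illustrative sketch (not the repository's exact code): pretrained ResNet18
# with a new 4-class head, RandAugment, and a WeightedRandomSampler.
import torch
from torch import nn
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, models, transforms

NUM_CLASSES = 4  # 'Rolling' and 'Boxcar' merged into a single class

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),               # assumed input size
    transforms.RandAugment(),                    # advanced augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet statistics
                         [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("dataset", transform=train_tf)

# Counter class imbalance: weight each sample by the inverse
# frequency of its class, then sample with replacement.
counts = torch.bincount(torch.tensor(train_set.targets))
sample_weights = (1.0 / counts.float())[train_set.targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights))
train_loader = DataLoader(train_set, batch_size=16, sampler=sampler)

# Transfer learning: ImageNet-pretrained backbone, new classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```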
```
/
|-- dataset/
|-- resnet_4class_randaugment_final_model.pth  <-- A trained model from a single split
|-- train_scarnet.py                           <-- Script to train a single model
|-- cross_validation.py                        <-- Script for rigorous K-Fold Cross-Validation
|-- test_model.py                              <-- Script to evaluate a single model
|-- predict_single_image.py                    <-- Script to predict with a single model
|-- requirements.txt                           <-- Project dependencies
`-- README.md
```
It is recommended to use a virtual environment to run this project.
1. Clone the repository:

   ```bash
   git clone https://github.com/isaigm/scarnet
   cd scarnet
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv

   # On Windows
   .\venv\Scripts\activate

   # On macOS/Linux
   source venv/bin/activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
The recommended workflow is to first verify the robust performance metric using cross-validation, and then train a single model for inference tasks.
This is the most important script for rigorously evaluating the model's performance. It runs the entire training and testing process 5 times on different data splits and reports the average accuracy and standard deviation, providing the final, unbiased performance metric.
```bash
python cross_validation.py
```
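The pattern behind this script is a standard stratified K-Fold loop. A minimal sketch, assuming scikit-learn's StratifiedKFold and a hypothetical train_and_evaluate helper that stands in for the full training pipeline (cross_validation.py is the authoritative implementation):

```python
# K-Fold skeleton (sketch; see cross_validation.py for the actual code).
import numpy as np
from sklearn.model_selection import StratifiedKFold

def train_and_evaluate(train_idx, test_idx):
    """Hypothetical placeholder: train a fresh ResNet18 on train_idx,
    evaluate on test_idx, and return the test accuracy in percent."""
    raise NotImplementedError

def run_cross_validation(labels, k=5, seed=42):
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    accuracies = []
    # Stratified splits keep the class balance similar across folds.
    for fold, (train_idx, test_idx) in enumerate(
            skf.split(np.zeros(len(labels)), labels), start=1):
        acc = train_and_evaluate(train_idx, test_idx)
        accuracies.append(acc)
        print(f"Fold {fold}: {acc:.1f}%")
    print(f"Average: {np.mean(accuracies):.1f}% ± {np.std(accuracies):.2f}%")
```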
If you want to generate a single .pth model file for prediction tasks, use this script. It will run the training process on a single 80/20 split. Note that the final accuracy will vary depending on the random split, as demonstrated by the cross-validation results.

```bash
python train_scarnet.py
```
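The 2-stage progressive fine-tuning applied here follows the usual freeze-then-unfreeze pattern. A minimal sketch, with learning rates and epoch counts as illustrative assumptions rather than the script's actual values:

```python
# Two-stage progressive fine-tuning (sketch; hyperparameters are assumed,
# not taken from train_scarnet.py).
import torch
from torch import nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 4)

# Stage 1: freeze the pretrained backbone and train only the new head.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ... train the head for a few epochs ...

# Stage 2: unfreeze the whole network and fine-tune end-to-end at a
# much lower rate to avoid destroying the pretrained features.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
# ... continue training ...
```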
To analyze the performance of a saved model (like the one generated by train_scarnet.py) on all 250 images, run:

```bash
python test_model.py
```

This is useful for generating a global confusion matrix and visualizing specific predictions.
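A global confusion matrix can be assembled along these lines (a sketch assuming scikit-learn and matplotlib; test_model.py's actual reporting may differ):

```python
# Sketch: collect predictions over a loader and plot a confusion matrix.
import torch
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

@torch.no_grad()
def plot_confusion(model, loader, class_names):
    model.eval()
    y_true, y_pred = [], []
    for images, labels in loader:
        logits = model(images)
        y_true.extend(labels.tolist())
        y_pred.extend(logits.argmax(dim=1).tolist())
    cm = confusion_matrix(y_true, y_pred)
    ConfusionMatrixDisplay(cm, display_labels=class_names).plot()
    plt.show()
```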
To classify a new image with a trained model, use:
python predict_single_image.py --image "path/to/your/image.jpg"Example:
python predict_single_image.py --image "dataset/Ice Pick/ip(1).jpg"This project utilizes two levels of evaluation: a rigorous K-Fold Cross-Validation to determine the overall methodology's robustness, and a single-split evaluation for the specific model provided in this repository.
This project utilizes two levels of evaluation: a rigorous K-Fold Cross-Validation to determine the overall methodology's robustness, and a single-split evaluation for the specific model provided in this repository.

To obtain a reliable and unbiased measure of the methodology's true performance, a 5-Fold Cross-Validation was implemented. This process eliminates the "lucky split" bias by training and evaluating 5 separate models on different subsets of the data.
- Accuracy per Fold: [72.0%, 58.0%, 70.0%, 58.0%, 64.0%]
- Average Accuracy: 64.4%
- Standard Deviation: 5.85%
The final, scientifically rigorous result is 64.4% ± 5.85%. This is the most honest estimate of how this methodology is expected to perform on new, unseen data. The variance between folds confirms that the dataset is challenging and performance is sensitive to data distribution.
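For reference, these summary statistics follow directly from the per-fold accuracies above (the reported 5.85% corresponds to the population standard deviation):

```python
import numpy as np

folds = np.array([72.0, 58.0, 70.0, 58.0, 64.0])    # accuracy per fold (%)
print(f"{folds.mean():.1f}% ± {folds.std():.2f}%")  # -> 64.4% ± 5.85%
```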
The resnet_4class_randaugment_final_model.pth file in this repository was generated from a single run of the train_scarnet.py script. This specific run represents a favorable data split where the model performed exceptionally well.
- Accuracy on its Test Set (50 images): 86.0%
- Accuracy on the Full Dataset (250 images, via test_model.py): 84.4%
This demonstrates the peak performance achieved by the methodology and provides a useful model for inference, while the cross-validation result above represents the more conservative and realistic performance expectation.
- Irreproducibility of the "ScarNet" Paper: The original paper's results are likely a product of data leakage, a common pitfall that leads to inflated and non-generalizable results.
- Superiority of Transfer Learning: Transfer Learning on the standard RGB color space proved a fundamentally superior strategy to training a custom CNN from scratch on this limited dataset.
- Importance of Rigorous Validation: This study underscores that a single train/test split can be misleading. K-Fold Cross-Validation provides a much more robust and realistic measure of a model's true capabilities.
- Fundamental Limitation of the Problem Formulation: The core challenge is not just the model, but the problem itself. A single image can contain multiple scar types. Forcing a model to assign one label to a multi-label image creates ambiguity and limits performance. This explains why the validated accuracy plateaus around 64.4% and why recent research has shifted towards object detection models to identify each scar individually.
This project is distributed under the MIT License.