This project began as a replication study of the "ScarNet" paper, which reported 92.5% accuracy in classifying acne scars. During the study, significant methodological challenges were identified, including reproducibility issues and a high risk of data leakage, which likely inflated the original results.
In response, a superior solution was engineered using Transfer Learning with a fine-tuned ResNet18. The final strategy involved merging visually similar classes, applying advanced RandAugment data augmentation, and implementing a 2-stage progressive fine-tuning process.
To ensure a scientifically sound measure of performance, the methodology was subjected to K-Fold Cross-Validation. This rigorous validation yielded a final, honest performance metric of 64.4% ± 5.85% average accuracy, which represents the true expected performance of the model on this challenging, limited dataset. A single training run produced a particularly effective model that achieved 86% accuracy on its dedicated test set, and it is this model that is provided in the repository.
- Core Technique: ResNet18 with Transfer Learning.
- Robust Performance Metric: 64.4% ± 5.85% (5-Fold Cross-Validation Average Accuracy).
- Peak Performance (Provided Model): 86% Accuracy on its clean test set.
- Data Strategy: A 4-class problem, strategically merging the 'Rolling' and 'Boxcar' classes due to their high visual similarity.
- Key Methods (a combined sketch follows this list):
  - Rigorous evaluation using K-Fold cross-validation.
  - A two-stage progressive fine-tuning process.
  - Advanced data augmentation with `RandAugment`.
  - Handling of class imbalance with `WeightedRandomSampler`.
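A condensed sketch of how these pieces fit together in PyTorch is shown below. It is illustrative only: the image size, batch size, and dataset path are assumptions, not necessarily the values used in train_scarnet.py.

```python
# Illustrative sketch (not the repository's exact code): pretrained ResNet18
# with a new 4-class head, RandAugment, and a WeightedRandomSampler.
import torch
from torch import nn
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, models, transforms

NUM_CLASSES = 4  # 'Rolling' and 'Boxcar' merged into a single class

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),               # assumed input size
    transforms.RandAugment(),                    # advanced augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet statistics
                         [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("dataset", transform=train_tf)

# Counter class imbalance: weight each sample by the inverse
# frequency of its class, then sample with replacement.
counts = torch.bincount(torch.tensor(train_set.targets))
sample_weights = (1.0 / counts.float())[train_set.targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights))
train_loader = DataLoader(train_set, batch_size=16, sampler=sampler)

# Transfer learning: ImageNet-pretrained backbone, new classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```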
```
/
|-- dataset/
|-- resnet_4class_randaugment_final_model.pth  <-- A trained model from a single split
|-- train_scarnet.py                           <-- Script to train a single model
|-- cross_validation.py                        <-- Script for rigorous K-Fold Cross-Validation
|-- test_model.py                              <-- Script to evaluate a single model
|-- predict_single_image.py                    <-- Script to predict with a single model
|-- requirements.txt                           <-- Project dependencies
`-- README.md
```
It is recommended to use a virtual environment to run this project.
1. Clone the repository:

   ```bash
   git clone https://github.com/isaigm/scarnet
   cd scarnet
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv

   # On Windows
   .\venv\Scripts\activate

   # On macOS/Linux
   source venv/bin/activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
The recommended workflow is to first verify the robust performance metric using cross-validation, and then train a single model for inference tasks.
This is the most important script for rigorously evaluating the model's performance. It runs the entire training and testing process 5 times on different data splits and reports the average accuracy and standard deviation, providing the final, unbiased performance metric.
```bash
python cross_validation.py
```
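The pattern behind this script is a standard stratified K-Fold loop. A minimal sketch, assuming scikit-learn's StratifiedKFold and a hypothetical train_and_evaluate helper that stands in for the full training pipeline (cross_validation.py is the authoritative implementation):

```python
# K-Fold skeleton (sketch; see cross_validation.py for the actual code).
import numpy as np
from sklearn.model_selection import StratifiedKFold

def train_and_evaluate(train_idx, test_idx):
    """Hypothetical placeholder: train a fresh ResNet18 on train_idx,
    evaluate on test_idx, and return the test accuracy in percent."""
    raise NotImplementedError

def run_cross_validation(labels, k=5, seed=42):
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    accuracies = []
    # Stratified splits keep the class balance similar across folds.
    for fold, (train_idx, test_idx) in enumerate(
            skf.split(np.zeros(len(labels)), labels), start=1):
        acc = train_and_evaluate(train_idx, test_idx)
        accuracies.append(acc)
        print(f"Fold {fold}: {acc:.1f}%")
    print(f"Average: {np.mean(accuracies):.1f}% ± {np.std(accuracies):.2f}%")
```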
If you want to generate a single .pth model file for prediction tasks, use this script. It will run the training process on a single 80/20 split. Note that the final accuracy will vary depending on the random split, as demonstrated by the cross-validation results.

```bash
python train_scarnet.py
```
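The 2-stage progressive fine-tuning applied here follows the usual freeze-then-unfreeze pattern. A minimal sketch, with learning rates and epoch counts as illustrative assumptions rather than the script's actual values:

```python
# Two-stage progressive fine-tuning (sketch; hyperparameters are assumed,
# not taken from train_scarnet.py).
import torch
from torch import nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 4)

# Stage 1: freeze the pretrained backbone and train only the new head.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ... train the head for a few epochs ...

# Stage 2: unfreeze the whole network and fine-tune end-to-end at a
# much lower rate to avoid destroying the pretrained features.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
# ... continue training ...
```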
To analyze the performance of a saved model (like the one generated by train_scarnet.py) on all 250 images, run:

```bash
python test_model.py
```

This is useful for generating a global confusion matrix and visualizing specific predictions.
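A global confusion matrix can be assembled along these lines (a sketch assuming scikit-learn and matplotlib; test_model.py's actual reporting may differ):

```python
# Sketch: collect predictions over a loader and plot a confusion matrix.
import torch
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

@torch.no_grad()
def plot_confusion(model, loader, class_names):
    model.eval()
    y_true, y_pred = [], []
    for images, labels in loader:
        logits = model(images)
        y_true.extend(labels.tolist())
        y_pred.extend(logits.argmax(dim=1).tolist())
    cm = confusion_matrix(y_true, y_pred)
    ConfusionMatrixDisplay(cm, display_labels=class_names).plot()
    plt.show()
```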
To classify a new image with a trained model, use:
python predict_single_image.py --image "path/to/your/image.jpg"Example:
python predict_single_image.py --image "dataset/Ice Pick/ip(1).jpg"This project utilizes two levels of evaluation: a rigorous K-Fold Cross-Validation to determine the overall methodology's robustness, and a single-split evaluation for the specific model provided in this repository.
This project utilizes two levels of evaluation: a rigorous K-Fold Cross-Validation to determine the overall methodology's robustness, and a single-split evaluation for the specific model provided in this repository.

To obtain a reliable and unbiased measure of the methodology's true performance, a 5-Fold Cross-Validation was implemented. This process eliminates the "lucky split" bias by training and evaluating 5 separate models on different subsets of the data.
- Accuracy per Fold: [72.0%, 58.0%, 70.0%, 58.0%, 64.0%]
- Average Accuracy: 64.4%
- Standard Deviation: 5.85%
The final, scientifically rigorous result is 64.4% ± 5.85%. This is the most honest estimate of how this methodology is expected to perform on new, unseen data. The variance between folds confirms that the dataset is challenging and performance is sensitive to data distribution.
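For reference, these summary statistics follow directly from the per-fold accuracies above (the reported 5.85% corresponds to the population standard deviation):

```python
import numpy as np

folds = np.array([72.0, 58.0, 70.0, 58.0, 64.0])    # accuracy per fold (%)
print(f"{folds.mean():.1f}% ± {folds.std():.2f}%")  # -> 64.4% ± 5.85%
```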
The resnet_4class_randaugment_final_model.pth file in this repository was generated from a single run of the train_scarnet.py script. This specific run represents a favorable data split where the model performed exceptionally well.
- Accuracy on its Test Set (50 images): 86.0%
- Accuracy on the Full Dataset (250 images, via test_model.py): 84.4%
This demonstrates the peak performance achieved by the methodology and provides a useful model for inference, while the cross-validation result above represents the more conservative and realistic performance expectation.
- Irreproducibility of the "ScarNet" Paper: The original paper's results are likely a product of data leakage, a common pitfall that leads to inflated and non-generalizable results.
- Superiority of Transfer Learning: Transfer Learning on the standard RGB color space proved a fundamentally superior strategy to training a custom CNN from scratch on this limited dataset.
- Importance of Rigorous Validation: This study underscores that a single train/test split can be misleading. K-Fold Cross-Validation provides a much more robust and realistic measure of a model's true capabilities.
- Fundamental Limitation of the Problem Formulation: The core challenge is not just the model, but the problem itself. A single image can contain multiple scar types. Forcing a model to assign one label to a multi-label image creates ambiguity and limits performance. This explains why the validated accuracy plateaus around 64.4% and why recent research has shifted towards object detection models to identify each scar individually.
This project is distributed under the MIT License.