This model groups or removes duplicate and near-duplicate images, producing either groups of similar images or a directory that contains only unique images.
Inspired by the siamese model, it differs from a vanilla siamese model in the following ways:
- single model instead of twin
- efficientnet-b0
- mish activation
- semi-hard triplet loss
- tqdm
- numpy
- opencv
- onnxruntime-gpu (opset 11 or above)
Not necessary at this moment.
Run main.py. Arguments are:
command (required):
images, video or videos.
images: your input is images
video: your input is a single video
videos: your input is multiple videos
e.g. python3 main.py videos -ins ...
--ins:
default = ./demo/inputs/
input path to your images/video, e.g. /path/to/dir/input/
--outs:
default = ./demo/outputs/
directory to copy/move similar images, e.g. /path/to/dir/output/
--size:
default = 224
input image size for the model
--model:
default = ./model/model.onnx
path to the siamese model
--thres:
default = 0.75
range 0-1. The lower the number, the less strict the model, and the more images are treated as similar
--mode:
IMAGE COMMAND ONLY!
default = group_copy
group_copy, group_move, clean_move or clean_delete.
group: your aim is to group all the similar images together, including the anchor image
clean: your aim is to clean the dataset (the input path), resulting in a dataset with unique images (the image with the largest file size is kept)
copy: make a copy of similar images; does not change the dataset in the input path
move: move similar images from the input path to the output path
delete: keep the image with the largest file size from each group and delete the other images. Use with a threshold of at least 0.9
--crop:
VIDEO COMMAND ONLY!
default = 0.85
range 0-1. Central crop: the fraction of the frame area that is kept around the center. Applied before resizing
--skip:
VIDEO COMMAND ONLY!
default = 10
number of frames to skip between processed frames; the current frame is processed each time n frames have been skipped
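The two video options above can be sketched as follows. This is a minimal illustration, not the code in main.py: the function names are hypothetical, the crop is assumed to keep the same aspect ratio (so each side is scaled by the square root of the area fraction), and skipping is assumed to mean "process one frame, then skip the next n". The exact behaviour of main.py may differ.

```python
import math

import numpy as np


def central_crop(frame: np.ndarray, area_fraction: float) -> np.ndarray:
    """Keep the centered region covering `area_fraction` of the frame area.

    Hypothetical helper: assumes the aspect ratio is preserved.
    """
    if not 0.0 < area_fraction <= 1.0:
        raise ValueError("area_fraction must be in (0, 1]")
    height, width = frame.shape[:2]
    # Scaling both sides by sqrt(fraction) scales the area by the fraction.
    scale = math.sqrt(area_fraction)
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    return frame[top:top + new_h, left:left + new_w]


def sampled_frame_indices(total_frames: int, skip: int) -> list:
    """Indices of the frames to process when `skip` frames are skipped
    between two processed frames (assumed interpretation of --skip)."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    return list(range(0, total_frames, skip + 1))
```

Under these assumptions, a crop of 0.85 keeps about 92% of each side, and a skip of 10 processes frames 0, 11, 22, and so on.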
To help you set a suitable threshold for your task, here is a short guideline:
setting threshold (--thres):
0.95 (recommended):
You define similarity very strictly: only images with a tiny geometric or quality difference are considered similar
Case: false alert filtering, cleaning a valuable dataset
Pros: images in every group are indeed similar, fewer similar images scattered to other groups
Cons: sometimes undergroups, because the score is slightly lower than 0.95
0.85 (recommended):
A well-balanced threshold, default threshold
Case: cleaning a "not so valuable" dataset such as scraped images and site videos
Pros: balanced threshold, a good point to start tuning the threshold for your task
Cons: tuning may be needed as it may not satisfy your project needs
0.75:
A loose threshold; distinct objects on large white backgrounds are sometimes considered similar
Case: cleaning a highly duplicated dataset when you only want a few representative images
Pros: fewer groups to deal with as a result, saves the most disk space after cleaning
Cons: images within a group are sometimes not similar at all, more similar images scattered to other groups
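To make the effect of the threshold concrete, here is a minimal sketch of anchor-based grouping on image embeddings. It is an illustration only and not the code in main.py: it assumes the similarity score is the cosine similarity between embedding vectors, and the function name `group_by_threshold` is hypothetical.

```python
import numpy as np


def group_by_threshold(embeddings: np.ndarray, thres: float) -> list:
    """Greedy grouping: each ungrouped image becomes an anchor and collects
    every other ungrouped image whose cosine similarity is >= thres."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # L2-normalise so that a dot product equals the cosine similarity.
    unit = embeddings / np.clip(norms, 1e-12, None)
    assigned = np.zeros(len(unit), dtype=bool)
    groups = []
    for anchor in range(len(unit)):
        if assigned[anchor]:
            continue
        scores = unit @ unit[anchor]
        scores[anchor] = 1.0  # the anchor always belongs to its own group
        members = np.flatnonzero((scores >= thres) & ~assigned)
        assigned[members] = True
        groups.append(members.tolist())
    return groups
```

With a stricter threshold the same embeddings split into more, smaller groups; with a looser one, groups merge and may contain images that are only loosely related.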
setting model input size (--size):
None:
auto mode, input size will be dynamic according to the original image size
224:
for any images that are larger than 112x112, such as 720p
112:
for any images that are smaller than 112x112, such as 30x60
160:
generally either 224 or 112 is good for your task. You can try 160 if you want to experiment with a balance between 224 and 112.
command line examples:
clean valuable dataset:
python3 main.py images -ins /path/to/input -thres 0.95 -mode clean_delete
group similar photos but keeping the original copy:
python3 main.py images -ins /path/to/input -outs /path/to/output -thres 0.85 -mode group_copy
get interesting frames from a CCTV video:
python3 main.py video -ins /path/to/input -outs /path/to/output -thres 0.85 -crop 0.85 -skip 10
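For the clean modes, the rule "keep the image with the largest file size" can be sketched as below. This is a hedged illustration rather than the logic in main.py: the helper name `split_keep_and_discard` and its `size_of` parameter are hypothetical, and how ties are broken in the real tool is not known (here the first largest file wins).

```python
import os


def split_keep_and_discard(group, size_of=os.path.getsize):
    """Return (path_to_keep, paths_to_discard) for one group of similar images.

    `size_of` maps a path to its size in bytes; it defaults to the size on
    disk and can be swapped out for testing.
    """
    if not group:
        raise ValueError("group must contain at least one path")
    keep = max(group, key=size_of)  # first largest file wins on ties
    discard = [path for path in group if path != keep]
    return keep, discard
```

In a clean_move style workflow the discarded paths would be moved to the output directory, and in a clean_delete style workflow they would be removed, which is why a strict threshold is advisable there.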
- video mode for single and multiple videos as inputs
- add central cropping and frame skipping to video mode
- dynamic model input size for user to control
- add n to n comparison method instead of k times 1 to n method
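As a rough illustration of what an n to n comparison can look like, the sketch below computes all pairwise similarities in one matrix product instead of comparing one anchor against the rest repeatedly. It assumes cosine similarity on embedding vectors; the function names are hypothetical and this is not necessarily how the repository implements it.

```python
import numpy as np


def pairwise_cosine(embeddings: np.ndarray) -> np.ndarray:
    """Return the n x n matrix of cosine similarities between all rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def similar_pairs(embeddings: np.ndarray, thres: float) -> list:
    """List every pair (i, j) with i < j whose similarity is >= thres."""
    sim = pairwise_cosine(embeddings)
    # Only the upper triangle is needed because the matrix is symmetric.
    rows, cols = np.triu_indices(len(sim), k=1)
    mask = sim[rows, cols] >= thres
    return list(zip(rows[mask].tolist(), cols[mask].tolist()))
```

The similarity matrix needs memory proportional to n squared, so for very large datasets it may need to be computed in chunks.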
loader.py, trainer.py and create_data.py are messy and unedited. Please edit before use, if you want to train your own model (replace efficientnet with any other models you want to use).