Solves mathematical captchas generated by the securimage PHP library using a CNN. Currently, it works on two-color captchas, in which the mathematical question and the noise can be easily separated from each other.
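Generate sample captchas using the PHP library and place them into the preprocessing/train/CLASS and preprocessing/test/CLASS folders, where CLASS is the question shown in the captcha (for example, 1+1). The structure should look like this: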
- preprocessing
  - train
    - 1+1
      - 20.png
      - 37.png
      - 26.png
      - 24.png
    - 2+2
      - 90.png
      - 1.png
      - 55.png
      - 24.png
  - test
    - 1+1
      - 2.png
      - 22.png
      - 77.png
      - 88.png
    - 2+2
      - 99.png
      - 13.png
      - 525.png
      - 254.png
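As a quick sanity check of the folder layout, the labeled images can be enumerated with a few lines of Python. This is only a sketch: the helper name `list_samples` is hypothetical and not part of this project, and it assumes that each class folder name (such as 1+1) is the label and that all images are PNG files.

```python
from pathlib import Path


def list_samples(split_dir):
    """Return sorted (image_path, label) pairs for one split directory.

    The label is taken from the class folder name, e.g. "1+1" (assumption).
    """
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        for image_path in sorted(class_dir.glob("*.png")):
            samples.append((image_path, class_dir.name))
    return samples


if __name__ == "__main__":
    root = Path("preprocessing")
    for split in ("train", "test"):
        if (root / split).is_dir():
            print(split, len(list_samples(root / split)), "images")
```

Running it from the repository root prints the number of images found in each split, which makes missing or misnamed class folders easy to spot.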
Using 100,000 images (80%/20% training/validation split), the network reached 99.87% accuracy. It works by first removing the background noise with traditional thresholding, leaving only the mathematical question. The image is then cropped, resized to 32x32, and fed into the network.
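The preprocessing steps described above (threshold, crop, resize) can be sketched in plain Python without any dependencies. This is an illustration under stated assumptions, not the project's actual code: it assumes a grayscale image given as a list of rows with values 0 to 255, that the question is drawn darker than the noise and the background, an arbitrary cutoff of 100, and nearest-neighbor resizing. All function names are hypothetical.

```python
def threshold(gray, cutoff=100):
    """Binarize: 1 where a pixel is darker than cutoff (assumed to be text), else 0."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def crop_to_content(mask):
    """Crop to the bounding box of nonzero pixels; return the mask unchanged if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return mask
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return [row[cols[0]:cols[-1] + 1] for row in mask[rows[0]:rows[-1] + 1]]


def resize_nearest(mask, size=32):
    """Resize to size x size by nearest-neighbor sampling."""
    h, w = len(mask), len(mask[0])
    return [[mask[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def preprocess(gray, cutoff=100, size=32):
    """Threshold away the lighter noise, crop to the question, resize for the CNN."""
    return resize_nearest(crop_to_content(threshold(gray, cutoff)), size)
```

In practice an image library would handle loading and resizing, and the cutoff would have to be tuned to the two colors the captcha generator is configured with.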