Detecting, Recognizing and Solving Sudokus using OpenCV, PIL and TensorFlow.
>pip install "tensorflow>=2.2.0"
>pip install opencv-python
>pip install scikit-learn
>pip install matplotlib
>pip install Pillow
For image version:
>python sudoku_solver.py -i $path_to_img$
For webcam:
>python sudoku_solver_video.py
For an existing video:
>python sudoku_solver_video.py -v $path_to_video$
- The image version displays and saves the input image with the sudoku solved.
- It also generates a digital image of the solved grid, with the initially empty cells in green.
- The video and webcam versions display the video with the empty cells filled in.
- In both versions, the unsolved and solved grids are printed to the terminal.
- Preprocessing is done with OpenCV in the `Preprocessor` class, which has two methods: `extract_grid` and `extract_digit`.
- `extract_grid` extracts the grid from the input image, and `extract_digit` extracts the digit from a given cell by thresholding, removing any cell lines, straightening and centralizing it.
- The grid is extracted and its perspective is then transformed to give a straight view of the grid. Later, after solving, this grid is placed back into its original position using OpenCV's `findHomography` and `warpPerspective` functions.
- If `extract_grid` doesn't find a sudoku grid, it returns None, and if `extract_digit` finds the cell to be empty, it also returns None.
- Before being passed into the model, digits are straightened and centralized to maintain a similar structure to that of the training data.
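The centralizing step can be illustrated with a minimal pure-Python sketch. It assumes a binary cell stored as a list of lists and uses the centre of mass of the foreground pixels; the function name `center_digit` is hypothetical, and the real `extract_digit` may well centre digits differently (for example by bounding box, and with OpenCV arrays).

```python
def center_digit(cell):
    """Shift foreground pixels so their centre of mass lands on the image centre.

    `cell` is a list of rows; non-zero entries are treated as digit pixels.
    Returns None for an empty cell, mirroring the None convention described above.
    """
    height, width = len(cell), len(cell[0])
    points = [(r, c) for r in range(height) for c in range(width) if cell[r][c]]
    if not points:
        return None  # nothing drawn in this cell

    # Centre of mass of the foreground pixels.
    mean_r = sum(r for r, _ in points) / len(points)
    mean_c = sum(c for _, c in points) / len(points)

    # Whole-pixel offset that moves the centre of mass to the image centre.
    shift_r = round((height - 1) / 2 - mean_r)
    shift_c = round((width - 1) / 2 - mean_c)

    centered = [[0] * width for _ in range(height)]
    for r, c in points:
        nr, nc = r + shift_r, c + shift_c
        if 0 <= nr < height and 0 <= nc < width:  # drop pixels pushed off the edge
            centered[nr][nc] = cell[r][c]
    return centered
```

Centring every digit the same way keeps the classifier's inputs closer to the layout of the training data, which is the motivation given above.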
- The `DigitGenerator` class has been implemented to generate digits artificially using a multitude of different fonts.
- The dataset was generated by combining the MNIST dataset and the generated dataset, to help improve recognition of different (written and printed) types of digits.
- Two different architectures were trained: `DigitNet` for images and `LeNet` for videos, as `DigitNet` has many more parameters and is thus not suitable for real-time video processing.
- Prediction for images is done using an ensemble of 5 `DigitNet` CNNs, for more accurate predictions.
- Custom image augmentation is also applied at prediction time in the image version, using the `DigitAugmenter` class, so the models see multiple transformed versions of the same image for a single prediction. The digit with the highest average prediction is chosen, to improve generalization and robustness.
- Each of the five `DigitNet` CNNs gave a validation accuracy over 99.7% on the dataset, while `LeNet` gave 99.62%.
- The `SolutionGenerator` class is implemented to generate a digital version of the solved sudoku grid using PIL, with the initially empty cells filled in green.
- In the video version, a file `temp.txt` is created to make sure that the same grid and its solution don't get printed over and over again. After the video ends, the file is deleted.
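The ensemble-plus-augmentation prediction described above can be sketched roughly as follows. This is an illustration only: `ensemble_predict` is a hypothetical name, the models and augmentations are plain callables standing in for the trained `DigitNet` networks and the `DigitAugmenter` transforms, and the actual code may combine predictions differently.

```python
def ensemble_predict(models, augmentations, image):
    """Average class probabilities over every (augmented view, model) pair.

    models:        callables mapping an image to a list of class probabilities
    augmentations: callables mapping an image to a transformed image
    Returns (best_class_index, mean_probabilities).
    """
    totals = None
    count = 0
    for augment in augmentations:
        view = augment(image)
        for model in models:
            probs = model(view)
            if totals is None:
                totals = [0.0] * len(probs)
            for i, p in enumerate(probs):
                totals[i] += p
            count += 1
    means = [t / count for t in totals]
    # Pick the class whose averaged probability is highest.
    best = max(range(len(means)), key=means.__getitem__)
    return best, means


# Toy stand-ins (assumptions): two "models" that ignore the image entirely.
toy_models = [lambda img: [0.6, 0.4], lambda img: [0.2, 0.8]]
toy_augmentations = [lambda img: img]  # identity transform only
best_class, mean_probs = ensemble_predict(toy_models, toy_augmentations, image=None)
```

Averaging over both models and transformed views means a single unlucky crop or a single overconfident network has less influence on the final digit.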
- Dr. Adrian Rosebrock - His book on OpenCV and his blog have helped tremendously. `imutils.py` is a modified version of Dr. Rosebrock's package `imutils`.
- Architecture of the CNN used for the video was inspired by LeNet.
- Architecture of the `DigitNet` CNNs was inspired by this post.
- The algorithm for solving the Sudoku is by Peter Norvig and can be found here.
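For readers unfamiliar with that approach, here is a condensed sketch of the constraint-propagation-plus-search idea from Peter Norvig's essay. It is not the solver shipped in this repository; the function names and the 81-character string input format are assumptions made for illustration.

```python
DIGITS = "123456789"
ROWS = "ABCDEFGHI"

def cross(a, b):
    return [x + y for x in a for y in b]

SQUARES = cross(ROWS, DIGITS)
UNITLIST = ([cross(ROWS, c) for c in DIGITS]
            + [cross(r, DIGITS) for r in ROWS]
            + [cross(rs, cs) for rs in ("ABC", "DEF", "GHI")
               for cs in ("123", "456", "789")])
UNITS = {s: [u for u in UNITLIST if s in u] for s in SQUARES}
PEERS = {s: set(sum(UNITS[s], [])) - {s} for s in SQUARES}

def eliminate(values, s, d):
    """Remove digit d from square s and propagate; False signals a contradiction."""
    if d not in values[s]:
        return values
    values[s] = values[s].replace(d, "")
    if not values[s]:
        return False
    if len(values[s]) == 1:  # s is now fixed, so its peers cannot hold that digit
        d2 = values[s]
        if not all(eliminate(values, p, d2) for p in PEERS[s]):
            return False
    for unit in UNITS[s]:  # if d has only one possible place left in a unit, put it there
        places = [sq for sq in unit if d in values[sq]]
        if not places:
            return False
        if len(places) == 1 and not assign(values, places[0], d):
            return False
    return values

def assign(values, s, d):
    """Fix square s to digit d by eliminating every other candidate."""
    others = values[s].replace(d, "")
    if all(eliminate(values, s, d2) for d2 in others):
        return values
    return False

def search(values):
    """Depth-first search, always branching on the square with the fewest candidates."""
    if values is False:
        return False
    if all(len(values[s]) == 1 for s in SQUARES):
        return values
    _, s = min((len(values[s]), s) for s in SQUARES if len(values[s]) > 1)
    for d in values[s]:
        result = search(assign(values.copy(), s, d))
        if result:
            return result
    return False

def solve(grid):
    """grid: 81 characters, row by row, with '0' or '.' for empty cells.

    Returns the solved grid as an 81-character string, or None if unsolvable.
    """
    if len(grid) != 81:
        raise ValueError("expected 81 characters")
    values = {s: DIGITS for s in SQUARES}
    for s, ch in zip(SQUARES, grid):
        if ch in DIGITS and not assign(values, s, ch):
            return None
    solved = search(values)
    return "".join(solved[s] for s in SQUARES) if solved else None
```

In a pipeline like the one above, the digits recognized by the CNNs would be flattened into such a string (with empty cells as `0`) before calling `solve`, and a `None` result would indicate a misread digit or an invalid grid.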
All contributions are welcome and appreciated. 👍
BYE!
Rishabh Gupta ©️