MNIST-Image-Recognition-Based-on-Xgboost-Algorithm-and-Features-Extraction

by Yingxin LIN

Unlike the common practice of recognizing MNIST images with a CNN, I use NumPy and OpenCV to extract relevant features from each MNIST image and then train an XGBoost recognition model on those features. After gradual parameter tuning, the best model reaches 88% accuracy on the test set.
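The overall idea (hand-crafted features first, gradient boosting second) can be sketched as follows. This is only an illustration: the feature set (per-row and per-column on-pixel counts plus total ink), the threshold of 127, and the hyperparameters are assumptions made for the example, not the ones used in Main code.ipynb, and the sketch uses NumPy alone rather than OpenCV.

```python
import numpy as np


def extract_features(images, threshold=127):
    """Turn a stack of grayscale images (n, H, W) into a feature matrix.

    The features here are illustrative assumptions, not the notebook's own.
    """
    on = images > threshold                   # (n, H, W) mask of "on" pixels
    row_counts = on.sum(axis=2)               # (n, H) on pixels in each row
    col_counts = on.sum(axis=1)               # (n, W) on pixels in each column
    total = on.sum(axis=(1, 2))[:, None]      # (n, 1) total on pixels
    return np.hstack([row_counts, col_counts, total]).astype(np.float32)


def train_model(features, labels):
    """Fit an XGBoost classifier; the hyperparameters are placeholders."""
    # Imported lazily so the feature code above runs without xgboost installed.
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    model.fit(features, labels)
    return model


# Tiny synthetic check of the feature layout (not real MNIST data).
demo = np.zeros((2, 28, 28), dtype=np.uint8)
demo[0, 10:12, 5:9] = 255                     # a 2x4 block of ink in image 0
demo_features = extract_features(demo)        # 28 row + 28 column + 1 total
```

With real data, `train_model(extract_features(train_images), train_labels)` would return a fitted classifier whose `predict` method can be applied to the features of the test images.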
In addition, because I rely heavily on NumPy's broadcasting mechanism instead of explicit loops, the code runs quickly.
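To show what replacing loops with broadcasting looks like in practice, here is a small sketch that computes the ink centroid of every image twice: once with a Python loop and once with broadcasting over the whole stack. The centroid feature and the function names are hypothetical, chosen only to illustrate the technique.

```python
import numpy as np


def centroids_loop(images):
    """Reference version: one Python-level iteration per image."""
    out = np.zeros((len(images), 2))
    for i, img in enumerate(images):
        mass = img.sum()
        if mass == 0:
            continue                          # blank image: centroid stays (0, 0)
        ys, xs = np.nonzero(img)
        w = img[ys, xs].astype(np.float64)
        out[i] = (w * ys).sum() / mass, (w * xs).sum() / mass
    return out


def centroids_broadcast(images):
    """Same result, computed for all images at once via broadcasting."""
    n, h, w = images.shape
    weights = images.astype(np.float64)
    mass = weights.sum(axis=(1, 2))
    mass = np.where(mass == 0, 1.0, mass)     # avoid dividing by zero on blanks
    rows = np.arange(h)[None, :, None]        # shape (1, H, 1)
    cols = np.arange(w)[None, None, :]        # shape (1, 1, W)
    cy = (weights * rows).sum(axis=(1, 2)) / mass
    cx = (weights * cols).sum(axis=(1, 2)) / mass
    return np.stack([cy, cx], axis=1)


demo = np.zeros((3, 28, 28), dtype=np.uint8)
demo[0, 3, 7] = 255                           # single pixel
demo[1, 10, 4] = 100                          # two equal pixels, rows 10 and 12
demo[1, 12, 4] = 100
demo_centroids = centroids_broadcast(demo)    # image 2 is left blank
```

Both functions return the same array; the broadcast version avoids the per-image Python loop, which is where the speed-up on a full image stack comes from.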
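I also define a handwritten-digit edge-scanning function built entirely on NumPy, which scans the number of on pixels within the image edge quickly and precisely. Some scanning results are shown in the two PNG files in this repository: Scanning from right to left (The first 49 pictures in MNIST).png and Scanning from top to bottom (The first 49 pictures in MNIST).png.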
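One possible reading of directional edge scanning is sketched here: for every row (or column), measure how far a scan travels from the border before it hits the first on pixel. This is an interpretation for illustration only; the function names, the threshold of 127, and the convention of returning the full width or height for empty lines are assumptions, and the notebook's own function may differ.

```python
import numpy as np


def scan_from_right(images, threshold=127):
    """Per row: number of off pixels passed, scanning leftward from the right
    border, before the first on pixel. Rows without ink get the width W."""
    on = images > threshold                   # (n, H, W)
    flipped = on[:, :, ::-1]                  # index 0 is now the right border
    first = flipped.argmax(axis=2)            # first True per row (0 if none)
    has_ink = on.any(axis=2)
    return np.where(has_ink, first, on.shape[2])


def scan_from_top(images, threshold=127):
    """Per column: number of off pixels passed, scanning downward from the top
    border, before the first on pixel. Columns without ink get the height H."""
    on = images > threshold
    first = on.argmax(axis=1)                 # first True per column (0 if none)
    has_ink = on.any(axis=1)
    return np.where(has_ink, first, on.shape[1])


# A 5x5 toy image: one pixel at (row 1, col 2), two pixels at row 3, cols 0-1.
toy = np.zeros((1, 5, 5), dtype=np.uint8)
toy[0, 1, 2] = 255
toy[0, 3, 0:2] = 255
right_profile = scan_from_right(toy)          # [[5, 2, 5, 3, 5]]
top_profile = scan_from_top(toy)              # [[3, 3, 1, 5, 5]]
```

Because `argmax` on a boolean array returns the index of the first `True`, the whole image stack is scanned in one vectorized call without any Python loop.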

Before running the code, unzip the files with the '.gz' suffix.
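The unzip step can also be done from Python with the standard library. The sketch below assumes the notebook expects each decompressed file next to the archive under the same name without '.gz'; that naming is an assumption, so check the file paths used in Main code.ipynb.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress `name.gz` to `name` in the same folder; return the new path."""
    src = Path(path)
    dst = src.with_suffix("")                 # drops the trailing ".gz"
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def gunzip_all(directory="."):
    """Decompress every .gz file found directly inside `directory`."""
    return [gunzip(p) for p in sorted(Path(directory).glob("*.gz"))]
```

Calling `gunzip_all()` from the repository folder would decompress all four MNIST archives in one go.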
More details are available in the PDF file Data ming report & Userguide (in Simplified Chinese).pdf.

AUTHOR: Yingxin LIN
Company: School of Finance, Central University of Finance and Economics (CUFE)
Contact: lyxurthebest@163.com or lyxurthebest@outlook.com

Files in this repository:
Data ming report & Userguide (in Simplified Chinese).pdf
Main code.ipynb
README.md
Scanning from right to left (The first 49 pictures in MNIST).png
Scanning from top to bottom (The first 49 pictures in MNIST).png
t10k-images-idx3-ubyte.gz
test-label.gz
train-images-idx3-ubyte.gz
train-labels.gz
