piyushpathak03/document-classification-using-DL

Document-Classification-using-Deep-Learning

To download the dataset : https://www.cs.cmu.edu/~aharley/rvl-cdip/

Problem:

  • Lack of an intelligent document classification system for customer onboarding.

Pain Points :

  1. High labour costs, infrastructure maintenance costs, and error and rework costs.
  2. Poor agility and an inability to launch new products rapidly.
  3. Delayed response time.
  4. Lack of seamless experience.

Target User :

  • Primary user: back-end operations teams in banks.

Dataset Acquisition and Description :

  • The RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) dataset consists of 400,000 grayscale images in 16 classes, with 25,000 images per class. There are 320,000 training images, 40,000 validation images, and 40,000 test images. The images are sized so their largest dimension does not exceed 1000 pixels.

  • Paper : https://www.cs.cmu.edu/~aharley/icdar15/
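
The figures in the dataset description can be checked with a few lines of Python. The sketch below is illustrative only and is not taken from this repository: `fit_within` is a hypothetical helper that applies the same "largest dimension does not exceed 1000 pixels" rule to any extra documents you might want to classify, on the assumption that they should be scaled the same way as the RVL-CDIP images.

```python
from typing import Tuple

# Figures quoted in the dataset description above.
NUM_CLASSES = 16
IMAGES_PER_CLASS = 25_000
SPLITS = {"train": 320_000, "validation": 40_000, "test": 40_000}

# 16 classes x 25,000 images should equal the sum of the three splits.
assert NUM_CLASSES * IMAGES_PER_CLASS == sum(SPLITS.values()) == 400_000


def fit_within(width: int, height: int, max_side: int = 1000) -> Tuple[int, int]:
    """Return a size with the same aspect ratio whose larger side is <= max_side.

    Hypothetical helper: images that are already small enough are returned
    unchanged (no upscaling).
    """
    largest = max(width, height)
    if largest <= max_side:
        return width, height
    scale = max_side / largest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `fit_within(2000, 1500)` gives `(1000, 750)`, while an 800 x 600 scan is left as it is.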

The 16 classes are as follows :

  • letter, form, email, handwritten, advertisement, scientific report, scientific publication, specification, file folder, news article, budget, invoice, presentation, questionnaire, resume, memo
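
A minimal sketch of how the class list might be used when reading labels. Two things here are assumptions to verify against the dataset's own readme before relying on them: that integer labels follow the order of the list above (0 = letter through 15 = memo), and that each label line has the form `<relative image path> <class index>`. The function name `parse_label_line` and the example path are hypothetical.

```python
from typing import Tuple

# Class names in the order listed above; the index mapping is an assumption.
CLASS_NAMES = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def parse_label_line(line: str) -> Tuple[str, int, str]:
    """Split one '<path> <index>' line into (path, index, class name)."""
    # Split on the last run of whitespace so paths containing spaces survive.
    path, raw_index = line.strip().rsplit(maxsplit=1)
    index = int(raw_index)
    if not 0 <= index < len(CLASS_NAMES):
        raise ValueError(f"class index out of range: {index}")
    return path, index, CLASS_NAMES[index]
```

Under these assumptions, `parse_label_line("images/example/doc_0001.tif 11")` yields the path, the index 11, and the name `invoice`.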

Observation :

  • Using just 10,000 records each for the train, test and cross-validation (cv) sets, the model reached an accuracy of 88.9%.
  • Accuracy could likely be improved further with deeper architectures such as InceptionNet or ResNets.
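
The README does not say how the 10,000 records per split were selected. One possible approach, sketched here as an assumption rather than as the method actually used, is a class-balanced random subsample: with 16 classes, 10,000 records works out to 625 per class. The function name `stratified_subsample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Record = Tuple[str, int]  # (image path, class index)


def stratified_subsample(records: List[Record], n_total: int, seed: int = 0) -> List[Record]:
    """Draw roughly n_total records with an equal share from every class."""
    by_label = defaultdict(list)
    for path, label in records:
        by_label[label].append((path, label))
    if not by_label:
        return []

    per_class = n_total // len(by_label)  # e.g. 10_000 // 16 == 625
    rng = random.Random(seed)             # fixed seed keeps the draw repeatable
    sample: List[Record] = []
    for label in sorted(by_label):
        items = by_label[label]
        # Take fewer if a class has less than its share available.
        sample.extend(rng.sample(items, min(per_class, len(items))))
    rng.shuffle(sample)
    return sample
```

Calling this once per split (train, test, cv) with `n_total=10_000` would produce three balanced subsets of the size mentioned above.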

About me

Piyush Pathak

