Algorithmic comment processing to automate the identification and summarization of sections in PDF documents.

my2582/KPMGCapstone

KPMG Capstone (Spring 2019)

  • Project name: Algorithmic Comment Processing
  • Goal: to automate the identification and summarization of sections in PDF documents
  • Effects:
    • Prior to automation: 30 people for 12 to 20 weeks
    • After automation: 2 people for 2 weeks
  • My responsibilities include:
    • Feature engineering: line space (LS) and the ratio of title words to total words, which ranked among the top features, among others
    • Model implementation: XGBoost
  • A copy of the final report is here
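
The two features named above can be illustrated with a minimal sketch. It is not the project's code: the `Line` record, the function names, and the exact feature definitions (line space as the vertical gap to the previous line, title-word ratio as the share of words starting with an uppercase letter) are assumptions made for illustration. Features of this kind could then be passed to a classifier such as XGBoost.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    """Hypothetical record for one text line extracted from a PDF page."""
    text: str
    top: float     # y-coordinate of the line's top edge (assumed to grow downward)
    bottom: float  # y-coordinate of the line's bottom edge


def line_space(prev: Optional[Line], cur: Line) -> float:
    """Assumed definition: vertical gap between the previous line and this one."""
    if prev is None:
        return 0.0
    return max(cur.top - prev.bottom, 0.0)


def title_word_ratio(text: str) -> float:
    """Assumed definition: share of words that start with an uppercase letter."""
    words = text.split()
    if not words:
        return 0.0
    titled = sum(1 for w in words if w[0].isupper())
    return titled / len(words)


def extract_features(lines: List[Line]) -> List[Dict[str, float]]:
    """Build one feature dictionary per line, in reading order."""
    feats = []
    prev = None
    for cur in lines:
        feats.append({
            "line_space": line_space(prev, cur),
            "title_word_ratio": title_word_ratio(cur.text),
        })
        prev = cur
    return feats


# Made-up sample: a heading followed by a body line.
sample = [
    Line("1. Scope Of Work", top=100.0, bottom=112.0),
    Line("the vendor shall deliver reports.", top=130.0, bottom=142.0),
]
features = extract_features(sample)
```

In this sketch a section heading tends to show a larger gap from the preceding line and a higher title-word ratio than body text, which is the kind of signal a section-identification model could use.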

Contributors (in alphabetical order):

  • Gayani Perera
  • Liliana Cruz-Lopez
  • Minsu Yeom
  • Pranjal Bajaj

Industry mentors:

  • Junghoon Woo, Director Data Scientist, Data & Analytics (The Lighthouse), KPMG LLP, US
  • Viral Chawda, Principal, Innovation & Enterprise Solutions (I&ES), Lighthouse and Global lead, AI & Analytics for Government & Infrastructure, KPMG LLP, US

Data Science Institute mentor:

  • Sining Chen, Lecturer, Columbia University

6 Takeaways from the Final Report

(Six summary images, summary1 through summary6, appear here in the original README.)
