Extracting Data from PDF Documents and Converting It to EDI 211 Files Using GCP and Google Document AI

masoudshab/Doc2Edi

Doc2Edi

First Engine:

Extracting Data from PDF Documents Using GCP and Google Document AI

Steps for Engine-1

1- Create a new project on GCloud: https://codelabs.developers.google.com/codelabs/docai-ocr-python#1
2- Create a service account for this project: https://console.cloud.google.com/iam-admin/iam?walkthrough_id=iam--create-service-account&project=oko2-386015

NOTE: select these two roles for the service account: Document AI Administrator and Document AI API User.

3- Create a processor for the project: https://cloud.google.com/document-ai/docs/create-processor?_ga=2.114028359.-1141760794.1683471749

NOTE: this project uses the Form Parser processor from Google Document AI.

4- Clone this repo: https://github.com/anirbankonar123/documentai
5- Add the auth key location to the Windows environment variables as: GOOGLE_APPLICATION_CREDENTIALS='path to json file'
6- Copy the project and processor info into the local code.
7- Run the code for each PDF file with the command below. Each run creates one JSON file and multiple CSV files.

python doc_ai_table.py --pdf <pdf_path> --folder <output_path>

Second Engine:

Converting Data into EDI 211 Transmission Files

Steps for Engine-2

1- EDI Fields Finder
2- Find Best EDI Field & Process
3- DQ Checks & Valid Values
4- Put in EDI 211 Format
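Steps 1 and 2 could be sketched as fuzzy matching between the labels extracted from the form and a list of known aliases for each EDI field. This is only one possible approach, not necessarily the one this project uses. The alias table, the field names, and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical aliases for a few EDI 211 data items; real labels depend on the PDFs.
EDI_FIELD_ALIASES = {
    "shipment_id": ["shipment id", "bol number", "bill of lading number"],
    "pickup_date": ["pickup date", "ship date"],
    "weight": ["weight", "gross weight"],
}


def normalize(label):
    """Lowercase a label, expand '#' to 'number', drop colons, collapse spaces."""
    cleaned = label.lower().replace(":", " ").replace("#", " number ")
    return " ".join(cleaned.split())


def find_edi_field(label, cutoff=0.8):
    """Return the EDI field whose alias best matches the label, or None."""
    alias_to_field = {alias: field
                      for field, aliases in EDI_FIELD_ALIASES.items()
                      for alias in aliases}
    match = difflib.get_close_matches(normalize(label), list(alias_to_field),
                                      n=1, cutoff=cutoff)
    return alias_to_field[match[0]] if match else None


def map_fields(extracted):
    """Map extracted label/value pairs to EDI field names, keeping the first hit."""
    mapped = {}
    for label, value in extracted.items():
        field = find_edi_field(label)
        if field and value.strip():
            mapped.setdefault(field, value.strip())
    return mapped


if __name__ == "__main__":
    form = {"BOL Number:": " 12345 ", "Gross Weigth": "1200 lb", "Carrier Name": "ACME"}
    print(map_fields(form))
```

Labels with no sufficiently close alias are dropped rather than guessed, so step 3 (DQ checks) can flag them as missing.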

NOTE: required fields for EDI 211:
• Shipment ID number
• Date and time (of pick-up / delivery)
• Status report request (upon delivery)
• Business instructions
• Handling requirements
• Bill of lading rates and charges
• Lading quantity / weight / freight class / value
• Contact information
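As a rough sketch of steps 3 and 4, the snippet below checks that required values are present and then writes a minimal X12 transaction set. It is not this project's code, and several things are assumptions: the internal field names, the `*` and `~` delimiters (common, but set per interchange), and the BOL element order (carrier code, payment code, shipment ID, date), which should be verified against the trading partner's 211 implementation guide. The ISA/GS interchange envelopes and all other body segments are left out.

```python
from datetime import date

# Hypothetical internal names for two of the required items listed above.
REQUIRED_FIELDS = ("shipment_id", "pickup_date")


def missing_fields(record):
    """Return the required field names that are absent or blank in the record."""
    return [name for name in REQUIRED_FIELDS
            if not str(record.get(name) or "").strip()]


def segment(*elements):
    """Join elements with '*' and close the segment with '~'."""
    return "*".join(str(element) for element in elements) + "~"


def build_211(record, scac, payment_code, control_number="0001"):
    """Return an ST/BOL/SE transaction set string, or raise if data is missing."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    body = [
        # Assumed BOL layout; check element positions against the 211 guide.
        segment("BOL", scac, payment_code, record["shipment_id"],
                record["pickup_date"].strftime("%Y%m%d")),
    ]
    # SE01 counts every segment in the transaction set, including ST and SE.
    count = len(body) + 2
    return "".join([segment("ST", "211", control_number),
                    *body,
                    segment("SE", count, control_number)])


if __name__ == "__main__":
    shipment = {"shipment_id": "12345", "pickup_date": date(2023, 5, 15)}
    print(build_211(shipment, scac="ABCD", payment_code="PP"))
```

Raising on missing data keeps incomplete bills of lading from reaching the output file. A production pipeline would more likely collect the errors and report them per document.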

Resources:

  1. Document AI tutorial: https://cloud.google.com/document-ai/
  2. How to set up a Document AI project: https://codelabs.developers.google.com/codelabs/docai-ocr-python#0
  3. Repo used for the first engine: https://github.com/anirbankonar123/documentai
