feat: added hints.language_hints field in OcrConfig
feat: added enable_image_quality_scores field in OcrConfig
feat: added enable_symbol field in OcrConfig

PiperOrigin-RevId: 515136707
Google APIs authored and Copybara-Service committed Mar 8, 2023
1 parent 7d81019 commit 236be30
Showing 1 changed file with 22 additions and 0 deletions.
22 changes: 22 additions & 0 deletions google/cloud/documentai/v1beta3/document_io.proto
@@ -104,10 +104,32 @@ message DocumentOutputConfig {

// Config for Document OCR.
message OcrConfig {
// Hints for OCR Engine
message Hints {
// List of BCP-47 language codes to use for OCR. In most cases, not
// specifying it yields the best results since it enables automatic language
// detection. For languages based on the Latin alphabet, setting hints is
// not needed. In rare cases, when the language of the text in the
// image is known, setting a hint will help get better results (although it
// will be a significant hindrance if the hint is wrong).
repeated string language_hints = 1;
}

// Hints for the OCR model.
Hints hints = 2;

// Enables special handling for PDFs with existing text information. Results
// in better text extraction quality in such PDF inputs.
bool enable_native_pdf_parsing = 3;

// Enables intelligent document quality scores after OCR. Can help with
// diagnosing why OCR responses are of poor quality for a given input.
// Adds additional latency comparable to regular OCR to the process call.
bool enable_image_quality_scores = 4;

// A list of advanced OCR options to further fine-tune OCR behavior.
repeated string advanced_ocr_options = 5;

// Includes symbol level OCR information if set to true.
bool enable_symbol = 6;
}
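
For illustration only, the sketch below shows how the fields of `OcrConfig` might look once serialized with the standard proto3 JSON mapping, which renders field names in lowerCamelCase. The helper `build_ocr_config` is hypothetical and is not part of this commit or of any client library; it only assembles a plain dictionary and omits fields left at their default values. How the resulting JSON is attached to a request is not shown here.

```python
import json
from typing import Any, Dict, List, Optional


def build_ocr_config(
    language_hints: Optional[List[str]] = None,
    enable_native_pdf_parsing: bool = False,
    enable_image_quality_scores: bool = False,
    advanced_ocr_options: Optional[List[str]] = None,
    enable_symbol: bool = False,
) -> Dict[str, Any]:
    """Return an OcrConfig-shaped dict using proto3 JSON (lowerCamelCase) keys.

    Hypothetical helper for illustration. Fields left at their proto3
    default values are omitted from the result.
    """
    config: Dict[str, Any] = {}
    if language_hints:
        # `hints` is the nested Hints message; `language_hints` lives inside it.
        # Values are BCP-47 codes. Per the proto comment, leaving this unset
        # is usually best because it enables automatic language detection.
        config["hints"] = {"languageHints": list(language_hints)}
    if enable_native_pdf_parsing:
        config["enableNativePdfParsing"] = True
    if enable_image_quality_scores:
        # Per the proto comment, this adds latency comparable to regular OCR.
        config["enableImageQualityScores"] = True
    if advanced_ocr_options:
        config["advancedOcrOptions"] = list(advanced_ocr_options)
    if enable_symbol:
        config["enableSymbol"] = True
    return config


if __name__ == "__main__":
    # Example: a document known to be Japanese, with symbol-level output.
    example = build_ocr_config(language_hints=["ja"], enable_symbol=True)
    print(json.dumps(example, indent=2))
```

Omitting unset fields mirrors the proto comment's advice: an absent `hints` entry leaves language detection automatic, so a hint is supplied only when the language is actually known.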
