fix: docai_utilities.py to return Optional #176

Merged
merged 4 commits into from Oct 6, 2023

holtskinner
Member

@holtskinner holtskinner commented Oct 3, 2023

@holtskinner holtskinner requested review from a team as code owners October 3, 2023 16:29
@conventional-commit-lint-gcf

conventional-commit-lint-gcf bot commented Oct 3, 2023

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@product-auto-label product-auto-label bot added the size: m label Oct 3, 2023
- Should resolve the customer-reported issue in support case #47169701 relating to duplicate/inaccurate elements in hOCR output
- Follow-up to:
  - #161
  - #169
@holtskinner holtskinner changed the title from "fix: Update docai_utilities.py to return an Optional" to "fix: docai_utilities.py to return Optional" Oct 3, 2023
@holtskinner holtskinner added the kokoro:force-run label Oct 3, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Oct 3, 2023
google/cloud/documentai_toolbox/wrappers/entity.py (review thread: outdated, resolved)
tests/unit/test_document.py (review thread: outdated, resolved)
@dizcology
Collaborator

If possible, please also give some explanations about the duplicate/inaccurate elements in hOCR output in the PR description.

@holtskinner
Member Author

> If possible, please also give some explanations about the duplicate/inaccurate elements in hOCR output in the PR description.

I will once I get that information

@holtskinner holtskinner added the kokoro:force-run and owlbot:run labels Oct 3, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run label Oct 3, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Oct 3, 2023
@holtskinner
Member Author

holtskinner commented Oct 6, 2023

Note: In the customer's code, they use the following hOCR whenever the document is blank. I'm not sure whether this is a standard structure for blank hOCR documents, but it could be worth looking into.

Note: It doesn't follow the corrected structure introduced in #169.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="unknown" lang="unknown">
<head>
<title>hocr</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="ocr-system" content="Document AI OCR" />
<meta name="ocr-langs" content="unknown" />
<meta name="ocr-number-of-pages" content="1" />
<meta name="ocr-capabilities" content="ocr_page ocr_carea ocr_par ocr_line ocrx_word" />
</head>
<body>
<div class='ocr_page' lang='unknown' title='bbox 0 0 0 0'>
<span class='ocr_carea' id='block_1_0' title='bbox 0 0 0 0'>
<span class='ocr_par' id='par_1_0_0' title='bbox 0 0 0 0'>
<span class='ocr_line' id='line_1_0_0_0' title='bbox 0 0 0 0'></span>
<span class='ocrx_word' id='word_1_0_0_0_0' title='bbox 0 0 0 0'></span>
</span>
</span>
</div>
</body>
</html>
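
One visible oddity in the sample above is that the `ocrx_word` span is a sibling of the `ocr_line` span rather than a child of it. Whether that is the specific deviation from #169 is an assumption on my part, since the thread does not say. As a rough sketch under that assumption, the check below uses only the Python standard library to list word elements that are not nested inside a line; `HocrNestingChecker` and `find_orphan_words` are hypothetical helper names, not part of documentai-toolbox.

```python
from html.parser import HTMLParser


class HocrNestingChecker(HTMLParser):
    """Records ids of ocrx_word elements that have no ocr_line ancestor."""

    def __init__(self):
        super().__init__()
        self._class_stack = []   # class names of the currently open div/span elements
        self.orphan_words = []   # ids of words found outside any line

    def handle_starttag(self, tag, attrs):
        if tag not in ("div", "span"):
            return
        attr_map = dict(attrs)
        css_class = attr_map.get("class", "")
        # Assumption: a well-formed word sits somewhere inside an ocr_line.
        if css_class == "ocrx_word" and "ocr_line" not in self._class_stack:
            self.orphan_words.append(attr_map.get("id"))
        self._class_stack.append(css_class)

    def handle_endtag(self, tag):
        if tag in ("div", "span") and self._class_stack:
            self._class_stack.pop()


def find_orphan_words(hocr: str) -> list:
    """Return the ids of ocrx_word elements not nested in an ocr_line."""
    checker = HocrNestingChecker()
    checker.feed(hocr)
    checker.close()
    return checker.orphan_words


# Body fragment taken from the blank-document sample in this comment.
BLANK_SAMPLE = """
<div class='ocr_page' lang='unknown' title='bbox 0 0 0 0'>
<span class='ocr_carea' id='block_1_0' title='bbox 0 0 0 0'>
<span class='ocr_par' id='par_1_0_0' title='bbox 0 0 0 0'>
<span class='ocr_line' id='line_1_0_0_0' title='bbox 0 0 0 0'></span>
<span class='ocrx_word' id='word_1_0_0_0_0' title='bbox 0 0 0 0'></span>
</span>
</span>
</div>
"""

# Illustrative variant (not from the thread) with the word nested in the line.
NESTED_SAMPLE = """
<div class='ocr_page' title='bbox 0 0 0 0'>
<span class='ocr_line' id='line_1_0_0_0' title='bbox 0 0 0 0'>
<span class='ocrx_word' id='word_1_0_0_0_0' title='bbox 0 0 0 0'></span>
</span>
</div>
"""

if __name__ == "__main__":
    print(find_orphan_words(BLANK_SAMPLE))
    print(find_orphan_words(NESTED_SAMPLE))
```

A check like this could be run against the customer's blank-document template to confirm whether the word/line nesting is the only difference worth following up on.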

@holtskinner holtskinner merged commit 028bc37 into main Oct 6, 2023
23 checks passed
@holtskinner holtskinner deleted the hocr-fixes branch October 6, 2023 16:14