refactor: Restructuring of classes to reduce duplication and increase readability #110
Merged
- Fixed table sample
@dizcology I'd like to get a second opinion on these changes; please take a look when you have time.
dizcology
requested changes
May 1, 2023
- Add incomplete shard check
- refactor some of the vision helpers to simplify imports
- Fix documentation formatting issue
- Avoids some backwards incompatibility issues
- Fixed bug where Entity page numbers would not line up with multi-shard documents
dizcology
requested changes
Jun 16, 2023
- Added a more specific type for dictionary: `Dict[str, Union[str, List[str]]]`
- Rewrote docstring for `Entity.page_offset`
- Wrote unit test for `page_offset`
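To illustrate the multi-shard page-number issue these commits address: if page references inside each shard are local to that shard, an offset equal to the number of pages in all preceding shards has to be added to recover the document-wide page number. The following is a minimal sketch under that assumption. `Shard`, `compute_page_offsets`, and `global_page_number` are hypothetical names used only for illustration; they are not the library's code, and the actual `Entity.page_offset` implementation is not shown in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    """Hypothetical stand-in for one shard of a processed document."""

    page_count: int


def compute_page_offsets(shards: List[Shard]) -> List[int]:
    """Return, for each shard, the number of pages that precede it."""
    offsets: List[int] = []
    pages_seen = 0
    for shard in shards:
        offsets.append(pages_seen)
        pages_seen += shard.page_count
    return offsets


def global_page_number(local_page: int, page_offset: int) -> int:
    """Map a zero-based, shard-local page index to a document-wide index."""
    return local_page + page_offset


# Three shards holding 3, 2, and 4 pages respectively.
shards = [Shard(page_count=3), Shard(page_count=2), Shard(page_count=4)]
offsets = compute_page_offsets(shards)

# An entity on local page 1 of the second shard.
entity_page = global_page_number(local_page=1, page_offset=offsets[1])
```

Without the offset, an entity on the second page of the second shard would be reported on page index 1 instead of page index 4, which is the kind of misalignment the bug fix describes.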
dizcology
approved these changes
Jun 23, 2023
holtskinner
added a commit
that referenced
this pull request
Jun 30, 2023
…age.py - Added in #110 Lost in Merge
holtskinner
added a commit
that referenced
this pull request
Jul 7, 2023
* refactor: Reorganize hocr functions
  - Use more jinja templating instead of hardcoding strings
  - Simplified bounding box function
  - Changed parameter name for `_get_hocr_bounding_box` to `page_dimension` for more clarity.
* samples: Added sample for convert to hocr
* refactor: Reordering of classes in page.py
* refactor: Re-added refactoring to remove extra `get_*()` methods in page.py
  - Added in #110 Lost in Merge
* fix: Moved `templates` directory into package.
  - Required for template to work in installed library
holtskinner
added a commit
that referenced
this pull request
Jul 20, 2023
…ciency and follow standard practices. (#139)

* refactor: Reorganize hocr functions
  - Use more jinja templating instead of hardcoding strings
  - Simplified bounding box function
  - Changed parameter name for `_get_hocr_bounding_box` to `page_dimension` for more clarity.
* samples: Added sample for convert to hocr
* refactor: Reordering of classes in page.py
* refactor: Re-added refactoring to remove extra `get_*()` methods in page.py
  - Added in #110 Lost in Merge
* fix: Moved `templates` directory into package.
  - Required for template to work in installed library
* chore: Ran isort and black
* chore: Ran no-implicit-optional
* refactor: Refactored document.py
  - improve readability, follow python conventions, and improve efficiency
  - Also, fixed a previously unknown bug where `Document.search_pages()` returned inaccurate results because it only searched paragraph.text, not page.text
* refactor: Refactor gcs_utilities for readability/pythonic style
* refactor: Refactor page.py to improve efficiency, readability and follow python conventions
* refactor: Rename `Entity.documentai_entity` to `Entity.documentai_object` to match the page.py file
* refactor: Move bounding box extraction to `docai_utilities.py`
* refactor: Major Refactoring of converter_helpers.py to simplify/organize functions, reduce complexity, and increase readability
* fix: Fixed refactor of export_images in document.py
* refactor: Cleanup of blocks.py using `getattr()`
* refactor: Refactoring of bbox_conversion.py to improve readability and efficiency
* fix: Change _get_files() to send full gcs uri to _get_bytes()
  - Also reduce wait_time in tests
* refactor: Move `converter_helpers.py` functions into `converter.py`
  - `converter.py` only had one external facing function that called an internal function with the same parameters.
  - Not sure if there was a specific reason for this setup, can be undone if needed.
* chore(deps): update dependency google-cloud-documentai to v2.16.1 (#138)
* fix: Change _get_files() to send full gcs uri to _get_bytes()
  - Also reduce wait_time in tests
* refactor: Move `converter_helpers.py` functions into `converter.py`
  - `converter.py` only had one external facing function that called an internal function with the same parameters.
  - Not sure if there was a specific reason for this setup, can be undone if needed.
* chore: Reran black formatting after merge conflict
* refactor: Minor refactoring of test_bbox_conversion.py to improve readability
* refactor: Changed blocks.py to block.py for consistency.
  - Changed how `Block` is initialized.
  - Changed `load_blocks_from_schema` into a `@classmethod` to simplify imports.
* fix: Added Missing type annotations to `document.py`
* fix: Add new filename for block.py into test_bbox_conversion.py
* 🦉 Updates from OwlBot post-processor
  See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* fix: Fix failing tests for Block class. Changed all fields to have types
* fix: Changed `converter._get_bytes` to return a Tuple
* chore: Addressed Code Review Comments
  - Removed FILES_TO_IGNORE
  - Simplification of logic in `_get_multiplier` `convert_bbox_to_docproto_bbox`
  - Addressed other lint errors
  - Adjusted function names to indicate not protected members.
* fix: Remove extra reference to metadata_blob
* fix: Change expected test output and remove references to `geometry`

---------

Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
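The commit message mentions turning `load_blocks_from_schema` into a `@classmethod` to simplify imports. As a rough illustration of that pattern, here is a minimal sketch of a class method used as an alternate constructor. The fields of `Block` and the shape of the schema are assumptions made up for this example; the real signature is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Block:
    """Hypothetical, simplified block; the real class has different fields."""

    type_: str
    text: str

    @classmethod
    def load_blocks_from_schema(cls, schema: List[Dict[str, Any]]) -> List["Block"]:
        """Build blocks from a list of plain dictionaries (assumed schema shape)."""
        return [cls(type_=item["type"], text=item["text"]) for item in schema]


blocks = Block.load_blocks_from_schema(
    [
        {"type": "header", "text": "Invoice"},
        {"type": "line", "text": "Total: 10"},
    ]
)
```

With this layout, callers only need to import `Block`; the loader travels with the class instead of living as a separate module-level function.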
- … `vision_helpers.py` to input `Document.Page` to reduce extra imports.
- … `Entity` page numbers would be incorrect in multi-shard documents.
- … `Page` sub classes by using `__post_init__()`
- … `Page.text` and `Table.text` to `document_text` to be more descriptive and match the other `Page` elements.
- … `table_sample.py` to use more accurate/descriptive variable names.
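For readers unfamiliar with the `__post_init__()` approach mentioned in the description, the following is a minimal sketch of how a dataclass can derive a field from its constructor arguments. `Paragraph`, `start_index`, and `end_index` are hypothetical names chosen for illustration, and treating `document_text` as the full document text is an assumption; this is not the library's actual `Page` code.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """Hypothetical page element that derives its own text on construction."""

    start_index: int
    end_index: int
    document_text: str
    # Derived in __post_init__, so it is excluded from the generated __init__.
    text: str = field(init=False)

    def __post_init__(self) -> None:
        # Slice this element's text out of the full document text.
        self.text = self.document_text[self.start_index : self.end_index]


paragraph = Paragraph(start_index=0, end_index=5, document_text="Hello world")
```

Because the slicing happens once in `__post_init__()`, each sub class can share the same pattern instead of repeating a separate helper function for every element type.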