ENH: Cope with UC2 fonts in text_extraction #1785

Merged: 1 commit, Apr 15, 2023

@pubpub-zz (Collaborator) commented Apr 11, 2023

fixes #1782
UCS2 encodings are to be read as UTF-16BE.
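
For illustration, here is a minimal sketch of what reading UCS2 as UTF-16BE means at the byte level. The helper name `decode_ucs2` and the sample bytes are assumptions made up for this example; this is not the code changed in `pypdf/_cmap.py`.

```python
# Hypothetical helper for illustration only, not pypdf's implementation.
def decode_ucs2(raw: bytes) -> str:
    """Decode big-endian UCS-2 character codes by treating them as UTF-16BE."""
    return raw.decode("utf-16-be")


# Sample data (assumed): two 2-byte codes, U+4E2D and U+6587.
raw = bytes.fromhex("4E2D6587")

# Each pair of bytes becomes one character.
text = decode_ucs2(raw)

# Reading the same bytes one at a time yields four unrelated characters,
# which is the kind of garbling the linked issue describes.
naive = raw.decode("latin-1")

print(repr(text), repr(naive))
```

Python's `utf-16-be` codec covers this case because UCS-2 code units in the Basic Multilingual Plane are identical to their UTF-16 encoding.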


codecov bot commented Apr 11, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (6836174) 92.99% compared to head (f360bfc) 92.99%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1785   +/-   ##
=======================================
  Coverage   92.99%   92.99%           
=======================================
  Files          34       34           
  Lines        6609     6611    +2     
  Branches     1302     1303    +1     
=======================================
+ Hits         6146     6148    +2     
  Misses        302      302           
  Partials      161      161           
| Impacted Files | Coverage Δ |
|---|---|
| pypdf/_cmap.py | 95.29% <100.00%> (+0.04%) ⬆️ |

@MartinThoma changed the title from "ENH : cope with UC2 fonts in text_extraction" to "ENH: Cope with UC2 fonts in text_extraction" on Apr 14, 2023
@MartinThoma (Member)

@pubpub-zz Looks good to me :-) Should I merge?

@pubpub-zz (Collaborator, Author)

@BriskyGates reported a successful test: yes, you can 🙂

@MartinThoma MartinThoma merged commit 20fbe3f into py-pdf:main Apr 15, 2023
MartinThoma added a commit that referenced this pull request Apr 16, 2023
New Features (ENH)
-  Add transform method to Transformation class (#1765)
-  Cope with UC2 fonts in text_extraction (#1785)

Robustness (ROB)
-  Invalid startxref pointing 1 char before (#1784)

Maintenance (MAINT)
-  Mark code handling old parameters as deprecated (#1798)

[Full Changelog](3.7.1...3.8.0)
@pubpub-zz pubpub-zz deleted the iss1782 branch June 24, 2023 08:42
Development

Successfully merging this pull request may close these issues.

BUG: UniGB-UCS2-H encoding leads to garbled text extraction