ENH : auto detect RTL for text extraction #1309

Merged
merged 2 commits into py-pdf:main from the arabicRTL2 branch on Aug 31, 2022

Conversation

pubpub-zz
Collaborator

will fix #1296
includes some customization capabilities to extend RTL
replaces #1305
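
For readers unfamiliar with the problem, here is a minimal sketch of what auto-detecting RTL during text extraction can look like. It is not the code from this PR: the function names `is_rtl` and `visual_to_logical` and the `extra_ranges` parameter are hypothetical, and the detection relies on the standard library's `unicodedata.bidirectional` rather than on whatever ranges PyPDF2 actually uses. The sketch assumes characters arrive in visual (left-to-right) order and reverses each uninterrupted run of RTL characters. Spaces and punctuation end a run, so the order of several RTL words is not restored.

```python
import unicodedata
from typing import Iterable, List, Tuple


def is_rtl(ch: str, extra_ranges: Iterable[Tuple[int, int]] = ()) -> bool:
    """Return True if `ch` is a right-to-left character.

    `extra_ranges` is a hypothetical customization hook: inclusive
    (low, high) code point ranges that should also be treated as RTL.
    """
    # "R" covers Hebrew-like letters, "AL" covers Arabic letters.
    if unicodedata.bidirectional(ch) in ("R", "AL"):
        return True
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in extra_ranges)


def visual_to_logical(
    text: str, extra_ranges: Iterable[Tuple[int, int]] = ()
) -> str:
    """Reverse every uninterrupted run of RTL characters in `text`."""
    ranges = list(extra_ranges)  # allow the iterable to be reused per char
    out: List[str] = []
    run: List[str] = []
    for ch in text:
        if is_rtl(ch, ranges):
            run.append(ch)
            continue
        # A non-RTL character closes the current RTL run, if any.
        out.extend(reversed(run))
        run = []
        out.append(ch)
    out.extend(reversed(run))  # flush a run that ends the string
    return "".join(out)
```

Calling `visual_to_logical` on a string whose Arabic or Hebrew letters were emitted in visual order returns those letters in reading order, while Latin text passes through unchanged. A production implementation would also need to handle digits, neutral characters, and multi-word RTL runs.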


codecov bot commented Aug 31, 2022

Codecov Report

Merging #1309 (8540e4c) into main (c696192) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1309      +/-   ##
==========================================
- Coverage   95.02%   95.02%   -0.01%     
==========================================
  Files          30       30              
  Lines        4988     5024      +36     
  Branches     1026     1037      +11     
==========================================
+ Hits         4740     4774      +34     
  Misses        141      141              
- Partials      107      109       +2     
Impacted Files Coverage Δ
PyPDF2/_page.py 94.36% <100.00%> (+<0.01%) ⬆️


@pubpub-zz
Collaborator Author

@MartinThoma
I don't think we will get any further: it's ready.

@MartinThoma MartinThoma merged commit 7a95708 into py-pdf:main Aug 31, 2022
@MartinThoma
Member

Thank you for all the great work you put into this 🙏

@pubpub-zz pubpub-zz deleted the arabicRTL2 branch August 31, 2022 21:47
MartinThoma added a commit that referenced this pull request Sep 4, 2022
Version 2.10.5, 2022-09-04
--------------------------

New Features (ENH):
-  Process XRefStm (#1297)
-  Auto-detect RTL for text extraction (#1309)

Bug Fixes (BUG):
-  Avoid scaling cropbox twice (#1314)

Robustness (ROB):
-  Fix offset correction in revised PDF (#1318)
-  Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
-  MultiLine bfrange in cmap (#1299)
-  Cope with 2 digit codes in bfchar (#1310)
-  Accept '/annn' charset as ASCII code (#1316)
-  Log errors during Float / NumberObject initialization (#1315)
-  Cope with corrupted entries in xref table (#1300)

Documentation (DOC):
-  Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
-  Creating a coverage report (#1319)
-  Fix AnnotationBuilder.free_text example (#1311)
-  Fix usage of page.scale by replacing it with page.scale_by (#1313)

Developer Experience (DEV):
-  Only run coverage for PyPDF2

Maintenance (MAINT):
-  PdfReaderProtocol (#1303)
-  Throw PdfReadError if Trailer can't be read (#1298)
-  Remove catching OverflowException (#1302)

Full Changelog: 2.10.4...2.10.5
@MasterOdin MasterOdin mentioned this pull request Nov 10, 2022
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this pull request Nov 12, 2022
includes also reintroduction of py-pdf#1303 wrongly cancelled in py-pdf#1309
Successfully merging this pull request may close these issues.

Arabic text is extracted in the wrong order