ENH : auto detect RTL for text extraction #1309

pubpub-zz · 2022-08-31T14:19:58Z

Will fix #1296.
Includes some customization capabilities to extend RTL detection.
Replaces #1305.
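
As a rough illustration of the idea (not the code from this PR), right-to-left detection can be sketched with the standard library's Unicode bidirectional classes. The names `is_rtl_char`, `looks_rtl`, `to_logical_order` and the `extra_ranges` parameter are hypothetical; `extra_ranges` only mimics the kind of customization mentioned above. The sketch also assumes the extracted glyphs arrive in visual (left-to-right on the page) order.

```python
import unicodedata

# Bidirectional classes of strong right-to-left characters:
# "R" covers e.g. Hebrew letters, "AL" covers Arabic letters.
RTL_BIDI_CLASSES = {"R", "AL"}


def is_rtl_char(ch, extra_ranges=()):
    """Return True if ch is RTL, or falls in a user-supplied code point range."""
    code_point = ord(ch)
    # Hypothetical customization hook: extra (low, high) code point ranges.
    if any(low <= code_point <= high for low, high in extra_ranges):
        return True
    return unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES


def looks_rtl(text, extra_ranges=()):
    """Majority vote between RTL characters and strong LTR characters."""
    rtl = sum(1 for ch in text if is_rtl_char(ch, extra_ranges))
    ltr = sum(
        1
        for ch in text
        if not is_rtl_char(ch, extra_ranges)
        and unicodedata.bidirectional(ch) == "L"
    )
    return rtl > ltr


def to_logical_order(visual_text, extra_ranges=()):
    """Reverse a run stored in visual order when it looks right-to-left."""
    if looks_rtl(visual_text, extra_ranges):
        return visual_text[::-1]
    return visual_text
```

For example, `to_logical_order("\u05dd\u05d5\u05dc\u05e9")` reverses a Hebrew word stored in visual order, while an ASCII string is returned unchanged. A real extractor would also have to handle mixed-direction lines, digits, and punctuation, which this sketch ignores.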


codecov · 2022-08-31T16:01:56Z

Codecov Report

Merging #1309 (8540e4c) into main (c696192) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1309      +/-   ##
==========================================
- Coverage   95.02%   95.02%   -0.01%     
==========================================
  Files          30       30              
  Lines        4988     5024      +36     
  Branches     1026     1037      +11     
==========================================
+ Hits         4740     4774      +34     
  Misses        141      141              
- Partials      107      109       +2

| Impacted Files | Coverage Δ |
|---|---|
| PyPDF2/_page.py | `94.36% <100.00%> (+<0.01%)` ⬆️ |
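
The headline numbers in the report can be checked by hand. The snippet below assumes the coverage percentage is simply hits divided by lines; how Codecov rounds the displayed values is not stated here, so only the raw ratios are computed.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Figures taken from the Coverage Diff table above.
before = coverage_percent(4740, 4988)  # main
after = coverage_percent(4774, 5024)   # this PR
delta = after - before

print(f"{before:.3f}% -> {after:.3f}% ({delta:+.3f} points)")
```

Under that assumption the change is a drop of less than a hundredth of a percentage point, which is consistent with both columns showing 95.02%.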


pubpub-zz · 2022-08-31T17:04:01Z

@MartinThoma
I don't think we will get any further: it's ready.

PyPDF2/_page.py

MartinThoma · 2022-08-31T20:15:42Z

Thank you for all the great work you put into this 🙏

Version 2.10.5, 2022-09-04
--------------------------

New Features (ENH):
- Process XRefStm (#1297)
- Auto-detect RTL for text extraction (#1309)

Bug Fixes (BUG):
- Avoid scaling cropbox twice (#1314)

Robustness (ROB):
- Fix offset correction in revised PDF (#1318)
- Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
- MultiLine bfrange in cmap (#1299)
- Cope with 2 digit codes in bfchar (#1310)
- Accept '/annn' charset as ASCII code (#1316)
- Log errors during Float / NumberObject initialization (#1315)
- Cope with corrupted entries in xref table (#1300)

Documentation (DOC):
- Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
- Creating a coverage report (#1319)
- Fix AnnotationBuilder.free_text example (#1311)
- Fix usage of page.scale by replacing it with page.scale_by (#1313)

Developer Experience (DEV):
- Only run coverage for PyPDF2

Maintenance (MAINT):
- PdfReaderProtocol (#1303)
- Throw PdfReadError if Trailer can't be read (#1298)
- Remove catching OverflowException (#1302)

Full Changelog: 2.10.4...2.10.5


pubpub-zz added 2 commits August 31, 2022 16:15

ENH : auto detect RTL for text extraction

533c48c

will fix py-pdf#1296 includes some customization capabilities to extend RTL

Typing

8540e4c

MartinThoma reviewed Aug 31, 2022


PyPDF2/_page.py (review thread resolved)

MartinThoma merged commit 7a95708 into py-pdf:main Aug 31, 2022

pubpub-zz deleted the arabicRTL2 branch August 31, 2022 21:47

MasterOdin mentioned this pull request Nov 10, 2022

ENH: Add Cloning #1371

Merged

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this pull request Nov 12, 2022

Rewriting using Protocols

e1c3ed3

includes also reintroduction of py-pdf#1303 wrongly cancelled in py-pdf#1309
