Text orientation #1071

Closed
matteodefra opened this issue Jul 7, 2022 · 6 comments
Labels: is-feature (A feature request)

@matteodefra

Explanation

Hello, I have some PDF documents that contain normal text in portrait orientation, plus lateral text along the side of the page that is oriented in landscape mode.
How can I tell PyPDF2 to extract only the text that follows the page's portrait orientation and to ignore the landscape text?

I attach an example image:
[Image: Screenshot from 2022-07-07 12-00-54]
Basically, what I want is to extract the series of "example" strings above and ignore the three "example" strings rotated by 90 degrees.

Thank you in advance for your help

@MartinThoma MartinThoma removed their assignment Jul 9, 2022
@MartinThoma
Member

Some more evidence that people want this feature: https://stackoverflow.com/q/52530293/562769

@pubpub-zz
Collaborator

pubpub-zz commented Jul 27, 2022

@MartinThoma, @MasterOdin, @mtd91429,
the current parameters are the following:
extract_text(self, Tj_sep: str = "", TJ_sep: str = "", space_width: float = 200.0) -> str

Tj_sep and TJ_sep are no longer used, so I propose taking the opportunity of introducing the orientation parameter to remove them:
extract_text(self, orientations: Union[int, Tuple[int, ...]] = (0, 90, 180, 270), space_width: float = 200.0) -> str

What is your opinion?
edited to add a single int as acceptable
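
To illustrate what the `orientations` values could mean, here is a minimal stdlib-only sketch. It assumes each text run comes with a PDF text matrix `[a, b, c, d, e, f]` and that 90 means rotated counterclockwise. The names `orientation_of`, `normalize` and `filter_runs` are hypothetical helpers for illustration, and this is not necessarily how PyPDF2 implements the feature.

```python
from typing import Iterable, Sequence, Tuple, Union

Matrix = Sequence[float]  # PDF text matrix [a, b, c, d, e, f]
Orientations = Union[int, Tuple[int, ...]]


def orientation_of(m: Matrix, eps: float = 1e-6) -> int:
    """Classify a text matrix as 0, 90, 180 or 270 degrees (hypothetical helper)."""
    b, d = m[1], m[3]
    if d > eps:       # glyphs point up
        return 0
    if d < -eps:      # glyphs are upside down
        return 180
    # d is ~0, so the text is sideways; the sign of b tells which way.
    return 90 if b > 0 else 270


def normalize(orientations: Orientations) -> Tuple[int, ...]:
    """Accept a single int or a tuple of ints, as in the proposal."""
    if isinstance(orientations, int):
        return (orientations,)
    return tuple(orientations)


def filter_runs(
    runs: Iterable[Tuple[str, Matrix]],
    orientations: Orientations = (0, 90, 180, 270),
) -> str:
    """Concatenate only the runs whose orientation was requested."""
    wanted = set(normalize(orientations))
    return "".join(text for text, m in runs if orientation_of(m) in wanted)
```

With this sketch, `filter_runs(runs, 0)` and `filter_runs(runs, (0,))` select the same upright runs, which mirrors the "single int as acceptable" edit above.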

@pubpub-zz
Collaborator

some examples of calls:
page.extract_text(0) => extract all text strings oriented up
page.extract_text((0,)) => extract all text strings oriented up (synonym)
page.extract_text((0,180)) => extract all text strings oriented up or down
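
Applied to the original question, a call might look like the sketch below. It assumes a PyPDF2 release that includes the `orientations` parameter proposed in this thread. The helper name `extract_oriented_text` and the file name `example.pdf` are placeholders. The import is deferred so the helper can be defined even where PyPDF2 is not installed.

```python
from typing import Tuple, Union


def extract_oriented_text(
    path: str, orientations: Union[int, Tuple[int, ...]] = 0
) -> str:
    """Return the text of all pages, keeping only the requested orientations."""
    # Deferred import; assumes a PyPDF2 version whose extract_text()
    # accepts the `orientations` keyword discussed in this issue.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return "\n".join(
        page.extract_text(orientations=orientations) for page in reader.pages
    )


# Example call (placeholder file name), keeping only upright text:
# print(extract_oriented_text("example.pdf", 0))
```

Passing the argument by keyword avoids depending on the positional order of the parameters, which was still under discussion here.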

@MartinThoma MartinThoma added the is-feature A feature request label Jul 29, 2022
@MartinThoma
Member

Thank you so much @pubpub-zz ! I didn't think that this was possible 😲

@matteodefra We will have a release tomorrow with this change.

@matteodefra
Author

Thank you so much @pubpub-zz and @MartinThoma !

@Liin159

Liin159 commented Dec 7, 2022

@matteodefra in the end, did this solution work?
