Filter text ignoring case #371

Open
ARandomPerson07 opened this issue May 26, 2023 · 2 comments · May be fixed by #372

Comments

@ARandomPerson07

Feature Request

When using filter_by_text_contains or filter_by_text_equals, it would be nice to have an ignore_case parameter that allows caseless matching.

For example, currently filtering by "CONTENT" and "Content" returns two different sets of elements.

image

When processing less uniform PDFs, hard-coding the correct case is difficult (most commonly, such headers are in either sentence case or all caps). The workaround I am currently using is to chain together OR logic with multiple versions of the string:

image

But this seems much less efficient than being able to directly access the element texts and use casefold() to match them.
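To make the request concrete, here is a minimal sketch of caseless matching on plain strings using only the standard library. It does not touch the library's element objects. The helper names `text_equals` and `text_contains`, and the `ignore_case` parameter as written here, are hypothetical stand-ins for illustration, not the existing API.

```python
def text_equals(text: str, target: str, ignore_case: bool = False) -> bool:
    """Hypothetical helper: exact match, optionally caseless."""
    if ignore_case:
        # casefold() is a more aggressive lower() intended for caseless comparison
        return text.casefold() == target.casefold()
    return text == target


def text_contains(text: str, target: str, ignore_case: bool = False) -> bool:
    """Hypothetical helper: substring match, optionally caseless."""
    if ignore_case:
        return target.casefold() in text.casefold()
    return target in text


# Stand-in for the text of elements extracted from a PDF (made-up sample data)
headers = ["CONTENT", "Content", "Table of contents", "Summary"]

exact = [h for h in headers if text_equals(h, "content", ignore_case=True)]
partial = [h for h in headers if text_contains(h, "content", ignore_case=True)]
```

With `ignore_case=True`, a single call covers both the all-caps and sentence-case variants, so no chain of per-variant filters is needed.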

@ARandomPerson07
Author

I've edited filtering.py on my local machine to add this functionality, and it seems to work as expected. I might send in a small pull request soon.

@jstockwin
Owner

Hi @ARandomPerson07,

I agree that this sounds like useful functionality - please do go ahead and submit a PR and I'd be happy to review.

I'd note you can also achieve this functionality with filter_by_regex (docs), to which you can pass the re.IGNORECASE flag.

I believe something like filter_by_regex("contents", regex_flags=re.IGNORECASE) should achieve what you're looking for (although I've not tested it).
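As a rough illustration of the flag itself, the sketch below applies `re.IGNORECASE` to plain strings with the standard `re` module. It does not call filter_by_regex. This thread does not say whether that method matches from the start of the element text, requires a full match, or searches anywhere, so the pattern may need adjusting (for example wrapping it in `.*`). The sample strings are made up.

```python
import re

# Stand-in for the text of elements extracted from a PDF (made-up sample data)
texts = ["CONTENTS", "Contents", "Table of contents", "Summary"]

# re.escape keeps any regex metacharacters in the search string literal
pattern = re.compile(re.escape("contents"), flags=re.IGNORECASE)

# fullmatch: the whole text must match (closest to an "equals" filter)
equals = [t for t in texts if pattern.fullmatch(t)]

# search: match anywhere in the text (closest to a "contains" filter)
contains = [t for t in texts if pattern.search(t)]

# match: anchored at the start of the text only
starts_with = [t for t in texts if pattern.match(t)]
```

The difference between `fullmatch`, `search`, and `match` is worth checking against the docs before relying on the regex route as a drop-in replacement for the text filters.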

Still happy to accept a PR on this, though.

@ARandomPerson07 ARandomPerson07 linked a pull request May 26, 2023 that will close this issue
6 tasks