Filter text ignoring case #371

ARandomPerson07 · 2023-05-26T05:09:12Z

Feature Request

When using filter_by_text_contains or filter_by_text_equals, it would be nice to have an ignore_case parameter that allows caseless matching.

For example, currently filtering by "CONTENT" and "Content" returns two different sets of elements.

When processing less uniform PDFs, hard-coding the correct case is difficult (most commonly, such headers are either in sentence case or all caps). The workaround I am currently using is to chain multiple versions of the string together with or logic.

But this seems much less efficient than being able to access the element texts directly and match them using casefold().
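As a rough illustration of the caseless matching being requested, here is a minimal sketch on plain strings. It does not touch the library's element objects: `text_equals` and `text_contains` are hypothetical helper names, and `ignore_case` is the proposed parameter, not an existing one.

```python
def text_equals(text: str, target: str, ignore_case: bool = False) -> bool:
    """Return True if text equals target, optionally ignoring case."""
    if ignore_case:
        # casefold() is a more aggressive lower() intended for caseless comparison
        return text.casefold() == target.casefold()
    return text == target


def text_contains(text: str, target: str, ignore_case: bool = False) -> bool:
    """Return True if target occurs in text, optionally ignoring case."""
    if ignore_case:
        return target.casefold() in text.casefold()
    return target in text


# Stand-in header strings; in practice these would come from the PDF elements.
headers = ["CONTENT", "Content", "Contents page", "Summary"]

exact = [h for h in headers if text_equals(h, "content", ignore_case=True)]
partial = [h for h in headers if text_contains(h, "content", ignore_case=True)]

print(exact)    # ['CONTENT', 'Content']
print(partial)  # ['CONTENT', 'Content', 'Contents page']
```

With `ignore_case` left at its default of `False`, both helpers fall back to ordinary case-sensitive comparison, so existing behaviour would be unchanged.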

ARandomPerson07 · 2023-05-26T07:20:23Z

I've edited the filtering.py file on my local machine to add the functionality, and it seems to work as expected. I might send in a small pull request soon.

jstockwin · 2023-05-26T07:23:31Z

Hi @ARandomPerson07,

I agree that this sounds like useful functionality - please do go ahead and submit a PR and I'd be happy to review.

I'd note you can also achieve this functionality with filter_by_regex (docs), to which you can pass the re.IGNORECASE flag.

I believe something like filter_by_regex("contents", regex_flags=re.IGNORECASE) should achieve what you're looking for (although I've not tested it).
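To show how the re.IGNORECASE flag behaves independently of the library, here is a small sketch that uses only Python's `re` module. How filter_by_regex applies the pattern to an element's text is not stated in this thread, so the use of `re.fullmatch` here is an assumption, and `matches_ignoring_case` is a hypothetical helper. `re.escape` is included because a literal header may contain regex metacharacters.

```python
import re


def matches_ignoring_case(text: str, literal: str) -> bool:
    """Return True if text equals literal, ignoring case, via a regex."""
    # Escape the literal so characters such as '.' or '(' are matched verbatim.
    pattern = re.escape(literal)
    # Assumption: the whole text must match; swap in re.search for "contains".
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None


# Stand-in header strings; in practice these would come from the PDF elements.
headers = ["CONTENTS", "Contents", "contents", "Table of contents"]
matched = [h for h in headers if matches_ignoring_case(h, "contents")]

print(matched)  # ['CONTENTS', 'Contents', 'contents']
```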

Still happy to accept a PR on this, though.

ARandomPerson07 added the enhancement label May 26, 2023

ARandomPerson07 linked a pull request May 26, 2023 that will close this issue

Ignore case for filter methods #372

Open

6 tasks
