alexanderadam

Hi folks, 👋

This patch is actually a fix from @tomhughes.
See also the fix in OSM; Tom also opened issue #128 for it.

I added a minimal bzip2 file (`echo -n '' | bzip2 > test/fixtures/magic/application/x-bzip2/bzip2.bz2`) to the fixtures, and the fix works as expected.

PS: I'm looking for a new adventure, in case anybody is looking for a Ruby/Rails/Crystal dev.
PPS: Would you be so kind as to add the hacktoberfest-accepted label to this PR if you find it helpful? 🥺

Co-authored-by: Tom Hughes <tom@compton.nu>

tomhughes commented Oct 10, 2025

That's not really the right way to fix it though - what's really needed is regex support in the code and then translating type="regex" rules in tika.xml into regular expression matches.
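
A minimal sketch of what such regex support could look like, assuming a rule carries a regex source string and a byte offset. The helper name `regex_magic_match?` and the pattern `BZh[1-9]` are illustrative assumptions, not the library's API or the actual rule in tika.xml. The sample bytes are those of an empty bzip2 stream (header `BZh9` followed by the end-of-stream marker and a zero CRC).

```ruby
# Hypothetical helper: checks one regex magic rule against raw file bytes.
# `pattern` is the rule's regex source, `offset` the byte position it applies at.
def regex_magic_match?(data, pattern, offset = 0)
  bytes = data.b                                   # binary copy, safe for arbitrary bytes
  window = bytes.byteslice(offset, bytes.bytesize) # nil when offset is past the end
  return false if window.nil?

  # Anchor at the start of the window so the rule only matches at `offset`.
  Regexp.new("\\A(?:#{pattern})").match?(window)
end

# Bytes of an empty bzip2 stream: "BZh9", end-of-stream marker, 32-bit zero CRC.
EMPTY_BZIP2 = ("BZh9".b + [0x17, 0x72, 0x45, 0x38, 0x50, 0x90, 0, 0, 0, 0].pack("C*")).freeze

puts regex_magic_match?(EMPTY_BZIP2, "BZh[1-9]")   # assumed pattern, for illustration
puts regex_magic_match?("plain text", "BZh[1-9]")
```

Anchoring with `\A` after slicing keeps the semantics close to a fixed-string rule at a given offset; a real implementation would also need to map Tika's regex dialect onto Ruby's, which this sketch does not attempt.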

What I did downstream is a hack to make this one specific case work by expanding the regex to a set of fixed strings.
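
As a rough illustration of that kind of expansion, assuming the regex is a literal prefix followed by a single digit class such as `BZh[1-9]` (an assumed shape, not necessarily the exact rule or the strings used downstream), the class can be unrolled into nine fixed strings. `expand_digit_class` and `fixed_magic_match?` are hypothetical names.

```ruby
# Hypothetical helper: unroll "prefix + one character from digits" into fixed strings.
def expand_digit_class(prefix, digits)
  digits.map { |digit| "#{prefix}#{digit}".b }
end

# Assumed pattern shape "BZh[1-9]" becomes "BZh1" .. "BZh9".
BZIP2_MAGICS = expand_digit_class("BZh", ("1".."9")).freeze

# A file matches if its leading bytes equal any of the fixed strings.
def fixed_magic_match?(data, magics)
  head = data.b
  magics.any? { |magic| head.start_with?(magic) }
end

puts BZIP2_MAGICS.inspect
puts fixed_magic_match?("BZh5...", BZIP2_MAGICS)
```

This only works for one specific, finite pattern; anything with repetition or alternation over long strings would not expand cleanly, which is why general regex support is the better fix.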

alexanderadam commented Oct 10, 2025

Oh, thank you for the super-quick response, @tomhughes 🙏.

I'm sorry, I'll close this then.
But I'd assume it's at least better than before, no? 🤔

tomhughes commented

I think those rules are just filling in gaps for things that the tika.xml file (which comes from https://github.com/apache/tika/blob/main/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml) doesn't have a rule for.

alexanderadam commented

I created #132 now. I think this might be the solution that you were looking for?
