Improve PDF / AI (Adobe Illustrator) recognition #396
What I was going to do (but didn't get to, due to school and time constraints) was peek around 5 MB of data at a time, check whether the sequence for AIPrivateData is found, and if so, return. Of course, this would mean that we could end up peeking over the entire file. So it's not an optimal thing either; however, thanks to Adobe being Adobe, we have to do it.
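A minimal sketch of the chunked scan described above, written in Python purely for illustration (it is not code from this PR). The function name `contains_marker`, the `chunk_size` parameter, and the overlap handling are assumptions; the overlap is there so a marker split across two reads is not missed.

```python
import io

MARKER = b"AIPrivateData"      # byte sequence mentioned in this thread
CHUNK_SIZE = 5 * 1024 * 1024   # 5 MB per read, as suggested above


def contains_marker(stream, marker=MARKER, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: return True as soon as `marker` is seen.

    Reads `chunk_size` bytes at a time, so the worst case still walks
    the entire file, which is the drawback noted in the comment above.
    """
    keep = len(marker) - 1  # bytes carried over between reads
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False  # end of file reached without a match
        window = tail + chunk
        if marker in window:
            return True
        # Keep the last few bytes so a marker that straddles
        # two chunks is still found on the next iteration.
        tail = window[-keep:] if keep else b""
```

It could be called on any binary file object, for example `contains_marker(open(path, "rb"))`, and returns early on the first hit instead of loading the whole file into memory.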
Sounds good @vladfrangu.
Yeah, the embedded metadata looks amazingly bad. Why bother putting that in if it is always the same?
I'm aware. Also, pro tip: you can skip the entire first metadata part if you parse the header (it includes a length in bytes that you can skip to get past the whole XML-in-PDF ordeal).
Exactly.
Bingo, and then use
Processing the "PDF blocks" is easier said than done. The COS ("Carousel" Object Structure) format, which PDF is based on, requires semi-textual, line-oriented processing, which is complex and tends to cross the scope boundaries of binary format detection.
Can you fix the merge conflict?
Is there any progress on this? Any help needed?
8020a8d to ea97c62
Done.
In line with what @vladfrangu suggested, it searches for `AIPrivateData` to detect the `.ai` (Adobe Illustrator) format.
Not a perfect solution: `AIPrivateData` appearing in the content.
I removed the `fixture.ai` because I suspect it is truncated. I don't own Adobe Illustrator myself, so I could not test it with that.
But it probably does the job for most cases.
Fix #360