Not recognising application/pdf file types #228
Comments
First, you could specify the source of such files. What program on which platform (operating system) or what kind of website generated them?
The magic bytes should be
Hi @jacor84, thanks for the help. As for the source of the files, I cannot really tell what generated them because I don't know. This issue happens with files coming from different customers, and I cannot see any pattern: the contents of the files are totally different and nothing indicates that they were generated in the same way. The result of
This, by contrast, is the result for a file that is not affected by the issue:
By BOM, do you mean BOM (file format)? 🤔
@andrea-del-popolo Sorry, but the hex content you pasted is not file content, but rather a raw response with headers. Maybe that's where your problem originates. :-)
The last one is valid; you can see that it starts with the magic bytes. By BOM I meant Byte Order Mark.
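To make the two things mentioned here concrete, this is a minimal sketch of checking how a buffer starts. It assumes Node.js; `describeStart` is a hypothetical helper written for illustration and is not part of file-type. It relies only on two well-known facts: a PDF begins with the ASCII signature `%PDF-`, and a UTF-8 byte order mark is the three bytes `EF BB BF`.

```javascript
// PDF files begin with the ASCII signature "%PDF-" (hex 25 50 44 46 2D).
const PDF_MAGIC = Buffer.from("%PDF-", "ascii");
// A UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper (not part of file-type): report what the
// first bytes of a buffer look like.
function describeStart(buffer) {
  if (buffer.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC)) {
    return "pdf";
  }
  if (buffer.subarray(0, UTF8_BOM.length).equals(UTF8_BOM)) {
    return "utf8-bom";
  }
  return "unknown";
}
```

If this returns anything other than `"pdf"` for a file that opens fine in a viewer, something is probably sitting in front of the real file data.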
Now I see what could have gone wrong here. Your buffer should contain only file data, but it also contains HTTP headers. The reason is that there should be two consecutive newline markers (`\n`) separating the response headers from the body, making one empty line. In your response, there is only one.
You should investigate how this happened and fix it there, or find some workaround to get rid of those headers. Then this library will recognize them.
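One possible workaround, sketched under assumptions: if the stray headers cannot be removed at the source, the buffer could be trimmed to start at the PDF signature before it is handed to the library. `stripLeadingBytes` is a hypothetical helper for Node.js, not something file-type provides, and it assumes the bytes `%PDF-` do not also occur inside the unwanted header text.

```javascript
// Hypothetical workaround: drop anything that precedes the PDF signature.
function stripLeadingBytes(buffer) {
  const signature = Buffer.from("%PDF-", "ascii");
  // Buffer#indexOf returns the offset of the first match, or -1 if absent.
  const offset = buffer.indexOf(signature);
  if (offset === -1) {
    return null; // no PDF signature anywhere in the buffer
  }
  // subarray shares memory with the original buffer; nothing is copied.
  return buffer.subarray(offset);
}
```

The returned buffer starts at the magic bytes, so a signature check on it should then succeed. Fixing whatever merges the headers into the body remains the cleaner solution.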
@andrea-del-popolo I think this case should be closed.
Hello,
I have some PDF files where file-type fails to recognise the type. I cannot share these files publicly as they contain private information. However, these files come from different sources and can be viewed correctly in every PDF viewer that I tried, so they do not appear to be malformed or corrupted in any way.
This is a buffer from one of these files:
And this is how I am using file-type:
Note: same behaviour on versions 10.9.0 and 12.0.1 (latest).
How can I help without sharing the entire file? 🤷‍♂️
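One way to share something useful without exposing the document itself might be to post only the first few bytes as hex, since type detection generally depends on the start of the file. A minimal sketch for Node.js follows; `hexPreview` is a hypothetical helper, and the default of 16 bytes is an arbitrary choice.

```javascript
// Hypothetical helper: render the first `count` bytes as spaced hex pairs.
function hexPreview(buffer, count = 16) {
  return Array.from(buffer.subarray(0, count), (byte) =>
    byte.toString(16).padStart(2, "0")
  ).join(" ");
}

// A buffer that really starts with a PDF header begins with 25 50 44 46 2d.
console.log(hexPreview(Buffer.from("%PDF-1.4"), 5)); // prints "25 50 44 46 2d"
```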