Hey @j256, I found these issues while parsing HTML content that starts with a BOM. Two examples: the byte array "-17, -69, -65, 60, 104, 116, 109, 108, 32" (the first three bytes are the UTF-8 BOM, followed by an <html tag) and the byte array "-1, -2, 60, 0, 104, 0, 116, 0, 109, 0, 108, 0" (the first two bytes are the UTF-16 little-endian BOM, followed by an <html tag). In both cases the library fails to detect the content as text/html. To make this work, I think we need to fix the issues first and then add proper magic entries, something like
+0 byte 0xEF
+!:mime text/html
+>1 byte 0xBB
+>>2 byte 0xBF UTF-8 Unicode text with BOM
+>>>3 search/1/cb \<html
and
+# UTF-16 LE
+0 byte 0xFF
+!:mime text/html
+>1 byte 0xFE
+>>2 lestring16 \<html Little-endian UTF-16 Unicode text with BOM
I did not include the magic entries in the pull request because I feel those changes are not very generic: the same BOM problem could affect other types such as XML (i.e., any type that can appear in a different encoding). I am not sure what the best solution is.
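One more generic direction, sketched here only as an illustration, would be to detect and strip the BOM before any matching takes place, so that type-specific entries for HTML, XML and so on do not each need a BOM variant. The class and method names below (`BomSketch`, `charsetFromBom`, `decodeWithoutBom`, `looksLikeHtml`) are hypothetical and not part of the library. The sketch only covers the UTF-8, UTF-16 LE and UTF-16 BE BOMs, and it assumes that content without a BOM can be read as ISO-8859-1 for the purpose of the check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the library: strips a leading BOM and decodes the rest.
final class BomSketch {

    /** Returns the charset implied by a leading BOM, or null if no known BOM is present. */
    static Charset charsetFromBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            // A UTF-32 LE BOM also starts with FF FE; this sketch does not distinguish it.
            return StandardCharsets.UTF_16LE;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        return null;
    }

    /** Compares the first bytes of data, read as unsigned values, against prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Decodes data with the BOM's charset, skipping the BOM itself. */
    static String decodeWithoutBom(byte[] data) {
        Charset charset = charsetFromBom(data);
        if (charset == null) {
            // Assumption: without a BOM, treat each byte as one character.
            return new String(data, StandardCharsets.ISO_8859_1);
        }
        int bomLength = charset.equals(StandardCharsets.UTF_8) ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, charset);
    }

    /** Case-insensitive check for a leading <html tag after any BOM. */
    static boolean looksLikeHtml(byte[] data) {
        return decodeWithoutBom(data).toLowerCase(Locale.ROOT).startsWith("<html");
    }

    public static void main(String[] args) {
        byte[] utf8 = {-17, -69, -65, 60, 104, 116, 109, 108, 32};
        byte[] utf16le = {-1, -2, 60, 0, 104, 0, 116, 0, 109, 0, 108, 0};
        System.out.println(looksLikeHtml(utf8));    // prints true
        System.out.println(looksLikeHtml(utf16le)); // prints true
    }
}
```

The two arrays in `main` are the same ones quoted at the top of this issue. Whether a pre-processing step like this fits the library's design, compared with adding per-encoding magic entries, is exactly the open question.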
Also, I am not sure whether lestring16/bestring16 support the [Bbc] options. The magic5 spec does not say so, but I see that lestring16/bestring16 extend from StringTypes. In other words, can we write something like lestring16/cb or not?
It would be great if you could take a look and answer my two questions above. Thanks a lot!
From @yongminyan .