Non ascii characters in SBD mode #86
Comments
I have modified the code in util.py to cover Arabic, French, punctuation, Roman numerals, Hindi, etc.:
Hi, SBD data is M2M (machine-to-machine) communication, so most of it is binary, and without knowledge of the protocol and/or the participating endpoints it is difficult to understand. I don't think blindly printing characters will help with understanding these protocols. If you have concrete examples where this change helps in understanding a protocol, please let me know.
I have received a few Short Burst Data messages when using -m sbd. They do contain message content sent from one machine terminal to another over SBD. If the content is ASCII, it is readable as English. If the message was sent in some other language, it is printed as hex values. PS: Out of 200-300 messages, only 5-10 contain readable text. The rest come out as hex.
I understand where you're coming from. Unfortunately, without knowing the code page/encoding, mappings like these just amount to guessing. Case in point: most of your code references code points > 255, which can't happen since the message is parsed byte-wise. Your decoding of the "French" characters works more or less by accident, since ISO-8859-1 (which is what I guess is being used in your case) matches the first 256 code points of Unicode (which is what chr() uses). I guess decoding/displaying the accented characters of ISO-8859-1 would not do much harm and would just be mildly confusing. I'll test it for a bit and see how I feel about it. However, implementing speculative decoding of UTF-8 (or other multi-byte encodings) is definitely out of scope here.
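To illustrate the point about byte-wise parsing and ISO-8859-1, here is a minimal sketch, not the toolkit's actual code. The function name `render_bytes` is hypothetical, the `\x{..}` escape format is copied from the snippet in the issue, and treating the payload as ISO-8859-1 is still only a guess about the sender.

```python
def render_bytes(data: bytes) -> str:
    """Render a payload byte by byte (hypothetical helper, not part of the toolkit).

    Printable ASCII and the visible ISO-8859-1 range are shown as characters,
    everything else as a \\x{hh} escape.
    """
    out = []
    for c in data:  # iterating over bytes yields ints in 0..255, never more
        is_ascii = 32 <= c < 127
        # 0xA0 (no-break space) and 0xAD (soft hyphen) are invisible, so keep them as hex
        is_latin1 = 0xA1 <= c <= 0xFF and c != 0xAD
        if is_ascii or is_latin1:
            # chr() maps 0..255 to the same characters as ISO-8859-1
            out.append(chr(c))
        else:
            out.append("\\x{%02x}" % c)
    return "".join(out)


print(render_bytes(b"caf\xe9 \x01"))
```

Because each value is a single byte, a lookup for a code point above 255 can never match; anything outside ISO-8859-1 would have to arrive as a multi-byte sequence.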
In 'sbd' mode, reassembler.py decodes ASCII bytes to the corresponding characters and prints the rest as hex. This leaves most SBD data garbled and meaningless. The code snippet from utils.py that converts int values to the corresponding ASCII characters is as follows:
if( c>=32 and c<127): str1+=chr(c)
I investigated these hex values and found that they belong to other languages such as Arabic and French.
str = str.replace(r'\x{e2}\x{80}\x{99}',"'")
str = str.replace(r'\x{e2}\x{80}\x{a6}',"…")
str = str.replace(r'\x{f4}',"ô")
str = str.replace(r'\x{c0}','À')
str = str.replace(r'\x{c7}',"Ç")
str = str.replace(r'\x{ea}',"ê")
str = str.replace(r'\x{f9}',"ù")
str = str.replace(r'\x{80}',"€")
str = str.replace(r'\x{20}\x{A3}',"₣")
str = str.replace(r'\x{c2}',"Â")
str = str.replace(r'\x{e8}',"è")
str = str.replace(r'\x{c9}',"É")
str = str.replace(r'\x{ca}',"Ê")
How can we modify this code to view non-ASCII characters (French or Arabic)? I tried replacing these non-ASCII hex values with the corresponding characters one by one, but that is very time-consuming. Is there an efficient way to convert these non-ASCII values to the corresponding non-English characters?
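One possible alternative to a long chain of replace() calls is to turn the hex escapes back into raw bytes and decode them in one step with a single, explicitly chosen encoding. The sketch below is only an illustration: the helper names `escaped_to_bytes` and `decode_guess` are hypothetical, the `\x{hh}` escape format is taken from the snippet above, the surrounding text is assumed to be plain ASCII, and the encoding passed in remains a guess about what the sender used.

```python
import re

# Matches one escape such as \x{e9} (format assumed from the snippet above)
HEX_ESCAPE = re.compile(r"\\x\{([0-9a-fA-F]{2})\}")


def escaped_to_bytes(text: str) -> bytes:
    """Rebuild raw bytes from ASCII text containing \\x{hh} escapes."""
    out = bytearray()
    pos = 0
    for m in HEX_ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("ascii")  # literal text between escapes
        out.append(int(m.group(1), 16))             # the escaped byte itself
        pos = m.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


def decode_guess(text: str, encoding: str) -> str:
    """Decode with one guessed encoding; undecodable bytes become U+FFFD."""
    return escaped_to_bytes(text).decode(encoding, errors="replace")


print(decode_guess(r"h\x{f4}tel", "iso-8859-1"))
print(decode_guess(r"l\x{e2}\x{80}\x{99}eau", "utf-8"))
```

Note that the replacements listed above mix three-byte UTF-8 sequences with single-byte values, so no single encoding is likely to fit every message; the result is only as good as the guess.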