Non ascii characters in SBD mode #86

alphapats · 2022-07-11T14:25:48Z

In 'sbd' mode, reassembler.py prints bytes in the printable ASCII range as characters and encodes everything else as hex escapes. As a result, most SBD data looks garbled and meaningless. The snippet in util.py that converts integer values to the corresponding ASCII characters is:
if( c>=32 and c<127): str1+=chr(c)
I investigated these hex values and found that they correspond to characters from other languages such as Arabic and French.
str = str.replace(r'\x{e2}\x{80}\x{99}',"'")
str = str.replace(r'\x{e2}\x{80}\x{a6}',"…")
str = str.replace(r'\x{f4}',"ô")
str = str.replace(r'\x{c0}','À')
str = str.replace(r'\x{c7}',"Ç")
str = str.replace(r'\x{ea}',"ê")
str = str.replace(r'\x{f9}',"ù")
str = str.replace(r'\x{80}',"€")
str = str.replace(r'\x{20}\x{A3}',"₣")
str = str.replace(r'\x{c2}',"Â")
str = str.replace(r'\x{e8}',"è")
str = str.replace(r'\x{c9}',"É")
str = str.replace(r'\x{ca}',"Ê")
How can this code be modified to display non-ASCII characters (French or Arabic)? I tried replacing the non-ASCII hex values with the corresponding characters one by one, but that is very time consuming. Is there an efficient way to convert these non-ASCII values to the corresponding non-English characters?
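One possible approach, sketched here rather than taken from the toolkit, is to turn the `\x{hh}` escapes back into raw bytes and let Python's codecs do the decoding instead of replacing escapes one at a time. The function names `unescape_to_bytes` and `decode_guess` are hypothetical, and the choice of UTF-8 with an ISO-8859-1 fallback is an assumption about the sender's encoding, not something the SBD data states.

```python
import re

# Matches the \x{hh} escapes produced in 'sbd' mode output.
_ESCAPE = re.compile(r'\\x\{([0-9a-fA-F]{2})\}')


def unescape_to_bytes(text: str) -> bytes:
    """Rebuild the raw byte string from text containing \\x{hh} escapes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        # Text between escapes is expected to be plain ASCII.
        out += text[pos:m.start()].encode('ascii', errors='replace')
        out.append(int(m.group(1), 16))
        pos = m.end()
    out += text[pos:].encode('ascii', errors='replace')
    return bytes(out)


def decode_guess(raw: bytes) -> str:
    """Guess the encoding: try UTF-8, fall back to ISO-8859-1.

    ISO-8859-1 maps every byte to a character, so the fallback never
    raises, but the result is only meaningful if the guess is right.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('iso-8859-1')


print(decode_guess(unescape_to_bytes(r"I\x{e2}\x{80}\x{99}ll check now")))
print(decode_guess(unescape_to_bytes(r"\x{e9}t\x{e9}")))
```

This only makes sense on the part of a message that is actually text. Applied to a whole binary frame, UTF-8 decoding will usually fail and the ISO-8859-1 fallback will print arbitrary symbols. A literal backslash in the payload followed by `x{hh}` would also be misread as an escape.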


alphapats · 2022-07-25T19:14:39Z

I have modified the code in util.py to include Arabic, French, punctuation, Roman numerals, Hindi, etc.:
```
for c in data:
    if mask:
        c=c&0x7f
    if(c>=32 and c<126):
        str1+=chr(c)
    #elif( c in [128,130,132,135,136,137,138,139,145,146,147,148,149,152,153,154]):
    #    str1+=chr(c)
    elif c in [233, 224, 232, 249, 226, 234, 238, 244, 251, 231, 235, 239, 252]: #french
        str1+=chr(c)
        #print('french')
    elif(c>=8208 and c<=8231): #punctuation
        str1+=chr(c)
    elif(c>=8240 and c<=8231): #punctuation (empty range as written: 8240 > 8231)
        str1+=chr(c)
    elif(c>=8308 and c<=8334): #superscript
        str1+=chr(c)
    elif(c>=8531 and c<=8579): #roman
        str1+=chr(c)
    elif (c >= 1569 and c<=1791): #arabic
        str1+=chr(c)
    elif (c>=3840 and c<=4047): #tibetan
        str1+=chr(c)
    elif (c>=8528 and c<=8579): #number
        str1+=chr(c)
    elif (c>=4096 and c<=4185): #myanmar
        str1+=chr(c)
    elif(c>=2305 and c<=2416): #hindi
        str1+=chr(c)
    elif(c>=3584 and c<=3675): #thai
        str1+=chr(c)
    elif(c>=880 and c<=1011): #greek
        str1+=chr(c)
    elif(c>=3458 and c<=3572): #sinhala
        str1+=chr(c)
    elif(c>=8448 and c<=8506): #letterlikesymbol
        str1+=chr(c)
    else:
        if dot:
            str1+="."
        elif escape:
            if c==0x0d:
                str1+='\r'
            elif c==0x0a:
                str1+='\n'
            else:
                str1+='\\x{%02x}'%c
        else:
            str1+="[%02x]"%c
```

Sec42 · 2022-07-28T20:10:42Z

Hi,

SBD data is M2M (machine-to-machine) communication, so most of it will be binary. Without knowledge of the protocol and/or the participating endpoints, it is difficult to understand.

I don't think blindly printing characters will help with understanding these protocols.

If you have concrete examples where this change helps understanding a protocol, please let me know.

alphapats · 2022-07-29T16:57:26Z

I have captured a few Short Burst Data messages using -m sbd. They do contain message content sent from one terminal to another over SBD. If the message is ASCII, it is readable as English. If it was sent in some other language, hex values are printed instead.
04-06-2022T17:39:46,DL,<26:02:5b:01:00:47:96>,\x{87})C*\x{d9}#I\x{e2}€\x{99}ll check now. Yesterday was 118Q\x{01}R\x{01}U\x{d3}\x{00}\x{00}\x{01}\x{81}.\x{9e}\x{97}\x{bb}C\x{c4}\x{06}\x{13}\x{07}i\x{04}\x{83}O@\x{c4}\x{06}\x{17} pSx\x{8f}
04-06-2022T17:44:08,DL,<26:02:5c:02:00:19:cf>,\x{87})C*\x{d9}\x{d1}145 opened 21 clicked on various links but some of those links were to Wikipedia.. so approx 15 clicked on actual trips. 2 unsubscribed. I guess we will not know the results until you can check your mailbox Q\x{01}R\x{01}U\x{d3}\x{00}\x{00}\x{01}\x{81}.\x{a2}J\x{b4}C\x{c4}\x{06}\x{13}\x{07}i\x{04}\x{83}O@\x{c4}\x{06}\x{17} pSx\x{8f}
The example above is in English, but I also found some messages in French/Spanish. Those were readable only for the characters they share with English (those in the ASCII range); the rest were shown as hex. I replaced the hex values with the corresponding French/Spanish characters and was able to recover the complete message.

PS: Out of 200-300 messages, only 5-10 contain readable text. The rest is all hex.

Sec42 · 2022-12-04T21:54:44Z

I understand where you're coming from. Unfortunately, without knowing the code page/encoding, mappings like these will just amount to guessing.

Case in point: most of your code references code points > 255, which can't happen since the message is parsed byte-wise.

Your decoding of the "french" characters works more or less by accident, since the iso-8859-1 standard (which is what I guess is being used in your case) matches the first 256 characters of unicode (which is what chr() uses).
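Both points can be checked in a few lines of Python. This is only an illustration of the byte-wise behaviour described above, not code from the toolkit, and the variable names are made up for the example.

```python
# Iterating over a bytes object yields integers in range(256), so a test
# such as `c >= 1569` can never be true in a byte-wise loop.
raw = "é".encode("utf-8")            # two bytes: 0xc3 0xa9
values = list(raw)
print(values)                         # [195, 169]

# chr() of a byte value equals decoding that single byte as ISO-8859-1.
latin1_matches_chr = all(
    bytes([c]).decode("iso-8859-1") == chr(c) for c in range(256)
)
print(latin1_matches_chr)             # True

# A byte sent as ISO-8859-1 therefore displays correctly via chr() ...
as_latin1 = chr(0xE9)
print(as_latin1)                      # é

# ... while the same letter sent as UTF-8 turns into two unrelated symbols.
per_byte = "".join(chr(c) for c in raw)
print(per_byte)                       # Ã©
```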

I guess decoding/displaying the accented characters of iso-8859-1 would not do much harm, and just be mildly confusing. I'll test it for a bit & see how I feel about it.

However, implementing speculative decoding of utf-8 (or other multi-byte encodings) is definitely out of scope here.
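For illustration only, displaying the iso-8859-1 characters while escaping everything else might look roughly like the sketch below. `to_printable` and its `latin1` flag are hypothetical names, not the toolkit's actual function or option, and the byte ranges chosen are an assumption about what counts as displayable.

```python
def to_printable(data: bytes, latin1: bool = False) -> str:
    """Render bytes as text, escaping anything not considered printable.

    With latin1=True, bytes 0xa1-0xff (except the soft hyphen 0xad) are
    shown as their ISO-8859-1 characters. This assumes the sender used
    ISO-8859-1, which the data itself cannot confirm.
    """
    out = []
    for c in data:
        if 32 <= c < 127:
            out.append(chr(c))             # printable ASCII
        elif latin1 and 0xA1 <= c <= 0xFF and c != 0xAD:
            out.append(chr(c))             # printable ISO-8859-1
        else:
            out.append('\\x{%02x}' % c)    # same escape style as sbd mode
    return ''.join(out)


print(to_printable(b"caf\xe9\x00", latin1=True))
print(to_printable(b"caf\xe9\x00"))
```

Keeping the behaviour behind a flag that defaults to off would leave the existing output unchanged for binary protocols.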

Sec42 closed this as completed Dec 4, 2022
